temporal-difference learning
A way of evaluating a policy: use experience (a sequence of rewards \(\{R_{t+1}, R_{t+2}, \dots\}\)) to predict/estimate the policy's value function.
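As a rough illustration of estimating values from experienced rewards, here is a minimal sketch of the one-step TD(0) update, \(V(S_t) \leftarrow V(S_t) + \alpha\,[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)]\). The environment (a 5-state random walk with reward 1 at the right end), the function names `td0_update` and `evaluate_random_walk`, and the values of `alpha`, `gamma`, and `seed` are all assumptions chosen for the example and do not come from this note.

```python
import random


def td0_update(V, s, r, s_next, alpha=0.1, gamma=1.0, terminal=False):
    """Apply one TD(0) step: move V[s] toward r + gamma * V[s_next]."""
    # A terminal successor has value 0, so the target is just the reward.
    target = r if terminal else r + gamma * V[s_next]
    V[s] += alpha * (target - V[s])
    return V[s]


def evaluate_random_walk(episodes=5000, alpha=0.05, seed=0):
    """Estimate state values for a uniformly random left/right policy.

    Hypothetical setup: states 1..5 are non-terminal, 0 and 6 are terminal,
    and the only non-zero reward is +1 for stepping into state 6.
    """
    rng = random.Random(seed)
    n = 5
    V = [0.0] * (n + 2)  # indices 0 and n + 1 are terminal and stay at 0
    for _ in range(episodes):
        s = 3  # every episode starts in the middle state
        while 0 < s < n + 1:
            s_next = s + rng.choice((-1, 1))
            r = 1.0 if s_next == n + 1 else 0.0
            done = s_next in (0, n + 1)
            td0_update(V, s, r, s_next, alpha=alpha, terminal=done)
            s = s_next
    return V[1:n + 1]


if __name__ == "__main__":
    print(evaluate_random_walk())
```

For this particular walk the exact values are 1/6, 2/6, ..., 5/6, so the printed estimates should land near those numbers, with some noise that depends on `alpha` and the seed.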
Created: 2024-04-03 Wed 20:58