Q-function
Q-learning is a process for learning the Q-function via the update rule below (a minimal code sketch follows the list): \[ Q(S_t,A_t) \gets Q(S_t,A_t) + \alpha[R_{t+1} + \gamma \max_aQ(S_{t+1},a) - Q(S_t,A_t)] \]
- initialization
- The initial values are generated arbitrarily, except for the terminal state, whose value is 0.
- parameters
- \(\alpha\)
- step size, \(\in (0,1]\)
- \(\gamma\)
- discount factor, \(\in [0,1]\)
- bootstrapping
- it is a bootstrapping method, as an existing estimate from the system (the Q-function itself) is used in the update target.
1. reference
https://richard-warren.github.io/blog/rl_intro_3/ [1] chapter 6.5
Backlinks
Touati, Ahmed and Ollivier, Yann ::: Learning One Representation to Optimize All Rewards
(core idea)
find, somehow, F and B with respect to a Q-function on the states and actions of the problem. F gives you a policy parameterized by each \(z \in \mathbb{R}^d\), and B can be combined with the reward to give you the index/parameter \(z\) that picks out the exact optimal policy from the many policies derived from \(F\).
Thus, F represents all of the futures from one state (to another), and B represents a way of getting there (see the sketch below).