Q-function

Q-learning is a learning process for estimating the Q-function, defined by the update rule: \[ Q(S_t,A_t) \gets Q(S_t,A_t) + \alpha[R_{t+1} + \gamma \max_aQ(S_{t+1},a) - Q(S_t,A_t)] \]

initialize
The initial values are generated arbitrarily, except for the terminal state, whose value is initialized to 0.
parameters
\(\alpha\)
step size, \(\in (0,1]\)
\(\gamma\)
discount factor, \(\in [0,1]\)
Q-learning is a bootstrapping method, as an existing estimate from the system (the Q-function itself) is used in the update target.
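As a minimal sketch, the update rule above can be written as one tabular Q-learning step. The tiny MDP, action set, and constants below are illustrative assumptions, not part of the note:

```python
from collections import defaultdict

ALPHA = 0.5   # step size alpha, in (0, 1]
GAMMA = 0.9   # discount factor gamma

# Q[(s, a)] defaults to 0.0, so the terminal state's value stays 0
# as long as we never update entries for terminal states.
Q = defaultdict(float)

def q_update(s, a, r, s_next, actions, terminal):
    """One Q-learning step: Q(S,A) += alpha * (R + gamma * max_a' Q(S',a') - Q(S,A))."""
    if terminal:
        target = r  # terminal state contributes no future value
    else:
        target = r + GAMMA * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (target - Q[(s, a)])

# Example: from state 0, action 1 yields reward 1.0 and lands in terminal state 2.
q_update(0, 1, 1.0, 2, actions=[0, 1], terminal=True)
print(Q[(0, 1)])  # 0.5 * (1.0 - 0.0) = 0.5
```

Note that the update bootstraps: the target `R + gamma * max_a' Q(S', a')` is built from the current Q estimate itself.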

1. Reference

Bibliography

[1]
R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction, Second edition. in Adaptive computation and machine learning series. Cambridge, Massachusetts: The MIT Press, 2018.

Backlinks

Touati, Ahmed and Ollivier, Yann ::: Learning One Representation to Optimize All Rewards

(core idea)

Find, somehow, \(F z\) and \(B\) with respect to a Q-function on the states and actions (the problem). \(F\) would give you a parameterized policy for every \(z \in \mathbf{R}^d\), and \(B\) can be combined with the reward to give you the index/parameter that picks out the exact optimal policy from the many policies derived from \(F\).

Thus, \(F\) represents all of the futures from one state (to another), and \(B\) represents a way of getting there.
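A hedged sketch of that pipeline, following the forward-backward framing of Touati and Ollivier: the random embeddings, sizes, and reward below are placeholder assumptions standing in for learned quantities, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, d = 5, 2, 3

# Hypothetical learned embeddings (random stand-ins here)
B = rng.normal(size=(n_states, d))             # backward embedding B(s')
F = rng.normal(size=(n_states, n_actions, d))  # forward embedding F(s, a)

# Combining B with the reward yields the task parameter z ~ E[r(s) B(s)]
reward = rng.normal(size=n_states)
z = (reward[:, None] * B).mean(axis=0)         # shape (d,)

# F then indexes a policy for this z: act greedily w.r.t. Q_z(s, a) = F(s, a)^T z
Q_z = F @ z                                    # shape (n_states, n_actions)
policy = Q_z.argmax(axis=1)                    # one action per state
```

The point of the sketch is the division of labor: \(F\) alone defines a whole family of policies indexed by \(z\), and the reward only enters through \(B\) when selecting which member of the family to use.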

Author: Linfeng He

Created: 2024-04-03 Wed 23:21