::: Fast Imitation via Behavior Foundation Models
A proof of concept using successor measures (namely, the forward-backward framework).
They first learn \(F\), \(B\), Cov B, and \(\pi_z\) with the algorithm in [1]. \(F\), \(B\), and \(\pi_z\) come from the forward-backward framework, while Cov B is the expectation of \(B(s)B(s)^T\) over the distribution of time spent in each state \(s\) when performing \(\pi_z\).
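Because Cov B is an expectation of \(B(s)B(s)^T\), it can be estimated from sampled states by averaging outer products. The sketch below assumes \(B\) is already learned and available as a plain function; the function name `estimate_cov_b` and the toy embedding are hypothetical, chosen only for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def estimate_cov_b(states: Sequence[object], B: Callable[[object], Vector]) -> Matrix:
    """Monte Carlo estimate of Cov B = E[B(s) B(s)^T] from sampled states.

    `states` are assumed to be drawn from the distribution the expectation
    is taken over; `B` maps a state to a d-dimensional vector.
    """
    if not states:
        raise ValueError("need at least one state")
    embeddings = [B(s) for s in states]
    d = len(embeddings[0])
    cov = [[0.0] * d for _ in range(d)]
    for b in embeddings:
        # Accumulate the outer product b b^T.
        for i in range(d):
            for j in range(d):
                cov[i][j] += b[i] * b[j]
    n = len(embeddings)
    # Divide by the number of samples to turn the sum into an average.
    return [[entry / n for entry in row] for row in cov]


# Toy, hypothetical embedding: B(s) = [1, s] for integer states.
cov = estimate_cov_b([0, 1, 2], lambda s: [1.0, float(s)])
print(cov)  # [[1.0, 1.0], [1.0, 1.6666666666666667]]
```

The estimate is symmetric by construction, as Cov B should be; with more samples it approaches the true expectation.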
1. Problem
Imitation learning processes are slow and costly, since they require:
- many demonstration iterations
- interaction with the environment
- complex RL routines
- prior knowledge of the behaviour's properties
2. Related work¹
- Behavioural Cloning
  - maximize the likelihood that the trained policy reproduces the expert's actions
  - suffers from errors due to covariate shift
  - requires many demonstrations
  - regularized BC
  - BC from observations only
- Infer/design a reward + RL
  - expert behavior -> reward -> RL
  - SQIL: reward (label) 1 for expert samples, 0 for non-expert samples
  - learn a discriminator to separate expert from non-expert samples
  - compute the distance between expert and non-expert transitions (action-on-world)
  - IL -> goal-conditioned task -> goal-oriented RL <- goals <- expert trajectories
- Apprenticeship learning
  - reward function from a known class (e.g., linear in known features); match the expert's performance
- Distribution Matching
  - minimize some f-divergence between the stationary distribution of the learned policy and the expert's distribution
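Distribution matching compares state distributions through an f-divergence \(D_f(p \,\|\, q) = \sum_x q(x)\, f\big(p(x)/q(x)\big)\), with \(f\) convex and \(f(1) = 0\). The sketch below estimates both distributions from visited states and evaluates two standard choices of \(f\) (KL and total variation). Treating \(p\) as the learner and \(q\) as the expert is an assumption here; methods differ in the direction and in the choice of \(f\). All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, Hashable, Iterable

Dist = Dict[Hashable, float]


def empirical(states: Iterable[Hashable]) -> Dist:
    """Estimate a state distribution from visited states by normalized counts."""
    counts = Counter(states)
    n = sum(counts.values())
    return {s: c / n for s, c in counts.items()}


def f_divergence(p: Dist, q: Dist, f: Callable[[float], float]) -> float:
    """D_f(p || q) = sum_x q(x) f(p(x) / q(x)) for discrete distributions."""
    # The divergence is only finite if q covers every state p visits.
    for x, px in p.items():
        if px > 0 and q.get(x, 0.0) == 0:
            raise ValueError(f"p puts mass on {x!r} where q has none")
    return sum(qx * f(p.get(x, 0.0) / qx) for x, qx in q.items() if qx > 0)


def kl_generator(t: float) -> float:
    # f(t) = t log t gives KL(p || q); uses the convention 0 log 0 = 0.
    return t * math.log(t) if t > 0 else 0.0


def tv_generator(t: float) -> float:
    # f(t) = |t - 1| / 2 gives the total variation distance.
    return 0.5 * abs(t - 1.0)


learner = empirical(["a", "a", "b", "b"])  # {a: 0.5, b: 0.5}
expert = empirical(["a", "b", "b", "b"])   # {a: 0.25, b: 0.75}
print(f_divergence(learner, expert, tv_generator))  # approximately 0.25
print(f_divergence(learner, expert, kl_generator))
```

Both divergences are zero exactly when the learner's visitation matches the expert's, which is the target distribution matching methods optimize toward.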
3. Preliminaries
- Markov decision processes
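In a finite MDP with a fixed policy \(\pi\), the successor measure that the forward-backward framework factorizes reduces to a matrix: \(M^\pi = \sum_{t \ge 0} \gamma^t P_\pi^t = (I - \gamma P_\pi)^{-1}\), where \(P_\pi\) is the state transition matrix under \(\pi\). Conventions differ on whether the \(t = 0\) term is counted; this sketch includes it. It solves the fixed point \(M = I + \gamma P_\pi M\) by iteration; the function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def successor_measure(p_pi: Matrix, gamma: float, tol: float = 1e-10,
                      max_iter: int = 10_000) -> Matrix:
    """Tabular M = sum_{t>=0} gamma^t P_pi^t via the fixed point M = I + gamma P_pi M.

    p_pi[s][s2] is the probability of moving from s to s2 under the policy.
    The update is a gamma-contraction, so it converges for 0 <= gamma < 1.
    """
    n = len(p_pi)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    m = [row[:] for row in identity]
    for _ in range(max_iter):
        pm = matmul(p_pi, m)
        new_m = [[identity[i][j] + gamma * pm[i][j] for j in range(n)] for i in range(n)]
        diff = max(abs(new_m[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new_m
        if diff < tol:
            break
    return m


# Two states that always swap: M should equal (I - 0.5 P)^{-1}.
m = successor_measure([[0.0, 1.0], [1.0, 0.0]], gamma=0.5)
print(m)  # approximately [[4/3, 2/3], [2/3, 4/3]]
```

Each row sums to \(1/(1-\gamma)\), the total discounted time, which is a quick sanity check on any estimate of \(M^\pi\).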
4. Method
Uses a behaviour foundation model, with many tweaks and techniques:
- FB (forward-backward) framework: FB-IL, building on Touati, Ahmed and Ollivier, Yann ::: Learning One Representation to Optimize All Rewards
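In the FB framework, a reward \(r\) is mapped to \(z_r = \mathbb{E}_{s \sim \rho}[r(s)\,B(s)]\) and \(\pi_z\) acts greedily with respect to \(Q(s, a, z) = F(s, a, z)^T z\). If the "reward" is taken to be the density ratio of expert states to \(\rho\), then \(z\) reduces to the average of \(B\) over expert demonstration states. The sketch below assumes that simple choice; FB-IL proposes several variants, not all of this form. The toy `F` and `B` (a 3-state ring with one-hot embeddings, where `F` ignores \(z\)) and all function names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def z_from_demos(demo_states: Sequence[object], B: Callable[[object], Vector]) -> Vector:
    """Average of B over expert states (assumed simple FB-IL-style choice of z)."""
    if not demo_states:
        raise ValueError("need at least one demonstration state")
    embeddings = [B(s) for s in demo_states]
    d = len(embeddings[0])
    return [sum(b[i] for b in embeddings) / len(embeddings) for i in range(d)]


def greedy_action(F: Callable[[object, object, Vector], Vector], state: object,
                  actions: Sequence[object], z: Vector) -> object:
    """pi_z(s) = argmax_a Q(s, a, z) with Q(s, a, z) = F(s, a, z)^T z."""
    return max(actions, key=lambda a: dot(F(state, a, z), z))


# Toy, hypothetical 3-state ring: B is one-hot, F is one-hot on the next state.
def toy_B(s: int) -> Vector:
    return [1.0 if i == s else 0.0 for i in range(3)]


def toy_F(s: int, a: int, z: Vector) -> Vector:
    # A real F depends on z and on discounted future occupancy; this one does not.
    return toy_B((s + a) % 3)


z = z_from_demos([0, 0, 2], toy_B)               # expert mostly visits state 0
print(greedy_action(toy_F, 1, [-1, 0, 1], z))    # -1: move from state 1 to state 0
```

No reward is inferred and no RL is run at imitation time: once \(F\) and \(B\) are trained, imitating reduces to computing one vector \(z\) from the demonstrations, which is what makes this kind of approach fast.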
5. Innovation
Bibliography
[1] A. Touati, J. Rapin, and Y. Ollivier, “Does Zero-Shot Reinforcement Learning Exist?” arXiv, Mar. 2023. doi: 10.48550/arXiv.2209.14935.
Footnotes:
1
A quite concise and comprehensive one, but about imitation learning techniques.