::: Fast Imitation via Behavior Foundation Models

1. Problem
2. Related work
3. Preliminaries
4. Method
5. Innovation
Bibliography
Backlinks
- org-roam bulk insert

A proof of concept using successor measures (namely the forward-backward framework).

They first learn a set [F, B, Cov B, \(\pi_z\)] with an algorithm from [1]. F, B and \(\pi_z\) are computed by the forward-backward framework; Cov B is the expectation of \(B(s)B(s)^T\) over the distribution of time spent in each state s when performing \(\pi_z\).
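As a rough illustration of the Cov B term, the expectation of \(B(s)B(s)^T\) can be estimated by averaging outer products over sampled states. This is a minimal sketch, not the authors' code: the function name `cov_b` and the toy embeddings are hypothetical, and the samples are assumed to come from whatever state distribution the expectation is taken over.

```python
def cov_b(embeddings):
    """Empirical average of B(s) B(s)^T over a list of embedding vectors B(s)."""
    n = len(embeddings)
    d = len(embeddings[0])
    cov = [[0.0] * d for _ in range(d)]
    for b in embeddings:
        for i in range(d):
            for j in range(d):
                # Accumulate the (i, j) entry of the outer product b b^T.
                cov[i][j] += b[i] * b[j] / n
    return cov


# Hypothetical 2-dimensional embeddings B(s) for three sampled states.
samples = [[1.0, 0.0], [0.0, 2.0], [1.0, 2.0]]
cov = cov_b(samples)
```

The result is a symmetric d x d matrix, where d is the dimension of the embedding B.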

1. Problem

Imitation learning processes are slow and costly, requiring:

demo iterations
interaction with the environment
complex RL routines
prior knowledge of the behaviour's properties

2. Related work ¹

Behavioural Cloning

maximize the likelihood of the expert's actions under the trained policy, so that it replicates them

error from covariate shift
needs many demos
regularized BC
BC from observations only
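The likelihood objective behind Behavioural Cloning can be sketched as a mean negative log-likelihood of expert actions. This is only an illustration under assumptions: the tabular softmax policy, the names `bc_loss` and `logits_by_state`, and the two demonstrations are hypothetical and do not come from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bc_loss(logits_by_state, demos):
    """Mean negative log-likelihood of expert (state, action) pairs."""
    nll = 0.0
    for state, action in demos:
        probs = softmax(logits_by_state[state])
        nll -= math.log(probs[action])
    return nll / len(demos)


# Hypothetical tabular policy: one list of action logits per state.
logits = {"s0": [0.0, 0.0], "s1": [2.0, 0.0]}
# Hypothetical expert demonstrations as (state, action index) pairs.
demos = [("s0", 1), ("s1", 0)]
loss = bc_loss(logits, demos)
```

Minimizing this loss over the policy parameters is the same as maximizing the likelihood of the expert's actions; it only sees states visited by the expert, which is where the covariate-shift error comes from.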

infer/design reward + RL

expert behavior -> reward -> RL

SQIL - reward (label) 1 for expert, 0 for non-expert sample
learn a discriminator to separate expert from non-expert samples
compute a distance between expert and non-expert transitions (action-on-world)
IL -> goal-conditioned task -> goal-oriented RL <- goal <- expert trajectories
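The SQIL item in the list above amounts to a constant reward labelling before running ordinary RL. A minimal sketch, assuming transitions are stored as (state, action, next_state) tuples; the function name `sqil_label` and the toy transitions are hypothetical.

```python
def sqil_label(expert_transitions, agent_transitions):
    """Attach reward 1 to expert transitions and reward 0 to all others."""
    labeled = [(s, a, s2, 1.0) for (s, a, s2) in expert_transitions]
    labeled += [(s, a, s2, 0.0) for (s, a, s2) in agent_transitions]
    return labeled


# Hypothetical transitions.
expert = [("s0", "right", "s1")]
agent = [("s0", "left", "s0"), ("s0", "right", "s1")]
batch = sqil_label(expert, agent)
```

The labelled batch can then be handed to any off-policy RL routine as if the rewards came from the environment.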

Apprenticeship learning

known reward function, match the performance of the expert

Distribution Matching

minimize some f-divergence between the stationary distribution of the learned policy and that of the expert
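As one concrete f-divergence, the KL divergence between two state distributions can be computed as below. This is a sketch under assumptions: empirical visitation frequencies stand in for the stationary distributions, the divergence direction (learner relative to expert) is an arbitrary choice, and the names `visitation` and `kl` are hypothetical.

```python
import math
from collections import Counter


def visitation(states):
    """Empirical state distribution from a list of visited states."""
    counts = Counter(states)
    n = len(states)
    return {s: c / n for s, c in counts.items()}


def kl(p, q, eps=1e-8):
    """KL(p || q) for discrete distributions given as dicts; q is floored at eps."""
    return sum(pv * math.log(pv / max(q.get(s, 0.0), eps)) for s, pv in p.items())


# Hypothetical visited states for the expert and the learned policy.
p_expert = visitation(["a", "a", "b", "c"])
p_learner = visitation(["a", "b", "b", "c"])
divergence = kl(p_learner, p_expert)
```

A distribution-matching method would adjust the policy to drive this quantity towards zero.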

3. Preliminaries

Markov decision processes

4. Method

using a behaviour foundation model, with lots of tweaks and techniques:

FB (forward-backward) framework - FB-IL; see Touati, Ahmed and Ollivier, Yann ::: Learning One Representation to Optimize All Rewards
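For orientation, here is a minimal sketch of how the FB framework turns a task vector z into a policy, assuming the usual FB definitions: z is the average of r(s) B(s) over reward samples, and \(\pi_z\) acts greedily on \(F(s,a,z)^T z\). The helper names, the lookup table standing in for a learned F, and the toy numbers are all hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def infer_z(rewards, embeddings):
    """Task vector z as the average of r(s) * B(s) over sampled states."""
    n = len(rewards)
    d = len(embeddings[0])
    return [sum(r * b[i] for r, b in zip(rewards, embeddings)) / n for i in range(d)]


def greedy_action(forward, state, actions, z):
    """pi_z(s): the action maximizing F(s, a, z)^T z."""
    return max(actions, key=lambda a: dot(forward(state, a, z), z))


# Hypothetical forward map F, here a lookup table that ignores z.
table = {("s0", "left"): [1.0, 0.0], ("s0", "right"): [0.0, 1.0]}


def forward(state, action, z):
    return table[(state, action)]


# Hypothetical rewards r(s) and embeddings B(s) for two sampled states.
z = infer_z([0.0, 1.0], [[1.0, 0.0], [0.2, 0.9]])
action = greedy_action(forward, "s0", ["left", "right"], z)
```

In the real model F and B are learned networks; the table above only shows how the pieces fit together.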

5. Innovation

Bibliography

[1] A. Touati, J. Rapin, and Y. Ollivier, “Does Zero-Shot Reinforcement Learning Exist?” arXiv, Mar. 2023. doi: 10.48550/arXiv.2209.14935.

Backlinks

Here’s a script to insert multiple org-roam nodes

(defun hermanhel-strings-to-hash (strings)
  "Convert a list of STRINGS to a hash table with the strings as keys."
  (let ((hash (make-hash-table :test 'equal)))
    (dolist (str strings)
      (puthash str t hash))
    hash))

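(defun hermanhel-org-roam-insert-multiple-nodes-as-list ()
  "Prompt for several org-roam titles and insert them as a list of roam links."
  (interactive)
  ;; `let*' (not `let') so that `selected-nodes' can refer to `candidates'.
  ;; `org-roam--get-titles' and `citar--select-multiple' are internal
  ;; functions; whether they exist depends on the installed package versions.
  (let* ((candidates (hermanhel-strings-to-hash (org-roam--get-titles)))
         (selected-nodes (citar--select-multiple "References: " candidates)))
    ;; Insert one "+ [[roam:TITLE]]" list item per selected title.
    (dolist (title selected-nodes)
      (insert "+ " "[[roam:" title "]]" "\n"))))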

Footnotes:

¹ A quite concise and comprehensive one, but about imitation learning techniques.