::: Fast Imitation via Behavior Foundation Models

A proof of concept using successor measures (namely, the forward-backward framework).

They first learn the set [F, B, Cov B, \(\pi_z\)] with an algorithm from [1]. Cov B is the expectation of \(B(s)B(s)^T\) over the distribution of time spent in each state s when performing \(\pi_z\); the other quantities are computed by the forward-backward framework.
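As a rough illustration of the Cov B term, the sketch below estimates the expectation of \(B(s)B(s)^T\) by averaging outer products over sampled states. The names `estimate_cov_b` and `toy_embedding` are hypothetical, and the toy embedding is a stand-in for illustration, not the learned B from [1]. The states are assumed to be drawn from whichever distribution the expectation is taken over.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def estimate_cov_b(
    states: Sequence[Sequence[float]],
    backward_embedding: Callable[[Sequence[float]], Vector],
) -> Matrix:
    """Monte Carlo estimate of E[B(s) B(s)^T] over the given states."""
    if not states:
        raise ValueError("need at least one state")
    embeddings = [backward_embedding(s) for s in states]
    d = len(embeddings[0])
    cov = [[0.0] * d for _ in range(d)]
    for b in embeddings:
        for i in range(d):
            for j in range(d):
                # Accumulate the average of the outer product b b^T.
                cov[i][j] += b[i] * b[j] / len(embeddings)
    return cov


def toy_embedding(state: Sequence[float]) -> Vector:
    # Hypothetical B: a fixed linear map of a 2-D state (illustration only).
    x, y = state
    return [x + y, x - y]


cov_b = estimate_cov_b([(1.0, 0.0), (0.0, 1.0)], toy_embedding)
```

With these two sample states the estimate is the 2x2 identity matrix; a real model would use many more samples and a learned embedding.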

1. Problem

Imitation learning is slow and costly. It typically requires:

  • many demonstration iterations
  • interaction with the environment
  • a complex RL routine
  • prior knowledge of the behaviour's properties

2. Related work 1

Behavioural Cloning
Maximize the likelihood of the expert's actions under the trained policy.
  • suffers from errors caused by covariate shift
  • needs many demonstrations
  • regularized BC
  • BC from observations only
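A minimal sketch of the likelihood-maximization idea behind behavioural cloning, assuming a small discrete (tabular) setting. In that case the maximum-likelihood policy is simply the empirical action frequency per state. The function names and the toy demonstrations are hypothetical, made up for illustration.

```python
import math
from collections import Counter, defaultdict


def fit_tabular_bc(demos):
    """Maximum-likelihood tabular policy: pi(a|s) = count(s, a) / count(s)."""
    counts = defaultdict(Counter)
    for state, action in demos:
        counts[state][action] += 1
    return {
        s: {a: n / sum(c.values()) for a, n in c.items()}
        for s, c in counts.items()
    }


def log_likelihood(policy, demos):
    """Log-likelihood of the demonstrated actions under the policy."""
    return sum(math.log(policy[s][a]) for s, a in demos)


# Toy expert demonstrations as (state, action) pairs.
demos = [("s0", "left"), ("s0", "left"), ("s0", "right"), ("s1", "right")]
policy = fit_tabular_bc(demos)

# A uniform policy over the two actions, for comparison.
uniform = {s: {"left": 0.5, "right": 0.5} for s in ("s0", "s1")}
```

The fitted policy has no entry for states absent from the demonstrations, which is a small-scale view of the covariate-shift problem: once the learner drifts to unseen states, the cloned policy has nothing to imitate.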
Infer/design reward + RL
expert behavior -> reward -> RL
  • SQIL - reward (label) 1 for expert samples, 0 for non-expert samples
  • learn a discriminator to separate expert from non-expert samples
  • compute a distance between expert and non-expert transitions (action-on-world)
  • IL -> goal-conditioned task -> goal-oriented RL <- goal <- expert trajectories
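The SQIL-style labelling in the first bullet can be sketched as follows. This is only the reward-relabelling step under assumed data structures (transitions as plain tuples); the function name is hypothetical, and the labelled buffer would then be handed to an off-policy RL algorithm, which is not shown.

```python
def label_sqil_rewards(expert_transitions, agent_transitions):
    """Attach reward 1.0 to expert transitions and 0.0 to agent transitions."""
    labelled = [(transition, 1.0) for transition in expert_transitions]
    labelled += [(transition, 0.0) for transition in agent_transitions]
    return labelled


# Toy transitions as (state, action, next_state) tuples.
expert = [("s0", "left", "s1"), ("s1", "right", "s2")]
agent = [("s0", "right", "s3")]

replay_buffer = label_sqil_rewards(expert, agent)
```

No reward function is learned here: the labels alone steer the RL step toward expert-like transitions.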
Apprenticeship learning
The reward function is unknown; the goal is to match the expert's performance.
Distribution Matching
Minimize some f-divergence between the stationary distribution of the learned policy and that of the expert.
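Written out, the distribution-matching objective looks roughly like this. The symbols \(\rho^{\pi}\) and \(\rho^{E}\) (stationary distributions of the learned policy and of the expert) and the direction of the divergence are notational assumptions for this sketch, not taken from the note:

\[
\min_{\pi} \; D_f\!\left(\rho^{\pi} \,\|\, \rho^{E}\right),
\qquad
D_f(P \,\|\, Q) = \mathbb{E}_{x \sim Q}\!\left[ f\!\left( \frac{dP}{dQ}(x) \right) \right],
\]

where \(f\) is convex with \(f(1) = 0\). For example, \(f(t) = t \log t\) gives the KL divergence \(\mathrm{KL}(P \,\|\, Q)\).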

3. Preliminaries

  • Markov decision processes

4. Method

Uses a behaviour foundation model, with many tweaks and techniques:

5. Innovation

Bibliography

[1]
A. Touati, J. Rapin, and Y. Ollivier, “Does Zero-Shot Reinforcement Learning Exist?” arXiv, Mar. 2023. doi: 10.48550/arXiv.2209.14935.

Backlinks

Here’s a script to insert multiple org-roam nodes

(defun hermanhel-strings-to-hash (strings)
  "Convert a list of STRINGS to a hash table with the strings as keys."
  (let ((hash (make-hash-table :test 'equal)))
    (dolist (str strings)
      (puthash str t hash))
    hash))

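(require 'org-roam)
(require 'citar)

(defun hermanhel-org-roam-insert-multiple-nodes-as-list ()
  "Prompt for several org-roam node titles and insert each as a list item.
Relies on `org-roam--get-titles' and `citar--select-multiple', which are
internal functions and may not exist in every version of those packages."
  (interactive)
  ;; `let*' is needed because SELECTED-NODES depends on CANDIDATES.
  (let* ((candidates (hermanhel-strings-to-hash (org-roam--get-titles)))
         (selected-nodes (citar--select-multiple "References: " candidates)))
    ;; Insert one "+ [[roam:TITLE]]" list item per selected title.
    (dolist (title selected-nodes)
      (insert "+ " "[[roam:" title "]]" "\n"))))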

Footnotes:

1

A quite concise and comprehensive one, but about imitation learning techniques.

Author: Linfeng He

Created: 2024-04-03 Wed 19:37