::: ANTGPT: CAN LARGE LANGUAGE MODELS HELP LONG-TERM ACTION ANTICIPATION FROM VIDEOS?
1. long-term action anticipation
predict an actor's future behaviour from (make-into-verb-noun-seq (observe-video))
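The task's input/output shape can be sketched as follows. This is a minimal illustration, not AntGPT's code. It assumes that an upstream recognizer (not shown, hypothetical) has already turned the observed video into a sequence of (verb, noun) actions. `Action` and `anticipate_naive` are hypothetical names, and the "repeat the last action" rule is only a trivial placeholder predictor.

```python
from typing import List, Tuple

# An action is a (verb, noun) pair, e.g. ("open", "drawer").
Action = Tuple[str, str]

def anticipate_naive(observed: List[Action], z: int) -> List[Action]:
    """Hypothetical placeholder predictor: forecast the next z actions
    by repeating the last observed action z times."""
    if not observed:
        raise ValueError("need at least one observed action")
    return [observed[-1]] * z

# Usage: observed actions in, a sequence of z future actions out.
observed = [("take", "knife"), ("cut", "onion")]
print(anticipate_naive(observed, 3))
```

Any real anticipation model keeps the same interface: a list of observed (verb, noun) actions goes in, and a list of z predicted (verb, noun) actions comes out.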
2. approach
2.1. top-down
infer the actor's goal, then plan the next action toward it
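A toy sketch of the top-down idea follows: first infer a goal from the observed actions, then plan the remaining steps toward it. The goal library `GOAL_SCRIPTS` and the overlap-count goal inference are hypothetical stand-ins for whatever goal-inference and planning components a real system would use. They are not the paper's method.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str]

# Hypothetical goal library: each goal maps to an ordered script of actions.
GOAL_SCRIPTS: Dict[str, List[Action]] = {
    "make salad": [("take", "bowl"), ("wash", "lettuce"), ("cut", "lettuce"),
                   ("put", "lettuce"), ("mix", "salad")],
    "make tea": [("take", "cup"), ("fill", "kettle"), ("boil", "water"),
                 ("pour", "water"), ("add", "teabag")],
}

def infer_goal(observed: List[Action]) -> str:
    """Pick the goal whose script shares the most actions with the observation."""
    return max(GOAL_SCRIPTS, key=lambda g: sum(a in GOAL_SCRIPTS[g] for a in observed))

def plan_next(observed: List[Action], z: int) -> List[Action]:
    """Top-down: infer the goal, then return up to z script steps not yet observed."""
    goal = infer_goal(observed)
    remaining = [a for a in GOAL_SCRIPTS[goal] if a not in observed]
    return remaining[:z]

observed = [("take", "bowl"), ("wash", "lettuce")]
print(infer_goal(observed), plan_next(observed, 2))
```

The design point is the ordering. The goal is fixed first, and it constrains every predicted action, so predictions stay consistent with one intention instead of drifting step by step.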
2.2. bottom-up
predict the next action autoregressively by modeling temporal dynamics
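The bottom-up idea can be sketched as an autoregressive rollout: predict one action, append it to the history, and predict again. Here a first-order transition-count (bigram) model stands in for a learned temporal model. This substitution is an assumption made for illustration, and `fit_transitions` and `rollout` are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Action = Tuple[str, str]

def fit_transitions(sequences: List[List[Action]]) -> Dict[Action, Counter]:
    """Count how often each action is immediately followed by each other action."""
    counts: Dict[Action, Counter] = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def rollout(counts: Dict[Action, Counter], observed: List[Action], z: int) -> List[Action]:
    """Bottom-up: greedily predict one action at a time, feeding each prediction back in."""
    history = list(observed)
    predicted: List[Action] = []
    for _ in range(z):
        followers = counts.get(history[-1])  # .get does not create empty entries
        if not followers:
            break  # unseen action: no transition statistics to continue from
        nxt = followers.most_common(1)[0][0]
        predicted.append(nxt)
        history.append(nxt)
    return predicted

train = [[("take", "cup"), ("fill", "kettle"), ("boil", "water"), ("pour", "water")]] * 2
model = fit_transitions(train)
print(rollout(model, [("take", "cup")], 5))
```

Because each step conditions only on earlier predictions, errors can compound over long horizons. The top-down approach is meant to counter exactly this weakness.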