Autonomous Learning, Empirical Inference | Conference Paper | 2025

Zero-Shot Offline Imitation Learning via Optimal Transport

Zero-shot imitation learning algorithms hold the promise of reproducing unseen behavior from as little as a single demonstration at test time. Existing practical approaches view the expert demonstration as a sequence of goals, enabling imitation with a high-level goal selector, and a low-level goal-conditioned policy. However, this framework can suffer from myopic behavior: the agent's immediate actions towards achieving individual goals may undermine long-term objectives. We introduce a novel method that mitigates this issue by directly optimizing the occupancy matching objective that is intrinsic to imitation learning. We propose to lift a goal-conditioned value function to a distance between occupancies, which are in turn approximated via a learned world model. The resulting method can learn from offline, suboptimal data, and is capable of non-myopic, zero-shot imitation, as we demonstrate in complex, continuous benchmarks.
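
For readers skimming the abstract, the occupancy-matching idea can be sketched as an optimal transport problem in which a goal-conditioned value function supplies the ground cost between visited states and demonstrated goals. The formulation below is only an illustrative sketch with assumed notation (ρ^π, ρ^E, V, c), not the exact objective from the paper:

\[
\mathcal{W}_c\!\left(\rho^{\pi}, \rho^{E}\right)
= \min_{\gamma \in \Gamma(\rho^{\pi},\, \rho^{E})}
\; \mathbb{E}_{(s,\, g) \sim \gamma}\big[c(s, g)\big],
\qquad c(s, g) \approx -V(s, g),
\]

where ρ^π is the policy's state occupancy (approximated here via rollouts of the learned world model), ρ^E is the occupancy induced by treating the expert demonstration as a sequence of goals, Γ(·,·) denotes the set of couplings with these marginals, and V(s, g) is a goal-conditioned value function lifted to a distance; the cost actually used in the paper may differ in sign or scaling.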

Author(s): Rupf, Thomas and Bagatella, Marco and Gürtler, Nico and Frey, Jonas and Martius, Georg
Book Title: Proceedings of the 42nd International Conference on Machine Learning (ICML)
Volume: 267
Pages: 52345--52381
Year: 2025
Month: July
Series: Proceedings of Machine Learning Research
Editors: Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry
Publisher: PMLR
BibTeX Type: Conference Paper (inproceedings)
Event Name: International Conference on Machine Learning
Event Place: Vancouver Convention Center
State: Published
URL: https://proceedings.mlr.press/v267/rupf25a.html
Eprint: arXiv:2410.08751

BibTeX

@inproceedings{rupf2024:ZILOT,
  title = {Zero-Shot Offline Imitation Learning via Optimal Transport},
  booktitle = {Proceedings of the 42nd International Conference on Machine Learning (ICML)},
  abstract = {Zero-shot imitation learning algorithms hold the promise of reproducing unseen behavior from as little as a single demonstration at test time. Existing practical approaches view the expert demonstration as a sequence of goals, enabling imitation with a high-level goal selector, and a low-level goal-conditioned policy. However, this framework can suffer from myopic behavior: the agent's immediate actions towards achieving individual goals may undermine long-term objectives. We introduce a novel method that mitigates this issue by directly optimizing the occupancy matching objective that is intrinsic to imitation learning. We propose to lift a goal-conditioned value function to a distance between occupancies, which are in turn approximated via a learned world model. The resulting method can learn from offline, suboptimal data, and is capable of non-myopic, zero-shot imitation, as we demonstrate in complex, continuous benchmarks.},
  volume = {267},
  pages = {52345--52381},
  series = {Proceedings of Machine Learning Research},
  editor = {Singh, Aarti and Fazel, Maryam and Hsu, Daniel and Lacoste-Julien, Simon and Berkenkamp, Felix and Maharaj, Tegan and Wagstaff, Kiri and Zhu, Jerry},
  publisher = {PMLR},
  month = jul,
  year = {2025},
  author = {Rupf, Thomas and Bagatella, Marco and G{\"u}rtler, Nico and Frey, Jonas and Martius, Georg},
  eprint = {2410.08751},
  archiveprefix = {arXiv},
  url = {https://proceedings.mlr.press/v267/rupf25a.html},
}