Autonomous Learning Embodied Vision Conference Paper 2020

Sample-efficient Cross-Entropy Method for Real-time Planning

Autonomous Learning
Robust Machine Learning
Postdoctoral Researcher
Embodied Vision
Embodied Vision
Max Planck Research Group Leader
Autonomous Learning
Empirical Inference, Autonomous Learning
Senior Research Scientist

Trajectory optimizers for model-based reinforcement learning, such as the Cross-Entropy Method (CEM), can yield compelling results even in high-dimensional control tasks and sparse-reward environments. However, their sampling inefficiency prevents them from being used for real-time planning and control. We propose an improved version of the CEM algorithm for fast planning, with novel additions including temporally-correlated actions and memory, requiring 2.7-22x fewer samples and yielding a performance increase of 1.2-10x in high-dimensional control problems.
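To make the two additions named in the abstract more concrete, the following is a minimal sketch of a CEM planner that draws temporally-correlated action noise and carries a few elite plans over between iterations as a simple form of memory. It is not the authors' implementation: the AR(1) noise model is a simple stand-in chosen for illustration, and every name and default here (`correlated_noise`, `cem_plan`, `rollout_cost`, `beta`, `n_keep`, the toy 1-D dynamics) is a hypothetical assumption rather than something taken from the paper.

```python
import random


def correlated_noise(horizon, beta, rng):
    """AR(1) noise with unit marginal variance (assumed stand-in for
    temporally-correlated action noise). beta=0 gives white noise."""
    scale = (1.0 - beta * beta) ** 0.5
    eps = [rng.gauss(0.0, 1.0)]
    for _ in range(horizon - 1):
        eps.append(beta * eps[-1] + scale * rng.gauss(0.0, 1.0))
    return eps


def cem_plan(cost_fn, horizon, n_samples=64, n_elites=8, n_keep=3,
             n_iters=5, beta=0.7, seed=0):
    """Return (cost, plan) for the best action sequence found.

    n_keep must be >= 1 so the best plan so far is always re-evaluated,
    which makes the best cost non-increasing across iterations.
    """
    rng = random.Random(seed)
    mean = [0.0] * horizon
    std = [1.0] * horizon
    memory = []  # (cost, plan) pairs of the previous iteration's elites

    for _ in range(n_iters):
        # Memory: seed the candidate set with the best previous elites.
        candidates = [plan for _, plan in memory[:n_keep]]
        # Fill the rest with samples around the current mean.
        while len(candidates) < n_samples:
            eps = correlated_noise(horizon, beta, rng)
            candidates.append([m + s * e for m, s, e in zip(mean, std, eps)])

        scored = sorted(((cost_fn(p), p) for p in candidates),
                        key=lambda cp: cp[0])
        memory = scored[:n_elites]
        elites = [p for _, p in memory]

        # Refit a per-timestep Gaussian to the elites.
        for t in range(horizon):
            column = [p[t] for p in elites]
            mean[t] = sum(column) / n_elites
            var = sum((a - mean[t]) ** 2 for a in column) / n_elites
            std[t] = max(var ** 0.5, 1e-3)  # floor to avoid collapse

    return memory[0]


def rollout_cost(plan):
    """Toy 1-D integrator (assumption): drive x from 0 toward 1."""
    x, cost = 0.0, 0.0
    for a in plan:
        x += 0.1 * a
        cost += (x - 1.0) ** 2 + 0.01 * a * a
    return cost


if __name__ == "__main__":
    best_cost, best_plan = cem_plan(rollout_cost, horizon=10)
    print(best_cost, best_plan[0])
```

In a receding-horizon setting only the first action of the returned plan would be executed before replanning. Setting `beta=0` and `n_keep=0` is not supported by this sketch as written, since the return value relies on at least one elite being retained.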

Author(s): Cristina Pinneri, Shambhuraj Sawant, Sebastian Blaes, Jan Achterhold, Joerg Stueckler, Michal Rolinek, and Georg Martius
Book Title: Conference on Robot Learning 2020
Year: 2020
Bibtex Type: Conference Paper (inproceedings)
State: Published
URL: https://corlconf.github.io/corl2020/paper_217/

BibTeX

@inproceedings{PinneriEtAl2020:iCEM,
  title = {Sample-efficient Cross-Entropy Method for Real-time Planning},
  booktitle = {Conference on Robot Learning 2020},
  abstract = {Trajectory optimizers for model-based reinforcement learning, such as the Cross-Entropy Method (CEM), can yield compelling results even in high-dimensional control tasks and sparse-reward environments. However, their sampling inefficiency prevents them from being used for real-time planning and control. We propose an improved version of the CEM algorithm for fast planning, with novel additions including temporally-correlated actions and memory, requiring 2.7-22x fewer samples and yielding a performance increase of 1.2-10x in high-dimensional control problems.},
  year = {2020},
  slug = {pinnerietal2020-icem},
  author = {Pinneri, Cristina and Sawant, Shambhuraj and Blaes, Sebastian and Achterhold, Jan and Stueckler, Joerg and Rolinek, Michal and Martius, Georg},
  url = {https://corlconf.github.io/corl2020/paper_217/}
}