
Model-based Reinforcement Learning and Planning

Related projects: iCEM, Policy Extraction, Risk-Averse Planning

Publications

Conference Paper: Risk-Averse Zero-Order Trajectory Optimization. Vlastelica*, M., Blaes*, S., Pinneri, C., Martius, G. In Proceedings of the 5th Conference on Robot Learning (CoRL 2021), PMLR vol. 164, published November 2022. *Equal contribution.
We introduce a simple but effective method for managing risk in zero-order trajectory optimization. It combines probabilistic safety constraints with a balance of optimism in the face of epistemic uncertainty and pessimism in the face of aleatoric uncertainty, both estimated with an ensemble of stochastic neural networks. Our experiments indicate that separating these two kinds of uncertainty is essential for data-driven MPC to perform well in uncertain and safety-critical control environments.
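The key idea in the abstract, splitting an ensemble's predictions into epistemic disagreement and aleatoric noise and weighting them with opposite signs, can be sketched as follows. This is a minimal NumPy illustration under assumed conventions (per-step cost predictions, unit weights), not the paper's exact cost formulation; the function name and weights are hypothetical.

```python
import numpy as np

def separated_cost(ensemble_means, ensemble_vars, w_optimism=1.0, w_pessimism=1.0):
    """Score one candidate trajectory with an ensemble of stochastic models.

    ensemble_means: (E, T) per-step cost predictions from E ensemble members
    ensemble_vars:  (E, T) per-step predicted (aleatoric) variances

    Epistemic uncertainty = disagreement across members (variance of means);
    aleatoric uncertainty = average of the members' predicted variances.
    """
    mean_cost = ensemble_means.mean(axis=0)   # (T,) expected per-step cost
    epistemic = ensemble_means.var(axis=0)    # reducible: members disagree
    aleatoric = ensemble_vars.mean(axis=0)    # irreducible: inherent noise
    # Optimism w.r.t. epistemic uncertainty (prefer regions where models
    # disagree), pessimism w.r.t. aleatoric uncertainty (avoid regions that
    # are inherently noisy, e.g. near safety-critical states).
    per_step = (mean_cost
                - w_optimism * np.sqrt(epistemic)
                + w_pessimism * np.sqrt(aleatoric))
    return per_step.sum()

# Toy usage: score a 10-step trajectory under a 5-member ensemble.
rng = np.random.default_rng(0)
score = separated_cost(rng.normal(size=(5, 10)),
                       rng.uniform(0.1, 0.5, size=(5, 10)))
```

In a zero-order optimizer such as CEM, this score simply replaces the plain cost when ranking sampled trajectories.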

Conference Paper: Extracting Strong Policies for Robotics Tasks from Zero-order Trajectory Optimizers. Pinneri*, C., Sawant*, S., Blaes, S., Martius, G. In the 9th International Conference on Learning Representations (ICLR 2021), May 2021. *Equal contribution.
Solving high-dimensional, continuous robotic tasks is a challenging optimization problem. Model-based methods that rely on zero-order optimizers like the cross-entropy method (CEM) have so far shown strong performance and are considered state-of-the-art in the model-based reinforcement learning community. However, this success comes at the cost of high computational complexity, which makes these methods unsuitable for real-time control. In this paper, we propose a technique to jointly optimize the trajectory and distill a policy, which is essential for fast execution on real robotic systems. Our method builds on standard approaches, such as guidance cost and dataset aggregation, and introduces a novel adaptive factor that prevents the optimizer from collapsing onto the learner's behavior at the beginning of training. The extracted policies reach unprecedented performance on challenging tasks such as making a humanoid stand up and opening a door, without reward shaping.
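The interplay described above, a zero-order planner guiding a policy while an adaptive factor limits the policy's influence early on, can be sketched on a toy problem. Everything here is an illustrative assumption: the "policy" is reduced to a remembered action sequence, the task is a 1-D integrator, and the particular form of the adaptive factor (a clipped cost ratio) is a plausible stand-in, not the paper's exact rule.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_cost(actions, target=1.0):
    """Toy deterministic 'model': integrate the actions and penalize the
    squared distance of the state from a target at every step."""
    state, cost = 0.0, 0.0
    for a in actions:
        state += a
        cost += (state - target) ** 2
    return cost

def cem_plan(init_mean, horizon=5, iters=10, pop=64, n_elites=8, sigma=0.5):
    """Plain CEM over an open-loop action sequence, warm-started at init_mean."""
    mean, std = init_mean.copy(), np.full(horizon, sigma)
    for _ in range(iters):
        samples = mean + std * rng.normal(size=(pop, horizon))
        costs = np.array([rollout_cost(s) for s in samples])
        elites = samples[np.argsort(costs)[:n_elites]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean, rollout_cost(mean)

# Distillation loop: the policy guides the optimizer's warm start, but only
# in proportion to how well it currently performs -- the adaptive factor
# alpha keeps an early, poor policy from collapsing the optimizer onto its
# own behavior.
horizon = 5
policy_mean = np.zeros(horizon)
for episode in range(20):
    policy_cost = rollout_cost(policy_mean)
    _, free_cost = cem_plan(np.zeros(horizon))        # unguided baseline plan
    alpha = float(np.clip(free_cost / (policy_cost + 1e-8), 0.0, 1.0))
    plan, plan_cost = cem_plan(alpha * policy_mean)   # policy-guided plan
    # Dataset aggregation reduced to its simplest form: pull the policy
    # toward the optimizer's solution.
    policy_mean += 0.5 * (plan - policy_mean)
```

While the policy is still poor, `alpha` is near zero and the planner searches freely; once the policy matches the planner's performance, `alpha` approaches one and the warm start transfers the policy's behavior into the optimization.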

Conference Paper: Sample-efficient Cross-Entropy Method for Real-time Planning. Pinneri, C., Sawant, S., Blaes, S., Achterhold, J., Stueckler, J., Rolinek, M., Martius, G. In Conference on Robot Learning (CoRL 2020), 2020.
Trajectory optimizers for model-based reinforcement learning, such as the Cross-Entropy Method (CEM), can yield compelling results even in high-dimensional control tasks and sparse-reward environments. However, their sampling inefficiency prevents them from being used for real-time planning and control. We propose an improved version of the CEM algorithm for fast planning, with novel additions including temporally correlated actions and memory, requiring 2.7-22x fewer samples and yielding a 1.2-10x performance increase in high-dimensional control problems.
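The two additions named in the abstract, temporally correlated action noise and elite memory, can be sketched as follows. This is a simplified illustration, not the published iCEM implementation: the colored-noise sampler drops the DC component for simplicity (so each noise sequence averages to zero over time), and the memory is reduced to re-evaluating a few elites from the previous iteration.

```python
import numpy as np

rng = np.random.default_rng(0)

def colored_noise(beta, n_samples, horizon):
    """Sample noise whose power spectrum falls off as 1/f^beta:
    beta = 0 is white noise; larger beta gives smoother, temporally
    correlated action sequences. Normalized to unit variance per sample."""
    freqs = np.fft.rfftfreq(horizon)[1:]          # drop the DC frequency
    amplitudes = freqs ** (-beta / 2.0)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=(n_samples, freqs.size))
    spectrum = np.concatenate(
        [np.zeros((n_samples, 1), dtype=complex),
         amplitudes * np.exp(1j * phases)], axis=1)
    noise = np.fft.irfft(spectrum, n=horizon, axis=-1)
    return noise / noise.std(axis=-1, keepdims=True)

def icem_iteration(mean, std, cost_fn, pop=32, n_elites=4, beta=2.0,
                   kept_elites=None):
    """One iCEM-style iteration: colored-noise sampling plus 'memory',
    i.e. carrying a few elites over from the previous iteration."""
    samples = mean + std * colored_noise(beta, pop, mean.size)
    if kept_elites is not None:
        samples = np.vstack([samples, kept_elites])   # re-evaluate old elites
    costs = np.array([cost_fn(s) for s in samples])
    elites = samples[np.argsort(costs)[:n_elites]]
    return elites.mean(axis=0), elites.std(axis=0) + 1e-3, elites

# Toy usage: track a smooth, zero-mean reference over 8 steps.
target = np.cos(2.0 * np.pi * np.arange(8) / 8)
cost_fn = lambda a: float(np.sum((a - target) ** 2))
mean, std, kept = np.zeros(8), np.ones(8), None
for _ in range(15):
    mean, std, kept = icem_iteration(mean, std, cost_fn, kept_elites=kept)
```

Smooth noise concentrates the sampling budget on physically plausible action sequences, which is one way to read the sample-efficiency gains the abstract reports.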