PILCO: A Model-Based and Data-Efficient Approach to Policy Search
In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.
| Author(s): | Deisenroth, M. P. and Rasmussen, C. E. |
| Book Title: | Proceedings of the 28th International Conference on Machine Learning, ICML 2011 |
| Pages: | 465-472 |
| Year: | 2011 |
| Editors: | L Getoor and T Scheffer |
| Publisher: | Omnipress |
| BibTeX Type: | Conference Paper (inproceedings) |
| Event Place: | Bellevue, Washington, USA |
BibTeX
@inproceedings{DeisenrothRT2011,
title = {PILCO: A Model-Based and Data-Efficient Approach to Policy Search},
booktitle = {Proceedings of the 28th International Conference on Machine Learning, ICML 2011},
abstract = {In this paper, we introduce PILCO, a practical, data-efficient model-based policy search method. PILCO reduces model bias, one of the key problems of model-based reinforcement learning, in a principled way. By learning a probabilistic dynamics model and explicitly incorporating model uncertainty into long-term planning, PILCO can cope with very little data and facilitates learning from scratch in only a few trials. Policy evaluation is performed in closed form using state-of-the-art approximate inference. Furthermore, policy gradients are computed analytically for policy improvement. We report unprecedented learning efficiency on challenging and high-dimensional control tasks.},
pages = {465--472},
editor = {L Getoor and T Scheffer},
publisher = {Omnipress},
year = {2011},
author = {Deisenroth, M. P. and Rasmussen, C. E.}
}