Empirical Inference Conference Paper 2012

Hierarchical Relative Entropy Policy Search


Many real-world problems are inherently hierarchically structured. The use of this structure in an agent's policy may well be the key to improved scalability and higher performance. However, such hierarchical structures cannot be exploited by current policy search algorithms. We will concentrate on a basic, but highly relevant hierarchy - the 'mixed option' policy. Here, a gating network first decides which of the options to execute and, subsequently, the option-policy determines the action. In this paper, we reformulate learning a hierarchical policy as a latent variable estimation problem and subsequently extend the Relative Entropy Policy Search (REPS) to the latent variable case. We show that our Hierarchical REPS can learn versatile solutions while also showing an increased performance in terms of learning speed and quality of the found policy in comparison to the nonhierarchical approach.
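
The two-stage structure described in the abstract (a gating network picks an option, then the option-policy picks the action) can be illustrated with a minimal sampling sketch. It assumes a softmax gating network over linear scores and one-dimensional linear-Gaussian option-policies; these parameterizations, and every function and variable name below, are illustrative assumptions rather than the paper's implementation. The sketch covers sampling from the hierarchy only, not the REPS update itself.

```python
import math
import random


def gating_probabilities(state, gating_weights):
    """Return p(option | state) as a softmax over one linear score per option.

    The softmax-over-linear-scores form is an assumption made for this sketch.
    """
    scores = [sum(w * s for w, s in zip(weights, state)) for weights in gating_weights]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - top) for score in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(state, gating_weights, option_gains, option_noise, rng):
    """Sample an option from the gating network, then an action from that option.

    Each option-policy is assumed to be Gaussian with a mean that is linear
    in the state and a fixed standard deviation.
    """
    probs = gating_probabilities(state, gating_weights)
    option = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    mean = sum(k * s for k, s in zip(option_gains[option], state))
    action = rng.gauss(mean, option_noise[option])
    return option, action


# Hypothetical two-option setup on a two-dimensional state.
rng = random.Random(0)
state = [1.0, 0.5]
gating_weights = [[2.0, 0.0], [0.0, 2.0]]
option_gains = [[-1.0, 0.0], [0.0, 1.0]]
option_noise = [0.1, 0.1]

probs = gating_probabilities(state, gating_weights)
option, action = sample_action(state, gating_weights, option_gains, option_noise, rng)
```

In this sketch the chosen option is the latent variable: an observer who sees only the state and the action does not know which option produced the action, which is the situation the abstract's latent variable formulation addresses.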

Author(s): Daniel, C. and Neumann, G. and Peters, J.
Book Title: Fifteenth International Conference on Artificial Intelligence and Statistics
Volume: 22
Pages: 273--281
Year: 2012
Month: April
Series: JMLR Proceedings
Editors: Lawrence, N. D. and Girolami, M.
Publisher: JMLR.org
Bibtex Type: Conference Paper (inproceedings)
Event Name: AISTATS 2012
Event Place: La Palma, Canary Islands, Spain
Electronic Archiving: grant_archive

BibTeX

@inproceedings{DanielNP2012,
  title = {Hierarchical Relative Entropy Policy Search},
  booktitle = {Fifteenth International Conference on Artificial Intelligence and Statistics},
  abstract = {Many real-world problems are inherently hierarchically
  structured. The use of this structure in an agent's policy may well be the key to improved scalability and higher performance. However, such hierarchical structures cannot be exploited by current policy search algorithms. We will concentrate on a basic, but highly relevant hierarchy - the
  `mixed option' policy. Here, a gating network first decides which of the options to execute and, subsequently, the option-policy determines the action. In this paper, we reformulate learning a hierarchical policy as a latent variable estimation problem and subsequently extend the
  Relative Entropy Policy Search (REPS) to the latent variable case. We show that our Hierarchical REPS can learn versatile solutions while also showing an increased performance in terms of learning speed and quality
  of the found policy in comparison to the nonhierarchical
  approach.},
  volume = {22},
  pages = {273--281},
  series = {JMLR Proceedings},
  editor = {Lawrence, N. D. and Girolami, M.},
  publisher = {JMLR.org},
  month = apr,
  year = {2012},
  slug = {danielnp2012},
  author = {Daniel, C. and Neumann, G. and Peters, J.},
  month_numeric = {4}
}