Efficient Sample Reuse in EM-Based Policy Search

Empirical Inference

Jan Peters

Research Group Leader

Empirical Inference

Hirotaka Hachiya

Direct policy search is a promising reinforcement learning framework, particularly for control in continuous, high-dimensional systems such as anthropomorphic robots. Because of its high flexibility, policy search often requires a large number of samples to obtain a stable estimator of the policy update. This is prohibitive when sampling is expensive. In this paper, we extend an EM-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, called Reward-weighted Regression with sample Reuse, is demonstrated through a robot learning experiment.
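
The idea of reusing old samples in a reward-weighted regression update can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes single-step episodes, a Gaussian policy whose mean is linear in the features, and non-negative rewards. The function names, the flattening exponent `nu`, and the `ridge` term are hypothetical choices made for this illustration only. Each sample is weighted by its reward times an importance ratio between the current policy and the policy that generated it, and the new mean parameters are the weighted least-squares solution.

```python
import numpy as np


def gaussian_logp(phi, actions, theta, sigma):
    """Log-density of actions under a Gaussian policy with mean phi @ theta."""
    z = (actions - phi @ theta) / sigma
    return -0.5 * z ** 2 - np.log(sigma) - 0.5 * np.log(2.0 * np.pi)


def rwr_update(phi, actions, rewards, logp_current, logp_behavior,
               nu=1.0, ridge=1e-6):
    """One importance-weighted, reward-weighted least-squares update.

    phi:           (n, d) feature matrix of the visited states
    actions:       (n,) actions that were taken
    rewards:       (n,) non-negative rewards (assumption of this sketch)
    logp_current:  (n,) log-density of each action under the current policy
    logp_behavior: (n,) log-density under the policy that collected the sample
    nu:            hypothetical flattening exponent in [0, 1]; 0 ignores the
                   importance ratio, 1 applies it in full
    ridge:         small regularizer that keeps the linear system solvable
    """
    importance = np.exp(nu * (logp_current - logp_behavior))
    w = rewards * importance
    # Weighted normal equations: (Phi^T W Phi + ridge I) theta = Phi^T W a
    gram = phi.T @ (w[:, None] * phi) + ridge * np.eye(phi.shape[1])
    rhs = phi.T @ (w * actions)
    return np.linalg.solve(gram, rhs)
```

Samples gathered under several earlier policies can be stacked into one batch as long as `logp_behavior` holds, for each sample, the log-density under the policy that actually produced it. With `nu=0` every stored sample is treated as if it were on-policy, which lowers variance at the cost of bias.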

Author(s):	Hachiya, H. and Peters, J. and Sugiyama, M.
Links:	PDF Web
Book Title:	16th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
Journal:	Machine Learning and Knowledge Discovery in Databases: European Conference ECML PKDD 2009
Pages:	469-484
Year:	2009
Month:	September
Editors:	Buntine, W. and Grobelnik, M. and Mladenic, D. and Shawe-Taylor, J.
Publisher:	Springer

BibTeX Type:	Conference Paper (inproceedings)

Address:	Berlin, Germany
DOI:	10.1007/978-3-642-04180-8_48
Event Name:	ECML PKDD 2009
Event Place:	Bled, Slovenia

Electronic Archiving:	grant_archive
Language:	en
Organization:	Max-Planck-Gesellschaft
School:	Biologische Kybernetik

BibTeX

@inproceedings{6068,
  title = {Efficient Sample Reuse in EM-Based Policy Search},
  journal = {Machine Learning and Knowledge Discovery in Databases: European Conference ECML PKDD 2009},
  booktitle = {16th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases},
  abstract = {Direct policy search is a promising reinforcement learning framework, particularly for control in continuous, high-dimensional systems such as anthropomorphic robots. Because of its high flexibility, policy search often requires a large number of samples to obtain a stable estimator of the policy update. This is prohibitive when sampling is expensive. In this paper, we extend an EM-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, called Reward-weighted Regression with sample Reuse, is demonstrated through a robot learning experiment.},
  pages = {469-484},
  editor = {Buntine, W. and Grobelnik, M. and Mladenic, D. and Shawe-Taylor, J.},
  publisher = {Springer},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  address = {Berlin, Germany},
  month = sep,
  year = {2009},
  author = {Hachiya, H. and Peters, J. and Sugiyama, M.},
  doi = {10.1007/978-3-642-04180-8_48},
  month_numeric = {9}
}