Empirical Inference
Article
2011
Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning
Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search often requires a large number of samples for obtaining a stable policy update estimator, and this is prohibitive when the sampling cost is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R³), is demonstrated through robot learning experiments.
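The abstract does not spell out the update rule. As a rough illustration only, the sketch below shows the general idea behind EM-style reward-weighted regression with importance-weighted sample reuse. It rests on simplifying assumptions that are not taken from the paper: a one-step (contextual-bandit) task instead of full episodes, a linear-Gaussian policy a ~ N(θᵀφ(s), σ²), nonnegative returns used directly as EM weights, and plain per-sample importance weights with no variance control. The names `gaussian_logpdf` and `rwr_update` and the toy reward are hypothetical; this is not the published algorithm.

```python
import numpy as np


def gaussian_logpdf(a, mean, sigma):
    """Log-density of a scalar Gaussian N(mean, sigma^2) at a (vectorized)."""
    return -0.5 * ((a - mean) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))


def rwr_update(phi, actions, returns, behavior_logp, theta, sigma, ridge=1e-8):
    """One importance-weighted reward-weighted regression step (hypothetical sketch).

    phi           : (n, d) state features of the reused samples
    actions       : (n,) actions that were taken
    returns       : (n,) nonnegative returns, used as EM weights
    behavior_logp : (n,) log-density of each action under the policy that collected it
    theta, sigma  : current linear-Gaussian policy a ~ N(phi @ theta, sigma^2)
    """
    # Importance weights correct for samples collected by an earlier policy.
    current_logp = gaussian_logpdf(actions, phi @ theta, sigma)
    iw = np.exp(current_logp - behavior_logp)
    w = iw * returns  # reward weighting times importance weight
    # Weighted least squares: theta_new = (Phi^T W Phi)^{-1} Phi^T W a
    A = phi.T @ (w[:, None] * phi) + ridge * np.eye(phi.shape[1])
    b = phi.T @ (w * actions)
    theta_new = np.linalg.solve(A, b)
    # Weighted variance of the residuals gives the new exploration noise.
    resid = actions - phi @ theta_new
    sigma_new = np.sqrt(np.sum(w * resid ** 2) / np.sum(w))
    return theta_new, sigma_new


# Toy usage: collect one batch with a broad policy, then reuse it for several updates.
rng = np.random.default_rng(0)
n = 2000
s = rng.uniform(-1.0, 1.0, size=n)
phi = s[:, None]
theta0, sigma0 = np.array([0.0]), 1.5
actions = phi @ theta0 + sigma0 * rng.standard_normal(n)
returns = np.exp(-(actions - 2.0 * s) ** 2)  # hypothetical reward, peaked at a = 2 s
behavior_logp = gaussian_logpdf(actions, phi @ theta0, sigma0)

theta1, sigma1 = rwr_update(phi, actions, returns, behavior_logp, theta0, sigma0)
theta, sigma = theta1, sigma1
for _ in range(3):  # no new samples: the same pool is reweighted each time
    theta, sigma = rwr_update(phi, actions, returns, behavior_logp, theta, sigma)
```

Because each stored sample keeps the log-density of the policy that collected it, the pool can be reweighted after every update instead of being discarded. The cost is that the importance weights become more variable as the current policy moves away from the behavior policy, which a practical method has to manage.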
| Author(s): | Hachiya, H. and Peters, J. and Sugiyama, M. |
| Journal: | Neural Computation |
| Volume: | 23 |
| Number (issue): | 11 |
| Pages: | 2798-2832 |
| Year: | 2011 |
| Month: | November |
| BibTeX Type: | Article (article) |
| DOI: | 10.1162/NECO_a_00199 |
BibTeX
@article{HachiyaPS2011,
title = {Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning},
journal = {Neural Computation},
  abstract = {Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search often requires a large number of samples for obtaining a stable policy update estimator, and this is prohibitive when the sampling cost is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R$^3$), is demonstrated through robot learning experiments.},
volume = {23},
number = {11},
  pages = {2798--2832},
month = nov,
year = {2011},
author = {Hachiya, H. and Peters, J. and Sugiyama, M.},
doi = {10.1162/NECO_a_00199},
month_numeric = {11}
}
