Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning

Empirical Inference

Jan Peters, Research Group Leader, Empirical Inference

Hirotaka Hachiya

Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search often requires a large number of samples for obtaining a stable policy update estimator, and this is prohibitive when the sampling cost is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R³), is demonstrated through robot learning experiments.
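The core idea, reusing samples collected under earlier policies inside an EM-style reward-weighted regression update, can be illustrated with a small sketch. It is not the authors' R³ implementation. It assumes a hypothetical scalar linear-Gaussian policy a ~ N(k·s, std²), a single-step importance weight (π_current/π_behavior)^ν with a hand-picked flattening exponent ν, and a toy reward exp(-(a - 2s)²) whose optimal gain is k = 2. The function names `gaussian_logpdf`, `iw_rwr_update`, and `run_demo` are illustrative only.

```python
import math
import random


def gaussian_logpdf(a, mean, std):
    """Log-density of N(mean, std^2) evaluated at a."""
    return -0.5 * ((a - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def iw_rwr_update(samples, k, std, nu=0.5):
    """One importance-weighted reward-weighted regression step for the
    hypothetical scalar policy a ~ N(k * s, std^2) (illustrative sketch).

    samples: list of (s, a, r, behavior_logp), where behavior_logp is the
    log-density of a under the policy that generated the sample.
    Rewards must be non-negative because they act as regression weights.
    nu in [0, 1] flattens the importance weight (0: ignore it, 1: full weight).
    """
    weights = []
    for s, a, r, behavior_logp in samples:
        # Log importance ratio of the current policy to the behavior policy.
        log_ratio = gaussian_logpdf(a, k * s, std) - behavior_logp
        weights.append(math.exp(nu * log_ratio) * r)
    # Weighted least squares for the gain k (closed form for a scalar feature).
    num = sum(w * s * a for w, (s, a, _, _) in zip(weights, samples))
    den = sum(w * s * s for w, (s, _, _, _) in zip(weights, samples))
    new_k = num / den
    # Weighted maximum-likelihood estimate of the exploration noise.
    var = sum(w * (a - new_k * s) ** 2 for w, (s, a, _, _) in zip(weights, samples))
    return new_k, math.sqrt(var / sum(weights))


def run_demo(iterations=30, batch=50, nu=0.5, seed=0):
    """Toy task (assumption): reward exp(-(a - 2s)^2), so the optimal gain is 2."""
    rng = random.Random(seed)
    k, std, pool = 0.0, 1.0, []
    for _ in range(iterations):
        for _ in range(batch):
            s = rng.uniform(-1.0, 1.0)
            a = rng.gauss(k * s, std)
            r = math.exp(-(a - 2.0 * s) ** 2)
            # Store the behavior log-density so the sample can be reused later.
            pool.append((s, a, r, gaussian_logpdf(a, k * s, std)))
        # Update using every sample collected so far, not only the newest batch.
        k, std = iw_rwr_update(pool, k, std, nu)
    return k, std


if __name__ == "__main__":
    print(run_demo())
```

Setting `nu=0` treats every stored sample as if it came from the current policy (reuse without correction), while `nu=1` applies the full importance weight; intermediate values trade some bias for lower variance. How the flattening exponent is chosen in the paper is not covered by this sketch.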

Author(s):	Hachiya, H. and Peters, J. and Sugiyama, M.
Journal:	Neural Computation
Volume:	23
Number (issue):	11
Pages:	2798-2832
Year:	2011
Month:	November

BibTeX Type:	Article (article)

DOI:	10.1162/NECO_a_00199

BibTeX

@article{HachiyaPS2011,
  title = {Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning},
  journal = {Neural Computation},
  abstract = {Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search often requires a large number of samples for obtaining a stable policy update estimator, and this is prohibitive when the sampling cost is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R$^3$), is demonstrated through robot learning experiments.},
  volume = {23},
  number = {11},
  pages = {2798--2832},
  month = nov,
  year = {2011},
  author = {Hachiya, H. and Peters, J. and Sugiyama, M.},
  doi = {10.1162/NECO_a_00199},
  month_numeric = {11}
}