Recurrent Policy Gradients

Institute Homepage

Institute Homepage EN Sign In

Back

Empirische Inferenz Article 2010

PDF Web

Empirische Inferenz

Jan Peters

Research Group Leader

Reinforcement learning for partially observable Markov decision problems (POMDPs) is a challenge as it requires policies with an internal state. Traditional approaches suffer significantly from this shortcoming and usually make strong assumptions on the problem domain such as perfect system models, state-estimators and a Markovian hidden system. Recurrent neural networks (RNNs) offer a natural framework for dealing with policy learning using hidden state and require only few limiting assumptions. As they can be trained well using gradient descent, they are suited for policy gradient approaches. In this paper, we present a policy gradient method, the Recurrent Policy Gradient which constitutes a model-free reinforcement learning method. It is aimed at training limited-memory stochastic policies on problems which require long-term memories of past observations. The approach involves approximating a policy gradient for a recurrent neural network by backpropagating return-weighted characteristic eligibilities through time. Using a Long Short-Term Memory RNN architecture, we are able to outperform previous RL methods on three important benchmark tasks. Furthermore, we show that using history-dependent baselines helps reducing estimation variance significantly, thus enabling our approach to tackle more challenging, highly stochastic environments.

Author(s):	Wierstra, D. and Förster, A. and Peters, J. and Schmidhuber, J.
Links:	PDF Web
Journal:	Logic Journal of the IGPL
Volume:	18
Number (issue):	5
Pages:	620-634
Year:	2010
Month:	October
Day:	0

Bibtex Type:	Article (article)

DOI:	10.1093/jigpal/jzp049

Digital:	0
Electronic Archiving:	grant_archive
Language:	en
Organization:	Max-Planck-Gesellschaft
School:	Biologische Kybernetik

BibTex

@article{5879,
  title = {Recurrent Policy Gradients},
  journal = {Logic Journal of the IGPL},
  abstract = {Reinforcement learning for partially observable Markov decision problems (POMDPs) is a challenge as it requires policies with an internal state. Traditional approaches suffer significantly from this shortcoming and usually make strong assumptions on the problem domain such as perfect system models, state-estimators and a Markovian hidden system. Recurrent neural networks (RNNs) offer a natural framework for dealing with policy learning using hidden state and require only few limiting assumptions. As they can be trained well using gradient descent, they are suited for policy gradient approaches.
  In this paper, we present a policy gradient method, the Recurrent Policy Gradient which constitutes a model-free reinforcement learning method. It is aimed at training limited-memory stochastic policies on problems which require long-term memories of past observations. The approach involves approximating a policy gradient for a recurrent neural network by backpropagating return-weighted characteristic eligibilities through time. Using a Long Short-Term Memory RNN architecture, we are able to outperform previous RL methods on three important benchmark tasks. Furthermore, we show that using history-dependent baselines helps reducing estimation variance significantly, thus enabling our approach to tackle more challenging, highly stochastic environments.},
  volume = {18},
  number = {5},
  pages = {620-634},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  month = oct,
  year = {2010},
  slug = {5879},
  author = {Wierstra, D. and F{\"o}rster, A. and Peters, J. and Schmidhuber, J.},
  month_numeric = {10}
}

Forschung

Abteilungen

Forschungsgruppen

Personen

Kontakt

Our Institute

Unsere Geschichte

Karriere

Überblick über Promotionsprogramme

Karriere

Service-Einrichtungen

Zentrale Wissenschaftliche Einrichtungen

Werkstätten

Campus Services

Impact

Kooperationen

Initiativen und Partner

Forschung

Abteilungen

Forschungsgruppen

Personen

Kontakt

Our Institute

Unsere Geschichte

Karriere

Überblick über Promotionsprogramme

Karriere

Service-Einrichtungen

Zentrale Wissenschaftliche Einrichtungen

Werkstätten

Campus Services

Impact

Kooperationen

Initiativen und Partner

BibTex