Policy gradient methods
2010
Article
Policy gradient methods are a class of reinforcement learning techniques that optimize parametrized policies with respect to the expected return (the long-term cumulative reward) by gradient ascent. They avoid many of the problems that have plagued traditional reinforcement learning approaches, such as the lack of guarantees for value-function approximation, the intractability resulting from uncertain state information, and the complexity arising from continuous states and actions.
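The core idea of improving a parametrized policy by following the gradient of the expected return, θ ← θ + α ∇θJ(θ), can be illustrated with a minimal sketch. It assumes a hypothetical one-step task with reward r(a) = -(a - 2)², a Gaussian policy a ~ N(θ, σ²) whose mean θ is the only parameter, and the likelihood-ratio (REINFORCE-style) estimate ∇θJ(θ) ≈ mean of ∇θ log π(a|θ) (r(a) - b) with a mean-reward baseline b. The function names, reward, and step sizes are illustrative choices, not taken from the article.

```python
import random


def expected_return_grad_estimate(theta, sigma, n_samples, rng):
    """Likelihood-ratio (REINFORCE-style) estimate of dJ/dtheta.

    Hypothetical toy setting: a one-step task with Gaussian policy
    a ~ N(theta, sigma^2) and reward r(a) = -(a - 2)^2.
    """
    actions = [rng.gauss(theta, sigma) for _ in range(n_samples)]
    rewards = [-(a - 2.0) ** 2 for a in actions]
    # Mean reward as a baseline: reduces variance without changing
    # the direction of the expected gradient.
    baseline = sum(rewards) / n_samples
    grad = 0.0
    for a, r in zip(actions, rewards):
        score = (a - theta) / sigma ** 2  # d/dtheta of log N(a; theta, sigma^2)
        grad += score * (r - baseline)
    return grad / n_samples


def train(theta=0.0, sigma=0.5, alpha=0.05, iterations=200, n_samples=100, seed=0):
    """Gradient ascent on the expected return using sampled gradient estimates."""
    rng = random.Random(seed)
    for _ in range(iterations):
        theta += alpha * expected_return_grad_estimate(theta, sigma, n_samples, rng)
    return theta
```

For this toy problem the exact objective is J(θ) = -(θ - 2)² - σ², so the maximizer is θ = 2 and `train()` should end close to it; only sampled actions and rewards are used, never a model of the reward function.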
Author(s): Peters, J.
Journal: Scholarpedia
Volume: 5
Number (issue): 11
Pages: 3698
Year: 2010
Month: November
Department(s): Empirical Inference
BibTeX Type: Article (article)
DOI: 10.4249/scholarpedia.3698
Language: en
Organization: Max-Planck-Gesellschaft
School: Biologische Kybernetik
BibTeX:
@article{6940,
  title = {Policy gradient methods},
  author = {Peters, J.},
  journal = {Scholarpedia},
  volume = {5},
  number = {11},
  pages = {3698},
  organization = {Max-Planck-Gesellschaft},
  school = {Biologische Kybernetik},
  month = nov,
  year = {2010},
  doi = {10.4249/scholarpedia.3698},
  month_numeric = {11}
}