Coherent Inference on Optimal Play in Game Trees
PDF WebRound-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time.
| Author(s): | Hennig, P. and Stern, D. and Graepel, T. |
| Links: | |
| Book Title: | JMLR Workshop and Conference Proceedings Volume 9: AISTATS 2010 |
| Pages: | 326-333 |
| Year: | 2010 |
| Month: | May |
| Day: | 0 |
| Editors: | Teh, Y.W. , M. Titterington |
| Publisher: | JMLR |
| BibTeX Type: | Conference Paper (inproceedings) |
| Address: | Cambridge, MA, USA |
| Event Name: | Thirteenth International Conference on Artificial Intelligence and Statistics |
| Event Place: | Chia Laguna Resort, Italy |
| Digital: | 0 |
| Electronic Archiving: | grant_archive |
BibTeX
@inproceedings{HennigSG2010,
title = {Coherent Inference on Optimal Play in Game Trees},
booktitle = {JMLR Workshop and Conference Proceedings Volume 9: AISTATS 2010},
abstract = {Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time. },
pages = {326-333},
editors = {Teh, Y.W. , M. Titterington },
publisher = {JMLR},
address = {Cambridge, MA, USA},
month = may,
year = {2010},
author = {Hennig, P. and Stern, D. and Graepel, T.},
month_numeric = {5}
}