Coherent Inference on Optimal Play in Game Trees

Institute Homepage

Institute Homepage Sign In

Back

Empirische Inferenz Probabilistic Numerics Conference Paper 2010

PDF Web

Probabilistic Numerics, Empirische Inferenz

Philipp Hennig

Affiliated Researcher

Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time.

Author(s):	Hennig, P. and Stern, D. and Graepel, T.
Links:	PDF Web
Book Title:	JMLR Workshop and Conference Proceedings Volume 9: AISTATS 2010
Pages:	326-333
Year:	2010
Month:	May
Day:	0
Editors:	Teh, Y.W. , M. Titterington
Publisher:	JMLR

Bibtex Type:	Conference Paper (inproceedings)

Address:	Cambridge, MA, USA
Event Name:	Thirteenth International Conference on Artificial Intelligence and Statistics
Event Place:	Chia Laguna Resort, Italy

Digital:	0
Electronic Archiving:	grant_archive

BibTex

@inproceedings{HennigSG2010,
  title = {Coherent Inference on Optimal Play in Game Trees},
  booktitle = {JMLR Workshop and Conference Proceedings Volume 9: AISTATS 2010},
  abstract = {Round-based games are an instance of discrete planning problems. Some of the best contemporary game tree search algorithms use random roll-outs as data. Relying on a good policy, they learn on-policy values by propagating information upwards in the tree, but not between sibling nodes. Here, we present a generative model and a corresponding approximate message passing scheme for inference on the optimal, off-policy value of nodes in smooth AND/OR trees, given random roll-outs. The crucial insight is that the distribution of values in game trees is not completely arbitrary. We define a generative model of the on-policy values using a latent score for each state, representing the value under the random roll-out policy. Inference on the values under the optimal policy separates into an inductive, pre-data step and a deductive, post-data part. Both can be solved approximately with Expectation Propagation, allowing off-policy value inference for any node in the (exponentially big) tree in linear time. },
  pages = {326-333},
  editors = {Teh, Y.W. , M. Titterington },
  publisher = {JMLR},
  address = {Cambridge, MA, USA},
  month = may,
  year = {2010},
  slug = {hennigsg2010},
  author = {Hennig, P. and Stern, D. and Graepel, T.},
  month_numeric = {5}
}