The University of Southampton
University of Southampton Institutional Repository

Bayesian mixture modelling and inference based Thompson sampling in Monte-Carlo tree search


Bai, Aijun, Wu, Feng and Chen, Xiaoping (2013) Bayesian mixture modelling and inference based Thompson sampling in Monte-Carlo tree search. Advances in Neural Information Processing Systems (NIPS-13), Nevada, United States. 05 - 10 Dec 2013.

Record type: Conference or Workshop Item (Paper)

Abstract

Monte-Carlo tree search (MCTS) is drawing great interest in the domain of planning under uncertainty, particularly when little or no domain knowledge is available. One of its central problems is the trade-off between exploration and exploitation. In this paper we present a novel approach to this dilemma based on Bayesian mixture modelling and inference with Thompson sampling. The proposed Dirichlet-NormalGamma MCTS (DNG-MCTS) algorithm represents the uncertainty of the accumulated reward for actions in the MCTS search tree as a mixture of Normal distributions and performs Bayesian inference on it by choosing conjugate priors in the form of combinations of Dirichlet and NormalGamma distributions. Thompson sampling is used to select the best action at each decision node. Experimental results show that the proposed algorithm achieves state-of-the-art performance compared with the popular UCT algorithm in the context of online planning for general Markov decision processes.
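
The following minimal sketch illustrates two of the ingredients the abstract names, a NormalGamma conjugate prior over an action's return and Thompson sampling for action selection, at a single decision node. It is not the authors' DNG-MCTS implementation: the Dirichlet model over state transitions, the mixture over successor states, and the tree recursion are all omitted. The class and function names (`NormalGamma`, `thompson_select`, `run_bandit`), the prior hyperparameters, and the Gaussian reward model are assumptions made for illustration only.

```python
import random
from dataclasses import dataclass


@dataclass
class NormalGamma:
    """NormalGamma(mu, lam, alpha, beta) belief over an unknown mean and precision.

    The default hyperparameters are illustrative assumptions, not values from the paper.
    """
    mu: float = 0.0
    lam: float = 0.01
    alpha: float = 1.0
    beta: float = 1.0

    def update(self, x: float) -> None:
        """Standard conjugate update for a single observation x."""
        new_lam = self.lam + 1.0
        # beta must be updated with the old mu and old lam.
        self.beta += self.lam * (x - self.mu) ** 2 / (2.0 * new_lam)
        self.mu = (self.lam * self.mu + x) / new_lam
        self.lam = new_lam
        self.alpha += 0.5

    def sample_mean(self, rng: random.Random) -> float:
        """Draw a precision tau, then a mean conditioned on tau."""
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        tau = rng.gammavariate(self.alpha, 1.0 / self.beta)
        return rng.gauss(self.mu, (1.0 / (self.lam * tau)) ** 0.5)


def thompson_select(posteriors, rng):
    """Sample one mean per action and pick the action with the largest sample."""
    samples = [p.sample_mean(rng) for p in posteriors]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, steps=2000, seed=0):
    """Hypothetical single-node test bed: Gaussian rewards with unit variance."""
    rng = random.Random(seed)
    posteriors = [NormalGamma() for _ in true_means]
    counts = [0] * len(true_means)
    for _ in range(steps):
        a = thompson_select(posteriors, rng)
        reward = rng.gauss(true_means[a], 1.0)
        posteriors[a].update(reward)
        counts[a] += 1
    return posteriors, counts


if __name__ == "__main__":
    _, counts = run_bandit([0.0, 0.5, 2.0])
    print("pulls per action:", counts)
```

Because each action is chosen with the probability that its sampled mean is the largest, actions with uncertain posteriors are still tried occasionally while the apparently best action is chosen most often; this is how the sketch handles the exploration-exploitation trade-off described above.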

Text
805.pdf - Version of Record
Download (361kB)

More information

Published date: December 2013
Venue - Dates: Advances in Neural Information Processing Systems (NIPS-13), Nevada, United States, 2013-12-05 - 2013-12-10
Organisations: Agents, Interactions & Complexity

Identifiers

Local EPrints ID: 360941
URI: http://eprints.soton.ac.uk/id/eprint/360941
PURE UUID: e563733f-5632-47b9-94d6-c0ef86da77a4

Catalogue record

Date deposited: 10 Jan 2014 09:35
Last modified: 09 Apr 2020 16:31

Contributors

Author: Aijun Bai
Author: Feng Wu
Author: Xiaoping Chen
