The University of Southampton
University of Southampton Institutional Repository

Thompson sampling based Monte-Carlo planning in POMDPs

Bai, Aijun
e2d2c724-6e95-4394-88a8-e66bff8221e8
Wu, Feng
b79f9800-2819-40c8-96e7-3ad85f866f5e
Zhang, Zongzhang
9d159fa8-62ec-4dbe-b202-0c9b960e03f7
Chen, Xiaoping
3256467f-026f-4cea-beb6-20948f6f4d93

Bai, Aijun, Wu, Feng, Zhang, Zongzhang and Chen, Xiaoping (2014) Thompson sampling based Monte-Carlo planning in POMDPs. Proceedings of the 24th International Conference on Automated Planning and Scheduling (ICAPS-14), Portsmouth, United States. 21 - 26 Jun 2014. (In Press)

Record type: Conference or Workshop Item (Paper)

Abstract

Monte-Carlo tree search (MCTS) has drawn great interest in recent years for planning under uncertainty. One of its key challenges is the trade-off between exploration and exploitation. To address this, we introduce a novel online planning algorithm for large POMDPs that uses Thompson sampling based MCTS to balance cumulative and simple regret. The proposed algorithm, Dirichlet-Dirichlet-NormalGamma based Partially Observable Monte-Carlo Planning (D2NG-POMCP), treats the accumulated reward of performing an action from a belief state in the MCTS search tree as a random variable following an unknown distribution with hidden parameters. A Bayesian method is used to model and infer the posterior distribution of these parameters, choosing as the conjugate prior a combination of two Dirichlet distributions and one NormalGamma distribution. Thompson sampling is then used to guide action selection in the search tree. Experimental results confirm that our algorithm outperforms state-of-the-art approaches on several common benchmark problems.
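
The idea of pairing a NormalGamma posterior with Thompson sampling can be illustrated on a much simpler problem than a POMDP search tree. The sketch below is not the D2NG-POMCP algorithm and is not taken from the paper: it is a flat multi-armed bandit in which each arm's reward is assumed to be Normal with unknown mean and precision, and the Dirichlet components mentioned in the abstract are omitted entirely. The class name `NormalGamma`, the helper functions, and the prior hyperparameters are all illustrative assumptions.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class NormalGamma:
    """NormalGamma posterior over the (mean, precision) of a Normal reward.

    Hyperparameter defaults are illustrative assumptions, not values
    taken from the paper.
    """
    mu: float = 0.0     # prior mean of the reward mean
    lam: float = 0.01   # pseudo-count attached to the prior mean
    alpha: float = 1.0  # shape of the Gamma over the precision
    beta: float = 1.0   # rate of the Gamma over the precision

    def update(self, x: float) -> None:
        """Standard conjugate update for a single observation x."""
        new_lam = self.lam + 1.0
        self.beta += self.lam * (x - self.mu) ** 2 / (2.0 * new_lam)
        self.mu = (self.lam * self.mu + x) / new_lam
        self.lam = new_lam
        self.alpha += 0.5

    def sample_mean(self, rng: random.Random) -> float:
        """Draw a precision, then a mean, from the current posterior."""
        # gammavariate takes (shape, scale); scale is 1 / rate.
        tau = max(rng.gammavariate(self.alpha, 1.0 / self.beta), 1e-12)
        return rng.gauss(self.mu, math.sqrt(1.0 / (self.lam * tau)))


def thompson_select(posteriors, rng):
    """Thompson sampling: sample a mean per arm, pick the largest sample."""
    samples = [p.sample_mean(rng) for p in posteriors]
    return max(range(len(samples)), key=samples.__getitem__)


def run_bandit(true_means, steps, seed=0):
    """Run Thompson sampling on Gaussian arms with unit noise (a toy setup)."""
    rng = random.Random(seed)
    posteriors = [NormalGamma() for _ in true_means]
    counts = [0] * len(true_means)
    for _ in range(steps):
        arm = thompson_select(posteriors, rng)
        reward = rng.gauss(true_means[arm], 1.0)
        posteriors[arm].update(reward)
        counts[arm] += 1
    return counts, posteriors


if __name__ == "__main__":
    counts, _ = run_bandit([0.0, 1.0, 3.0], steps=2000)
    print("pulls per arm:", counts)
```

Because each arm is chosen in proportion to the posterior probability that it is the best, exploration falls away naturally as the posteriors sharpen. In a tree search, the same selection rule would be applied at each decision node, which is where this toy example and the full algorithm part ways.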

More information

Accepted/In Press date: June 2014
Venue - Dates: Proceedings of the 24th International Conference on Automated Planning and Scheduling (ICAPS-14), Portsmouth, United States, 2014-06-21 - 2014-06-26
Organisations: Agents, Interactions & Complexity

Identifiers

Local EPrints ID: 360985
URI: http://eprints.soton.ac.uk/id/eprint/360985
PURE UUID: 1054b540-c04e-4424-833d-a2949c9e365f

Catalogue record

Date deposited: 14 Jan 2014 10:39
Last modified: 14 Mar 2024 15:44

Contributors

Author: Aijun Bai
Author: Feng Wu
Author: Zongzhang Zhang
Author: Xiaoping Chen
