The University of Southampton
University of Southampton Institutional Repository

Sample-based policy iteration for constrained DEC-POMDPs


Wu, Feng, Jennings, Nicholas and Chen, Xiaoping (2012) Sample-based policy iteration for constrained DEC-POMDPs. ECAI 2012: 20th European Conference on Artificial Intelligence, Montpellier, France. 27 - 31 Aug 2012. pp. 858-863.

Record type: Conference or Workshop Item (Paper)

Abstract

We introduce constrained DEC-POMDPs, an extension of standard DEC-POMDPs that includes constraints on the optimality of the overall team rewards. Constrained DEC-POMDPs provide a natural framework for modeling cooperative multi-agent problems with limited resources. To solve such DEC-POMDPs, we propose a novel sample-based policy iteration algorithm. The algorithm builds on multi-agent dynamic programming and benefits from several recent advances in DEC-POMDP algorithms such as MBDP [12] and TBDP [13]. Specifically, it improves the joint policy by solving a series of standard nonlinear programs (NLPs), thereby building on recent advances in NLP solvers. Our experimental results confirm that the algorithm can efficiently solve constrained DEC-POMDPs that cause general DEC-POMDP algorithms to fail.
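
The abstract does not give the formal model or the algorithm, so the following is only an illustrative sketch and not the authors' method. It assumes that the constraint takes the form of a budget on the expected cumulative resource cost of a joint policy, and it shows only the sample-based evaluation step: rolling out a joint policy in a simulator and averaging team reward and cost. The toy model, `ToyConstrainedModel`, `estimate_values`, `is_feasible`, and the two example policies are hypothetical names introduced here for illustration.

```python
import random
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

JointAction = Tuple[int, ...]
History = Tuple[int, ...]
Policy = Callable[[History], int]  # local observation history -> local action


@dataclass
class ToyConstrainedModel:
    """Hypothetical two-agent simulator; not the paper's formalism."""
    n_agents: int = 2
    horizon: int = 3
    cost_budget: float = 4.0  # assumed bound on expected total cost

    def initial_state(self, rng: random.Random) -> int:
        return rng.randint(0, 1)

    def step(self, state: int, joint_action: JointAction, rng: random.Random):
        # Team reward: 1 only when every agent chooses action 1 ("act").
        reward = 1.0 if all(a == 1 for a in joint_action) else 0.0
        # Resource cost: one unit per acting agent.
        cost = float(sum(joint_action))
        next_state = rng.randint(0, 1)
        # Each agent observes the next state (noise-free, for simplicity).
        observations = tuple(next_state for _ in joint_action)
        return next_state, observations, reward, cost


def estimate_values(model, policies: Sequence[Policy], n_samples: int, seed: int = 0):
    """Monte Carlo estimate of (expected total reward, expected total cost)."""
    rng = random.Random(seed)
    total_reward = 0.0
    total_cost = 0.0
    for _ in range(n_samples):
        state = model.initial_state(rng)
        histories = [() for _ in range(model.n_agents)]
        for _ in range(model.horizon):
            # Decentralized execution: each agent sees only its own history.
            joint_action = tuple(pi(h) for pi, h in zip(policies, histories))
            state, obs, reward, cost = model.step(state, joint_action, rng)
            histories = [h + (o,) for h, o in zip(histories, obs)]
            total_reward += reward
            total_cost += cost
    return total_reward / n_samples, total_cost / n_samples


def is_feasible(model, policies: Sequence[Policy], n_samples: int = 100, seed: int = 0) -> bool:
    """Check the assumed cost budget against the sampled cost estimate."""
    _, expected_cost = estimate_values(model, policies, n_samples, seed)
    return expected_cost <= model.cost_budget


def always_act(history: History) -> int:
    return 1


def never_act(history: History) -> int:
    return 0


model = ToyConstrainedModel()
reward, cost = estimate_values(model, [always_act, always_act], n_samples=50)
print(reward, cost, is_feasible(model, [always_act, always_act]))
```

In this toy setting the reward-maximizing joint policy (both agents always act) exceeds the assumed budget, which is the kind of situation where an unconstrained DEC-POMDP solver would return an infeasible policy. A policy-improvement step, such as the NLP-based one the abstract mentions, would sit on top of an evaluation routine like this one; it is not sketched here.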

Text
ecai2012.pdf - Author's Original
Download (272kB)

More information

Accepted/In Press date: 27 August 2012
Published date: August 2012
Venue - Dates: ECAI 2012: 20th European Conference on Artificial Intelligence, Montpellier, France, 2012-08-27 - 2012-08-31
Organisations: Agents, Interactions & Complexity

Identifiers

Local EPrints ID: 339937
URI: http://eprints.soton.ac.uk/id/eprint/339937
ISBN: 978-1-61499-097-0
PURE UUID: 56153b39-bc64-474c-abb1-a2e7e4f021e5

Catalogue record

Date deposited: 06 Jun 2012 08:45
Last modified: 14 Mar 2024 11:17

Contributors

Author: Feng Wu
Author: Nicholas Jennings
Author: Xiaoping Chen
