Sample-based policy iteration for constrained DEC-POMDPs
We introduce constrained DEC-POMDPs, an extension of standard DEC-POMDPs that includes constraints on the optimality of the overall team rewards. Constrained DEC-POMDPs provide a natural framework for modeling cooperative multi-agent problems with limited resources. To solve such DEC-POMDPs, we propose a novel sample-based policy iteration algorithm. The algorithm builds on multi-agent dynamic programming and benefits from several recent advances in DEC-POMDP algorithms such as MBDP [12] and TBDP [13]. Specifically, it improves the joint policy by solving a series of standard nonlinear programs (NLPs), thereby building on recent advances in NLP solvers. Our experimental results confirm that the algorithm can efficiently solve constrained DEC-POMDPs that cause general DEC-POMDP algorithms to fail.
Pages: 858-863
Wu, Feng
Jennings, Nicholas
Chen, Xiaoping
August 2012
Wu, Feng, Jennings, Nicholas and Chen, Xiaoping (2012) Sample-based policy iteration for constrained DEC-POMDPs. ECAI 2012: 20th European Conference on Artificial Intelligence, Montpellier, France. 27 - 31 Aug 2012.
Record type: Conference or Workshop Item (Paper)
Text
ecai2012.pdf - Author's Original
More information
Accepted/In Press date: 27 August 2012
Published date: August 2012
Venue - Dates: ECAI 2012: 20th European Conference on Artificial Intelligence, Montpellier, France, 2012-08-27 - 2012-08-31
Organisations: Agents, Interactions & Complexity
Identifiers
Local EPrints ID: 339937
URI: http://eprints.soton.ac.uk/id/eprint/339937
ISBN: 978-1-61499-097-0
PURE UUID: 56153b39-bc64-474c-abb1-a2e7e4f021e5
Catalogue record
Date deposited: 06 Jun 2012 08:45
Last modified: 14 Mar 2024 11:17
Contributors
Author: Feng Wu
Author: Nicholas Jennings
Author: Xiaoping Chen