The University of Southampton
University of Southampton Institutional Repository

DCOPS and bandits: Exploration and exploitation in decentralised coordination


Stranders, Ruben, Tran-Thanh, Long, Delle Fave, Francesco Maria, Rogers, Alex and Jennings, Nick (2012) DCOPS and bandits: Exploration and exploitation in decentralised coordination. Proc. 11th Int. Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), Valencia, Spain. pp. 289-297.

Record type: Conference or Workshop Item (Paper)

Abstract

Real-life coordination problems are characterised by stochasticity and a lack of a priori knowledge about the interactions between agents. However, decentralised constraint optimisation problems (DCOPs), a widely adopted framework for modelling decentralised coordination tasks, assume perfect knowledge of these factors, which limits their practical applicability. To address this shortcoming, we introduce the MAB–DCOP, in which the interactions between agents are modelled by multi-armed bandits (MABs). Unlike canonical DCOPs, a MAB–DCOP is not a single-shot optimisation problem. Rather, it is a sequential one in which agents need to coordinate in order to strike a balance between acquiring knowledge about the a priori unknown and stochastic interactions (exploration), and taking the currently believed optimal joint action (exploitation), so as to maximise the cumulative global utility over a finite time horizon. We propose Heist, the first asymptotically optimal algorithm for coordination under stochasticity and lack of prior knowledge. Heist solves MAB–DCOPs in a decentralised fashion using a generalised distributive law (GDL) message passing phase to find the joint action with the highest upper confidence bound (UCB) on global utility. We demonstrate that Heist outperforms other state-of-the-art techniques from the MAB and DCOP literature by up to 1.5 orders of magnitude on MAB–DCOPs in experimental settings.
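
The exploration and exploitation loop described in the abstract can be illustrated with a small toy sketch. This is not Heist itself and is not taken from the paper. It assumes a hypothetical chain of three agents with binary actions, made-up Bernoulli reward means, a UCB1-style index summed over factors as a stand-in for the bound on global utility, and a centralised brute-force search in place of the decentralised GDL message passing phase. The names AGENTS, FACTORS, ucb_index and run are illustrative only.

```python
import itertools
import math
import random

# Hypothetical toy problem (not from the paper): three agents in a chain,
# each choosing a binary action. Every factor is a multi-armed bandit whose
# arms are the joint actions of the two agents it connects. The values are
# made-up Bernoulli reward means, unknown to the learner.
AGENTS = ("a1", "a2", "a3")
DOMAIN = (0, 1)
FACTORS = {
    ("a1", "a2"): {(0, 0): 0.2, (0, 1): 0.8, (1, 0): 0.5, (1, 1): 0.3},
    ("a2", "a3"): {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.6, (1, 1): 0.9},
}


def ucb_index(mean, count, t):
    """UCB1-style index: empirical mean plus an exploration bonus."""
    if count == 0:
        return math.inf  # unsampled arms are tried first
    return mean + math.sqrt(2.0 * math.log(t) / count)


def run(horizon, seed=0):
    """Play for `horizon` steps; return (cumulative reward, pull counts)."""
    rng = random.Random(seed)
    counts = {f: dict.fromkeys(arms, 0) for f, arms in FACTORS.items()}
    means = {f: dict.fromkeys(arms, 0.0) for f, arms in FACTORS.items()}
    total = 0.0
    for t in range(1, horizon + 1):

        def global_ucb(joint):
            # Assumption: the global bound is the sum of per-factor indices.
            act = dict(zip(AGENTS, joint))
            return sum(
                ucb_index(means[f][(act[f[0]], act[f[1]])],
                          counts[f][(act[f[0]], act[f[1]])], t)
                for f in FACTORS
            )

        # Brute-force maximisation over all joint actions. A decentralised
        # solver would replace this step with message passing.
        joint = max(itertools.product(DOMAIN, repeat=len(AGENTS)),
                    key=global_ucb)
        act = dict(zip(AGENTS, joint))
        for f, arms in FACTORS.items():
            arm = (act[f[0]], act[f[1]])
            reward = 1.0 if rng.random() < arms[arm] else 0.0
            counts[f][arm] += 1
            # Incremental update of the empirical mean for the pulled arm.
            means[f][arm] += (reward - means[f][arm]) / counts[f][arm]
            total += reward
    return total, counts
```

Calling run(1000) returns the cumulative reward and, per factor, how often each local joint action was played. In this sketch the bonus term shrinks as an arm is sampled more often, so play should shift over time from exploration towards the joint action that currently looks best.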

Text
mab_dcops.pdf - Accepted Manuscript
Download (740kB)

More information

Published date: 2012
Venue - Dates: Proc. 11th Int. Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), Valencia, Spain, 2012-06-01
Organisations: Agents, Interactions & Complexity

Identifiers

Local EPrints ID: 273086
URI: http://eprints.soton.ac.uk/id/eprint/273086
ISBN: 0-9817381-2-5
PURE UUID: e6741286-3bd1-408c-801c-e263708df141
ORCID for Long Tran-Thanh: orcid.org/0000-0003-1617-8316

Catalogue record

Date deposited: 02 Jan 2012 11:01
Last modified: 14 Mar 2024 10:18

Contributors

Author: Ruben Stranders
Author: Long Tran-Thanh
Author: Francesco Maria Delle Fave
Author: Alex Rogers
Author: Nick Jennings
