University of Southampton Institutional Repository

TAPE: leveraging agent topology for cooperative multi-agent policy gradient


Luo, Xingzhou, Zhang, Junge, Norman, Tim, Huang, Kaiqi and Du, Yali (2024) TAPE: leveraging agent topology for cooperative multi-agent policy gradient. Wooldridge, Michael, Dy, Jennifer and Natarajan, Sriraam (eds.) In Proceedings of the 38th AAAI Conference on Artificial Intelligence. vol. 38, AAAI Press. pp. 17496-17504. (doi:10.1609/aaai.v38i16.29699).

Record type: Conference or Workshop Item (Paper)

Abstract

Multi-Agent Policy Gradient (MAPG) has made significant progress in recent years. However, centralized critics in state-of-the-art MAPG methods still face the centralized-decentralized mismatch (CDM) issue, which means that sub-optimal actions by some agents will affect other agents' policy learning. While using individual critics for policy updates can avoid this issue, it severely limits cooperation among agents. To address this issue, we propose an agent topology framework, which decides whether other agents should be considered in the policy gradient and achieves a compromise between facilitating cooperation and alleviating the CDM issue. The agent topology allows agents to use coalition utility as the learning objective, instead of the global utility provided by centralized critics or the local utility provided by individual critics. To constitute the agent topology, various models are studied. We propose Topology-based multi-Agent Policy gradiEnt (TAPE) for both stochastic and deterministic MAPG methods. We prove the policy improvement theorem for stochastic TAPE and give a theoretical explanation for the improved cooperation among agents. Experimental results on several benchmarks show that the agent topology is able to facilitate agent cooperation and alleviate the CDM issue, thereby improving the performance of TAPE. Finally, multiple ablation studies and a heuristic graph search algorithm are devised to show the efficacy of the agent topology.
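The central idea of the abstract, that a coalition utility interpolates between the global utility of a centralized critic and the local utility of an individual critic, can be illustrated with a minimal sketch. This is not the paper's implementation; the adjacency matrix `A` and the function `coalition_utilities` are hypothetical names introduced here, assuming a binary topology where `A[i, j] = 1` means agent j belongs to agent i's coalition.

```python
import numpy as np

def coalition_utilities(A, individual_utilities):
    """Return one coalition utility per agent under topology A.

    A fully connected row recovers the global utility (centralized
    critic); an identity row recovers the purely local utility
    (individual critic); anything in between is a coalition.
    """
    A = np.asarray(A, dtype=float)
    u = np.asarray(individual_utilities, dtype=float)
    # Row i of A @ u sums the utilities of the agents in agent i's coalition.
    return A @ u

# Three agents with three different topology rows (hypothetical values):
A = np.array([[1, 1, 0],   # agent 0 considers itself and agent 1
              [1, 1, 1],   # agent 1 uses the global utility
              [0, 0, 1]])  # agent 2 uses only its local utility
u = [2.0, 3.0, 1.0]
print(coalition_utilities(A, u))  # [5. 6. 1.]
```

Under this sketch, each agent's policy gradient would be weighted by its own row's coalition utility rather than a shared global signal, which is the compromise the abstract describes.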

This record has no associated files available for download.

More information

Published date: 24 March 2024
Additional Information: Publisher Copyright: Copyright © 2024, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Venue - Dates: The 38th Annual AAAI Conference on Artificial Intelligence, Vancouver Convention Centre, Vancouver, Canada, 2024-02-20 - 2024-02-27
Keywords: multiagent learning, reinforcement learning

Identifiers

Local EPrints ID: 490157
URI: http://eprints.soton.ac.uk/id/eprint/490157
ISSN: 2159-5399
PURE UUID: 8695cadb-7598-431b-a101-eb44f76e5494
ORCID for Tim Norman: orcid.org/0000-0002-6387-4034

Catalogue record

Date deposited: 16 May 2024 16:34
Last modified: 31 Jul 2024 01:48

Contributors

Author: Xingzhou Luo
Author: Junge Zhang
Author: Tim Norman
Author: Kaiqi Huang
Author: Yali Du
Editor: Michael Wooldridge
Editor: Jennifer Dy
Editor: Sriraam Natarajan


Contact ePrints Soton: eprints@soton.ac.uk

ePrints Soton supports OAI 2.0 with a base URL of http://eprints.soton.ac.uk/cgi/oai2

