University of Southampton Institutional Repository

Multi-agent actor-critic with time dynamical opponent model


Tian, Yuan, Kladny, Klaus-Rudolf, Wang, Qin, Huang, Zhiwu and Fink, Olga (2023) Multi-agent actor-critic with time dynamical opponent model. Neurocomputing, 517, 165-172. (doi:10.1016/j.neucom.2022.10.045).

Record type: Article

Abstract

In multi-agent reinforcement learning, multiple agents learn simultaneously while interacting with a common environment and with each other. Because the agents adapt their policies during learning, not only does the behavior of each individual agent become non-stationary, but so does the environment as perceived by each agent. This makes policy improvement particularly challenging. In this paper, we propose to exploit the fact that agents seek to improve their expected cumulative reward, and we introduce a novel Time Dynamical Opponent Model (TDOM) that encodes the knowledge that opponent policies tend to improve over time. We motivate TDOM theoretically by deriving a lower bound on the log objective of an individual agent, and we further propose Multi-Agent Actor-Critic with Time Dynamical Opponent Model (TDOM-AC). We evaluate TDOM-AC on a differential game and on the Multi-agent Particle Environment. We show empirically that TDOM predicts opponent behavior more accurately at test time. TDOM-AC outperforms state-of-the-art actor-critic methods on the evaluated tasks in cooperative and, especially, in mixed cooperative-competitive environments. TDOM-AC also yields more stable training and faster convergence. Our code is available at https://github.com/Yuantian013/TDOM-AC.

This record has no associated files available for download.

More information

Accepted/In Press date: 18 October 2022
Published date: 14 January 2023

Identifiers

Local EPrints ID: 501646
URI: http://eprints.soton.ac.uk/id/eprint/501646
ISSN: 0925-2312
PURE UUID: 7b895a4b-99e0-40bd-ba23-e416420dc400
ORCID for Zhiwu Huang: orcid.org/0000-0002-7385-079X

Catalogue record

Date deposited: 04 Jun 2025 17:12
Last modified: 05 Jun 2025 02:08


Contributors

Author: Yuan Tian
Author: Klaus-Rudolf Kladny
Author: Qin Wang
Author: Zhiwu Huang
Author: Olga Fink
