The University of Southampton
University of Southampton Institutional Repository

Online Markov decision processes with non-oblivious strategic adversary

Dinh, Le Cong, Mguni, David Henry, Tran-Thanh, Long, Wang, Jun and Yang, Yaodong (2023) Online Markov decision processes with non-oblivious strategic adversary. Autonomous Agents and Multi-Agent Systems, 37 (1), [15]. (doi:10.1007/s10458-023-09599-5).

Record type: Article

Abstract

We study a novel setting in Online Markov Decision Processes (OMDPs) where the loss function is chosen by a non-oblivious strategic adversary who follows a no-external-regret algorithm. In this setting, we first demonstrate that MDP-Expert, an existing algorithm that works well with oblivious adversaries, still applies and achieves a policy regret bound of $\mathcal{O}(\sqrt{T\log(L)} + \tau^2\sqrt{T\log(|A|)})$, where $T$ is the number of rounds, $\tau$ is the mixing time of the MDP, $L$ is the size of the adversary's pure strategy set and $|A|$ denotes the size of the agent's action space. Motivated by real-world games in which the support size of a Nash equilibrium (NE) is small, we further propose a new algorithm, MDP-Online Oracle Expert (MDP-OOE), that achieves a policy regret bound of $\mathcal{O}(\sqrt{T\log(L)} + \tau^2\sqrt{Tk\log(k)})$, where $k$ depends only on the support size of the NE. MDP-OOE leverages the key benefit of Double Oracle in game theory and can therefore solve games with prohibitively large action spaces. Finally, to better understand the learning dynamics of no-regret methods, under the same setting of a no-external-regret adversary in OMDPs, we introduce an algorithm that achieves last-round convergence to a NE. To the best of our knowledge, this is the first work to obtain a last-iterate convergence result in OMDPs.
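The abstract assumes an adversary that runs a no-external-regret algorithm. As an illustration of that assumption only (this is not the authors' code), the sketch below uses Hedge (multiplicative weights), a standard no-external-regret algorithm; experts algorithms of this kind are also the per-state building block of MDP-Expert. It plays Hedge against a sequence of loss vectors and compares its expected loss with that of the best fixed pure strategy in hindsight. The function names `hedge_update` and `run_hedge`, the step size and the synthetic random losses are hypothetical choices made for this example.

```python
import math
import random


def hedge_update(weights, losses, eta):
    """One Hedge step (illustrative helper): w_i <- w_i * exp(-eta * loss_i), then renormalise."""
    new = [w * math.exp(-eta * l) for w, l in zip(weights, losses)]
    total = sum(new)
    return [w / total for w in new]  # keep a valid probability distribution


def run_hedge(loss_sequence, eta):
    """Play Hedge on a sequence of loss vectors with entries in [0, 1] (illustrative helper).

    Returns (expected cumulative loss of Hedge, cumulative loss of the best fixed action).
    """
    n = len(loss_sequence[0])
    p = [1.0 / n] * n          # start uniform over the n pure strategies
    learner_loss = 0.0
    cumulative = [0.0] * n     # cumulative loss of each fixed pure strategy
    for losses in loss_sequence:
        # Expected loss of the current mixed strategy in this round.
        learner_loss += sum(pi * li for pi, li in zip(p, losses))
        cumulative = [c + l for c, l in zip(cumulative, losses)]
        p = hedge_update(p, losses, eta)
    return learner_loss, min(cumulative)


if __name__ == "__main__":
    random.seed(0)
    T, n = 2000, 5  # hypothetical horizon and number of pure strategies
    losses = [[random.random() for _ in range(n)] for _ in range(T)]
    eta = math.sqrt(8 * math.log(n) / T)  # standard tuned step size for known T
    learner, best = run_hedge(losses, eta)
    bound = math.sqrt(T * math.log(n) / 2)
    print(f"external regret = {learner - best:.2f}, theoretical bound = {bound:.2f}")
```

With losses in [0, 1] and step size η = √(8 ln n / T), Hedge's external regret against the best fixed action is at most √(T ln n / 2) for any loss sequence, which is the O(√T) no-external-regret guarantee that the adversary in this setting is assumed to satisfy.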

Full text: JAAMAS_Online_Markov_Decision_Processes (Accepted Manuscript, 5MB)

More information

Accepted/In Press date: 4 January 2023
Published date: 27 January 2023
Additional Information: Publisher Copyright: © 2023, Springer Science+Business Media, LLC, part of Springer Nature.
Keywords: Game theory, Last round convergence, Multi-agent system, Non-oblivious adversary, Online Markov decision processes, Online learning

Identifiers

Local EPrints ID: 474212
URI: http://eprints.soton.ac.uk/id/eprint/474212
ISSN: 1387-2532
PURE UUID: 80f0a4ec-9952-4947-922f-0abaf661bf75
ORCID for Le Cong Dinh: orcid.org/0000-0002-3306-0603

Catalogue record

Date deposited: 16 Feb 2023 17:34
Last modified: 17 Mar 2024 07:41

Contributors

Author: Le Cong Dinh
Author: David Henry Mguni
Author: Long Tran-Thanh
Author: Jun Wang
Author: Yaodong Yang
