Online Markov decision processes with non-oblivious strategic adversary
Dinh, Le Cong, Mguni, David Henry, Tran-Thanh, Long, Wang, Jun and Yang, Yaodong (2023) Online Markov decision processes with non-oblivious strategic adversary. Autonomous Agents and Multi-Agent Systems, 37 (1), [15]. (doi:10.1007/s10458-023-09599-5).
Abstract
We study a novel setting in Online Markov Decision Processes (OMDPs) where the loss function is chosen by a non-oblivious strategic adversary who follows a no-external-regret algorithm. In this setting, we first demonstrate that MDP-Expert, an existing algorithm that works well with oblivious adversaries, still applies and achieves a policy regret bound of $O(\sqrt{T\log(L)}+\tau^2\sqrt{T\log(|A|)})$, where $L$ is the size of the adversary's pure strategy set and $|A|$ denotes the size of the agent's action space. Considering real-world games where the support size of a Nash equilibrium (NE) is small, we further propose a new algorithm, MDP-Online Oracle Expert (MDP-OOE), that achieves a policy regret bound of $O(\sqrt{T\log(L)}+\tau^2\sqrt{Tk\log(k)})$, where $k$ depends only on the support size of the NE. MDP-OOE leverages the key benefit of Double Oracle in game theory and thus can solve games with prohibitively large action spaces. Finally, to better understand the learning dynamics of no-regret methods, under the same setting of a no-external-regret adversary in OMDPs, we introduce an algorithm that achieves last-round convergence to a NE. To the best of our knowledge, this is the first work to obtain a last-round convergence result in OMDPs.
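As a rough illustration of what a "no-external-regret algorithm" can look like, the sketch below implements Hedge (multiplicative weights), a standard textbook learner of this kind. It is not MDP-Expert or MDP-OOE and is not taken from the paper; the function names `hedge` and `demo`, the random loss sequence, and the learning rate are assumptions made for the example. For losses in [0, 1] and N actions, Hedge with learning rate sqrt(8 ln N / T) is known to have external regret at most sqrt(T ln N / 2) against any loss sequence.

```python
import math
import random


def hedge(loss_rounds, n_actions, eta):
    """Run Hedge (multiplicative weights) on a sequence of loss vectors.

    loss_rounds: list of per-round loss vectors with entries in [0, 1].
    Returns (expected cumulative loss of the learner,
             cumulative loss of the best fixed action in hindsight).
    """
    weights = [1.0] * n_actions
    cumulative = [0.0] * n_actions
    learner_loss = 0.0
    for losses in loss_rounds:
        total = sum(weights)
        probs = [w / total for w in weights]
        # Expected loss of the mixed strategy played this round.
        learner_loss += sum(p * l for p, l in zip(probs, losses))
        for a, l in enumerate(losses):
            cumulative[a] += l
            # Exponentially down-weight actions that incurred loss.
            weights[a] *= math.exp(-eta * l)
    return learner_loss, min(cumulative)


def demo(T=2000, n_actions=5, seed=0):
    """Hypothetical demo: compare measured regret with the Hedge bound."""
    rng = random.Random(seed)
    rounds = [[rng.random() for _ in range(n_actions)] for _ in range(T)]
    eta = math.sqrt(8.0 * math.log(n_actions) / T)
    learner, best = hedge(rounds, n_actions, eta)
    regret = learner - best
    bound = math.sqrt(T * math.log(n_actions) / 2.0)
    return regret, bound
```

Calling `demo()` returns the measured external regret together with the theoretical bound; the regret should not exceed the bound. For very long horizons the raw weights can underflow, so a practical version would keep them in log space.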
Text: JAAMAS_Online_Markov_Decision_Processes (Accepted Manuscript)
More information
Accepted/In Press date: 4 January 2023
Published date: 27 January 2023
Additional Information: Publisher Copyright: © 2023, Springer Science+Business Media, LLC, part of Springer Nature.
Keywords: Game theory, Last round convergence, Multi-agent system, Non-oblivious adversary, Online Markov decision processes, Online learning
Identifiers
Local EPrints ID: 474212
URI: http://eprints.soton.ac.uk/id/eprint/474212
ISSN: 1387-2532
PURE UUID: 80f0a4ec-9952-4947-922f-0abaf661bf75
Catalogue record
Date deposited: 16 Feb 2023 17:34
Last modified: 17 Mar 2024 07:41
Contributors
Author: Le Cong Dinh
Author: David Henry Mguni
Author: Long Tran-Thanh
Author: Jun Wang
Author: Yaodong Yang