Online learning in the presence of strategic adversary
Dinh, Le Cong (2023) Online learning in the presence of strategic adversary. University of Southampton, Doctoral Thesis, 156pp.
Record type: Thesis (Doctoral)
Abstract
This thesis offers a comprehensive exploration of the online learning problem in which an agent must strategise against a strategic adversary (also known as a no-regret adversary). Through an examination of three interrelated settings, we devise novel algorithms that achieve improved performance guarantees and last round convergence to the Nash equilibrium in both theoretical and empirical contexts. Our findings open the door to further investigation of complex problems in online learning and game theory, where strategic adversaries play a crucial role in a multitude of applications.
In the first of the three main chapters of our study, we examine the problem of playing against a strategic adversary in a two-player zero-sum game setting. In this scenario, we introduce a new no-dynamic regret algorithm, namely Last Round Convergence of Asymmetric Games (LRCA), that achieves last round convergence to the minimax equilibrium. Building on this work, the second main chapter investigates the more general problem of online linear optimization and proposes several new algorithms, including Online Single Oracle (OSO), Accurate Follow the Regularized Leader (AFTRL), and the Prod-Best Response algorithm (Prod-BR). These algorithms achieve state-of-the-art performance guarantees, such as no-forward regret and no-dynamic regret, against a strategic adversary. Additionally, we show that a special case of AFTRL, the Accurate Multiplicative Weights Update (AMWU), achieves last round convergence to the Nash equilibrium in self-play settings. In the third and final main chapter, we extend our results to the challenging setting of Online Markov Decision Processes (OMDPs), which have many significant applications in practice. Here, we propose two new algorithms, MDP-Online Oracle Expert (MDP-OOE) and Last Round Convergence-OMDP (LRC-OMDP), which achieve no-policy regret and last round convergence to the Nash equilibrium, respectively, against a strategic adversary.
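For readers unfamiliar with the terminology, the definitions below sketch, in standard notation, what a no-regret (strategic) adversary and last round convergence mean in a two-player zero-sum game. The notation (loss matrix A, strategies x_t and y_t, horizon T) is illustrative and is not taken verbatim from the thesis.

```latex
% Illustrative definitions in standard notation (not verbatim from the thesis).
% The learner plays x_t over the simplex Delta_n and the adversary plays y_t
% over Delta_m; A is the learner's loss matrix, so the learner minimises and
% the adversary maximises x_t^T A y_t.
\begin{align*}
  \text{Adversary's external regret:}\quad
    R_T &= \max_{y \in \Delta_m} \sum_{t=1}^{T} x_t^{\top} A y
           \;-\; \sum_{t=1}^{T} x_t^{\top} A y_t,\\
  \text{No-regret (strategic) adversary:}\quad
    R_T &= o(T),\\
  \text{Last round convergence:}\quad
    (x_T, y_T) &\to (x^{*}, y^{*}) \quad \text{as } T \to \infty,
\end{align*}
% where (x^*, y^*) is a minimax (Nash) equilibrium of the game. This is a
% stronger requirement than convergence of the time-averaged strategies.
```

As a further illustration of why last round convergence is a nontrivial requirement, the sketch below runs the standard Multiplicative Weights Update in self-play on Rock-Paper-Scissors. This is a generic no-regret learner, not the thesis's AMWU or LRCA algorithms, and the game matrix, step size, horizon, and initial strategies are arbitrary choices for the example: the time-averaged strategies approach the uniform minimax equilibrium (approximately, for a fixed step size), while the last iterates typically keep cycling around it.

```python
import numpy as np

# Standard Multiplicative Weights Update (MWU) in self-play on a zero-sum game.
# Generic illustration of a no-regret learner -- not the thesis's AMWU or LRCA.
# A[i, j] is the row player's loss; the column player's loss is -A[i, j].
A = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])      # Rock-Paper-Scissors
eta, T = 0.1, 5000                    # arbitrary step size and horizon

x = np.array([0.6, 0.3, 0.1])         # row player's mixed strategy (arbitrary start)
y = np.array([0.2, 0.5, 0.3])         # column player's mixed strategy (arbitrary start)
sum_x, sum_y = np.zeros(3), np.zeros(3)

for _ in range(T):
    loss_x = A @ y                    # expected loss of each row action
    loss_y = -A.T @ x                 # expected loss of each column action
    x = x * np.exp(-eta * loss_x)     # multiplicative update, then renormalise
    x = x / x.sum()
    y = y * np.exp(-eta * loss_y)
    y = y / y.sum()
    sum_x += x
    sum_y += y

# The averages approach the uniform equilibrium (1/3, 1/3, 1/3), while the
# last iterates (x, y) generally keep orbiting away from it.
print("average strategies:", np.round(sum_x / T, 3), np.round(sum_y / T, 3))
print("last iterates:     ", np.round(x, 3), np.round(y, 3))
```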
Text: Le_Cong_Dinh_PhD_Thesis_pdf_A_3b - Version of Record
Text: Final-thesis-submission-Examination-Mr-Le-Dinh - Restricted to Repository staff only
More information
Published date: 15 May 2023
Identifiers
Local EPrints ID: 476759
URI: http://eprints.soton.ac.uk/id/eprint/476759
PURE UUID: 402b1d54-1e64-4a3f-8382-d60c13680f33
Catalogue record
Date deposited: 15 May 2023 16:32
Last modified: 17 Mar 2024 03:37
Contributors
Author: Le Cong Dinh
Thesis advisor: Long Tran-Thanh
Thesis advisor: Alain Zemkoho
Thesis advisor: Tri-Dung Nguyen