Online learning in the presence of strategic adversary
Dinh, Le Cong (2023) Online learning in the presence of strategic adversary. University of Southampton, Doctoral Thesis, 156pp.
Record type: Thesis (Doctoral)
Abstract
This thesis offers a comprehensive exploration of the online learning problem in which an agent must strategise against a strategic adversary (also known as a no-regret adversary). Through an examination of three interrelated settings, we devise novel algorithms that achieve improved performance guarantees and last round convergence to the Nash equilibrium in both theoretical and empirical contexts. Our findings open the door to further investigation of complex problems in online learning and game theory, where strategic adversaries play a crucial role in a multitude of applications.
In the first of the three main chapters of our study, we examine the problem of playing against a strategic adversary in a two-player zero-sum game setting. In this scenario, we introduce a new no-dynamic regret algorithm, namely Last Round Convergence of Asymmetric Games (LRCA), that achieves last round convergence to the minimax equilibrium. Building on this work, the second main chapter investigates the more general problem of online linear optimization and proposes several new algorithms, including Online Single Oracle (OSO), Accurate Follow the Regularized Leader (AFTRL), and the Prod-Best Response algorithm (Prod-BR). These algorithms achieve state-of-the-art performance guarantees, such as no-forward regret and no-dynamic regret, against a strategic adversary. Additionally, we show that a special case of AFTRL, the Accurate Multiplicative Weights Update (AMWU), achieves last round convergence to the Nash equilibrium in self-play settings. In the third and final main chapter, we extend our results to the challenging setting of Online Markov Decision Processes (OMDPs), which have many significant applications in practice. Here, we propose two new algorithms, MDP-Online Oracle Expert (MDP-OOE) and Last Round Convergence-OMDP (LRC-OMDP), which achieve no-policy regret and last round convergence to the Nash equilibrium, respectively, against a strategic adversary.
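For readers unfamiliar with the terminology, the definitions below sketch, in standard notation, what a no-regret (strategic) adversary and last round convergence mean in a two-player zero-sum game. The notation (loss matrix A, strategies x_t and y_t, horizon T) is illustrative and is not taken verbatim from the thesis.

```latex
% Illustrative definitions in standard notation (not verbatim from the thesis).
% The learner plays x_t over the simplex Delta_n and the adversary plays y_t
% over Delta_m; A is the learner's loss matrix, so the learner minimises and
% the adversary maximises x_t^T A y_t.
\begin{align*}
  \text{Adversary's external regret:}\quad
    R_T &= \max_{y \in \Delta_m} \sum_{t=1}^{T} x_t^{\top} A y
           \;-\; \sum_{t=1}^{T} x_t^{\top} A y_t,\\
  \text{No-regret (strategic) adversary:}\quad
    R_T &= o(T),\\
  \text{Last round convergence:}\quad
    (x_T, y_T) &\to (x^{*}, y^{*}) \quad \text{as } T \to \infty,
\end{align*}
% where (x^*, y^*) is a minimax (Nash) equilibrium of the game. This is a
% stronger requirement than convergence of the time-averaged strategies.
```

As a further illustration of why last round convergence is a nontrivial requirement, the sketch below runs the standard Multiplicative Weights Update in self-play on Rock-Paper-Scissors. This is a generic no-regret learner, not the thesis's AMWU or LRCA algorithms, and the game matrix, step size, horizon, and initial strategies are arbitrary choices for the example: the time-averaged strategies approach the uniform minimax equilibrium (approximately, for a fixed step size), while the last iterates typically keep cycling around it.

```python
import numpy as np

# Standard Multiplicative Weights Update (MWU) in self-play on a zero-sum game.
# Generic illustration of a no-regret learner -- not the thesis's AMWU or LRCA.
# A[i, j] is the row player's loss; the column player's loss is -A[i, j].
A = np.array([[0.0, 1.0, -1.0],
              [-1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])      # Rock-Paper-Scissors
eta, T = 0.1, 5000                    # arbitrary step size and horizon

x = np.array([0.6, 0.3, 0.1])         # row player's mixed strategy (arbitrary start)
y = np.array([0.2, 0.5, 0.3])         # column player's mixed strategy (arbitrary start)
sum_x, sum_y = np.zeros(3), np.zeros(3)

for _ in range(T):
    loss_x = A @ y                    # expected loss of each row action
    loss_y = -A.T @ x                 # expected loss of each column action
    x = x * np.exp(-eta * loss_x)     # multiplicative update, then renormalise
    x = x / x.sum()
    y = y * np.exp(-eta * loss_y)
    y = y / y.sum()
    sum_x += x
    sum_y += y

# The averages approach the uniform equilibrium (1/3, 1/3, 1/3), while the
# last iterates (x, y) generally keep orbiting away from it.
print("average strategies:", np.round(sum_x / T, 3), np.round(sum_y / T, 3))
print("last iterates:     ", np.round(x, 3), np.round(y, 3))
```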
Text: Le_Cong_Dinh_PhD_Thesis_pdf_A_3b - Version of Record
Text: Final-thesis-submission-Examination-Mr-Le-Dinh - Restricted to Repository staff only
More information
Published date: 15 May 2023
Identifiers
Local EPrints ID: 476759
URI: http://eprints.soton.ac.uk/id/eprint/476759
PURE UUID: 402b1d54-1e64-4a3f-8382-d60c13680f33
Catalogue record
Date deposited: 15 May 2023 16:32
Last modified: 17 Mar 2024 03:37
Contributors
Author: Le Cong Dinh
Thesis advisor: Long Tran-Thanh
Thesis advisor: Alain Zemkoho
Thesis advisor: Tri-Dung Nguyen