The University of Southampton
University of Southampton Institutional Repository

Online double oracle


Dinh, Le Cong, McAleer, Stephen, Tian, Zheng, Perez-Nieves, Nicolas, Slumbers, Oliver, Mguni, David Henry, Wang, Jun, Bou Ammar, Haitham and Yang, Yaodong (2022) Online double oracle. TMLR: Transactions on Machine Learning Research. (In Press)

Record type: Article

Abstract

Solving strategic games with huge action spaces is a critical yet under-explored topic in economics, operations research and artificial intelligence. This paper proposes new learning algorithms for solving two-player zero-sum normal-form games where the number of pure strategies is prohibitively large. Specifically, we combine no-regret analysis from online learning with Double Oracle (DO) from game theory.
Our method, Online Double Oracle (ODO), is provably convergent to a Nash equilibrium (NE). Most importantly, unlike standard DO, ODO is rational in the sense that each agent in ODO can exploit a strategic adversary with a regret bound of $\mathcal{O}(\sqrt{k \log(k)/T})$, where $k$ is not the total number of pure strategies but the size of the effective strategy set. We show empirically that, in many applications, $k$ depends linearly on the support size of the NE. On tens of different real-world matrix games, ODO outperforms DO, PSRO, and no-regret algorithms such as Multiplicative Weights Update by a significant margin, both in convergence rate to an NE and in average payoff against strategic adversaries.
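To make the idea of running a no-regret learner over a small, growing effective strategy set more concrete, the following is a minimal Python sketch. It is not the authors' ODO algorithm: the function name `mwu_on_growing_set`, the fixed step size `eta`, the rule of restarting the weights whenever a new best response appears, the brute-force best-response oracle, and the toy loss matrix `LOSS` are all assumptions made for illustration. The row player minimises a loss in [0, 1] and faces an adversary that best-responds to its current mixed strategy.

```python
import math


def mwu_on_growing_set(loss, rounds=2000, eta=0.1):
    """Multiplicative Weights Update over a growing set of row strategies.

    loss[i][j] is the row player's loss (in [0, 1]) for row i against column j.
    Returns (effective_set, average_row_strategy, average_loss).
    Illustrative sketch only; the reset rule and step size are assumptions.
    """
    n_rows, n_cols = len(loss), len(loss[0])
    weights = {0: 1.0}              # start from one arbitrary pure strategy
    window_loss = [0.0] * n_rows    # cumulative loss per row in the current window
    avg_strategy = [0.0] * n_rows
    total_loss = 0.0

    for _ in range(rounds):
        z = sum(weights.values())
        probs = {i: w / z for i, w in weights.items()}

        # Strategic adversary: the column that maximises the expected loss.
        col = max(range(n_cols),
                  key=lambda j: sum(p * loss[i][j] for i, p in probs.items()))

        for i, p in probs.items():
            avg_strategy[i] += p / rounds
            total_loss += p * loss[i][col]
        for i in range(n_rows):
            window_loss[i] += loss[i][col]

        # Best-response oracle (brute force here; a real oracle would be
        # problem-specific and would not enumerate every pure strategy).
        best = min(range(n_rows), key=lambda i: window_loss[i])
        if best not in weights:
            # New strategy: enlarge the effective set and restart the window.
            weights = {i: 1.0 for i in list(weights) + [best]}
            window_loss = [0.0] * n_rows
        else:
            # Standard MWU step, restricted to the effective set.
            for i in weights:
                weights[i] *= math.exp(-eta * loss[i][col])
            top = max(weights.values())
            weights = {i: w / top for i, w in weights.items()}  # avoid underflow

    return sorted(weights), avg_strategy, total_loss / rounds


def rps_loss(i, j):
    """Rock-paper-scissors loss for the row player: win 0, tie 0.5, lose 1."""
    if i == j:
        return 0.5
    return 1.0 if j == (i + 1) % 3 else 0.0


# Toy game: three rock-paper-scissors rows plus seven dominated rows.
LOSS = [[rps_loss(i, j) for j in range(3)] for i in range(3)]
LOSS += [[1.0, 1.0, 1.0] for _ in range(7)]

effective_set, avg_strategy, avg_loss = mwu_on_growing_set(LOSS)
print(effective_set, round(avg_loss, 3))
```

In this toy game the learner only ever holds weights on the three undominated rows, so the size of its working set is 3 rather than 10. That mirrors, in a very small way, the abstract's point that the regret bound depends on the size $k$ of the effective strategy set and not on the total number of pure strategies.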

Text
2103.07780 - Accepted Manuscript
Restricted to Repository staff only
Text
online_double_oracle - Version of Record
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 4 October 2022
Keywords: Online Learning, Adversary, Solving large games

Identifiers

Local EPrints ID: 471822
URI: http://eprints.soton.ac.uk/id/eprint/471822
PURE UUID: f7321c57-0cf5-42d4-a18a-bcb70ae52e7a
ORCID for Le Cong Dinh: orcid.org/0000-0002-3306-0603

Catalogue record

Date deposited: 21 Nov 2022 17:44
Last modified: 16 Mar 2024 22:51

Contributors

Author: Le Cong Dinh
Author: Stephen McAleer
Author: Zheng Tian
Author: Nicolas Perez-Nieves
Author: Oliver Slumbers
Author: David Henry Mguni
Author: Jun Wang
Author: Haitham Bou Ammar
Author: Yaodong Yang
