Online double oracle

Solving strategic games with huge action spaces is a critical yet under-explored topic in economics, operations research and artificial intelligence. This paper proposes new learning algorithms for solving two-player zero-sum normal-form games where the number of pure strategies is prohibitively large. Specifically, we combine no-regret analysis from online learning with Double Oracle (DO) from game theory.
Our method, \emph{Online Double Oracle (ODO)}, provably converges to a Nash equilibrium (NE). Most importantly, unlike standard DO, ODO is \emph{rational} in the sense that each agent in ODO can exploit a strategic adversary with a regret bound of $\mathcal{O}(\sqrt{k \log(k)/T})$, where $k$ is not the total number of pure strategies but the size of the \emph{effective strategy set}. We show empirically that, in many applications, $k$ depends linearly on the support size of the NE. On tens of different real-world matrix games, ODO outperforms DO, PSRO, and no-regret algorithms such as Multiplicative Weights Update by a significant margin, both in convergence rate to an NE and in average payoff against strategic adversaries.
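
The abstract gives no pseudocode, so the following is only a rough sketch of how a no-regret learner might be restricted to a growing effective strategy set. It is one plausible reading of "Multiplicative Weights Update plus a best-response oracle", not the authors' implementation. The names `online_single_oracle`, `best_response`, `opponent` and `eta`, the restart rule, and the toy game are all assumptions made for illustration.

```python
import math


def best_response(cumulative_loss):
    """Index of the pure strategy with the smallest cumulative loss."""
    return min(range(len(cumulative_loss)), key=cumulative_loss.__getitem__)


def online_single_oracle(loss_matrix, opponent, rounds, eta=0.1):
    """Hypothetical sketch: MWU over a growing set of pure strategies.

    loss_matrix[i][j] is the row player's loss in [0, 1] for row i vs column j.
    opponent(t) returns the column played at round t (assumed interface).
    """
    n_rows = len(loss_matrix)
    effective = [0]                # start from an arbitrary pure strategy
    weights = {0: 1.0}             # MWU weights over the effective set only
    window_loss = [0.0] * n_rows   # losses accumulated in the current window
    total_loss = 0.0

    for t in range(rounds):
        norm = sum(weights.values())
        policy = {i: w / norm for i, w in weights.items()}
        col = opponent(t)
        total_loss += sum(p * loss_matrix[i][col] for i, p in policy.items())

        # Stand-in for a best-response oracle: a full scan over all rows.
        for i in range(n_rows):
            window_loss[i] += loss_matrix[i][col]
        br = best_response(window_loss)

        if br in weights:
            # Oracle found nothing new: ordinary MWU step on the current set.
            for i in weights:
                weights[i] *= math.exp(-eta * loss_matrix[i][col])
        else:
            # New strategy found: enlarge the set and start a fresh window.
            effective.append(br)
            weights = {i: 1.0 for i in effective}
            window_loss = [0.0] * n_rows

    return effective, total_loss / rounds


# Toy game (assumed): 50 rows, only row 7 ever achieves zero loss.
losses = [[1.0, 1.0, 1.0] for _ in range(50)]
losses[7] = [0.0, 0.0, 0.0]
effective, avg_loss = online_single_oracle(losses, lambda t: t % 3, rounds=1000)
```

In this toy game the learner only ever keeps weights for two of the 50 rows, which illustrates why a bound stated in terms of $k$ can be much tighter than one stated in terms of the full number of pure strategies. The full scan inside the loop is used here for simplicity; in a genuinely large game it would be replaced by a dedicated best-response routine.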

Online Learning, Adversary, Solving large games

Dinh, Le Cong

e89b4443-9eff-4790-b101-9eabe5ef947c

McAleer, Stephen

171096cb-2dba-42b3-8f49-65489563e355

Tian, Zheng

36dfd681-de2c-4727-b559-8057a82fb27e

Perez-Nieves, Nicolas

c8368423-4515-440a-961b-572e00b7a7b9

Slumbers, Oliver

92efb6e4-08bb-46f5-857d-385f5b0d7316

Mguni, David Henry

69cefca7-a4cd-449d-a004-e6cdec19ec5c

Wang, Jun

314d9b85-aba4-4b91-85a9-17bbe661144d

Bou Ammar, Haitham

c1d4f122-d413-4786-8ee2-002f6ac48f38

Yang, Yaodong

ab0292c3-8ed7-4220-af1a-3af6ac0c0d46

Dinh, Le Cong, McAleer, Stephen, Tian, Zheng, Perez-Nieves, Nicolas, Slumbers, Oliver, Mguni, David Henry, Wang, Jun, Bou Ammar, Haitham and Yang, Yaodong (2022) Online double oracle. TMLR: Transactions on Machine Learning Research. (In Press)

Record type: Article

Text

2103.07780 - Accepted Manuscript

Restricted to Repository staff only

Text

online_double_oracle - Version of Record

Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 4 October 2022

Identifiers

Local EPrints ID: 471822

URI: http://eprints.soton.ac.uk/id/eprint/471822

PURE UUID: f7321c57-0cf5-42d4-a18a-bcb70ae52e7a

ORCID for Le Cong Dinh: orcid.org/0000-0002-3306-0603

Catalogue record

Date deposited: 21 Nov 2022 17:44

Last modified: 16 Mar 2024 22:51

Contributors

Author: Le Cong Dinh

Author: Stephen McAleer

Author: Zheng Tian

Author: Nicolas Perez-Nieves

Author: Oliver Slumbers

Author: David Henry Mguni

Author: Jun Wang

Author: Haitham Bou Ammar

Author: Yaodong Yang
