Online double oracle
Dinh, Le Cong, McAleer, Stephen, Tian, Zheng, Perez-Nieves, Nicolas, Slumbers, Oliver, Mguni, David Henry, Wang, Jun, Bou Ammar, Haitham and Yang, Yaodong (2022) Online double oracle. TMLR: Transactions on Machine Learning Research. (In Press)
Abstract
Solving strategic games with huge action spaces is a critical yet under-explored topic in economics, operations research and artificial intelligence. This paper proposes new learning algorithms for solving two-player zero-sum normal-form games where the number of pure strategies is prohibitively large. Specifically, we combine no-regret analysis from online learning with Double Oracle (DO) from game theory.
Our method, \emph{Online Double Oracle (ODO)}, is provably convergent to a Nash equilibrium (NE). Most importantly, unlike standard DO, ODO is \emph{rational} in the sense that each agent in ODO can exploit a strategic adversary with a regret bound of $\mathcal{O}(\sqrt{ k \log(k)/T})$, where $k$ is not the total number of pure strategies but the size of the \emph{effective strategy set}. We show empirically that, in many applications, $k$ depends linearly on the support size of the NE. On tens of different real-world matrix games, ODO outperforms DO, PSRO, and no-regret algorithms such as Multiplicative Weights Update by a significant margin, both in convergence rate to an NE and in average payoff against strategic adversaries.
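The abstract does not spell out the algorithm, so the following is only an illustrative sketch of the general idea it describes: a no-regret learner (here Multiplicative Weights Update) run over a small strategy set that grows through best responses, in the spirit of Double Oracle. The function names, the step size `eta`, the reset-on-expansion rule, and the toy loss matrix are all assumptions made for this example and should not be read as the authors' method.

```python
import math


def mwu_distribution(cum_loss, eta):
    """Multiplicative Weights Update: weights proportional to exp(-eta * loss)."""
    base = min(cum_loss)  # shift losses for numerical stability
    weights = [math.exp(-eta * (c - base)) for c in cum_loss]
    total = sum(weights)
    return [w / total for w in weights]


def odo_style_sketch(A, col_play, T, eta=0.5):
    """Row player minimises loss A[i][j]; col_play(t) returns the column at round t.

    Hypothetical sketch: MWU over a restricted ("effective") set of rows,
    expanded whenever a row outside the set is a strictly better response
    to the opponent's play in the current window.
    """
    n_rows = len(A)
    effective = [0]               # assumed starting set: the first row only
    cum_loss = [0.0]              # MWU losses for rows in the effective set
    window_loss = [0.0] * n_rows  # loss of every row over the current window
    total_loss = 0.0
    for t in range(T):
        p = mwu_distribution(cum_loss, eta)
        j = col_play(t)
        total_loss += sum(p_i * A[i][j] for p_i, i in zip(p, effective))
        for idx, i in enumerate(effective):
            cum_loss[idx] += A[i][j]
        for i in range(n_rows):
            window_loss[i] += A[i][j]
        # Best-response oracle (brute force here; assumed cheap for the sketch).
        best = min(range(n_rows), key=lambda i: window_loss[i])
        best_inside = min(window_loss[i] for i in effective)
        if window_loss[best] < best_inside - 1e-12:
            # Strictly better row found outside the set: expand and restart MWU.
            effective.append(best)
            cum_loss = [0.0] * len(effective)
            window_loss = [0.0] * n_rows
    return effective, total_loss / T


# Toy game (made up for illustration): row 1 always incurs zero loss.
A = [[1.0, 1.0], [0.0, 0.0]]
effective, avg_loss = odo_style_sketch(A, lambda t: t % 2, T=200)
```

In this sketch the learner only ever maintains weights over the rows in `effective`, which is the sense in which a bound can depend on the size $k$ of that set rather than on the total number of pure strategies.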
More information
Accepted/In Press date: 4 October 2022
Keywords: Online Learning, Adversary, Solving large games
Identifiers
Local EPrints ID: 471822
URI: http://eprints.soton.ac.uk/id/eprint/471822
PURE UUID: f7321c57-0cf5-42d4-a18a-bcb70ae52e7a
Catalogue record
Date deposited: 21 Nov 2022 17:44
Last modified: 16 Mar 2024 22:51
Contributors
Author: Le Cong Dinh
Author: Stephen McAleer
Author: Zheng Tian
Author: Nicolas Perez-Nieves
Author: Oliver Slumbers
Author: David Henry Mguni
Author: Jun Wang
Author: Haitham Bou Ammar
Author: Yaodong Yang