# Online double oracle

Dinh, Le Cong, McAleer, Stephen, Tian, Zheng, Perez-Nieves, Nicolas, Slumbers, Oliver, Mguni, David Henry, Wang, Jun, Bou Ammar, Haitham and Yang, Yaodong (2022) Online double oracle. TMLR: Transactions on Machine Learning Research. (In Press)

Record type: Article

## Abstract

Solving strategic games with huge action spaces is a critical yet under-explored topic in economics, operations research and artificial intelligence. This paper proposes new learning algorithms for solving two-player zero-sum normal-form games where the number of pure strategies is prohibitively large. Specifically, we combine no-regret analysis from online learning with Double Oracle (DO) from game theory.
Our method, *Online Double Oracle (ODO)*, is provably convergent to a Nash equilibrium (NE). Most importantly, unlike normal DO, ODO is *rational* in the sense that each agent in ODO can exploit a strategic adversary with a regret bound of $\mathcal{O}(\sqrt{k \log(k)/T})$, where $k$ is not the total number of pure strategies, but rather the size of the *effective strategy set*. In many applications, we empirically show that $k$ is linearly dependent on the support size of the NE. On tens of different real-world matrix games, ODO outperforms DO, PSRO, and no-regret algorithms such as Multiplicative Weights Update by a significant margin, both in terms of convergence rate to a NE and average payoff against strategic adversaries.
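The idea of combining a no-regret learner with a Double Oracle style expansion step can be illustrated with a minimal sketch. This is not the paper's algorithm or code: it is one plausible reading of the abstract, in which the row player runs Multiplicative Weights Update only over a small, growing effective strategy set and restarts whenever a best-response oracle returns a strategy outside that set. The function name `online_oracle_sketch`, the learning rate `eta`, the restart rule, the best-responding column adversary, and the rock-paper-scissors loss matrix are all assumptions made for illustration.

```python
import numpy as np


def online_oracle_sketch(A, T=2000, eta=0.1):
    """Illustrative sketch only (not the paper's code).

    A[i, j] is the row player's loss when row i meets column j.
    The row player runs MWU over a growing effective strategy set.
    """
    n_rows = A.shape[0]
    effective = [0]                 # assumed start: one arbitrary pure strategy
    weights = np.ones(1)            # MWU weights, aligned with `effective`
    window_loss = np.zeros(n_rows)  # cumulative loss per pure row strategy in the current window
    avg_strategy = np.zeros(n_rows)
    total_loss = 0.0

    for _ in range(T):
        # Mixed strategy supported on the effective set only.
        x = np.zeros(n_rows)
        x[effective] = weights / weights.sum()

        # Hypothetical strategic adversary: the column maximising the row player's expected loss.
        col = int(np.argmax(x @ A))
        loss = A[:, col]

        total_loss += float(x @ loss)
        avg_strategy += x
        window_loss += loss

        # Best-response oracle against the losses seen in the current window.
        best = int(np.argmin(window_loss))
        if best not in effective:
            # New strategy found: grow the set and restart MWU with a fresh window.
            effective.append(best)
            weights = np.ones(len(effective))
            window_loss = np.zeros(n_rows)
        else:
            # Standard MWU update restricted to the effective set.
            weights = weights * np.exp(-eta * loss[effective])
            weights = weights / weights.sum()

    return avg_strategy / T, effective, total_loss / T


# Rock-paper-scissors, written as the row player's loss matrix (an assumed toy example).
rps = np.array([[0.0, 1.0, -1.0],
                [-1.0, 0.0, 1.0],
                [1.0, -1.0, 0.0]])
avg, eff, avg_loss = online_oracle_sketch(rps)
```

In this sketch the MWU weights have length `len(effective)` rather than the full number of rows, which is the intuition behind a regret bound that depends on $k$ instead of the total number of pure strategies. The oracle here scans every row for simplicity; in a genuinely large game one would assume a cheaper best-response oracle.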

Accepted/In Press date: 4 October 2022
Keywords: Online Learning, Adversary, Solving large games

## Contributors

Author: Le Cong Dinh
Author: Stephen McAleer
Author: Zheng Tian
Author: Nicolas Perez-Nieves
Author: Oliver Slumbers
Author: David Henry Mguni
Author: Jun Wang
Author: Haitham Bou Ammar
Author: Yaodong Yang