Exploitation by Exploration: 2-player Repeated 2×2 Games with Unknown Rewards

Sykulski, Adam M., Adams, Niall M. and Jennings, Nicholas R. (2010) Exploitation by Exploration: 2-player Repeated 2×2 Games with Unknown Rewards. s.n.



Many Aladdin problems involve autonomous agents interacting in environments where they must learn and act at the same time. In this report, we consider a specific class of problems where agents have no prior knowledge of the rewards received for the actions they select, which may be typical when agents act in a dynamic and uncertain domain. This uncertainty means that agents have to learn as they play, which creates an exploration-exploitation tradeoff for each agent when selecting an action. We use results from both game theory and decision theory to gain insight into how agents should act in an unknown environment and effectively balance this exploration-exploitation tradeoff, which depends on the behaviour of the other agents in the environment. In more detail, we investigate 2-player repeated 2×2 games where the payoff (or reward) structure is unknown a priori and the rewards received are observed with noise. We prove that, when an agent selects between the 2 actions using non-explorative strategies, convergence to a Nash equilibrium is not guaranteed in the absence of any additional exploration. Furthermore, we show that an agent that explores using ε-greedy exploration can exploit a non-explorative agent to gain a larger reward in finite time, but only for certain game structures. To this end, approximations of the reward to each agent are constructed for all finite-length 2×2 games, for both explorative and non-explorative strategies, such that the optimal amount of exploration can be approximated. We make use of conditional independence patterns in the decision process, which allow our approximations to scale linearly in the length of the game.
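
The setting described in the abstract can be illustrated with a small simulation. The following is a minimal sketch and not the report's method: it assumes each agent tracks the sample mean of the noisy rewards observed for each of its 2 actions, starts with estimates of zero, breaks ties at random, and observes Gaussian noise. The function name `play_repeated_game`, its parameters, and the example payoff values are all hypothetical. Setting an agent's exploration rate to 0 gives a purely greedy (non-explorative) agent, while a positive rate gives an ε-greedy agent.

```python
import random


def play_repeated_game(payoffs, epsilons, rounds, noise_sd=1.0, seed=0):
    """Simulate a repeated 2x2 game with unknown, noisily observed rewards.

    payoffs[i][a0][a1] is the mean reward to player i when player 0 plays a0
    and player 1 plays a1. epsilons[i] is player i's exploration rate
    (0.0 means purely greedy). Returns (total rewards, action counts).
    """
    rng = random.Random(seed)
    # Running reward sums and play counts per player and per action.
    sums = [[0.0, 0.0], [0.0, 0.0]]
    counts = [[0, 0], [0, 0]]
    totals = [0.0, 0.0]

    def estimate(i, a):
        # Sample-mean estimate; untried actions default to 0 (an assumption).
        return sums[i][a] / counts[i][a] if counts[i][a] else 0.0

    def choose(i):
        # With probability epsilon, pick an action uniformly at random.
        if rng.random() < epsilons[i]:
            return rng.randrange(2)
        # Otherwise act greedily on current estimates, ties broken at random.
        e0, e1 = estimate(i, 0), estimate(i, 1)
        if e0 == e1:
            return rng.randrange(2)
        return 0 if e0 > e1 else 1

    for _ in range(rounds):
        actions = (choose(0), choose(1))
        for i in range(2):
            mean = payoffs[i][actions[0]][actions[1]]
            reward = mean + rng.gauss(0.0, noise_sd)  # noisy observation
            sums[i][actions[i]] += reward
            counts[i][actions[i]] += 1
            totals[i] += reward
    return totals, counts


if __name__ == "__main__":
    # Illustrative payoff values only; they are not taken from the report.
    example_payoffs = [
        [[3.0, 1.0], [5.0, 2.0]],  # mean rewards to player 0
        [[3.0, 5.0], [1.0, 2.0]],  # mean rewards to player 1
    ]
    # Player 0 explores with epsilon = 0.1; player 1 is purely greedy.
    totals, counts = play_repeated_game(example_payoffs, (0.1, 0.0), rounds=200)
    print("total rewards:", totals)
    print("action counts:", counts)
```

With both exploration rates set to 0, no noise, and strictly positive payoffs, each agent in this sketch stays on whichever action it happened to try first, which gives a simple picture of why play need not reach a Nash equilibrium without additional exploration. Varying the first agent's exploration rate and the game length shows how its total reward changes against a greedy opponent under these assumptions.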

Item Type: Monograph (Project Report)
Organisations: Electronics & Computer Science
ePrint ID: 271350
Date: March 2010 (Accepted/In Press)
Date Deposited: 06 Jul 2010 00:00
Last Modified: 17 Apr 2017 18:16
URI: http://eprints.soton.ac.uk/id/eprint/271350
