Exploitation by Exploration: 2-player Repeated 2×2 Games with Unknown Rewards

Many Aladdin problems involve autonomous agents interacting in environments where they must learn and act at the same time. In this report, we consider a specific class of problems where agents have no prior knowledge of the rewards received for the actions they select, which may be typical when agents are acting in a dynamic and uncertain domain. This uncertainty means that agents have to learn as they play, which creates an exploration-exploitation tradeoff for each agent when selecting an action. We use results from both game theory and decision theory to gain insight into how agents should act in an unknown environment and effectively balance this exploration-exploitation tradeoff, which depends on the behaviour of the other agents in the environment. In more detail, we investigate 2-player repeated 2×2 games where the payoff (or reward) structure is unknown a priori and the rewards received are observed with noise. We prove that, when an agent selects between the two actions using non-explorative strategies, convergence to a Nash equilibrium is not guaranteed in the absence of any additional exploration. Furthermore, we show that an agent that explores using ε-greedy exploration can exploit a non-explorative agent to gain a larger reward in finite time, but only for certain game structures. To this end, approximations of the reward to each agent are constructed for all finite-length 2×2 games, for both explorative and non-explorative strategies, such that the optimal amount of exploration can be approximated. We make use of conditional independence patterns in the decision process, which allow our approximations to scale linearly in the length of the game.
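The setting described in the abstract can be illustrated with a small simulation. The following is a minimal sketch only, not the report's method: it assumes Gaussian observation noise, sample-average reward estimates kept per own action, and random tie-breaking, none of which is stated in the abstract. The function name `play_repeated_game` and all of its parameters are hypothetical. Setting an agent's exploration rate to 0.0 gives a non-explorative (purely greedy) agent.

```python
import random


def play_repeated_game(payoffs, epsilons, rounds, noise_sd=1.0, seed=0):
    """Simulate a repeated 2x2 game in which rewards are unknown and noisy.

    payoffs[a0][a1] is a pair (mean reward to player 0, mean reward to player 1).
    epsilons[i] is player i's exploration rate; 0.0 gives a purely greedy agent.
    Returns the total reward collected by each player.
    """
    rng = random.Random(seed)
    # Assumption: each player keeps a running mean reward per own action.
    means = [[0.0, 0.0], [0.0, 0.0]]
    counts = [[0, 0], [0, 0]]
    totals = [0.0, 0.0]
    for _ in range(rounds):
        actions = []
        for i in range(2):
            if rng.random() < epsilons[i]:
                actions.append(rng.randrange(2))  # explore uniformly
            elif means[i][0] == means[i][1]:
                actions.append(rng.randrange(2))  # break ties at random
            else:
                actions.append(0 if means[i][0] > means[i][1] else 1)
        mean_rewards = payoffs[actions[0]][actions[1]]
        for i in range(2):
            # Assumption: rewards are observed with Gaussian noise.
            reward = rng.gauss(mean_rewards[i], noise_sd)
            a = actions[i]
            counts[i][a] += 1
            means[i][a] += (reward - means[i][a]) / counts[i][a]
            totals[i] += reward
    return totals
```

For example, `play_repeated_game(game, epsilons=(0.1, 0.0), rounds=100)` pits an ε-greedy player against a greedy one, where `game` is any 2×2 nested list of mean-reward pairs chosen by the reader. Averaging the totals over many seeds and several values of ε gives a rough empirical view of how much exploration pays off for a given game structure.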

Sykulski, Adam M.

Adams, Niall M.

Jennings, Nicholas R.

Sykulski, Adam M., Adams, Niall M. and Jennings, Nicholas R. (2010) Exploitation by Exploration: 2-player Repeated 2×2 Games with Unknown Rewards (In Press)

Record type: Monograph (Project Report)

More information

Accepted/In Press date: March 2010

Organisations: Electronics & Computer Science

Identifiers

Local EPrints ID: 271350

URI: http://eprints.soton.ac.uk/id/eprint/271350

PURE UUID: 1674a454-e97c-4baf-97ea-b1d342ba9eaf

Catalogue record

Date deposited: 06 Jul 2010 00:00

Last modified: 21 Aug 2025 08:56

Contributors

Author: Adam M. Sykulski

Author: Niall M. Adams

Author: Nicholas R. Jennings
