University of Southampton Institutional Repository

Exploitation by Exploration: 2-player Repeated 2×2 Games with Unknown Rewards

Sykulski, Adam M.
6cec63f1-86f7-435f-8192-cc1fe10d9fad
Adams, Niall M.
fde7ce9b-ec81-432d-99b1-d8643a9bdea5
Jennings, Nicholas R.
569702cf-15b9-4a7f-8e38-d2d5f08cf365

Sykulski, Adam M., Adams, Niall M. and Jennings, Nicholas R. (2010) Exploitation by Exploration: 2-player Repeated 2×2 Games with Unknown Rewards (In Press)

Record type: Monograph (Project Report)

Abstract

Many Aladdin problems involve autonomous agents interacting in environments where they must learn and act at the same time. In this report, we consider a specific class of problems where agents have no prior knowledge of the rewards received for the actions they select, which may be typical when agents are acting in a dynamic and uncertain domain. This uncertainty means that agents have to learn as they play, which creates an exploration-exploitation tradeoff for each agent when selecting an action. We use results from both game theory and decision theory to gain insights into how agents should act in an unknown environment and how they can effectively balance this exploration-exploitation tradeoff, which depends on the behaviour of the other agents in the environment. In more detail, we investigate 2-player repeated 2×2 games where the payoff (or reward) structure is unknown a priori and the rewards received are observed with noise. We prove that, when an agent selects between its 2 actions using non-explorative strategies, convergence to a Nash equilibrium is not guaranteed in the absence of any additional exploration. Furthermore, we show that an agent that explores using ε-greedy exploration can exploit a non-explorative agent to gain a larger reward in finite time, but only for certain game structures. To this end, approximations of the reward to each agent are constructed for all finite-length 2×2 games, for both explorative and non-explorative strategies, such that the optimal amount of exploration can be approximated. We make use of conditional independence patterns in the decision process, which allow our approximations to scale linearly in the length of the game.
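As a rough illustration of the setting described in the abstract (not the authors' model, analysis or code), the Python sketch below simulates two agents repeatedly playing a 2×2 game whose payoffs they do not know. Each agent observes only a noisy reward for its own action, keeps a sample-mean estimate per action, and picks the action with the higher estimate, exploring uniformly at random with probability ε (ε = 0 gives a non-explorative, purely greedy agent). The payoff matrix, the additive Gaussian noise model, the rule of playing each action once before acting greedily, and the names `PAYOFFS`, `Agent` and `play` are all hypothetical choices made for this example.

```python
import random
from typing import Tuple

# Hypothetical 2x2 game: PAYOFFS[a][b] = (expected reward to player 1,
# expected reward to player 2) when player 1 plays a and player 2 plays b.
# The agents never see this table; they only observe noisy rewards.
PAYOFFS = [[(3.0, 3.0), (0.0, 5.0)],
           [(5.0, 0.0), (1.0, 1.0)]]


class Agent:
    """Sample-mean learner over its own 2 actions with epsilon-greedy exploration.

    epsilon = 0 gives a purely greedy (non-explorative) agent.
    """

    def __init__(self, epsilon: float, rng: random.Random) -> None:
        self.epsilon = epsilon
        self.rng = rng
        self.counts = [0, 0]      # times each action has been played
        self.means = [0.0, 0.0]   # running mean of observed rewards per action

    def act(self) -> int:
        # Assumption: play each action once so both estimates exist.
        for a in (0, 1):
            if self.counts[a] == 0:
                return a
        # With probability epsilon, explore uniformly at random.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(2)
        # Otherwise exploit the current estimates (ties go to action 0).
        return 0 if self.means[0] >= self.means[1] else 1

    def update(self, action: int, reward: float) -> None:
        # Incremental update of the sample mean for the chosen action.
        self.counts[action] += 1
        self.means[action] += (reward - self.means[action]) / self.counts[action]


def play(eps1: float, eps2: float, rounds: int,
         noise_sd: float = 1.0, seed: int = 0) -> Tuple[float, float]:
    """Play `rounds` repetitions; return each player's total expected payoff."""
    rng = random.Random(seed)  # one seeded generator shared by noise and agents
    p1 = Agent(eps1, rng)
    p2 = Agent(eps2, rng)
    total1 = total2 = 0.0
    for _ in range(rounds):
        a, b = p1.act(), p2.act()
        r1, r2 = PAYOFFS[a][b]
        # Assumed noise model: additive zero-mean Gaussian noise on observations.
        p1.update(a, r1 + rng.gauss(0.0, noise_sd))
        p2.update(b, r2 + rng.gauss(0.0, noise_sd))
        # Score with the noise-free payoffs so runs are compared on expected reward.
        total1 += r1
        total2 += r2
    return total1, total2


if __name__ == "__main__":
    # Player 2 is greedy; vary player 1's exploration rate and average over seeds.
    for eps in (0.0, 0.05, 0.1, 0.2):
        runs = [play(eps, 0.0, rounds=200, seed=s) for s in range(200)]
        avg1 = sum(r[0] for r in runs) / len(runs)
        avg2 = sum(r[1] for r in runs) / len(runs)
        print(f"eps1={eps:.2f}: player 1 avg {avg1:.1f}, player 2 avg {avg2:.1f}")
```

Fixing `eps2=0.0` makes player 2 the non-explorative agent, so sweeping `eps1` gives a purely empirical, simulation-based way to see how much exploration pays off against it for whichever payoff matrix is plugged in; it is no substitute for the approximations constructed in the report.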


More information

Accepted/In Press date: March 2010
Organisations: Electronics & Computer Science

Identifiers

Local EPrints ID: 271350
URI: http://eprints.soton.ac.uk/id/eprint/271350
PURE UUID: 1674a454-e97c-4baf-97ea-b1d342ba9eaf

Catalogue record

Date deposited: 06 Jul 2010 00:00
Last modified: 14 Mar 2024 09:29


Contributors

Author: Adam M. Sykulski
Author: Niall M. Adams
Author: Nicholas R. Jennings
