Exploitation by Exploration: 2-player Repeated 2×2 Games with Unknown Rewards
Many Aladdin problems involve autonomous agents interacting in environments where they must learn and act at the same time. In this report, we consider a specific class of problems in which agents have no prior knowledge of the rewards received for the actions they select, as may be typical when agents act in a dynamic and uncertain domain. This uncertainty means that agents have to learn as they play, which creates an exploration-exploitation tradeoff for each agent when selecting an action. We use results from both game theory and decision theory to gain insight into how agents should act in an unknown environment and how they can effectively balance this tradeoff, which depends on the behaviour of the other agents in the environment. In more detail, we investigate 2-player repeated 2×2 games where the payoff (or reward) structure is unknown a priori and the rewards received are observed with noise. We prove that, when an agent selects between the 2 actions using non-explorative strategies, convergence to a Nash equilibrium is not guaranteed in the absence of any additional exploration. Furthermore, we show that an agent that explores using ε-greedy exploration can exploit a non-explorative agent to gain a larger reward in finite time, but only for certain game structures. To this end, approximations of the reward to each agent are constructed for all finite-length 2×2 games, for both explorative and non-explorative strategies, such that the optimal amount of exploration can be approximated. We make use of conditional independence patterns in the decision process, which allow our approximations to scale linearly in the length of the game.
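The setting described in the abstract can be tried out with a small simulation. The following is a minimal sketch, not the report's method: it assumes that "e-greedy" refers to the standard ε-greedy rule, that each agent keeps a running mean of the noisy rewards observed for each of its own actions, and that observation noise is Gaussian. The function name `simulate`, its parameters, and the example payoff matrix are all hypothetical and chosen only for illustration.

```python
import random


def simulate(payoffs, epsilons, rounds=1000, noise_sd=0.1, seed=0):
    """Play a repeated 2x2 game and return each player's mean observed reward.

    payoffs[a0][a1] is a pair (reward to player 0, reward to player 1).
    epsilons[p] is player p's exploration probability (0.0 means purely greedy).
    All names and modelling choices here are illustrative assumptions.
    """
    rng = random.Random(seed)
    est = [[0.0, 0.0], [0.0, 0.0]]  # est[p][a]: running mean reward of action a
    cnt = [[0, 0], [0, 0]]          # cnt[p][a]: times player p has played a
    total = [0.0, 0.0]
    for _ in range(rounds):
        acts = []
        for p in range(2):
            if rng.random() < epsilons[p]:
                acts.append(rng.randrange(2))  # explore: uniform random action
            else:
                # exploit: greedy on current estimates, ties go to action 0
                acts.append(0 if est[p][0] >= est[p][1] else 1)
        for p in range(2):
            # reward is observed with additive Gaussian noise (an assumption)
            r = payoffs[acts[0]][acts[1]][p] + rng.gauss(0.0, noise_sd)
            a = acts[p]
            cnt[p][a] += 1
            est[p][a] += (r - est[p][a]) / cnt[p][a]  # incremental mean update
            total[p] += r
    return total[0] / rounds, total[1] / rounds


# Hypothetical coordination game: both players prefer the (1, 1) action profile.
coordination = [[(1, 1), (0, 0)], [(0, 0), (2, 2)]]
greedy_vs_greedy = simulate(coordination, (0.0, 0.0))
explorer_vs_greedy = simulate(coordination, (0.1, 0.0))
```

Comparing `greedy_vs_greedy` with `explorer_vs_greedy` across different payoff matrices and values of ε gives a rough empirical feel for when exploration pays off. It does not reproduce the reward approximations constructed in the report.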
Sykulski, Adam M.
6cec63f1-86f7-435f-8192-cc1fe10d9fad
Adams, Niall M.
fde7ce9b-ec81-432d-99b1-d8643a9bdea5
Jennings, Nicholas R.
569702cf-15b9-4a7f-8e38-d2d5f08cf365
Sykulski, Adam M., Adams, Niall M. and Jennings, Nicholas R. (2010) Exploitation by Exploration: 2-player Repeated 2×2 Games with Unknown Rewards (In Press)
Record type:
Monograph
(Project Report)
Text
ASykulski_UnknownRewards_(2).pdf
- Other
More information
Accepted/In Press date: March 2010
Organisations:
Electronics & Computer Science
Identifiers
Local EPrints ID: 271350
URI: http://eprints.soton.ac.uk/id/eprint/271350
PURE UUID: 1674a454-e97c-4baf-97ea-b1d342ba9eaf
Catalogue record
Date deposited: 06 Jul 2010 00:00
Last modified: 14 Mar 2024 09:29
Contributors
Author:
Adam M. Sykulski
Author:
Niall M. Adams
Author:
Nicholas R. Jennings