The University of Southampton
University of Southampton Institutional Repository

On-Line Adaptation of Exploration in the One-Armed Bandit with Covariates Problem


Sykulski, Adam M., Adams, Niall M. and Jennings, Nicholas R. (2010) On-Line Adaptation of Exploration in the One-Armed Bandit with Covariates Problem. 9th International Conference on Machine Learning and Applications (ICMLA 2010), Washington DC, United States. 12-14 Dec 2010. pp. 459-464.

Record type: Conference or Workshop Item (Other)

Abstract

Many sequential decision making problems require an agent to balance exploration and exploitation to maximise long-term reward. Existing policies that address this tradeoff typically have parameters that are set a priori to control the amount of exploration. In finite-time problems, the optimal values of these parameters are highly dependent on the problem faced. In this paper, we propose adapting the amount of exploration performed on-line, as information is gathered by the agent. To this end we introduce a novel algorithm, e-ADAPT, which has no free parameters. The algorithm adapts as it plays and sequentially chooses whether to explore or exploit, driven by the amount of uncertainty in the system. We provide simulation results for the one-armed bandit with covariates problem, which demonstrate that e-ADAPT correctly controls the amount of exploration in finite-time problems and yields rewards that are close to those of optimally tuned off-line policies. Furthermore, we show that e-ADAPT is robust to a high-dimensional covariate, as well as to misspecified models. Finally, we describe how our methods could be extended to other sequential decision making problems, such as dynamic bandit problems with changing reward structures.

Text
PID1505865.pdf - Version of Record
Download (166kB)

More information

Published date: December 2010
Additional Information: Event Dates: 12-14 Dec, 2010
Venue - Dates: 9th International Conference on Machine Learning and Applications (ICMLA 2010), Washington DC, United States, 2010-12-12 - 2010-12-14
Keywords: Exploration-exploitation tradeoff, sequential decision making, on-line learning, one-armed bandit problem
Organisations: Agents, Interactions & Complexity

Identifiers

Local EPrints ID: 271615
URI: http://eprints.soton.ac.uk/id/eprint/271615
PURE UUID: 5431d594-7227-4284-97d1-20b1250d4703

Catalogue record

Date deposited: 05 Oct 2010 13:23
Last modified: 14 Mar 2024 09:35

Contributors

Author: Adam M. Sykulski
Author: Niall M. Adams
Author: Nicholas R. Jennings
