The University of Southampton
University of Southampton Institutional Repository

The Exploration-Exploitation Tradeoff in Sequential Decision Making Problems

Sykulski, Adam M. (2009) The Exploration-Exploitation Tradeoff in Sequential Decision Making Problems. s.n.

Record type: Monograph (Project Report)

Abstract

Sequential decision making problems often require an agent to act in an environment where data is noisy or only partially observed. The agent must learn how different actions relate to different rewards, and must therefore balance exploration against exploitation in an effective strategy. In this report, sequential decision making problems are considered through extensions of the multi-armed bandit framework. First, the bandit problem is extended to a Multi-Agent System (MAS), in which agents control individual arms but can communicate potentially useful information to each other. This framework allows for a better understanding of the exploration-exploitation tradeoff in scenarios where multiple agents interact in a noisy environment. To this end, we present a novel strategy for action and communication decisions and demonstrate its benefits empirically. This motivates a theoretical analysis of one-armed bandit problems, aimed at understanding how different strategies should be optimally tuned. Specifically, the expected rewards of ε-greedy strategies are derived, together with proofs governing their optimal tuning.
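The report derives the expected reward of ε-greedy strategies analytically. As a rough companion, the sketch below estimates the same kind of quantity by simulation instead. It assumes a hypothetical one-armed bandit in which arm 0 pays a known fixed reward and arm 1 pays 1 with an unknown probability (0 otherwise), with uniform random exploration and greedy exploitation on the sample mean. The function names, parameters and chosen values are illustrative assumptions, not the report's own model or notation.

```python
import random
from statistics import mean


def epsilon_greedy_one_armed(p_unknown, known_reward, epsilon, horizon, rng):
    """Play one episode of epsilon-greedy on a hypothetical one-armed bandit.

    Arm 0 pays a fixed, known reward every pull; arm 1 pays 1 with
    probability p_unknown and 0 otherwise. Returns the total reward
    collected over `horizon` pulls.
    """
    pulls, successes = 0, 0.0  # running statistics for the unknown arm
    total = 0.0
    for _ in range(horizon):
        if pulls == 0 or rng.random() < epsilon:
            # Explore (or no estimate yet): pick either arm uniformly at random.
            arm = rng.randrange(2)
        else:
            # Exploit: pick the arm with the higher estimated mean
            # (ties go to the known arm).
            arm = 1 if successes / pulls > known_reward else 0
        if arm == 1:
            reward = 1.0 if rng.random() < p_unknown else 0.0
            pulls += 1
            successes += reward
        else:
            reward = known_reward
        total += reward
    return total


def estimate_expected_reward(epsilon, p_unknown, known_reward, horizon,
                             runs=2000, seed=0):
    """Monte Carlo estimate of the expected total reward for one epsilon."""
    rng = random.Random(seed)
    return mean(
        epsilon_greedy_one_armed(p_unknown, known_reward, epsilon, horizon, rng)
        for _ in range(runs)
    )


if __name__ == "__main__":
    # Sweep epsilon to see, empirically, how tuning affects the reward.
    for eps in (0.0, 0.05, 0.1, 0.2, 0.5, 1.0):
        r = estimate_expected_reward(eps, p_unknown=0.6, known_reward=0.5,
                                     horizon=100)
        print(f"epsilon={eps:.2f}  estimated expected total reward: {r:.2f}")
```

Sweeping ε this way gives an empirical view of the tuning question. The estimates depend entirely on the assumed p_unknown, known_reward and horizon, and a Monte Carlo average is no substitute for the report's closed-form expected rewards and optimality proofs.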

PDF: TRANSFER_REPORT_ADAM.PDF (355kB)

More information

Accepted/In Press date: May 2009
Organisations: Electronics & Computer Science

Identifiers

Local EPrints ID: 271349
URI: https://eprints.soton.ac.uk/id/eprint/271349
PURE UUID: 70edf59d-1af0-4e43-910d-ce4dca841ee0

Catalogue record

Date deposited: 05 Jul 2010 23:53
Last modified: 18 Jul 2017 06:44
