The University of Southampton
University of Southampton Institutional Repository

Budget-limited multi-armed bandits

Tran-Thanh, Long
Jennings, Nicholas
Rogers, Alexander

Tran-Thanh, Long (2012) Budget-limited multi-armed bandits. University of Southampton, Faculty of Physical and Applied Sciences, Doctoral Thesis, 173pp.

Record type: Thesis (Doctoral)

Abstract

Decision making under uncertainty is one of the most important challenges within the research field of artificial intelligence, as it arises in many everyday situations that agents have to face. In these situations, an agent has to choose from a set of options whose payoffs are uncertain (i.e. unknown and nondeterministic) to the agent. Common to such decision-making problems is the need to balance exploration and exploitation: in order to maximise its total payoff, the agent must decide whether to choose the option expected to provide the best payoff (exploitation) or to try an alternative option for potential future benefit (exploration). Among the many abstractions of decision making under uncertainty, multi-armed bandits are perhaps one of the most common and best studied, as they present one of the clearest examples of the trade-off between exploration and exploitation. Whilst the standard bandit model is broadly applicable, it does not fully describe a number of real-world decision-making problems. Specifically, in many cases the choice of which arm to pull (i.e. which decision to make) is further constrained by costs or other limitations. In this thesis, we introduce the budget-limited bandit model, a variant of the standard bandit model in which pulling an arm is costly and the total pulling cost is limited by a fixed budget. This model is motivated by a number of real-world applications, such as wireless sensor networks and online advertising.
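The following minimal Python sketch illustrates why a budget changes what "optimal" means. It assumes that each arm has a known positive integer cost and that, in a hypothetical oracle setting, its expected reward is also known. Under those assumptions, choosing how often to pull each arm within the budget is an unbounded knapsack problem. The sketch solves that problem by dynamic programming. The function name `best_pull_counts` and the arm values are illustrative assumptions and are not taken from the thesis.

```python
from typing import List, Tuple


def best_pull_counts(means: List[float], costs: List[int],
                     budget: int) -> Tuple[float, List[int]]:
    """Oracle solution of the budget-limited problem when expected rewards are known.

    Solves the unbounded knapsack problem by dynamic programming, assuming
    positive integer costs. Returns the best expected total reward and the
    number of times each arm is pulled.
    """
    best = [0.0] * (budget + 1)   # best[b]: max expected reward using budget b
    choice = [-1] * (budget + 1)  # choice[b]: arm added last, or -1 if best[b - 1] is reused
    for b in range(1, budget + 1):
        best[b] = best[b - 1]     # leaving one unit of budget unspent is always allowed
        for i, (mu, c) in enumerate(zip(means, costs)):
            if c <= b and best[b - c] + mu > best[b]:
                best[b] = best[b - c] + mu
                choice[b] = i
    # Walk back through the recorded choices to recover the pull counts.
    counts = [0] * len(means)
    b = budget
    while b > 0:
        if choice[b] == -1:
            b -= 1
        else:
            counts[choice[b]] += 1
            b -= costs[choice[b]]
    return best[budget], counts


means, costs, budget = [3.0, 1.9], [3, 2], 5
value, counts = best_pull_counts(means, costs, budget)
print(round(value, 2), counts)  # 4.9 [1, 1]
```

In this example, arm 0 has the higher expected reward per unit cost (1.0 versus 0.95). Even so, pulling only arm 0 yields 3.0, because only one pull fits in the budget. Pulling only arm 1 yields 3.8 from two pulls. The best use of the budget mixes the two arms. This is a behaviour that a standard bandit policy, which converges to the single best arm, would not exhibit.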

We demonstrate that our bandit model cannot be reduced to other existing bandit models, as it requires a different optimal behaviour. Given this, the main objective of this thesis is to provide novel pulling algorithms that efficiently tackle the budget-limited bandit problem. Such algorithms, however, have to meet a number of requirements from both the empirical and the theoretical perspectives. The former refers to the constraints imposed by the real-world applications that motivate the model, whilst the latter refers to theoretical performance guarantees. To begin with, we propose a simple pulling algorithm, the budget-limited ε-first, that addresses the empirical requirements. In more detail, the budget-limited ε-first algorithm is empirically efficient and has a low computational cost, but it does not fulfil the theoretical requirements. To provide theoretical guarantees, we introduce two budget-limited UCB based algorithms, namely KUBE and fractional KUBE, that efficiently tackle the theoretical requirements. In particular, we prove that these algorithms achieve asymptotically optimal regret bounds, which differ from the best possible bound only by a constant factor. However, we demonstrate in extensive simulations that these algorithms are typically outperformed by the budget-limited ε-first. To trade off efficiently between the theoretical and empirical requirements, we therefore develop two decreasing ε-greedy based approaches, namely KDE and fractional KDE, that perform well from both the theoretical and the empirical perspective. Specifically, we show that, similarly to the budget-limited UCB based algorithms, both KDE and fractional KDE achieve asymptotically optimal regret bounds. In addition, we demonstrate that these algorithms perform well in comparison with the budget-limited ε-first.
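The abstract names these pulling algorithms but does not specify them. The following Python code is therefore only a simplified sketch of two policies in their spirit, under assumptions of our own:

- **Reward model:** rewards are Bernoulli with unknown means.
- **Costs:** arm costs are known and positive.
- **Stopping rule:** pulling stops when no arm is affordable.

`epsilon_first_sketch` spends a fraction `eps` of the budget on round-robin exploration. It then fills the rest of the budget greedily in decreasing order of estimated reward per unit cost, which is a common greedy approximation for the unbounded knapsack problem.

`ucb_density_sketch` pulls, at every step, the affordable arm with the highest UCB1-style upper confidence bound divided by its cost. This is one plausible reading of the density idea behind fractional KUBE. The confidence term, the reward model and all function names are hypothetical. KUBE itself, KDE and fractional KDE are not reproduced here.

```python
import math
import random
from typing import List, Tuple


def draw(rng: random.Random, mean: float) -> float:
    # Assumed reward model: Bernoulli rewards in {0, 1} with the given mean.
    return 1.0 if rng.random() < mean else 0.0


def epsilon_first_sketch(means: List[float], costs: List[float], budget: float,
                         eps: float, seed: int = 0) -> Tuple[float, float]:
    """Hypothetical budget-limited epsilon-first sketch; returns (reward, spent)."""
    rng = random.Random(seed)
    k, start = len(means), budget
    pulls, sums, reward = [0] * k, [0.0] * k, 0.0
    # Exploration: pull arms round-robin until the next arm no longer fits
    # into the eps * budget share (a simplification of this sketch).
    explore_left, i = eps * budget, 0
    while costs[i] <= explore_left:
        r = draw(rng, means[i])
        pulls[i] += 1
        sums[i] += r
        reward += r
        explore_left -= costs[i]
        budget -= costs[i]
        i = (i + 1) % k
    # Exploitation: greedy knapsack fill by estimated reward per unit cost.
    est = [sums[j] / pulls[j] if pulls[j] else 0.0 for j in range(k)]
    for j in sorted(range(k), key=lambda a: est[a] / costs[a], reverse=True):
        while costs[j] <= budget:
            reward += draw(rng, means[j])
            budget -= costs[j]
    return reward, start - budget


def ucb_density_sketch(means: List[float], costs: List[float], budget: float,
                       seed: int = 0) -> Tuple[float, float]:
    """Hypothetical UCB-per-unit-cost sketch; returns (reward, spent)."""
    rng = random.Random(seed)
    k, start = len(means), budget
    pulls, sums, reward, t = [0] * k, [0.0] * k, 0.0, 0

    def play(i: int) -> None:
        # Pull arm i, record its reward and charge its cost to the budget.
        nonlocal budget, reward, t
        r = draw(rng, means[i])
        pulls[i] += 1
        sums[i] += r
        reward += r
        budget -= costs[i]
        t += 1

    def index(i: int) -> float:
        # UCB1-style upper confidence bound on the mean, per unit of cost.
        bonus = math.sqrt(2.0 * math.log(t) / pulls[i])
        return (sums[i] / pulls[i] + bonus) / costs[i]

    # Initialisation: pull every affordable arm once.
    for i in range(k):
        if costs[i] <= budget:
            play(i)
    while True:
        # An arm skipped at initialisation can never become affordable later.
        options = [i for i in range(k) if pulls[i] > 0 and costs[i] <= budget]
        if not options:
            break
        play(max(options, key=index))
    return reward, start - budget


means, costs = [0.2, 0.5, 0.7], [1.0, 2.0, 3.0]
print(epsilon_first_sketch(means, costs, budget=300.0, eps=0.1))
print(ucb_density_sketch(means, costs, budget=300.0))
```

With these hypothetical values, arm 1 has the highest expected reward per unit cost (0.25). Both sketches should therefore concentrate their spending on it. Because costs are charged only when an arm is affordable, neither policy ever exceeds the budget.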

To provide a grounding for the algorithms we develop, the second part of this thesis contains a running example of a wireless sensor network (WSN) scenario, in which we tackle the problem of long-term information collection, a key research challenge within the domain of WSNs. In more detail, we demonstrate that the budget-limited bandit algorithms advance the state of the art within this domain. To do so, we first decompose the problem of long-term information collection into two sub-problems, namely the energy management problem and the maximal information throughput routing problem. We then tackle the former with a budget-limited multi-armed bandit based approach, and we propose an optimal decentralised algorithm for the latter. Following this, we demonstrate that the budget-limited bandit based energy management, in conjunction with the optimal routing algorithm, outperforms the state-of-the-art information collection algorithms in the domain of WSNs.


More information

Published date: April 2012
Organisations: University of Southampton, Agents, Interactions & Complexity

Identifiers

Local EPrints ID: 337660
URI: https://eprints.soton.ac.uk/id/eprint/337660
PURE UUID: 31a81c7d-e4c9-4965-9481-918b07ee4b34
ORCID for Long Tran-Thanh: orcid.org/0000-0003-1617-8316

Catalogue record

Date deposited: 27 Jun 2012 10:22
Last modified: 31 Jul 2019 00:38
