Adversarial blocking bandits

We consider a general adversarial multi-armed blocking bandit setting where each played arm can be blocked (unavailable) for some time periods and the reward per arm is given at each time period adversarially without obeying any distribution. The setting models scenarios of allocating scarce supplies (the arms) that replenish and can be reused only after certain time periods. We first show that, in the optimization setting, when the blocking durations and rewards are known in advance, finding an optimal policy (i.e., determining which arm to play in each round) that maximises the cumulative reward is strongly NP-hard, eliminating the possibility of a fully polynomial-time approximation scheme (FPTAS) for the problem unless P = NP. To complement this result, we show that a greedy algorithm that plays the best available arm at each round provides an approximation guarantee that depends on the blocking durations and the path variation of the rewards. In the bandit setting, when the blocking durations and rewards are not known, we design two algorithms, RGA and RGA-META, for the case of bounded durations and path variation. In particular, when the variation budget B_T is known in advance, RGA can achieve O(\sqrt{T(2\tilde{D}+K)B_{T}}) dynamic approximate regret. On the other hand, when B_T is not known, we show that the dynamic approximate regret of RGA-META is at most O((K+\tilde{D})^{1/4}\tilde{B}^{1/2}T^{3/4}), where \tilde{B} is the maximal path variation budget within each batch of RGA-META (which is provably of order o(\sqrt{T})). We also prove that if either the variation budget or the maximal blocking duration is unbounded, the approximate regret will be at least \Theta(T). Finally, we show that the regret upper bound of RGA is tight if the blocking durations are bounded above by O(1).
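The greedy rule mentioned in the abstract (play the best available arm at each round) can be illustrated with a minimal Python sketch for the offline setting, where rewards and blocking durations are known in advance. This is not the authors' implementation. The function name `greedy_schedule`, the input layout (`rewards[t][k]` and `durations[t][k]`), and the convention that an arm played at round t with duration d becomes available again at round t + d are all assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def greedy_schedule(
    rewards: Sequence[Sequence[float]],
    durations: Sequence[Sequence[int]],
) -> Tuple[List[Optional[int]], float]:
    """Play the highest-reward available arm in every round.

    Assumed convention (not taken from the paper): rewards[t][k] is the
    reward of arm k at round t, and playing arm k at round t blocks it
    until round t + durations[t][k], so a duration of 1 means no blocking.
    """
    num_rounds = len(rewards)
    num_arms = len(rewards[0]) if num_rounds else 0
    next_free = [0] * num_arms  # first round at which each arm is free again
    plays: List[Optional[int]] = []
    total = 0.0

    for t in range(num_rounds):
        available = [k for k in range(num_arms) if next_free[k] <= t]
        if not available:
            plays.append(None)  # every arm is blocked, so skip this round
            continue
        # Ties are broken in favour of the lowest arm index.
        best = max(available, key=lambda k: rewards[t][k])
        plays.append(best)
        total += rewards[t][best]
        next_free[best] = t + durations[t][best]

    return plays, total


if __name__ == "__main__":
    example_rewards = [[1.0, 0.5], [1.0, 0.2], [0.9, 0.8]]
    example_durations = [[2, 1], [2, 1], [2, 1]]
    print(greedy_schedule(example_rewards, example_durations))
```

In the example, arm 0 is blocked in the second round after being played in the first, so the greedy rule falls back to arm 1 there. The sketch only shows the selection rule; it says nothing about the approximation guarantee, which per the abstract depends on the blocking durations and the path variation of the rewards.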

Online Learning, Bandit Algorithms, Sequential Decision Making

Neural Information Processing Systems Foundation

Bishop, Nicholas

e2b8dc1a-a609-4709-84af-9b2455fd73e6

Chan, Hau

4d760146-3e9b-4ba9-8cdb-74203c759421

Mandal, Debmalya

f09a45db-9c07-4d64-a891-0dcb073af277

Tran-Thanh, Long

e0666669-d34b-460e-950d-e8b139fab16c

Larochelle, H.

Ranzato, M.

Hadsell, R.

Balcan, M.F.

Lin, H.

2020

Bishop, Nicholas, Chan, Hau, Mandal, Debmalya and Tran-Thanh, Long (2020) Adversarial blocking bandits. In Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F. and Lin, H. (eds.) Advances in Neural Information Processing Systems 33 (NeurIPS 2020). Neural Information Processing Systems Foundation.

Record type: Conference or Workshop Item (Paper)

Adversarial Blocking Bandits - Author's Original

More information

Accepted/In Press date: 25 September 2020

Published date: 2020

Related URLs:

https://papers.nips.cc/paper/2...tract.html

Keywords: Online Learning, Bandit Algorithms, Sequential Decision Making

Identifiers

Local EPrints ID: 445488

URI: http://eprints.soton.ac.uk/id/eprint/445488

PURE UUID: f1520424-d368-4c6b-86d6-aabdcc26c312

ORCID for Nicholas Bishop:

orcid.org/0000-0001-7062-9072

ORCID for Long Tran-Thanh:

orcid.org/0000-0003-1617-8316

Catalogue record

Date deposited: 11 Dec 2020 17:30

Last modified: 09 Apr 2024 22:02

Contributors

Author: Nicholas Bishop

Author: Hau Chan

Author: Debmalya Mandal

Author: Long Tran-Thanh

Editor: H. Larochelle

Editor: M. Ranzato

Editor: R. Hadsell

Editor: M.F. Balcan

Editor: H. Lin
