The University of Southampton
University of Southampton Institutional Repository

Earning while learning: using Thompson Sampling to maximize rewards from online sales.

University of Southampton
Ellina, Andria
7347cbed-1419-4877-8b56-490db931ca0b
Currie, Christine
dcfd0972-1b42-4fac-8a67-0258cfdeb55a

Ellina, Andria (2020) Earning while learning: using Thompson Sampling to maximize rewards from online sales. University of Southampton, Doctoral Thesis, 171pp.

Record type: Thesis (Doctoral)

Abstract

The problem of finding the best option among a range of candidates in an uncertain environment is a challenging task in a number of domains, ranging from clinical trials to advertising, website optimization and dynamic pricing.

Initially, very little is known about how the different options perform, so the decision maker must simultaneously learn about the performance of the different options and earn some reward from the decisions made. This introduces a trade-off between "exploration", the phase in which new information is acquired, and "exploitation", in which the goal is to maximize rewards or, equivalently, to minimize total regret. Regret is defined as the difference between the reward of an oracle strategy that selects the best option at each time step and the reward of the option actually chosen.
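As a toy illustration of this definition of regret (the option success rates and the sequence of choices below are invented purely for the example):

```python
import numpy as np

# Toy example: three options with success rates unknown to the learner.
# These numbers are invented purely for illustration.
true_means = np.array([0.3, 0.5, 0.7])
best_mean = true_means.max()  # the oracle always plays the 0.7 option

# Suppose a policy chose these options over six rounds.
chosen = [0, 2, 1, 2, 2, 2]

# Per-step regret: gap between the oracle's expected reward and ours.
per_step = [best_mean - true_means[a] for a in chosen]
cumulative_regret = sum(per_step)  # 0.4 + 0 + 0.2 + 0 + 0 + 0 = 0.6 (up to floating point)
```

Only the two rounds where a suboptimal option was chosen contribute to the total.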

In this thesis, we develop new algorithms based on Thompson Sampling that improve overall performance and minimize total regret. Numerical experiments are performed on simulated datasets to examine the effect of the algorithms' hyperparameters, to assess the robustness of the algorithms presented, and to compare the performance of our new algorithms with current algorithms. We use benchmarking experiments for a fair comparison of the different algorithms on the simulated datasets.
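For readers unfamiliar with the baseline the thesis builds on, a minimal sketch of standard Beta-Bernoulli Thompson Sampling follows (this is the textbook algorithm, not the thesis's extended versions; the arm probabilities and horizon are invented for the example):

```python
import numpy as np

def thompson_sampling(true_means, horizon, seed=0):
    """Standard Beta-Bernoulli Thompson Sampling.

    At each step, sample a plausible mean for every arm from its Beta
    posterior, play the arm with the largest sample, then update that
    arm's posterior with the observed 0/1 reward.
    """
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha = np.ones(k)  # Beta(1, 1) uniform priors
    beta = np.ones(k)
    pulls = np.zeros(k, dtype=int)
    total_reward = 0.0
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)         # one posterior sample per arm
        arm = int(np.argmax(theta))           # act greedily on the samples
        reward = float(rng.random() < true_means[arm])
        alpha[arm] += reward                  # conjugate Beta update
        beta[arm] += 1.0 - reward
        pulls[arm] += 1
        total_reward += reward
    return total_reward, pulls

# Invented success rates; over enough rounds, most pulls go to the 0.8 arm,
# which is exactly the exploration/exploitation balance described above.
reward, pulls = thompson_sampling([0.2, 0.5, 0.8], horizon=2000)
```

Randomizing over posterior samples means an arm is played roughly in proportion to the probability that it is the best, so exploration fades naturally as the posteriors concentrate.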

An additional complication, especially common in revenue management, is seasonal change, which affects the performance of the different options and consequently the decisions we make. To tackle this non-stationarity, we deploy contextual Thompson Sampling to account for seasonality and develop a new algorithm that combines contextual Thompson Sampling with a standard statistical model selection method to solve the problem of unknown seasonality in the reward distribution of the candidate options.
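One common way to make Thompson Sampling contextual is to maintain a Bayesian linear model per option; the sketch below does this with sine/cosine features of the time step standing in for seasonality. The feature encoding, priors, noise level and the two options' true seasonal rewards are all assumptions made for this illustration, not the thesis's construction:

```python
import numpy as np

def seasonal_features(t, period=52.0):
    """Encode the time step as smooth seasonal covariates (an assumed
    feature design for this sketch)."""
    w = 2.0 * np.pi * t / period
    return np.array([1.0, np.sin(w), np.cos(w)])

class BayesLinArm:
    """Bayesian linear model for one option: reward = x . theta + noise."""
    def __init__(self, dim, noise_var=0.25, prior_var=1.0):
        self.precision = np.eye(dim) / prior_var  # posterior precision matrix
        self.b = np.zeros(dim)                    # precision-weighted mean
        self.noise_var = noise_var

    def sample_theta(self, rng):
        cov = np.linalg.inv(self.precision)       # posterior covariance
        mean = cov @ self.b
        return rng.multivariate_normal(mean, cov)

    def update(self, x, reward):
        self.precision += np.outer(x, x) / self.noise_var
        self.b += x * reward / self.noise_var

rng = np.random.default_rng(0)
arms = [BayesLinArm(dim=3) for _ in range(2)]
# Invented seasonal mean rewards: each option is best in a different season.
true_theta = [np.array([0.5, 0.4, 0.0]), np.array([0.5, -0.4, 0.0])]

for t in range(500):
    x = seasonal_features(t)
    samples = [x @ arm.sample_theta(rng) for arm in arms]
    i = int(np.argmax(samples))                   # Thompson choice in context x
    observed = x @ true_theta[i] + rng.normal(scale=0.5)
    arms[i].update(x, observed)
```

Because the context changes with the season, the sampled best option changes over the year, so both posteriors keep receiving data rather than the learner locking onto a single arm.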

Finally, we focus on an application in dynamic pricing, developing an algorithm that learns how to price a product in a competitive environment where the demand function is unknown. Using simulation, we compare our algorithm with a set of algorithms from the literature in both duopoly and oligopoly settings.
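To connect pricing with the bandit view used throughout, a much-simplified single-seller sketch (not the thesis's competitive-pricing algorithm) treats each candidate price as an arm whose purchase probability is an unknown Bernoulli parameter; the price grid and demand numbers are invented:

```python
import numpy as np

def price_with_ts(prices, buy_probs, horizon, seed=1):
    """Thompson Sampling over a discrete price grid.

    Each price point has an unknown purchase probability; the reward for
    posting a price is price * (1 if a sale occurs else 0), so the arm
    chosen each round maximizes sampled expected revenue, not sampled
    purchase probability alone.
    """
    rng = np.random.default_rng(seed)
    prices = np.asarray(prices, dtype=float)
    k = len(prices)
    a = np.ones(k)  # Beta(1, 1) priors on purchase probability
    b = np.ones(k)
    pulls = np.zeros(k, dtype=int)
    revenue = 0.0
    for _ in range(horizon):
        sampled = rng.beta(a, b)
        i = int(np.argmax(prices * sampled))   # sampled expected revenue
        sale = float(rng.random() < buy_probs[i])
        a[i] += sale
        b[i] += 1.0 - sale
        revenue += prices[i] * sale
        pulls[i] += 1
    return revenue, pulls

prices = [5.0, 10.0, 15.0]
buy_probs = [0.8, 0.5, 0.05]   # invented demand: higher price, fewer sales
rev, pulls = price_with_ts(prices, buy_probs, horizon=3000)
```

Here the middle price has the highest expected revenue per round (10.0 * 0.5 = 5.0), so the learner should post it most often; handling competitors reacting to our price, as in the thesis, requires substantially more machinery than this sketch.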

Text
Thesis_all_chapters_final (1) - Version of Record
Available under License University of Southampton Thesis Licence.
Download (6MB)
Text
e-thesis form - Version of Record
Restricted to Repository staff only
Available under License University of Southampton Thesis Licence.

More information

Submitted date: April 2019
Published date: 2020

Identifiers

Local EPrints ID: 452895
URI: http://eprints.soton.ac.uk/id/eprint/452895
PURE UUID: 32259233-4567-4af5-a6ea-4bb421bacbd1
ORCID for Christine Currie: orcid.org/0000-0002-7016-3652

Catalogue record

Date deposited: 06 Jan 2022 17:47
Last modified: 17 Mar 2024 02:56


Contributors

Author: Andria Ellina
Thesis advisor: Christine Currie


