Strategyproof reinforcement learning for online resource allocation
We consider an online resource allocation problem where tasks with specific values, sizes and resource requirements arrive dynamically over time, and have to be either serviced or rejected immediately. Reinforcement learning is a promising approach for this, but existing work on reinforcement learning has neglected that task owners may misreport their task requirements or values strategically when this is to their benefit. To address this, we apply mechanism design and propose a novel mechanism based on reinforcement learning that aims to maximise social welfare, is strategyproof and individually rational (i.e., truthful reporting and participation are incentivised). In experiments, we show that our algorithm achieves results that are typically within 90% of the optimal social welfare, while outperforming approaches that use fixed pricing (by up to 86% in specific cases).
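To make the setting concrete, here is a minimal Python sketch of a fixed-price (posted-price) rule of the kind the abstract names as a comparison point. It is not the paper's mechanism and involves no reinforcement learning. The names `Task`, `posted_price_allocate`, `capacity` and `unit_price` are hypothetical, and the sketch assumes a single resource with task sizes that cannot be misreported. Because the price charged never depends on the reported value, an owner cannot lower their payment by misreporting that value, and a task is only served when its reported value covers the price.

```python
from dataclasses import dataclass


@dataclass
class Task:
    reported_value: float  # value the owner claims for the task
    size: int              # resource units the task needs (assumed verifiable)


def posted_price_allocate(tasks, capacity, unit_price):
    """Serve or reject each arriving task immediately under a fixed unit price.

    Returns one (accepted, payment) pair per task, in arrival order.
    """
    remaining = capacity
    outcomes = []
    for task in tasks:
        # The price depends on the task's size only, never on its reported value.
        price = unit_price * task.size
        accepted = task.size <= remaining and task.reported_value >= price
        if accepted:
            remaining -= task.size
        outcomes.append((accepted, price if accepted else 0.0))
    return outcomes


def utility(true_value, outcome):
    """Owner's utility: true value minus payment if served, otherwise zero."""
    accepted, payment = outcome
    return true_value - payment if accepted else 0.0
```

For example, with `capacity=4` and `unit_price=2.0`, a task of size 2 and reported value 10.0 is served at a price of 4.0; a later task of size 2 and reported value 3.0 is rejected because its value is below the price. Over-reporting 3.0 as 5.0 would get that task served, but at a price above its true value, so the owner's utility would be negative.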
1296–1304
Stein, Sebastian
cb2325e7-5e63-475e-8a69-9db2dfbdb00b
Ochal, Mateusz
d9ab415d-8f23-49f3-9eef-5f0f891249e4
Moisoiu, Ioana-Adriana
247980df-0336-49b8-97d1-0b1895d504f5
Gerding, Enrico
d9e92ee5-1a8c-4467-a689-8363e7743362
Ganti, Raghu
2a43a38b-8bad-466a-b877-9b55a4c2bc80
He, Ting
3e347968-3aec-4015-8ddd-8cc28e7b1276
La Porta, Tom
338bb4e0-022e-467b-92b0-5876e26d5442
9 May 2020
Stein, Sebastian, Ochal, Mateusz, Moisoiu, Ioana-Adriana, Gerding, Enrico, Ganti, Raghu, He, Ting and La Porta, Tom (2020) Strategyproof reinforcement learning for online resource allocation. In AAMAS '20: Proceedings of the 19th International Conference on Autonomous Agents and MultiAgent Systems. University of Auckland. pp. 1296–1304. (doi:10.5555/3398761.3398911).
Record type: Conference or Workshop Item (Paper)
Text: srl-aamas-main - Accepted Manuscript
More information
Accepted/In Press date: 15 January 2020
Published date: 9 May 2020
Venue - Dates:
19th International Conference on Autonomous Agents and Multiagent Systems, Auckland, New Zealand, 2020-05-09 - 2020-05-13
Identifiers
Local EPrints ID: 438382
URI: http://eprints.soton.ac.uk/id/eprint/438382
PURE UUID: 22f6276f-d428-47cd-8e17-7348796ae76f
Catalogue record
Date deposited: 09 Mar 2020 17:30
Last modified: 17 Mar 2024 03:13
Contributors
Author: Sebastian Stein
Author: Mateusz Ochal
Author: Ioana-Adriana Moisoiu
Author: Enrico Gerding
Author: Raghu Ganti
Author: Ting He
Author: Tom La Porta