Reward function design in multi-agent reinforcement learning for traffic signal control
In recent years, there has been increased interest in Reinforcement Learning (RL) for Traffic Signal Control (TSC), with implementations of RL touted as a potential successor to current commercial solutions. Commercial systems, such as Microprocessor Optimised Vehicle Actuation (MOVA) and the Split, Cycle, and Offset Optimisation Technique (SCOOT), can adapt to the changing traffic state, but they do not learn the specific traffic characteristics of an intersection and leave much to be desired when their performance is compared to the potential benefits of using RL for TSC. Furthermore, distributed RL can provide the unique benefits of scalability and decentralisation for road infrastructure. However, using RL for TSC introduces the problem of non-stationarity, where the changing policies of RL agents, tasked with optimal control of traffic signals, directly impact the observed state of the system and therefore the policies of other agents.
This non-stationarity can be mitigated through careful consideration and selection of an appropriate reward function. However, the existing literature does not consider the impact of the reward function on the performance of agents in a non-stationary environment such as TSC. In this paper, we select 12 reward functions from the literature and empirically evaluate them against the baseline of a commercial solution in a multi-agent setting. Since we are particularly interested in the performance of agents in a real-world scenario, we use demand-calibrated data from Ingolstadt, Germany, to compare the average waiting time and trip duration of vehicles. We find that reward functions which often perform well in a single-intersection setting may not outperform commercial solutions in a multi-agent setting, due to their impact on the demand profiles of other agents. Furthermore, the reward functions which include the waiting time of agents produce the most predictable demand profiles, in turn leading to greater throughput than the alternative solutions.
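For illustration, a waiting-time-based reward of the general kind compared in the paper can be sketched as follows. This is a minimal sketch of two common reward forms from the TSC literature, not the paper's exact definitions; the function names and signatures are assumptions for illustration only.

```python
def waiting_time_reward(wait_times):
    """Negative total accumulated waiting time of vehicles queued at an
    intersection. An agent maximising this reward minimises the time
    vehicles spend stopped. (Illustrative form; not the paper's exact
    definition.)"""
    return -sum(wait_times)

def queue_length_reward(queue_lengths):
    """Negative total number of queued vehicles across incoming lanes,
    a queue-based alternative often used in single-intersection studies.
    (Illustrative form; not the paper's exact definition.)"""
    return -sum(queue_lengths)
```

In a multi-agent setting, each intersection agent would compute such a reward from its own local observation at each decision step; the paper's finding is that the choice among such functions changes the demand profile seen by neighbouring agents.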
traffic signal control, intelligent traffic management, Reinforcement Learning, Problem of Non-Stationarity, Multi-agent reinforcement learning
Pages: 1-13
Koohy, Behrad
Stein, Sebastian
Gerding, Enrico
Manla, Ghaithaa
July 2022
Koohy, Behrad, Stein, Sebastian, Gerding, Enrico and Manla, Ghaithaa (2022) Reward function design in multi-agent reinforcement learning for traffic signal control. ATT'22: Workshop Agents in Traffic and Transportation, July 25, 2022, Vienna, Austria: Part of IJCAI 2022, 23 - 29 Jul 2022.
Record type: Conference or Workshop Item (Paper)
Text: Reward Function Design in Multi-Agent Reinforcement Learning for Traffic Signal Control - Accepted Manuscript
Text: Version of Record
More information
Accepted/In Press date: 3 June 2022
Published date: July 2022
Venue - Dates:
ATT'22: Workshop Agents in Traffic and Transportation, July 25, 2022, Vienna, Austria: Part of IJCAI 2022, 2022-07-23 - 2022-07-29
Identifiers
Local EPrints ID: 458201
URI: http://eprints.soton.ac.uk/id/eprint/458201
PURE UUID: 0cf15d1a-8da7-4901-bab1-bad77dc86970
Catalogue record
Date deposited: 30 Jun 2022 17:21
Last modified: 17 Mar 2024 03:13