The University of Southampton
University of Southampton Institutional Repository

Reward function design in multi-agent reinforcement learning for traffic signal control


Koohy, Behrad, Stein, Sebastian, Gerding, Enrico and Manla, Ghaithaa (2022) Reward function design in multi-agent reinforcement learning for traffic signal control. ATT'22: Workshop Agents in Traffic and Transportation, July 25, 2022, part of IJCAI 2022, Vienna, Austria. 23 - 29 Jul 2022. (In Press)

Record type: Conference or Workshop Item (Paper)

Abstract

In recent years, there has been increased interest in Reinforcement Learning (RL) for Traffic Signal Control (TSC), with RL implementations touted as a potential successor to current commercial solutions. Commercial systems, such as Microprocessor Optimised Vehicle Actuation (MOVA) and Split, Cycle, and Offset Optimisation Technique (SCOOT), can adapt to the changing traffic state, but do not learn the specific traffic characteristics of an intersection, and their performance leaves much to be desired when compared to the potential benefits of using RL for TSC. Furthermore, distributed RL can provide the benefits of scalability and decentralisation for road infrastructure. However, using RL for TSC introduces the problem of non-stationarity, where the changing policies of RL agents, tasked with optimal control of traffic signals, directly impact the observed state of the system and therefore the policies of other agents.
This non-stationarity can be mitigated through careful selection of an appropriate reward function. However, existing literature does not consider the impact of the reward function on the performance of agents in a non-stationary environment such as TSC. In this paper, we select 12 reward functions from the literature and empirically evaluate them against a baseline of a commercial solution in a multi-agent setting. Furthermore, we are particularly interested in the performance of agents when used in a real-world scenario, and so we use demand-calibrated data from Ingolstadt, Germany to compare the average waiting time and trip duration of vehicles. We find that reward functions which often perform well in a single-intersection setting may not outperform commercial solutions in a multi-agent setting due to their impact on the demand profile of other agents. Furthermore, the reward functions which include the waiting time of agents produce the most predictable demand profile, in turn leading to higher throughput than alternative proposed solutions.

Text
Reward Function Design in Multi-Agent Reinforcement Learning for Traffic Signal Control - Accepted Manuscript

More information

Accepted/In Press date: 3 June 2022
Venue - Dates: ATT'22: Workshop Agents in Traffic and Transportation, July 25, 2022, part of IJCAI 2022, Vienna, Austria, 2022-07-23 - 2022-07-29
Keywords: traffic signal control, intelligent traffic management, Reinforcement Learning, Problem of Non-Stationarity, Multi-agent reinforcement learning

Identifiers

Local EPrints ID: 458201
URI: http://eprints.soton.ac.uk/id/eprint/458201
PURE UUID: 0cf15d1a-8da7-4901-bab1-bad77dc86970
ORCID for Sebastian Stein: orcid.org/0000-0003-2858-8857
ORCID for Enrico Gerding: orcid.org/0000-0001-7200-552X

Catalogue record

Date deposited: 30 Jun 2022 17:21
Last modified: 01 Jul 2022 01:44

Contributors

Author: Behrad Koohy
Author: Sebastian Stein
Author: Enrico Gerding
Author: Ghaithaa Manla
