Reward function design in multi-agent reinforcement learning for traffic signal control

In recent years, there has been increased interest in Reinforcement Learning (RL) for Traffic Signal Control (TSC), with RL implementations touted as a potential successor to current commercial solutions. Commercial systems, such as Microprocessor Optimised Vehicle Actuation (MOVA) and Split, Cycle, and Offset Optimisation Technique (SCOOT), can adapt to the changing traffic state but do not learn the specific traffic characteristics of an intersection, and their performance leaves much to be desired when compared with the potential benefits of using RL for TSC. Furthermore, distributed RL can provide the benefits of scalability and decentralisation for road infrastructure. However, using RL for TSC introduces the problem of non-stationarity, where the changing policies of RL agents, tasked with optimal control of traffic signals, directly impact the observed state of the system and therefore the policies of other agents.
This non-stationarity can be mitigated through careful selection of an appropriate reward function. However, existing literature does not consider the impact of the reward function on the performance of agents in a non-stationary environment such as TSC. In this paper, we select 12 reward functions from the literature and empirically evaluate them against a commercial baseline in a multi-agent setting. We are particularly interested in the performance of agents in a real-world scenario, so we use demand-calibrated data from Ingolstadt, Germany to compare the average waiting time and trip duration of vehicles. We find that reward functions which often perform well in a single-intersection setting may not outperform commercial solutions in a multi-agent setting, due to their impact on the demand profile of other agents. Furthermore, the reward functions which include the waiting time of agents produce the most predictable demand profile, in turn leading to higher throughput than alternatively proposed solutions.

Keywords: traffic signal control, intelligent traffic management, Reinforcement Learning, Problem of Non-Stationarity, Multi-agent reinforcement learning

1-13

Koohy, Behrad

Stein, Sebastian

Gerding, Enrico

Manla, Ghaithaa

July 2022

Koohy, Behrad, Stein, Sebastian, Gerding, Enrico and Manla, Ghaithaa (2022) Reward function design in multi-agent reinforcement learning for traffic signal control. ATT'22: Workshop Agents in Traffic and Transportation, July 25, 2022, Vienna, Austria (part of IJCAI 2022, Vienna, Austria, 23-29 Jul 2022). pp. 1-13.

Record type: Conference or Workshop Item (Paper)

Reward Function Design in Multi-Agent Reinforcement Learning for Traffic Signal Control - Accepted Manuscript

Available under License Creative Commons Attribution.

1 - Version of Record

Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 3 June 2022

Published date: July 2022

Venue - Dates: ATT'22: Workshop Agents in Traffic and Transportation, July 25, 2022, Vienna, Austria: Part of IJCAI 2022, Vienna, Austria, 2022-07-23 - 2022-07-29

Related URLs:

http://ceur-ws.org/Vol-3173/1.pdf
