The University of Southampton
University of Southampton Institutional Repository

An automated signalized junction controller that learns strategies by temporal difference reinforcement learning

Box, S. and Waterson, B. (2013) An automated signalized junction controller that learns strategies by temporal difference reinforcement learning. Engineering Applications of Artificial Intelligence, 26 (1), 652-659. (doi:10.1016/j.engappai.2012.02.013).

Record type: Article

Abstract

This paper shows how temporal difference learning can be used to build a signalized junction controller that learns its own strategies through experience. Simulation tests detailed here show that the learned strategies can achieve high performance. This work builds on previous work in which a neural-network-based junction controller that can learn strategies from a human expert was developed. In the simulations presented, vehicles are assumed to broadcast their positions over WiFi, giving the junction controller rich information. The vehicles' position data are pre-processed to describe a simplified state. A neural network classifies the state space into regions associated with junction control decisions. This classification is the strategy, and it is parameterized by the weights of the neural network. The weights can be learned either through supervised learning with a human trainer or through reinforcement learning by temporal difference (TD). Tests on a model of an isolated T junction show average delays of 14.12 s and 14.36 s for the human-trained and TD-trained networks respectively. Tests on a model of a pair of closely spaced junctions show 17.44 s and 20.82 s respectively. Both training methods produced strategies that were approximately equivalent in their equitable treatment of vehicles, defined here as the variance over the journey time distributions.
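The TD weight update mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the paper uses a neural network, whereas the sketch below swaps in a linear value function to keep the arithmetic visible. The feature vectors, the reward (taken here as negative delay), and the names `value`, `td0_update`, `alpha` and `gamma` are all assumptions made for illustration.

```python
# Minimal TD(0) sketch with a linear value function (an assumed simplification;
# the paper itself parameterizes the strategy with a neural network).

def value(weights, features):
    """Estimated value of a state: dot product of weights and state features."""
    return sum(w * x for w, x in zip(weights, features))


def td0_update(weights, features, reward, next_features, alpha=0.1, gamma=0.9):
    """Apply one TD(0) step and return (new_weights, td_error).

    td_error = reward + gamma * V(next state) - V(current state)
    Each weight moves by alpha * td_error * its feature value.
    """
    td_error = (reward
                + gamma * value(weights, next_features)
                - value(weights, features))
    new_weights = [w + alpha * td_error * x for w, x in zip(weights, features)]
    return new_weights, td_error


# Hypothetical example: two features describing a simplified junction state,
# and a reward of -2 standing in for 2 s of delay incurred on this step.
weights = [0.0, 0.0]
state = [1.0, 0.0]
next_state = [0.0, 1.0]
weights, err = td0_update(weights, state, reward=-2.0,
                          next_features=next_state, alpha=0.5, gamma=0.9)
```

With all weights starting at zero, the TD error equals the reward, so only the weight attached to the active feature of the current state changes. Repeating such updates over many simulated steps is how a TD-trained controller could adjust its strategy from experience without a human trainer.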

Text
tdpaper2012 (1).pdf - Author's Original
Download (331kB)

More information

e-pub ahead of print date: 17 March 2012
Published date: January 2013
Organisations: Transportation Group

Identifiers

Local EPrints ID: 336298
URI: http://eprints.soton.ac.uk/id/eprint/336298
ISSN: 0952-1976
PURE UUID: a98f336e-921a-4d51-b130-196455eb805d
ORCID for B. Waterson: orcid.org/0000-0001-9817-7119

Catalogue record

Date deposited: 21 Mar 2012 11:35
Last modified: 15 Mar 2024 02:58

Contributors

Author: S. Box
Author: B. Waterson
