The University of Southampton
University of Southampton Institutional Repository

COLERGs-constrained safe reinforcement learning for realising MASS's risk-informed collision avoidance decision making


Wang, Chengbo, Zhang, Xinyu, Gao, Hongbo, Bashir, Musa, Li, Huanhuan and Yang, Zaili (2024) COLERGs-constrained safe reinforcement learning for realising MASS's risk-informed collision avoidance decision making. Knowledge-Based Systems, 300, [112205]. (doi:10.1016/j.knosys.2024.112205).

Record type: Article

Abstract

Maritime autonomous surface ships (MASS) represent a significant advancement in maritime technology, offering the potential for increased efficiency, reduced operational costs, and enhanced maritime traffic safety. However, MASS navigation in complex traffic and congested waters remains challenging, especially for Collision Avoidance Decision Making (CADM) in multi-ship encounter scenarios. Building on a robust risk assessment design for encounter scenarios involving time-sequential and joint target ships (TSs), a novel risk and reliability critic-enhanced safe hierarchical reinforcement learning method (RA-SHRL), constrained by the International Regulations for Preventing Collisions at Sea (COLREGs), is proposed to realize the autonomous navigation and CADM of MASS. Experimental simulations are conducted in a time-sequenced obstacle avoidance scenario and a swarm obstacle avoidance scenario. The results demonstrate that RA-SHRL generates safe, efficient, and reliable collision avoidance strategies in both time-sequential dynamic obstacle and mixed joint-TSs environments, and that it can assess the risk posed by multiple joint TSs and avoid them. Compared with Deep Q-network (DQN) and Constrained Policy Optimization (CPO), the search efficiency of the proposed algorithm is improved by 40% and 12%, respectively. Moreover, it achieved a 91.3% collision avoidance success rate during training. The methodology could also benefit other autonomous systems in dynamic environments.

Text
KNOSYS-Accepted manuscript - Accepted Manuscript
Restricted to Repository staff only until 31 August 2026.

More information

Accepted/In Press date: 6 July 2024
e-pub ahead of print date: 16 July 2024
Published date: 24 July 2024

Identifiers

Local EPrints ID: 503689
URI: http://eprints.soton.ac.uk/id/eprint/503689
ISSN: 0950-7051
PURE UUID: 3c156f0b-bbd1-4c08-97a9-3a718719692d
ORCID for Huanhuan Li: orcid.org/0000-0002-4293-4763

Catalogue record

Date deposited: 11 Aug 2025 16:31
Last modified: 22 Aug 2025 02:49

Contributors

Author: Chengbo Wang
Author: Xinyu Zhang
Author: Hongbo Gao
Author: Musa Bashir
Author: Huanhuan Li
Author: Zaili Yang
