COLERGs-constrained safe reinforcement learning for realising MASS's risk-informed collision avoidance decision making
Wang, Chengbo
Zhang, Xinyu
Gao, Hongbo
Bashir, Musa
Li, Huanhuan
Yang, Zaili
24 July 2024
Wang, Chengbo, Zhang, Xinyu, Gao, Hongbo, Bashir, Musa, Li, Huanhuan and Yang, Zaili
(2024)
COLERGs-constrained safe reinforcement learning for realising MASS's risk-informed collision avoidance decision making.
Knowledge-Based Systems, 300, [112205].
(doi:10.1016/j.knosys.2024.112205).
Abstract
Maritime autonomous surface ships (MASS) represent a significant advancement in maritime technology, offering the potential for increased efficiency, reduced operational costs, and enhanced maritime traffic safety. However, MASS navigation in complex traffic and congested waters remains challenging, especially Collision Avoidance Decision Making (CADM) in multi-ship encounter scenarios. Building on a robust risk assessment design for time-sequential and joint-target ship (TS) encounter scenarios, this paper proposes a novel risk and reliability critic-enhanced safe hierarchical reinforcement learning method (RA-SHRL), constrained by the International Regulations for Preventing Collisions at Sea (COLREGs), to realize autonomous navigation and CADM for MASS. Simulation experiments are conducted in a time-sequenced obstacle avoidance scenario and a swarm obstacle avoidance scenario. The results show that RA-SHRL generates safe, efficient, and reliable collision avoidance strategies both in environments with time-sequential dynamic obstacles and in mixed joint-TSs environments, and that it can assess risk and avoid multiple joint-TSs. Compared with Deep Q-Network (DQN) and Constrained Policy Optimization (CPO), the proposed algorithm improves search efficiency by 40% and 12%, respectively, and it achieved a 91.3% collision avoidance success rate during training. The methodology could also benefit other autonomous systems operating in dynamic environments.
Text: KNOSYS-Accepted manuscript (Accepted Manuscript)
Restricted to Repository staff only until 31 August 2026.
More information
Accepted/In Press date: 6 July 2024
e-pub ahead of print date: 16 July 2024
Published date: 24 July 2024
Identifiers
Local EPrints ID: 503689
URI: http://eprints.soton.ac.uk/id/eprint/503689
ISSN: 0950-7051
PURE UUID: 3c156f0b-bbd1-4c08-97a9-3a718719692d
Catalogue record
Date deposited: 11 Aug 2025 16:31
Last modified: 22 Aug 2025 02:49
Contributors
Author: Chengbo Wang
Author: Xinyu Zhang
Author: Hongbo Gao
Author: Musa Bashir
Author: Huanhuan Li
Author: Zaili Yang