Proactive multi-USV maritime search and rescue in stochastic wave environments: a hierarchical, non-causal reinforcement learning framework

Maritime search and rescue (SAR) in stochastic wave environments presents a critical challenge for multi-Unmanned Surface Vehicle (USV) systems and demands a fine balance between search efficiency and operational safety. This paper proposes a novel hierarchical reinforcement learning framework, termed Non-Causal Reward Multi-Agent Proximal Policy Optimization (NCR-MAPPO), to address this challenge. Our framework decouples the mission by employing a strategic guidance system based on International Maritime Organization (IMO) standards for systematic coverage, while a tactical motion controller built upon the Multi-Agent Proximal Policy Optimization (MAPPO) algorithm learns cooperative execution. The core innovation is a Non-Causal Reward (NCR) mechanism that incorporates short-term wave field prediction into the decision process, enabling a shift from reactive collision avoidance to proactive seakeeping control. Through comprehensive simulations, we demonstrate the superiority of our framework. Compared to the standard MAPPO baseline, NCR-MAPPO significantly enhances survivability by reducing wave impact incidents by 27% and exposure to hazardous sea states by 25% while maintaining high mission efficiency. This work provides a robust solution for autonomous marine systems by bridging the gap between regulatory compliance and predictive safety control.
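The abstract does not give the form of the Non-Causal Reward, so the following is only an illustrative sketch of the general idea of letting a short-term wave forecast shape the reward. Every name and parameter here (`non_causal_reward`, `predict`, `horizon`, `dt`, `threshold`, `weight`, `discount`) is a hypothetical choice for illustration and is not taken from the paper.

```python
from typing import Callable


def non_causal_reward(
    base_reward: float,
    predict: Callable[[float], float],
    t: float,
    horizon: float = 3.0,
    dt: float = 0.5,
    threshold: float = 0.6,
    weight: float = 1.0,
    discount: float = 0.8,
) -> float:
    """Illustrative reward shaped by a short-term wave forecast.

    ASSUMPTION: this is not the paper's formula. `predict(time)` is a
    hypothetical forecaster returning the wave elevation (m) expected at
    the vehicle's position at a future time. Forecast elevations whose
    magnitude exceeds `threshold` are penalised, with later steps
    discounted more heavily.
    """
    steps = int(round(horizon / dt))
    penalty = 0.0
    for i in range(1, steps + 1):
        predicted = abs(predict(t + i * dt))
        excess = max(0.0, predicted - threshold)
        penalty += (discount ** i) * excess
    return base_reward - weight * penalty


if __name__ == "__main__":
    # Hypothetical forecasters: flat water versus a constant 1 m elevation.
    calm = lambda time: 0.0
    rough = lambda time: 1.0
    print(non_causal_reward(1.0, calm, t=0.0))
    print(non_causal_reward(1.0, rough, t=0.0))
```

In this sketch the agent is penalised before a large wave arrives, which is one simple way a reward could depend on predicted rather than only current conditions. How the authors actually couple the forecast to MAPPO is described in the full manuscript, not here.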

Song, Yutong

e40d4fb3-f448-4275-83c5-5dc3a423b7c5

Zeng, Tianyi

2d247d78-9b02-4acd-a7c5-166482833316

Zhang, Yao

1b512f22-e660-481d-ae60-31d87344625f

Tezdogan, Tahsin

7e7328e2-4185-4052-8e9a-53fd81c98909

Song, Yutong, Zeng, Tianyi, Zhang, Yao and Tezdogan, Tahsin (2026) Proactive multi-USV maritime search and rescue in stochastic wave environments: a hierarchical, non-causal reinforcement learning framework. Ocean Engineering. (In Press)

Record type: Article

Text

OE_MARL(unmarked) - Accepted Manuscript

Restricted to Repository staff only until 31 March 2027.

Available under License Creative Commons Attribution Non-commercial No Derivatives.

More information

Accepted/In Press date: 31 March 2026

Identifiers

Local EPrints ID: 511465

URI: http://eprints.soton.ac.uk/id/eprint/511465

ISSN: 0029-8018

PURE UUID: 6ab809da-6446-4a9c-8ceb-ce0b9336a146

ORCID for Tahsin Tezdogan:

orcid.org/0000-0002-7032-3038

Catalogue record

Date deposited: 15 May 2026 16:45

Last modified: 16 May 2026 02:09

Contributors

Author: Yutong Song

Author: Tianyi Zeng

Author: Yao Zhang

Author: Tahsin Tezdogan
