University of Southampton Institutional Repository

3D spatial information compression based deep reinforcement learning for UAV path planning in unknown environments

Wang, Zhipeng
feb79a9c-caba-4f0c-a561-dff6447aae64
Ng, Soon Xin
e19a63b0-0f12-4591-ab5f-554820d5f78c
El-Hajjar, Mohammed
3a829028-a427-4123-b885-2bab81a44b6f

Wang, Zhipeng, Ng, Soon Xin and El-Hajjar, Mohammed (2025) 3D spatial information compression based deep reinforcement learning for UAV path planning in unknown environments. IEEE Open Journal of Vehicular Technology, 6, 2662-2676. (doi:10.1109/OJVT.2025.3611507).

Record type: Article

Abstract

Over the past decade, unmanned aerial vehicle (UAV) technology has developed rapidly, and the flexibility and low cost of UAVs make them attractive in many applications. Path planning is crucial in most UAV applications, and path planning in unknown and complex 3D environments has become an urgent challenge. In this paper, we model the unknown 3D environment as a partially observable Markov decision process (POMDP) and derive the Bellman equation without introducing a belief state (BS) distribution. More explicitly, we use an independent emulator to model the environmental observation history and obtain an approximate BS distribution through Monte Carlo simulation in the emulator, which eliminates the need for explicit BS calculation and thereby improves training efficiency and path planning performance. Additionally, we propose a three-dimensional spatial information compression (3DSIC) algorithm for continuous POMDP environments that compresses 3D environmental information into 2D, greatly reducing the search space of the path planning algorithms. The simulation results show that our proposed 3D spatial information compression based deep deterministic policy gradient (3DSIC-DDPG) algorithm improves training efficiency by 95.9% compared to the traditional DDPG algorithm in unknown 3D environments. Furthermore, combining 3DSIC with the fast recurrent stochastic value gradient (FRSVG) algorithm, which can be considered the state-of-the-art planning algorithm for UAVs, yields 95% higher efficiency than FRSVG alone in unknown environments.
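The abstract's core idea of collapsing 3D environmental information into a 2D representation can be illustrated with a minimal sketch. This is not the paper's 3DSIC algorithm (whose details are not given here); it assumes, purely for illustration, that the environment is a 3D occupancy grid and that one plausible compression is a column-wise projection into a 2D height map, which a 2D planner can then threshold against the UAV's flight altitude:

```python
import numpy as np

def compress_3d_to_2d(occupancy: np.ndarray, cell_height: float = 1.0) -> np.ndarray:
    """Collapse a 3D occupancy grid (x, y, z) into a 2D height map.

    Hypothetical sketch, not the paper's 3DSIC algorithm: occupancy[x, y, z]
    is 1 if the voxel is blocked, else 0. For each (x, y) column we record
    the top of the highest blocked voxel, so a 2D planner can treat any cell
    whose height exceeds the flight altitude as an obstacle.
    """
    _, _, z = occupancy.shape
    levels = np.arange(1, z + 1)            # 1-based level index per z slice
    highest = (occupancy * levels).max(axis=2)  # 0 where the column is free
    return highest * cell_height

# Toy 3D grid: one tall obstacle and one low obstacle.
grid = np.zeros((4, 4, 5), dtype=int)
grid[1, 1, :4] = 1   # obstacle occupying the four lowest voxels
grid[2, 3, :1] = 1   # obstacle occupying only the lowest voxel
height_map = compress_3d_to_2d(grid)
print(height_map[1, 1], height_map[2, 3], height_map[0, 0])  # 4.0 1.0 0.0
```

The search-space reduction claimed in the abstract follows from the dimensionality drop: a planner over the compressed map explores x*y cells instead of x*y*z voxels.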

Text
paper - Accepted Manuscript
Available under License Creative Commons Attribution.
Download (4MB)
Text
3D_Spatial_Information_Compression_Based_Deep_Reinforcement_Learning_for_UAV_Path_Planning_in_Unknown_Environments - Version of Record
Available under License Creative Commons Attribution.
Download (3MB)

More information

Accepted/In Press date: 14 September 2025
Published date: 18 September 2025

Identifiers

Local EPrints ID: 506086
URI: http://eprints.soton.ac.uk/id/eprint/506086
ISSN: 2644-1330
PURE UUID: e5a523d8-39a8-4b3f-96ed-60edbe42d297
ORCID for Zhipeng Wang: orcid.org/0009-0004-1940-1047
ORCID for Soon Xin Ng: orcid.org/0000-0002-0930-7194
ORCID for Mohammed El-Hajjar: orcid.org/0000-0002-7987-1401

Catalogue record

Date deposited: 28 Oct 2025 18:21
Last modified: 29 Oct 2025 03:02


Contributors

Author: Zhipeng Wang
Author: Soon Xin Ng
Author: Mohammed El-Hajjar

Download statistics

Downloads from ePrints over the past year. Other digital versions may also be available to download e.g. from the publisher's website.



