Unmanned Aerial Vehicle Autonomous Navigation using Semi-Prior Deep Reinforcement Learning
Wang, Zhipeng (2025) Unmanned Aerial Vehicle Autonomous Navigation using Semi-Prior Deep Reinforcement Learning. University of Southampton, Doctoral Thesis, 183pp.
Record type: Thesis (Doctoral)
Abstract
The rapid development of unmanned aerial vehicles (UAVs) in recent years has attracted widespread attention from academia and industry. The most pressing problems drones currently face are endurance and autonomous navigation, with autonomous navigation being the main barrier to UAV technology. There is a wealth of research on UAV autonomous navigation and path planning, but most of it focuses on path planning in 2D environments, which sacrifices the flexibility of UAV movement in 3D space. Deep reinforcement learning (DRL) is considered one of the most suitable approaches to the UAV path planning problem: it can learn the dynamic changes of the environment and use deep neural networks to manage the spatial complexity of 3D environments, which grows geometrically with environment size.
However, DRL algorithms still face problems in 3D path planning tasks, such as low training efficiency, sparse training data, and a tendency to fall into local optimal traps. To address these issues, I propose in this thesis a cumulative reward model to overcome the sparsity of training data for DRL algorithms, and a region segmentation algorithm to reduce the probability of DRL-trained UAV agents falling into local optimal traps. Furthermore, I propose the 3D spatial information compression (3DSIC) algorithm, which compresses 3D spatial environment information into 2D, greatly reducing the search space of UAVs in 3D environments, and I then extend 3DSIC to unknown, partially observable Markov environments, enabling the algorithm to work in more complex application scenarios.
Firstly, I propose a cumulative reward model that considers both the distance between the UAV and its destination and the obstacle density near the next state to construct a reward function, largely resolving the sparsity of effective experience in the experience pool of DRL algorithms that use experience replay. This greatly improves the training efficiency of UAV agents. The reward model can be applied to various DRL algorithms, such as the Deep Q-Network (DQN) and deep deterministic policy gradient (DDPG). Simulation results show that, compared to traditional reward models, a DRL algorithm using the cumulative reward model improves training efficiency by 30.8%. Additionally, I propose a region segmentation algorithm that divides a large region into multiple smaller regions and detects the connectivity between the boundaries of the small regions to establish a connectivity table. Based on this connectivity information, reward values can be re-evaluated to reduce the likelihood of UAV agents falling into local optimal traps. Simulation results show that the region segmentation algorithm reduces the probability of DQN falling into a local optimal trap by 99%, and that of DDPG by 92%.
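The abstract describes the cumulative reward model only at a high level. As a rough illustration of how such a shaped reward might be computed, the sketch below combines a distance-progress term with an obstacle-density penalty around the candidate next state. The function name, the weights, and the occupancy-grid representation are assumptions for illustration, not the thesis's actual formulation.

```python
import numpy as np

def cumulative_reward(uav_pos, goal_pos, next_state, obstacle_map,
                      w_dist=1.0, w_obs=0.5, radius=2):
    """Illustrative shaped reward: progress toward the goal minus a
    penalty for obstacle density around the candidate next state.
    obstacle_map is assumed to be a 0/1 occupancy grid indexed (x, y, z),
    and next_state an integer cell coordinate (x, y, z)."""
    # Distance term: positive when the move reduces distance to the goal.
    dist_now = np.linalg.norm(np.asarray(goal_pos) - np.asarray(uav_pos))
    dist_next = np.linalg.norm(np.asarray(goal_pos) - np.asarray(next_state))
    progress = dist_now - dist_next

    # Obstacle-density term: fraction of occupied cells in a small
    # window around the next state.
    x, y, z = next_state
    window = obstacle_map[max(x - radius, 0):x + radius + 1,
                          max(y - radius, 0):y + radius + 1,
                          max(z - radius, 0):z + radius + 1]
    density = window.mean() if window.size else 0.0

    return w_dist * progress - w_obs * density
```

In a DQN or DDPG training loop, such a value would replace a sparse goal-only reward when transitions are written to the replay buffer, so that far more stored experiences carry a useful learning signal.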
Then, I propose the 3DSIC algorithm, which compresses a limited number of drone flight altitude layers into a 2D environment, transforming the UAV path planning problem in a 3D environment into several path planning problems in 2D environments. Simulation results show that applying 3DSIC improves the training efficiency of DQN by 4.028 times and that of DDPG by 3.9 times.
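One plausible reading of the layer-compression step, assuming the 3D environment is a (z, x, y) occupancy grid: a ground cell is traversable in the compressed 2D map if any altitude layer above it is free, and the free layers per cell are recorded so a 2D plan can later be expanded back into a 3D path. This is an illustrative sketch, not the thesis's definition of 3DSIC.

```python
import numpy as np

def compress_altitude_layers(occupancy_3d):
    """Collapse a stack of altitude layers, shape (z, x, y), into a
    single 2D map plus a record of which layers are free per cell."""
    free = (occupancy_3d == 0)          # True wherever a cell is free
    compressed_2d = free.any(axis=0)    # traversable in at least one layer
    free_layers = {(i, j): np.flatnonzero(free[:, i, j])
                   for i in range(free.shape[1])
                   for j in range(free.shape[2])}
    return compressed_2d, free_layers
```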
Afterwards, I extend the 3DSIC algorithm to 3D partially observable Markov decision process (POMDP) environments, so that even when UAVs have no prior knowledge of the environment, they can perceive it through sensor data and apply 3DSIC to compress spatial information. More specifically, I introduce an emulator that not only compresses spatial information but also runs Monte Carlo simulations to obtain the mapping probabilities between observations and states. Simulation results show that the extended 3DSIC algorithm improves the training efficiency of DDPG by 95.9% in POMDP environments.
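The emulator's interface is not specified in the abstract. Assuming it can sample hidden states and generate the corresponding sensor observations, the observation-to-state mapping probabilities could be estimated by simple Monte Carlo counting, as sketched below; `sample_state` and `observe` are hypothetical method names.

```python
from collections import Counter, defaultdict

def estimate_obs_to_state(emulator, n_samples=10_000):
    """Monte Carlo estimate of P(state | observation).

    Assumes emulator.sample_state() draws a hidden state and
    emulator.observe(state) returns a hashable sensor observation."""
    counts = defaultdict(Counter)
    for _ in range(n_samples):
        state = emulator.sample_state()   # draw a hidden state
        obs = emulator.observe(state)     # simulate the sensor reading
        counts[obs][state] += 1

    # Normalise the counts into conditional probabilities.
    table = {}
    for obs, ctr in counts.items():
        total = sum(ctr.values())
        table[obs] = {s: c / total for s, c in ctr.items()}
    return table
```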
Finally, I propose a dual-directional 3D spatial information compression (DD3DSIC) algorithm to mitigate the collisions that drones are prone to during vertical motion. This method compresses both horizontal and vertical spatial information, providing a more comprehensive assessment of drone movements. Simulation results show that the probability of collision with vertical obstacles under DD3DSIC is 24.13% lower than under the original 3DSIC algorithm.
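Under the same occupancy-grid assumption as the 3DSIC sketch above, dual-directional compression might project the grid along both the vertical axis and one horizontal axis, so that vertical moves are also checked against a compressed map. The choice of projection axes here is a guess for illustration, not the thesis's actual construction.

```python
import numpy as np

def dual_direction_compress(occupancy_3d):
    """Compress one 3D occupancy grid, shape (z, x, y), in two directions.

    The horizontal view mirrors the 3DSIC sketch above; the vertical
    view projects along a horizontal axis so obstacles above or below
    the UAV remain visible when it climbs or descends."""
    free = (occupancy_3d == 0)
    horizontal_2d = free.any(axis=0)  # (x, y): columns safe for lateral moves
    vertical_2d = free.any(axis=2)    # (z, x): slices safe for vertical moves
    return horizontal_2d, vertical_2d
```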
Text: Southampton_PhD_Thesis_Final - Version of Record. Restricted to Repository staff only until 31 December 2025.
Text: Final-thesis-submission-Examination-Mr-Zhipeng-Wang. Restricted to Repository staff only.
More information
Published date: June 2025
Identifiers
Local EPrints ID: 502451
URI: http://eprints.soton.ac.uk/id/eprint/502451
PURE UUID: 5f76b786-c4a8-4e69-a405-4aba1129b963
Catalogue record
Date deposited: 26 Jun 2025 17:00
Last modified: 11 Sep 2025 03:21
Contributors
Author: Zhipeng Wang
Thesis advisor: Michael Ng
Thesis advisor: Mohammed El-Hajjar