Speech enhancement by using deep learning algorithms
University of Southampton
Cui, Jianqiao
Bleeck, Stefan
Nelson, Philip
2024
Cui, Jianqiao (2024) Speech enhancement by using deep learning algorithms. University of Southampton, Doctoral Thesis, 159pp.
Record type: Thesis (Doctoral)
Abstract
Speech signals are often degraded by ambient noise, which significantly hampers speech intelligibility and quality, posing challenges for both human communication and speech-related technologies. Over the past decade, the advent of deep learning has catalysed remarkable progress in the field of speech enhancement. With the proliferation of smart devices demanding real-time processing capabilities, the development of real-time deep learning-based speech enhancement systems has become increasingly pertinent.
The primary objective of this thesis is to advance the state-of-the-art in real-time speech enhancement algorithms, with a focus on improving the intelligibility and quality of speech in noisy environments. Our research commences with an exploration into the intricacies of auditory perception and the impact of hearing loss on speech comprehension, setting the stage for the development of sophisticated speech enhancement techniques.
Traditional speech enhancement methods are reviewed in Chapter 2, leading to an in-depth discussion of the features critical for distinguishing speech from noise. The chapter then transitions to deep neural networks, detailing architectures such as LSTM-RNNs and CNNs and their application to speech enhancement, and emphasizing the importance of quantitative evaluation.
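As an illustration of the masking approach reviewed in Chapter 2, the sketch below shows a minimal LSTM-based estimator of a time-frequency mask applied to noisy magnitude spectra. It is written in PyTorch with placeholder layer sizes and STFT settings; it is not the thesis implementation.

```python
# Minimal sketch (assumed PyTorch, placeholder sizes): an LSTM predicts a bounded
# time-frequency mask from the noisy magnitude spectrogram.
import torch
import torch.nn as nn

class LSTMMaskEstimator(nn.Module):
    """Predicts a [0, 1] mask over magnitude spectra and applies it."""
    def __init__(self, n_freq=257, hidden=256, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_freq, hidden, num_layers=layers, batch_first=True)
        self.proj = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, noisy_mag):            # (batch, frames, n_freq)
        h, _ = self.lstm(noisy_mag)
        mask = self.proj(h)                  # bounded mask in [0, 1]
        return mask * noisy_mag              # enhanced magnitude

# Usage: STFT -> magnitude -> mask -> inverse STFT with the noisy phase.
noisy = torch.randn(1, 16000)                                  # 1 s of audio at 16 kHz
window = torch.hann_window(512)
spec = torch.stft(noisy, n_fft=512, hop_length=128,
                  window=window, return_complex=True)
mag, phase = spec.abs(), spec.angle()
model = LSTMMaskEstimator()
enhanced_mag = model(mag.transpose(1, 2)).transpose(1, 2)      # back to (1, n_freq, frames)
enhanced_spec = torch.polar(enhanced_mag, phase)               # reuse the noisy phase
enhanced = torch.istft(enhanced_spec, n_fft=512, hop_length=128,
                       window=window, length=noisy.shape[-1])
```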
Chapter 3 delves into the application of Generative Adversarial Networks (GANs) to speech enhancement, building upon existing research to further refine the use of these models. The chapter focuses on the integration of the magnitude spectrum as an input feature, which significantly improves GAN performance. It also explores various deep learning architectures as potential generators within the GAN framework, showcasing the adaptability and continuing improvement potential of GANs for speech enhancement.
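The sketch below illustrates the general adversarial set-up on magnitude spectra: a generator enhances noisy frames and a conditional discriminator judges (noisy, candidate) pairs. The network sizes, the least-squares adversarial loss, and the L1 weight are illustrative assumptions, not the configuration used in Chapter 3.

```python
# Hedged sketch (assumed PyTorch): one LSGAN-style update on frame-wise magnitude spectra.
import torch
import torch.nn as nn

n_freq = 257
G = nn.Sequential(nn.Linear(n_freq, 512), nn.ReLU(),
                  nn.Linear(512, n_freq), nn.Softplus())       # non-negative magnitude output
D = nn.Sequential(nn.Linear(2 * n_freq, 512), nn.LeakyReLU(0.2),
                  nn.Linear(512, 1))                           # realness score per frame

opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

def train_step(noisy_mag, clean_mag):
    """One adversarial update on batches of shape (batch, n_freq)."""
    # Discriminator: push (noisy, clean) pairs toward 1, (noisy, enhanced) pairs toward 0.
    fake = G(noisy_mag).detach()
    d_loss = ((D(torch.cat([noisy_mag, clean_mag], dim=-1)) - 1) ** 2).mean() + \
             (D(torch.cat([noisy_mag, fake], dim=-1)) ** 2).mean()
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: fool the discriminator while staying close to the clean magnitude (L1 term).
    enhanced = G(noisy_mag)
    g_loss = ((D(torch.cat([noisy_mag, enhanced], dim=-1)) - 1) ** 2).mean() \
             + 100 * (enhanced - clean_mag).abs().mean()
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()

train_step(torch.rand(8, n_freq), torch.rand(8, n_freq))       # toy call with random spectra
```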
Attention mechanisms are presented as a driving force for innovation in speech enhancement, with the novel 'Mask First, Compensation Last' topology aiming to reduce speech distortion and residual noise. Motivated by these mechanisms, Chapter 4 further explores a new cascaded architecture operating on raw waveform input to address the complexity of auditory perception.
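A rough sketch of the 'Mask First, Compensation Last' idea on raw waveforms follows: a first stage suppresses noise with a bounded mask, and a second stage predicts an additive correction for the speech lost in masking. The two small convolutional networks are placeholders, not the cascaded architecture developed in Chapter 4.

```python
# Hedged sketch (assumed PyTorch): two-stage cascade on raw waveforms.
import torch
import torch.nn as nn

class MaskStage(nn.Module):
    """Stage 1: estimate a bounded sample-wise mask that suppresses noise."""
    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, channels, kernel_size=15, padding=7), nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size=15, padding=7), nn.Sigmoid())

    def forward(self, wav):                     # (batch, 1, samples)
        return wav * self.net(wav)              # masked (denoised) waveform

class CompensationStage(nn.Module):
    """Stage 2: predict an additive correction from the masked and noisy signals."""
    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2, channels, kernel_size=15, padding=7), nn.ReLU(),
            nn.Conv1d(channels, 1, kernel_size=15, padding=7))

    def forward(self, masked, noisy):
        return masked + self.net(torch.cat([masked, noisy], dim=1))

noisy = torch.randn(4, 1, 16000)
masked = MaskStage()(noisy)                     # mask first...
enhanced = CompensationStage()(masked, noisy)   # ...compensate last
```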
Chapter 5 introduces a new combination method for speech enhancement, contrasting mapping-based and masking-based approaches and proposing a parallel dual-module system, the Compensation for Complex Domain Network (CCDN), that unifies the magnitude spectrum with complex-domain details.
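The following sketch illustrates the parallel dual-module idea under simplifying assumptions: a masking branch estimates the magnitude while a mapping branch estimates the real and imaginary parts of the spectrum, and the two estimates are fused. The fusion rule and layer sizes shown are illustrative stand-ins, not the CCDN design itself.

```python
# Hedged sketch (assumed PyTorch): parallel magnitude-masking and complex-mapping branches.
import torch
import torch.nn as nn

class ParallelDualModule(nn.Module):
    def __init__(self, n_freq=257, hidden=256):
        super().__init__()
        self.mag_branch = nn.Sequential(nn.Linear(n_freq, hidden), nn.ReLU(),
                                        nn.Linear(hidden, n_freq), nn.Sigmoid())    # mask
        self.cplx_branch = nn.Sequential(nn.Linear(2 * n_freq, hidden), nn.ReLU(),
                                         nn.Linear(hidden, 2 * n_freq))             # maps Re/Im

    def forward(self, noisy_spec):                          # complex, (batch, frames, n_freq)
        mag = noisy_spec.abs()
        masked_mag = self.mag_branch(mag) * mag             # masking-based magnitude estimate
        ri = torch.cat([noisy_spec.real, noisy_spec.imag], dim=-1)
        re, im = self.cplx_branch(ri).chunk(2, dim=-1)      # mapping-based complex estimate
        # Illustrative fusion: magnitude from the masking branch, phase from the complex branch.
        fused_phase = torch.atan2(im, re)
        return torch.polar(masked_mag, fused_phase)

spec = torch.randn(2, 100, 257, dtype=torch.cfloat)         # placeholder noisy spectrogram
enhanced_spec = ParallelDualModule()(spec)
```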
The final chapter addresses the challenge of data mismatch in traditional supervised methods. We propose a strategy that combines unsupervised pre-training with supervised fine-tuning. This approach not only enhances speech quality in complex noise environments but also approximates the advantages of supervised learning without requiring paired data. The model's adaptability to real-world noise conditions and its effectiveness across various speech enhancement tasks are validated through rigorous experimental evaluations and subjective listening tests. The chapter culminates in a robust and practical speech enhancement model fit for real-world application, distinguished by its integration of unsupervised learning strategies for greater robustness and versatility. In doing so, this thesis aims to enhance the quality of human communication and to address the challenges faced by individuals with hearing impairments or those in noisy environments.
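The two-phase strategy can be sketched as follows, with a simple reconstruction pretext task standing in for the unsupervised objective and random tensors standing in for real recordings; the actual objectives, model, and hyper-parameters used in the thesis differ.

```python
# Hedged sketch (assumed PyTorch): unsupervised pre-training followed by supervised fine-tuning.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv1d(1, 32, kernel_size=15, padding=7), nn.ReLU(),
                      nn.Conv1d(32, 1, kernel_size=15, padding=7))
loss_fn = nn.L1Loss()

# Phase 1: unsupervised pre-training -- no clean targets required.
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(10):                                    # toy number of steps
    unpaired = torch.randn(8, 1, 16000)                # stands in for real-world noisy recordings
    loss = loss_fn(model(unpaired), unpaired)          # reconstruction pretext task (assumed)
    opt.zero_grad(); loss.backward(); opt.step()

# Phase 2: supervised fine-tuning on paired data, starting from the pre-trained weights.
opt = torch.optim.Adam(model.parameters(), lr=1e-4)    # smaller learning rate for fine-tuning
for _ in range(10):
    noisy, clean = torch.randn(8, 1, 16000), torch.randn(8, 1, 16000)  # placeholder pairs
    loss = loss_fn(model(noisy), clean)
    opt.zero_grad(); loss.backward(); opt.step()
```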
Text: Jianqiao_Cui_PhD_thesis - Version of Record
Text: Final-thesis-submission-Examination-Mr-Jianqiao-Cui (1) - Restricted to Repository staff only
More information
Published date: 2024
Identifiers
Local EPrints ID: 492126
URI: http://eprints.soton.ac.uk/id/eprint/492126
PURE UUID: 6b41058a-5274-4fb2-8637-3f57f3f2bc5d
Catalogue record
Date deposited: 17 Jul 2024 16:37
Last modified: 15 Aug 2024 02:12
Contributors
Author: Jianqiao Cui