University of Southampton Institutional Repository

Speech enhancement by using deep learning algorithms

Cui, Jianqiao (2024) Speech enhancement by using deep learning algorithms. University of Southampton, Doctoral Thesis, 159pp.

Record type: Thesis (Doctoral)

Abstract

Speech signals are often degraded by ambient noise, which significantly hampers speech intelligibility and quality, posing challenges for both human communication and speech-related technologies. Over the past decade, the advent of deep learning has catalysed remarkable progress in the field of speech enhancement. With the proliferation of smart devices demanding real-time processing capabilities, the development of real-time deep learning-based speech enhancement systems has become increasingly pertinent.
The primary objective of this thesis is to advance the state-of-the-art in real-time speech enhancement algorithms, with a focus on improving the intelligibility and quality of speech in noisy environments. Our research commences with an exploration into the intricacies of auditory perception and the impact of hearing loss on speech comprehension, setting the stage for the development of sophisticated speech enhancement techniques.
Traditional speech enhancement methods are reviewed in Chapter 2, leading to an in-depth discussion of the features critical for distinguishing speech from noise. The work then transitions to deep neural networks, detailing architectures such as LSTM-RNNs and CNNs and their implementation in speech enhancement, and emphasizing the importance of quantitative evaluation.
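To make the masking-based formulation concrete, the following is a minimal sketch, not the thesis's exact network: the STFT settings, layer sizes and 16 kHz sampling rate are illustrative assumptions. It shows an LSTM that predicts a time-frequency mask from log-magnitude spectra and resynthesises the enhanced waveform with the noisy phase.

```python
# Illustrative sketch (not the thesis's exact model): an LSTM that predicts a
# time-frequency mask from log-magnitude STFT features of noisy speech.
import torch
import torch.nn as nn

class LSTMMaskEstimator(nn.Module):
    def __init__(self, n_freq=257, hidden=256, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_freq, hidden, num_layers=layers, batch_first=True)
        self.proj = nn.Linear(hidden, n_freq)

    def forward(self, log_mag):                 # log_mag: (batch, frames, n_freq)
        h, _ = self.lstm(log_mag)
        return torch.sigmoid(self.proj(h))      # mask in [0, 1] per T-F bin

# Feature extraction and masking with a 512-point STFT (assumed parameters).
noisy = torch.randn(1, 16000)                   # 1 s of noisy audio at 16 kHz
window = torch.hann_window(512)
spec = torch.stft(noisy, n_fft=512, hop_length=256, window=window, return_complex=True)
mag, phase = spec.abs(), spec.angle()           # (1, 257, frames)

model = LSTMMaskEstimator()
mask = model(torch.log1p(mag).transpose(1, 2)).transpose(1, 2)
enhanced_spec = mask * mag * torch.exp(1j * phase)   # reuse the noisy phase
enhanced = torch.istft(enhanced_spec, n_fft=512, hop_length=256, window=window)
```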
Chapter 3 delves into the application of Generative Adversarial Networks (GANs) to speech enhancement, building upon existing research to further refine the use of these models. The chapter focuses on the integration of the magnitude spectrum as an input feature, which contributes significantly to the performance of the GAN. It also explores several deep learning architectures as candidate generators within the GAN framework, showcasing the adaptability of GANs and their potential for continued improvement in speech enhancement.
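As an illustration of the adversarial set-up described here, the sketch below shows one GAN training step on magnitude-spectrum frames; the network sizes, the L1 weight and the conditioning on the noisy input are assumptions made for the example, not the configuration used in the thesis.

```python
# Illustrative sketch (assumed shapes, not the thesis's GAN): one adversarial
# training step where the generator maps noisy magnitude spectra to enhanced
# ones and the discriminator scores (noisy, candidate) spectrum pairs.
import torch
import torch.nn as nn

n_freq = 257
G = nn.Sequential(nn.Linear(n_freq, 512), nn.ReLU(), nn.Linear(512, n_freq), nn.Softplus())
D = nn.Sequential(nn.Linear(2 * n_freq, 512), nn.LeakyReLU(0.2), nn.Linear(512, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

noisy = torch.rand(8, n_freq)                   # batch of noisy magnitude frames
clean = torch.rand(8, n_freq)                   # paired clean magnitude frames

# Discriminator step: real pairs -> 1, generated pairs -> 0.
fake = G(noisy).detach()
d_loss = bce(D(torch.cat([noisy, clean], dim=1)), torch.ones(8, 1)) + \
         bce(D(torch.cat([noisy, fake], dim=1)), torch.zeros(8, 1))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: fool the discriminator plus an L1 term on the magnitudes.
fake = G(noisy)
g_loss = bce(D(torch.cat([noisy, fake], dim=1)), torch.ones(8, 1)) + \
         100 * nn.functional.l1_loss(fake, clean)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```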
Attention mechanisms are presented as a driving force for innovation in speech enhancement, with the novel 'Mask First, Compensation Last' topology aiming to reduce speech distortion and residual noise. Motivated by these ideas, Chapter 4 further explores a new cascaded architecture operating on the raw waveform to address the complexity of auditory perception.
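A minimal sketch of the 'mask first, compensation last' topology on raw waveforms is given below; the small convolutional stacks stand in for the far more elaborate networks of Chapter 4 and are purely illustrative.

```python
# Illustrative sketch of a 'mask first, compensation last' cascade on raw
# waveforms (assumed layer sizes): stage 1 estimates a sample-wise mask,
# stage 2 adds a learned correction to reduce the distortion and residual
# noise left behind by the masking stage.
import torch
import torch.nn as nn

class MaskThenCompensate(nn.Module):
    def __init__(self, channels=64, kernel=9):
        super().__init__()
        pad = kernel // 2
        self.mask_net = nn.Sequential(
            nn.Conv1d(1, channels, kernel, padding=pad), nn.ReLU(),
            nn.Conv1d(channels, 1, kernel, padding=pad), nn.Sigmoid())
        self.comp_net = nn.Sequential(
            nn.Conv1d(2, channels, kernel, padding=pad), nn.ReLU(),
            nn.Conv1d(channels, 1, kernel, padding=pad))

    def forward(self, noisy):                    # noisy: (batch, 1, samples)
        masked = self.mask_net(noisy) * noisy    # stage 1: suppress noise
        residual = self.comp_net(torch.cat([noisy, masked], dim=1))
        return masked + residual                 # stage 2: compensate distortion

waveform = torch.randn(2, 1, 16000)              # two 1 s clips at 16 kHz
enhanced = MaskThenCompensate()(waveform)
```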
Chapter 5 introduces a new combination method for speech enhancement, contrasting mapping-based and masking-based approaches and proposing a parallel dual-module system, the Compensation for Complex Domain Network (CCDN), that unifies the magnitude spectrum with complex-domain details.
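The parallel dual-branch idea can be sketched as follows; the module names, layer sizes and the simple additive fusion are assumptions chosen for brevity, not the published CCDN design.

```python
# Illustrative sketch of a parallel dual-branch combination in the spirit of
# CCDN (all details are assumptions): one branch masks the magnitude spectrum,
# the other predicts a complex-domain correction, and the two outputs are
# summed in the complex STFT domain.
import torch
import torch.nn as nn

class DualBranchEnhancer(nn.Module):
    def __init__(self, n_freq=257, hidden=256):
        super().__init__()
        self.mag_branch = nn.Sequential(nn.Linear(n_freq, hidden), nn.ReLU(),
                                        nn.Linear(hidden, n_freq), nn.Sigmoid())
        self.cplx_branch = nn.Sequential(nn.Linear(2 * n_freq, hidden), nn.ReLU(),
                                         nn.Linear(hidden, 2 * n_freq))

    def forward(self, spec):                          # spec: (batch, frames, n_freq), complex
        mag, phase = spec.abs(), spec.angle()
        masked = self.mag_branch(mag) * mag * torch.exp(1j * phase)
        ri = torch.cat([spec.real, spec.imag], dim=-1)
        corr = self.cplx_branch(ri)
        correction = torch.complex(corr[..., :spec.shape[-1]], corr[..., spec.shape[-1]:])
        return masked + correction                    # magnitude and complex details combined

spec = torch.randn(2, 100, 257, dtype=torch.cfloat)   # dummy complex STFT frames
enhanced_spec = DualBranchEnhancer()(spec)
```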
The final chapter addresses the challenge of data mismatch in traditional supervised methods. We propose a strategy that combines unsupervised pre-training with supervised fine-tuning. This approach not only improves speech quality in complex noise environments but also approximates the advantages of supervised learning without requiring paired data. The model's adaptability to real-world noise conditions and its effectiveness across speech enhancement tasks are validated through experimental evaluations and subjective listening tests. The chapter culminates in a robust and practical speech enhancement model, fit for real-world application and made more versatile by the integration of unsupervised learning strategies. In doing so, the work enhances the quality of human communication and addresses the challenges faced by individuals with hearing impairments or in noisy environments.
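The two-phase training strategy can be outlined as in the sketch below; the masked-reconstruction pretext task and all hyperparameters are assumptions made for illustration, since the abstract does not specify the unsupervised objective used in the thesis.

```python
# Illustrative two-phase training loop (the pretext task is an assumption):
# pre-train on unpaired noisy speech features, then fine-tune the same
# network on a smaller paired noisy/clean set at a lower learning rate.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(257, 512), nn.ReLU(), nn.Linear(512, 257))
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
mse = nn.MSELoss()

# Phase 1: unsupervised pre-training -- reconstruct randomly masked bins.
for noisy in [torch.rand(16, 257) for _ in range(100)]:   # unpaired noisy features
    corrupted = noisy * (torch.rand_like(noisy) > 0.2)    # drop ~20% of bins
    loss = mse(model(corrupted), noisy)
    opt.zero_grad(); loss.backward(); opt.step()

# Phase 2: supervised fine-tuning on paired data.
for g in opt.param_groups:
    g["lr"] = 1e-5
paired = [(torch.rand(16, 257), torch.rand(16, 257)) for _ in range(20)]
for noisy, clean in paired:                               # paired noisy/clean features
    loss = mse(model(noisy), clean)
    opt.zero_grad(); loss.backward(); opt.step()
```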

Text: Jianqiao_Cui_PhD_thesis - Version of Record
Available under License University of Southampton Thesis Licence.
Download (4MB)

Text: Final-thesis-submission-Examination-Mr-Jianqiao-Cui (1)
Restricted to Repository staff only

More information

Published date: 2024

Identifiers

Local EPrints ID: 492126
URI: http://eprints.soton.ac.uk/id/eprint/492126
PURE UUID: 6b41058a-5274-4fb2-8637-3f57f3f2bc5d
ORCID for Jianqiao Cui: orcid.org/0000-0002-6016-5574
ORCID for Stefan Bleeck: orcid.org/0000-0003-4378-3394
ORCID for Philip Nelson: orcid.org/0000-0002-9563-3235

Catalogue record

Date deposited: 17 Jul 2024 16:37
Last modified: 18 Jul 2024 01:56

Contributors

Author: Jianqiao Cui
Thesis advisor: Stefan Bleeck
Thesis advisor: Philip Nelson
