University of Southampton Institutional Repository

From simulation to reality: tackling data mismatches in speech enhancement with unsupervised pre-training

Cui, Jianqiao
3961d0d6-9687-4fbc-9e17-93be8bd86a36
Bleeck, Stefan
c888ccba-e64c-47bf-b8fa-a687e87ec16c

Cui, Jianqiao and Bleeck, Stefan (2024) From simulation to reality: tackling data mismatches in speech enhancement with unsupervised pre-training. Inter-Noise 2024, Nantes, France. 25-29 Aug 2024. 10 pp. (In Press)

Record type: Conference or Workshop Item (Paper)

Abstract

In this study, we introduce an innovative speech enhancement methodology that ingeniously combines unsupervised pre-training with supervised fine-tuning. This hybrid approach directly addresses the prevalent data mismatch challenge inherent in traditional supervised speech enhancement methods. Our technique distinctly utilizes unpaired noisy and clean speech data and incorporates varied noises during the pre-training phase. This strategy effectively simulates the benefits of supervised learning, eliminating the need for paired data. Inspired by contrastive learning techniques prevalent in computer vision, our model is adept at preserving essential speech features amidst noise interference. At the heart of our method lies a sophisticated Generative Adversarial Network (GAN) architecture. This includes a generator that proficiently processes both magnitude and complex domain features, alongside a discriminator designed to optimize specific evaluation metrics. Through rigorous experimental evaluations, we validate the robustness and versatility of our approach. It consistently delivers superior speech quality, demonstrating remarkable efficacy in real-world scenarios, which are often characterized by complex and unpredictable noise environments.

Text
JianqiaoCui_Internoise - Accepted Manuscript

More information

Accepted/In Press date: 10 July 2024
Venue - Dates: Inter-Noise 2024, Nantes, France, 2024-08-25 - 2024-08-29
Keywords: Speech enhancement

Identifiers

Local EPrints ID: 492521
URI: http://eprints.soton.ac.uk/id/eprint/492521
PURE UUID: f6507cad-e1c3-4b49-bcac-1757aeb99f7f
ORCID for Jianqiao Cui: orcid.org/0000-0002-6016-5574
ORCID for Stefan Bleeck: orcid.org/0000-0003-4378-3394

Catalogue record

Date deposited: 30 Jul 2024 16:42
Last modified: 31 Jul 2024 01:58


Contributors

Author: Jianqiao Cui
Author: Stefan Bleeck



