From simulation to reality: tackling data mismatches in speech enhancement with unsupervised pre-training
Cui, Jianqiao
3961d0d6-9687-4fbc-9e17-93be8bd86a36
Bleeck, Stefan
c888ccba-e64c-47bf-b8fa-a687e87ec16c
Cui, Jianqiao and Bleeck, Stefan
(2024)
From simulation to reality: tackling data mismatches in speech enhancement with unsupervised pre-training.
Inter-Noise 2024, Nantes, France.
25 - 29 Aug 2024.
10 pp. (In Press)
Record type: Conference or Workshop Item (Paper)
Abstract
In this study, we introduce a speech enhancement methodology that combines unsupervised pre-training with supervised fine-tuning. This hybrid approach directly addresses the data mismatch challenge inherent in traditional supervised speech enhancement methods. Our technique uses unpaired noisy and clean speech data and incorporates varied noises during the pre-training phase, simulating the benefits of supervised learning without the need for paired data. Inspired by contrastive learning techniques prevalent in computer vision, our model preserves essential speech features amidst noise interference. At the heart of our method lies a Generative Adversarial Network (GAN) architecture, comprising a generator that processes both magnitude and complex domain features and a discriminator designed to optimize specific evaluation metrics. Through rigorous experimental evaluations, we validate the robustness and versatility of our approach: it consistently delivers superior speech quality in real-world scenarios, which are often characterized by complex and unpredictable noise environments.
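The paper itself is not reproduced in this record, but the contrastive pre-training idea mentioned in the abstract can be illustrated generically. The sketch below is a minimal NumPy implementation of an InfoNCE-style contrastive loss, assuming clean and noisy utterances have already been mapped to fixed-size embeddings; the function name, batch shapes, and temperature value are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def info_nce_loss(clean_emb, noisy_emb, temperature=0.1):
    """InfoNCE-style contrastive loss (illustrative, not the paper's exact loss).

    Each noisy embedding is pushed to be most similar to the embedding of
    its own clean counterpart (the diagonal) and dissimilar to the rest.
    Inputs are (batch, dim) arrays of paired clean/noisy embeddings.
    """
    # L2-normalise so dot products become cosine similarities
    clean = clean_emb / np.linalg.norm(clean_emb, axis=1, keepdims=True)
    noisy = noisy_emb / np.linalg.norm(noisy_emb, axis=1, keepdims=True)
    logits = noisy @ clean.T / temperature          # (batch, batch) similarities
    logits -= logits.max(axis=1, keepdims=True)     # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # positives sit on the diagonal: noisy item i pairs with clean item i
    return -np.mean(np.diag(log_probs))
```

The loss decreases as each noisy embedding becomes closer to its own clean version than to any other clean utterance in the batch, which captures the abstract's goal of preserving essential speech features under noise interference.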
Text: JianqiaoCui_Internoise - Accepted Manuscript
More information
Accepted/In Press date: 10 July 2024
Venue - Dates:
Inter-Noise 2024, Nantes, France, 2024-08-25 - 2024-08-29
Keywords:
Speech enhancement
Identifiers
Local EPrints ID: 492521
URI: http://eprints.soton.ac.uk/id/eprint/492521
PURE UUID: f6507cad-e1c3-4b49-bcac-1757aeb99f7f
Catalogue record
Date deposited: 30 Jul 2024 16:42
Last modified: 31 Jul 2024 01:58
Contributors
Author: Jianqiao Cui
Author: Stefan Bleeck