Improved speech enhancement by using both clean speech and ‘clean’ noise

Generally, speech enhancement (SE) models based on supervised deep learning use input features from both noisy and clean speech but not from the noise itself. We suggest here that this ‘clean’ background noise, before it is mixed with speech, can also help SE, which to our knowledge has not yet been described. In our proposed model, not only the speech but also the noise is enhanced initially, and the two are later combined for improved intelligibility and quality. We also present a second innovation to better capture contextual information, which traditional networks often handle poorly. To leverage both speech and background noise information as well as long-term context, this paper describes a sequence-to-sequence (S2S) mapping structure using a novel two-path speech enhancement system consisting of two parallel paths: a Noise Enhancement Path (NEP) and a Speech Enhancement Path (SEP). In the NEP, an encoder-decoder structure is used to enhance only the ‘clean’ noise, while the SEP is used to suppress the background noise in the noisy speech. In the SEP, a Hierarchical Attention (HA) mechanism is adopted to capture long-range dependencies in the sequence. In the NEP, we use the traditional gated control mechanism from Conv-TasNet but improve it by adding dilated convolution to increase the receptive field. Experiments are conducted on the LibriSpeech dataset, and the results show that the proposed model performs better than recent models on various measures, including ESTOI and PESQ scores. We conclude that the simple speech-plus-noise paradigm often adopted for training such models is not optimal.
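To illustrate why adding dilation to a gated convolution widens the receptive field, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the zero-padded "same-length" convolution, and the tanh-filter times sigmoid-gate form of the gate are all assumptions chosen for illustration, and the abstract does not specify kernel sizes or dilation rates.

```python
import math


def dilated_conv1d(x, kernel, dilation=1):
    """Zero-padded 1-D convolution with an odd-length kernel centred on each
    sample; taps are spaced `dilation` samples apart (hypothetical helper)."""
    half = len(kernel) // 2
    out = []
    for t in range(len(x)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + (j - half) * dilation
            if 0 <= idx < len(x):  # samples outside the signal count as zero
                acc += w * x[idx]
        out.append(acc)
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_dilated_conv1d(x, filter_kernel, gate_kernel, dilation=1):
    """Assumed gating form: tanh(filter branch) scaled by sigmoid(gate branch)."""
    f = dilated_conv1d(x, filter_kernel, dilation)
    g = dilated_conv1d(x, gate_kernel, dilation)
    return [math.tanh(a) * sigmoid(b) for a, b in zip(f, g)]


def receptive_field(kernel_size, dilations):
    """Receptive field (in samples) of a stack of stride-1 convolutions,
    one layer per entry in `dilations`."""
    return 1 + (kernel_size - 1) * sum(dilations)


if __name__ == "__main__":
    # Four layers with kernel size 3: plain versus exponentially dilated.
    print(receptive_field(3, [1, 1, 1, 1]))
    print(receptive_field(3, [1, 2, 4, 8]))
```

With the same number of layers and weights, the dilated stack covers a much longer span of the input, which is the property the abstract attributes to the modified NEP gate.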

Speech enhancement, hierarchical attention mechanism, supervised speech enhancement, separate paths, magnitude, gated control

10.1109/BDAI59165.2023.10256737

192-196

IEEE

Cui, Jianqiao

Bleeck, Stefan

27 September 2023

Cui, Jianqiao and Bleeck, Stefan (2023) Improved speech enhancement by using both clean speech and ‘clean’ noise. In 2023 IEEE 6th International Conference on Big Data and Artificial Intelligence (BDAI). IEEE. pp. 192-196. (doi:10.1109/BDAI59165.2023.10256737).