The University of Southampton
University of Southampton Institutional Repository

Towards more efficient DNN-based speech enhancement using quantized correlation mask

Abdullah, Salinna, Zamani, Majid and Demosthenous, Andreas (2021) Towards more efficient DNN-based speech enhancement using quantized correlation mask. IEEE Access, 9, 24350-24362, [9345671]. (doi:10.1109/ACCESS.2021.3056711).

Record type: Article

Abstract

Many studies on deep learning-based speech enhancement (SE) that utilize the computational auditory scene analysis method typically employ the ideal binary mask or the ideal ratio mask to reconstruct the enhanced speech signal. However, many SE applications in real scenarios demand a desirable balance between denoising capability and computational cost. In this study, first, an improvement over the ideal ratio mask is proposed to attain superior SE performance by introducing an efficient adaptive correlation-based factor for adjusting the ratio mask. The proposed method exploits the correlation coefficients among the noisy speech, noise and clean speech to effectively redistribute the power ratio of the speech and noise during the ratio mask construction phase. Second, to make the supervised SE system more computationally efficient, quantization techniques are considered to reduce the number of bits needed to represent floating-point numbers, leading to a more compact SE model. The proposed quantized correlation mask is used in conjunction with a 4-layer deep neural network (DNN-QCM) comprising dropout regularization, pre-training and noise-aware training to derive a robust, high-order mapping for enhancement and to improve generalization in unseen conditions. Results show that the quantized correlation mask outperforms the conventional ratio mask representation and the other SE algorithms used for comparison. Compared to a DNN with the ideal ratio mask as its learning target, the DNN-QCM improved the short-time objective intelligibility score by approximately 6.5% and the perceptual evaluation of speech quality score by 11.0%. The quantization method reduces the neural network weights from a 32-bit to a 5-bit representation while effectively suppressing stationary and non-stationary noise. Timing analyses also show that the techniques incorporated to increase the compactness of the proposed DNN-QCM system reduce training and inference time by 15.7% and 10.5%, respectively.
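For illustration only, the Python/NumPy sketch below shows the general ingredients named in the abstract: a conventional ideal ratio mask, a correlation-derived adjustment factor, and uniform 5-bit quantization. The function names, the adjustment rule and the quantizer are assumptions made for this sketch and are not taken from the article, whose exact formulation of the quantized correlation mask may differ.

# Illustrative sketch only, not the authors' implementation: a conventional
# ideal ratio mask (IRM), a hypothetical correlation-based adjustment factor,
# and uniform 5-bit quantization of the resulting mask values.
import numpy as np

def ideal_ratio_mask(speech_power, noise_power, beta=0.5):
    # Standard IRM per time-frequency bin: (S / (S + N))^beta.
    return (speech_power / (speech_power + noise_power + 1e-12)) ** beta

def correlation_factor(noisy, clean, noise):
    # Assumed adjustment: weigh the mask by how strongly the noisy signal
    # correlates with clean speech relative to noise (illustrative rule only).
    r_clean = np.corrcoef(clean, noisy)[0, 1]
    r_noise = np.corrcoef(noise, noisy)[0, 1]
    return float(np.clip(abs(r_clean) / (abs(r_clean) + abs(r_noise) + 1e-12), 0.0, 1.0))

def quantize_uniform(x, n_bits=5):
    # Uniform quantization to 2**n_bits levels over the observed range,
    # a common baseline for compressing mask values or network weights.
    levels = 2 ** n_bits - 1
    x_min, x_max = float(x.min()), float(x.max())
    step = (x_max - x_min) / levels if x_max > x_min else 1.0
    return np.round((x - x_min) / step) * step + x_min

# Example with random spectra standing in for real STFT power spectrograms.
rng = np.random.default_rng(0)
S = rng.random((257, 100))                     # clean-speech power
N = rng.random((257, 100))                     # noise power
mask = ideal_ratio_mask(S, N)
alpha = correlation_factor(rng.standard_normal(1000),
                           rng.standard_normal(1000),
                           rng.standard_normal(1000))
qcm = quantize_uniform(np.clip(alpha * mask, 0.0, 1.0), n_bits=5)

Uniform quantization over the observed value range is used here simply because it is a common baseline; the article's weight-quantization scheme and its 5-bit format are described in the full text.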

This record has no associated files available for download.

More information

Published date: 3 February 2021
Additional Information: Publisher Copyright: © 2013 IEEE.
Keywords: Correlation coefficients, deep neural network, dynamic noise-aware training, quantization, speech enhancement, training targets

Identifiers

Local EPrints ID: 489170
URI: http://eprints.soton.ac.uk/id/eprint/489170
ISSN: 2169-3536
PURE UUID: 2bf3a6be-8435-4802-b1c1-92ad649650f0
ORCID for Majid Zamani: orcid.org/0009-0007-0844-473X

Catalogue record

Date deposited: 16 Apr 2024 16:37
Last modified: 18 Apr 2024 02:09

Contributors

Author: Salinna Abdullah
Author: Majid Zamani
Author: Andreas Demosthenous


