A compact CNN-based speech enhancement with adaptive filter design using Gabor function and region-aware convolution
Speech enhancement (SE) is used in many applications, such as hearing devices, to improve speech intelligibility and quality. Convolutional neural network-based (CNN-based) SE algorithms in the literature often employ generic convolutional filters that are not optimized for SE applications. This paper presents a CNN-based SE algorithm with an adaptive filter design (named 'CNN-AFD') using Gabor functions and region-aware convolution. The proposed algorithm incorporates fixed Gabor functions into convolutional filters to model human auditory processing and thereby improve denoising performance. The feature maps obtained from the Gabor-incorporated convolutional layers serve as learnable guided masks (tuned during backpropagation) for generating adaptive, custom region-aware filters. These custom filters extract features from speech regions (hence 'region-aware') while maintaining translation invariance. To reduce the high inference cost of the CNN, skip convolution and activation-analysis-based pruning are explored. Employing skip convolution reduced the training time per epoch by close to 40%. Pruning neurons with high numbers of zero activations complements skip convolution and reduces the number of model parameters by more than 30%. The proposed CNN-AFD outperformed all four CNN-based SE baseline algorithms (i.e., a CNN-based SE employing generic filters, a CNN-based SE without region-aware convolution, a CNN-based SE trained with complex spectrograms and a CNN-based SE processing in the time domain), achieving average short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ) and logarithmic spectral distance (LSD) scores of 0.95, 1.82 and 0.82, respectively, when denoising speech contaminated with NOISEX-92 noises at -5, 0 and 5 dB signal-to-noise ratios (SNRs).
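The abstract says fixed Gabor functions are incorporated into the convolutional filters to model auditory processing. A 2D Gabor function (a Gaussian envelope multiplying a sinusoidal carrier) is a common model of spectro-temporal receptive fields, so the sketch below builds one over a frequency-by-time kernel. It assumes that "incorporating" means multiplying a learnable weight kernel element-wise by a fixed Gabor kernel; the helper names `gabor_kernel` and `gabor_modulated_filter` and every parameter value are hypothetical, not taken from the paper.

```python
import math


def gabor_kernel(size, sigma, theta, lam, gamma=1.0, psi=0.0):
    """Real 2D Gabor kernel (size x size, size odd) as a list of rows.

    g(x, y) = exp(-(x'^2 + gamma^2 * y'^2) / (2 * sigma^2)) * cos(2*pi*x'/lam + psi),
    where (x', y') are the offsets from the kernel centre rotated by theta.
    Rows can be read as frequency bins and columns as time frames.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the offset (x, y) by theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2 * sigma ** 2))
            row.append(envelope * math.cos(2 * math.pi * xr / lam + psi))
        kernel.append(row)
    return kernel


def gabor_modulated_filter(weights, gabor):
    """Assumption (hypothetical): incorporation = element-wise product of
    learnable weights and a fixed Gabor kernel of the same shape."""
    return [[w * g for w, g in zip(w_row, g_row)] for w_row, g_row in zip(weights, gabor)]
```

In a trainable model, `weights` would be the learnable parameters and `gabor` would stay fixed, so gradients would update only `weights` while the Gabor shape constrains the effective filter.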
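The abstract describes feature maps acting as guided masks that select region-aware filters while keeping translation invariance. The sketch below is modelled on dynamic region-aware convolution (DRConv, Chen et al., 2021), which is an assumption about how this paper realizes the idea: each position is assigned to one of m regions by taking the argmax over m guide maps, and the output there is computed with that region's kernel. Positions in the same region share a kernel, so within a region the operation is an ordinary, translation-invariant convolution. The kernels are passed in directly here; the filter-generation branch is omitted, and `correlate_at` and `region_aware_conv` are hypothetical names.

```python
def correlate_at(x, kernel, i, j):
    """Cross-correlate an odd-sized kernel with x centred at (i, j), zero-padding outside x."""
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    total = 0.0
    for a in range(kh):
        for b in range(kw):
            ii, jj = i + a - ph, j + b - pw
            if 0 <= ii < len(x) and 0 <= jj < len(x[0]):
                total += kernel[a][b] * x[ii][jj]
    return total


def region_aware_conv(x, guide, filters):
    """Region-aware convolution sketch (assumed DRConv-style, hypothetical helper).

    x:       H x W input map.
    guide:   m guide maps (H x W each), e.g. outputs of a Gabor-incorporated layer.
    filters: m odd-sized kernels, one per region.
    Each position takes the kernel of the region whose guide value is largest there.
    """
    if len(guide) != len(filters):
        raise ValueError("need one filter per guide map")
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # Hard region assignment: argmax over the guide maps at (i, j).
            region = max(range(len(guide)), key=lambda r: guide[r][i][j])
            out[i][j] = correlate_at(x, filters[region], i, j)
    return out
```

The hard argmax has no useful gradient, so a trainable version needs a differentiable relaxation for the guide maps to be learned by backpropagation; DRConv, for example, uses a softmax-based gradient approximation.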
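The abstract prunes neurons with high numbers of zero activations. A common way to score this is the Average Percentage of Zeros (APoZ, Hu et al., 2016): for each channel, the fraction of its post-ReLU outputs that are exactly zero over a calibration set. The sketch below computes APoZ per channel and keeps channels at or below a threshold; treating the paper's activation analysis as APoZ, the default threshold and the helper names are all assumptions.

```python
def apoz_per_channel(activations):
    """activations: list over samples; each sample is a list over channels,
    each channel a list of post-ReLU outputs. Returns the zero fraction per channel."""
    n_channels = len(activations[0])
    zeros = [0] * n_channels
    counts = [0] * n_channels
    for sample in activations:
        for c, outputs in enumerate(sample):
            zeros[c] += sum(1 for v in outputs if v == 0.0)
            counts[c] += len(outputs)
    return [z / n for z, n in zip(zeros, counts)]


def channels_to_keep(activations, threshold=0.9):
    """Keep channels whose APoZ is at or below the threshold (hypothetical default)."""
    return [c for c, apoz in enumerate(apoz_per_channel(activations)) if apoz <= threshold]
```

Removing a channel deletes that layer's filter and the matching input slice of every filter in the next layer, which is where the parameter savings come from.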
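The evaluation adds NOISEX-92 noise to speech at -5, 0 and 5 dB SNR. A minimal sketch of the usual recipe: scale the noise so that 10·log10(P_speech / P_noise) equals the target SNR, then add it. It assumes equal-length signals and an SNR measured over the whole utterance; the paper may use another convention (for example, active speech level), and `mix_at_snr` is a hypothetical helper.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_speech / P_noise) == snr_db, then add it to speech.

    Assumption: powers are mean squares over the whole signal."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Required noise power is p_speech / 10^(snr/10); scale amplitude by the sqrt of the ratio.
    scale = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]
```

For example, with speech power 1 and noise power 0.25 at 0 dB, the noise is scaled by sqrt(1 / 0.25) = 2.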
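The abstract reports LSD alongside STOI and PESQ. STOI and PESQ depend on reference implementations, but LSD is short enough to sketch. One common definition, assumed here because the abstract does not state the exact variant (frame averaging, log base, floor value), is LSD = (1/T) Σ_t sqrt((1/F) Σ_f (10 log10 |S(t,f)|^2 - 10 log10 |Ŝ(t,f)|^2)^2), in dB, where lower is better. `log_spectral_distance` and its `eps` floor are hypothetical.

```python
import math


def log_spectral_distance(ref_power, est_power, eps=1e-10):
    """ref_power, est_power: T x F power spectra |S(t, f)|^2 as nested lists.

    Returns the mean over frames of the RMS difference between the dB spectra.
    eps (assumed) avoids log10(0) in silent bins."""
    if len(ref_power) != len(est_power):
        raise ValueError("spectra must have the same number of frames")
    per_frame = []
    for ref_frame, est_frame in zip(ref_power, est_power):
        sq = [(10 * math.log10(r + eps) - 10 * math.log10(e + eps)) ** 2
              for r, e in zip(ref_frame, est_frame)]
        per_frame.append(math.sqrt(sum(sq) / len(sq)))
    return sum(per_frame) / len(per_frame)
```

A uniform 10x power error (10 dB in every bin) gives an LSD of about 10 dB, which is a quick sanity check of the scale.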
Keywords: activation analysis, adaptive filter design, convolutional neural network, Gabor filter, pruning, skip convolution, speech enhancement
Pages: 130657-130671
Abdullah, Salinna
Zamani, Majid
Demosthenous, Andreas
Abdullah, Salinna, Zamani, Majid and Demosthenous, Andreas (2022) A compact CNN-based speech enhancement with adaptive filter design using Gabor function and region-aware convolution. IEEE Access, 10, 130657-130671. (doi:10.1109/ACCESS.2022.3228744).
More information
Accepted/In Press date: 9 December 2022
e-pub ahead of print date: 12 December 2022
Additional Information:
Publisher Copyright: © 2013 IEEE.
Identifiers
Local EPrints ID: 489270
URI: http://eprints.soton.ac.uk/id/eprint/489270
ISSN: 2169-3536
PURE UUID: 92fd72ac-4ee1-4bc9-868d-592b1565d94f
Catalogue record
Date deposited: 18 Apr 2024 17:05
Last modified: 06 Jun 2024 02:19
Contributors
Author: Salinna Abdullah
Author: Majid Zamani
Author: Andreas Demosthenous