The University of Southampton
University of Southampton Institutional Repository

A compact CNN-based speech enhancement with adaptive filter design using Gabor function and region-aware convolution



Abdullah, Salinna, Zamani, Majid and Demosthenous, Andreas (2022) A compact CNN-based speech enhancement with adaptive filter design using Gabor function and region-aware convolution. IEEE Access, 10, 130657-130671. (doi:10.1109/ACCESS.2022.3228744).

Record type: Article

Abstract

Speech enhancement (SE) is used in many applications, such as hearing devices, to improve speech intelligibility and quality. Convolutional neural network-based (CNN-based) SE algorithms in the literature often employ generic convolutional filters that are not optimized for SE applications. This paper presents a CNN-based SE algorithm with an adaptive filter design (named 'CNN-AFD') using Gabor functions and region-aware convolution. The proposed algorithm incorporates fixed Gabor functions into convolutional filters to model human auditory processing for improved denoising performance. The feature maps obtained from the Gabor-incorporated convolutional layers serve as learnable guided masks (tuned during backpropagation) for generating adaptive, custom region-aware filters. The custom filters extract features from speech regions (i.e., they are 'region-aware') while maintaining translation invariance. To reduce the high inference cost of the CNN, skip convolution and pruning based on activation analysis are explored. Employing skip convolution reduced the training time per epoch by close to 40%. Pruning neurons with high numbers of zero activations complements skip convolution and reduces the number of model parameters by more than 30%. The proposed CNN-AFD outperformed all four CNN-based SE baseline algorithms (i.e., a CNN-based SE employing generic filters, a CNN-based SE without region-aware convolution, a CNN-based SE trained with complex spectrograms and a CNN-based SE processing in the time domain) with average scores of 0.95, 1.82 and 0.82 in short-time objective intelligibility (STOI), perceptual evaluation of speech quality (PESQ) and logarithmic spectral distance (LSD), respectively, when tasked to denoise speech contaminated with NOISEX-92 noises at -5, 0 and 5 dB signal-to-noise ratios (SNRs).
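
As a rough illustration of two ideas named in the abstract, the sketch below builds a fixed 2-D Gabor kernel, modulates a set of learnable filters with a small Gabor bank, and flags channels whose activations are mostly zero as pruning candidates. It is not the authors' implementation: the abstract does not give the Gabor parameters, the way the functions are combined with the filters, or the pruning threshold, so the element-wise modulation, the function names (gabor_kernel, gabor_modulate, channels_to_keep) and the 0.9 threshold are all assumptions made for this example.

```python
import numpy as np


def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0, psi=0.0):
    """Real part of a 2-D Gabor function sampled on an odd size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate system by the orientation angle theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    # Gaussian envelope multiplied by a cosine carrier.
    envelope = np.exp(-(x_r ** 2 + (gamma * y_r) ** 2) / (2.0 * sigma ** 2))
    carrier = np.cos(2.0 * np.pi * x_r / wavelength + psi)
    return envelope * carrier


def gabor_modulate(weights, bank):
    """Element-wise modulation of learnable filters by fixed Gabor kernels.

    Assumed combination rule (not stated in the abstract).
    weights: (n_filters, k, k), bank: (n_orient, k, k)
    returns: (n_filters, n_orient, k, k)
    """
    return weights[:, None, :, :] * bank[None, :, :, :]


def zero_activation_fraction(activations):
    """Per-channel fraction of zeros in post-ReLU activations.

    activations: (n_samples, n_channels, height, width)
    """
    return (activations == 0).mean(axis=(0, 2, 3))


def channels_to_keep(activations, threshold=0.9):
    """Indices of channels whose zero fraction is below the (assumed) threshold."""
    return np.flatnonzero(zero_activation_fraction(activations) < threshold)


if __name__ == "__main__":
    # Four orientations of a 5 x 5 Gabor kernel (illustrative parameters).
    thetas = np.linspace(0.0, np.pi, 4, endpoint=False)
    bank = np.stack([gabor_kernel(5, 4.0, t, 2.0) for t in thetas])
    rng = np.random.default_rng(0)
    weights = rng.standard_normal((8, 5, 5))
    print(gabor_modulate(weights, bank).shape)

    # Toy activations in which channel 1 never fires.
    acts = np.ones((2, 3, 4, 4))
    acts[:, 1] = 0.0
    print(channels_to_keep(acts))
```

In a real network the Gabor bank would stay fixed while the modulated weights are trained, and the zero fractions would be collected over a validation set before the flagged channels are removed.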

Text
A_Compact_CNN-Based_Speech_Enhancement_With_Adaptive_Filter_Design_Using_Gabor_Function_and_Region-Aware_Convolution - Version of Record
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 9 December 2022
e-pub ahead of print date: 12 December 2022
Additional Information: Publisher Copyright: © 2013 IEEE.
Keywords: activation analysis, adaptive filter design, convolutional neural network, Gabor filter, pruning, skip convolution, speech enhancement

Identifiers

Local EPrints ID: 489270
URI: http://eprints.soton.ac.uk/id/eprint/489270
ISSN: 2169-3536
PURE UUID: 92fd72ac-4ee1-4bc9-868d-592b1565d94f
ORCID for Majid Zamani: orcid.org/0009-0007-0844-473X

Catalogue record

Date deposited: 18 Apr 2024 17:05
Last modified: 19 Apr 2024 02:06

Contributors

Author: Salinna Abdullah
Author: Majid Zamani
Author: Andreas Demosthenous
