Hardware efficient speech enhancement with noise aware multi-target deep learning
This paper describes a supervised speech enhancement (SE) method utilising a noise-aware four-layer deep neural network and training-target switching. For optimal speech denoising, the SE system, trained with multiple-target joint learning, switches among mapping-based, masking-based and complementary processing, depending on the level of noise contamination detected. Optimisation techniques, including ternary quantisation, structural pruning, efficient sparse matrix representation and cost-effective approximations for complex computations, were implemented to reduce area, memory and power requirements. Compression of up to 19.1× was obtained, and all weights could be stored in on-chip memory. When processing NOISEX-92 noises, the system achieved average short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) scores of 0.81 and 1.62, respectively, outperforming SE algorithms trained with only a single learning target. The proposed SE processor was implemented on a field programmable gate array (FPGA) for proof of concept. Mapping the design onto a 65-nm CMOS process led to a chip core area of 3.88 mm² and a power consumption of 1.91 mW at a 10 MHz clock frequency.
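Ternary quantisation and sparse storage work together: once weights are restricted to {-α, 0, +α}, each nonzero needs only a sign and a position, and the zeros need not be stored at all. The abstract does not state which threshold rule or sparse format the processor uses, so the sketch below is an illustration under assumptions, not the authors' method: it borrows the threshold heuristic from Ternary Weight Networks (Δ = 0.7 · mean|W|) and stores the result in generic compressed sparse row (CSR) form. The function names `ternary_quantise`, `dequantise` and `to_csr` are hypothetical.

```python
import numpy as np

def ternary_quantise(w, delta_scale=0.7):
    """Map weights to ternary codes {-1, 0, +1} plus one shared scale alpha.

    Assumed rule (Ternary Weight Networks heuristic): weights with
    |w| <= delta_scale * mean(|w|) become 0; the rest keep their sign.
    """
    w = np.asarray(w, dtype=np.float64)
    delta = delta_scale * np.mean(np.abs(w))        # pruning threshold
    mask = np.abs(w) > delta                        # weights kept as nonzero
    codes = np.zeros(w.shape, dtype=np.int8)
    codes[mask] = np.sign(w[mask]).astype(np.int8)  # sign of each kept weight
    # Shared scale: mean magnitude of the kept weights (0 if none survive).
    alpha = float(np.mean(np.abs(w[mask]))) if mask.any() else 0.0
    return codes, alpha

def dequantise(codes, alpha):
    """Reconstruct approximate weights as alpha * code."""
    return alpha * codes.astype(np.float64)

def to_csr(codes):
    """Store a 2-D ternary code matrix in generic CSR form (assumed format).

    Returns row pointers, column indices of nonzeros, and their signs.
    """
    row_ptr, col_idx, signs = [0], [], []
    for row in codes:
        nz = np.flatnonzero(row)          # columns holding a nonzero code
        col_idx.extend(nz.tolist())
        signs.extend(row[nz].tolist())    # each entry is +1 or -1
        row_ptr.append(len(col_idx))      # end offset of this row
    return row_ptr, col_idx, signs

w = np.array([0.9, -0.05, 0.4, -0.8, 0.02, 0.1])
codes, alpha = ternary_quantise(w)
row_ptr, col_idx, signs = to_csr(codes.reshape(2, 3))
```

Here half of the six example weights fall below the threshold and are dropped, and the three survivors share a single scale α = 0.7, so only their positions and signs need storing. A hardware design would typically pack the sign and index fields into fixed-width words; that packing is outside this sketch.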
Deep neural network, digital circuits, field programmable gate array (FPGA), mapping, masking, multi-target learning, speech enhancement, structured pruning, ternary quantisation
Pages: 141-152
Abdullah, Salinna
Zamani, Majid
Demosthenous, Andreas
3 May 2024
Abdullah, Salinna, Zamani, Majid and Demosthenous, Andreas (2024) Hardware efficient speech enhancement with noise aware multi-target deep learning. IEEE Open Journal of Circuits and Systems, 5, 141-152. (doi:10.1109/OJCAS.2024.3389100).
More information
Accepted/In Press date: 8 April 2024
e-pub ahead of print date: 16 April 2024
Published date: 3 May 2024
Identifiers
Local EPrints ID: 490494
URI: http://eprints.soton.ac.uk/id/eprint/490494
PURE UUID: d32856aa-ea49-4a75-a5dc-c20ba43a211e
Catalogue record
Date deposited: 28 May 2024 17:10
Last modified: 29 May 2024 02:08
Contributors
Author: Salinna Abdullah
Author: Majid Zamani
Author: Andreas Demosthenous