University of Southampton Institutional Repository

Hardware efficient speech enhancement with noise aware multi-target deep learning

This paper describes a supervised speech enhancement (SE) method utilising a noise-aware four-layer deep neural network and training target switching. For optimal speech denoising, the SE system, trained with multiple-target joint learning, switches between mapping-based, masking-based, and complementary processing, depending on the level of noise contamination detected. Optimisation techniques, including ternary quantisation, structural pruning, efficient sparse matrix representation and cost-effective approximations for complex computations, were implemented to reduce area, memory, and power requirements. Up to 19.1x compression was obtained, and all weights could be stored in on-chip memory. When processing NOISEX-92 noises, the system achieved average short-time objective intelligibility (STOI) and perceptual evaluation of speech quality (PESQ) scores of 0.81 and 1.62, respectively, outperforming SE algorithms trained with only a single learning target. The proposed SE processor was implemented on a field programmable gate array (FPGA) for proof of concept. Mapping the design to a 65-nm CMOS process led to a chip core area of 3.88 mm² and a power consumption of 1.91 mW when operating at a 10 MHz clock frequency.
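
As an illustration of the ternary quantisation named in the abstract, the following is a minimal sketch of one common scheme: each weight is mapped to -1, 0 or +1, and a single shared scale factor is kept per weight group. The threshold rule (0.7 times the mean absolute weight) and the names `ternarise`, `dequantise` and `delta_scale` are assumptions made for this example; this record does not state which rule the authors used, and the paper's method may differ.

```python
from typing import List, Tuple


def ternarise(weights: List[float], delta_scale: float = 0.7) -> Tuple[List[int], float]:
    """Quantise weights to codes in {-1, 0, +1} plus one shared scale.

    Assumption: threshold = delta_scale * mean(|w|), a widely used
    heuristic for ternary weights, not necessarily the paper's rule.
    """
    if not weights:
        return [], 0.0
    threshold = delta_scale * sum(abs(w) for w in weights) / len(weights)
    # Small weights become 0; the rest keep only their sign.
    codes = [0 if abs(w) <= threshold else (1 if w > 0 else -1) for w in weights]
    # The shared scale is the mean magnitude of the weights that survived.
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale


def dequantise(codes: List[int], scale: float) -> List[float]:
    """Rebuild approximate weights from ternary codes and the shared scale."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    codes, scale = ternarise([0.9, -0.05, 0.4, -0.8, 0.02])
    print(codes, scale)
    print(dequantise(codes, scale))
```

Because every code takes one of only three values, it can be stored in at most two bits, and multiplying by a code reduces to a sign change or a skip, which is why this kind of quantisation is attractive for hardware.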


Abdullah, Salinna, Zamani, Majid and Demosthenous, Andreas (2024) Hardware efficient speech enhancement with noise aware multi-target deep learning. IEEE Open Journal of Circuits and Systems, 5, 141-152. (doi:10.1109/OJCAS.2024.3389100).

Record type: Article

Text
Hardware_Efficient_Speech_Enhancement_With_Noise_Aware_Multi-Target_Deep_Learning - Version of Record
Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 8 April 2024
e-pub ahead of print date: 16 April 2024
Published date: 3 May 2024
Keywords: Deep neural network, digital circuits, field programmable gate array (FPGA), mapping, masking, multi-target learning, speech enhancement, structured pruning, ternary quantisation

Identifiers

Local EPrints ID: 490494
URI: http://eprints.soton.ac.uk/id/eprint/490494
PURE UUID: d32856aa-ea49-4a75-a5dc-c20ba43a211e
ORCID for Majid Zamani: orcid.org/0009-0007-0844-473X

Catalogue record

Date deposited: 28 May 2024 17:10
Last modified: 06 Nov 2024 03:10

Contributors

Author: Salinna Abdullah
Author: Majid Zamani
Author: Andreas Demosthenous
