The University of Southampton
University of Southampton Institutional Repository

Speech enhancement based on neural networks for improved speech perception in noise by people with hearing loss

Goehring, Tobias (2016) Speech enhancement based on neural networks for improved speech perception in noise by people with hearing loss. University of Southampton, Doctoral Thesis, 161pp.

Record type: Thesis (Doctoral)

Abstract

Hearing loss can lead to communication problems, affect psychological wellbeing and reduce the quality of life of the affected person. One of the main challenges for people with hearing loss is perceiving speech in noisy environments. Although hearing devices such as hearing aids and cochlear implants provide high levels of speech understanding in quiet acoustic conditions, they still fail to restore speech intelligibility in background noise to the level achieved by a healthy auditory system.

This thesis concerns the development and evaluation of a speech enhancement algorithm for hearing devices that aims to improve speech perception in noise for people with hearing loss. The proposed algorithm applies an artificial neural network to the task of speech enhancement. It decomposes the noisy input signal into time-frequency units, extracts a set of auditory-inspired features and feeds them to the neural network, whose output is an estimate of which frequency channels carry more perceptually important information, quantified by the energy ratio between the speech and the background noise. This estimate is then used to retain the frequency channels with high speech energy and to attenuate the noise-dominated ones. The hypothesis is that this processing improves speech intelligibility in noise, provided that the estimate of the energy ratio is accurate. The neural network is optimized for this task using large amounts of acoustic training data and has been evaluated in several listening experiments with people with hearing loss, including users of hearing aids and users of cochlear implants.

Several aspects of the proposed neural-network-based speech enhancement framework have been investigated. The length of the temporal window that can be used to extract auditory-inspired features was assessed by measuring how much processing delay people with hearing loss can tolerate.
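
The processing chain described above (time-frequency decomposition, a per-unit estimate of the speech-to-noise energy ratio, and attenuation of noise-dominated units) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: a short-time Fourier transform (STFT) stands in for the time-frequency decomposition, the auditory-inspired features and the neural network are omitted, and an oracle ratio mask computed from the separately known speech and noise takes the place of the network's estimate (a target of this kind is what a network could be trained to predict). The window length, the `gain_floor_db` parameter and all function names are hypothetical. The delay constant illustrates why the temporal window matters: in a streaming frame-based scheme, a longer window means a longer processing delay.

```python
import numpy as np

FS = 16000              # assumed sampling rate (Hz)
N_FFT, HOP = 256, 128   # assumed 16 ms window with 50% overlap
WINDOW = np.sqrt(np.hanning(N_FFT + 1)[:-1])  # periodic sqrt-Hann; squared, it overlap-adds to 1

# In a streaming version, each output sample waits for all frames that overlap it,
# so the algorithmic delay is roughly one window length.
ALGORITHMIC_DELAY_MS = 1000 * N_FFT / FS

def stft(x):
    """Split x into windowed frames and return one-sided spectra (frames x bins)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP:i * HOP + N_FFT] * WINDOW for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, length):
    """Overlap-add resynthesis with the same window (exact away from the signal edges)."""
    frames = np.fft.irfft(spec, n=N_FFT, axis=1) * WINDOW
    out = np.zeros(max(length, (len(frames) - 1) * HOP + N_FFT))
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
    return out[:length]

def ratio_mask(speech_spec, noise_spec):
    """Oracle per-unit speech energy / (speech + noise energy), in [0, 1]."""
    s, n = np.abs(speech_spec) ** 2, np.abs(noise_spec) ** 2
    return s / (s + n + 1e-12)

def apply_mask(noisy_spec, mask, gain_floor_db=-20.0):
    """Keep speech-dominated units; attenuate noise-dominated ones down to a gain floor."""
    floor = 10 ** (gain_floor_db / 20)
    return noisy_spec * np.maximum(mask, floor)

def snr_db(reference, estimate):
    return 10 * np.log10(np.sum(reference ** 2) / np.sum((reference - estimate) ** 2))

rng = np.random.default_rng(0)
t = np.arange(FS) / FS
speech = np.sin(2 * np.pi * 500 * t)   # stand-in for speech: a 500 Hz tone
noise = 0.5 * rng.standard_normal(FS)  # stand-in for background noise
noisy = speech + noise

mask = ratio_mask(stft(speech), stft(noise))
enhanced = istft(apply_mask(stft(noisy), mask), len(noisy))
print(f"SNR before: {snr_db(speech, noisy):.1f} dB, after: {snr_db(speech, enhanced):.1f} dB")
```

Raising `gain_floor_db` toward 0 dB weakens the noise reduction. A single parameter of this kind is one plausible way to expose noise-reduction strength to a listener, although the abstract does not specify which parameters were used.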
Hearing loss was shown to significantly increase the tolerance to processing delay, which may allow the use of the longer temporal windows required by more complex speech enhancement algorithms. A second listening study on processing delay indicated that long-term acclimatisation may further increase the tolerable processing delay. Together, the greater delay tolerance of people with hearing loss and the effects of long-term acclimatisation may permit longer processing windows, enabling complex algorithms to run in real time without disturbing the user of a hearing device.

The speech enhancement framework was then evaluated for its benefit to hearing-aid users understanding speech in challenging environments. Speech intelligibility and quality scores were obtained from subjects with mild to moderate hearing loss listening to sentences in speech-shaped noise and multi-talker babble after processing with the algorithm. The proposed approach, using an auditory-inspired feature set, significantly improved both intelligibility and quality scores. The results also indicated better performance than a more classical Wiener filter algorithm. Furthermore, the neural-network-based approach appeared more promising than dictionary-based sparse coding in terms of both performance and ease of implementation.

To evaluate the algorithm for users of cochlear implants (CIs), two listening studies measured speech intelligibility in background noise. First, normal-hearing subjects listening to vocoded stimuli that simulate CI speech perception obtained significant improvements in speech intelligibility in stationary and fluctuating noise relative to both the unprocessed and the Wiener-filtered conditions.
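
One common way to produce vocoded stimuli for simulating CI listening is a noise vocoder: the signal is split into frequency bands, the temporal envelope of each band is extracted, and that envelope modulates band-limited noise. The sketch below is a generic noise vocoder, not the one used in the thesis, whose vocoder type and parameters are not stated here. The channel count, band edges, envelope cutoff and the brick-wall FFT filters (used instead of a proper filter bank to keep the example short) are all assumptions.

```python
import numpy as np

def fft_filter(x, fs, lo, hi):
    """Zero-phase brick-wall filter: keep only components with lo <= f < hi."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0
    return np.fft.irfft(spec, n=len(x))

def noise_vocoder(x, fs, n_channels=8, f_lo=100.0, f_hi=7000.0, env_cutoff=160.0, seed=0):
    """Keep each band's temporal envelope but replace its fine structure with noise."""
    rng = np.random.default_rng(seed)
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)  # log-spaced band edges (assumed)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = fft_filter(x, fs, lo, hi)
        # envelope: full-wave rectification followed by low-pass filtering
        envelope = np.maximum(fft_filter(np.abs(band), fs, 0.0, env_cutoff), 0.0)
        carrier = fft_filter(rng.standard_normal(len(x)), fs, lo, hi)
        carrier /= np.sqrt(np.mean(carrier ** 2)) + 1e-12
        # modulate the noise carrier and restrict the result to the analysis band
        out += fft_filter(envelope * carrier, fs, lo, hi)
    # match the RMS level of the input
    return out * np.sqrt(np.mean(x ** 2)) / (np.sqrt(np.mean(out ** 2)) + 1e-12)

fs = 16000
t = np.arange(fs) / fs
# stand-in input: a 1 kHz tone amplitude-modulated at 4 Hz (a crude speech-like envelope)
x = (1 + np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
y = noise_vocoder(x, fs)
```

Reducing `n_channels` coarsens the spectral resolution of the output, which is the main property such simulations are meant to reproduce.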
Second, a listening study with 14 CI users showed improvements in speech-in-noise performance for three types of background noise. Two neural-network-based algorithms were compared: a speaker-dependent algorithm, which was trained on the target speaker used for testing, and a speaker-independent algorithm, which was trained on different speakers. Relative to the unprocessed condition, the speaker-dependent algorithm significantly improved the intelligibility of speech in stationary and fluctuating noise for all three noise types, and the speaker-independent algorithm did so for two of the three. Both algorithms used noise-specific neural networks that generalized to novel segments of the same noise type and worked over a range of signal-to-noise ratios (SNRs). The proposed algorithm has the potential to improve the intelligibility of speech in noise for users of hearing aids and cochlear implants while meeting the real-time requirements of low computational complexity and low processing delay.

The final investigation in this thesis concerned the individual preferences of potential users of speech enhancement algorithms. Normal-hearing and hearing-impaired subjects set the strength of the noise reduction processing themselves, and their chosen parameters were evaluated in terms of speech understanding in background noise, awareness of background sounds and perceptual quality ratings. Interestingly, the group with hearing loss chose parameters similar to those of the normal-hearing group and obtained equal or greater benefits for speech understanding in noise from the individualized noise reduction. However, the hearing-impaired listeners appeared less robust to variations in the parameters and were significantly less aware of background sounds after noise reduction processing.
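
The noise-specific networks are described as generalizing to novel segments of the same noise type and working over a range of SNRs. A minimal sketch of how such training and test mixtures might be prepared is shown below, assuming that a single noise recording is split into disjoint portions so that test segments never appear in training. The split ratio, the SNR list, the stand-in signals and the function names are hypothetical.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db, then add it."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scaled_noise = noise * np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scaled_noise, scaled_noise

def split_noise(noise, train_fraction=0.8):
    """Split one noise recording into disjoint training and test portions."""
    cut = int(len(noise) * train_fraction)
    return noise[:cut], noise[cut:]

def random_segment(noise, length, rng):
    """Draw a random excerpt of the requested length."""
    start = rng.integers(0, len(noise) - length + 1)
    return noise[start:start + length]

rng = np.random.default_rng(1)
fs = 16000
noise_recording = rng.standard_normal(10 * fs)         # stand-in for one recorded noise type
speech = np.sin(2 * np.pi * 300 * np.arange(fs) / fs)  # stand-in for one sentence
train_noise, test_noise = split_noise(noise_recording)

training_set = []
for snr in (-5, 0, 5, 10):  # hypothetical range of training SNRs in dB
    noisy, scaled = mix_at_snr(speech, random_segment(train_noise, len(speech), rng), snr)
    training_set.append((noisy, speech, scaled, snr))

# evaluation uses a noise segment the network never saw during training
test_noisy, test_scaled = mix_at_snr(speech, random_segment(test_noise, len(speech), rng), 0)
```

Keeping the training and test portions disjoint is what makes a test on novel noise segments meaningful; drawing both from the same stretch of noise would let a network memorize the noise waveform.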
Overall, the results of this thesis provide further evidence that neural-network-based speech enhancement is a promising approach, with potential application in hearing devices, for improving speech perception in noise for people with hearing loss.

Text
FINAL e-thesis for e-prints Goehring 26622203 - Version of Record
Available under License University of Southampton Thesis Licence.

More information

Published date: December 2016

Identifiers

Local EPrints ID: 467396
URI: http://eprints.soton.ac.uk/id/eprint/467396
PURE UUID: 6917ca37-e22d-4fdd-a179-59b7f469e637
ORCID for Stefan Bleeck: orcid.org/0000-0003-4378-3394

Catalogue record

Date deposited: 07 Jul 2022 17:19
Last modified: 16 Mar 2024 05:32

Contributors

Author: Tobias Goehring
Thesis advisor: Stefan Bleeck
