Speech perception in a sparse domain
Environmental statistics are known to be important factors shaping our perceptual system. The visual and auditory systems have evolved to process natural images and speech efficiently. Natural images and speech share a common characteristic: both are highly structured and therefore highly redundant. Our perceptual system may use redundancy reduction and sparse coding strategies to deal with the complex stimuli it encounters every day. Both redundancy reduction and sparse coding theory emphasise the importance of the higher-order statistics of signals.
This thesis includes psycho-acoustical experiments designed to investigate how higher-order statistics affect speech perception. Sparseness can be defined by a fourth-order statistic, kurtosis, and it is hypothesised that greater kurtosis should be reflected in better speech recognition performance in noise. Based on a corpus of speech material, kurtosis was found to be significantly correlated with the glimpsing area of noisy speech, an established measure that predicts speech recognition. Kurtosis was also found to be a good predictor of speech recognition, and an algorithm based on increasing kurtosis was found to improve speech recognition scores in noise. The listening experiment showed, for the first time, that higher-order statistics are important for speech perception in noise.
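To illustrate why kurtosis can serve as a sparseness measure, the following minimal sketch compares a dense Gaussian signal with a signal that is zero most of the time. It is not taken from the thesis: the function name `kurtosis`, the synthetic signals, the 10% activity rate, and the use of the non-excess definition (fourth central moment divided by the squared variance) are all assumptions made for illustration.

```python
import random


def kurtosis(samples):
    """Non-excess kurtosis: fourth central moment divided by variance squared.

    Hypothetical helper for illustration; a Gaussian signal gives a value near 3.
    """
    n = len(samples)
    mean = sum(samples) / n
    centred = [x - mean for x in samples]
    variance = sum(d * d for d in centred) / n
    fourth_moment = sum(d ** 4 for d in centred) / n
    return fourth_moment / (variance ** 2)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Dense signal: every sample is drawn from a Gaussian distribution.
dense = [rng.gauss(0.0, 1.0) for _ in range(20000)]

# Sparse signal (assumed 10% activity): most samples are exactly zero,
# and only occasional samples carry energy.
sparse = [rng.gauss(0.0, 1.0) if rng.random() < 0.1 else 0.0
          for _ in range(20000)]

dense_kurtosis = kurtosis(dense)
sparse_kurtosis = kurtosis(sparse)

print(f"dense signal kurtosis:  {dense_kurtosis:.2f}")
print(f"sparse signal kurtosis: {sparse_kurtosis:.2f}")
```

Under these assumptions the sparse signal yields a much larger kurtosis than the dense one, because its energy is concentrated in a small fraction of samples.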
It is known that hearing-impaired listeners have difficulty understanding speech in noise. Increasing the kurtosis of noisy speech may be particularly helpful for them to achieve better performance. Currently, neither hearing aids nor cochlear implants help hearing-impaired users greatly in adverse listening environments, partly because these users have a reduced dynamic range of hearing. Thus there is an information bottleneck, whereby these devices must transform acoustical sounds with a large dynamic range into the smaller range of hearing-impaired listeners. The limited dynamic range problem can be thought of as a communication channel with limited capacity. Information could be encoded more efficiently for such a channel if redundant information were reduced. For cochlear implant users, unwanted channel interaction could also contribute to lower speech recognition scores in noisy conditions.
This thesis proposes a solution to these problems for cochlear implant users by reducing signal redundancy and making signals sparser. A novel speech processing algorithm, SPARSE, was developed and implemented. This algorithm aims to reduce redundant information and transform input signals into sparser stimulation sequences. It is hypothesised that sparse firing patterns of neurons will be achieved, which should be more biologically efficient according to sparse coding theory. Listening experiments were conducted with ten cochlear implant users, who listened to speech signals in modulated and speech babble noises using either the conventional coding strategy or the new SPARSE algorithm. Results showed that the SPARSE algorithm can help them to improve speech understanding in noise, particularly those with low baseline performance. It is concluded that signal processing algorithms for cochlear implants, and possibly also for hearing aids, that increase signal sparseness may deliver benefits for speech recognition in noise. A patent application based on the algorithm has been filed.
Li, Guoping
b791b5c0-52cb-4311-b0de-3d6b2f289835
12 March 2008
Li, Guoping (2008) Speech perception in a sparse domain. University of Southampton, Institute of Sound and Vibration Research, Doctoral Thesis, 165pp.
Record type: Thesis (Doctoral)
More information
Published date: 12 March 2008
Organisations: University of Southampton
Identifiers
Local EPrints ID: 188321
URI: http://eprints.soton.ac.uk/id/eprint/188321
PURE UUID: e4e758bd-e67f-439c-8559-de056bab9519
Catalogue record
Date deposited: 24 May 2011 14:13
Last modified: 14 Mar 2024 03:31
Contributors
Author: Guoping Li
Thesis advisor: Mark Lutman