Speech perception in a sparse domain

Li, Guoping (2008) Speech perception in a sparse domain. University of Southampton, Institute of Sound and Vibration Research, Doctoral Thesis, 165pp.




Environmental statistics are known to be important factors shaping
our perceptual system. The visual and auditory systems have evolved
to be efficient at processing natural images and speech. Natural
images and speech share a common characteristic: both are highly
structured and therefore contain much redundancy. Our perceptual
system may use redundancy reduction and sparse coding strategies to
deal with the complex stimuli encountered every day. Both redundancy
reduction and sparse coding theory emphasise the importance of the
higher order statistics of signals.

This thesis includes psycho-acoustical experiments designed to
investigate how higher order statistics affect our speech perception.
Sparseness can be defined by the fourth order statistic, kurtosis,
and it is hypothesised that greater kurtosis should be reflected by
better speech recognition performance in noise. Based on a corpus of
speech material, kurtosis was found to be significantly correlated
with the glimpsing area of noisy speech, an established measure that
predicts speech recognition. Kurtosis was also found to be a good
predictor of speech recognition, and an algorithm based on increasing
kurtosis was found to improve speech recognition scores in noise. The
listening experiment showed for the first time that higher order
statistics are important for speech perception in noise.
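
As an illustration of kurtosis as a sparseness measure, the following minimal sketch compares a Gaussian signal with a Laplacian one, whose distribution is more peaked with heavier tails. It is not taken from the thesis. It assumes the non-excess convention (fourth central moment divided by squared variance, so a Gaussian scores about 3); the convention used in the thesis is not stated here. The function name `kurtosis`, the sample size and the random seed are all illustrative choices.

```python
import random


def kurtosis(samples):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((x - mean) ** 2 for x in samples) / n
    m4 = sum((x - mean) ** 4 for x in samples) / n
    return m4 / (m2 ** 2)


rng = random.Random(0)  # fixed seed (arbitrary) for repeatability
n = 50_000              # illustrative sample size

# Gaussian samples: theoretical kurtosis is 3.
gaussian = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Laplacian samples, built as an exponential magnitude with a random sign:
# theoretical kurtosis is 6, i.e. a sparser distribution.
laplacian = [rng.expovariate(1.0) * rng.choice((-1.0, 1.0)) for _ in range(n)]

k_gauss = kurtosis(gaussian)
k_laplace = kurtosis(laplacian)
print(f"Gaussian kurtosis:  {k_gauss:.2f}")
print(f"Laplacian kurtosis: {k_laplace:.2f}")
```

Under this convention, a signal that is near zero most of the time and occasionally large has higher kurtosis, which is the sense in which greater kurtosis is read as greater sparseness.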

It is known that hearing impaired listeners have difficulty
understanding speech in noise. Increasing the kurtosis of noisy speech
may be particularly helpful for them to achieve better performance.
Currently, neither hearing aids nor cochlear implants help hearing
impaired users greatly in adverse listening environments, partly
because these users have a reduced dynamic range of hearing. Thus
there is an information bottleneck, whereby these devices must
transform acoustical sounds with a large dynamic range into the
smaller range of hearing impaired listeners. The limited dynamic range
problem can be thought of as a communication channel with limited
capacity. Information could be more efficiently encoded for such a
communication channel if redundant information could be reduced. For
cochlear implant users, unwanted channel interaction could also
contribute to lower speech recognition scores in noisy conditions.

This thesis proposes a solution to these problems for cochlear
implant users by reducing signal redundancy and making signals
sparser. A novel speech processing algorithm, SPARSE, was developed
and implemented. This algorithm aims to reduce redundant information
and transform input signals into sparser stimulation sequences. It is
hypothesised that sparse firing patterns of neurons will be achieved,
which should be more biologically efficient according to sparse coding
theory. Listening experiments were conducted with ten cochlear implant
users who listened to speech signals in modulated and speech babble
noises, using either the conventional coding strategy or the new
SPARSE algorithm. Results showed that the SPARSE algorithm can help
them to improve speech understanding in noise, particularly those with
low baseline performance. It is concluded that signal processing
algorithms for cochlear implants, and possibly also for hearing aids,
that increase signal sparseness may deliver benefits for speech
recognition in noise. A patent based on the algorithm has been
applied for.
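
The idea of making a stimulation sequence sparser can be illustrated with a toy example. The sketch below is not the SPARSE algorithm, whose details are not given in this abstract. It substitutes a much simpler technique, keeping only the largest-magnitude values in each frame and zeroing the rest, and checks that the kurtosis of the result rises. The channel count, the number of values kept, the Gaussian test data and all function names are hypothetical.

```python
import random


def kurtosis(values):
    """Non-excess kurtosis: fourth central moment over squared variance."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2)


def keep_largest(frame, k):
    """Zero all but the k largest-magnitude entries of one frame.

    Ties at the threshold would all be kept; with continuous random
    data this does not occur in practice.
    """
    threshold = sorted((abs(v) for v in frame), reverse=True)[k - 1]
    return [v if abs(v) >= threshold else 0.0 for v in frame]


rng = random.Random(1)  # fixed seed (arbitrary) for repeatability
num_frames = 2000       # hypothetical number of analysis frames
num_channels = 22       # hypothetical channel count
kept_per_frame = 4      # hypothetical number of values retained per frame

# Dense toy data: every channel is active in every frame.
dense = [[rng.gauss(0.0, 1.0) for _ in range(num_channels)]
         for _ in range(num_frames)]

# Sparsified data: only the strongest few channels survive in each frame.
sparse = [keep_largest(frame, kept_per_frame) for frame in dense]

k_dense = kurtosis([v for frame in dense for v in frame])
k_sparse = kurtosis([v for frame in sparse for v in frame])
print(f"kurtosis before: {k_dense:.2f}")
print(f"kurtosis after:  {k_sparse:.2f}")
```

Zeroing the weaker values concentrates the signal energy in a few entries, so the distribution becomes more peaked at zero with relatively heavier tails, and kurtosis increases.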

Item Type: Thesis (Doctoral)

Organisations: University of Southampton
ePrint ID: 188321
Date: 12 March 2008 (Published)
Date Deposited: 24 May 2011 14:13
Last Modified: 18 Apr 2017 02:06
URI: http://eprints.soton.ac.uk/id/eprint/188321
