Bayesian algorithms for speech enhancement
The portability of modern voice processing devices allows them to be used in environments
where background noise conditions can be adverse. Background noise
can deteriorate the quality of speech transmitted through such devices, but speech
enhancement algorithms can ameliorate this degradation to some extent. The development
of speech enhancement algorithms that improve the quality of noisy speech
is the aim of this thesis, which consists of three main parts.
In the first part, we propose a framework of algorithms that estimate the clean speech
Short-Time Fourier Transform (STFT) coefficients. The algorithms are derived from
the Bayesian theory of estimation and can be grouped according to i) the STFT
representation they estimate, ii) the estimator they apply, and iii) the speech prior
density they assume. Apart from introducing algorithms that surpass the
performance of similar algorithms in the literature, the compilation of this
framework offers insight into the effect and relative importance of the different
components of the algorithms (e.g. prior, estimator) on the quality of the enhanced
speech.
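As a point of reference for the three grouping criteria, the classical textbook case can be sketched as follows: the complex STFT coefficient is the estimated representation, the estimator is the minimum mean-square error (MMSE) one, and both speech and noise are assumed complex Gaussian. Under those assumptions the estimate reduces to the Wiener gain applied to the noisy coefficient. This is only an illustrative baseline, not one of the estimators proposed in the thesis, and the function and parameter names are hypothetical.

```python
def wiener_gain(speech_var, noise_var):
    """MMSE gain for one STFT bin, assuming complex Gaussian speech and noise.

    speech_var and noise_var are the (assumed known) speech and noise
    variances in that bin; both names are illustrative.
    """
    if noise_var <= 0:
        raise ValueError("noise_var must be positive")
    if speech_var < 0:
        raise ValueError("speech_var must be non-negative")
    xi = speech_var / noise_var  # a priori signal-to-noise ratio
    return xi / (1.0 + xi)


def enhance_bin(noisy_coeff, speech_var, noise_var):
    """Estimate the clean complex coefficient by scaling the noisy one."""
    return wiener_gain(speech_var, noise_var) * noisy_coeff
```

Changing any one of the three ingredients (for example, estimating the spectral amplitude instead of the complex coefficient, or replacing the Gaussian prior) generally leads to a different gain function.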
In the second part of this thesis, we develop methods for the estimation of the power
of time-varying noise. The main outcome is a method that exploits similarities
between the distribution of the noisy speech spectral amplitude coefficients within a
single frequency bin and the corresponding distribution of the corrupting noise.
These similarities allow the extraction of samples that are more likely to correspond
to noise from a window of past spectral amplitude observations. The extracted
samples are then used to produce an estimate of the noise power.
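The general idea of selecting likely-noise samples from a window and estimating power from them can be illustrated with a deliberately crude stand-in. The sketch below assumes that the smallest amplitudes in the window are the ones most likely to be noise, keeps a fixed fraction of them, and averages their squares. The selection rule, the `keep_fraction` parameter, and the function name are assumptions made for illustration; the abstract does not specify the thesis's actual selection criterion, and the bias introduced by discarding the larger samples is not compensated here.

```python
def noise_power_estimate(amplitudes, keep_fraction=0.5):
    """Crude noise power estimate for one frequency bin.

    amplitudes: window of past spectral amplitude observations.
    keep_fraction: hypothetical share of the smallest amplitudes
    treated as noise-dominated.
    """
    if not amplitudes:
        raise ValueError("amplitudes must not be empty")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    ordered = sorted(amplitudes)
    n_keep = max(1, int(len(ordered) * keep_fraction))
    kept = ordered[:n_keep]  # samples assumed more likely to be noise
    # Mean squared amplitude of the kept samples (biased low, uncorrected).
    return sum(a * a for a in kept) / n_keep
```

Because only the low-amplitude samples are kept, this estimate systematically underestimates the true noise power; a practical method would need a correction for that truncation.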
In the final part of this thesis, we are concerned with the incorporation of the time
and frequency dependencies of speech signals in our estimation model. The theoretical
framework on which the modelling is based is provided by Markov Random
Fields (MRFs). Initially, we develop a MAP estimator of speech based on the Gaussian
MRF prior. We then introduce the Chi MRF, which is employed in
the development of an improved speech estimator. Finally, the performance of fixed
and adaptive schemes for the estimation of the MRF parameters is investigated.
Andrianakis, I.
November 2007
White, P.R.
Andrianakis, I.
(2007)
Bayesian algorithms for speech enhancement.
University of Southampton, Institute of Sound and Vibration Research, Doctoral Thesis, 198pp.
Record type:
Thesis
(Doctoral)
More information
Published date: November 2007
Organisations:
University of Southampton
Identifiers
Local EPrints ID: 66244
URI: http://eprints.soton.ac.uk/id/eprint/66244
PURE UUID: 039bc728-d12e-4443-bd15-e93d16dc5cf6
Catalogue record
Date deposited: 20 May 2009
Last modified: 14 Mar 2024 02:34
Contributors
Author:
I. Andrianakis