University of Southampton Institutional Repository

Kalman tracking of linear predictor and harmonic noise models for noisy speech enhancement

This paper presents a speech enhancement method based on tracking and denoising the formants of a linear prediction (LP) model of the spectral envelope of speech and the parameters of a harmonic noise model (HNM) of its excitation. The main advantages of tracking and denoising the prominent energy contours of speech are the efficient use of the spectral and temporal structures of successive speech frames and the mitigation of the processing artefact known as ‘musical noise’ or ‘musical tones’.
The formant-tracking linear prediction (FTLP) model estimation consists of three stages: (a) speech pre-cleaning based on spectral amplitude estimation, (b) formant tracking across successive speech frames using the Viterbi method, and (c) Kalman filtering of the formant trajectories across successive speech frames.
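Stage (c) can be illustrated with a minimal sketch of a scalar Kalman filter smoothing one formant track. This is not the paper's formulation: the random-walk state model, the function name `kalman_track` and the variance values `process_var` and `obs_var` are assumptions chosen only for illustration.

```python
def kalman_track(observations, process_var=400.0, obs_var=10000.0):
    """Smooth one formant track (Hz per frame) with a scalar Kalman filter.

    Assumed model (hypothetical, not taken from the paper):
      state:       f[t] = f[t-1] + w,  w ~ N(0, process_var)
      observation: z[t] = f[t] + v,    v ~ N(0, obs_var)
    """
    estimate = observations[0]   # initialise from the first noisy observation
    error_var = obs_var          # initial uncertainty equals observation noise
    smoothed = [estimate]
    for z in observations[1:]:
        # Predict: the random walk leaves the estimate unchanged,
        # but the uncertainty grows by the process variance.
        error_var += process_var
        # Update: blend prediction and observation using the Kalman gain.
        gain = error_var / (error_var + obs_var)
        estimate += gain * (z - estimate)
        error_var *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed


# Hypothetical noisy first-formant observations scattered around 500 Hz.
noisy_f1 = [500.0, 560.0, 450.0, 530.0, 470.0, 510.0, 540.0, 460.0]
smooth_f1 = kalman_track(noisy_f1)
```

A larger `obs_var` relative to `process_var` makes the filter trust its prediction more, which gives a smoother trajectory at the cost of slower response to genuine formant movement.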
The HNM parameters for the excitation signal comprise the voiced/unvoiced decision, the fundamental frequency, the harmonics’ amplitudes and the variance of the noise component of the excitation. A frequency-domain pitch extraction method is proposed that searches for the peak signal-to-noise ratios (SNRs) at the harmonics. For each speech frame, several pitch candidates are calculated. An estimate of the pitch trajectory across successive frames is obtained using a Viterbi decoder. The trajectories of the noisy excitation harmonics across successive speech frames are modelled and denoised using Kalman filters.
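The selection of a pitch trajectory from per-frame candidates can be sketched as a small Viterbi search. The abstract does not give the cost function, so everything here is an assumption: each candidate carries a hypothetical score standing in for its harmonic SNR, and the transition cost is a linear penalty (`jump_penalty`) on the frame-to-frame change in fundamental frequency.

```python
def viterbi_pitch_track(frames, jump_penalty=0.05):
    """Pick one pitch candidate per frame by dynamic programming.

    frames: list of frames, each a list of (f0_hz, score) candidates.
    The score is a stand-in for a harmonic SNR measure (assumption);
    jump_penalty is a hypothetical cost per Hz of pitch change.
    """
    # best[j]: best cumulative score of any path ending at candidate j.
    best = [score for _, score in frames[0]]
    backpointers = []
    for prev, cur in zip(frames, frames[1:]):
        new_best, pointers = [], []
        for f0, score in cur:
            # Score of reaching this candidate from each previous candidate.
            totals = [best[i] - jump_penalty * abs(f0 - prev_f0)
                      for i, (prev_f0, _) in enumerate(prev)]
            i_best = max(range(len(totals)), key=totals.__getitem__)
            new_best.append(totals[i_best] + score)
            pointers.append(i_best)
        best = new_best
        backpointers.append(pointers)
    # Trace back from the best final candidate.
    j = max(range(len(best)), key=best.__getitem__)
    path = [j]
    for pointers in reversed(backpointers):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [frames[t][j][0] for t, j in enumerate(path)]


# Hypothetical candidates: in the middle frame the pitch-doubled
# candidate (244 Hz) has the higher local score.
candidates = [
    [(120.0, 9.0), (240.0, 6.0)],
    [(122.0, 7.0), (244.0, 8.0)],
    [(121.0, 9.0), (242.0, 5.0)],
]
pitch_track = viterbi_pitch_track(candidates)
```

In this example the continuity penalty outweighs the locally higher score of the doubled candidate, so the decoded track stays near 120 Hz across all three frames.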
The proposed method is used to deconstruct noisy speech, denoise its model parameters and then reconstitute speech from its cleaned parts. Experimental evaluations show the performance gains of the formant tracking, pitch extraction and noise reduction stages.

Yan, Q., Vaseghi, S., Zavarehei, E., Milner, B., Darch, J. and White, P.R. (2008) Kalman tracking of linear predictor and harmonic noise models for noisy speech enhancement. Computer Speech & Language, 22 (1), 69-83. (doi:10.1016/j.csl.2007.06.002).

Record type: Article

This record has no associated files available for download.

More information

Published date: January 2008
Keywords: HNM, Kalman, formant

Identifiers

Local EPrints ID: 49627
URI: http://eprints.soton.ac.uk/id/eprint/49627
ISSN: 0885-2308
PURE UUID: d86a5c88-4c5b-4e51-ac4e-17a0e0a77841
ORCID for P.R. White: orcid.org/0000-0002-4787-8713

Catalogue record

Date deposited: 23 Nov 2007
Last modified: 16 Mar 2024 02:39

Contributors

Author: Q. Yan
Author: S. Vaseghi
Author: E. Zavarehei
Author: B. Milner
Author: J. Darch
Author: P.R. White
