An investigation of speech synthesis parameters

The model of speech production generally used in speech synthesis is that of a source modified by a digital filter. The major difference between the various models is the form of the digital filter. The purpose of this research is to compare the properties of these filters when used for speech synthesis. Six models were investigated: (1) series resonance; (2) direct form; (3) reflection coefficients; (4) area function; (5) parallel resonance; and (6) a simple articulatory model. Types (2), (3), and (4) are three varieties of linear predictive coding (LPC) parameters. There are five parts to the investigation: (1) an historical survey of models for speech synthesis and their problems; (2) a formal description of the models and their analytical relationships; (3) an objective assessment of the behaviour of the models during interpolation; (4) measurement of intelligibility (using a FAAF test); and (5) measurement of naturalness. The principal results are as follows. Synthesizer types (1) to (4) are all-pole models, formally equivalent in the steady state, but when the parameters of any of the models are interpolated, the consequences for the motion of vocal tract resonances (formants) differ. These differences exceed the discrimination limen for formant frequency and make a small but statistically significant difference to intelligibility, but not to naturalness. Simple linear interpolation was found to be as good as cosine or piecewise-linear interpolation. Complete lack of interpolation reduced intelligibility by 30%. Finally, the synthesis studied achieved as few place-of-articulation errors as did LPC speech, indicating that intelligibility was limited not by parameter and transition type, but by other factors such as the excitation signal, phoneme target values, and durations.
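The claim that formally equivalent all-pole models diverge under interpolation can be illustrated with a minimal sketch. It is not taken from the thesis: the sampling rate, the two formant targets, the bandwidths, the function names, and the sign convention for reflection coefficients are all assumptions chosen for illustration. The sketch builds the same two steady-state filters, takes the midpoint in three parameter domains (resonance frequencies, direct-form coefficients, reflection coefficients), and prints the resulting formant frequencies.

```python
import numpy as np

FS = 10_000.0  # sampling rate in Hz (illustrative assumption)

def resonances_to_direct(freqs_hz, bws_hz, fs=FS):
    """Direct-form coefficients [1, a1, ..., ap] from formant frequencies and bandwidths."""
    freqs = np.asarray(freqs_hz, dtype=float)
    bws = np.asarray(bws_hz, dtype=float)
    radii = np.exp(-np.pi * bws / fs)        # pole radius from bandwidth
    angles = 2.0 * np.pi * freqs / fs        # pole angle from frequency
    poles = radii * np.exp(1j * angles)
    poles = np.concatenate([poles, poles.conj()])
    return np.real(np.poly(poles))

def direct_to_reflection(a):
    """Step-down recursion; convention assumed here: k_m is the last coefficient of order m."""
    cur = np.array(a, dtype=float) / a[0]
    p = len(cur) - 1
    k = np.zeros(p)
    for m in range(p, 0, -1):
        km = cur[m]
        k[m - 1] = km
        # Reduce the order-m polynomial to order m-1.
        cur = (cur[:m] - km * cur[m:0:-1]) / (1.0 - km ** 2)
    return k

def reflection_to_direct(k):
    """Step-up recursion, the inverse of direct_to_reflection."""
    a = np.array([1.0])
    for km in k:
        ext = np.concatenate([a, [0.0]])
        a = ext + km * ext[::-1]
    return a

def formants(a, fs=FS):
    """Frequencies (Hz) of the complex poles in the upper half-plane, sorted ascending."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 1e-9]
    return np.sort(np.angle(roots) * fs / (2.0 * np.pi))

# Two hypothetical steady-state targets (two formants each), not values from the thesis.
F_START, F_END = [300.0, 2300.0], [700.0, 1100.0]
BW = [60.0, 100.0]

a_start = resonances_to_direct(F_START, BW)
a_end = resonances_to_direct(F_END, BW)

# Midpoint of a linear interpolation, taken in three different parameter domains.
a_res_mid = resonances_to_direct(np.mean([F_START, F_END], axis=0), BW)
a_dir_mid = 0.5 * (a_start + a_end)
k_mid = 0.5 * (direct_to_reflection(a_start) + direct_to_reflection(a_end))
a_ref_mid = reflection_to_direct(k_mid)

for name, a in [("resonance", a_res_mid), ("direct form", a_dir_mid), ("reflection", a_ref_mid)]:
    print(f"{name:12s} midpoint formants (Hz): {np.round(formants(a), 1)}")
```

In this sketch the endpoints are identical filters in every domain, so any difference in the printed formants arises purely from the choice of interpolation domain. Averaging reflection coefficients keeps every coefficient inside (-1, 1) and so keeps the filter stable, whereas averaging direct-form coefficients carries no such guarantee and may even yield real poles, in which case fewer formants are reported.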

Wright, Richard Douglas

1988

Elliott, S.J.

Sinclair, D.A.

Wright, Richard Douglas (1988) An investigation of speech synthesis parameters. University of Southampton, Institute of Sound and Vibration Research, Doctoral Thesis, 315pp.

Record type: Thesis (Doctoral)

More information

Published date: 1988

Organisations: University of Southampton

Identifiers

Local EPrints ID: 52279

URI: http://eprints.soton.ac.uk/id/eprint/52279

PURE UUID: ecac312a-e601-4375-b2f0-4de4f76bbf97

Catalogue record

Date deposited: 26 Aug 2008

Last modified: 15 Mar 2024 10:31

Contributors

Author: Richard Douglas Wright

Thesis advisor: S.J. Elliott

Thesis advisor: D.A. Sinclair
