An investigation of speech synthesis parameters
Contributors: Wright, Richard Douglas; Elliott, S.J.; Sinclair, D.A.
Wright, Richard Douglas (1988) An investigation of speech synthesis parameters. University of Southampton, Institute of Sound and Vibration Research, Doctoral Thesis, 315pp.
Record type: Thesis (Doctoral)
Abstract
The model of speech production generally used in speech synthesis is that of a source modified by a digital filter, and the main difference between the various models lies in the form of that filter. The purpose of this research is to compare the properties of these filters when used for speech synthesis. Six models were investigated: (1) series resonance; (2) direct form; (3) reflection coefficients; (4) area function; (5) parallel resonance; and (6) a simple articulatory model. Types (2), (3) and (4) are three varieties of linear predictive coding (LPC) parameters. The investigation has five parts: (1) a historical survey of models for speech synthesis and their problems; (2) a formal description of the models and their analytical relationships; (3) an objective assessment of the behaviour of the models during interpolation; (4) measurement of intelligibility, using the FAAF (Four Alternative Auditory Feature) test; and (5) measurement of naturalness.
The principal results are as follows. Synthesizer types (1) to (4) are all-pole models and are formally equivalent in the steady state. However, when the parameters of any of these models are interpolated, the resulting motions of the vocal tract resonances (formants) differ. These differences exceed the discrimination limen for formant frequency and make a small but statistically significant difference to intelligibility, but not to naturalness. Simple linear interpolation was found to be as good as cosine or piecewise-linear interpolation, whereas omitting interpolation entirely reduced intelligibility by 30%. Finally, the synthetic speech studied produced as few place-of-articulation errors as LPC speech, indicating that intelligibility was limited not by the choice of parameter or transition type but by other factors, such as the excitation signal, the phoneme target values and the durations.
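The abstract claims that the all-pole parameter sets are equivalent in the steady state but behave differently under interpolation. A short Python sketch can illustrate this. It is not the thesis's own code. The sampling rate, the two vowel-like targets, the Durand-Kerner root finder and the sign convention linking reflection coefficients to tube areas are all assumptions chosen for illustration, and the parallel-resonance and articulatory models are not covered.

The sketch does four things:
1. It builds one fourth-order filter from two cascaded (series) formant resonators.
2. It converts that filter to direct-form, reflection-coefficient and area-function parameters. All three describe the same filter.
3. It interpolates halfway between two targets in each domain.
4. It reads the formant frequencies back from the pole angles.

```python
import cmath
import math

FS = 10_000.0  # sampling rate in Hz (assumed for illustration)
# Two hypothetical vowel-like targets: (formant frequency, bandwidth) in Hz.
TARGET_A = [(300.0, 60.0), (2300.0, 90.0)]
TARGET_B = [(700.0, 60.0), (1200.0, 90.0)]

def resonator(freq, bw, fs=FS):
    """Second-order all-pole section [1, a1, a2] for a single formant."""
    r = math.exp(-math.pi * bw / fs)
    theta = 2.0 * math.pi * freq / fs
    return [1.0, -2.0 * r * math.cos(theta), r * r]

def series_to_direct(formant_list):
    """Cascade (series) resonators -> direct form A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    a = [1.0]
    for freq, bw in formant_list:
        sec = resonator(freq, bw)
        out = [0.0] * (len(a) + 2)
        for i, ai in enumerate(a):        # polynomial multiplication
            for j, sj in enumerate(sec):
                out[i + j] += ai * sj
        a = out
    return a

def direct_to_reflection(a):
    """Step-down recursion: direct-form coefficients -> reflection coefficients k1..kp."""
    a, ks = list(a[1:]), []
    while a:
        k, m = a[-1], len(a)
        ks.append(k)
        a = [(a[i] - k * a[m - 2 - i]) / (1.0 - k * k) for i in range(m - 1)]
    return ks[::-1]

def reflection_to_direct(ks):
    """Step-up (Levinson) recursion: reflection coefficients -> direct form."""
    a = []
    for k in ks:
        m = len(a) + 1
        a = [a[i] + k * a[m - 2 - i] for i in range(m - 1)] + [k]
    return [1.0] + a

def reflection_to_area(ks, a0=1.0):
    """Lossless-tube areas, assuming the convention A[m+1] = A[m] (1 - k) / (1 + k)."""
    areas = [a0]
    for k in ks:
        areas.append(areas[-1] * (1.0 - k) / (1.0 + k))
    return areas

def area_to_reflection(areas):
    """Inverse of reflection_to_area under the same assumed sign convention."""
    return [(areas[m] - areas[m + 1]) / (areas[m] + areas[m + 1])
            for m in range(len(areas) - 1)]

def poles(a, iters=500):
    """Roots of z^p + a1 z^(p-1) + ... + ap via Durand-Kerner iteration."""
    z = [complex(0.4, 0.9) ** k for k in range(len(a) - 1)]
    for _ in range(iters):
        new = []
        for i, zi in enumerate(z):
            num = 0j
            for c in a:                   # Horner evaluation of the polynomial
                num = num * zi + c
            den = 1 + 0j
            for j, zj in enumerate(z):
                if j != i:
                    den *= zi - zj
            new.append(zi - num / den)
        z = new
    return z

def formants(a, fs=FS):
    """Formant frequencies (Hz) from the poles in the upper half of the z-plane."""
    return sorted(cmath.phase(z) * fs / (2.0 * math.pi)
                  for z in poles(a) if z.imag > 1e-9)

def lerp(x, y, t):
    """Linear interpolation, element by element."""
    return [(1.0 - t) * xi + t * yi for xi, yi in zip(x, y)]

def midpoint_formants(t=0.5):
    """Formants at fraction t of an A->B transition, interpolating in each domain."""
    da, db = series_to_direct(TARGET_A), series_to_direct(TARGET_B)
    ka, kb = direct_to_reflection(da), direct_to_reflection(db)
    aa, ab = reflection_to_area(ka), reflection_to_area(kb)
    mid_series = [lerp(fa, fb, t) for fa, fb in zip(TARGET_A, TARGET_B)]
    return {
        "series": formants(series_to_direct(mid_series)),
        "direct": formants(lerp(da, db, t)),
        "reflection": formants(reflection_to_direct(lerp(ka, kb, t))),
        "area": formants(reflection_to_direct(area_to_reflection(lerp(aa, ab, t)))),
    }

if __name__ == "__main__":
    for domain, freqs in midpoint_formants().items():
        print(f"{domain:>10}: " + ", ".join(f"{f:7.1f} Hz" for f in freqs))
```

By construction, the series-resonance midpoint lands exactly on the averaged targets of 500 Hz and 1750 Hz. Interpolating the other parameter sets generally places the formants elsewhere, even though the endpoint filters are identical in every domain.

Interpolating reflection coefficients or areas always gives a stable filter, because the midpoint coefficients stay inside (-1, 1). Averaging direct-form coefficients gives no such guarantee for filters above second order.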
More information
Published date: 1988
Organisations: University of Southampton
Identifiers
Local EPrints ID: 52279
URI: http://eprints.soton.ac.uk/id/eprint/52279
PURE UUID: ecac312a-e601-4375-b2f0-4de4f76bbf97
Catalogue record
Date deposited: 26 Aug 2008
Last modified: 15 Mar 2024 10:31
Contributors
Author: Richard Douglas Wright
Thesis advisor: D.A. Sinclair