The University of Southampton
University of Southampton Institutional Repository

An investigation of speech synthesis parameters

An investigation of speech synthesis parameters
An investigation of speech synthesis parameters
The model of speech production generally used in speech synthesis is that of a source modified by a digital filter. The major difference between a number of models is the form of the digital filter. The purpose of this research is to compare the properties of these filters when used for speech synthesis. Six models were investigated: (1) series resonance; (2) direct form; (3) reflection coefficients; (4) area function; (5) parallel resonance; and (6) a simple articulatory model. Types (2,3,4) are three varieties of linear predictive coding (LPC) parameters. There are five parts to the investigation: (1) an historical survey of models for speech synthesis and their problems; (2) a formal description of the models and their analytical relationships; (3) an objective assessment of the behaviour of the models during interpolation; (4) measurement of intelligibility (using a FAAF test); and (5) measurement of naturalness. Principal results are: synthesizer types (1) to (4) are all-pole models, formally equivalent in the steady state. But when the parameters of any of the models are interpolated, consequences for motion of vocal tract resonances (formants) differ. These differences exceed the discrimination limen for formant frequency, and make a small but statistically significant difference to intelligibility, but not to naturalness. Simple linear interpolation was found to be as good as cosine or piecewise-linear interpolation. Complete lack of interpolation reduced intelligibility by 30%. Finally, the synthesis studied achieved as few place-of-articulation errors as did LPC speech, indicating that intelligibility was limited not by parameter and transition type, but by other factors such as the excitation signal, phoneme target values, and durations.
Wright, Richard Douglas
f7459a9e-6413-4276-a3c8-865647d1f917
Wright, Richard Douglas
f7459a9e-6413-4276-a3c8-865647d1f917
Elliott, S.J.
721dc55c-8c3e-4895-b9c4-82f62abd3567
Sinclair, D.A.
861a10da-bc43-4629-8efe-19f6a44472fc

Wright, Richard Douglas (1988) An investigation of speech synthesis parameters. University of Southampton, Institute of Sound and Vibration Research, Doctoral Thesis, 315pp.

Record type: Thesis (Doctoral)

Abstract

The model of speech production generally used in speech synthesis is that of a source modified by a digital filter. The major difference between a number of models is the form of the digital filter. The purpose of this research is to compare the properties of these filters when used for speech synthesis. Six models were investigated: (1) series resonance; (2) direct form; (3) reflection coefficients; (4) area function; (5) parallel resonance; and (6) a simple articulatory model. Types (2,3,4) are three varieties of linear predictive coding (LPC) parameters. There are five parts to the investigation: (1) an historical survey of models for speech synthesis and their problems; (2) a formal description of the models and their analytical relationships; (3) an objective assessment of the behaviour of the models during interpolation; (4) measurement of intelligibility (using a FAAF test); and (5) measurement of naturalness. Principal results are: synthesizer types (1) to (4) are all-pole models, formally equivalent in the steady state. But when the parameters of any of the models are interpolated, consequences for motion of vocal tract resonances (formants) differ. These differences exceed the discrimination limen for formant frequency, and make a small but statistically significant difference to intelligibility, but not to naturalness. Simple linear interpolation was found to be as good as cosine or piecewise-linear interpolation. Complete lack of interpolation reduced intelligibility by 30%. Finally, the synthesis studied achieved as few place-of-articulation errors as did LPC speech, indicating that intelligibility was limited not by parameter and transition type, but by other factors such as the excitation signal, phoneme target values, and durations.

Other
000786.PDF - Other
Download (10MB)

More information

Published date: 1988
Organisations: University of Southampton

Identifiers

Local EPrints ID: 52279
URI: http://eprints.soton.ac.uk/id/eprint/52279
PURE UUID: ecac312a-e601-4375-b2f0-4de4f76bbf97

Catalogue record

Date deposited: 26 Aug 2008
Last modified: 15 Mar 2024 10:31

Export record

Contributors

Author: Richard Douglas Wright
Thesis advisor: S.J. Elliott
Thesis advisor: D.A. Sinclair

Download statistics

Downloads from ePrints over the past year. Other digital versions may also be available to download e.g. from the publisher's website.

View more statistics

Atom RSS 1.0 RSS 2.0

Contact ePrints Soton: eprints@soton.ac.uk

ePrints Soton supports OAI 2.0 with a base URL of http://eprints.soton.ac.uk/cgi/oai2

This repository has been built using EPrints software, developed at the University of Southampton, but available to everyone to use.

We use cookies to ensure that we give you the best experience on our website. If you continue without changing your settings, we will assume that you are happy to receive cookies on the University of Southampton website.

×