Extending pronunciation by analogy for speech synthesis applications
Automatic pronunciation of unknown words, especially in English, is a hard problem of great importance in speech technology. This thesis focuses on a data-driven approach, 'pronunciation by analogy' (PbA), for generating the pronunciation of unknown words from input text. The aim is to explore several aspects of using PbA in speech synthesis applications. The thesis is devoted mostly to the pronunciation of proper names, because previous work has shown that proper names have a significant impact on the performance of text-to-speech (TTS) systems. The extension of PbA to multilingual pronunciation is also studied. The performance of PbA is investigated from several angles: incorporating automatic syllabification by analogy, determining the effect of different kinds of lexicon and of lexicon size, testing on seven European languages to quantify the relationship between transcription accuracy and orthography, and comparing PbA with other data-driven methods in objective and subjective evaluations. The experimental results show that PbA achieves a promising level of word accuracy and outperforms the other methods tested on proper-name pronunciation. In the objective evaluation, the best performance, obtained with standard PbA under a leave-one-out strategy on 52,911 names from the CMU dictionary, is 68.38% names correct and 94.31% phonemes correct. The subjective evaluation is based primarily on 24 listeners' judgements of the acceptability of the pronunciations of 150 names. Wilcoxon signed-rank tests show that the dictionary pronunciations are rated superior to the automatically inferred pronunciations; one part of the listening tests shows PbA to be marginally superior to the other methods, but no such superiority is seen in another part.
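The Wilcoxon signed-rank test used for the listening tests compares paired scores (for example, the ratings a listener gives to the same name under two pronunciation sources) without assuming normally distributed differences. The sketch below computes the statistic W with a plain normal-approximation p-value. The ratings and the function name `wilcoxon_signed_rank` are invented for illustration and are not the thesis data or code; the thesis may have used exact tables or tie corrections, and for real analyses a library routine such as `scipy.stats.wilcoxon` would normally be preferred.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Return (W, z, two_sided_p) for paired samples x and y.

    Zero differences are discarded, tied |differences| get their average rank,
    and W is the smaller of the positive and negative rank sums. The p-value
    uses the normal approximation with no tie or continuity correction, so it
    is only a rough guide for small samples.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    # Normal approximation to the null distribution of W.
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))
    return w, z, p


# Hypothetical 1-5 acceptability ratings for eight names.
dictionary_ratings = [5, 4, 5, 4, 3, 5, 4, 5]
pba_ratings = [4, 4, 3, 3, 3, 4, 2, 5]
print(wilcoxon_signed_rank(dictionary_ratings, pba_ratings))
```

For these made-up ratings every non-zero difference favours the dictionary, so W = 0.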
With reference to the performance on seven European languages (Dutch, English, French, Frisian, German, Norwegian, and Spanish), PbA achieves more than 85% words correct for all languages except English. In conclusion, this thesis has shown that PbA should become the method of choice in TTS applications.
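A minimal sketch may help make the method concrete. It follows the general shape of PbA (a pronunciation lattice built from substrings shared with lexicon entries, then a best path through it) under simplifying assumptions: the lexicon is already aligned one letter to one phoneme, every common substring of at least two letters becomes an arc, and the chosen path has the fewest arcs, with ties broken by the largest product of arc frequencies. Published PbA systems differ in alignment, lattice construction, and scoring strategies, so this is an illustration, not the thesis's implementation; the toy lexicon, phoneme symbols, and the functions `pba_pronounce` and `leave_one_out_word_accuracy` are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy lexicon (NOT the CMU dictionary): each spelling is aligned
# one-to-one with phonemes; "-" would mark a silent letter.
LEXICON = {
    "cat": ["k", "a", "t"],
    "hat": ["h", "a", "t"],
    "hop": ["h", "o", "p"],
    "cot": ["k", "o", "t"],
    "kit": ["k", "i", "t"],
}


def pba_pronounce(word, lexicon, min_match=2):
    """Return a phoneme list for `word`, or None if no complete lattice path exists."""
    w = "#" + word + "#"  # word-boundary delimiters
    n = len(w)
    # A node is (letter position, phoneme for that letter).
    # Arc key: (source node, destination node, internal phonemes) -> frequency.
    arcs = defaultdict(int)
    for spelling, phones in lexicon.items():
        s = "#" + spelling + "#"
        p = ["#"] + list(phones) + ["#"]
        # Record every common substring of length >= min_match at every alignment.
        for i in range(n):
            for k in range(len(s)):
                length = 0
                while i + length < n and k + length < len(s) and w[i + length] == s[k + length]:
                    length += 1
                    if length >= min_match:
                        q = p[k:k + length]
                        src, dst = (i, q[0]), (i + length - 1, q[-1])
                        arcs[(src, dst, tuple(q[1:-1]))] += 1

    out = defaultdict(list)
    for (src, dst, label), freq in arcs.items():
        out[src].append((dst, label, freq))

    start, end = (0, "#"), (n - 1, "#")
    # best[node] = (arc count, product of arc frequencies, phonemes emitted so far)
    best = {start: (0, 1, [])}
    for pos in range(n):  # arcs always point rightwards, so this is a topological order
        for node in [nd for nd in best if nd[0] == pos]:
            n_arcs, score, phon = best[node]
            for dst, label, freq in out[node]:
                emitted = phon + list(label) + ([] if dst == end else [dst[1]])
                cand = (n_arcs + 1, score * freq, emitted)
                old = best.get(dst)
                # Fewest arcs wins; ties go to the larger frequency product.
                if old is None or (cand[0], -cand[1]) < (old[0], -old[1]):
                    best[dst] = cand
    if end not in best:
        return None
    return [ph for ph in best[end][2] if ph != "-"]


def leave_one_out_word_accuracy(lexicon):
    """Fraction of entries whose pronunciation is reproduced exactly when held out."""
    correct = 0
    for word, phones in lexicon.items():
        rest = {w: p for w, p in lexicon.items() if w != word}
        if pba_pronounce(word, rest) == [ph for ph in phones if ph != "-"]:
            correct += 1
    return correct / len(lexicon)


print(pba_pronounce("hot", LEXICON))
print(leave_one_out_word_accuracy(LEXICON))
```

With this five-word lexicon, `pba_pronounce("hot", LEXICON)` returns `['h', 'o', 't']`, while the leave-one-out word accuracy is 0.0: removing any one entry leaves some letter junction that no remaining word covers, so the lattice has no complete path. The toy case illustrates why lexicon coverage matters for this kind of method.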
University of Southampton
Soonklang, Tasanawan
9a3d1856-a59b-4154-8891-94dd1cfb9ae7
2008
Soonklang, Tasanawan (2008) Extending pronunciation by analogy for speech synthesis applications. University of Southampton, Doctoral Thesis.
Record type: Thesis (Doctoral)
Published date: 2008
Identifiers
Local EPrints ID: 466407
URI: http://eprints.soton.ac.uk/id/eprint/466407
PURE UUID: 1d4fc862-2b2c-4431-a404-75bb34c5f460
Catalogue record
Date deposited: 05 Jul 2022 05:14
Last modified: 16 Mar 2024 20:41
Contributors
Author: Tasanawan Soonklang