Phonetic inventory for an Arabic speech corpus
Corpus design for speech synthesis is far better researched for languages such as English than for Modern Standard Arabic (MSA), and existing work tends to focus on methods, usually greedy ones, for automatically generating the orthographic transcript to be recorded. In this work, a study of MSA phonetics and phonology is conducted in order to create criteria for a greedy method that builds a speech corpus transcript for recording. The dataset is reduced several times using these optimisation methods with different parameters, yielding a much smaller dataset with the same phonetic coverage as before the reduction; this output transcript is chosen for recording. This is part of a larger work to create a completely annotated and segmented speech corpus for MSA.
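The greedy reduction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that coverage is measured over diphones (adjacent phoneme pairs), that each utterance is already available as a list of phoneme symbols, and that ties are broken in favour of shorter utterances. The names `diphones`, `coverage`, `greedy_select` and the toy corpus are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Unit = Tuple[str, str]


def diphones(phonemes: List[str]) -> Set[Unit]:
    """Return the set of adjacent phoneme pairs in one utterance."""
    return set(zip(phonemes, phonemes[1:]))


def coverage(corpus: Dict[str, List[str]], ids: Iterable[str]) -> Set[Unit]:
    """Return every diphone covered by the utterances named in ids."""
    covered: Set[Unit] = set()
    for uid in ids:
        covered |= diphones(corpus[uid])
    return covered


def greedy_select(corpus: Dict[str, List[str]]) -> List[str]:
    """Greedily pick utterances until all diphones in the corpus are covered."""
    units = {uid: diphones(ph) for uid, ph in corpus.items()}
    uncovered = set().union(*units.values())
    selected: List[str] = []
    while uncovered:
        # Prefer the largest gain in uncovered units, then the shorter
        # utterance, then the smaller id (an assumed tie-breaking rule).
        best = min(
            units,
            key=lambda uid: (-len(units[uid] & uncovered), len(corpus[uid]), uid),
        )
        selected.append(best)
        uncovered -= units[best]
    return selected


# Toy corpus with made-up phoneme transcriptions (assumption, not real data).
corpus = {
    "u1": ["k", "a", "t", "a", "b", "a"],
    "u2": ["k", "a", "t"],
    "u3": ["b", "a", "b", "u"],
    "u4": ["t", "a", "b"],
}
selected = greedy_select(corpus)
print(selected)
print(coverage(corpus, selected) == coverage(corpus, corpus))
```

In this toy case the selection keeps two of the four utterances while covering the same set of diphones as the full corpus, which mirrors the "smaller dataset, same coverage" outcome the abstract reports. The actual criteria in the paper are derived from the study of MSA phonetics and phonology and may use different units and parameters.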
Keywords: phonology, corpus design, corpus evaluation
Pages: 734-738
Halabi, Nawar
99b4cad8-beb0-4525-ad22-c76eee208023
Wald, Mike
90577cfd-35ae-4e4a-9422-5acffecd89d5
25 May 2016
Halabi, Nawar and Wald, Mike (2016) Phonetic inventory for an Arabic speech corpus. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia. 23 - 28 May 2016.
Record type:
Conference or Workshop Item
(Poster)
Text
Arabic Phonetic Vocab 2016.pdf
- Accepted Manuscript
More information
Published date: 25 May 2016
Venue - Dates:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, 2016-05-23 - 2016-05-28
Organisations:
Web & Internet Science
Identifiers
Local EPrints ID: 397310
URI: http://eprints.soton.ac.uk/id/eprint/397310
PURE UUID: d3c0c085-3ab5-48d1-a12a-333faed9af6b
Catalogue record
Date deposited: 27 Jun 2016 10:21
Last modified: 15 Mar 2024 01:11
Contributors
Author:
Nawar Halabi
Author:
Mike Wald