The University of Southampton
University of Southampton Institutional Repository

Phonetic inventory for an Arabic speech corpus


Halabi, Nawar and Wald, Mike (2016) Phonetic inventory for an Arabic speech corpus. Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia. 23 - 28 May 2016. pp. 734-738.

Record type: Conference or Workshop Item (Poster)

Abstract

Corpus design for speech synthesis is well researched for languages such as English, but much less so for Modern Standard Arabic (MSA), and existing work tends to focus on methods (usually greedy) for automatically generating the orthographic transcript to be recorded. In this work, a study of MSA phonetics and phonology is conducted in order to create criteria for a greedy method that produces a speech corpus transcript for recording. The dataset is reduced several times using these optimisation methods with different parameters, yielding a much smaller dataset with the same phonetic coverage as before the reduction, and this output transcript is chosen for recording. This is part of a larger work to create a completely annotated and segmented speech corpus for MSA.
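
As an illustration of the kind of greedy reduction the abstract describes, the following is a minimal sketch, not the authors' implementation. It assumes each candidate sentence has already been mapped to the set of phonetic units it contains; the function name `greedy_select`, the sentence ids, and the unit labels are all hypothetical placeholders, and the paper's actual selection criteria and parameters are not reproduced here.

```python
def greedy_select(sentences):
    """Pick sentences until every phonetic unit in the input is covered.

    `sentences` maps a sentence id to the set of phonetic units it contains
    (an assumed representation, e.g. diphone labels).
    """
    # Units that still need to be covered by the selected subset.
    remaining = set().union(*sentences.values())
    chosen = []
    while remaining:
        # Take the sentence covering the most still-uncovered units;
        # iterating over sorted ids makes tie-breaking deterministic.
        best = max(sorted(sentences), key=lambda s: len(sentences[s] & remaining))
        chosen.append(best)
        remaining -= sentences[best]
    return chosen


# Toy data with placeholder unit labels (not real MSA transcriptions).
corpus = {
    "s1": {"b-a", "a-t", "t-i"},
    "s2": {"b-a", "a-t"},
    "s3": {"t-i", "i-n"},
    "s4": {"i-n"},
}
selected = greedy_select(corpus)
covered = set().union(*(corpus[s] for s in selected))
all_units = set().union(*corpus.values())
print(selected, covered == all_units)
```

On this toy input the selected subset is smaller than the original set of sentences while covering the same units, which is the property the abstract reports for the reduced transcript. Greedy selection of this kind does not guarantee the smallest possible subset.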

Text
Arabic Phonetic Vocab 2016.pdf - Accepted Manuscript
Download (732kB)

More information

Published date: 25 May 2016
Venue - Dates: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Portorož, Slovenia, 2016-05-23 - 2016-05-28
Keywords: phonology, corpus design, corpus evaluation
Organisations: Web & Internet Science

Identifiers

Local EPrints ID: 397310
URI: http://eprints.soton.ac.uk/id/eprint/397310
PURE UUID: d3c0c085-3ab5-48d1-a12a-333faed9af6b

Catalogue record

Date deposited: 27 Jun 2016 10:21
Last modified: 15 Mar 2024 01:11

Contributors

Author: Nawar Halabi
Author: Mike Wald
