The University of Southampton

Generating language distance metrics by language recognition using acoustic features

Sun, Le, Hu, Roland, Yu, Huimin and Sluckin, T. J. (2016) Generating language distance metrics by language recognition using acoustic features. In 2016 8th International Conference on Wireless Communications & Signal Processing (WCSP). IEEE. (doi:10.1109/WCSP.2016.7752528).

Record type: Conference or Workshop Item (Paper)

Abstract

A language recognition system is used to build a quantitative measure of language distance. The OpenEAR toolkit is used to extract more than 6,000 features per speech sample. The features are derived from 56 low-level descriptors (LLDs), their delta and delta-delta values, and the corresponding 39 functionals. The language model training component is based on the Gentle AdaBoost algorithm. When tested on a group of 10 principally Indo-European languages, the language recognition system performs comparably to other language recognizers.
The UPGMA tree built from the interlanguage distances identifies the major subgroups of Indo-European. Genetic algorithms are also used to generate a language map on the 2D plane. Although some errors remain, the resulting language tree and map are indicators of language relationships. We discuss the errors in our system and, more generally, the prospects for using sound-file classifiers in historical linguistics.
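
As an illustration of the tree-building step named in the abstract, the following is a minimal sketch of UPGMA (average-linkage agglomerative clustering) applied to a table of pairwise distances. It is not the authors' implementation: the function name `upgma`, the placeholder labels A to D, and the toy distance values are all assumptions made for this example, and this record does not describe how the paper derives its interlanguage distances from the recognizer.

```python
from itertools import combinations


def upgma(labels, dist):
    """Build a UPGMA tree as nested tuples.

    `dist` maps frozenset({a, b}) to the distance between labels a and b.
    (Hypothetical helper for illustration; not taken from the paper.)
    """
    # Each cluster is keyed by its subtree and stores its number of leaves.
    clusters = {label: 1 for label in labels}
    d = dict(dist)
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest distance.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        na, nb = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # The distance to every other cluster is the size-weighted mean,
        # which equals the average over all leaf pairs.
        for c in clusters:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        clusters[merged] = na + nb
    return next(iter(clusters))


# Toy distances between four placeholder items (made-up numbers).
labels = ["A", "B", "C", "D"]
toy = {frozenset(p): 10.0 for p in combinations(labels, 2)}
toy[frozenset(("A", "B"))] = 2.0
toy[frozenset(("C", "D"))] = 4.0

tree = upgma(labels, toy)
print(tree)  # (('A', 'B'), ('C', 'D'))
```

In this toy case the two closest pairs are merged first and then joined at the root. For real data, a library routine such as SciPy's average-linkage clustering would normally be preferable, since it also returns the merge heights needed to draw the tree.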

Text

Sun_et_aL_preprint WCSP2016 - Accepted Manuscript

Available under License University of Southampton Accepted Manuscript Licence.

Download (253kB)

More information

Accepted/In Press date: 1 September 2016

e-pub ahead of print date: 24 November 2016

Additional Information: Electronic ISBN: 978-1-5090-2860-3 USB ISBN: 978-1-5090-2859-7 Print on Demand(PoD) ISBN: 978-1-5090-2861-0

Venue - Dates: 8th International Conference on Wireless Communications & Signal Processing, Yangzhou, China, 2016-10-13 - 2016-10-15

Organisations: Applied Mathematics

Identifiers

Local EPrints ID: 406381

URI: http://eprints.soton.ac.uk/id/eprint/406381

DOI: 10.1109/WCSP.2016.7752528

PURE UUID: abe756dc-14bf-4fbe-898b-babf6733d96d

ORCID for T. J. Sluckin:

orcid.org/0000-0002-9163-0061

Catalogue record

Date deposited: 10 Mar 2017 10:46

Last modified: 16 Mar 2024 02:32

Contributors

Author: Le Sun

Author: Roland Hu

Author: Huimin Yu

Author: T. J. Sluckin
