University of Southampton Institutional Repository

Generating language distance metrics by language recognition using acoustic features

IEEE
Sun, Le
Hu, Roland
Yu, Huimin
Sluckin, T. J.

Sun, Le, Hu, Roland, Yu, Huimin and Sluckin, T. J. (2016) Generating language distance metrics by language recognition using acoustic features. In 2016 8th International Conference on Wireless Communications & Signal Processing (WCSP). IEEE. (doi:10.1109/WCSP.2016.7752528).

Record type: Conference or Workshop Item (Paper)

Abstract

A language recognition system is used to build a quantitative measure of language distance. The OpenEAR toolkit is used to extract more than 6,000 features per speech sample. The features consist of 56 low-level descriptors (LLDs), their delta and delta-delta values, and the corresponding 39 functionals. The language model training component is based on the Gentle AdaBoost algorithm. When tested on a group of 10 principally Indo-European languages, the language recognition system performs comparably to other language recognizers.
The UPGMA tree built from the interlanguage distances identifies the major subgroups of Indo-European. Genetic algorithms are also implemented to generate a language map on the 2D plane. Although some errors remain, the resulting language tree and map are indicators of language relationships. We discuss errors in our system and, more generally, perspectives on the use of sound file classifiers in historical linguistics.
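
For readers unfamiliar with UPGMA, the tree-building step named in the abstract can be sketched as follows. This is a minimal illustration of the general algorithm and not the authors' code: the function name upgma, the labels A to D, and the distance values are all hypothetical placeholders, and none of the numbers come from the paper.

```python
from itertools import combinations


def upgma(labels, dist):
    """Build a UPGMA tree from a symmetric distance matrix.

    Returns nested tuples of the form (left, right, height), where
    height is half the distance at which the two subtrees were merged.
    """
    # Active clusters: id -> (subtree, number of leaves).
    clusters = {i: (label, 1) for i, label in enumerate(labels)}
    # Distances between active clusters, keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i, j in combinations(range(len(labels)), 2)}
    next_id = len(labels)

    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        (a, b), d_ab = min(d.items(), key=lambda kv: kv[1])
        tree_a, size_a = clusters.pop(a)
        tree_b, size_b = clusters.pop(b)

        # Distance from the merged cluster to every remaining cluster is
        # the size-weighted mean of the two old distances (average linkage).
        new_d = {}
        for c in clusters:
            d_ac = d[tuple(sorted((a, c)))]
            d_bc = d[tuple(sorted((b, c)))]
            new_d[(c, next_id)] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)

        # Drop entries that mention the two merged clusters.
        d = {k: v for k, v in d.items() if a not in k and b not in k}
        d.update(new_d)

        clusters[next_id] = ((tree_a, tree_b, d_ab / 2), size_a + size_b)
        next_id += 1

    tree, _ = next(iter(clusters.values()))
    return tree


# Made-up distances between four unnamed languages (illustration only).
labels = ["A", "B", "C", "D"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
tree = upgma(labels, dist)
print(tree)
```

In this toy input, A and B are closest and merge first, C joins them next, and D joins last. In a setting like the one the abstract describes, the matrix entries would instead be the interlanguage distances derived from the recognition system.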

Text
Sun_et_aL_preprint WCSP2016 - Accepted Manuscript

More information

Accepted/In Press date: 1 September 2016
e-pub ahead of print date: 24 November 2016
Additional Information: Electronic ISBN: 978-1-5090-2860-3; USB ISBN: 978-1-5090-2859-7; Print on Demand (PoD) ISBN: 978-1-5090-2861-0
Venue - Dates: 8th International Conference on Wireless Communications & Signal Processing, Yangzhou, China, 2016-10-13 - 2016-10-15
Organisations: Applied Mathematics

Identifiers

Local EPrints ID: 406381
URI: http://eprints.soton.ac.uk/id/eprint/406381
PURE UUID: abe756dc-14bf-4fbe-898b-babf6733d96d
ORCID for T. J. Sluckin: orcid.org/0000-0002-9163-0061

Catalogue record

Date deposited: 10 Mar 2017 10:46
Last modified: 16 Mar 2024 02:32

Contributors

Author: Le Sun
Author: Roland Hu
Author: Huimin Yu
Author: T. J. Sluckin
