University of Southampton Institutional Repository

Exploring practical metrics to support automatic speech recognition evaluations


Draffan, E.A., Wald, Mike, Ding, Chaohai and Li, Yunjia (2023) Exploring practical metrics to support automatic speech recognition evaluations. Studies in Health Technology and Informatics, 306, 305-310. (doi:10.3233/SHTI230636).

Record type: Article

Abstract

Recent studies into the evaluation of automatic speech recognition (ASR) for the quality of its text output have shown that using word error rate to count how many mistakes occur in English does not necessarily help the developer of automatic transcriptions or captions. Confidence about the types of errors being made remains low because mistranscriptions from speech to text are not always captured with a note detailing the reason for the error. In higher education, students requiring captions and transcriptions have found that some academic lecture output is littered with word errors, so comprehension levels drop, and those with cognitive, physical and sensory disabilities are particularly affected. Despite the impressive improvements in general understanding of conversational ASR, academic situations tend to include numerous domain-specific terms, and lecturers may be non-native speakers coping with recording technology in noisy situations. This paper discusses how additional metrics can be used to capture issues and feed back into the machine learning process, enabling enhanced quality of output and more inclusive practices for those using virtual conferencing systems. The process goes beyond what is expressed and examines paralinguistic aspects such as timing, intonation, voice quality and speech understanding.
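As background for the word error rate metric the abstract critiques, the following is a minimal sketch (not taken from the article) of how WER is conventionally computed: the word-level Levenshtein distance between a reference transcript and an ASR hypothesis, divided by the number of reference words. Note that, as the abstract argues, this single number says nothing about *why* an error occurred.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = minimum edits to turn the first i reference words
    # into the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + sub)  # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref)
```

For example, `wer("the cat sat on the mat", "the cat sat on a mat")` counts one substitution over six reference words, about 0.167 — but the score alone cannot distinguish a harmless article swap from a garbled domain-specific term, which is the gap the article's additional metrics aim to address.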

Text
Metrics for ASR evaluation - Accepted Manuscript
Available under License Creative Commons Attribution.

More information

e-pub ahead of print date: 1 August 2023
Additional Information: Funding Information: The team from the University of Southampton would like to thank all the experts who took part in this review, and the students, lecturers and colleagues who also helped by offering their opinions when using a newly developed e-learning platform with cloud-based video conferencing, summaries and linked resources as part of UKRI Innovate UK funded projects (Technology Strategy Board Refs 10024466, 10013521, 103341).
Keywords: automatic speech recognition, captions, disability, error correction, transcriptions, word error rate

Identifiers

Local EPrints ID: 483685
URI: http://eprints.soton.ac.uk/id/eprint/483685
ISSN: 0926-9630
PURE UUID: 5b14ae65-84db-4117-b49d-4c07c03466a2
ORCID for E.A. Draffan: orcid.org/0000-0003-1590-7556

Catalogue record

Date deposited: 03 Nov 2023 17:53
Last modified: 18 Mar 2024 03:07


Contributors

Author: E.A. Draffan
Author: Mike Wald
Author: Chaohai Ding
Author: Yunjia Li


Contact ePrints Soton: eprints@soton.ac.uk

ePrints Soton supports OAI 2.0 with a base URL of http://eprints.soton.ac.uk/cgi/oai2

