Exploring practical metrics to support automatic speech recognition evaluations
Pages: 305-310
Draffan, E.A., Wald, Mike, Ding, Chaohai and Li, Yunjia
(2023)
Exploring practical metrics to support automatic speech recognition evaluations.
Studies in Health Technology and Informatics, 306, 305-310.
(doi:10.3233/SHTI230636).
Abstract
Recent studies evaluating the quality of automatic speech recognition (ASR) text output have shown that word error rate, used as a simple count of mistakes in English, does not necessarily help the developer of automatic transcriptions or captions. Confidence about the types of errors being made remains low, because speech-to-text errors are not always captured with a note detailing the reason for the error. In higher education, students requiring captions and transcriptions have found some academic lecture outputs littered with word errors; comprehension levels drop, and those with cognitive, physical and sensory disabilities are particularly affected. Despite impressive improvements in general conversational automatic speech recognition, academic situations tend to include numerous domain-specific terms, and lecturers may be non-native speakers coping with recording technology in noisy settings. This paper discusses how additional metrics can be used to capture issues and feed them back into the machine learning process, enabling enhanced output quality and more inclusive practices for those using virtual conferencing systems. The process goes beyond what is expressed and examines paralinguistic aspects such as timing, intonation, voice quality and speech understanding.
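As a concrete illustration of the metric the abstract argues is insufficient on its own, the sketch below computes word error rate as word-level Levenshtein distance divided by reference length. This example is not from the paper; the function name and sample sentences are illustrative only.

```python
# Minimal sketch of word error rate (WER): it counts substitutions,
# deletions and insertions against a reference transcript, but records
# nothing about *why* an error occurred (domain terms, accent, noise),
# which is the gap the paper's additional metrics aim to fill.

def wer(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# A hypothetical domain-specific term misrecognised in a lecture:
# one substitution plus one insertion over five reference words.
print(wer("the eigenvalues of the matrix",
          "the eigen values of the matrix"))  # → 0.4
```

Note that a single split compound word costs two edits here, even though a listener would perceive one mistake; this mismatch between edit counts and perceived severity is one reason WER alone gives developers so little to act on.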
Text: Metrics for ASR evaluation - Accepted Manuscript
More information
e-pub ahead of print date: 1 August 2023
Additional Information:
Funding Information:
The team from the University of Southampton would like to thank all the experts who took part in this review, and the students, lecturers and colleagues who also helped by offering their opinions when using a newly developed e-learning platform with cloud-based video conferencing, summary and linked resources as part of UKRI Innovate UK funded projects (Technology Strategy Board Refs 10024466, 10013521, 103341).
Keywords:
automatic speech recognition, captions, disability, error correction, transcriptions, word error rate
Identifiers
Local EPrints ID: 483685
URI: http://eprints.soton.ac.uk/id/eprint/483685
ISSN: 0926-9630
PURE UUID: 5b14ae65-84db-4117-b49d-4c07c03466a2
Catalogue record
Date deposited: 03 Nov 2023 17:53
Last modified: 18 Mar 2024 03:07
Contributors
Author: E.A. Draffan
Author: Mike Wald
Author: Chaohai Ding
Author: Yunjia Li