High-Level Approaches to Confidence Estimation in Speech Recognition
Cox, Stephen and Dasmahapatra, Srinandan (2002) High-Level Approaches to Confidence Estimation in Speech Recognition. IEEE Transactions on Speech and Audio Processing, 10 (7), 460-471.
Abstract
We describe some high-level approaches to estimating confidence scores for the words output by a speech recognizer. By "high-level" we mean that the proposed measures do not rely on decoder specific "side information" and so should find more general applicability than measures that have been developed for specific recognizers. Our main approach is to attempt to decouple the language modeling and acoustic modeling in the recognizer in order to generate independent information from these two sources that can then be used for estimation of confidence. We isolate these two information sources by using a phone recognizer working in parallel with the word recognizer. A set of techniques for estimating confidence measures using the phone recognizer output in conjunction with the word recognizer output is described. The most effective of these techniques is based on the construction of "metamodels," which generate alternative word hypotheses for an utterance. An alternative approach requires no other recognizers or extra information for confidence estimation and is based on the notion that a word that is semantically "distant" from the other decoded words in the utterance is likely to be incorrect. We describe a method for constructing "semantic similarities" between words and hence estimating a confidence. Results using the U.K. version of the Wall Street Journal are given for each technique.
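The semantic approach described in the abstract can be illustrated with a short sketch. The following is not the paper's method: the document collection, the two-dimensional latent space, and the use of plain cosine similarity are all toy assumptions made for illustration. It scores each decoded word by its mean latent-semantic similarity to the other decoded words in the utterance, so that a semantically "distant" word receives a low confidence score.

```python
# Illustrative sketch (not from the paper): latent semantic analysis on a toy
# word-by-document count matrix, then per-word confidence as mean similarity
# to the other words in the decoded utterance.
import numpy as np

# Toy document collection (made up) used to build the count matrix.
documents = [
    "shares fell on the london stock exchange",
    "the bank raised interest rates again",
    "profits rose as the company cut costs",
    "the cat sat on the mat",
]
vocab = sorted({w for doc in documents for w in doc.split()})
word_index = {w: i for i, w in enumerate(vocab)}

# Word-by-document count matrix.
counts = np.zeros((len(vocab), len(documents)))
for j, doc in enumerate(documents):
    for w in doc.split():
        counts[word_index[w], j] += 1

# Latent semantic analysis: SVD of the count matrix, keeping k dimensions.
U, s, _ = np.linalg.svd(counts, full_matrices=False)
k = 2  # number of latent dimensions (toy value)
word_vecs = U[:, :k] * s[:k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def semantic_confidence(decoded_words):
    """Mean latent-semantic similarity of each decoded word to the others.

    A word that is semantically "distant" from the rest of the utterance
    receives a low score, flagging it as a likely recognition error."""
    scores = {}
    for w in decoded_words:
        others = [v for v in decoded_words if v != w and v in word_index]
        if w not in word_index or not others:
            scores[w] = 0.0
            continue
        sims = [cosine(word_vecs[word_index[w]], word_vecs[word_index[v]])
                for v in others]
        scores[w] = sum(sims) / len(sims)
    return scores

# Example: "mat" is out of place in this financial utterance and should tend
# to receive the lowest score.
print(semantic_confidence(["shares", "fell", "on", "the", "mat"]))
```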
More information
Published date: October 2002
Keywords:
Confidence measures, latent semantic analysis, N-best lists, phoneme recognition, speech recognition.
Organisations:
Electronics & Computer Science
Identifiers
Local EPrints ID: 258951
URI: http://eprints.soton.ac.uk/id/eprint/258951
PURE UUID: 96632f51-fa74-4a58-8730-cd396707e459
Catalogue record
Date deposited: 03 Mar 2004
Last modified: 14 Mar 2024 06:16
Contributors
Author:
Stephen Cox
Author:
Srinandan Dasmahapatra