Microphone signal processing for speech recognition in cars.
This thesis is concerned with the problem of automatic recognition of speech that is contaminated with acoustic noise and reverberation, especially in cars. Speech recognition has great potential for in-car application, because it allows drivers to use complex interfaces without distracting their eyes or hands. However, the high levels of noise present inside travelling cars produce high rates of error in current automatic speech recognisers. Error rates may be reduced either by making a speech recogniser that is 'noise-robust', i.e. insensitive to noise on its input, or by decreasing the noise content of the recogniser's input speech. The latter approach has been followed in this thesis.
The signal-to-noise ratio (SNR) of the speech received by a microphone may be increased by bringing the microphone closer to the speaker's mouth and/or making it more directional. This thesis assesses various microphone mounting positions on a car's interior surfaces, and reports on measurements of the directional and frequency responses of some commercial 'car-phone' microphones.
The noise received by a microphone can also be reduced by processing its output signal. Greater noise reductions may be achieved by processing multiple versions of the speech, with different noise components, obtained from an array of microphones. Several single- and multichannel processors are evaluated in this thesis, using input speech recorded by seven microphones at various positions inside a travelling car.
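As a rough illustration of why several microphones can help, the sketch below averages seven simulated channels that share the same signal but carry independent noise. It is not one of the processors evaluated in the thesis: the sinusoidal stand-in for speech, the white Gaussian noise, the perfect time alignment and all names (`snr_db`, `mics`, `gain_db`) are assumptions made for the example. Real in-car noise is partly correlated between microphones, so the gain in practice would be expected to be smaller than in this idealised case.

```python
import math
import random


def power(x):
    """Mean-square value of a sequence."""
    return sum(v * v for v in x) / len(x)


def snr_db(signal, observed):
    """SNR in dB, treating (observed - signal) as the noise."""
    noise = [o - s for s, o in zip(signal, observed)]
    return 10.0 * math.log10(power(signal) / power(noise))


rng = random.Random(1)   # fixed seed so the run is repeatable
N = 8000                 # number of samples (assumed)
CHANNELS = 7             # seven microphones, as in the recordings described

# Hypothetical stand-in for speech: a single sinusoid.
speech = [math.sin(2 * math.pi * 0.02 * n) for n in range(N)]

# Each channel sees the same signal plus its own independent white noise.
mics = [[s + rng.gauss(0.0, 1.0) for s in speech] for _ in range(CHANNELS)]

# Simplest multichannel processor: average the aligned channels.
average = [sum(ch[n] for ch in mics) / CHANNELS for n in range(N)]

single_snr = snr_db(speech, mics[0])
array_snr = snr_db(speech, average)
gain_db = array_snr - single_snr

print(f"single microphone: {single_snr:.1f} dB")
print(f"7-channel average: {array_snr:.1f} dB")
print(f"gain:              {gain_db:.1f} dB")
```

Under these assumptions the noise power falls roughly in proportion to the number of channels, so the gain should be close to 10 log10(7), about 8.5 dB.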
Many of the processors tested used optimal FIR filters. Such filters are optimised in advance, so as to minimise some measure of the noise on their output, and then held fixed during normal operation. Conventional minimum mean-squared-error (MMSE) optimal filters were found to give large gains in SNR, but little or no decrease in speech recognition error rates, owing to the way they distort the speech spectrum.
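The idea of a filter that is optimised in advance and then held fixed can be sketched with a textbook single-channel MMSE (Wiener) FIR design. This is a minimal sketch and not the thesis's implementation: it assumes a clean reference signal is available at design time, uses a sinusoid in white noise as test data, and every name in it (`design_mmse_fir`, `solve`, `fir`, the tap count of 16) is hypothetical. The weights are found by solving the normal equations R w = p, where R is the autocorrelation matrix of the noisy input and p is the cross-correlation between the clean reference and the noisy input.

```python
import math
import random


def correlate(a, b, lag):
    """Biased estimate of E[a[n] * b[n - lag]] for lag >= 0."""
    n = len(a)
    return sum(a[i] * b[i - lag] for i in range(lag, n)) / n


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def design_mmse_fir(noisy, clean, taps):
    """Solve the normal equations R w = p for the FIR weights w."""
    r = [correlate(noisy, noisy, k) for k in range(taps)]
    p = [correlate(clean, noisy, k) for k in range(taps)]
    R = [[r[abs(i - j)] for j in range(taps)] for i in range(taps)]
    return solve(R, p)


def fir(w, x):
    """Apply fixed weights: y[n] = sum_k w[k] * x[n - k]."""
    return [sum(w[k] * x[n - k] for k in range(len(w)) if n - k >= 0)
            for n in range(len(x))]


def mse(a, b):
    """Mean-squared difference between two equal-length sequences."""
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


rng = random.Random(0)   # fixed seed so the run is repeatable
N = 4000
clean = [math.sin(2 * math.pi * 0.01 * n) for n in range(N)]  # assumed test signal
noisy = [s + rng.gauss(0.0, 0.5) for s in clean]              # assumed white noise

w = design_mmse_fir(noisy, clean, taps=16)  # optimised once, in advance
filtered = fir(w, noisy)                    # weights held fixed while filtering

mse_before = mse(noisy, clean)
mse_after = mse(filtered, clean)
print(f"MSE before filtering: {mse_before:.4f}")
print(f"MSE after filtering:  {mse_after:.4f}")
```

The sketch measures only mean-squared error. A criterion of this kind takes no account of how the filter reshapes the speech spectrum, which is the effect the abstract identifies as limiting the benefit for recognition.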
University of Southampton
Rex, James Alexander
445c03d1-ce9c-4ef5-b7b6-1710156d9177
2000
Rex, James Alexander (2000) Microphone signal processing for speech recognition in cars. University of Southampton, Doctoral Thesis.
Record type: Thesis (Doctoral)
Text: 757216.pdf (Version of Record)
More information
Published date: 2000
Identifiers
Local EPrints ID: 464177
URI: http://eprints.soton.ac.uk/id/eprint/464177
PURE UUID: bcfb7277-197a-4eb6-80b6-6b2ec14d846b
Catalogue record
Date deposited: 04 Jul 2022 21:25
Last modified: 16 Mar 2024 19:19
Contributors
Author: James Alexander Rex