Microphone signal processing for speech recognition in cars.

Rex, James Alexander (2000) Microphone signal processing for speech recognition in cars. University of Southampton, Doctoral Thesis.

Record type: Thesis (Doctoral)

Abstract

This thesis is concerned with the problem of automatic recognition of speech that is contaminated with acoustic noise and reverberation, especially in cars. Speech recognition has great potential for in-car application, because it allows drivers to use complex interfaces without distracting their eyes or hands. However, the high levels of noise present inside travelling cars produce high rates of error in current automatic speech recognisers. Error rates may be reduced either by making a speech recogniser that is 'noise-robust', i.e. insensitive to noise on its input, or by decreasing the noise content of the recogniser's input speech. The latter approach has been followed in this thesis.

The signal-to-noise ratio (SNR) of the speech received by a microphone may be increased by bringing the microphone closer to the speaker's mouth and/or making it more directional. This thesis assesses various microphone mounting positions on a car's interior surfaces, and reports on measurements of the directional and frequency responses of some commercial 'car-phone' microphones.

The noise received by a microphone can also be reduced by processing its output signal. Greater noise reductions may be achieved by processing multiple versions of the speech, with different noise components, obtained from an array of microphones. Several single- and multichannel processors are evaluated in this thesis, using input speech recorded by seven microphones at various positions inside a travelling car.
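As a rough illustration of why multiple microphones can help, the sketch below averages seven channels that carry the same speech but different noise. It is not one of the processors evaluated in the thesis. It assumes a synthetic sine wave in place of speech, perfectly time-aligned channels, and white noise that is independent between channels. Under those assumptions the expected SNR gain is about 10*log10(7), roughly 8.5 dB. Real in-car noise is partly correlated between microphones, so the gain in practice would differ.

```python
import numpy as np

def snr_db(speech, noise):
    """SNR in dB from separate speech and noise components."""
    return 10.0 * np.log10(np.mean(speech ** 2) / np.mean(noise ** 2))

rng = np.random.default_rng(1)
n, num_mics = 16000, 7          # seven channels, as in the recordings described above

# Stand-in for speech: a sine wave (assumption, not real speech)
speech = np.sin(2 * np.pi * 0.01 * np.arange(n))

# Assumption: unit-variance white noise, independent on each channel
noise = rng.normal(0.0, 1.0, (num_mics, n))

# Averaging the channels leaves the aligned speech unchanged,
# so only the noise component needs to be averaged here.
single = snr_db(speech, noise[0])
averaged = snr_db(speech, noise.mean(axis=0))
gain_db = averaged - single
print(f"SNR gain from averaging {num_mics} channels: {gain_db:.1f} dB")
```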

Many of the processors tested used optimal FIR filters. Such filters are optimised in advance, so as to minimise some measure of the noise on their output, and then held fixed during normal operation. Conventional minimum mean-squared-error (MMSE) optimal filters were found to give large gains in SNR, but little or no decrease in speech recognition error rates, owing to the way they distort the speech spectrum.
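The "optimise in advance, then hold fixed" idea can be sketched for a single channel as a least-squares MMSE FIR design. This is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the 32-tap length, the sine-wave stand-in for speech, and the white-noise model are all hypothetical. It also measures only SNR, which by itself says nothing about recognition error rates.

```python
import numpy as np

def design_mmse_fir(noisy, clean, num_taps):
    """Least-squares FIR taps w minimising mean((clean - w * noisy)^2)
    over the training data (hypothetical helper)."""
    n = len(noisy)
    # Column k of X holds the noisy signal delayed by k samples
    X = np.zeros((n, num_taps))
    for k in range(num_taps):
        X[k:, k] = noisy[:n - k]
    w, *_ = np.linalg.lstsq(X, clean, rcond=None)
    return w

def apply_fir(w, x):
    """Causal FIR filtering, truncated to the input length."""
    return np.convolve(x, w)[:len(x)]

def snr_db(clean, estimate):
    """SNR in dB, treating everything that differs from clean as noise."""
    err = estimate - clean
    return 10.0 * np.log10(np.sum(clean ** 2) / np.sum(err ** 2))

rng = np.random.default_rng(0)
n = 8000
t = np.arange(n)
# Stand-in for speech: two low-frequency sinusoids (assumption)
clean = np.sin(2 * np.pi * 0.01 * t) + 0.5 * np.sin(2 * np.pi * 0.023 * t)

# Optimise the filter in advance on training data
train_noisy = clean + rng.normal(0.0, 1.0, n)
w = design_mmse_fir(train_noisy, clean, num_taps=32)

# Hold the filter fixed and apply it to data with fresh noise
test_noisy = clean + rng.normal(0.0, 1.0, n)
snr_in = snr_db(clean, test_noisy)
snr_out = snr_db(clean, apply_fir(w, test_noisy))
print(f"SNR in: {snr_in:.1f} dB, SNR out: {snr_out:.1f} dB")
```

Because the error measure here lumps residual noise and speech distortion together, a filter of this kind can raise SNR while still reshaping the speech spectrum, which is the effect the abstract reports for conventional MMSE filters.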

Text

757216.pdf - Version of Record

Available under License University of Southampton Thesis Licence.
