The University of Southampton
University of Southampton Institutional Repository

Microphone signal processing for speech recognition in cars.

This thesis is concerned with the problem of automatic recognition of speech that is contaminated with acoustic noise and reverberation, especially in cars. Speech recognition has great potential for in-car application, because it allows drivers to use complex interfaces without distracting their eyes or hands. However, the high levels of noise present inside travelling cars produce high rates of error in current automatic speech recognisers. Error rates may be reduced either by making a speech recogniser that is 'noise-robust', i.e. insensitive to noise on its input, or by decreasing the noise content of the recogniser's input speech. The latter approach has been followed in this thesis.

The signal-to-noise ratio (SNR) of the speech received by a microphone may be increased by bringing it closer to the speaker's mouth and/or making it more directional. This thesis assesses various microphone mounting positions on a car's interior surfaces, and reports on measurements of the directional and frequency responses of some commercial 'car-phone' microphones.

The noise received by a microphone can also be reduced by processing its output signal. Greater noise reductions may be achieved by processing multiple versions of the speech, with different noise components, obtained from an array of microphones. Several single- and multichannel processors are evaluated in this thesis, using input speech recorded by seven microphones at various positions inside a travelling car.
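
As a rough illustration of why several microphones can help, the sketch below averages seven time-aligned channels that carry the same signal with independent noise. It is not one of the processors evaluated in the thesis: the sine-wave "speech", the unit-variance Gaussian noise, and the names `snr_db` and `average_channels` are all assumptions made for this example. Noise inside a real car is partly correlated between microphones, so the gain shown here should be read as an idealised upper bound.

```python
import math
import random


def snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise."""
    signal_power = sum(c * c for c in clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10.0 * math.log10(signal_power / noise_power)


def average_channels(channels):
    """Average time-aligned channels sample by sample."""
    return [sum(samples) / len(samples) for samples in zip(*channels)]


rng = random.Random(0)          # fixed seed so the run is repeatable
num_mics = 7                    # matches the seven microphones in the abstract

# Hypothetical stand-in for speech: a slow sine wave.
clean = [math.sin(2 * math.pi * 0.01 * t) for t in range(4000)]

# Each channel sees the same signal plus its own independent noise (assumption).
channels = [[c + rng.gauss(0.0, 1.0) for c in clean] for _ in range(num_mics)]

single_snr = snr_db(clean, channels[0])
array_snr = snr_db(clean, average_channels(channels))
gain_db = array_snr - single_snr
print(f"single mic: {single_snr:.1f} dB, averaged: {array_snr:.1f} dB")
```

Under these assumptions the averaged noise power falls by about a factor of seven, so the expected gain is close to 10 log10(7), roughly 8.5 dB.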

Many of the processors tested used optimal FIR filters. Such filters are optimised in advance, so as to minimise some measure of the noise on their output, and then held fixed during normal operation. Conventional minimum mean-squared-error (MMSE) optimal filters were found to give large gains in SNR, but little or no decrease in speech recognition error rates, owing to the way they distort the speech spectrum.
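
The idea of a filter that is optimised in advance and then held fixed can be sketched with the textbook single-channel MMSE (Wiener) design, which solves the normal equations R w = p for the FIR weights w. The abstract does not describe the thesis's own design procedure, so everything below is an assumption for illustration: the synthetic sine-plus-noise data, the availability of a clean reference during training, the eight-tap length, and the helper names `design_mmse_fir`, `fir_filter` and `make_data`.

```python
import math
import random


def correlate(a, b, lag, n):
    """Estimate E[a[t] * b[t - lag]] from the first n samples."""
    return sum(a[t] * b[t - lag] for t in range(lag, n)) / (n - lag)


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    size = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, size + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, size))
        x[row] = (aug[row][size] - tail) / aug[row][row]
    return x


def design_mmse_fir(noisy, clean, taps):
    """Solve the normal equations R w = p for the FIR weights w."""
    n = len(noisy)
    r = [correlate(noisy, noisy, lag, n) for lag in range(taps)]  # autocorrelation
    p = [correlate(clean, noisy, lag, n) for lag in range(taps)]  # cross-correlation
    R = [[r[abs(i - j)] for j in range(taps)] for i in range(taps)]
    return solve(R, p)


def fir_filter(weights, signal):
    """Apply fixed FIR weights: y[t] = sum_k w[k] * x[t - k]."""
    return [sum(w * signal[t - k] for k, w in enumerate(weights) if t - k >= 0)
            for t in range(len(signal))]


def mse(a, b):
    """Mean squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def make_data(rng, n):
    """Hypothetical data: a sine wave plus white Gaussian noise."""
    clean = [math.sin(2 * math.pi * 0.02 * t) for t in range(n)]
    noisy = [c + rng.gauss(0.0, 0.7) for c in clean]
    return clean, noisy


rng = random.Random(1)
train_clean, train_noisy = make_data(rng, 3000)
weights = design_mmse_fir(train_noisy, train_clean, taps=8)  # optimised once

test_clean, test_noisy = make_data(rng, 3000)  # weights are held fixed here
filtered = fir_filter(weights, test_noisy)
mse_before = mse(test_clean, test_noisy)
mse_after = mse(test_clean, filtered)
print(f"MSE before: {mse_before:.3f}, after: {mse_after:.3f}")
```

A lower mean squared error on held-out data corresponds to a higher SNR, but, as the abstract notes, it does not by itself imply fewer recognition errors, because the filter is free to reshape the speech spectrum in order to suppress noise.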

University of Southampton
Rex, James Alexander
445c03d1-ce9c-4ef5-b7b6-1710156d9177

Rex, James Alexander (2000) Microphone signal processing for speech recognition in cars. University of Southampton, Doctoral Thesis.

Record type: Thesis (Doctoral)

Text
757216.pdf - Version of Record
Available under License University of Southampton Thesis Licence.
Download (25MB)

More information

Published date: 2000

Identifiers

Local EPrints ID: 464177
URI: http://eprints.soton.ac.uk/id/eprint/464177
PURE UUID: bcfb7277-197a-4eb6-80b6-6b2ec14d846b

Catalogue record

Date deposited: 04 Jul 2022 21:25
Last modified: 16 Mar 2024 19:19

Contributors

Author: James Alexander Rex
