The University of Southampton
University of Southampton Institutional Repository

A Novel Approach to Noisy Speech recognition using DTW algorithm with Mel-Frequency Cepstral Coefficients


Shafik, Rishad Ahmed and Yousaf-Zai, Fazli Qayyum (2004) A Novel Approach to Noisy Speech recognition using DTW algorithm with Mel-Frequency Cepstral Coefficients. Journal of Engineering and Technology (JET-IUT), 5 (2), 21-29.

Record type: Article

Abstract

A new and effective approach to the recognition of noisy speech is introduced. An end-point detection algorithm is used to measure the noise power and to automatically initiate recording of a spoken word. Unvoiced components of the recorded speech that are buried under noise (ambient noise, hiss noise or telephone noise) are then minimized by a Finite Impulse Response (FIR) band-pass filter. The speech signal is then sampled and speech features are extracted using low-level, customized Mel-Frequency Cepstral Coefficients (MFCC), which are subsequently dynamically time-warped to find the average minimal distance from Euclidean distance matrices and so facilitate recognition of the speech. For generalization, speech data were collected from three speakers with three different pitch levels and compared with data from a mid-pitch speaker to establish both speaker-independent and speaker-dependent efficacy and accuracy. Such a speech recognition system can be both fast and effective even in quite noisy environments.
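
The matching step described in the abstract, dynamic time warping (DTW) over Euclidean distances between feature frames, can be illustrated with a minimal sketch. This is not the authors' implementation, since the full text is not available here. The function names (`euclidean`, `dtw_distance`, `recognise`), the division of the accumulated cost by the combined sequence length as one possible reading of "average minimal distance", and the toy two-dimensional frames standing in for MFCC vectors are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Length-normalised DTW cost between two sequences of feature frames.

    Assumption: the accumulated cost is divided by len(seq_a) + len(seq_b);
    the abstract does not say how the average is taken.
    """
    n, m = len(seq_a), len(seq_b)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    # cost[i][j] is the minimal accumulated distance for aligning
    # the first i frames of seq_a with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # frame of seq_b repeated
                cost[i][j - 1],      # frame of seq_a repeated
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m] / (n + m)


def recognise(utterance, templates):
    """Return the label of the stored template closest to the utterance."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Toy frames standing in for MFCC vectors (hypothetical data).
templates = {
    "yes": [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]],
    "no": [[5.0, 5.0], [4.0, 4.0], [3.0, 3.0]],
}
# A time-stretched version of the "yes" template: the first frame is repeated.
utterance = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
print(recognise(utterance, templates))
```

Because DTW lets one frame align with several frames of the other sequence, the stretched utterance still matches the "yes" template at zero cost. In a real system the frames would come from an MFCC front end applied after end-point detection and band-pass filtering.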

Full text not available from this repository.

More information

Published date: December 2004
Organisations: Electronic & Software Systems

Identifiers

Local EPrints ID: 263218
URI: https://eprints.soton.ac.uk/id/eprint/263218
PURE UUID: ccd963dc-8ec9-4496-8d39-b44d27293e90

Catalogue record

Date deposited: 30 Nov 2006
Last modified: 18 Jul 2017 08:43
