A Novel Approach to Noisy Speech recognition using DTW algorithm with Mel-Frequency Cepstral Coefficients
21-29
Shafik, Rishad Ahmed
aa0bdafc-b022-4cb2-a8ef-4bf8a03ba524
Yousaf-Zai, Fazli Qayyum
e356ffff-846a-4889-bbf5-fa3d0fe65f1a
December 2004
Shafik, Rishad Ahmed and Yousaf-Zai, Fazli Qayyum
(2004)
A Novel Approach to Noisy Speech recognition using DTW algorithm with Mel-Frequency Cepstral Coefficients.
Journal of Engineering and Technology, 5 (2), 21-29.
Abstract
A new and effective approach to the recognition of noisy speech is introduced. An end-point detection algorithm was used to measure the noise power and to automatically initiate recording of a spoken word. Unvoiced components of the recorded speech buried under noise (ambient noise, hiss noise, or telephone noise) were then optimally minimized by a Finite Impulse Response (FIR) band-pass filter. The speech signal was then sampled, and speech features were extracted using low-level, customized Mel-Frequency Cepstral Coefficients (MFCC). These features were dynamically time-warped to find the average minimal distance from Euclidean distance matrices, which facilitates recognition of the speech. For generalization, speech data were collected from three speakers with three different pitch levels and compared to a mid-pitch speaker to establish both speaker-independent and speaker-dependent efficacy and accuracy. Such a speech recognition system can be both fast and effective even in quite noisy environments.
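The matching step named in the abstract, dynamic time warping over Euclidean distances between feature frames, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`euclidean`, `dtw_distance`, `recognise`), the step pattern, and the normalisation by the combined sequence length (one possible reading of "average minimal distance") are all assumptions made for illustration. The feature vectors stand in for per-frame MFCC vectors, which are taken as already computed.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(ref, test):
    """Accumulated DTW cost between two sequences of feature vectors.

    The result is divided by len(ref) + len(test); this normalisation is
    an assumption, not something stated in the abstract.
    """
    n, m = len(ref), len(test)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    # cost[i][j] is the minimal accumulated distance aligning ref[:i] with test[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(ref[i - 1], test[j - 1])
            # Allowed steps: advance in ref only, in test only, or in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m] / (n + m)


def recognise(templates, test):
    """Return the label of the stored template closest to the test sequence."""
    return min(templates, key=lambda label: dtw_distance(templates[label], test))
```

In use, `templates` would map each vocabulary word to the feature sequence of a reference utterance, and `recognise(templates, features)` would return the word whose template has the smallest warped distance to the incoming utterance. Because the warping path may repeat frames, a slower or faster rendition of the same word can still align closely with its template.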
More information
Published date: December 2004
Organisations:
Electronic & Software Systems
Identifiers
Local EPrints ID: 263218
URI: http://eprints.soton.ac.uk/id/eprint/263218
PURE UUID: ccd963dc-8ec9-4496-8d39-b44d27293e90
Catalogue record
Date deposited: 30 Nov 2006
Last modified: 10 Dec 2021 21:35
Contributors
Author:
Rishad Ahmed Shafik
Author:
Fazli Qayyum Yousaf-Zai