In-ear microphone speech data segmentation and recognition using neural networks
Pages: 262-267
Bulbuller, G., Fargues, M.P. and Vaidyanathan, R. (2006) In-ear microphone speech data segmentation and recognition using neural networks. In Proceedings of the IEEE 12th Digital Signal Processing Workshop and 4th Signal Processing Education Workshop. IEEE, pp. 262-267. (doi:10.1109/DSPWS.2006.265387).
Record type: Conference or Workshop Item (Paper)
Abstract
Speech collected through a microphone placed in front of the mouth has been the primary source of data for speech recognition. However, this set-up also picks up any ambient noise present at the same time, so microphone locations that provide shielding from surrounding noise have also been considered. This study considers an ear-insert microphone that collects speech from the ear canal, taking advantage of the ear canal's noise-shielding properties to operate in noisy environments. Speech segmentation is achieved using short-time signal magnitude and short-time energy-entropy features. Cepstral coefficients extracted from each segmented utterance are used as input features to a back-propagation neural network in the seven-word isolated-word recognizer implemented. Results show that a back-propagation neural network configuration may be a viable choice for this recognition task and that the best average recognition rate (94.73%) is obtained with mel-frequency cepstral coefficients and a two-layer network.
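The pipeline outlined in the abstract (short-time magnitude and energy-entropy features for segmentation, MFCC extraction, and a back-propagation network over seven isolated words) can be illustrated as follows. This is a minimal sketch under assumed parameters, not the authors' implementation: librosa and scikit-learn stand in for the original feature extraction and network code, and the frame sizes, energy-entropy formulation, detection threshold, and hidden-layer size are illustrative choices.

import numpy as np
import librosa
from sklearn.neural_network import MLPClassifier

FRAME_LEN, HOP = 400, 160          # 25 ms frames, 10 ms hop at 16 kHz (assumed)

def short_time_features(x, eps=1e-10):
    """Per-frame short-time magnitude, energy and spectral entropy."""
    frames = librosa.util.frame(x, frame_length=FRAME_LEN, hop_length=HOP).T
    magnitude = np.abs(frames).mean(axis=1)                  # short-time magnitude
    energy = (frames ** 2).sum(axis=1)                        # short-time energy
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    p = spec / (spec.sum(axis=1, keepdims=True) + eps)        # normalised spectrum
    entropy = -(p * np.log(p + eps)).sum(axis=1)              # spectral entropy
    return magnitude, energy, entropy

def segment_word(x, thresh=0.1):
    """Crude endpoint detection: keep the span of frames whose energy-entropy
    feature exceeds a fraction of its maximum (threshold is illustrative)."""
    _, energy, entropy = short_time_features(x)
    eef = np.sqrt(1.0 + np.abs(energy * entropy))             # one common energy-entropy form (assumed)
    active = np.flatnonzero(eef > thresh * eef.max())
    start = active[0] * HOP
    end = active[-1] * HOP + FRAME_LEN
    return x[start:min(end, len(x))]

def mfcc_vector(x, sr=16000, n_mfcc=12):
    """Fixed-length descriptor: MFCCs averaged over the segmented utterance."""
    return librosa.feature.mfcc(y=x, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

# Two-layer back-propagation network (one hidden layer plus output layer)
# for seven isolated words, labelled 0..6.
clf = MLPClassifier(hidden_layer_sizes=(32,), activation="logistic",
                    max_iter=2000, random_state=0)
# Usage (hypothetical data):
# X = np.vstack([mfcc_vector(segment_word(w)) for w in train_waveforms])
# clf.fit(X, train_labels)
# word = clf.predict(mfcc_vector(segment_word(test_waveform)).reshape(1, -1))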
More information
Published date: 2006
Venue - Dates: IEEE 12th Digital Signal Processing Workshop and 4th Signal Processing Education Workshop, Wyoming, USA, 24-27 September 2006
Identifiers
Local EPrints ID: 45658
URI: http://eprints.soton.ac.uk/id/eprint/45658
ISBN: 1424435343
PURE UUID: cc6da12d-ffd6-4ab1-9c50-0c502bcc4964
Catalogue record
Date deposited: 16 Apr 2007
Last modified: 15 Mar 2024 09:12
Contributors
Author: G. Bulbuller
Author: M.P. Fargues
Author: R. Vaidyanathan