Enhancement of body-conducted speech from an ear-microphone
This thesis is concerned with the use of optimal filtering to enhance the intelligibility and quality of body-conducted speech. In a mobile communication system, body-conducted speech picked up by an ear-microphone has the advantage of being immune to background noise, whereas air-conducted speech is easily degraded by external noise sources. However, the two differ significantly in intelligibility and quality: air-conducted speech is subjectively superior to body-conducted speech. This study investigates the design of a filter to enhance the overall quality of body-conducted speech. The approach is optimal filtering, with the body-conducted speech as the input and the air-conducted speech as the desired signal. The optimal filter, which is generated for specific phonetic material, attempts to match the two signals by minimising their difference in a least-squares sense.
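The least-squares matching described above can be illustrated with a minimal sketch. It is not the thesis's implementation: it assumes a finite impulse response (FIR) filter, uses synthetic signals in place of real recordings, and the names `least_squares_fir`, `apply_fir`, `body_path` and the tap count of 32 are hypothetical choices made only for illustration.

```python
import numpy as np

def least_squares_fir(x, d, num_taps):
    """Estimate FIR taps h that minimise sum((d - conv(x, h))**2)."""
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    n = len(x)
    # Data matrix: column k holds the input delayed by k samples.
    X = np.zeros((n, num_taps))
    for k in range(num_taps):
        X[k:, k] = x[:n - k]
    # Solve the linear least-squares problem X h ~= d.
    h, *_ = np.linalg.lstsq(X, d, rcond=None)
    return h

def apply_fir(x, h):
    """Filter x with taps h, keeping the original length."""
    return np.convolve(x, h)[:len(x)]

# Synthetic stand-ins (assumptions, not real speech data).
rng = np.random.default_rng(0)
air = rng.standard_normal(2000)            # plays the role of air-conducted speech
body_path = np.array([0.5, 0.3, 0.2])      # hypothetical low-pass body-conduction path
body = np.convolve(air, body_path)[:len(air)]  # plays the role of body-conducted speech

# Body-conducted signal is the input, air-conducted signal is the desired signal.
h = least_squares_fir(body, air, num_taps=32)
enhanced = apply_fir(body, h)

err_before = np.mean((air - body) ** 2)
err_after = np.mean((air - enhanced) ** 2)
print(err_before, err_after)
```

In this toy setting the filtered signal is closer to the desired signal than the unfiltered input, because leaving the input unchanged is itself one of the candidate filters the least-squares fit considers. Real speech would need framing, noise handling and perceptual evaluation that this sketch omits.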
Analysis of simultaneous speech recordings made with acoustic and vibration transducers in quiet and noisy conditions demonstrated the noise immunity of body-conducted speech and the low-pass filtering effect caused by body-conduction attenuation. Experiments also showed that the temporo-mandibular joint of a speaker is a good position at which to detect speech vibrations. Optimal filtering was compared subjectively with high-pass filtering, a technique currently used in communication systems. Statistical analysis of standard intelligibility and quality tests showed that, overall, significantly better performance may be obtained with the optimal filter derived for different words/speakers, under both quiet and noisy conditions.
Furthermore, analysis of the optimal filters suggested that filter performance is affected by the electronic noise floor of the accelerometer and is not strongly dependent on the frequency content of the speech input. Instead, performance depends mainly on the anatomical characteristics of the speaker and the positioning of the transducers. This led to the design of several types of fixed optimal filters, generated as averages across a number of speakers and/or speech materials. Formal listening tests revealed that an optimal filter generated specifically for one speaker outperforms optimal filters designed for a number of speakers of mixed or single gender.
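As a rough illustration of a fixed filter formed as an average, the sketch below averages per-speaker FIR taps element by element. The abstract does not say in which domain the averaging was done, so tap-wise averaging is an assumption here, and the function name `average_filter`, the speaker labels and the tap values are all hypothetical.

```python
import numpy as np

def average_filter(filters):
    """Average equal-length FIR tap vectors element by element."""
    taps = np.stack([np.asarray(h, dtype=float) for h in filters])
    return taps.mean(axis=0)

# Hypothetical per-speaker filters (made-up values, not thesis data).
speaker_filters = {
    "speaker_a": [1.0, -0.4, 0.1],
    "speaker_b": [0.8, -0.2, 0.0],
    "speaker_c": [1.2, -0.6, 0.2],
}

fixed_filter = average_filter(speaker_filters.values())
print(fixed_filter)
```

A filter of this kind trades speaker-specific fit for generality, which is consistent with the listening-test finding that a filter generated for one speaker performs better for that speaker.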
University of Southampton
Papanagiotou, Kyriakos
398f60c6-5d30-4529-b1eb-4dbc1b605edc
2003
Papanagiotou, Kyriakos (2003) Enhancement of body-conducted speech from an ear-microphone. University of Southampton, Doctoral Thesis.
Record type: Thesis (Doctoral)
Text: 909937.pdf (Version of Record)
More information
Published date: 2003
Identifiers
Local EPrints ID: 465078
URI: http://eprints.soton.ac.uk/id/eprint/465078
PURE UUID: c914feb7-eda7-4abb-ad99-78b21e440b33
Catalogue record
Date deposited: 05 Jul 2022 00:22
Last modified: 16 Mar 2024 19:56
Contributors
Author: Kyriakos Papanagiotou