Cochlea modelling and its application to speech processing

Models of the cochlea are valuable both as a tool for better understanding its mechanics and as an inspiration for many speech processing algorithms. Realistic modelling of the cochlea can be computationally demanding, however, which limits its use in signal processing applications. To mitigate this issue, an efficient numerical method has been proposed for performing time domain simulations, based on a nonlinear state space formulation [1]. This model has then been contrasted with another type of cochlear model, which is built from a cascade of digital filters. The responses of the two models have been compared in terms of their realism in simulating the measured nonlinear cochlear response to single tones and pairs of tones. Guided by these results, the filter cascade model is chosen for subsequent signal processing applications because it is significantly more efficient than the state space model, while still producing realistic responses.

Using this nonlinear filter cascade model as a front-end, two speech processing tasks have been investigated: voice activity detection and supervised speech separation. Both tasks are tackled within a machine learning framework, in which a neural network is trained to reproduce target outputs. The results are compared with those obtained using a number of simpler auditory-inspired analysis methods. Simulation results show that although the nonlinear filter cascade model can be more effective in many testing scenarios, its advantage over the other analysis methods is small. The incorporation of temporal context information and the engineering of the network structure are found to be more important in improving performance on these tasks. Once a suitable context expansion strategy has been selected, the difference between the front-end processing methods considered is marginal.

University of Southampton

Pan, Shuokai

c132dd25-649c-484f-a905-5cc6156477d0

June 2018

Elliott, Stephen

721dc55c-8c3e-4895-b9c4-82f62abd3567

Pan, Shuokai (2018) Cochlea modelling and its application to speech processing. University of Southampton, Doctoral Thesis, 211pp.

Record type: Thesis (Doctoral)

Shuokai PAN Final e-thesis - Version of Record

Available under License University of Southampton Thesis Licence.

More information

Published date: June 2018

Identifiers

Local EPrints ID: 427156

URI: http://eprints.soton.ac.uk/id/eprint/427156

PURE UUID: 772476e9-39a2-4712-bd3e-9c954cd9f36f

Catalogue record

Date deposited: 03 Jan 2019 17:30

Last modified: 11 Jun 2025 23:44

Contributors

Author: Shuokai Pan

Thesis advisor: Stephen Elliott
