Biologically inspired modelling of binaural localisation
Accurate sound localisation depends on the brain’s ability to extract and integrate spatial cues across frequency. This thesis develops a biologically inspired computational model that predicts human sound localisation performance, evaluates it against listener behaviour, and extends the work through perceptual experiments and morphological analysis to support personalisation.
The model employs excitation-inhibition (EI) patterns derived from the binaural processing model of Breebaart et al. (2001a), originally developed to predict binaural unmasking. In that framework, left- and right-ear signals are compared through EI interactions across a grid of characteristic interaural time differences (ITDs) and interaural level differences (ILDs), producing an internal activity pattern whose minimum encodes the spatial properties of the stimulus. The present work repurposes these patterns as interpretable spatial features for a tonotopically organised multi-layer perceptron (MLP) that reflects cochlear-to-cortical frequency mapping. The model is used to probe the frequency-dependent contributions of different binaural cues, reproducing the known human reliance on ITDs at low frequencies and ILDs at high frequencies, and revealing that high-frequency cues help resolve front-back confusions (FBCs).
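The EI comparison described above can be illustrated with a minimal sketch for a single frequency band. It is not the thesis implementation: it assumes the instantaneous EI activity takes the form (10^(α/40)·L(t + τ/2) − 10^(−α/40)·R(t − τ/2))², averaged over time, and it omits the peripheral filtering, adaptation, temporal windowing, and internal noise of the full Breebaart et al. model. The function name `ei_pattern` and all parameter names are hypothetical.

```python
import numpy as np

def ei_pattern(left, right, fs, taus_s, alphas_db):
    """Time-averaged EI activity on a (tau, alpha) grid for one band.

    left, right : 1-D arrays of equal length (band-limited ear signals)
    fs          : sampling rate in Hz
    taus_s      : characteristic ITDs in seconds
    alphas_db   : characteristic ILDs in dB
    """
    n = len(left)
    t = np.arange(n) / fs
    pattern = np.empty((len(taus_s), len(alphas_db)))
    for i, tau in enumerate(taus_s):
        # Evaluate L(t + tau/2) and R(t - tau/2); samples outside the
        # signal are treated as silence (an assumption of this sketch).
        l_shift = np.interp(t + tau / 2, t, left, left=0.0, right=0.0)
        r_shift = np.interp(t - tau / 2, t, right, left=0.0, right=0.0)
        for j, alpha in enumerate(alphas_db):
            g = 10.0 ** (alpha / 40.0)  # split the level offset across ears
            residual = g * l_shift - r_shift / g
            pattern[i, j] = np.mean(residual ** 2)
    return pattern
```

Under these assumptions, the grid cell whose characteristic ITD and ILD cancel the interaural differences of the stimulus has the lowest activity. For a noise signal whose left-ear copy is delayed and attenuated relative to the right-ear copy, the minimum of `pattern` falls at the matching delay and attenuation.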
To establish controlled human benchmarks for localisation under reverberation and to lay the groundwork for future recursive modelling of the precedence effect and dynamic cues such as head movement, a behavioural experiment was conducted using a standardised burst-noise stimulus across three acoustically matched environments: anechoic, short-reverberation, and long-reverberation. This was motivated by conflicting findings in the existing sound localisation literature on the effects of reverberation, largely due to differences in stimuli and response methods across studies. Localisation of burst noise remained robust across rooms, and head movements improved accuracy and largely eliminated front-back confusions. Although the recursive model was not realised within this thesis, these benchmarks remain a valuable resource for future development.
The proposed localisation model’s finding that high-frequency spectral cues resolve FBCs suggests that EI patterns encode listener-specific spectral information shaped by ear anatomy, offering a natural basis for comparing ears across individuals. Through a collaboration with Yamaha, the thesis demonstrates that EI patterns provide a useful feature space for analysing ear-pair similarity, revealing clustering based on interaural spectral disparity and spectral symmetry groups, which are linked to concha morphology. A virtual localisation experiment confirmed that elevation accuracy was reduced when listeners used non-individualised head-related transfer functions (HRTFs) from ears with conchae larger than their own. Together, these contributions advance interpretable localisation modelling, robust human performance benchmarks, and listener-specific personalisation of spatial audio.
University of Southampton
Wang, Hsuan-Yang
77ff593b-fbf9-4d92-8f42-03ed0b792531
May 2026
Evers, Christine
93090c84-e984-4cc3-9363-fbf3f3639c4b
Nelson, Philip
5c6f5cc9-ea52-4fe2-9edf-05d696b0c1a9
Wang, Hsuan-Yang (2026) Biologically inspired modelling of binaural localisation. University of Southampton, Doctoral Thesis, 220pp.
Record type: Thesis (Doctoral)
Text: HYW_PhD_Thesis_Final (Version of Record)
Restricted to Repository staff only until 17 November 2026.
Text: Final-thesis-submission-Examination-Mr-Hsuan-Yang-Wang
Restricted to Repository staff only
More information
Published date: May 2026
Identifiers
Local EPrints ID: 511570
URI: http://eprints.soton.ac.uk/id/eprint/511570
PURE UUID: 2d61d27b-a0a8-47a7-b427-32d4bfb9c55b
Catalogue record
Date deposited: 20 May 2026 17:07
Last modified: 21 May 2026 01:59
Contributors
Author: Hsuan-Yang Wang
Thesis advisor: Christine Evers