Predicting binaural colouration using VGGish embeddings
This initial feasibility study explores whether a pre-trained feature extractor, designed for large-scale audio classification, can be applied to predicting colouration between binaural signals. A multilayer perceptron (MLP) is trained to predict binaural colouration from feature embeddings obtained from the VGGish network, using data from five previously conducted listening tests. The evaluation compares seven versions of the network, each trained using different data augmentation methods, against three existing signal processing methods for predicting binaural colouration: basic spectral difference (BSD), log-spectral distance (LSD) and an auditory model for predicting binaural colouration (PBC-2). Results show that while the MLP networks perform comparably to BSD and LSD, features specific to colouration may be needed to compete with the more complex PBC-2.
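The abstract names log-spectral distance (LSD) as one of the baseline predictors. As a rough illustration only, the sketch below computes one common textbook form of LSD (the RMS difference of two power spectra in dB) between two binaural signals using NumPy. The function names, the FFT length, the averaging over the two ears and the `eps` floor are assumptions made for this example; the paper's own definition (frequency weighting, banding or time framing) may differ.

```python
import numpy as np


def log_spectral_distance(x, y, n_fft=None, eps=1e-12):
    """Log-spectral distance in dB between two single-channel signals.

    Assumed definition: RMS over frequency bins of the difference between
    the two power spectra expressed in dB. Not necessarily the paper's variant.
    """
    # Power spectra; n_fft=None uses the signal length (rfft zero-pads or truncates otherwise)
    X = np.abs(np.fft.rfft(x, n=n_fft)) ** 2
    Y = np.abs(np.fft.rfft(y, n=n_fft)) ** 2
    # Per-bin level difference in dB; eps avoids log of zero in empty bins
    diff_db = 10.0 * np.log10((X + eps) / (Y + eps))
    return float(np.sqrt(np.mean(diff_db ** 2)))


def binaural_lsd(a, b, **kwargs):
    """Hypothetical binaural wrapper: mean LSD over the two ears.

    Inputs are arrays of shape (2, n_samples), rows being left and right ears.
    """
    return float(np.mean([log_spectral_distance(ea, eb, **kwargs)
                          for ea, eb in zip(a, b)]))
```

For example, identical inputs give 0 dB, and doubling the amplitude of one signal gives about 6.02 dB (20 log10(2)), since the power ratio is 4 in every bin. A pure level change therefore registers as a large LSD even though it alters loudness rather than timbre, which is one reason simple spectral metrics can be weak predictors of perceived colouration.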
Audio Engineering Society
McKenzie, Thomas
Wright, Alec
Turner, Daniel
Llado, Pedro
2 September 2025
McKenzie, Thomas, Wright, Alec, Turner, Daniel and Llado, Pedro (2025) Predicting binaural colouration using VGGish embeddings. In AES International Conference on Machine Learning and Artificial Intelligence for Audio. Audio Engineering Society. 10 pp.
Record type: Conference or Workshop Item (Paper)
More information
Published date: 2 September 2025
Identifiers
Local EPrints ID: 506199
URI: http://eprints.soton.ac.uk/id/eprint/506199
PURE UUID: 690a3ec5-8ce0-42a9-b4ad-9db72778d584
Catalogue record
Date deposited: 30 Oct 2025 17:34
Last modified: 31 Oct 2025 03:08