The University of Southampton
University of Southampton Institutional Repository

Predicting binaural colouration using VGGish embeddings

Audio Engineering Society
McKenzie, Thomas
Wright, Alec
Turner, Daniel
Llado, Pedro

McKenzie, Thomas, Wright, Alec, Turner, Daniel and Llado, Pedro (2025) Predicting binaural colouration using VGGish embeddings. In AES International Conference on Machine Learning and Artificial Intelligence for Audio. Audio Engineering Society. 10 pp.

Record type: Conference or Workshop Item (Paper)

Abstract

An initial feasibility study is presented that explores whether a pre-trained feature extractor designed for large-scale audio classification can be applied to predicting colouration between binaural signals. A multilayer perceptron (MLP) is trained to predict binaural colouration from feature embeddings obtained with the VGGish network, using data from five previously conducted listening tests. The evaluation compares seven versions of the network, each trained using different data augmentation methods, with three existing signal processing methods for predicting binaural colouration: basic spectral difference (BSD), log-spectral distance (LSD) and an auditory model for predicting binaural colouration (PBC-2). Results show that while the MLP networks perform comparably to BSD and LSD, features specifically relevant to colouration may be needed to compete with the more complex PBC-2.
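To illustrate one of the baseline measures named above, here is a minimal Python/NumPy sketch of a log-spectral distance between two binaural signals. It assumes a common textbook definition: the RMS, over frequency bins, of the dB difference between magnitude spectra, computed for each ear and then averaged across the two ears. The function name `log_spectral_distance` and its `n_fft` and `eps` parameters are hypothetical. The LSD used in the paper may differ in framing, frequency weighting or band limits.

```python
import numpy as np


def log_spectral_distance(x: np.ndarray, y: np.ndarray,
                          n_fft: int = 4096, eps: float = 1e-12) -> float:
    """Log-spectral distance in dB between two binaural signals.

    x, y: arrays of shape (n_samples, 2), one column per ear.
    Assumed definition (not necessarily the paper's variant): RMS over
    frequency bins of the dB magnitude difference, averaged over both ears.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 2 or x.shape[1] != 2:
        raise ValueError("expected two arrays of identical shape (n_samples, 2)")

    # Magnitude spectra per ear; axis 0 is time, zero-padded/truncated to n_fft.
    X = np.abs(np.fft.rfft(x, n=n_fft, axis=0))
    Y = np.abs(np.fft.rfft(y, n=n_fft, axis=0))

    # Level difference in dB for every frequency bin and ear; eps avoids log(0).
    diff_db = 20.0 * np.log10((X + eps) / (Y + eps))

    # RMS over frequency bins, then the mean over the two ears.
    per_ear = np.sqrt(np.mean(diff_db ** 2, axis=0))
    return float(np.mean(per_ear))
```

For example, comparing a signal with a copy scaled by 0.5 gives an LSD of about 20·log10(2) ≈ 6.02 dB, because the dB difference is the same at every bin. Identical signals give 0 dB.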

Full text: Version of Record (260 kB)

More information

Published date: 2 September 2025

Identifiers

Local EPrints ID: 506199
URI: http://eprints.soton.ac.uk/id/eprint/506199
PURE UUID: 690a3ec5-8ce0-42a9-b4ad-9db72778d584
ORCID for Daniel Turner: orcid.org/0000-0002-8542-9302

Catalogue record

Date deposited: 30 Oct 2025 17:34
Last modified: 31 Oct 2025 03:08


Contributors

Author: Thomas McKenzie
Author: Alec Wright
Author: Daniel Turner
Author: Pedro Llado
