The University of Southampton
University of Southampton Institutional Repository

An audio-visual system for object-based audio: from recording to listening

Coleman, Philip, Franck, Andreas, Francombe, Jon, Liu, Qingju, de Campos, Teofilo, Hughes, Richard J, Menzies, Dylan, Simon Galvez, Marcos, Tang, Yan, Woodcock, James, Jackson, Philip J.B., Melchior, Frank, Pike, Chris R., Fazi, Filippo, Cox, Trevor J. and Hilton, Adrian (2018) An audio-visual system for object-based audio: from recording to listening. IEEE Transactions on Multimedia, 20 (8), 1919-1931. (doi:10.1109/TMM.2018.2794780).

Record type: Article

Abstract

Object-based audio is an emerging representation for audio content, where content is represented in a reproduction-format-agnostic way and thus produced once for consumption on many different kinds of devices. This affords new opportunities for immersive, personalized, and interactive listening experiences. This article introduces an end-to-end object-based spatial audio pipeline, from sound recording to listening. A high-level system architecture is proposed, which includes novel audio-visual interfaces to support object-based capture and listener-tracked rendering, and incorporates a proposed component for objectification, i.e., recording content directly into an object-based form. Text-based and extensible metadata enable communication between the system components. An open architecture for object rendering is also proposed. The system's capabilities are evaluated in two parts. First, listener-tracked reproduction of metadata automatically estimated from two moving talkers is evaluated using an objective binaural localization model. Second, object-based scene capture with audio extracted using blind source separation (to remix between two talkers) and beamforming (to remix a recording of a jazz group), is evaluated with perceptually-motivated objective and subjective experiments. These experiments demonstrate that the novel components of the system add capabilities beyond the state of the art. Finally, we discuss challenges and future perspectives for object-based audio workflows.

Text: Coleman_et_al_An Audio-Visual System for Object-Based Audio - Accepted Manuscript (3MB)

More information

Accepted/In Press date: 15 January 2018
e-pub ahead of print date: 17 January 2018
Published date: August 2018

Identifiers

Local EPrints ID: 417096
URI: http://eprints.soton.ac.uk/id/eprint/417096
ISSN: 1520-9210
PURE UUID: 42332a8a-b4b3-4ca4-a017-e15e7f44fb10
ORCID for Andreas Franck: orcid.org/0000-0002-4707-6710
ORCID for Dylan Menzies: orcid.org/0000-0003-1475-8798
ORCID for Filippo Fazi: orcid.org/0000-0003-4129-1433

Catalogue record

Date deposited: 19 Jan 2018 17:30
Last modified: 13 Jun 2024 01:46

Contributors

Author: Philip Coleman
Author: Andreas Franck
Author: Jon Francombe
Author: Qingju Liu
Author: Teofilo de Campos
Author: Richard J Hughes
Author: Dylan Menzies
Author: Marcos Simon Galvez
Author: Yan Tang
Author: James Woodcock
Author: Philip J.B. Jackson
Author: Frank Melchior
Author: Chris R. Pike
Author: Filippo Fazi
Author: Trevor J. Cox
Author: Adrian Hilton
