The University of Southampton
University of Southampton Institutional Repository

Immersive audio-visual scene reproduction using semantic scene reconstruction from 360 cameras


Kim, Hansung, Remaggi, Luca, Dourado, A., de Campos, Teofilo, Jackson, Philip J. B. and Hilton, Adrian (2021) Immersive audio-visual scene reproduction using semantic scene reconstruction from 360 cameras. Virtual Reality. (doi:10.1007/s10055-021-00594-3).

Record type: Article

Abstract

As personalised immersive display systems have been intensively explored in virtual reality (VR), plausible 3D audio corresponding to the visual content is required to provide more realistic experiences to users. It is well known that spatial audio synchronised with visual information improves the sense of immersion, but limited research progress has been achieved in immersive audio-visual content production and reproduction. In this paper, we propose an end-to-end pipeline to simultaneously reconstruct the 3D geometry and acoustic properties of the environment from a pair of omnidirectional panoramic images. A semantic scene reconstruction and completion method using a deep convolutional neural network is proposed to estimate the complete semantic scene geometry in order to adapt spatial audio reproduction to the scene. Experiments provide objective and subjective evaluations of the proposed pipeline for plausible audio-visual VR reproduction of real scenes.

Text
The_Virtual_Reality - Accepted Manuscript (8MB)
Kim2021_Article_ImmersiveAudio-visualSceneRepr - Version of Record (2MB). Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 4 October 2021
e-pub ahead of print date: 30 October 2021
Published date: 30 October 2021
Additional Information: Funding Information: This work was supported by the UKRI EPSRC Programme Grant S3A: Future Spatial Audio for an Immersive Listener Experience at Home (EP/L000539/1), the Prosperity Partnership AI4ME: AI for Personalised Media Experiences (EP/V038087/1), the BBC as part of the BBC Audio Research Partnership, and the Audio-Visual Media Research Platform (EP/P022529/1). Details about the data underlying this work are available from: http://dx.doi.org/10.15126/surreydata.00812228. Publisher Copyright: © 2021, The Author(s).
Keywords: 3D reconstruction and completion, Audio-visual scene reproduction, Scene understanding, Spatial audio

Identifiers

Local EPrints ID: 451975
URI: http://eprints.soton.ac.uk/id/eprint/451975
PURE UUID: 560f688b-660b-425d-8a1f-9bc822927631
ORCID for Hansung Kim: orcid.org/0000-0003-4907-0491

Catalogue record

Date deposited: 05 Nov 2021 17:30
Last modified: 17 Mar 2024 04:01

Contributors

Author: Hansung Kim
Author: Luca Remaggi
Author: A. Dourado
Author: Teofilo de Campos
Author: Philip J. B. Jackson
Author: Adrian Hilton
