3D audio-visual indoor scene reconstruction and semantics completion for virtual reality from a single 360° RGB-D image

We introduce a new approach for constructing immersive virtual spaces by generating comprehensive 3D voxelised models that capture both the geometric and the semantic representation of a scene from a single 360° RGB-D input. The approach uses a deep convolutional neural network for semantic scene completion (SSC), which estimates the complete semantics and geometry of the scene. We design MDBNet, a dual-head model that simultaneously processes RGB and depth data from a perspective camera. Depth is encoded with a flipped truncated signed distance function (F-TSDF), which captures essential geometric shape characteristics. We then propose MDBNet360, which extends MDBNet's inference from perspective-camera RGB-D input to 360° RGB-D input. We apply spherical-to-cubic projection to the RGB image and 3D rotation to the depth point cloud, enabling virtual reality (VR) space design with comprehensive spatial coverage. To our knowledge, this is the first work to extend a pre-trained SSC model, originally trained on perspective-camera RGB-D input, to infer a 3D model from 360° RGB-D input. To assess acoustic properties, we measure parameters such as early decay time (EDT) and reverberation time (RT60) using the exponential sine sweep (ESS) method. We use Unity with the Steam Audio plug-in to run simulations in the virtual space. Compared with the state of the art (SOTA), the proposed framework achieves better virtual space reconstruction and immersive sound generation, advancing semantically rich and spatially accurate virtual environments. Code and rendered sounds are available on GitHub: https://github.com/MonaIA1/Repo360.
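The spherical-to-cubic projection and point-cloud rotation described in the abstract can be illustrated with a minimal sketch. It assumes a camera frame with x to the right, y up and z forward, a standard equirectangular layout for the 360° RGB image, and a hypothetical `FACE_ROTATIONS` table that orients six 90° cube faces by yaw and pitch. The helper names `face_pixel_to_equirect` and `points_to_face_frame` are likewise hypothetical; none of these conventions are taken from MDBNet360, whose actual implementation is in the Repo360 repository.

```python
import math

# Hypothetical face layout (an assumption, not MDBNet360's convention):
# (yaw, pitch) in degrees that turn each 90-degree cube-face camera away
# from the 360 camera's forward axis. Frame: x right, y up, z forward.
FACE_ROTATIONS = {
    "front": (0.0, 0.0),
    "right": (90.0, 0.0),
    "back": (180.0, 0.0),
    "left": (270.0, 0.0),
    "up": (0.0, 90.0),
    "down": (0.0, -90.0),
}


def rot_y(p, deg):
    """Rotate point p about the vertical y axis (yaw); +90 maps +z onto +x."""
    a = math.radians(deg)
    x, y, z = p
    return (math.cos(a) * x + math.sin(a) * z, y,
            -math.sin(a) * x + math.cos(a) * z)


def rot_x(p, deg):
    """Rotate point p about the x axis (pitch); +90 maps +z onto +y."""
    a = math.radians(deg)
    x, y, z = p
    return (x, math.cos(a) * y + math.sin(a) * z,
            -math.sin(a) * y + math.cos(a) * z)


def face_pixel_to_equirect(face, i, j, n, width, height):
    """Map pixel (i, j) of an n x n cube face to (u, v) in a width x height
    equirectangular image, so the face can be filled by resampling the 360 RGB."""
    yaw, pitch = FACE_ROTATIONS[face]
    a = 2.0 * (i + 0.5) / n - 1.0  # -1 at the left edge, +1 at the right edge
    b = 1.0 - 2.0 * (j + 0.5) / n  # +1 at the top edge, -1 at the bottom edge
    x, y, z = rot_y(rot_x((a, b, 1.0), pitch), yaw)  # ray in the 360 frame
    lon = math.atan2(x, z)  # longitude in (-pi, pi], 0 = forward
    lat = math.atan2(y, math.hypot(x, z))  # latitude in [-pi/2, pi/2]
    u = (lon / (2.0 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return u, v


def points_to_face_frame(points, face):
    """Apply the inverse face rotation to 3D points from the 360 depth map,
    expressing them in that face's perspective-camera frame."""
    yaw, pitch = FACE_ROTATIONS[face]
    return [rot_x(rot_y(p, -yaw), -pitch) for p in points]


# Example: where the centre of the "right" face samples a 2048 x 1024 panorama.
print(face_pixel_to_equirect("right", 0, 0, 1, 2048, 1024))
```

To build a face image, each face pixel would be filled by sampling the equirectangular image at the returned (u, v), typically with bilinear interpolation. Rotating the depth points with the same per-face rotation keeps each face's RGB and depth aligned, so a model that expects perspective input sees both in its canonical orientation.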
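For the acoustic evaluation, an ESS measurement yields a room impulse response once the recorded sweep is deconvolved with its inverse filter. The sketch below starts from an impulse response and applies Schroeder backward integration, a standard approach described in ISO 3382-1, to estimate EDT (line fit from 0 to -10 dB) and RT60 (here the T30 variant, fitted from -5 to -35 dB and extrapolated to 60 dB of decay). It is an illustration only: sweep generation and deconvolution are omitted, the synthetic exponential impulse response is a placeholder, the function names are hypothetical, and the authors' exact analysis settings are not stated in this record.

```python
import math


def schroeder_decay_db(ir):
    """Backward-integrated energy decay curve in dB, normalised to 0 dB at t = 0."""
    energy = [s * s for s in ir]
    edc = [0.0] * len(energy)
    tail = 0.0
    for n in range(len(energy) - 1, -1, -1):  # integrate from the end backwards
        tail += energy[n]
        edc[n] = tail
    total = edc[0]
    return [10.0 * math.log10(e / total) if e > 0 else -math.inf for e in edc]


def decay_time(edc_db, fs, upper_db, lower_db):
    """Least-squares line fit to the decay curve between upper_db and lower_db,
    extrapolated to the time needed for 60 dB of decay."""
    pts = [(n / fs, level) for n, level in enumerate(edc_db)
           if lower_db <= level <= upper_db]
    if len(pts) < 2:
        raise ValueError("decay range not reached by the impulse response")
    m = len(pts)
    mean_t = sum(t for t, _ in pts) / m
    mean_l = sum(level for _, level in pts) / m
    sxy = sum((t - mean_t) * (level - mean_l) for t, level in pts)
    sxx = sum((t - mean_t) ** 2 for t, _ in pts)
    slope = sxy / sxx  # dB per second, negative for a decaying response
    return -60.0 / slope


def edt(ir, fs):
    """Early decay time: fit over 0 to -10 dB."""
    return decay_time(schroeder_decay_db(ir), fs, 0.0, -10.0)


def rt60_t30(ir, fs):
    """Reverberation time from the T30 range: fit over -5 to -35 dB."""
    return decay_time(schroeder_decay_db(ir), fs, -5.0, -35.0)


# Placeholder impulse response: energy decays by 60 dB in 0.5 s.
fs = 8000
ir = [math.exp(-3.0 * math.log(10.0) * n / (fs * 0.5)) for n in range(fs)]
print(round(edt(ir, fs), 3), round(rt60_t30(ir, fs), 3))
```

For an ideal exponential decay both estimates coincide with the decay time used to build the response; in measured or simulated rooms EDT is more sensitive to early reflections, which is why it is usually reported alongside RT60.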

10.1007/s10055-026-01312-7

Alawadh, Mona

Alinaghi, Atiyeh

Niranjan, Mahesan

Kim, Hansung

9 February 2026

Alawadh, Mona, Alinaghi, Atiyeh, Niranjan, Mahesan and Kim, Hansung (2026) 3D audio-visual indoor scene reconstruction and semantics completion for virtual reality from a single 360° RGB-D image. Virtual Reality, 30, [55]. (doi:10.1007/s10055-026-01312-7).

Record type: Article

Full text

Virtual_Reality_jornal_MDBNet360-Accepted - Accepted Manuscript (45MB). Available under License Creative Commons Attribution.

s10055-026-01312-7 - Version of Record (8MB). Available under License Creative Commons Attribution.

More information

Accepted/In Press date: 5 January 2026

e-pub ahead of print date: 6 February 2026

Published date: 9 February 2026

Related units: Vision, Learning and Control; Institute for Life Sciences; School of Electronics and Computer Science

Identifiers

Local EPrints ID: 509693

URI: http://eprints.soton.ac.uk/id/eprint/509693

DOI: 10.1007/s10055-026-01312-7

PURE UUID: a69db72b-58b5-4618-988a-3b1366cb390f

ORCID for Mona Alawadh: orcid.org/0000-0001-5354-7681

ORCID for Mahesan Niranjan: orcid.org/0000-0001-7021-140X

ORCID for Hansung Kim: orcid.org/0000-0003-4907-0491

Catalogue record

Date deposited: 02 Mar 2026 17:58

Last modified: 07 Mar 2026 04:03

Contributors

Author: Mona Alawadh

Author: Atiyeh Alinaghi

Author: Mahesan Niranjan

Author: Hansung Kim