3D audio-visual indoor scene reconstruction and semantics completion for virtual reality from a single 360° RGB-D image
Alawadh, Mona
60613079-426e-425a-81d3-09a6fbb7a92c
Alinaghi, Atiyeh
69c051f1-9b47-4c47-b9e3-52fa3079f9a3
Niranjan, Mahesan
5cbaeea8-7288-4b55-a89c-c43d212ddd4f
Kim, Hansung
2c7c135c-f00b-4409-acb2-85b3a9e8225f
9 February 2026
Alawadh, Mona, Alinaghi, Atiyeh, Niranjan, Mahesan and Kim, Hansung (2026) 3D audio-visual indoor scene reconstruction and semantics completion for virtual reality from a single 360° RGB-D image. Virtual Reality, 30, [55]. (doi:10.1007/s10055-026-01312-7).
Abstract
We introduce a new approach for constructing immersive virtual spaces by generating comprehensive 3D voxelised models that encompass both geometric and semantic scene representations from a single 360° RGB-D input. The proposed approach utilises a deep convolutional neural network for semantic scene completion (SSC), allowing the estimation of the complete semantics and geometry of the scene. We design MDBNet, a dual-head model that simultaneously processes RGB and depth data from a perspective camera. Depth information is encoded using a flipped truncated signed distance function (F-TSDF), capturing essential geometric shape characteristics. We extend the inference capabilities of MDBNet from perspective-camera RGB-D input to 360° RGB-D input by proposing MDBNet360. We employ spherical-to-cubic projection for the RGB image and 3D rotation for the depth point clouds, allowing for virtual reality (VR) space design with comprehensive spatial coverage. To our knowledge, this is the first work to extend a pre-trained SSC model, originally using perspective camera RGB-D input, to infer a 3D model from 360° RGB-D input. To assess acoustic properties, we measure parameters such as early decay time (EDT) and reverberation time (RT60) using the exponential sine sweep (ESS) method. We use Unity with the Steam Audio plug-in to run simulations in the virtual space. Compared to the state-of-the-art (SOTA), the proposed framework demonstrates better virtual space reconstruction and immersive sound generation, advancing semantically rich and spatially accurate virtual environments. Code and rendered sounds are available on GitHub: https://github.com/MonaIA1/Repo360.
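The abstract names EDT and RT60 as the acoustic parameters but does not say how they are derived from a measured response. As an illustration only, the sketch below shows one common way to estimate both from an impulse response (such as one recovered from an ESS recording): Schroeder backward integration followed by a straight-line fit to the decay curve. The function names, the synthetic test signal, and the fit ranges (0 to -10 dB for EDT, -5 to -35 dB for RT60, following common room-acoustics practice) are assumptions made for this example and are not taken from the paper or its repository.

```python
import math


def schroeder_decay_db(impulse_response):
    """Energy decay curve in dB via Schroeder backward integration."""
    remaining = 0.0
    edc = []
    # Accumulate squared samples from the end of the response backwards.
    for sample in reversed(impulse_response):
        remaining += sample * sample
        edc.append(remaining)
    edc.reverse()
    total = edc[0]
    if total <= 0.0:
        raise ValueError("impulse response contains no energy")
    return [10.0 * math.log10(e / total) if e > 0.0 else -math.inf for e in edc]


def decay_time(edc_db, sample_rate, upper_db, lower_db):
    """Fit a line to the decay between two levels and extrapolate to 60 dB."""
    points = [(i / sample_rate, level) for i, level in enumerate(edc_db)
              if lower_db <= level <= upper_db]
    if len(points) < 2:
        raise ValueError("not enough samples in the requested decay range")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_l = sum(level for _, level in points) / n
    num = sum((t - mean_t) * (level - mean_l) for t, level in points)
    den = sum((t - mean_t) ** 2 for t, _ in points)
    slope = num / den  # dB per second, negative for a decaying response
    return -60.0 / slope


# Hypothetical test signal: an ideal exponential decay with a known RT60.
SAMPLE_RATE = 8000
TARGET_RT60 = 0.5  # seconds
decay_rate = 3.0 * math.log(10.0) / TARGET_RT60  # amplitude falls 60 dB in TARGET_RT60
ir = [math.exp(-decay_rate * n / SAMPLE_RATE) for n in range(2 * SAMPLE_RATE)]

edc_db = schroeder_decay_db(ir)
edt = decay_time(edc_db, SAMPLE_RATE, 0.0, -10.0)    # early decay time
rt60 = decay_time(edc_db, SAMPLE_RATE, -5.0, -35.0)  # T30-style RT60 estimate
print(f"EDT = {edt:.3f} s, RT60 = {rt60:.3f} s")
```

For an ideal exponential decay the two estimates coincide; in a real or simulated room they generally differ, because EDT is dominated by the early part of the decay.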
Text
Virtual_Reality_jornal_MDBNet360-Accepted
- Accepted Manuscript
Text
s10055-026-01312-7
- Version of Record
More information
Accepted/In Press date: 5 January 2026
e-pub ahead of print date: 6 February 2026
Published date: 9 February 2026
Identifiers
Local EPrints ID: 509693
URI: http://eprints.soton.ac.uk/id/eprint/509693
PURE UUID: a69db72b-58b5-4618-988a-3b1366cb390f
Catalogue record
Date deposited: 02 Mar 2026 17:58
Last modified: 07 Mar 2026 04:03