University of Southampton Institutional Repository

Semantic scene completion from a single 360-degree image and depth map

Dourado, A.
f76b78bc-6f5c-4ed5-b44f-eec7a41dd6bf
Kim, H.
2c7c135c-f00b-4409-acb2-85b3a9e8225f
de Campos, T.
4f3a6e9e-24ab-4de8-b4fe-b4b46759a9f4
Hilton, A.
12782a55-4c4d-4dfb-a690-62505f6665db

Dourado, A., Kim, H., de Campos, T. and Hilton, A. (2020) Semantic scene completion from a single 360-degree image and depth map. International Conference on Computer Vision Theory and Applications (VISAPP 2020), Valletta, Malta, 27-29 Feb 2020, pp. 36-46.

Record type: Conference or Workshop Item (Paper)

Abstract

We present a method for Semantic Scene Completion (SSC) of complete indoor scenes from a single 360° RGB image and corresponding depth map, using a Deep Convolutional Neural Network that takes advantage of existing datasets of synthetic and real RGB-D images for training. Recent works on SSC only perform occupancy prediction for the small region of the room covered by the field of view of the sensor in use, which implies the need for multiple images to cover the whole scene and makes them inappropriate for dynamic scenes. Our approach uses only a single 360° image with its corresponding depth map to infer the occupancy and semantic labels of the whole room. Using a single image is important because it allows predictions with no previous knowledge of the scene and enables extension to dynamic scene applications. We evaluated our method on two 360° image datasets: a high-quality 360° RGB-D dataset gathered with a Matterport sensor, and low-quality 360° RGB-D images generated with a pair of commercial 360° cameras and stereo matching. The experiments showed that the proposed pipeline performs SSC not only with Matterport cameras but also with more affordable 360° cameras, which opens up a great number of potential applications, including immersive spatial audio reproduction, augmented reality, assistive computing and robotics.

This record has no associated files available for download.

More information

Published date: 2020
Additional Information: Funding Information: The authors would like to thank FAPDF (fap.df.gov.br), CNPq grant PQ 314154/2018-3 (cnpq.br) and EPSRC Audio-Visual Media Platform Grant EP/P022529/1 for their financial support of this work. Mr. Dourado would also like to thank TCU (tcu.gov.br) for supporting his PhD studies. Publisher Copyright: Copyright © 2020 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
Venue - Dates: International Conference on Computer Vision Theory and Applications (VISAPP 2020), Valletta, Malta, 2020-02-27 - 2020-02-29
Keywords: Augmented reality, Cameras, Cell proliferation, Computation theory, Computer vision, Dynamic scenes, Facsimile, Field of views, Image datasets, Multiple image, Occupancy predictions, Semantic labels, Semantics, Convolution neural network, Stereo matching, Stereo image processing

Identifiers

Local EPrints ID: 440634
URI: http://eprints.soton.ac.uk/id/eprint/440634
PURE UUID: 6ba17f50-b0cf-4c96-9e2c-fd71e173d784
ORCID for H. Kim: orcid.org/0000-0003-4907-0491

Catalogue record

Date deposited: 12 May 2020 16:46
Last modified: 23 Feb 2023 03:21

Contributors

Author: A. Dourado
Author: H. Kim (orcid.org/0000-0003-4907-0491)
Author: T. de Campos
Author: A. Hilton
