Human-centric scene understanding from single view 360 video
Fowler, S., Kim, H. and Hilton, A.
(2018)
Human-centric scene understanding from single view 360 video.
In 2018 International Conference on 3D Vision (3DV). IEEE. pp. 334-342.
(doi:10.1109/3DV.2018.00046).
Record type:
Conference or Workshop Item (Paper)
Abstract
In this paper, we propose an approach to indoor scene understanding from observation of people in single-view spherical video. As input, our approach takes a centrally located spherical video capture of an indoor scene, estimating the 3D localisation of human actions performed throughout the long-term capture. The central contribution of this work is a deep convolutional encoder-decoder network trained on a synthetic dataset to reconstruct regions of affordance from captured human activity. The predicted affordance segmentation is then applied to compose a reconstruction of the complete 3D scene, integrating the affordance segmentation into 3D space. The mapping learnt between human activity and affordance segmentation demonstrates that omnidirectional observation of human activity can be applied to scene understanding tasks such as 3D reconstruction. We show that our approach, using only observation of people, performs well against previous approaches, allowing reconstruction of occluded regions and labelling of scene affordances. © 2018 IEEE.
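The abstract describes a convolutional encoder-decoder that maps observed human activity to per-pixel affordance labels. The following is a minimal PyTorch sketch of that general architecture; the layer configuration, the input encoding (a single-channel equirectangular activity heatmap) and the affordance class set are illustrative assumptions, not the authors' published network.

    import torch
    import torch.nn as nn

    class AffordanceEncoderDecoder(nn.Module):
        """Toy encoder-decoder: activity heatmap in, affordance logits out."""

        def __init__(self, in_channels=1, num_classes=4):
            # in_channels=1: e.g. a heatmap of localised human actions.
            # num_classes: hypothetical affordance labels (e.g. walkable,
            # sittable, reachable, background) -- assumed, not from the paper.
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(in_channels, 32, 3, stride=2, padding=1),  # H/2, W/2
                nn.ReLU(inplace=True),
                nn.Conv2d(32, 64, 3, stride=2, padding=1),           # H/4, W/4
                nn.ReLU(inplace=True),
                nn.Conv2d(64, 128, 3, stride=2, padding=1),          # H/8, W/8
                nn.ReLU(inplace=True),
            )
            self.decoder = nn.Sequential(
                nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),  # x2
                nn.ReLU(inplace=True),
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),   # x2
                nn.ReLU(inplace=True),
                nn.ConvTranspose2d(32, num_classes, 4, stride=2, padding=1),  # x2
            )

        def forward(self, x):
            # x: (B, in_channels, H, W) equirectangular activity map.
            # Returns per-pixel affordance logits at the input resolution.
            return self.decoder(self.encoder(x))

    if __name__ == "__main__":
        net = AffordanceEncoderDecoder()
        activity = torch.rand(1, 1, 256, 512)  # toy 2:1 equirectangular input
        logits = net(activity)
        print(logits.shape)  # torch.Size([1, 4, 256, 512])

A segmentation network of this shape would typically be trained with a per-pixel cross-entropy loss against labelled affordance maps; the paper reports training on a synthetic dataset of human activity.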
This record has no associated files available for download.
More information
Published date: 2018
Additional Information:
Cited by: 0
Venue - Dates:
International Conference on 3D Vision, 2018-10-19
Keywords:
3D reconstruction, Affordances, Convolutional encoders, Human activities, Human-centric, Observation of human activities, Scene interactions, Scene understanding, Image reconstruction
Identifiers
Local EPrints ID: 440627
URI: http://eprints.soton.ac.uk/id/eprint/440627
PURE UUID: d376db84-1c4a-4358-87f0-935dd5893c54
Catalogue record
Date deposited: 12 May 2020 16:46
Last modified: 17 Mar 2024 04:01
Contributors
Author: S. Fowler
Author: H. Kim
Author: A. Hilton