Temporally consistent 3D human pose estimation using dual 360° cameras
Pages: 81-90
Shere, M., Kim, Hansung and Hilton, Adrian (2021) Temporally consistent 3D human pose estimation using dual 360° cameras. Winter Conference on Applications of Computer Vision 2021, 05-09 Jan 2021. (doi:10.1109/WACV48630.2021.00013)
Record type: Conference or Workshop Item (Paper)
Abstract
This paper presents a 3D human pose estimation system that uses a stereo pair of 360° sensors to capture the complete scene from a single location. The approach combines the advantages of omnidirectional capture, the accuracy of multi-view 3D pose estimation and the portability of monocular acquisition. Monocular belief maps for joint locations are estimated from the 360° images and are used to fit a 3D skeleton to each frame. Temporal data association and smoothing are then performed to produce accurate 3D pose estimates throughout the sequence. We evaluate our system on the Panoptic Studio dataset, as well as on real 360° video of multiple people, demonstrating an average Mean Per Joint Position Error of 12.47 cm with a 30 cm camera baseline. We also demonstrate improved performance over perspective and 360° multi-view systems when presented with limited camera views of the subject.
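As a rough illustration of the two-camera geometry the abstract describes (not the authors' implementation), the sketch below converts an equirectangular 360° pixel to a unit bearing ray and triangulates a 3D joint as the least-squares midpoint of two such rays from cameras on a short baseline. The pixel-to-longitude/latitude convention and the function names are assumptions for illustration only.

```python
import numpy as np

def pixel_to_ray(u, v, width, height):
    """Convert an equirectangular pixel (u, v) to a unit bearing ray.

    Assumed convention: u spans longitude [-pi, pi), v spans latitude
    [pi/2, -pi/2] from top to bottom of the image.
    """
    lon = (u / width) * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v / height) * np.pi
    return np.array([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)])

def triangulate_midpoint(p1, d1, p2, d2):
    """Least-squares midpoint of two rays p_i + t_i * d_i.

    Solves t1*d1 - t2*d2 ~= p2 - p1, then averages the two closest
    points on the rays.
    """
    A = np.stack([d1, -d2], axis=1)                  # 3x2 system
    t, *_ = np.linalg.lstsq(A, p2 - p1, rcond=None)  # [t1, t2]
    return 0.5 * ((p1 + t[0] * d1) + (p2 + t[1] * d2))
```

With a 30 cm baseline as in the evaluation, two bearing rays toward the same joint intersect (approximately, under noise) at the joint's 3D position; per-frame estimates like this would then feed the temporal association and smoothing stage.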
This record has no associated files available for download.
More information
Accepted/In Press date: 2 November 2020
Published date: 5 January 2021
Additional Information:
Funding Information:
This work is supported by both the EPSRC (grant number EP/N509383/1) and BBC Research and Development. The authors would like to thank the reviewers for their constructive comments.
Publisher Copyright:
© 2021 IEEE.
Venue - Dates:
Winter Conference on Applications of Computer Vision 2021, 2021-01-05 - 2021-01-09
Identifiers
Local EPrints ID: 445074
URI: http://eprints.soton.ac.uk/id/eprint/445074
PURE UUID: 30cea83b-16cb-4110-a31d-32692a704913
Catalogue record
Date deposited: 19 Nov 2020 17:30
Last modified: 17 Mar 2024 04:01
Contributors
Author:
M. Shere
Author:
Hansung Kim
Author:
Adrian Hilton