The University of Southampton
University of Southampton Institutional Repository

Optimising 2D pose representations: improving accuracy, stability and generalisability within unsupervised 2D-3D human pose estimation

Hardy, Peter, Dasmahapatra, Srinandan and Kim, Hansung (2023) Optimising 2D pose representations: improving accuracy, stability and generalisability within unsupervised 2D-3D human pose estimation. Volino, Marco, Mustafa, Armin and Vangorp, Peter (eds.) In CVMP '23: Proceedings of the 20th ACM SIGGRAPH European Conference on Visual Media Production. Association for Computing Machinery. 9 pp. (doi:10.1145/3626495.3626505).

Record type: Conference or Workshop Item (Paper)

Abstract

This paper investigated pose representation within the field of unsupervised 2D-3D human pose estimation (HPE). All current unsupervised 2D-3D HPE approaches provide the entire 2D kinematic skeleton to a model during training. We argue that this is sub-optimal and disruptive, because long-range correlations will be induced between independent 2D key points and predicted 3D coordinates during training. To test this, we conducted the following study. With a maximum architecture capacity of 6 residual blocks, we evaluated the performance of 7 models, each of which represented a 2D pose differently during the adversarial unsupervised 2D-3D HPE process. We also showed the correlations induced between 2D key points when a full pose is lifted, highlighting the unintuitive correlations learned. Our results show that the best representation of a 2D pose during the lifting stage is two independent segments, the torso and the legs, with no features shared between the two lifting networks. This approach decreased the average error by 20% on the Human3.6M dataset compared with a model with a near-identical parameter count trained on the entire 2D kinematic skeleton. Furthermore, given the complex nature of adversarial learning, we showed that this representation can also improve convergence during training, allowing an optimum result to be obtained more often.
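
As an illustration of the two-segment representation described in the abstract, the sketch below lifts the legs and the torso with two separate networks that share no features. It is a minimal sketch under stated assumptions and not the authors' implementation: the 16-joint skeleton, the joint index split, the depth-only output, the layer sizes, and the names `SegmentLifter` and `lift_pose` are all hypothetical, and the weights are random rather than trained adversarially.

```python
import numpy as np

# Hypothetical 16-joint skeleton. This index split is an assumption made
# for illustration; it is not the joint ordering used in the paper.
LEG_JOINTS = [0, 1, 2, 3, 4, 5, 6]                # e.g. pelvis, hips, knees, ankles
TORSO_JOINTS = [7, 8, 9, 10, 11, 12, 13, 14, 15]  # e.g. spine, head, arms


class SegmentLifter:
    """Toy residual MLP: a segment's 2D key points -> one depth value per joint."""

    def __init__(self, n_joints, hidden=32, n_blocks=3, seed=0):
        # n_blocks=3 is an arbitrary illustrative choice, as is hidden=32.
        rng = np.random.default_rng(seed)
        self.w_in = rng.normal(0.0, 0.1, (2 * n_joints, hidden))
        self.blocks = [
            (rng.normal(0.0, 0.1, (hidden, hidden)),
             rng.normal(0.0, 0.1, (hidden, hidden)))
            for _ in range(n_blocks)
        ]
        self.w_out = rng.normal(0.0, 0.1, (hidden, n_joints))

    def __call__(self, pose_2d):
        # pose_2d: (batch, n_joints, 2) -> depths: (batch, n_joints)
        h = pose_2d.reshape(len(pose_2d), -1) @ self.w_in
        for w1, w2 in self.blocks:
            h = h + np.maximum(h @ w1, 0.0) @ w2  # residual block with ReLU
        return h @ self.w_out


def lift_pose(pose_2d, leg_lifter, torso_lifter):
    """Lift each segment with its own network, then reassemble a 3D pose."""
    depth = np.zeros(pose_2d.shape[:2])
    depth[:, LEG_JOINTS] = leg_lifter(pose_2d[:, LEG_JOINTS])
    depth[:, TORSO_JOINTS] = torso_lifter(pose_2d[:, TORSO_JOINTS])
    # Keep the input (x, y) and append the predicted depth as z.
    return np.concatenate([pose_2d, depth[..., None]], axis=-1)


# Usage: two independent lifters applied to a random batch of 2D poses.
poses_2d = np.random.default_rng(42).normal(size=(4, 16, 2))
leg_net = SegmentLifter(len(LEG_JOINTS), seed=0)
torso_net = SegmentLifter(len(TORSO_JOINTS), seed=1)
poses_3d = lift_pose(poses_2d, leg_net, torso_net)  # shape (4, 16, 3)
```

Because each lifter sees only its own segment, a change to the torso key points cannot alter the depths predicted for the legs. This is the structural property that prevents the long-range correlations mentioned above from being learned.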

This record has no associated files available for download.

More information

Published date: 30 November 2023
Venue - Dates: ACM SIGGRAPH European Conference on Visual Media Production, BFI Southbank, London, United Kingdom, 2023-11-30 - 2023-12-01

Identifiers

Local EPrints ID: 490847
URI: http://eprints.soton.ac.uk/id/eprint/490847
PURE UUID: df738637-7574-4a77-ba8c-5d0ef7f57494
ORCID for Hansung Kim: orcid.org/0000-0003-4907-0491

Catalogue record

Date deposited: 07 Jun 2024 16:33
Last modified: 08 Jun 2024 02:00

Contributors

Author: Peter Hardy
Author: Srinandan Dasmahapatra
Author: Hansung Kim
Editor: Marco Volino
Editor: Armin Mustafa
Editor: Peter Vangorp
