Optimising 2D pose representations: improving accuracy, stability and generalisability within unsupervised 2D-3D human pose estimation

This paper investigates pose representation in unsupervised 2D-3D human pose estimation (HPE). All current unsupervised 2D-3D HPE approaches provide the entire 2D kinematic skeleton to a model during training. We argue that this is sub-optimal and disruptive, because it induces long-range correlations between independent 2D key points and predicted 3D coordinates during training. To test this, we fixed a maximum architecture capacity of 6 residual blocks and evaluated 7 models, each of which represented a 2D pose differently during the adversarial unsupervised 2D-3D HPE process. We also show the correlations induced between 2D key points when a full pose is lifted, highlighting the unintuitive correlations that are learned. Our results show that the best representation of a 2D pose during the lifting stage is two independent segments, the torso and the legs, with no features shared between the two lifting networks. This approach decreased the average error by 20% on the Human3.6M dataset compared with a model with a near-identical parameter count trained on the entire 2D kinematic skeleton. Furthermore, given the complex nature of adversarial learning, we show that this representation can also improve convergence during training, allowing an optimum result to be obtained more often.
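The segmented representation described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the joint names, the assignment of joints to the torso and legs groups, the `TinyLifter` class (a single random linear layer standing in for a residual lifting network), and the choice to predict one depth value per joint while keeping the 2D coordinates. The only point it demonstrates is that, with no shared features, key points in one segment cannot influence the 3D prediction for the other.

```python
import random

# Hypothetical joint names and grouping; the abstract only names "torso" and "legs".
TORSO = ["spine", "neck", "head", "l_shoulder", "l_elbow", "l_wrist",
         "r_shoulder", "r_elbow", "r_wrist"]
LEGS = ["pelvis", "l_hip", "l_knee", "l_ankle", "r_hip", "r_knee", "r_ankle"]


class TinyLifter:
    """Stand-in for a lifting network: one linear layer that maps the
    flattened 2D key points of a segment to one depth value per joint."""

    def __init__(self, joints, seed):
        rng = random.Random(seed)
        self.joints = list(joints)
        n_in = 2 * len(self.joints)
        # One weight row per output joint (untrained, random values).
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
                        for _ in self.joints]

    def __call__(self, pose2d):
        # Only this segment's key points are read; the rest of the pose is ignored.
        flat = [coord for j in self.joints for coord in pose2d[j]]
        depths = [sum(w * x for w, x in zip(row, flat)) for row in self.weights]
        return {j: (pose2d[j][0], pose2d[j][1], z)
                for j, z in zip(self.joints, depths)}


def lift_segmented(pose2d, torso_net, legs_net):
    """Lift each segment with its own network and merge the 3D joints."""
    pose3d = {}
    pose3d.update(torso_net(pose2d))
    pose3d.update(legs_net(pose2d))
    return pose3d
```

For example, building `torso_net = TinyLifter(TORSO, seed=0)` and `legs_net = TinyLifter(LEGS, seed=1)`, then moving only the `l_wrist` key point in the input, leaves every leg joint returned by `lift_segmented` unchanged. A single network fed the full skeleton offers no such guarantee.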

10.1145/3626495.3626505

Association for Computing Machinery

Hardy, Peter

Dasmahapatra, Srinandan

Kim, Hansung

Volino, Marco

Mustafa, Armin

Vangorp, Peter

30 November 2023