Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes

Choi, Hongsuk; Moon, Gyeongsik; Park, JoonKyu; Lee, Kyoung Mu

arXiv:2104.07300 (cs)

[Submitted on 15 Apr 2021 (v1), last revised 18 Sep 2022 (this version, v3)]

Abstract: We consider the problem of recovering a single person's 3D human mesh from in-the-wild crowded scenes. While much progress has been made in 3D human mesh estimation, existing methods struggle when the test input contains crowded scenes. The first reason for the failure is a domain gap between training and testing data. A motion capture dataset, which provides accurate 3D labels for training, lacks crowd data and impedes a network from learning crowded scene-robust image features of a target person. The second reason is a feature-processing step that spatially averages the feature map of a localized bounding box containing multiple people. Averaging the whole feature map makes a target person's feature indistinguishable from those of others. We present 3DCrowdNet, the first method to explicitly target in-the-wild crowded scenes, which estimates a robust 3D human mesh by addressing the above issues. First, we leverage 2D human pose estimation, which does not require a motion capture dataset with 3D labels for training and does not suffer from the domain gap. Second, we propose a joint-based regressor that distinguishes a target person's feature from others. Our joint-based regressor preserves the spatial activation of a target by sampling features from the target's joint locations and regresses human model parameters. As a result, 3DCrowdNet learns target-focused features and effectively excludes the irrelevant features of nearby persons. We conduct experiments on various benchmarks and demonstrate the robustness of 3DCrowdNet to in-the-wild crowded scenes both quantitatively and qualitatively. The code is available at this https URL.
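
The contrast the abstract draws between spatially averaging a feature map and sampling it at the target's joint locations can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`global_average_pool`, `bilinear_sample`, `sample_joint_features`), the nested-list feature map, and the use of bilinear interpolation are all assumptions made for illustration, and the regression of human model parameters from the sampled features is omitted.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy representation: feature map indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def global_average_pool(feat: FeatureMap) -> List[float]:
    """Average each channel over the whole crop (mixes all people in the box)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]


def bilinear_sample(feat: FeatureMap, x: float, y: float) -> List[float]:
    """Bilinearly interpolate every channel at the continuous location (x, y)."""
    h, w = len(feat[0]), len(feat[0][0])
    # Clamp so that joints slightly outside the crop still read valid features.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    out = []
    for ch in feat:
        top = (1 - ax) * ch[y0][x0] + ax * ch[y0][x1]
        bottom = (1 - ax) * ch[y1][x0] + ax * ch[y1][x1]
        out.append((1 - ay) * top + ay * bottom)
    return out


def sample_joint_features(
    feat: FeatureMap, joints: Sequence[Tuple[float, float]]
) -> List[List[float]]:
    """Return one feature vector per 2D joint (x, y) of the target person."""
    return [bilinear_sample(feat, x, y) for x, y in joints]


# One-channel 4x4 crop containing two people: the target activates at
# (x=0, y=1) with value 1.0, a nearby person at (x=3, y=1) with value 5.0.
feat = [[
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 5.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]]

pooled = global_average_pool(feat)                         # both people blended
target_only = sample_joint_features(feat, [(0.0, 1.0)])    # target's joint only
```

In this toy crop the pooled vector blends both activations into a single value, while the joint-sampled feature reads only the activation at the target's joint, which is the property the abstract attributes to the joint-based regressor.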

Comments:	Accepted to CVPR 2022, 16 pages including the supplementary material
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2104.07300 [cs.CV]
	(or arXiv:2104.07300v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2104.07300

Submission history

From: Hongsuk Choi
[v1] Thu, 15 Apr 2021 08:21:28 UTC (29,299 KB)
[v2] Mon, 28 Mar 2022 10:06:47 UTC (3,864 KB)
[v3] Sun, 18 Sep 2022 13:25:19 UTC (3,858 KB)
