An End-to-End Transformer Model for Crowd Localization

Liang, Dingkang; Xu, Wei; Bai, Xiang

arXiv:2202.13065 (cs)

[Submitted on 26 Feb 2022 (v1), last revised 8 Aug 2022 (this version, v2)]

Abstract: Crowd localization, i.e., predicting head positions, is a more practical and higher-level task than simply counting. Existing methods employ pseudo-bounding boxes or pre-designed localization maps and rely on complex post-processing to obtain the head positions. In this paper, we propose an elegant, end-to-end Crowd Localization Transformer named CLTR that solves the task in a regression-based paradigm. The proposed method views crowd localization as a direct set prediction problem, taking extracted features and trainable embeddings as input to the transformer decoder. To reduce ambiguous points and generate more reasonable matching results, we introduce a KMO-based Hungarian matcher, which adopts the nearby context as an auxiliary matching cost. Extensive experiments on five datasets in various data settings show the effectiveness of our method. In particular, the proposed method achieves the best localization performance on the NWPU-Crowd, UCF-QNRF, and ShanghaiTech Part A datasets.
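To make the set-prediction matching step concrete, here is a minimal sketch of a one-to-one matcher between predicted and ground-truth head points whose cost includes a "nearby context" term. The abstract does not define the KMO context, so this sketch assumes (hypothetically, not taken from the paper) that a point's context is its mean distance to its k nearest neighbours within the same set, and that the cost is the negative confidence plus an L1 point distance plus a weighted context difference. The weights `w_point` and `w_ctx`, the value of `k`, and all function names are illustrative. For clarity the assignment is found by exhaustive search, which returns the same optimal matching as the Hungarian algorithm but only scales to a handful of points.

```python
import itertools
import math


def knn_context(points, k):
    """Hypothetical context: mean distance from each point to its k nearest
    neighbours in the same set (0.0 if the set has a single point)."""
    ctx = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        ctx.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return ctx


def matching_cost(pred, scores, gt, k=2, w_point=1.0, w_ctx=1.0):
    """Cost matrix of shape len(gt) x len(pred).

    Assumed cost: -confidence + w_point * L1(point) + w_ctx * |context gap|.
    """
    pred_ctx = knn_context(pred, k)
    gt_ctx = knn_context(gt, k)
    cost = []
    for g, g_ctx in zip(gt, gt_ctx):
        row = []
        for p, p_ctx, s in zip(pred, pred_ctx, scores):
            l1 = abs(p[0] - g[0]) + abs(p[1] - g[1])
            row.append(-s + w_point * l1 + w_ctx * abs(p_ctx - g_ctx))
        cost.append(row)
    return cost


def optimal_assignment(cost):
    """Minimum-cost one-to-one assignment of rows (ground truth) to columns
    (predictions) by exhaustive search; a Hungarian solver gives the same
    optimum in polynomial time. Returns (gt_index, pred_index) pairs."""
    n_rows, n_cols = len(cost), len(cost[0])
    if n_rows > n_cols:
        raise ValueError("need at least as many predictions as ground-truth points")
    best, best_cols = math.inf, None
    for cols in itertools.permutations(range(n_cols), n_rows):
        total = sum(cost[r][c] for r, c in enumerate(cols))
        if total < best:
            best, best_cols = total, cols
    return list(enumerate(best_cols))


# Usage: three annotated heads, four predicted points with confidences.
gt = [(10.0, 10.0), (12.0, 10.0), (50.0, 50.0)]
pred = [(49.0, 51.0), (11.0, 10.0), (11.5, 10.5), (90.0, 90.0)]
scores = [0.9, 0.8, 0.7, 0.1]
pairs = optimal_assignment(matching_cost(pred, scores, gt))
print(pairs)  # [(0, 1), (1, 2), (2, 0)]
```

In this toy example the low-confidence prediction at (90, 90) is left unmatched; in DETR-style set prediction such unmatched queries are typically supervised as "no object".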

Comments:	Accepted by ECCV 2022. The project page is at this https URL
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2202.13065 [cs.CV]
	(or arXiv:2202.13065v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2202.13065

Submission history

From: Dingkang Liang
[v1] Sat, 26 Feb 2022 05:21:30 UTC (24,586 KB)
[v2] Mon, 8 Aug 2022 10:56:39 UTC (3,033 KB)