Revisiting spatio-temporal layouts for compositional action recognition

Radevski, Gorjan; Moens, Marie-Francine; Tuytelaars, Tinne

arXiv:2111.01936 (cs)

[Submitted on 2 Nov 2021]

Abstract: Recognizing human actions is fundamentally a spatio-temporal reasoning problem, and should be, at least to some extent, invariant to the appearance of the human and the objects involved. Motivated by this hypothesis, in this work, we take an object-centric approach to action recognition. Multiple works have studied this setting before, yet it remains unclear (i) how well a carefully crafted, spatio-temporal layout-based method can recognize human actions, and (ii) how, and when, to fuse the information from layout- and appearance-based models. The main focus of this paper is compositional/few-shot action recognition, where we advocate the use of multi-head attention (proven to be effective for spatial reasoning) over spatio-temporal layouts, i.e., configurations of object bounding boxes. We evaluate different schemes for injecting video appearance information into the system, and benchmark our approach on background-cluttered action recognition. On the Something-Else and Action Genome datasets, we demonstrate (i) how to extend multi-head attention for spatio-temporal layout-based action recognition, (ii) how to improve the performance of appearance-based models by fusion with layout-based models, and (iii) that even on non-compositional background-cluttered video datasets, fusing layout- and appearance-based models improves performance.
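
To make the idea of attention over spatio-temporal layouts concrete, the following is a minimal sketch and not the authors' implementation. It assumes each bounding box is turned into a 6-dimensional token (normalized corner coordinates, a normalized frame index, and a hand/object flag) and that the learned query, key, and value projections of a real multi-head attention layer are omitted, so each head simply attends over its own slice of the token. The function names `box_token` and `multi_head_self_attention` and the token layout are hypothetical choices made for illustration.

```python
import math


def box_token(box, frame_idx, num_frames, width, height, is_hand):
    """Encode one box as [x1, y1, x2, y2, t, is_hand], each in [0, 1].

    The token layout is an assumption made for this sketch.
    """
    x1, y1, x2, y2 = box
    t = frame_idx / max(num_frames - 1, 1)  # normalized frame position
    return [x1 / width, y1 / height, x2 / width, y2 / height, t, float(is_hand)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(scores)
        # Each output token is a weighted average of all value tokens.
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def multi_head_self_attention(tokens, num_heads):
    """Split the feature dimension into heads, attend per head, concatenate."""
    d = len(tokens[0])
    assert d % num_heads == 0, "feature dimension must be divisible by num_heads"
    head_dim = d // num_heads
    heads = []
    for h in range(num_heads):
        sl = slice(h * head_dim, (h + 1) * head_dim)
        heads.append(self_attention([tok[sl] for tok in tokens]))
    # Concatenate the per-head outputs back into full-width tokens.
    return [sum((heads[h][i] for h in range(num_heads)), []) for i in range(len(tokens))]


# Toy layout: a hand and an object tracked over two frames of a 100x50 video.
layout = [
    box_token((10, 10, 30, 30), 0, 2, 100, 50, True),
    box_token((60, 20, 80, 40), 0, 2, 100, 50, False),
    box_token((40, 10, 60, 30), 1, 2, 100, 50, True),
    box_token((60, 20, 80, 40), 1, 2, 100, 50, False),
]
attended = multi_head_self_attention(layout, num_heads=2)
```

Because the tokens from all frames are attended over jointly, every box can draw on every other box in space and time. In a trained model the attended tokens would pass through learned layers and be pooled into an action prediction, which this sketch leaves out.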

Comments:	Published in BMVC 2021 (Oral)
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2111.01936 [cs.CV]
	(or arXiv:2111.01936v1 [cs.CV] for this version)
 	https://doi.org/10.48550/arXiv.2111.01936

Submission history

From: Gorjan Radevski
[v1] Tue, 2 Nov 2021 23:04:39 UTC (11,327 KB)
