Transformers in Action: Weakly Supervised Action Segmentation

Ridley, John; Coskun, Huseyin; Tan, David Joseph; Navab, Nassir; Tombari, Federico

Computer Science > Computer Vision and Pattern Recognition

arXiv:2201.05675 (cs)

[Submitted on 14 Jan 2022 (v1), last revised 20 Jan 2022 (this version, v2)]

Abstract: The video action segmentation task is regularly explored under weaker forms of supervision, such as transcript supervision, where a list of actions is easier to obtain than dense frame-wise labels. In this formulation, the task presents various challenges for sequence modeling approaches due to the emphasis on action transition points, long sequence lengths, and frame contextualization, making the task well-posed for transformers. Given developments enabling transformers to scale linearly, we demonstrate through our architecture how they can be applied to improve action alignment accuracy over the equivalent RNN-based models, with the attention mechanism focusing on salient action transition regions. Additionally, given the recent focus on inference-time transcript selection, we propose a supplemental transcript embedding approach to select transcripts more quickly at inference time. We further demonstrate how this approach can also improve the overall segmentation performance. Finally, we evaluate our proposed methods across the benchmark datasets to better understand the applicability of transformers and the importance of transcript selection on this video-driven weakly supervised task.
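The abstract credits developments that let transformers scale linearly in sequence length. This page does not describe the authors' architecture, so the sketch below is only an illustration of that general idea: kernelized linear attention in the style of Katharopoulos et al. (2020), using the feature map phi(x) = elu(x) + 1. It is not claimed to be the paper's method, and all function names are hypothetical. Instead of building the N×N attention matrix over N video frames, it accumulates a d×d_v matrix and a d-vector once and lets every query read them, so the cost grows linearly with N. A quadratic reference with the same kernel is included to check that both give the same output.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def elu_plus_one(x: float) -> float:
    """Positive feature map phi(x) = elu(x) + 1, applied elementwise."""
    return x + 1.0 if x > 0 else math.exp(x)


def feature_map(rows: Matrix) -> Matrix:
    return [[elu_plus_one(x) for x in row] for row in rows]


def linear_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Non-causal kernelized attention; q, k are N x d and v is N x d_v.

    Runs in O(N * d * d_v): the N x N attention matrix is never formed.
    """
    phi_q, phi_k = feature_map(q), feature_map(k)
    d, d_v = len(phi_k[0]), len(v[0])
    # Summaries over all frames: s = sum_j phi(k_j) v_j^T (d x d_v), z = sum_j phi(k_j) (d)
    s = [[0.0] * d_v for _ in range(d)]
    z = [0.0] * d
    for pk, vj in zip(phi_k, v):
        for a in range(d):
            z[a] += pk[a]
            for b in range(d_v):
                s[a][b] += pk[a] * vj[b]
    # Each query reads the shared summaries: out_i = phi(q_i)^T s / phi(q_i)^T z
    out = []
    for pq in phi_q:
        denom = sum(pq[a] * z[a] for a in range(d))
        out.append([sum(pq[a] * s[a][b] for a in range(d)) / denom for b in range(d_v)])
    return out


def quadratic_attention(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Reference with the same kernel that builds all N x N similarities (O(N^2))."""
    phi_q, phi_k = feature_map(q), feature_map(k)
    d_v = len(v[0])
    out = []
    for pq in phi_q:
        weights = [sum(a * b for a, b in zip(pq, pk)) for pk in phi_k]
        total = sum(weights)
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) / total for c in range(d_v)])
    return out


def random_matrix(rows: int, cols: int) -> Matrix:
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    q, k, v = random_matrix(6, 4), random_matrix(6, 4), random_matrix(6, 3)
    fast, slow = linear_attention(q, k, v), quadratic_attention(q, k, v)
    # Both orderings compute the same quantity, so only round-off should differ.
    print(max(abs(a - b) for ra, rb in zip(fast, slow) for a, b in zip(ra, rb)))
```

Because the per-query work no longer depends on N, doubling the number of frames roughly doubles the total work rather than quadrupling it, which is what makes long untrimmed video sequences tractable for attention.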

Comments:	Under Review
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2201.05675 [cs.CV]
	(or arXiv:2201.05675v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2201.05675

Submission history

From: Huseyin Coskun
[v1] Fri, 14 Jan 2022 21:15:58 UTC (10,421 KB)
[v2] Thu, 20 Jan 2022 19:31:31 UTC (10,422 KB)
