VidConv: A modernized 2D ConvNet for Efficient Video Recognition

Nguyen, Chuong H.; Huynh, Su; Nguyen, Vinh; Nguyen, Ngoc

arXiv:2207.03782 (cs)

[Submitted on 8 Jul 2022]

Abstract: Since their introduction in 2020, Vision Transformers (ViTs) have steadily broken records on many vision tasks and are often described as "all you need" to replace ConvNets. Despite that, ViTs are generally computationally expensive, memory-consuming, and unfriendly to embedded devices. In addition, recent research shows that a standard ConvNet, if redesigned and trained appropriately, can compete favorably with ViTs in terms of accuracy and scalability. In this paper, we adopt the modernized ConvNet structure to design a new backbone for action recognition. In particular, our main target is industrial product deployment, such as on FPGA boards, where only standard operations are supported. Therefore, our network consists solely of 2D convolutions, without any 3D convolution, long-range attention plugin, or Transformer block. While being trained with far fewer epochs (5x-10x fewer), our backbone surpasses methods using (2+1)D and 3D convolutions and achieves results comparable to ViTs on two benchmark datasets.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2207.03782 [cs.CV]
	(or arXiv:2207.03782v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2207.03782

Submission history

From: Chuong Nguyen
[v1] Fri, 8 Jul 2022 09:33:46 UTC (5,471 KB)
