Semantic-aligned Fusion Transformer for One-shot Object Detection

Zhao, Yizhou; Guo, Xun; Lu, Yan

arXiv:2203.09093 (cs)

[Submitted on 17 Mar 2022 (v1), last revised 20 Mar 2022 (this version, v2)]

Abstract: One-shot object detection aims to detect novel objects from a single given instance. With extreme data scarcity, current approaches explore various feature fusions to obtain directly transferable meta-knowledge. Yet their performance is often unsatisfactory. In this paper, we attribute this to inappropriate correlation methods that misalign query-support semantics by overlooking spatial structures and scale variance. Based on this analysis, we leverage the attention mechanism and propose a simple but effective architecture named Semantic-aligned Fusion Transformer (SaFT) to resolve these issues. Specifically, we equip SaFT with a vertical fusion module (VFM) for cross-scale semantic enhancement and a horizontal fusion module (HFM) for cross-sample feature fusion. Together, they broaden the view of each feature point from the support to the whole augmented feature pyramid of the query, facilitating semantic-aligned associations. Extensive experiments on multiple benchmarks demonstrate the superiority of our framework. Without fine-tuning on novel classes, it brings significant performance gains to one-stage baselines, lifting state-of-the-art results to a higher level.
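The abstract's central idea, letting each support feature point attend to every position of the query's feature pyramid, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain single-head scaled dot-product attention with no learned projections, and the names `flatten_pyramid` and `cross_attention` and the toy feature values are hypothetical, chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def flatten_pyramid(pyramid):
    """Concatenate the feature points of every pyramid level into one list."""
    return [point for level in pyramid for point in level]


def cross_attention(support_points, query_pyramid):
    """For each support feature point, return an attention-weighted average
    of all query pyramid points (assumption: no learned projections)."""
    keys = flatten_pyramid(query_pyramid)
    dim = len(keys[0])
    fused = []
    for s in support_points:
        # Similarity of this support point to every query position, all scales.
        weights = softmax([dot(s, k) / math.sqrt(dim) for k in keys])
        fused.append(
            [sum(w * k[d] for w, k in zip(weights, keys)) for d in range(dim)]
        )
    return fused


# Toy two-level pyramid from the query image (hypothetical values).
query_pyramid = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]],  # fine level, 4 points
    [[2.0, 0.0]],                                      # coarse level, 1 point
]
support_points = [[1.0, 0.0], [0.0, 1.0]]

fused = cross_attention(support_points, query_pyramid)
```

Because the keys span every level of the pyramid, a support point can match a query position at whichever scale resembles it most, which is the intuition behind attending over the whole pyramid rather than a single feature map.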

Comments:	Accepted by CVPR 2022
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2203.09093 [cs.CV]
	(or arXiv:2203.09093v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2203.09093

Submission history

From: Yizhou Zhao
[v1] Thu, 17 Mar 2022 05:38:47 UTC (1,904 KB)
[v2] Sun, 20 Mar 2022 09:27:23 UTC (1,907 KB)
