Research article

An efficient framework for zero-shot sketch-based image retrieval

Published: 01 June 2022

Highlights

We propose an efficient framework for zero-shot sketch-based image retrieval.
The model is trained in an end-to-end way with three introduced learning objectives: domain-balanced quadruplet loss, semantic classification loss and semantic knowledge preservation loss.
A low-cost but accurate semantic knowledge distillation pipeline is introduced; unlike past approaches, it does not require a language model or an online teacher network.
The proposed method achieved state-of-the-art results on three challenging zero-shot sketch-based image retrieval datasets: Sketchy Extended, TU-Berlin Extended and QuickDraw Extended.

Abstract

Zero-shot sketch-based image retrieval (ZS-SBIR) has recently attracted the attention of the computer vision community due to its real-world applications and the more realistic and challenging setting it presents compared to standard SBIR. ZS-SBIR inherits the main challenges of multiple computer vision problems, including content-based image retrieval (CBIR), zero-shot learning and domain adaptation. The majority of previous studies using deep neural networks have achieved improved results by either projecting sketches and images into a common low-dimensional space, or transferring knowledge from seen to unseen classes. However, those approaches are trained with complex frameworks composed of multiple deep convolutional neural networks (CNNs) and depend on category-level word labels, which increases the requirements for training resources and datasets. In contrast, we propose a simple and efficient framework that does not require high computational training resources, and learns the semantic embedding space from a vision model rather than a language model, as is done by related studies. Furthermore, our method uses only a single CNN at both training and inference. In this work, an ImageNet pre-trained CNN (i.e., ResNet50) is fine-tuned with three proposed learning objectives: a domain-balanced quadruplet loss, a semantic classification loss, and a semantic knowledge preservation loss. The domain-balanced quadruplet and semantic classification losses are introduced to learn discriminative, semantic and domain-invariant features by considering ZS-SBIR as an object detection and verification problem. The semantic knowledge preservation loss is proposed to preserve the semantic knowledge learned from ImageNet and exploit it for unseen categories. To reduce computational cost and increase the accuracy of the semantic knowledge distillation process, ground-truth semantic knowledge is prepared in a class-oriented fashion prior to training.
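As a concrete illustration of the quadruplet objective, the following is a minimal PyTorch sketch of a quadruplet loss in the style of Chen et al.'s deep quadruplet network. The margin values, function names, and the domain assignment of the four branches (e.g., sketch anchor, photo positive, negatives drawn from both domains under domain-balanced sampling) are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def quadruplet_loss(anchor, positive, negative1, negative2,
                    margin1=1.0, margin2=0.5):
    """Hedged sketch of a quadruplet loss.

    anchor and positive share a class; negative1 and negative2 come
    from two other, distinct classes. In a domain-balanced setting the
    four embeddings would be sampled across the sketch and photo
    domains. Margins are illustrative defaults.
    """
    d_ap = F.pairwise_distance(anchor, positive)      # pull together
    d_an = F.pairwise_distance(anchor, negative1)     # push apart
    d_nn = F.pairwise_distance(negative1, negative2)  # anchor-free term
    # Standard triplet term plus an auxiliary term that also separates
    # pairs not involving the anchor (the quadruplet extension).
    loss = F.relu(d_ap - d_an + margin1) + F.relu(d_ap - d_nn + margin2)
    return loss.mean()
```

The anchor-free second term is what distinguishes the quadruplet loss from a plain triplet loss: it enlarges inter-class distances even between samples that never serve as anchors.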
Extensive experiments are conducted on three challenging ZS-SBIR datasets: Sketchy Extended, TU-Berlin Extended and QuickDraw Extended. The proposed method achieves state-of-the-art results, and outperforms the majority of related works by a substantial margin.
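The class-oriented distillation step described above can be sketched as follows: the frozen ImageNet teacher's softened predictions are averaged per seen class once before training, and the student is then penalized for drifting from those stored soft labels. The helper names, the temperature, and the use of KL divergence are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def class_soft_labels(teacher_logits, labels, num_classes, T=4.0):
    """Precompute class-oriented soft labels (done once, before
    training): average the frozen ImageNet teacher's temperature-
    softened predictions over all training images of each seen class.
    Hypothetical helper; T is an illustrative temperature."""
    probs = F.softmax(teacher_logits / T, dim=1)
    soft = torch.zeros(num_classes, probs.size(1))
    for c in range(num_classes):
        soft[c] = probs[labels == c].mean(dim=0)
    return soft

def preservation_loss(student_logits, labels, soft_labels, T=4.0):
    """Semantic-knowledge-preservation term: KL divergence between the
    student's softened predictions and the stored class soft labels."""
    log_p = F.log_softmax(student_logits / T, dim=1)
    target = soft_labels[labels]          # look up by class, not image
    return F.kl_div(log_p, target, reduction="batchmean") * (T * T)
```

Because the teacher's outputs are reduced to one vector per class ahead of time, no teacher network has to be kept in memory or run online during training, which is the source of the efficiency claim.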



Published In

Pattern Recognition, Volume 126, Issue C, June 2022, 554 pages

Publisher

Elsevier Science Inc., United States

Author Tags

1. Sketch-based image retrieval
2. Zero-shot learning
3. Knowledge distillation
4. Similarity learning
