Research article

An efficient framework for zero-shot sketch-based image retrieval

Published: 01 June 2022

Highlights

We propose an efficient framework for zero-shot sketch-based image retrieval.
The model is trained in an end-to-end way with three introduced learning objectives: domain-balanced quadruplet loss, semantic classification loss and semantic knowledge preservation loss.
A low-cost but accurate semantic knowledge distillation pipeline is introduced; unlike past approaches, it does not require a language model or an online teacher network.
The proposed method achieved state-of-the-art results on three challenging zero-shot sketch-based image retrieval datasets: Sketchy Extended, TU-Berlin Extended and QuickDraw Extended.

Abstract

Zero-shot sketch-based image retrieval (ZS-SBIR) has recently attracted the attention of the computer vision community due to its real-world applications and the more realistic and challenging setting it presents compared to standard SBIR. ZS-SBIR inherits the main challenges of multiple computer vision problems, including content-based image retrieval (CBIR), zero-shot learning and domain adaptation. The majority of previous studies using deep neural networks have achieved improved results by either projecting sketches and images into a common low-dimensional space, or transferring knowledge from seen to unseen classes. However, those approaches are trained with complex frameworks composed of multiple deep convolutional neural networks (CNNs) and depend on category-level word labels, which increases the requirements for training resources and datasets. In contrast, we propose a simple and efficient framework that does not require high computational training resources, and learns the semantic embedding space from a vision model rather than a language model, as is done by related studies. Furthermore, our method uses only a single CNN at both training and inference. In this work, an ImageNet pre-trained CNN (i.e., ResNet50) is fine-tuned with three proposed learning objectives: a domain-balanced quadruplet loss, a semantic classification loss, and a semantic knowledge preservation loss. The domain-balanced quadruplet and semantic classification losses are introduced to learn discriminative, semantic and domain-invariant features by considering ZS-SBIR as an object detection and verification problem. The semantic knowledge preservation loss is proposed to preserve the semantic knowledge learned from ImageNet and exploit it for unseen categories. To reduce computational cost and increase the accuracy of the semantic knowledge distillation process, ground-truth semantic knowledge is prepared in a class-oriented fashion prior to training.
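As a concrete illustration of the quadruplet objective, the following is a minimal PyTorch sketch of a quadruplet loss in the style of Chen et al.'s deep quadruplet network. The margin values, function names, and the domain assignment of the four branches (e.g., sketch anchor, photo positive, negatives drawn from both domains under domain-balanced sampling) are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def quadruplet_loss(anchor, positive, negative1, negative2,
                    margin1=1.0, margin2=0.5):
    """Hedged sketch of a quadruplet loss.

    anchor and positive share a class; negative1 and negative2 come
    from two other, distinct classes. In a domain-balanced setting the
    four embeddings would be sampled across the sketch and photo
    domains. Margins are illustrative defaults.
    """
    d_ap = F.pairwise_distance(anchor, positive)      # pull together
    d_an = F.pairwise_distance(anchor, negative1)     # push apart
    d_nn = F.pairwise_distance(negative1, negative2)  # anchor-free term
    # Standard triplet term plus an auxiliary term that also separates
    # pairs not involving the anchor (the quadruplet extension).
    loss = F.relu(d_ap - d_an + margin1) + F.relu(d_ap - d_nn + margin2)
    return loss.mean()
```

The anchor-free second term is what distinguishes the quadruplet loss from a plain triplet loss: it enlarges inter-class distances even between samples that never serve as anchors.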
Extensive experiments are conducted on three challenging ZS-SBIR datasets: Sketchy Extended, TU-Berlin Extended and QuickDraw Extended. The proposed method achieves state-of-the-art results, and outperforms the majority of related works by a substantial margin.
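The class-oriented distillation step described above can be sketched as follows: the frozen ImageNet teacher's softened predictions are averaged per seen class once before training, and the student is then penalized for drifting from those stored soft labels. The helper names, the temperature, and the use of KL divergence are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def class_soft_labels(teacher_logits, labels, num_classes, T=4.0):
    """Precompute class-oriented soft labels (done once, before
    training): average the frozen ImageNet teacher's temperature-
    softened predictions over all training images of each seen class.
    Hypothetical helper; T is an illustrative temperature."""
    probs = F.softmax(teacher_logits / T, dim=1)
    soft = torch.zeros(num_classes, probs.size(1))
    for c in range(num_classes):
        soft[c] = probs[labels == c].mean(dim=0)
    return soft

def preservation_loss(student_logits, labels, soft_labels, T=4.0):
    """Semantic-knowledge-preservation term: KL divergence between the
    student's softened predictions and the stored class soft labels."""
    log_p = F.log_softmax(student_logits / T, dim=1)
    target = soft_labels[labels]          # look up by class, not image
    return F.kl_div(log_p, target, reduction="batchmean") * (T * T)
```

Because the teacher's outputs are reduced to one vector per class ahead of time, no teacher network has to be kept in memory or run online during training, which is the source of the efficiency claim.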



Published In

Pattern Recognition, Volume 126, Issue C, June 2022, 554 pages

Publisher

Elsevier Science Inc., United States

Author Tags

1. Sketch-based image retrieval
2. Zero-shot learning
3. Knowledge distillation
4. Similarity learning
