DOI: 10.1145/3474085.3475676
research-article

Relationship-Preserving Knowledge Distillation for Zero-Shot Sketch Based Image Retrieval

Published: 17 October 2021

Abstract

Zero-shot sketch-based image retrieval (ZS-SBIR) is challenging because of the modal gap between the distributions of sketches and images and the inconsistency of label spaces between training and testing. Previous methods mitigate the modal gap by projecting sketches and images into a joint embedding space. Most of them also bridge seen and unseen classes by leveraging semantic embeddings, i.e., word vectors and hierarchical similarities. In this paper, we propose Relationship-Preserving Knowledge Distillation (RPKD) to study generalizable embeddings from the perspective of knowledge distillation, bypassing the use of semantic embeddings. In particular, we first distill instance-level knowledge to preserve inter-class relationships without relying on semantic similarities, which require extra effort to collect. We also reconcile the contrastive relationships among instances between different embedding spaces, which is complementary to instance-level relationships. Furthermore, we develop embedding-induced supervision, which measures the similarities of an instance to partial class embedding centers from the teacher, to align the student's classification confidences. Extensive experiments on three benchmark ZS-SBIR datasets, i.e., Sketchy, TU-Berlin, and QuickDraw, demonstrate the superiority of the proposed RPKD approach over state-of-the-art methods.
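The abstract gives no formulas, so the following is only a minimal sketch of one plausible reading of the embedding-induced supervision idea: soft targets are derived from the similarity of a teacher embedding to class embedding centers, and the student's classification confidences are pulled toward them. Everything here is an assumption made for illustration and is not the authors' implementation: the function names, the use of cosine similarity, the temperature value, the KL-divergence loss, and the use of all class centers rather than a partial subset.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(scores, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def class_centers(embeddings, labels, num_classes):
    """Mean teacher embedding per class (assumes each class has an instance)."""
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for emb, lab in zip(embeddings, labels):
        counts[lab] += 1
        for i, x in enumerate(emb):
            sums[lab][i] += x
    return [[x / counts[c] for x in sums[c]] for c in range(num_classes)]

def embedding_induced_targets(teacher_emb, centers, temperature=0.1):
    """Soft targets from similarities to class centers (hypothetical form)."""
    sims = [cosine(teacher_emb, c) for c in centers]
    return softmax(sims, temperature)

def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted), used here as an assumed alignment loss."""
    return sum(t * math.log((t + eps) / (p + eps))
               for t, p in zip(target, predicted))

# Toy data: four teacher embeddings from two seen classes.
teacher_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]

centers = class_centers(teacher_embeddings, labels, num_classes=2)
targets = embedding_induced_targets([1.0, 0.0], centers)
student_probs = softmax([2.0, 0.5])  # hypothetical student logits
loss = kl_divergence(targets, student_probs)
```

In this toy setup the soft targets favor class 0 because the query embedding lies close to that class center, and the loss shrinks as the student's confidences approach the targets. A faithful reproduction would need the loss definitions from the full paper.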

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia
October 2021
5796 pages
ISBN: 9781450386517
DOI: 10.1145/3474085
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. knowledge distillation
  2. sketch-based image retrieval
  3. zero-shot learning

Qualifiers

  • Research-article

Conference

MM '21
Sponsor:
MM '21: ACM Multimedia Conference
October 20 - 24, 2021
Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
