
Dual Projective Zero-Shot Learning Using Text Descriptions

Published: 05 January 2023

Abstract

Zero-shot learning (ZSL) aims to recognize image instances of unseen classes solely from the semantic descriptions of those classes. Within this field, generalized zero-shot learning (GZSL) is a challenging problem in which images of both seen and unseen classes are mixed in the testing phase. Existing methods formulate GZSL as a semantic-visual correspondence problem and apply generative models such as generative adversarial networks and variational autoencoders to solve it. However, these methods suffer from the bias problem: images of unseen classes are often misclassified into seen classes. In this work, a novel model named the Dual Projective model for Zero-Shot Learning (DPZSL) is proposed that learns from text descriptions. To alleviate the bias problem, we leverage two autoencoders to project the visual and semantic features into a latent space and evaluate the embeddings with a visual-semantic correspondence loss function. A novel classifier is also introduced to ensure the discriminability of the embedded features. Our method targets the more challenging inductive ZSL setting, in which only labeled data from seen classes are used during training. Experimental results on two popular datasets, Caltech-UCSD Birds-200-2011 (CUB) and North America Birds (NAB), show that the proposed DPZSL model significantly outperforms the state of the art in both the inductive ZSL and GZSL settings. In the GZSL setting in particular, our model yields an improvement of up to 15.2% over the state-of-the-art CANZSL on the CUB and NAB datasets under two class-split settings.
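The dual-autoencoder idea in the abstract can be sketched as follows: two encoders project visual and semantic (text) features into a shared latent space, each autoencoder is trained with its own reconstruction loss, and a visual-semantic correspondence loss pulls matched image/text embeddings together. This is a minimal linear NumPy illustration only; the dimensions, names (`init_ae`, `dpzsl_losses`), and linear form are assumptions for exposition, not the paper's actual architecture or loss weighting.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_ae(in_dim, latent_dim):
    """Return a (hypothetical) linear encoder/decoder weight pair."""
    enc = rng.normal(0.0, 0.1, (in_dim, latent_dim))
    dec = rng.normal(0.0, 0.1, (latent_dim, in_dim))
    return enc, dec

def dpzsl_losses(x_vis, x_sem, vis_ae, sem_ae):
    # Project each modality into the shared latent space.
    z_v = x_vis @ vis_ae[0]
    z_s = x_sem @ sem_ae[0]
    # Per-modality reconstruction, as in a standard autoencoder.
    recon_v = z_v @ vis_ae[1]
    recon_s = z_s @ sem_ae[1]
    recon_loss = np.mean((recon_v - x_vis) ** 2) + np.mean((recon_s - x_sem) ** 2)
    # Visual-semantic correspondence: matched pairs should coincide in latent space.
    align_loss = np.mean((z_v - z_s) ** 2)
    return recon_loss, align_loss

# Toy batch of 4 matched image/text feature pairs (dimensions are illustrative).
x_vis = rng.normal(size=(4, 2048))   # e.g. CNN image features
x_sem = rng.normal(size=(4, 7551))   # e.g. TF-IDF text features
recon, align = dpzsl_losses(x_vis, x_sem, init_ae(2048, 64), init_ae(7551, 64))
```

In a full model both losses (plus the discriminative classifier loss mentioned in the abstract) would be minimized jointly by gradient descent over the encoder/decoder weights; the sketch only evaluates them once at random initialization.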

References

[1]
Zeynep Akata, Mateusz Malinowski, Mario Fritz, and Bernt Schiele. 2016. Multi-cue zero-shot learning with strong supervision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Las Vegas, Nevada, USA, 59–68.
[2]
Zeynep Akata, Florent Perronnin, Zaid Harchaoui, and Cordelia Schmid. 2015. Label-embedding for image classification. IEEE Transactions on Pattern Analysis and Machine Intelligence 38, 7 (2015), 1425–1438.
[3]
Zeynep Akata, Scott Reed, Daniel Walter, Honglak Lee, and Bernt Schiele. 2015. Evaluation of output embeddings for fine-grained image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Boston, Massachusetts, USA, 2927–2936.
[4]
Yashas Annadani and Soma Biswas. 2018. Preserving semantic relations for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Salt Lake City, Utah, USA, 7603–7612.
[5]
Vijay Badrinarayanan, Alex Kendall, and Roberto Cipolla. 2017. SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 12 (2017), 2481–2495.
[6]
Soravit Changpinyo, Wei-Lun Chao, Boqing Gong, and Fei Sha. 2016. Synthesized classifiers for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Las Vegas, Nevada, USA, 5327–5336.
[7]
Soravit Changpinyo, Wei-Lun Chao, Boqing Gong, and Fei Sha. 2020. Classifier and exemplar synthesis for zero-shot learning. International Journal of Computer Vision 128, 1 (2020), 166–201.
[8]
Wei-Lun Chao, Soravit Changpinyo, Boqing Gong, and Fei Sha. 2016. An empirical study and analysis of generalized zero-shot learning for object recognition in the wild. In European Conference on Computer Vision. Springer, Amsterdam, the Netherlands, 52–68.
[9]
Zhi Chen, Jingjing Li, Yadan Luo, Zi Huang, and Yang Yang. 2020. CANZSL: Cycle-consistent adversarial networks for zero-shot learning from natural language. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision. IEEE, Snowmass village, Colorado, 874–883.
[10]
Yu-Ying Chou, Hsuan-Tien Lin, and Tyng-Luh Liu. 2020. Adaptive and generative zero-shot learning. In International Conference on Learning Representations. IEEE, Vienna, Austria.
[11]
Peng Cui, Shaowei Liu, and Wenwu Zhu. 2017. General knowledge embedded image representation learning. IEEE Transactions on Multimedia 20, 1 (2017), 198–207.
[12]
Mohamed Elhoseiny and Mohamed Elfeki. 2019. Creativity inspired zero-shot learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision. IEEE, Seoul, South Korea, 5784–5793.
[13]
Mohamed Elhoseiny, Ahmed Elgammal, and Babak Saleh. 2016. Write a classifier: Predicting visual classifiers from unstructured text. IEEE Transactions on Pattern Analysis and Machine Intelligence 39, 12 (2016), 2539–2553.
[14]
Mohamed Elhoseiny, Yizhe Zhu, Han Zhang, and Ahmed Elgammal. 2017. Link the head to the “beak”: Zero shot learning from noisy text description at part precision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Honolulu, Hawaii, USA, 5640–5649.
[15]
Andrea Frome, Greg S. Corrado, Jon Shlens, Samy Bengio, Jeff Dean, Marc’Aurelio Ranzato, and Tomas Mikolov. 2013. DeViSE: A deep visual-semantic embedding model. Advances in Neural Information Processing Systems 26 (2013).
[16]
Ross Girshick. 2015. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, Santiago, Chile, 1440–1448.
[17]
Zhangxuan Gu, Siyuan Zhou, Li Niu, Zihan Zhao, and Liqing Zhang. 2020. Context-aware feature generation for zero-shot semantic segmentation. In Proceedings of the 28th ACM International Conference on Multimedia. ACM, New York, NY, USA, 1921–1929.
[18]
Xintong Han, Bharat Singh, Vlad I. Morariu, and Larry S. Davis. 2017. VRFP: On-the-fly video retrieval using web images and fast Fisher vector products. IEEE Transactions on Multimedia 19, 7 (2017), 1583–1595.
[19]
He Huang, Changhu Wang, Philip S. Yu, and Chang-Dong Wang. 2019. Generative dual adversarial network for generalized zero-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Long Beach, California, 801–810.
[20]
Diederik P. Kingma and Jimmy Ba. 2015. Adam: A method for stochastic optimization. In International Conference on Learning Representations (ICLR’15). San Diego, CA, USA. arXiv preprint arXiv:1412.6980.
[21]
Elyor Kodirov, Tao Xiang, Zhenyong Fu, and Shaogang Gong. 2015. Unsupervised domain adaptation for zero-shot learning. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, Santiago, Chile, 2452–2460.
[22]
Elyor Kodirov, Tao Xiang, and Shaogang Gong. 2017. Semantic autoencoder for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Honolulu, Hawaii, USA, 3174–3183.
[23]
Christoph H. Lampert, Hannes Nickisch, and Stefan Harmeling. 2013. Attribute-based classification for zero-shot visual object categorization. IEEE Transactions on Pattern Analysis and Machine Intelligence 36, 3 (2013), 453–465.
[24]
Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature 521, 7553 (2015), 436–444.
[25]
Jingjing Li, Mengmeng Jing, Ke Lu, Lei Zhu, Yang Yang, and Zi Huang. 2019. Alleviating feature confusion for generative zero-shot learning. In Proceedings of the 27th ACM International Conference on Multimedia. ACM, New York, NY, USA, 1587–1595.
[26]
Kai Li, Martin Renqiang Min, and Yun Fu. 2019. Rethinking zero-shot learning: A conditional visual classification perspective. In Proceedings of the IEEE/CVF International Conference on Computer Vision. IEEE, Seoul, South Korea, 3583–3592.
[27]
Massimiliano Mancini, Muhammad Ferjad Naeem, Yongqin Xian, and Zeynep Akata. 2021. Open world compositional zero-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Nashville, TN, USA, 5222–5230.
[28]
Ashish Mishra, Shiva Krishna Reddy, Anurag Mittal, and Hema A. Murthy. 2018. A generative model for zero shot learning using conditional variational autoencoders. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. IEEE, Salt Lake City, Utah, USA, 2188–2196.
[29]
Pedro Morgado and Nuno Vasconcelos. 2017. Semantically consistent regularization for zero-shot recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Honolulu, HI, USA, 6060–6069.
[30]
Arghya Pal and Vineeth N. Balasubramanian. 2019. Zero-shot task transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Long Beach, California, USA, 2189–2198.
[31]
Yuxin Peng and Jinwei Qi. 2019. CM-GANs: Cross-modal generative adversarial networks for common representation learning. ACM Transactions on Multimedia Computing, Communications, and Applications 15, 1 (2019), 1–24.
[32]
Ruizhi Qiao, Lingqiao Liu, Chunhua Shen, and Anton Van Den Hengel. 2016. Less is more: Zero-shot learning from online textual documents with noise suppression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Las Vegas, Nevada, USA, 2249–2257.
[33]
Shafin Rahman, Salman Khan, and Nick Barnes. 2019. Deep0tag: Deep multiple instance learning for zero-shot image tagging. IEEE Transactions on Multimedia 22, 1 (2019), 242–255.
[34]
Bernardino Romera-Paredes and Philip Torr. 2015. An embarrassingly simple approach to zero-shot learning. In International Conference on Machine Learning. JMLR.org, Lille, France, 2152–2161.
[35]
Gerard Salton and Christopher Buckley. 1988. Term-weighting approaches in automatic text retrieval. Information Processing & Management 24, 5 (1988), 513–523.
[36]
Edgar Schonfeld, Sayna Ebrahimi, Samarth Sinha, Trevor Darrell, and Zeynep Akata. 2019. Generalized zero- and few-shot learning via aligned variational autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. IEEE, Long Beach, CA, USA, 8247–8255.
[37]
Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[38]
Jie Song, Chengchao Shen, Yezhou Yang, Yang Liu, and Mingli Song. 2018. Transductive unbiased embedding for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Salt Lake City, Utah, USA, 1024–1033.
[39]
Laurens Van der Maaten and Geoffrey Hinton. 2008. Visualizing data using t-SNE. Journal of Machine Learning Research 9, 11 (2008), 2579–2605.
[40]
Grant Van Horn, Steve Branson, Ryan Farrell, Scott Haber, Jessie Barry, Panos Ipeirotis, Pietro Perona, and Serge Belongie. 2015. Building a bird recognition app and large scale dataset with citizen scientists: The fine print in fine-grained dataset collection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Boston, Massachusetts, USA, 595–604.
[41]
Vinay Kumar Verma, Dhanajit Brahma, and Piyush Rai. 2020. Meta-learning for generalized zero-shot learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 34. New York, USA, 6062–6069.
[42]
Catherine Wah, Steve Branson, Peter Welinder, Pietro Perona, and Serge Belongie. 2011. The Caltech-UCSD Birds-200-2011 dataset. Technical Report CNS-TR-2011-001, California Institute of Technology (2011).
[43]
Wei Wang, Vincent W. Zheng, Han Yu, and Chunyan Miao. 2019. A survey of zero-shot learning: Settings, methods, and applications. ACM Transactions on Intelligent Systems and Technology 10, 2 (2019), 1–37.
[44]
Zheng Wang, Ruimin Hu, Chao Liang, Yi Yu, Junjun Jiang, Mang Ye, Jun Chen, and Qingming Leng. 2015. Zero-shot person re-identification via cross-view consistency. IEEE Transactions on Multimedia 18, 2 (2015), 260–272.
[45]
Junyuan Xie, Ross Girshick, and Ali Farhadi. 2016. Unsupervised deep embedding for clustering analysis. In International Conference on Machine Learning. PMLR, New York, USA, 478–487.
[46]
Wenju Xu, Shawn Keshmiri, and Guanghui Wang. 2019. Adversarially approximated autoencoder for image generation and manipulation. IEEE Transactions on Multimedia 21, 9 (2019), 2387–2396.
[47]
Xinchen Yan, Jimei Yang, Kihyuk Sohn, and Honglak Lee. 2016. Attribute2image: Conditional image generation from visual attributes. In European Conference on Computer Vision. Springer, Amsterdam, The Netherlands, 776–791.
[48]
Lei Zhang, Peng Wang, Lingqiao Liu, Chunhua Shen, Wei Wei, Yanning Zhang, and Anton Van Den Hengel. 2020. Towards effective deep embedding for zero-shot learning. IEEE Transactions on Circuits and Systems for Video Technology 30, 9 (2020), 2843–2852.
[49]
Li Zhang, Tao Xiang, and Shaogang Gong. 2017. Learning a deep embedding model for zero-shot learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Honolulu, Hawaii, USA, 2021–2030.
[50]
Ziming Zhang and Venkatesh Saligrama. 2015. Zero-shot learning via semantic similarity embedding. In Proceedings of the IEEE International Conference on Computer Vision. IEEE, Santiago, Chile, 4166–4174.
[51]
Ziming Zhang and Venkatesh Saligrama. 2016. Zero-shot learning via joint latent similarity embedding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Las Vegas, Nevada, USA, 6034–6042.
[52]
Yizhe Zhu, Mohamed Elhoseiny, Bingchen Liu, Xi Peng, and Ahmed Elgammal. 2018. A generative adversarial approach for zero-shot learning from noisy texts. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Salt Lake City, UT, USA, 1004–1013.

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications  Volume 19, Issue 1
January 2023
505 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/3572858
  • Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 05 January 2023
Online AM: 29 July 2022
Accepted: 25 January 2022
Revised: 22 October 2021
Received: 30 May 2021
Published in TOMM Volume 19, Issue 1


Author Tags

  1. Zero-shot learning
  2. generalized zero-shot learning
  3. autoencoder
  4. inductive zero-shot learning

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • Science and Technology Project of Sichuan
  • National Natural Science Foundation of China
