DOI: 10.1145/3240508.3240715
research-article

Pseudo Transfer with Marginalized Corrupted Attribute for Zero-shot Learning

Published: 15 October 2018

Abstract

Zero-shot learning (ZSL) aims to recognize unseen classes, that is, classes excluded from the training classes. ZSL suffers from two problems: 1) zero-shot bias (Z-Bias): the model is biased towards seen classes because unseen data is inaccessible during training; 2) zero-shot variance (Z-Variance): associating different images with the same semantic embedding yields a large association error. To reduce Z-Bias, we propose a pseudo transfer mechanism, in which we first synthesize the distribution of unseen data using semantic embeddings and then minimize the mismatch between the seen distribution and the synthesized unseen distribution. To reduce Z-Variance, we implicitly corrupt one semantic embedding multiple times to generate image-wise semantic vectors, with which our model learns robust classifiers. Lastly, we integrate our Z-Bias and Z-Variance reduction techniques with a linear ZSL model to show their usefulness. Our proposed model successfully overcomes the Z-Bias and Z-Variance problems. Extensive experiments on five benchmark datasets, including ImageNet-1K, demonstrate that our model outperforms the state-of-the-art methods while training quickly.
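
The abstract does not spell out the corruption model or the loss, so the following is only a minimal sketch of the general idea behind marginalizing over corrupted attributes. The assumptions are ours, not the paper's: unbiased blankout (dropout-style) noise with rate p, a squared loss, and a single linear scorer w. The function names are hypothetical. The sketch shows that the expected loss over every corrupted copy of one semantic embedding has a closed form, which is why the corruption can stay implicit instead of being sampled.

```python
from itertools import product


def marginalized_sq_loss(w, s, y, p):
    """Closed-form E[(y - w . s_tilde)^2] under unbiased blankout noise.

    Assumption (not from the paper): each attribute s_i is independently
    set to 0 with probability p and rescaled to s_i / (1 - p) otherwise,
    so that E[s_tilde] = s.
    """
    mean_pred = sum(wi * si for wi, si in zip(w, s))
    # Var[s_tilde_i] = s_i^2 * p / (1 - p); coordinates are independent,
    # so the variance of the prediction is the sum of per-coordinate terms.
    var_pred = sum((wi * si) ** 2 for wi, si in zip(w, s)) * p / (1.0 - p)
    return (y - mean_pred) ** 2 + var_pred


def enumerated_sq_loss(w, s, y, p):
    """Same expectation by brute force over all 2^d keep/drop masks."""
    total = 0.0
    for mask in product((0, 1), repeat=len(s)):
        prob = 1.0
        pred = 0.0
        for keep, wi, si in zip(mask, w, s):
            prob *= (1.0 - p) if keep else p
            if keep:
                pred += wi * si / (1.0 - p)
        total += prob * (y - pred) ** 2
    return total


if __name__ == "__main__":
    w = [0.5, -1.0, 2.0]   # hypothetical linear scorer
    s = [1.0, 0.3, 0.8]    # hypothetical class attribute vector
    y, p = 1.0, 0.2
    print(marginalized_sq_loss(w, s, y, p))
    print(enumerated_sq_loss(w, s, y, p))
```

With p = 0 the closed form reduces to the ordinary squared loss. For p > 0 the extra term behaves like a data-dependent penalty on w, which is one way to read "learning robust classifiers" from implicitly corrupted embeddings.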

Published In

MM '18: Proceedings of the 26th ACM international conference on Multimedia
October 2018
2167 pages
ISBN: 9781450356657
DOI: 10.1145/3240508
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. marginalized corrupted attributes
  2. pseudo transfer
  3. zero-shot learning

Qualifiers

  • Research-article

Conference

MM '18
MM '18: ACM Multimedia Conference
October 22 - 26, 2018
Seoul, Republic of Korea

Acceptance Rates

MM '18 Paper Acceptance Rate 209 of 757 submissions, 28%;
Overall Acceptance Rate 2,145 of 8,556 submissions, 25%
