Research article, DOI: 10.1145/3394171.3414044

Data-driven Meta-set Based Fine-Grained Visual Recognition

Published: 12 October 2020

Abstract

Constructing fine-grained image datasets typically requires domain-specific expert knowledge, which is not always available to annotators on crowd-sourcing platforms. Learning directly from web images is therefore an alternative for fine-grained visual recognition. However, label noise in the web training set can severely degrade model performance. To address this, we propose a data-driven meta-set based approach for handling noisy web images in fine-grained recognition. Specifically, guided by a small clean meta-set, we train a selection net in a meta-learning manner to distinguish in-distribution from out-of-distribution noisy images. To further improve the robustness of the model, we also learn a labeling net that corrects the labels of in-distribution noisy data. In this way, the proposed method alleviates the harmful effects of out-of-distribution noise and properly exploits in-distribution noisy samples for training. Extensive experiments on three commonly used fine-grained datasets demonstrate that our approach substantially outperforms state-of-the-art noise-robust methods.
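The idea of letting a small clean meta-set guide how noisy samples are used can be illustrated with a toy sketch. The code below is not the selection net or labeling net from the paper. It swaps in a much simpler one-step gradient-alignment rule on a logistic-regression model: each noisy training sample is weighted by how well its loss gradient agrees with the average gradient on the clean meta-set. The function names (`sigmoid`, `grad`, `meta_weights`) and the toy data are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad(theta, x, y):
    """Gradient of the logistic loss of one sample (x, y) w.r.t. theta."""
    err = sigmoid(sum(t * xi for t, xi in zip(theta, x))) - y
    return [err * xi for xi in x]


def meta_weights(theta, train, meta):
    """Toy stand-in for a selection step (an assumption, not the paper's method).

    A training sample gets a positive weight only if a gradient step on it
    would also reduce the loss on the clean meta-set, i.e. its gradient has
    a positive dot product with the mean meta-set gradient.
    """
    g_meta = [0.0] * len(theta)
    for x, y in meta:
        g = grad(theta, x, y)
        g_meta = [a + b / len(meta) for a, b in zip(g_meta, g)]
    raw = []
    for x, y in train:
        g = grad(theta, x, y)
        raw.append(max(0.0, sum(a * b for a, b in zip(g_meta, g))))
    total = sum(raw)
    return [r / total if total > 0 else 0.0 for r in raw]


theta = [0.0, 0.0]
meta = [([1.0, 0.0], 1), ([-1.0, 0.0], 0)]  # small clean meta-set
train = [
    ([2.0, 0.0], 1),   # correctly labeled
    ([-2.0, 0.0], 0),  # correctly labeled
    ([2.0, 0.0], 0),   # mislabeled: conflicts with the meta-set
]
weights = meta_weights(theta, train, meta)
print(weights)  # [0.5, 0.5, 0.0]
```

In this toy case the mislabeled sample receives zero weight and is effectively dropped. The approach described in the abstract goes further: it separates out-of-distribution noise from in-distribution noise and relabels the latter instead of discarding it.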

Supplementary Material

MP4 File (3394171.3414044.mp4)
This is the presentation of the paper "Data-driven Meta-set Based Fine-Grained Visual Recognition". It briefly introduces the research challenge and describes the corresponding solution. For the detailed learning mechanism, please read the paper.

    Published In

    MM '20: Proceedings of the 28th ACM International Conference on Multimedia
    October 2020
    4889 pages
    ISBN:9781450379885
    DOI:10.1145/3394171
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. fine-grained
    2. in-distribution
    3. label noise
    4. out-of-distribution

    Qualifiers

    • Research-article

    Funding Sources

    • Fundamental Research Funds for the Central Universities of China
    • National Key R&D Program of China
    • Natural Science Foundation of Jiangsu Province
    • National Natural Science Foundation of China

    Conference

    MM '20

    Acceptance Rates

    Overall Acceptance Rate 2,145 of 8,556 submissions, 25%

    Article Metrics

    • Downloads (last 12 months): 37
    • Downloads (last 6 weeks): 5
    Reflects downloads up to 09 Jan 2025
