
Generative Multi-Label Correlation Learning

Published: 20 February 2023

Abstract

In real-world applications, a single instance can have more than one label, and multi-label learning methods have emerged in recent years to handle this setting. Multi-label learning is more challenging than single-label learning for several reasons, including complex label correlations, long-tail label distributions, and data shortage. In general, these challenges can be mitigated, and learning performance improved, by using more training samples and by modeling label correlations. However, these solutions are expensive and inflexible: large-scale, well-labeled datasets are difficult to obtain, and building label correlation maps requires task-specific semantic information as prior knowledge. To address these limitations, we propose a general and compact Multi-Label Correlation Learning (MUCO) framework. MUCO explicitly and effectively learns the latent label correlations by updating a label correlation tensor, which provides highly accurate and interpretable prediction results. In addition, a multi-label generative strategy is deployed to handle the long-tail label distribution: it borrows visual clues from the limited samples and synthesizes more diverse samples. All networks in the model are optimized simultaneously. Extensive experiments illustrate the effectiveness and efficiency of MUCO, and ablation studies further confirm the effectiveness of each module.

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 17, Issue 2
February 2023
355 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3572847

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 February 2023
Online AM: 06 June 2022
Accepted: 07 May 2022
Revised: 13 March 2022
Received: 18 November 2021
Published in TKDD Volume 17, Issue 2

Author Tags

  1. Correlation learning
  2. Multi-label learning
  3. Image annotation
  4. Image retrieval

Qualifiers

  • Research-article
