Joint Augmented and Compressed Dictionaries for Robust Image Classification

Published: 24 February 2023

Abstract

Dictionary-based Classification (DC) has been a promising learning theory in multimedia computing. Previous studies focused on learning a discriminative dictionary, together with the sparsest representation over that dictionary, to cope with the complex conditions of real-world applications. However, the robustness achieved by learning only a single dictionary is far from optimal. Worse, a single dictionary cannot take advantage of techniques proven in modern machine learning, such as data augmentation, to mitigate this problem. In this work, we propose a novel method that utilizes joint Augmented and Compressed Dictionaries for Robust Dictionary-based Classification (ACD-RDC). To optimize under the noise model introduced by real-world conditions, the objective function of ACD-RDC incorporates only two simple but well-designed constraints: an enhanced sparsity constraint obtained through general data augmentation, which requires less sophisticated case-by-case tuning, and a discriminative constraint solved by a jointly learned dictionary. The optimization of the objective function is then reduced theoretically to an approximate linear problem. The sparsity and discrimination enhanced by data augmentation guarantee robustness for image classification under various conditions, which constitutes the first positive case of using data augmentation to obtain robust dictionary-based classification. Numerous experiments have been conducted on popular facial and object image datasets. The results demonstrate that ACD-RDC achieves more promising classification performance on diversely collected images than current dictionary-based classification methods. ACD-RDC is also confirmed to be a state-of-the-art classification method when using deep features as inputs.
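To make the general idea concrete, the following is a minimal sketch of dictionary-based classification over a dictionary that stacks original training samples with augmented copies. It is not the ACD-RDC objective, which the abstract does not spell out. As stated assumptions, it swaps in ridge-regularized (collaborative) coding for the sparsity constraint, uses additive Gaussian noise as a stand-in augmentation, and runs on synthetic toy data. The function names `augment`, `normalize_columns`, and `classify` are hypothetical.

```python
import numpy as np

def augment(X, rng, noise=0.05):
    """Hypothetical augmentation (an assumption): perturb each sample with Gaussian noise."""
    return X + noise * rng.standard_normal(X.shape)

def normalize_columns(D):
    """Scale every dictionary atom (column) to unit L2 norm."""
    return D / np.linalg.norm(D, axis=0, keepdims=True)

def classify(D, labels, y, lam=0.1):
    """Code y over the whole dictionary, then pick the class with the smallest residual."""
    # Closed-form ridge code: argmin_a ||y - D a||^2 + lam * ||a||^2
    # (an L2 stand-in for a sparsity-promoting penalty).
    a = np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ y)
    classes = np.unique(labels)
    # Residual when only the atoms of a single class reconstruct y.
    residuals = [
        np.linalg.norm(y - D[:, labels == c] @ a[labels == c]) for c in classes
    ]
    return classes[int(np.argmin(residuals))]

rng = np.random.default_rng(0)

# Toy data: two classes, each clustered around its own prototype vector.
protos = rng.standard_normal((20, 2))
labels = np.repeat([0, 1], 5)
X = protos[:, labels] + 0.1 * rng.standard_normal((20, 10))

# Joint dictionary: original samples stacked with their augmented copies.
D = normalize_columns(np.hstack([X, augment(X, rng)]))
D_labels = np.concatenate([labels, labels])

# A noisy query drawn near the prototype of class 1.
query = protos[:, 1] + 0.1 * rng.standard_normal(20)
predicted = classify(D, D_labels, query)
print("predicted class:", predicted)
```

The class-wise residual rule is the decision step shared by many representation-based classifiers. The augmented atoms simply give each class more ways to explain a perturbed query.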

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 19, Issue 3s
June 2023
270 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3582887
Editor: Abdulmotaleb El Saddik

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 February 2023
Online AM: 01 December 2022
Accepted: 20 November 2022
Revised: 05 April 2022
Received: 06 June 2021
Published in TOMM Volume 19, Issue 3s

Author Tags

  1. Classification and regression
  2. inference algorithms
  3. supervised learning
  4. machine learning

Qualifiers

  • Research-article

Funding Sources

  • University of Macau
  • Science and Technology Project of Sichuan
  • National Natural Science Foundation of China
