Embedded Zero-Shot Image Classification Based on Bidirectional Feature Mapping
Abstract
1. Introduction
2. Related Work
2.1. Embedded Zero-Shot Image Classification
2.2. Attention Mechanism
3. Model and Proposed Method
3.1. Motivation
3.2. Embedded Zero-Shot Image Classification Based on Bidirectional Feature Mapping
3.2.1. Feature Extraction Module
3.2.2. Feature Mapping Module
4. Results and Analysis
4.1. Datasets
4.2. Evaluation Protocols
4.3. Implementation Details
4.4. Comparison
4.5. Ablation Studies
4.5.1. Visual Feature Extraction Network
4.5.2. Semantic Feature Extraction Network
4.5.3. Model Component Ablation Experiments
4.6. Hyperparameter Analysis
4.6.1. Category Calibration Loss Weight Analysis
4.6.2. Combined Coefficient Analysis
4.7. Attention Map Visualization Analysis
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Wang, P.; Fan, E.; Wang, P. Comparative analysis of image classification algorithms based on traditional machine learning and deep learning. Pattern Recognit. Lett. 2021, 141, 61–67. [Google Scholar] [CrossRef]
- Wang, X.; Peng, D.; Hu, P.; Gong, Y.; Chen, Y. Cross-domain alignment for zero-shot sketch-based image retrieval. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 7024–7035. [Google Scholar] [CrossRef]
- Liu, H.; Qin, Z. Deep quantization network with visual-semantic alignment for zero-shot image retrieval. Electron. Res. Arch. 2023, 31, 4232–4247. [Google Scholar] [CrossRef]
- Hong, M.; Zhang, X.; Li, G.; Huang, Q. Fine-grained feature generation for generalized zero-shot video classification. IEEE Trans. Image Process. 2023, 32, 1599–1612. [Google Scholar] [CrossRef] [PubMed]
- Tursun, O.; Denman, S.; Sridharan, S.; Goan, E.; Fookes, C. An efficient framework for zero-shot sketch-based image retrieval. Pattern Recognit. 2022, 126, 108528. [Google Scholar] [CrossRef]
- Liu, X.; Bai, S.; An, S.; Wang, S.; Liu, W.; Zhao, X.; Ma, Y. A meaningful learning method for zero-shot semantic segmentation. Sci. China Inf. Sci. 2023, 66, 210103. [Google Scholar] [CrossRef]
- Wang, Y.; Tian, Y. Exploiting multi-scale contextual prompt learning for zero-shot semantic segmentation. Displays 2024, 81, 102616. [Google Scholar] [CrossRef]
- Chen, Q.; Wang, W.; Huang, K.; Coenen, F. Zero-shot text classification via knowledge graph embedding for social media data. IEEE Internet Things J. 2022, 9, 9205–9213. [Google Scholar] [CrossRef]
- Liu, C.; Wang, C.; Peng, Y.; Li, Z. ZVQAF: Zero-shot visual question answering with feedback from large language models. Neurocomputing 2024, 580, 127505. [Google Scholar] [CrossRef]
- Qiao, R.; Liu, L.; Shen, C.; Hengel, A.V.D. Visually aligned word embeddings for improving zero-shot learning. arXiv 2017, arXiv:1707.05427. [Google Scholar]
- Reed, S.; Akata, Z.; Lee, H.; Schiele, B. Learning deep representations of fine-grained visual descriptions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 49–58. [Google Scholar]
- Yu, B.; Xie, C.; Tang, P.; Li, B. Semantic-visual shared knowledge graph for zero-shot learning. PeerJ Comput. Sci. 2023, 9, e1260. [Google Scholar] [CrossRef]
- Frome, A.; Corrado, G.S.; Shlens, J.; Bengio, S.; Dean, J.; Ranzato, M.A.; Mikolov, T. DeViSE: A deep visual-semantic embedding model. In Advances in Neural Information Processing Systems 26, Proceedings of the 27th Annual Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013; Curran Associates, Inc.: Red Hook, NY, USA, 2014. [Google Scholar]
- Yu, Y.; Ji, Z.; Li, X.; Guo, J.; Zhang, Z.; Ling, H.; Wu, F. Transductive zero-shot learning with a self-training dictionary approach. IEEE Trans. Cybern. 2018, 48, 2908–2919. [Google Scholar] [CrossRef] [PubMed]
- Li, L.; Liu, L.; Du, X.; Wang, X.; Zhang, Z.; Zhang, J.; Zhang, P.; Liu, J. CGUN-2A: Deep graph convolutional network via contrastive learning for large-scale zero-shot image classification. Sensors 2022, 22, 9980. [Google Scholar] [CrossRef] [PubMed]
- Kong, D.; Li, X.; Wang, S.; Li, J.; Yin, B. Learning visual-and-semantic knowledge embedding for zero-shot image classification. Appl. Intell. 2023, 53, 2250–2264. [Google Scholar] [CrossRef]
- Wang, Y.; Feng, L.; Song, X.; Xu, D.; Zhai, Y. Zero-shot image classification method based on attention mechanism and semantic information fusion. Sensors 2023, 23, 2311. [Google Scholar] [CrossRef] [PubMed]
- Sun, X.; Tian, Y.; Li, H. Zero-shot image classification via visual–semantic feature decoupling. Multimed. Syst. 2024, 30, 82. [Google Scholar] [CrossRef]
- Xie, G.S.; Liu, L.; Jin, X.; Zhu, F.; Zhang, Z.; Qin, J.; Yao, Y.; Shao, L. Attentive region embedding network for zero-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 9384–9393. [Google Scholar]
- Huynh, D.; Elhamifar, E. Fine-grained generalized zero-shot learning via dense attribute-based attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 4483–4493. [Google Scholar]
- Naeem, M.F.; Xian, Y.; Gool, L.V.; Tombari, F. I2dformer: Learning image to document attention for zero-shot image classification. In Advances in Neural Information Processing Systems 35, Proceedings of the 36th Annual Conference on Neural Information Processing Systems, New Orleans, LA, USA, 28 November–9 December 2022; Curran Associates, Inc.: Red Hook, NY, USA, 2022; pp. 12283–12294. [Google Scholar]
- Akata, Z.; Reed, S.; Walter, D.; Lee, H.; Schiele, B. Evaluation of output embeddings for fine-grained image classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 2927–2936. [Google Scholar]
- Alamri, F.; Dutta, A. Multi-head self-attention via vision transformer for zero-shot learning. arXiv 2021, arXiv:2108.00045. [Google Scholar]
- Wah, C.; Branson, S.; Welinder, P.; Perona, P.; Belongie, S. The Caltech-UCSD Birds-200-2011 Dataset; California Institute of Technology: Pasadena, CA, USA, 2011. [Google Scholar]
- Patterson, G.; Xu, C.; Su, H.; Hays, J. The sun attribute database: Beyond categories for deeper scene understanding. Int. J. Comput. Vis. 2014, 108, 59–81. [Google Scholar] [CrossRef]
- Xian, Y.; Lampert, C.H.; Schiele, B.; Akata, Z. Zero-shot learning—A comprehensive evaluation of the good, the bad and the ugly. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 2251–2265. [Google Scholar] [CrossRef]
- Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
- Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.S.; Dean, J. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26, Proceedings of the 27th Annual Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013; Curran Associates, Inc.: Red Hook, NY, USA, 2014. [Google Scholar]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
Statistics of the benchmark datasets used for evaluation.

| Datasets | Attributes | Seen Classes | Unseen Classes | Training Samples | Test Samples |
|---|---|---|---|---|---|
| Caltech-UCSD Birds (CUB) | 312 | 150 | 50 | 7057 | 4731 |
| SUN Attribute (SUN) | 102 | 645 | 72 | 10,320 | 4020 |
| Animals with Attributes 2 (AWA2) | 85 | 40 | 10 | 23,527 | 13,795 |
ZSL accuracy comparison on CUB, SUN, and AWA2; "--" marks results not reported for that dataset.

| Methods | Models | CUB (%) | SUN (%) | AWA2 (%) |
|---|---|---|---|---|
| Generative Methods | f-CLSWGAN | 57.3 | 60.8 | 68.2 |
| | f-VAEGAN-D2 | 61.0 | 64.7 | 71.1 |
| | Composer | 69.4 | 62.6 | 71.5 |
| | cycle-CLSWGAN | 58.4 | 60.0 | 66.3 |
| | LisGAN | 58.8 | 61.7 | -- |
| Embedding-based Methods | TCN | 59.5 | 61.5 | 71.2 |
| | DAZLE | 66.0 | 59.4 | 67.9 |
| | LFGAA | 67.6 | 61.5 | 68.1 |
| | SGMA | 71.0 | -- | 68.8 |
| | DSAN | 57.4 | 62.4 | 72.3 |
| | BFM (ours) | 71.9 | 62.8 | 69.3 |
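The seen/unseen splits above (150/50, 645/72, 40/10) follow the proposed splits of Xian et al. cited in the references, so the single number reported per dataset is presumably the average per-class top-1 accuracy on unseen classes, in which every class contributes equally regardless of how many test images it has. A minimal sketch of that metric; the function name and toy labels are illustrative, not taken from the paper:

```python
import numpy as np

def per_class_top1_accuracy(preds: np.ndarray, labels: np.ndarray) -> float:
    """Top-1 accuracy computed per class and then averaged (GBU-style),
    so sparsely populated classes weigh as much as frequent ones."""
    classes = np.unique(labels)
    per_class = [np.mean(preds[labels == c] == c) for c in classes]
    return float(np.mean(per_class))

# Toy example with three hypothetical unseen classes.
labels = np.array([0, 0, 1, 1, 1, 2])
preds = np.array([0, 1, 1, 1, 0, 2])
print(per_class_top1_accuracy(preds, labels))  # (0.5 + 2/3 + 1.0) / 3 ≈ 0.722
```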
GZSL comparison on CUB, SUN, and AWA2: S is the seen-class accuracy, U the unseen-class accuracy, and H their harmonic mean (all in %).

| Methods | Models | CUB S | CUB U | CUB H | SUN S | SUN U | SUN H | AWA2 S | AWA2 U | AWA2 H |
|---|---|---|---|---|---|---|---|---|---|---|
| Generative Methods | f-CLSWGAN | 57.7 | 43.7 | 49.7 | 36.6 | 42.6 | 39.4 | 61.4 | 57.9 | 59.6 |
| | f-VAEGAN-D2 | 60.1 | 48.4 | 53.6 | 38.0 | 45.1 | 41.3 | 70.6 | 57.6 | 63.5 |
| | Composer | 63.8 | 56.4 | 59.9 | 22.0 | 55.1 | 31.4 | 77.3 | 62.1 | 68.8 |
| | cycle-CLSWGAN | 61.0 | 45.7 | 52.3 | 33.6 | 49.4 | 40.0 | 64.0 | 56.9 | 60.2 |
| | LisGAN | 57.9 | 46.5 | 51.6 | 37.8 | 42.9 | 40.2 | -- | -- | -- |
| Embedding-based Methods | TCN | 52.0 | 52.6 | 52.3 | 37.3 | 31.2 | 34.0 | 65.8 | 61.2 | 63.4 |
| | DAZLE | 59.6 | 56.7 | 58.1 | 24.3 | 52.3 | 33.2 | 75.7 | 60.3 | 67.1 |
| | LFGAA | 80.9 | 36.2 | 50.0 | 40.0 | 18.5 | 25.3 | 93.4 | 27.0 | 41.9 |
| | SGMA | 71.3 | 36.7 | 48.5 | -- | -- | -- | 87.1 | 37.6 | 52.5 |
| | DSAN | 56.6 | 46.9 | 51.3 | 41.1 | 33.2 | 36.7 | 78.8 | 58.6 | 67.2 |
| | BFM (ours) | 61.3 | 61.9 | 61.6 | 25.3 | 48.4 | 33.2 | 72.8 | 61.3 | 66.6 |
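In the generalized setting, H is the harmonic mean of S and U, which penalizes models that buy seen-class accuracy at the expense of unseen classes (or vice versa). A quick sanity check against the BFM CUB row (61.3 / 61.9 / 61.6); this is only a worked illustration of the metric, not code from the paper:

```python
def harmonic_mean(seen_acc: float, unseen_acc: float) -> float:
    """Harmonic mean H = 2SU / (S + U), the usual GZSL summary metric."""
    return 2 * seen_acc * unseen_acc / (seen_acc + unseen_acc)

print(round(harmonic_mean(61.3, 61.9), 1))  # 61.6, matching the table
```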
Model component ablation: ✓/✗ indicate whether the A-V mapping, the V-A mapping, and three further model components are enabled; the last four columns report ZSL accuracy and GZSL S, U, and H (%).

| A-V | V-A | | | | ZSL (%) | GZSL S (%) | GZSL U (%) | GZSL H (%) |
|---|---|---|---|---|---|---|---|---|
| ✗ | ✗ | ✗ | ✗ | ✗ | 26.4 | 37.6 | 2.3 | 4.4 |
| ✓ | ✗ | ✗ | ✗ | ✗ | 35.7 | 32.8 | 4.3 | 7.7 |
| ✓ | ✓ | ✗ | ✗ | ✗ | 68.1 | 49.6 | 51.8 | 50.7 |
| ✓ | ✓ | ✓ | ✗ | ✗ | 69.6 | 69.6 | 18.7 | 29.5 |
| ✓ | ✓ | ✓ | ✓ | ✗ | 71.8 | 70.2 | 18.3 | 29.1 |
| ✓ | ✓ | ✓ | ✓ | ✓ | 71.9 | 61.3 | 61.9 | 61.6 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Sun, H.; Zhen, Z.; Liu, Y.; Zhang, X.; Han, X.; Zhang, P. Embedded Zero-Shot Image Classification Based on Bidirectional Feature Mapping. Appl. Sci. 2024, 14, 5230. https://doi.org/10.3390/app14125230