Identifying Tomato Growth Stages in Protected Agriculture with StyleGAN3–Synthetic Images and Vision Transformer
Abstract
1. Introduction
2. Materials and Methods
2.1. Dataset
2.2. StyleGAN3
2.3. Vision Transformer (ViT)
2.4. Experimental Platform
2.5. Experimental Setup and Evaluation Metrics
3. Implementation and Results
3.1. Performance of StyleGAN3
3.2. Performance of ViT Model
3.3. Performance Comparison
4. Discussion
5. Conclusions
- (1) The images generated by StyleGAN3 are nearly indistinguishable from real images, with an average generation time of 153 milliseconds per image. This makes StyleGAN3 an effective data-augmentation solution when training data are limited, such as for small-sample datasets. When computational resources are constrained, however, transfer learning or background-denoising techniques are needed to keep training time manageable.
- (2) Generating images with a generative adversarial network (GAN) and then applying the ViT-Base model for tomato growth stage recognition outperforms recognition on the original images alone. ViT-Base trained with the generated images reached an accuracy of 98.39% on the test set with an average detection time of 9.5 milliseconds per image. Compared with AlexNet, DenseNet50, and VGG16, accuracy improved by 22.85, 3.57, and 3.21 percentage points, respectively, demonstrating its enhanced effectiveness in classification tasks.
- (3) In areas with limited access to intelligent devices, GAN-generated images substantially reduce the labor demands and inconsistencies of manual image collection. Furthermore, applying the ViT-Base model to tomato growth stage recognition can supply crucial data for informed decision-making and precise management of tomato growth conditions, offering considerable economic value by improving the efficiency of tomato production. The approach can also be extended to similar crops, such as apples and citrus.
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Alajrami, M.A.; Abu-Naser, S.S. Type of Tomato Classification Using Deep Learning. Int. J. Acad. Pedagog. Res. 2020, 3, 21–25. [Google Scholar]
- Wei, Y.; Qin, R.; Ding, D.; Li, Y.; Xie, Y.; Qu, D.; Zhao, T.; Yang, S. The impact of the digital economy on high-quality agricultural development: based on the regulatory effects of financial development. J. Huazhong Agric. Univ. 2023, 43, 9–21. [Google Scholar]
- Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef] [PubMed]
- Hinton, G.E.; Osindero, S.; Teh, Y.-W. A fast learning algorithm for deep belief nets. Neural Comput. 2006, 18, 1527–1554. [Google Scholar] [CrossRef] [PubMed]
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. arXiv 2013, arXiv:1301.3781. [Google Scholar]
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2019, arXiv:1810.04805. [Google Scholar]
- Gour, M.; Jain, S. Stacked Convolutional Neural Network for Diagnosis of COVID-19 Disease from X-ray Images. arXiv 2020, arXiv:2006.13871. [Google Scholar]
- Gour, M.; Jain, S.; Agrawal, R. DeepRNNetSeg: Deep Residual Neural Network for Nuclei Segmentation on Breast Cancer Histopathological Images. In Proceedings of the International Conference on Computer Vision and Image Processing (CVIP), Singapore, 27–29 September 2019; pp. 243–253. [Google Scholar]
- Xu, Y.S.; Fu, T.J.; Yang, H.K.; Lee, C.Y. Dynamic video segmentation network. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 6556–6565. [Google Scholar]
- Tang, H.; Ding, L.; Wu, S.; Ren, B.; Sebe, N.; Rota, P. Deep Unsupervised Key Frame Extraction for Efficient Video Classification. ACM Trans. Multimedia Comput. Commun. Appl. 2023, 119, 1–17. [Google Scholar] [CrossRef]
- Abbas, A.; Jain, S.; Gour, M.; Vankudothu, S. Tomato plant disease detection using transfer learning with C-GAN synthetic images. Comput. Electron. Agric. 2021, 187, 106279. [Google Scholar]
- Xu, J.; Wang, J.; Xu, X.; Ju, S. Image recognition for different developmental stages of rice by RAdam deep convolutional neural networks. Trans. Chin. Soc. Agric. Eng. 2021, 37, 143–150. [Google Scholar]
- Zhang, X.; Hou, T.; Hao, Y.; Shangguan, H.; Wang, A.; Peng, S. Surface Defect Detection of Solar Cells Based on Multiscale Region Proposal Fusion Network. IEEE Access 2021, 9, 62093–62101. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Al-Qizwini, M.; Barjasteh, I.; Al-Qassab, H.; Radha, H. Deep learning algorithm for autonomous driving using GoogLeNet. In Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA, 11–14 June 2017; pp. 89–96. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Bai, P.; Feng, Y.; Li, G.; Zhao, M.; Zhou, H.; Hou, Z. Algorithm of wheat disease image identification based on Vision Transformer. J. Chin. Agric. Mech. 2024, 45, 267–274. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Wang, Y.; Li, Y.; Xu, J.; Wang, A.; Ma, C.; Song, S.; Xie, F.; Zhao, C.; Hu, M. Crop Disease Recognition Method Based on Improved Vision Transformer Network. J. Chin. Comput. Syst. 2024, 45, 887–893. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar]
- Rasti, S.; Bleakley, C.J.; Silvestre, G.C.M.; Holden, N.M.; Langton, D.; Gregory, M.P.; O’Hare, G.M. Crop growth stage estimation prior to canopy closure using deep learning algorithms. Neural Comput. Appl. 2021, 33, 1733–1743. [Google Scholar] [CrossRef]
- Tan, S.; Liu, J.; Lu, H.; Lan, M.; Yu, J.; Liao, G.; Wang, Y.; Li, Z.; Qi, L.; Ma, X. Machine Learning Approaches for Rice Seedling Growth Stages Detection. Front. Plant Sci. 2022, 13, 914771. [Google Scholar] [CrossRef] [PubMed]
- Liu, W.; Lu, X. Research Progress of Transformer Based on Computer Vision. Comput. Eng. Appl. 2022, 58, 1–16. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. arXiv 2014, arXiv:1406.2661. [Google Scholar]
- Han, X.; Li, Y.; Gao, A.; Ma, J.; Gong, Q.; Song, Y. Data Augmentation Method for Sweet Cherries Based on Improved Generative Adversarial Network. Trans. Chin. Soc. Agric. Mach. 2024, 55, 1–17. [Google Scholar]
- Karras, T.; Aittala, M.; Laine, S.; Härkönen, E.; Hellsten, J.; Lehtinen, J.; Aila, T. Alias-Free Generative Adversarial Networks. arXiv 2021, arXiv:2106.12423. [Google Scholar]
- Karras, T.; Laine, S.; Aila, T. A Style-Based Generator Architecture for Generative Adversarial Networks. arXiv 2018, arXiv:1812.04948. [Google Scholar]
- Mirza, M.; Osindero, S. Conditional Generative Adversarial Nets. arXiv 2014, arXiv:1411.1784. [Google Scholar]
- Radford, A.; Metz, L.; Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv 2015, arXiv:1511.06434. [Google Scholar]
- Brock, A.; Donahue, J.; Simonyan, K. Large Scale GAN Training for High Fidelity Natural Image Synthesis. arXiv 2018, arXiv:1809.11096. [Google Scholar]
- Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely Connected Convolutional Networks. arXiv 2016, arXiv:1608.06993. [Google Scholar]
Category | Number of Images (Original) | Number of Images (Synthetic) |
---|---|---|
Sapling | 300 | 450 |
Flower | 300 | 396 |
Fructification | 300 | 327 |
Mature | 300 | 350 |
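The dataset table above implies a class-dependent augmentation factor. A short sketch tallies the totals and the synthetic-to-original ratio per class (counts copied from the table):

```python
# Per-class image counts from the dataset table (original, StyleGAN3-synthetic).
counts = {
    "Sapling":        (300, 450),
    "Flower":         (300, 396),
    "Fructification": (300, 327),
    "Mature":         (300, 350),
}

total_original  = sum(o for o, _ in counts.values())   # 1200
total_synthetic = sum(s for _, s in counts.values())   # 1523

for cls, (orig, synth) in counts.items():
    print(f"{cls}: {orig} + {synth} = {orig + synth} images "
          f"({synth / orig:.2f}x synthetic-to-original ratio)")
```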
Model | Patch Size | Layers | Hidden Size D | MLP Size | Heads | Params |
---|---|---|---|---|---|---|
ViT-Base | 16 × 16 | 12 | 768 | 3072 | 12 | 86 M |
ViT-Large | 16 × 16 | 24 | 1024 | 4096 | 16 | 307 M |
ViT-Huge | 14 × 14 | 32 | 1280 | 5120 | 16 | 632 M |
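The parameter counts in the table above can be roughly reproduced from the architecture columns. A back-of-the-envelope sketch, counting only the encoder weight matrices (about 4D² for the Q/K/V/output projections plus 2·D·MLP for the feed-forward block per layer) and ignoring embeddings, biases, layer norms, and the classification head, so it lands a few percent below the tabulated totals:

```python
# Rough parameter estimate for the ViT encoder stack from the table values.
def encoder_params(layers: int, d: int, mlp: int) -> int:
    per_layer = 4 * d * d + 2 * d * mlp  # attention projections + MLP weights
    return layers * per_layer

for name, (layers, d, mlp) in {
    "ViT-Base":  (12,  768, 3072),
    "ViT-Large": (24, 1024, 4096),
    "ViT-Huge":  (32, 1280, 5120),
}.items():
    print(f"{name}: ~{encoder_params(layers, d, mlp) / 1e6:.0f}M")
# ~85M, ~302M, ~629M -- close to the 86M / 307M / 632M in the table
```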
Category | PSNR (dB), Original Images | PSNR (dB), Original vs. StyleGAN3 Images |
---|---|---|
Sapling | 27.903 | 28.018
Flower | 27.925 | 27.932
Fructification | 38.780 | 39.422
Mature | 37.389 | 38.784
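PSNR compares two images through their pixelwise mean squared error: PSNR = 10·log10(MAX²/MSE), with MAX = 255 for 8-bit images. A minimal pure-Python sketch on flat pixel lists; the tabulated values were presumably computed on the full RGB images with a library implementation:

```python
import math
import random

def psnr(a, b, max_val=255.0):
    """PSNR in dB between two equal-length sequences of pixel values."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# Toy example: an image plus small uniform noise gives PSNR around 38 dB,
# comparable in magnitude to the Fructification/Mature rows above.
random.seed(0)
img = [random.randint(0, 255) for _ in range(64 * 64)]
noisy = [min(255, max(0, p + random.randint(-5, 5))) for p in img]
print(f"PSNR: {psnr(img, noisy):.2f} dB")
```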
Category | SSIM, Original Images | SSIM, Original vs. StyleGAN3 Images |
---|---|---|
Sapling | 0.066 | 0.068
Flower | 0.454 | 0.384
Fructification | 0.833 | 0.850
Mature | 0.850 | 0.840
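SSIM combines luminance, contrast, and structure terms into a single score that equals 1 for identical images. The sketch below is a simplified global variant (one window over the whole image); library implementations such as scikit-image use a sliding window, so their values, like those in the table, will differ:

```python
import random

def ssim_global(a, b, data_range=255.0):
    """Simplified single-window SSIM for two equal-length pixel sequences."""
    n = len(a)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2  # stabilizers
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

random.seed(1)
img = [random.randint(0, 255) for _ in range(64 * 64)]
print(f"SSIM(img, img) = {ssim_global(img, img):.3f}")  # 1.000 for identical images
```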
Model | Accuracy | Precision | Recall | F1 Score |
---|---|---|---|---|
ViT-Base | 94.58% | 94.99% | 94.58% | 94.41% |
ViT-Base + Synthetic images | 98.39% | 98.47% | 98.39% | 98.39% |
Model | Accuracy | Precision | Recall | F1 Score |
---|---|---|---|---|
ViT-Base + Synthetic images | 98.39% | 98.47% | 98.39% | 98.39%
AlexNet | 75.54% | 84.88% | 75.54% | 68.10% |
DenseNet50 | 94.82% | 95.61% | 94.82% | 94.77% |
VGG16 | 95.18% | 95.59% | 95.18% | 95.13% |
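The tables above report support-weighted averages, which is why Accuracy and Recall coincide in every row: weighted recall always equals overall accuracy. A sketch on an illustrative (not the paper's) 4-class confusion matrix:

```python
# Weighted precision/recall/F1 from a confusion matrix (rows = true class,
# columns = predicted class). The matrix below is made up for illustration.
cm = [
    [45,  3,  1,  1],   # Sapling
    [ 2, 46,  2,  0],   # Flower
    [ 0,  1, 48,  1],   # Fructification
    [ 1,  0,  2, 47],   # Mature
]
n = len(cm)
total = sum(map(sum, cm))
accuracy = sum(cm[i][i] for i in range(n)) / total

weighted_p = weighted_r = weighted_f1 = 0.0
for i in range(n):
    support = sum(cm[i])                      # true instances of class i
    tp = cm[i][i]
    predicted = sum(cm[r][i] for r in range(n))
    p = tp / predicted if predicted else 0.0
    r = tp / support if support else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    w = support / total
    weighted_p += w * p
    weighted_r += w * r
    weighted_f1 += w * f1

print(f"accuracy={accuracy:.4f} precision={weighted_p:.4f} "
      f"recall={weighted_r:.4f} f1={weighted_f1:.4f}")
```

Note that `weighted_r` reproduces `accuracy` exactly, matching the pattern in the reported results.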
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Huo, Y.; Liu, Y.; He, P.; Hu, L.; Gao, W.; Gu, L. Identifying Tomato Growth Stages in Protected Agriculture with StyleGAN3–Synthetic Images and Vision Transformer. Agriculture 2025, 15, 120. https://doi.org/10.3390/agriculture15020120