DOI: 10.1145/3574131.3574458

WLA-Net: A Whole New Light-weight Architecture For Visual Task

Published: 13 January 2023

Abstract

In this paper, we introduce WLA-Net, a new lightweight convolutional network with fewer parameters and lower FLOPs. WLA-Net is built on a cross architecture that combines attention mechanisms with residual blocks to form a light deep neural network. While classification accuracy improves, the model's parameter count is reduced, making the network more lightweight and improving resource utilization. The network includes a lightweight convolution module that performs image classification accurately and efficiently, together with a large-kernel convolutional attention module that further improves accuracy. In addition, a new attention module is proposed that mines information aggregation along the channel dimension to extract more effective deep features; it fuses the channel features of an image to obtain higher accuracy. A new residual structure is also designed to fuse information between feature channels and strengthen their interaction. The model's image classification accuracy is verified on large natural-image datasets, and experimental results show that the proposed method achieves state-of-the-art performance.
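The page does not include the authors' code, so the following PyTorch sketch is only an illustrative reading of the abstract: it assumes a depthwise-separable convolution for the lightweight module, a large-kernel depthwise convolution for the large-convolution attention, a squeeze-and-excitation-style gate for the channel attention, and a residual shortcut. All class and parameter names (WLABlock, ChannelAttention, large_kernel, reduction) are hypothetical and not taken from the paper.

# Hypothetical sketch, not the authors' released code.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """SE-style gate that aggregates spatial information and reweights channels."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),  # global average pooling per channel
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)  # channel-wise reweighting


class WLABlock(nn.Module):
    """Illustrative lightweight block: depthwise-separable convolution,
    large-kernel attention, channel attention, and a residual connection."""
    def __init__(self, channels: int, large_kernel: int = 7):
        super().__init__()
        # Depthwise-separable convolution keeps parameters and FLOPs low.
        self.dw = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                            groups=channels, bias=False)
        self.pw = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.ReLU(inplace=True)
        # Large-kernel depthwise convolution produces a spatial attention map.
        self.large_attn = nn.Conv2d(channels, channels, kernel_size=large_kernel,
                                    padding=large_kernel // 2, groups=channels)
        self.ca = ChannelAttention(channels)

    def forward(self, x):
        identity = x
        out = self.act(self.bn(self.pw(self.dw(x))))
        out = out * torch.sigmoid(self.large_attn(out))  # large-kernel attention
        out = self.ca(out)                               # channel fusion
        return self.act(out + identity)                  # residual connection


if __name__ == "__main__":
    block = WLABlock(channels=64)
    y = block(torch.randn(1, 64, 56, 56))
    print(y.shape)  # torch.Size([1, 64, 56, 56])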

Published In

VRCAI '22: Proceedings of the 18th ACM SIGGRAPH International Conference on Virtual-Reality Continuum and its Applications in Industry
December 2022
284 pages
ISBN: 9798400700316
DOI: 10.1145/3574131
Editors: Enhua Wu, Lionel Ming-Shuan Ni, Zhigeng Pan, Daniel Thalmann, Ping Li, Charlie C.L. Wang, Lei Zhu, Minghao Yang

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Lightweight Networks
  2. channel attention
  3. spatial attention

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • the Fundamental Research Projects Jointly Funded by Guangzhou Council and Municipal Universities
  • the National Natural Science Foundation of China

Conference

VRCAI '22

Acceptance Rates

Overall Acceptance Rate 51 of 107 submissions, 48%
