Research article | Open access

ImageNet classification with deep convolutional neural networks

Published: 24 May 2017

Abstract

We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, respectively, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully connected layers we employed a recently developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
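For orientation, the following is a minimal single-branch sketch of the architecture the abstract describes: five convolutional layers with ReLU ("non-saturating") activations, some followed by max-pooling, then three fully connected layers with dropout and a 1000-way output. It is written in PyTorch (which postdates the paper) and is illustrative only, not the authors' original two-GPU CUDA implementation; layer widths follow the published AlexNet configuration, and the softmax is left to the cross-entropy loss, as is conventional.

    import torch
    import torch.nn as nn

    class AlexNetLike(nn.Module):
        """Illustrative sketch: five conv layers (some followed by max-pooling),
        three fully connected layers, ReLU activations, dropout in the FC layers,
        and a 1000-way output."""

        def __init__(self, num_classes: int = 1000):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2),
                nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2),
                nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2),
            )
            self.classifier = nn.Sequential(
                nn.Dropout(p=0.5),
                nn.Linear(256 * 6 * 6, 4096), nn.ReLU(inplace=True),
                nn.Dropout(p=0.5),
                nn.Linear(4096, 4096), nn.ReLU(inplace=True),
                nn.Linear(4096, num_classes),  # class scores; softmax applied by the loss
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = self.features(x)       # (N, 256, 6, 6) for a 224x224 RGB input
            x = torch.flatten(x, 1)    # (N, 9216)
            return self.classifier(x)  # (N, num_classes) unnormalized scores

    if __name__ == "__main__":
        model = AlexNetLike()
        logits = model(torch.randn(2, 3, 224, 224))
        print(logits.shape)  # torch.Size([2, 1000])

With these layer sizes the model has roughly 60 million parameters, consistent with the figure quoted in the abstract; the original network split several layers across two GPUs using grouped convolutions, so its exact count differs slightly.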

Published In

Communications of the ACM, Volume 60, Issue 6
June 2017
93 pages
ISSN:0001-0782
EISSN:1557-7317
DOI:10.1145/3098997
  • Editor: Moshe Y. Vardi
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 May 2017
Published in CACM Volume 60, Issue 6

Qualifiers

  • Research-article
  • Research
  • Refereed
