DOI: 10.5555/3042817.3043084
Article

Maxout networks

Published: 16 June 2013

Abstract

We consider the problem of designing models to leverage a recently introduced approximate model averaging technique called dropout. We define a simple new model called maxout (so named because its output is the max of a set of inputs, and because it is a natural companion to dropout) designed to both facilitate optimization by dropout and improve the accuracy of dropout's fast approximate model averaging technique. We empirically verify that the model successfully accomplishes both of these tasks. We use maxout and dropout to demonstrate state-of-the-art classification performance on four benchmark datasets: MNIST, CIFAR-10, CIFAR-100, and SVHN.
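As the abstract notes, a maxout unit outputs the maximum over a group of k affine feature detectors of its input. The NumPy sketch below illustrates one maxout hidden layer combined with inverted dropout at training time; the array shapes, the choice of k, the `maxout_layer` and `dropout` helpers, and the inverted-dropout variant are illustrative assumptions for this sketch, not the paper's exact implementation.

```python
import numpy as np

def maxout_layer(x, W, b):
    """One maxout hidden layer (sketch).

    x : (batch, d)   input features
    W : (d, m, k)    weights for m maxout units, each with k linear pieces
    b : (m, k)       biases

    Each unit's activation is the max of its k affine responses:
        h_i(x) = max_j (x @ W[:, i, j] + b[i, j])
    """
    z = np.einsum('nd,dmk->nmk', x, W) + b  # (batch, m, k) pre-activations
    return z.max(axis=-1)                   # (batch, m) maxout activations

def dropout(h, p=0.5, rng=None):
    """Inverted dropout (assumed variant): zero units with probability p during
    training and rescale the survivors, so inference needs no extra scaling."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(h.shape) >= p
    return h * mask / (1.0 - p)

# Toy forward pass: 4 examples, 10 features, 3 maxout units with k = 5 pieces each.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 10))
W = 0.01 * rng.standard_normal((10, 3, 5))
b = np.zeros((3, 5))
h = dropout(maxout_layer(x, W, b), p=0.5, rng=rng)
print(h.shape)  # (4, 3)
```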


Published In

ICML'13: Proceedings of the 30th International Conference on Machine Learning - Volume 28
June 2013
2534 pages

Publisher

JMLR.org
