DOI: 10.5555/1795114.1795128
Research article · Free access

L2 regularization for learning kernels

Published: 18 June 2009

Abstract

The choice of the kernel is critical to the success of many learning algorithms, but it is typically left to the user. Instead, the training data can be used to learn the kernel by selecting it out of a given family, such as that of non-negative linear combinations of p base kernels, constrained by a trace or L1 regularization. This paper studies the problem of learning kernels with the same family of kernels but with an L2 regularization instead, and for regression problems. We analyze the problem of learning kernels with ridge regression. We derive the form of the solution of the optimization problem and give an efficient iterative algorithm for computing that solution. We present a novel theoretical analysis of the problem based on stability and give learning bounds for orthogonal kernels that contain only an additive term O(√p/m) when compared to the standard kernel ridge regression stability bound. We also report the results of experiments indicating that L1 regularization can lead to modest improvements for a small number of kernels, but to performance degradations in larger-scale cases. In contrast, L2 regularization never degrades performance and in fact achieves significant improvements with a large number of kernels.
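The abstract's setup — a non-negative linear combination of p base kernels, learned jointly with a kernel ridge regression solution via an iterative algorithm — can be illustrated with a short alternating-update sketch. This is a minimal NumPy illustration under assumed details, not the paper's exact algorithm: the function name `l2_mkl_krr`, the parameter names `lam` (ridge parameter) and `Lambda` (radius of the L2 ball on the kernel weights), and the specific weight-update rule are all assumptions made for the example.

```python
import numpy as np

def l2_mkl_krr(Ks, y, lam=1.0, Lambda=1.0, n_iters=20):
    """Illustrative sketch: alternate between (a) the KRR dual solution for a
    fixed non-negative combination of base kernels and (b) a closed-form
    rescaling of the combination weights onto an L2 ball of radius Lambda.

    Ks : list of p symmetric PSD kernel matrices, each of shape (m, m)
    y  : target vector of shape (m,)
    """
    m = len(y)
    p = len(Ks)
    mu = np.ones(p) / p  # initial non-negative kernel weights
    for _ in range(n_iters):
        # (a) KRR dual solution for the current combined kernel.
        K = sum(w * Kk for w, Kk in zip(mu, Ks))
        alpha = np.linalg.solve(K + lam * np.eye(m), y)
        # (b) Per-kernel quadratic terms alpha' K_k alpha (non-negative for
        # PSD kernels) drive the weight update; projecting onto the L2 ball
        # keeps ||mu||_2 <= Lambda with mu >= 0.
        v = np.array([alpha @ Kk @ alpha for Kk in Ks])
        mu = np.maximum(Lambda * v / (np.linalg.norm(v) + 1e-12), 0.0)
    return mu, alpha
```

Each iteration costs one m×m linear solve plus p quadratic forms, so the loop stays cheap for moderate m even when p is large — consistent with the abstract's interest in the many-kernels regime.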



Published In

UAI '09: Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence
June 2009, 667 pages
ISBN: 9780974903958

Sponsors

  • Google Inc.
  • IBM Research
  • Intel
  • Microsoft Research

Publisher

AUAI Press

Arlington, Virginia, United States

