Multiple kernel learning and the SMO algorithm

Authors:

S. V. N. Vishwanathan,

Nawanol Theera-Ampornpunt,

Manik Varma

NIPS'10: Proceedings of the 24th International Conference on Neural Information Processing Systems - Volume 2

Pages 2361 - 2369

Published: 06 December 2010

Abstract

Our objective is to train p-norm Multiple Kernel Learning (MKL) and, more generally, linear MKL regularised by the Bregman divergence, using the Sequential Minimal Optimization (SMO) algorithm. The SMO algorithm is simple, easy to implement and adapt, and scales efficiently to large problems. As a result, it has gained widespread acceptance, and SVMs are routinely trained using SMO in diverse real-world applications. Training using SMO has been a long-standing goal in MKL for the very same reasons. Unfortunately, the standard MKL dual is not differentiable, and therefore cannot be optimised using SMO-style co-ordinate ascent. In this paper, we demonstrate that linear MKL regularised with the p-norm squared, or with certain Bregman divergences, can indeed be trained using SMO. The resulting algorithm retains both simplicity and efficiency and is significantly faster than state-of-the-art specialised p-norm MKL solvers. We show that we can train on a hundred thousand kernels in approximately seven minutes and on fifty thousand points in less than half an hour on a single core.
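
The differentiability point in the abstract can be illustrated with a small sketch. It is not the authors' code and it omits the constants of the actual dual. It assumes the commonly stated form of the l1 MKL dual, in which the kernels enter through a term max_k of alpha^T Y K_k Y alpha, and contrasts that with a q-norm of the same per-kernel terms, which is the kind of smooth term a p-norm regulariser is expected to produce. The function names, the random data, and the choice of linear base kernels are all hypothetical.

```python
import numpy as np

def quad_terms(alpha, y, kernels):
    """Return s_k = alpha^T (Y K_k Y) alpha for each base kernel K_k."""
    v = alpha * y
    return np.array([v @ K @ v for K in kernels])

def nonsmooth_term(s):
    """max_k s_k: has kinks wherever two kernels tie for the maximum."""
    return float(np.max(s))

def smooth_term(s, q):
    """(sum_k s_k^q)^(1/q): differentiable in s for finite q > 1, away from s = 0."""
    return float(np.sum(s ** q) ** (1.0 / q))

# Hypothetical toy problem: 20 points, 5 linear base kernels (all PSD).
rng = np.random.default_rng(0)
n, m = 20, 5
y = rng.choice([-1.0, 1.0], size=n)
alpha = rng.uniform(0.0, 1.0, size=n)
kernels = []
for _ in range(m):
    X = rng.normal(size=(n, 3))
    kernels.append(X @ X.T)

s = quad_terms(alpha, y, kernels)
for q in (2, 8, 32):
    # The q-norm upper-bounds the max and tightens as q grows.
    print(q, smooth_term(s, q), nonsmooth_term(s))
```

As q grows, the q-norm approaches the max from above, so the smooth term can be read as a differentiable relaxation of the non-smooth one. A co-ordinate ascent method such as SMO needs gradients of the objective with respect to individual dual variables, which the smooth form supplies.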

Published In

NIPS'10: Proceedings of the 24th International Conference on Neural Information Processing Systems - Volume 2

December 2010

2630 pages

Publisher

Curran Associates Inc.

Red Hook, NY, United States
