
Information-theoretic semi-supervised metric learning via entropy regularization

Published: 01 August 2014

Abstract

We propose a general information-theoretic approach to semi-supervised metric learning, called SERAPH (SEmi-supervised metRic leArning pAradigm with Hypersparsity), that does not rely on the manifold assumption. Given the probability that a pair of instances is similar, parameterized by a Mahalanobis distance, we maximize its entropy on labeled data and minimize its entropy on unlabeled data, following the idea of entropy regularization. For metric learning, entropy regularization improves on manifold regularization by taking the dissimilarity information of unlabeled data into account in the unsupervised part, and it hence allows the supervised and unsupervised parts to be integrated in a natural and meaningful way. Moreover, we regularize SERAPH by trace-norm regularization to encourage low-dimensional projections associated with the distance metric. The resulting nonconvex optimization problem can be solved efficiently and stably by either a gradient projection algorithm or an EM-like iterative algorithm whose M-step is convex. Experiments demonstrate that SERAPH compares favorably with many well-known metric learning methods, and the learned Mahalanobis distance possesses high discriminability even in noisy environments.
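To make the objective concrete, the sketch below illustrates the kind of entropy-regularized criterion the abstract describes. It is a minimal illustration under stated assumptions, not the authors' implementation: the sigmoid pair-probability model, the names (pair_prob, eta, gamma, lam), and the eigenvalue-clipping PSD projection used in the gradient projection step are all choices made for this example.

```python
import numpy as np

def pair_prob(A, xi, xj, eta=1.0):
    """P(pair is similar): sigmoid link on the squared Mahalanobis distance.
    eta is an assumed threshold hyperparameter for this sketch."""
    d = xi - xj
    return 1.0 / (1.0 + np.exp(d @ A @ d - eta))

def entropy(p, eps=1e-12):
    """Binary entropy of a Bernoulli probability, clipped for stability."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def objective(A, X, labeled, unlabeled, gamma=1.0, lam=0.1):
    """Entropy-regularized criterion to maximize (illustrative form).

    labeled:   iterable of (i, j, y) with y = 1 (similar) or 0 (dissimilar)
    unlabeled: iterable of (i, j) pairs without labels
    """
    # Supervised part: log-likelihood of the observed pair labels.
    sup = sum(np.log(pair_prob(A, X[i], X[j]) if y == 1
                     else 1.0 - pair_prob(A, X[i], X[j]))
              for i, j, y in labeled)
    # Unsupervised part: the entropy of the predicted pair probabilities is
    # subtracted, pushing unlabeled pairs toward confident predictions.
    unsup = sum(entropy(pair_prob(A, X[i], X[j])) for i, j in unlabeled)
    # For PSD A the trace norm equals trace(A); penalizing it encourages
    # a low-rank metric, i.e., a low-dimensional projection.
    return sup - gamma * unsup - lam * np.trace(A)

def project_psd(A):
    """Project a symmetric matrix onto the PSD cone by clipping negative
    eigenvalues to zero (a standard gradient projection step)."""
    w, V = np.linalg.eigh((A + A.T) / 2.0)
    return (V * np.clip(w, 0.0, None)) @ V.T
```

A gradient projection loop would alternate an ascent step on this objective with project_psd to keep the learned metric positive semidefinite; the trace penalty is what biases the solution toward low-dimensional projections.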


Published In

Neural Computation, Volume 26, Issue 8 (August 2014), 291 pages

Publisher

MIT Press, Cambridge, MA, United States
