
Information-theoretic semi-supervised metric learning via entropy regularization

Published: 01 August 2014

Abstract

We propose a general information-theoretic approach to semi-supervised metric learning, called SERAPH (SEmi-supervised metRic leArning pAradigm with Hypersparsity), that does not rely on the manifold assumption. Given the probability that a pair of instances is similar, parameterized by a Mahalanobis distance, we maximize its entropy on labeled data and minimize its entropy on unlabeled data, following the idea of entropy regularization. For metric learning, entropy regularization improves on manifold regularization by taking the dissimilarity information of unlabeled data into account in the unsupervised part, and it hence allows the supervised and unsupervised parts to be integrated in a natural and meaningful way. Moreover, we regularize SERAPH by trace-norm regularization to encourage low-dimensional projections associated with the distance metric. The resulting nonconvex optimization problem can be solved efficiently and stably by either a gradient projection algorithm or an EM-like iterative algorithm whose M-step is convex. Experiments demonstrate that SERAPH compares favorably with many well-known metric learning methods, and the learned Mahalanobis distance possesses high discriminability even in noisy environments.
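To make the objective concrete, the sketch below illustrates the kind of entropy-regularized criterion the abstract describes. It is a minimal illustration under stated assumptions, not the authors' implementation: the sigmoid pair-probability model, the names (pair_prob, eta, gamma, lam), and the eigenvalue-clipping PSD projection used in the gradient projection step are all choices made for this example.

```python
import numpy as np

def pair_prob(A, xi, xj, eta=1.0):
    """P(pair is similar): sigmoid link on the squared Mahalanobis distance.
    eta is an assumed threshold hyperparameter for this sketch."""
    d = xi - xj
    return 1.0 / (1.0 + np.exp(d @ A @ d - eta))

def entropy(p, eps=1e-12):
    """Binary entropy of a Bernoulli probability, clipped for stability."""
    p = np.clip(p, eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))

def objective(A, X, labeled, unlabeled, gamma=1.0, lam=0.1):
    """Entropy-regularized criterion to maximize (illustrative form).

    labeled:   iterable of (i, j, y) with y = 1 (similar) or 0 (dissimilar)
    unlabeled: iterable of (i, j) pairs without labels
    """
    # Supervised part: log-likelihood of the observed pair labels.
    sup = sum(np.log(pair_prob(A, X[i], X[j]) if y == 1
                     else 1.0 - pair_prob(A, X[i], X[j]))
              for i, j, y in labeled)
    # Unsupervised part: the entropy of the predicted pair probabilities is
    # subtracted, pushing unlabeled pairs toward confident predictions.
    unsup = sum(entropy(pair_prob(A, X[i], X[j])) for i, j in unlabeled)
    # For PSD A the trace norm equals trace(A); penalizing it encourages
    # a low-rank metric, i.e., a low-dimensional projection.
    return sup - gamma * unsup - lam * np.trace(A)

def project_psd(A):
    """Project a symmetric matrix onto the PSD cone by clipping negative
    eigenvalues to zero (a standard gradient projection step)."""
    w, V = np.linalg.eigh((A + A.T) / 2.0)
    return (V * np.clip(w, 0.0, None)) @ V.T
```

A gradient projection loop would alternate an ascent step on this objective with project_psd to keep the learned metric positive semidefinite; the trace penalty is what biases the solution toward low-dimensional projections.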


Published In

Neural Computation, Volume 26, Issue 8 (August 2014), 291 pages

Publisher

MIT Press, Cambridge, MA, United States
