
An Optimization Framework for Combining Ensembles of Classifiers and Clusterers with Applications to Nontransductive Semisupervised Learning and Transfer Learning

Published: 25 August 2014

Abstract

Unsupervised models can provide supplementary soft constraints to help classify new “target” data because similar instances in the target set are more likely to share the same class label. Such models can also help detect possible differences between training and target distributions, which is useful in applications where concept drift may take place, as in transfer learning settings. This article describes a general optimization framework that takes as input class membership estimates from existing classifiers learned on previously encountered “source” (or training) data, as well as a similarity matrix from a cluster ensemble operating solely on the target (or test) data to be classified, and yields a consensus labeling of the target data. More precisely, the application settings considered are nontransductive semisupervised and transfer learning scenarios where the training data are used only to build an ensemble of classifiers and are subsequently discarded before classifying the target data. The framework admits a wide range of loss functions and classification/clustering methods. It exploits properties of Bregman divergences in conjunction with Legendre duality to yield a principled and scalable approach. A variety of experiments show that the proposed framework can yield results substantially superior to those provided by naïvely applying classifiers learned on the original task to the target data. In addition, we show that the proposed approach, although it is not conceptually transductive, can provide better results than some popular transductive learning techniques.
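To make the combination scheme concrete, below is a minimal sketch of its squared-loss special case; the framework itself covers general Bregman divergences, which this sketch does not attempt. It assumes the classifier ensemble's outputs have already been averaged into a per-point class-probability matrix pi, that the cluster ensemble has been summarized as a symmetric co-association matrix S over the target points, and that the function name, alpha, and the fixed-point update are illustrative rather than the authors' exact algorithm.

    import numpy as np

    def consensus_labels(pi, S, alpha=1.0, n_iter=200, tol=1e-6):
        """Sketch: fuse classifier-ensemble posteriors with cluster-ensemble
        similarity by minimizing an assumed squared-loss objective:

            sum_i ||pi_i - y_i||^2 + alpha * sum_{i<j} S_ij ||y_i - y_j||^2

        pi    : (n, k) averaged class-probability estimates for the n
                target points (each row sums to 1).
        S     : (n, n) symmetric co-association matrix from the cluster
                ensemble (fraction of base clusterings grouping i with j),
                zero diagonal.
        alpha : trade-off between classifier evidence and the cluster
                ensemble's smoothness constraint.
        """
        y = pi.copy()
        for _ in range(n_iter):
            # Fixed-point update obtained by setting the gradient to zero:
            #   y_i <- (pi_i + alpha * sum_j S_ij y_j) / (1 + alpha * sum_j S_ij)
            y_new = (pi + alpha * (S @ y)) / (1.0 + alpha * S.sum(axis=1, keepdims=True))
            if np.max(np.abs(y_new - y)) < tol:
                y = y_new
                break
            y = y_new
        return y.argmax(axis=1), y  # consensus hard labels, soft posteriors

    # Toy usage: three target points, two classes; the cluster ensemble
    # always groups points 0 and 1, pulling their posteriors together.
    pi = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
    S = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
    labels, posteriors = consensus_labels(pi, S, alpha=0.5)

Because each update is a convex combination of probability vectors, the rows of y remain valid class distributions; alpha = 0 recovers the raw classifier-ensemble prediction, while larger alpha defers to the target-side cluster structure.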



Published In

ACM Transactions on Knowledge Discovery from Data, Volume 9, Issue 1
October 2014
209 pages
ISSN:1556-4681
EISSN:1556-472X
DOI:10.1145/2663598

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 August 2014
Accepted: 01 December 2013
Revised: 01 June 2013
Received: 01 April 2012
Published in TKDD Volume 9, Issue 1


Author Tags

  1. classification
  2. clustering
  3. ensembles
  4. semisupervised learning
  5. transductive learning
  6. transfer learning

Qualifiers

  • Research-article
  • Research
  • Refereed
