research-article

Semi-supervised learning using multiple clusterings with limited labeled data

Published: 20 September 2016

Abstract

Supervised classification consists of learning a predictive model from a set of labeled samples. It is generally accepted that the accuracy of predictive models increases as more labeled samples become available. Labeled samples, however, are difficult to obtain because labeling is often performed manually, whereas unlabeled samples are readily available. As the labeling task is tedious and time-consuming, users generally provide only a very limited number of labeled objects, and designing approaches that work efficiently with so few labeled samples is highly challenging. In this context, semi-supervised approaches have been proposed to leverage both labeled and unlabeled data.

In this paper, we focus on cases where the number of labeled samples is very limited. We review and formalize eight semi-supervised learning algorithms and introduce a new method that combines supervised and unsupervised learning in order to use both labeled and unlabeled data. The main idea of this method is to produce new features derived from a first step of data clustering. These features are then used to enrich the description of the input data, so that the data distribution is better exploited. The performance of all the methods is compared on various artificial and UCI datasets, and on the classification of a very high resolution remote sensing image. The experiments show that our method performs well, especially when the number of labeled samples is very limited. They also confirm that combining labeled and unlabeled data is very useful in pattern recognition.
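The feature-enrichment idea described above can be illustrated with a minimal sketch. It is not the paper's exact method: it assumes that the "multiple clusterings" are several k-means runs with different numbers of clusters and random seeds, that the derived features are one-hot cluster-membership indicators, and that the supervised step is a 1-nearest-neighbour classifier. The function names `kmeans`, `enrich` and `predict_1nn` are hypothetical and chosen for illustration only.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = List[float]


def kmeans(points: Sequence[Point], k: int, seed: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means (an assumed clustering choice); returns a cluster index per point."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


def enrich(points: Sequence[Point], clusterings: Sequence[Tuple[List[int], int]]) -> List[Point]:
    """Append one-hot membership indicators from every (labels, k) clustering to each point."""
    enriched = []
    for i, p in enumerate(points):
        extra: List[float] = []
        for labels, k in clusterings:
            extra += [1.0 if labels[i] == c else 0.0 for c in range(k)]
        enriched.append(list(p) + extra)
    return enriched


def predict_1nn(train_x: Sequence[Point], train_y: Sequence[str], query: Point) -> str:
    """1-nearest-neighbour prediction that only ever sees the labeled samples."""
    best = min(range(len(train_x)), key=lambda j: math.dist(train_x[j], query))
    return train_y[best]


if __name__ == "__main__":
    rng = random.Random(0)
    # Two synthetic blobs; only one sample per blob carries a label.
    blob_a = [[rng.gauss(0.0, 0.5), rng.gauss(0.0, 0.5)] for _ in range(50)]
    blob_b = [[rng.gauss(5.0, 0.5), rng.gauss(5.0, 0.5)] for _ in range(50)]
    data = blob_a + blob_b
    labeled_idx, labeled_y = [0, 50], ["A", "B"]
    # Multiple clusterings, all computed on labeled AND unlabeled samples.
    clusterings = [(kmeans(data, k, seed), k) for k, seed in [(2, 1), (3, 2), (4, 3)]]
    feats = enrich(data, clusterings)
    train_x = [feats[i] for i in labeled_idx]
    preds = [predict_1nn(train_x, labeled_y, f) for f in feats]
    truth = ["A"] * 50 + ["B"] * 50
    print(sum(p == t for p, t in zip(preds, truth)), "/ 100 correct")
```

Because the clusterings are computed on every sample, the indicator features carry information about the distribution of the unlabeled data into a classifier that is trained on only a handful of labeled samples. Under these assumptions, any supervised learner could replace the 1-nearest-neighbour step.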


Published In

Information Sciences: an International Journal, Volume 361, Issue C
September 2016
162 pages

Publisher

Elsevier Science Inc.

United States

Author Tags

  1. Classification
  2. Pattern recognition
  3. Remote sensing
  4. Semi-supervised learning
