A Global Clustering Approach Using Hybrid Optimization for Incomplete Data Based on Interval Reconstruction of Missing Value

Published: 01 April 2016

Abstract

Clustering of incomplete data is often required in practice. The treatment of missing attribute values and the optimization procedure used for clustering are important factors affecting clustering performance. In this study, each missing attribute value is treated as an information granule and represented as an interval. To avoid intervals being determined by information from different clusters, we propose a congeneric nearest-neighbor rule-based architecture built on a preclassification result, which can improve the estimation of missing attribute intervals. Furthermore, we propose a global fuzzy clustering approach that uses particle swarm optimization assisted by Fuzzy C-Means. The optimization procedure uses a novel encoding scheme in which each particle is composed of the cluster prototypes and the missing attribute values. The proposed approach improves the accuracy of the clustering results and imputes the missing attribute values at the same time. Experimental results on several UCI data sets show the efficiency of the proposed approach.
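The abstract gives no formulas, so the following Python sketch is only one possible reading of the ideas it names, not the authors' implementation. It assumes that missing cells are marked with `None`, that intervals come from the `q` nearest rows under a partial distance (without the preclassification step), and that a particle is a flat list holding the prototypes followed by one value per missing cell. The fitness is the standard Fuzzy C-Means objective. All function names are hypothetical, and the swarm update itself is omitted.

```python
import math

def partial_distance(a, b):
    """Distance over attributes observed in both rows, rescaled to full dimension."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    sq = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(sq * len(a) / len(shared))

def nn_intervals(data, q):
    """Map each missing cell (i, j) to (min, max) of attribute j over its q nearest rows.

    Assumes every attribute is observed in at least one other row.
    """
    intervals = {}
    for i, row in enumerate(data):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = sorted(
                (partial_distance(row, other), other[j])
                for k, other in enumerate(data)
                if k != i and other[j] is not None
            )
            nearest = [v for _, v in candidates[:q]]
            intervals[(i, j)] = (min(nearest), max(nearest))
    return intervals

def decode(particle, data, intervals, c):
    """Split a flat particle into c prototypes and a completed copy of the data."""
    d = len(data[0])
    prototypes = [particle[k * d:(k + 1) * d] for k in range(c)]
    filled = [list(row) for row in data]
    for value, (i, j) in zip(particle[c * d:], sorted(intervals)):
        lo, hi = intervals[(i, j)]
        filled[i][j] = min(max(value, lo), hi)  # keep the estimate inside its interval
    return prototypes, filled

def fcm_objective(prototypes, filled, m=2.0):
    """Fuzzy C-Means objective with memberships set optimally for the given prototypes."""
    total = 0.0
    for x in filled:
        d2 = [sum((a - b) ** 2 for a, b in zip(x, v)) for v in prototypes]
        if min(d2) == 0.0:
            continue  # the point coincides with a prototype and contributes zero cost
        u = [1.0 / sum((dk / dl) ** (1.0 / (m - 1.0)) for dl in d2) for dk in d2]
        total += sum((uk ** m) * dk for uk, dk in zip(u, d2))
    return total

# Toy data: row 1 is missing its second attribute.
data = [[1.0, 2.0], [1.2, None], [5.0, 6.0], [5.1, 6.2]]
intervals = nn_intervals(data, q=2)
prototypes, filled = decode([1.0, 2.0, 5.0, 6.0, 9.0], data, intervals, c=2)
fitness = fcm_objective(prototypes, filled)
```

Under these assumptions a particle swarm optimizer would minimize `fcm_objective` over such particles, so the prototypes and the imputed values are searched jointly while each imputed value stays within its estimated interval.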

Published In

International Journal of Intelligent Systems, Volume 31, Issue 4, April 2016 (104 pages)

Publisher

John Wiley and Sons Ltd., United Kingdom
