
Developing a feature weight self-adjustment mechanism for a K-means clustering algorithm

Published: 01 June 2008

Abstract

K-means is one of the most popular and widespread partitioning clustering algorithms because of its superior scalability and efficiency. Typically, the K-means algorithm treats all features equally, assigning them the same weight when evaluating dissimilarity. However, meaningful clustering structure often exists in a subspace defined by a specific subset of the features. To address this issue, this paper proposes a novel feature weight self-adjustment (FWSA) mechanism embedded in K-means to improve its clustering quality. In the FWSA mechanism, finding the feature weights is modeled as an optimization problem that simultaneously minimizes the separations within clusters and maximizes the separations between clusters. With this objective, the adjustment margin of a feature weight can be derived from the importance of the feature to the clustering quality. At each iteration of K-means, all feature weights are adaptively updated by adding their respective adjustment margins. Experiments on a number of synthetic and real data sets demonstrate the benefits of the proposed FWSA mechanism. In addition, compared with a recent related feature weighting approach, the proposed mechanism exhibits several advantages in both theoretical and experimental results.
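
To make the idea concrete, the following is a minimal Python sketch of a weighted K-means loop in which the feature weights are re-adjusted after every iteration. The specific update rule used here, renormalizing each weight by the ratio of between-cluster to within-cluster dispersion along that feature, is an illustrative assumption and not the paper's derived adjustment margins; the function name weighted_kmeans_fwsa is likewise hypothetical.

    # Sketch of a K-means variant with per-iteration feature weight self-adjustment.
    # The weight update below is an illustrative stand-in for the paper's mechanism.
    import numpy as np

    def weighted_kmeans_fwsa(X, k, n_iter=50, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        centers = X[rng.choice(n, size=k, replace=False)]  # random initial centroids
        w = np.full(d, 1.0 / d)                             # start with equal feature weights

        for _ in range(n_iter):
            # Assignment step: weighted squared Euclidean distance to each centroid.
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2 * w).sum(axis=2)
            labels = d2.argmin(axis=1)

            # Update step: recompute the centroid of each non-empty cluster.
            for j in range(k):
                members = X[labels == j]
                if len(members):
                    centers[j] = members.mean(axis=0)

            # Feature-weight self-adjustment (illustrative): features that separate
            # the clusters well (large between-cluster scatter relative to
            # within-cluster scatter) receive a larger share of the total weight.
            overall_mean = X.mean(axis=0)
            within = np.zeros(d)
            between = np.zeros(d)
            for j in range(k):
                members = X[labels == j]
                if len(members):
                    within += ((members - centers[j]) ** 2).sum(axis=0)
                    between += len(members) * (centers[j] - overall_mean) ** 2
            ratio = between / (within + 1e-12)
            w = ratio / ratio.sum()                          # renormalize so weights sum to 1

        return labels, centers, w

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        # Two clusters separated only in the first feature; the second feature is noise.
        X = np.vstack([rng.normal([0, 0], 1, (100, 2)), rng.normal([6, 0], 1, (100, 2))])
        labels, centers, w = weighted_kmeans_fwsa(X, k=2)
        print("learned feature weights:", w)  # expect most of the weight on feature 0

In this toy example the informative feature ends up with nearly all of the weight, which is the qualitative behavior the FWSA mechanism is designed to produce.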

Published In

Computational Statistics & Data Analysis, Volume 52, Issue 10, June 2008, 314 pages.

Publisher: Elsevier Science Publishers B.V., Netherlands.
