
Developing a feature weight self-adjustment mechanism for a K-means clustering algorithm

Published: 01 June 2008

Abstract

K-means is one of the most popular and widespread partitioning clustering algorithms because of its superior scalability and efficiency. Typically, the K-means algorithm treats all features equally, assigning them the same weight when evaluating dissimilarity. However, meaningful clustering structure often exists in a subspace defined by a specific subset of the features. To address this issue, this paper proposes a novel feature weight self-adjustment (FWSA) mechanism embedded in K-means to improve its clustering quality. In the FWSA mechanism, finding the feature weights is modeled as an optimization problem that simultaneously minimizes the separations within clusters and maximizes the separations between clusters. With this objective, the adjustment margin of a feature weight can be derived from the importance of the feature to the clustering quality. At each iteration of K-means, all feature weights are adaptively updated by adding their respective adjustment margins. Experiments on a number of synthetic and real data sets demonstrate the benefits of the proposed FWSA mechanism. In addition, compared with a recent related feature weighting approach, the proposed mechanism exhibits several advantages in both theoretical and experimental results.
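
To make the idea concrete, the following is a minimal Python sketch of a weighted K-means loop in which the feature weights are re-adjusted after every iteration. The specific update rule used here, renormalizing each weight by the ratio of between-cluster to within-cluster dispersion along that feature, is an illustrative assumption and not the paper's derived adjustment margins; the function name weighted_kmeans_fwsa is likewise hypothetical.

    # Sketch of a K-means variant with per-iteration feature weight self-adjustment.
    # The weight update below is an illustrative stand-in for the paper's mechanism.
    import numpy as np

    def weighted_kmeans_fwsa(X, k, n_iter=50, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        centers = X[rng.choice(n, size=k, replace=False)]  # random initial centroids
        w = np.full(d, 1.0 / d)                             # start with equal feature weights

        for _ in range(n_iter):
            # Assignment step: weighted squared Euclidean distance to each centroid.
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2 * w).sum(axis=2)
            labels = d2.argmin(axis=1)

            # Update step: recompute the centroid of each non-empty cluster.
            for j in range(k):
                members = X[labels == j]
                if len(members):
                    centers[j] = members.mean(axis=0)

            # Feature-weight self-adjustment (illustrative): features that separate
            # the clusters well (large between-cluster scatter relative to
            # within-cluster scatter) receive a larger share of the total weight.
            overall_mean = X.mean(axis=0)
            within = np.zeros(d)
            between = np.zeros(d)
            for j in range(k):
                members = X[labels == j]
                if len(members):
                    within += ((members - centers[j]) ** 2).sum(axis=0)
                    between += len(members) * (centers[j] - overall_mean) ** 2
            ratio = between / (within + 1e-12)
            w = ratio / ratio.sum()                          # renormalize so weights sum to 1

        return labels, centers, w

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        # Two clusters separated only in the first feature; the second feature is noise.
        X = np.vstack([rng.normal([0, 0], 1, (100, 2)), rng.normal([6, 0], 1, (100, 2))])
        labels, centers, w = weighted_kmeans_fwsa(X, k=2)
        print("learned feature weights:", w)  # expect most of the weight on feature 0

In this toy example the informative feature ends up with nearly all of the weight, which is the qualitative behavior the FWSA mechanism is designed to produce.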

Published In

Computational Statistics & Data Analysis, Volume 52, Issue 10, June 2008, 314 pages.

Publisher: Elsevier Science Publishers B.V., Netherlands.
