research-article

Online frame-based clustering with unknown number of clusters

Published: 01 September 2016

Abstract

This paper presents an online frame-based clustering (OFC) algorithm for unsupervised classification applications in which data arrive as a stream and the number of clusters is unknown. The algorithm consists of several steps, including density-based outlier removal, new cluster generation, and cluster update. It is designed for applications in which data samples are received online in frames. Each frame is first passed through an outlier removal step, which produces a denoised frame whose samples remain consistent during transition times between clusters. A classification step then determines whether the frame belongs to any existing cluster. When a frame does not match any existing cluster and certain criteria are met, a new cluster is created on the fly, in real time, using support vector domain descriptors. Experiments on four synthetic and two real datasets evaluate the introduced algorithm in terms of cluster purity and normalized mutual information. Comparisons with similar clustering algorithms designed for streaming data are also reported and show the effectiveness of the introduced online frame-based clustering algorithm.

Highlights

  • Online frame-based clustering algorithm that requires no knowledge of the number of clusters.
  • Intended for applications in which samples of a class appear in streaming frames.
  • Superior to existing algorithms applicable to online frame-based clustering.
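The steps named in the abstract (outlier removal, matching a frame against existing clusters, new cluster generation, and cluster update) can be illustrated with a toy loop. This is only a rough sketch under stated assumptions, not the paper's method: a centroid-plus-radius hypersphere stands in for the support vector domain description, a k-nearest-neighbor distance rule stands in for the density-based outlier step, and every name and threshold (`remove_outliers`, `process_frame`, `factor`, `slack`, `match_fraction`, `min_samples`) is hypothetical.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(points):
    return tuple(sum(c) / len(points) for c in zip(*points))

def remove_outliers(frame, k=2, factor=2.0):
    """Assumed stand-in for density-based outlier removal: drop samples whose
    mean distance to their k nearest neighbours exceeds factor * median score."""
    if len(frame) < 2:
        return list(frame)
    scores = []
    for i, p in enumerate(frame):
        d = sorted(dist(p, q) for j, q in enumerate(frame) if j != i)
        scores.append(sum(d[:k]) / k)
    median = sorted(scores)[len(scores) // 2]
    return [p for p, s in zip(frame, scores) if s <= factor * median]

class Cluster:
    """Assumed stand-in for a support vector domain descriptor: a hypersphere."""
    def __init__(self, points):
        self.center = centroid(points)
        self.count = len(points)
        self.radius = max(dist(p, self.center) for p in points)

    def contains(self, point, slack=1.5):
        return dist(point, self.center) <= slack * self.radius

    def update(self, points):
        # Running mean of the center; the radius only ever grows.
        m, n = centroid(points), len(points)
        total = self.count + n
        self.center = tuple((c * self.count + x * n) / total
                            for c, x in zip(self.center, m))
        self.count = total
        self.radius = max(self.radius,
                          max(dist(p, self.center) for p in points))

def process_frame(frame, clusters, match_fraction=0.8, min_samples=3):
    """Return the index of the matched or newly created cluster, or None."""
    clean = remove_outliers(frame)
    if not clean:
        return None
    for idx, cluster in enumerate(clusters):
        inside = sum(cluster.contains(p) for p in clean)
        if inside >= match_fraction * len(clean):
            cluster.update(clean)          # cluster update step
            return idx
    if len(clean) >= min_samples:          # assumed new-cluster criterion
        clusters.append(Cluster(clean))    # new cluster generation step
        return len(clusters) - 1
    return None

# Synthetic frames: two from one region (one with an outlier), one from a
# second region, and one frame too small to justify a new cluster.
frames = [
    [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5)],
    [(0.05, 0.02), (0.02, 0.08), (0.08, 0.05), (0.03, 0.03)],
    [(10, 10), (10.1, 10), (10, 10.1), (10.1, 10.1)],
    [(50, 50), (50.1, 50)],
]
clusters = []
labels = [process_frame(f, clusters) for f in frames]
print(labels, len(clusters))
```

The number of clusters is never supplied; it grows only when a denoised frame fails to match every existing description and is large enough to seed a new one. In a faithful implementation the hypersphere would be replaced by a kernel-based support vector domain description, and the matching and creation criteria would follow the paper.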


Published In

Pattern Recognition, Volume 57, Issue C
September 2016
198 pages

Publisher

Elsevier Science Inc., United States

Author Tags

  1. Clustering with unknown number of clusters
  2. Frame-based clustering
  3. Online clustering for streaming data
