SNCStream: A social network-based data stream clustering algorithm

JP Barddal, HM Gomes, F Enembreck - Proceedings of the 30th annual …, 2015 - dl.acm.org
Proceedings of the 30th Annual ACM Symposium on Applied Computing, 2015
Data Stream Clustering is an active area of research which requires efficient algorithms capable of finding and updating clusters incrementally. In addition, because data streams inherently evolve, these algorithms are expected to adapt quickly both to concept drifts and to the appearance and disappearance of clusters. Nevertheless, many of the existing two-step algorithms can only find hyper-spherical clusters and are highly dependent on parametrization. In this paper we introduce SNCStream, a one-step online clustering algorithm based on Social Networks Theory, which uses homophily to find non-hyper-spherical clusters. Our empirical studies show that SNCStream is able to surpass density-based algorithms in cluster quality and requires a feasible amount of resources (time and memory) when compared to other algorithms.
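The abstract does not spell out how SNCStream works, so the following is only a toy sketch of the general idea it names: a one-step, graph-based clusterer in which each arriving point links to the stored points most similar to it (homophily), and clusters are read off as connected components. It is not the authors' algorithm. The class name `HomophilyGraphClusterer` and the parameters `k`, `max_dist`, and `window` are hypothetical and chosen purely for illustration.

```python
import math
from collections import deque


class HomophilyGraphClusterer:
    """Toy online clusterer (illustrative only, NOT SNCStream itself)."""

    def __init__(self, k=2, max_dist=1.5, window=200):
        self.k = k                # hypothetical: links attempted per new point
        self.max_dist = max_dist  # hypothetical: longest distance allowed for a link
        self.window = window      # hypothetical: number of points remembered
        self.points = {}          # node id -> coordinates
        self.edges = {}           # node id -> set of neighbouring node ids
        self.order = deque()      # arrival order, used to forget old points
        self.next_id = 0

    def learn_one(self, x):
        """Insert one point and link it to its closest stored points."""
        node = self.next_id
        self.next_id += 1
        # Homophily: prefer connections to the most similar (nearest) points.
        dists = sorted((math.dist(x, p), j) for j, p in self.points.items())
        nearest = [j for d, j in dists[: self.k] if d <= self.max_dist]
        self.points[node] = tuple(x)
        self.edges[node] = set(nearest)
        for j in nearest:
            self.edges[j].add(node)
        # Forget the oldest point once the window is full, so the graph
        # can follow drift and let stale clusters disappear.
        self.order.append(node)
        if len(self.order) > self.window:
            self._forget(self.order.popleft())
        return node

    def _forget(self, node):
        for j in self.edges.pop(node):
            self.edges[j].discard(node)
        del self.points[node]

    def clusters(self):
        """Return the connected components as a list of sets of node ids."""
        seen, result = set(), []
        for start in self.points:
            if start in seen:
                continue
            component, stack = set(), [start]
            while stack:
                n = stack.pop()
                if n in component:
                    continue
                component.add(n)
                stack.extend(self.edges[n] - component)
            seen |= component
            result.append(component)
        return result


# Two long, thin (non-hyper-spherical) clusters arriving interleaved.
model = HomophilyGraphClusterer(k=2, max_dist=1.5, window=100)
for i in range(10):
    model.learn_one((float(i), 0.0))
    model.learn_one((float(i), 100.0))
print(sorted(len(c) for c in model.clusters()))
```

Because clusters here are defined by connectivity rather than by distance to a centre, each elongated line of points ends up as a single component. The published algorithm may differ from this sketch in how links are created, rewired, and removed.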