Detection of data drift in a two-dimensional stream using the Kolmogorov-Smirnov test

Published: 01 January 2022

Abstract

In recent years, the volume of streaming data arriving as time series has grown steadily. Learning from data that arrives in real time is challenging, in part because of the speed at which new data appears. Hidden changes in the data that are not known in advance to learning algorithms are referred to in the literature as data drift or concept drift. In classical machine learning, a classifier analyzes new data using past training instances from the data stream. However, the accuracy of the classifier deteriorates under data drift, which occurs in non-stationary data. In such situations, the classifier must detect a significant change in the data and adapt its predictions over time. The motivation of this paper is to present a method for drift detection that does not require knowledge of instance labels. Labels are sometimes unavailable or periodically missing, which makes it difficult to apply methods that depend on them.
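
The abstract does not spell out the detection procedure, so the following is only a minimal sketch of label-free drift detection on a two-dimensional stream, not the authors' method. It assumes a fixed reference window and a current window of points, compares them with a quadrant-based two-dimensional Kolmogorov-Smirnov statistic (in the spirit of the Peacock and Fasano-Franceschini variants), and replaces the analytical significance formula with a permutation test. The function names, window sizes, and the 0.05 threshold are all hypothetical choices made for illustration.

```python
import random


def quadrant_fractions(origin, sample):
    """Fraction of `sample` lying strictly inside each quadrant around `origin`."""
    x0, y0 = origin
    counts = [0, 0, 0, 0]
    for x, y in sample:
        if x > x0 and y > y0:
            counts[0] += 1
        elif x < x0 and y > y0:
            counts[1] += 1
        elif x < x0 and y < y0:
            counts[2] += 1
        elif x > x0 and y < y0:
            counts[3] += 1
    n = len(sample)
    return [c / n for c in counts]


def ks2d_statistic(a, b):
    """Largest quadrant-fraction gap between a and b, using every observed point as origin."""
    d = 0.0
    for origin in list(a) + list(b):
        fa = quadrant_fractions(origin, a)
        fb = quadrant_fractions(origin, b)
        d = max(d, max(abs(p - q) for p, q in zip(fa, fb)))
    return d


def drift_p_value(reference, current, n_perm=99, seed=0):
    """Permutation p-value for the hypothesis that both windows share one distribution.

    Assumption: a permutation test stands in for the analytical approximation
    of the two-dimensional KS significance level.
    """
    rng = random.Random(seed)
    observed = ks2d_statistic(reference, current)
    pooled = list(reference) + list(current)
    n_ref = len(reference)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks2d_statistic(pooled[:n_ref], pooled[n_ref:]) >= observed:
            hits += 1
    return observed, (1 + hits) / (1 + n_perm)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical stream: the mean of both coordinates shifts in the second window.
    reference = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(40)]
    current = [(rng.gauss(2, 1), rng.gauss(2, 1)) for _ in range(40)]
    d, p = drift_p_value(reference, current)
    print(f"D = {d:.3f}, p = {p:.3f}, drift flagged: {p < 0.05}")
```

No labels are used anywhere: the decision depends only on the feature values in the two windows. The pairwise loops make each statistic quadratic in the window size, which is acceptable for small windows but would likely need a faster algorithm on a high-rate stream.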

Published In

Procedia Computer Science, Volume 207, Issue C
2022
4695 pages
ISSN: 1877-0509
EISSN: 1877-0509

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Kolmogorov-Smirnov test
  2. classifiers
  3. data drift

Qualifiers

  • Research-article
