Bayesian Nonparametric Unsupervised Concept Drift Detection for Data Stream Mining

Published: 13 November 2020

Abstract

Online data stream mining matters in practice because data streams are ubiquitous in real-world scenarios, especially in the big data era. Traditional data mining algorithms cannot be applied directly to data streams for two reasons: (1) the underlying data distribution may change over time (i.e., concept drift), and (2) labels for streaming data are often delayed, scarce, or entirely missing. A new research area, unsupervised concept drift detection, has emerged to tackle this difficulty, mainly on the basis of two-sample hypothesis tests such as the Kolmogorov–Smirnov test. Surprisingly, none of the existing methods in this area exploits the Bayesian nonparametric hypothesis test, which offers clear interpretability, a straightforward way to encode prior knowledge, and no strict or unrealistic requirement to fix the form of the underlying data distribution in advance. In this article, we present a Bayesian nonparametric unsupervised concept drift detection method based on the Polya tree hypothesis test. The basic idea is to decompose the underlying data distribution into a multi-resolution representation, which turns the hypothesis test on the whole distribution into a recursive series of simple binomial tests. In addition, an incremental mechanism is designed to improve the method's efficiency in the stream setting. The method detects drifts effectively, locates where a drift happens, and reports the posterior probabilities of the hypotheses. Experiments on synthetic data verify the desired properties of the proposed method, and experiments on real-world data show that it performs better for data stream mining than its frequentist counterpart in the literature.
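The multi-resolution idea in the abstract can be illustrated with a minimal sketch of a two-sample Polya tree test, written in the style of such tests in the Bayesian nonparametric literature rather than taken from the article itself. Several things here are assumptions: the function name `polya_tree_posterior_h0`, the requirement that data already lie in [0, 1), the dyadic partition, the symmetric Beta prior with parameter `c * level**2` at each level, and the equal prior weight on the two hypotheses. The incremental mechanism mentioned in the abstract is not reproduced.

```python
import math


def log_beta(a, b):
    """Log of the Beta function, computed from log-gamma for stability."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def bin_counts(values, n_bins):
    """Count how many values in [0, 1) fall into each of n_bins equal bins."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return counts


def polya_tree_posterior_h0(x, y, depth=6, c=1.0, prior_h0=0.5):
    """Posterior probability that x and y share one distribution (H0).

    Each node of the dyadic tree contributes one Beta-binomial comparison:
    under H0 both samples share the left/right branching probability,
    under H1 (assumed alternative) each sample has its own.
    """
    log_bf_10 = 0.0  # log p(data | H1) - log p(data | H0)
    for level in range(1, depth + 1):
        alpha = c * level ** 2  # assumed prior strength, growing with depth
        n_bins = 2 ** level
        cx, cy = bin_counts(x, n_bins), bin_counts(y, n_bins)
        log_prior = log_beta(alpha, alpha)
        for parent in range(n_bins // 2):
            lx, rx = cx[2 * parent], cx[2 * parent + 1]
            ly, ry = cy[2 * parent], cy[2 * parent + 1]
            if lx + rx + ly + ry == 0:
                continue  # empty node: no evidence either way
            log_h0 = log_beta(alpha + lx + ly, alpha + rx + ry) - log_prior
            log_h1 = (log_beta(alpha + lx, alpha + rx) - log_prior) + (
                log_beta(alpha + ly, alpha + ry) - log_prior
            )
            log_bf_10 += log_h1 - log_h0
    log_odds_h0 = math.log(prior_h0) - math.log(1.0 - prior_h0) - log_bf_10
    # Numerically stable logistic transform of the log odds.
    if log_odds_h0 >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds_h0))
    e = math.exp(log_odds_h0)
    return e / (1.0 + e)


if __name__ == "__main__":
    reference = [i / 200 for i in range(100)]        # window in [0, 0.5)
    shifted = [0.5 + i / 200 for i in range(100)]    # window in [0.5, 1)
    print("same window:   ", polya_tree_posterior_h0(reference, reference))
    print("shifted window:", polya_tree_posterior_h0(reference, shifted))
```

In a stream setting, one plausible use is to compare a reference window against the most recent window and flag a drift when the posterior of H0 falls below a chosen threshold. Because the log Bayes factor is a sum over tree nodes, inspecting the per-node terms is one natural way to see which region of the data space drives a detection.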

Published In

ACM Transactions on Intelligent Systems and Technology  Volume 12, Issue 1
Regular Papers
February 2021
280 pages
ISSN:2157-6904
EISSN:2157-6912
DOI:10.1145/3436534

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 November 2020
Accepted: 01 August 2020
Revised: 01 July 2020
Received: 01 September 2019
Published in TIST Volume 12, Issue 1

Author Tags

  1. Bayesian nonparametric learning
  2. Data stream
  3. Concept drift

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Australian Research Council (ARC)
