Mining Discriminative Itemsets Over Data Streams Using Efficient Sliding Window

Published: 27 June 2023

Abstract

In this paper, we present an efficient novel method for mining discriminative itemsets over data streams using the sliding window model. Discriminative itemsets are itemsets that are frequent in the target data stream and whose frequency in the target stream is much higher than their frequency in the rest of the streams. Mining discriminative itemsets is more challenging than mining frequent itemsets, especially in the sliding window model: as the window frame slides, the algorithm has to deal with the combinatorial explosion of itemsets in more than one data stream, both for transactions entering the window and for transactions leaving it. We propose a single-scan algorithm that uses two novel in-memory data structures to mine discriminative itemsets in a combination of offline and online sliding windows. Offline processing controls the generation of the many unpromising itemsets. Online processing provides more up-to-date and accurate answers between two offline slides. The discovered discriminative itemsets are updated accurately in the offline sliding window at periodic intervals, and mining continues in the online sliding window between two consecutive offline slides. Extensive empirical analysis shows that the proposed algorithm is efficient in both time and space while retaining full accuracy. The algorithm can handle large, fast, and complex data streams.
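To make the definition concrete, the following is a minimal brute-force sketch of discriminative itemsets over a count-based sliding window. It is not the single-scan algorithm or the in-memory data structures proposed in the paper, which the abstract does not specify. The class name `SlidingWindow`, the thresholds `min_support` and `min_ratio`, the cap `max_size`, and the choice of comparing relative frequencies are all assumptions made for illustration.

```python
from collections import Counter, deque
from itertools import combinations


def itemset_counts(transactions, max_size):
    """Count every itemset of up to max_size items (brute force, exponential)."""
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(1, min(max_size, len(items)) + 1):
            counts.update(combinations(items, k))
    return counts


def discriminative_itemsets(target, rest, min_support=0.5, min_ratio=2.0, max_size=3):
    """Itemsets frequent in `target` whose relative frequency there is at
    least min_ratio times their relative frequency in `rest`.
    Both thresholds are hypothetical parameters, not taken from the paper."""
    if not target:
        return {}
    t_counts = itemset_counts(target, max_size)
    r_counts = itemset_counts(rest, max_size)
    n_t, n_r = len(target), max(len(rest), 1)
    found = {}
    for itemset, count in t_counts.items():
        f_t = count / n_t
        f_r = r_counts.get(itemset, 0) / n_r
        if f_t >= min_support and f_t >= min_ratio * f_r:
            found[itemset] = (f_t, f_r)
    return found


class SlidingWindow:
    """Keeps the most recent `size` transactions across all streams."""

    def __init__(self, size):
        # deque(maxlen=...) drops the oldest transaction when a new one arrives
        self.buffer = deque(maxlen=size)

    def push(self, stream_id, transaction):
        self.buffer.append((stream_id, frozenset(transaction)))

    def mine(self, target_id, **thresholds):
        # Naive approach: recompute everything from the current window contents
        target = [t for s, t in self.buffer if s == target_id]
        rest = [t for s, t in self.buffer if s != target_id]
        return discriminative_itemsets(target, rest, **thresholds)


window = SlidingWindow(size=6)
for stream_id, items in [("A", "ab"), ("B", "bc"), ("A", "ab"),
                         ("B", "b"), ("A", "ac"), ("B", "c")]:
    window.push(stream_id, items)
before_slide = window.mine("A")

# Two new transactions arrive; the two oldest leave the window
window.push("B", "ab")
window.push("B", "ab")
after_slide = window.mine("A")
```

Recomputing all counts on every slide, as this sketch does, is exactly the cost that the combinatorial explosion makes impractical on real streams, which is the motivation for handling incoming and outgoing transactions incrementally.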

Published In

SN Computer Science, Volume 4, Issue 5
Jun 2023
3596 pages

Publisher

Springer-Verlag
Berlin, Heidelberg

Publication History

Published: 27 June 2023
Accepted: 04 May 2023
Received: 20 August 2022

Author Tags

1. Data stream mining
2. Discriminative itemsets
3. Prefix tree
4. Sliding window model

Qualifiers

Research-article

Funding Sources

Queensland University of Technology
