Article

Processing frequent itemset discovery queries by division and set containment join operators

Author:

Ralf Rantzau

DMKD '03: Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery

Pages 20 - 27

https://doi.org/10.1145/882082.882089

Published: 13 June 2003

Abstract

SQL-based data mining algorithms are rarely used in practice today. Most performance experiments have shown that SQL-based approaches are inferior to main-memory algorithms. Nevertheless, database vendors try to integrate analysis functionality, to some extent, into their query execution and optimization components in order to narrow the gap between data and processing. Such database support is particularly important when data mining applications need to analyze very large datasets or when they need to access current data rather than a possibly outdated copy of it.

We investigate SQL-based approaches to the problem of finding frequent itemsets in a transaction table, including an algorithm that we recently proposed, called Quiver, which employs universal and existential quantification. This approach uses a table schema for itemsets that is similar to the commonly used vertical layout for transactions: each item of an itemset is stored in a separate row. We argue that expressing the frequent itemset discovery problem using quantification offers interesting opportunities to process such queries using set containment join or set containment division operators, which are not yet available in commercial database systems. Initial performance experiments reveal that commercial DBMSs cannot process Quiver efficiently. However, our experiments with query execution plans that use operators realizing set containment tests suggest that efficient processing of Quiver is possible.
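To make the idea of support counting through quantification concrete, the following Python sketch stores both transactions and candidate itemsets in a vertical layout (one item per row) and counts, for each candidate, the transactions that contain all of its items. It is an illustration only and not the Quiver query from the paper: the table names `transactions` and `candidates`, the column names, and both function names are assumptions made for this example. Universal quantification is expressed with the common double `NOT EXISTS` idiom, and a plain in-memory set containment test is included for comparison.

```python
import sqlite3


def frequent_candidates_sql(transactions, candidates, min_support):
    """Count candidate support with a double NOT EXISTS query (hypothetical schema)."""
    con = sqlite3.connect(":memory:")
    # Vertical layout: one row per (transaction, item) and per (itemset, item).
    con.execute("CREATE TABLE transactions (tid INTEGER, item TEXT)")
    con.execute("CREATE TABLE candidates (itemset INTEGER, item TEXT)")
    con.executemany(
        "INSERT INTO transactions VALUES (?, ?)",
        [(tid, item) for tid, items in transactions.items() for item in items],
    )
    con.executemany(
        "INSERT INTO candidates VALUES (?, ?)",
        [(cid, item) for cid, items in candidates.items() for item in items],
    )
    # A (candidate, transaction) pair qualifies if no item of the candidate
    # is missing from the transaction, i.e. the candidate is a subset.
    query = """
        SELECT c.itemset, COUNT(*) AS support
        FROM (SELECT DISTINCT itemset FROM candidates) AS c,
             (SELECT DISTINCT tid FROM transactions) AS t
        WHERE NOT EXISTS (
            SELECT 1 FROM candidates AS c2
            WHERE c2.itemset = c.itemset
              AND NOT EXISTS (
                  SELECT 1 FROM transactions AS t2
                  WHERE t2.tid = t.tid AND t2.item = c2.item))
        GROUP BY c.itemset
        HAVING COUNT(*) >= ?
    """
    result = dict(con.execute(query, (min_support,)).fetchall())
    con.close()
    return result


def frequent_candidates_sets(transactions, candidates, min_support):
    """Same result via an in-memory nested-loop set containment test."""
    supports = {
        cid: sum(1 for t_items in transactions.values() if items <= t_items)
        for cid, items in candidates.items()
    }
    return {cid: s for cid, s in supports.items() if s >= min_support}
```

Both functions take dictionaries that map an identifier to a set of items, plus a minimum support count, and return a dictionary from candidate identifier to support. The SQL variate compares every candidate with every transaction, which hints at why a dedicated set containment operator could be attractive for this kind of query.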

Index Terms

1. Information systems
2. Theory of computation
  1. Theory and algorithms for application domains
    1. Database theory

Published In

DMKD '03: Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery

June 2003

103 pages

ISBN:9781450374224

DOI:10.1145/882082

Conference Chairs:
Mohammed J. Zaki, Rensselaer Polytechnic Institute, Troy, New York
Charu C. Aggarwal, IBM T.J. Watson Research Center, Yorktown Heights, New York

Copyright © 2003 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

DMKD03: 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (held in conjunction with MOD/PODS 2003 conference; co-located with FCRC 2003 Conference)

Sponsor: SIGMOD

June 13, 2003

San Diego, California
