skip to main content
10.1145/882082.882089acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
Article

Processing frequent itemset discovery queries by division and set containment join operators

Published: 13 June 2003 Publication History

Abstract

SQL-based data mining algorithms are rarely used in practice today. Most performance experiments have shown that SQL-based approaches are inferior to main-memory algorithms. Nevertheless, database vendors try to integrate analysis functionalities to some extent into their query execution and optimization components in order to narrow the gap between data and processing. Such a database support is particularly important when data mining applicatons need to analyze very large datasets or when they need access current data, not a possibly outdated copy of it.We investigate approaches based on SQL for the problem of finding frequent itemsets in a transaction table, including an algorithm that we recently proposed, called Quiver, which employs universal and existential quantifications. This approach employs a table schema for itemsets that is similar to the commonly used vertical layout for transactions: each item of an itemset is stored in a separate row. We argue that expressing the frequent itemset discovery problem using quantifications offers interesting opportunities to process such queries using set containment join or set containment division operators, which are not yet available in commercial database systems. Initial performance experiments reveal that Quiver cannot be processed efficiently by commercial DBMS. However, our experiments with query execution plans that use operators realizing set containment tests suggest that an efficient processing of Quiver is possible.

References

[1]
R. Agrawal, A. Somani, and Y. Xu. Storage and Querying of E-Commerce Data. In Proceedings VLDB, Rome, Italy, pages 149--158, September 2001.]]
[2]
R. Agrawal and R. Srikant. Fast Algorithms for Mining Association Rules. In Proceedings VLDB, Santiago, Chile, pages 487--499, September 1994.]]
[3]
J. V. d. Bercken, B. Blohsfeld, J.-P. Dittrich, J. Krämer, T. Schäfer, M. Schneider, and M. Seeger. XXL -- A Library Approach to Supporting Efficient Implementations of Advanced Database Queries. In Proceedings VLDB, Rome, Italy, pages 39--48, September 2001.]]
[4]
J. Chen and D. DeWitt. Dynamic Re-grouping of Continuous Queries. In Proceedings VLDB, Hong Kong, China, pages 430--441, August 2002.]]
[5]
M. Gimbel, M. Klein, and P. Lockemann. Interactivity, Scalability and Resource Control for Efficient KDD Support in DBMS. In Proceedings DTDM, Prague, Czech Republic, pages 37--50, March 2002.]]
[6]
G. Graefe and R. Cole. Fast Algorithms for Universal Quantification in Large Databases. TODS, 20(2):187--236, 1995.]]
[7]
J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann Publishers, 2001.]]
[8]
S. Helmer. Performance Enhancements for Advanced Database Management Systems. PhD thesis, University of Mannheim, Germany, December 2000.]]
[9]
S. Helmer and G. Moerkotte. Compiling Away Set Containment and Intersection Joins. Technical Report, University of Mannheim, Germany.]]
[10]
M. Houtsma and A. Swami. Set-oriented Data Mining in Relational Databases. DKE, 17(3):245--262, December 1995.]]
[11]
N. Mamoulis. Efficient Processing of Joins on Setvalued Attributes. In Proceedings SIGMOD, San Diego, California, USA, June 2003.]]
[12]
W. Maniatty and M. Zaki. A Requirements Analysis for Parallel KDD Systems. In Proceedings HIPS, Cancun, Mexico, pages 358--365, May 2000.]]
[13]
H. Mannila, H. Toivonen, and A. I. Verkamo. Efficient Algorithms for Discovering Association Rules. In AAAI Workshop on Knowledge and Discovery in Databases, Seattle, Washington, USA, pages 181--192, July 1994.]]
[14]
S. Melnik and H. Garcia-Molina. Divide-and-Conquer Algorithm for Computing Set Containment Joins. In Proceeding EDBT, Prague, Czech Republic, pages 427--444, March 2002.]]
[15]
S. Melnik and H. Garcia-Molina. Adaptive Algorithms for Set Containment Joins. TODS, 28(1):56--99, March 2003.]]
[16]
I. Pramudiono, T. Shintani, T. Tamura, and M. Kitsuregawa. Parallel SQL Based Association Rule Mining on Large Scale PC Cluster: Performance Comparison, with Directly Coded C Implementation. In Proceedings PAKDD, Beijing, China, pages 94--98, April 1999.]]
[17]
K. Ramasamy. Efficient Storage and Query Processing of Set-valued Attributes. PhD thesis, University of Wisconsin, Madison, Wisconsin, USA, 2002. 144 pages.]]
[18]
K. Ramasamy, J. Patel, J. Naughton, and R. Kaushik. Set Containment Joins: The Good, The Bad and The Ugly. In Proceedings VLDB, Cairo, Egypt, pages 351--362, September 2000.]]
[19]
R. Rantzau. Frequent Itemset Discovery with SQL Using Universal Quantification. In P. Lanzi and R. Meo, editors, Database Support for Data Mining Applications, volume 2682 of LNCS. Springer, 2003. To appear.]]
[20]
R. Rantzau, L. Shapiro, B. Mitschang, and Q. Wang. Algorithms and Applications for Universal Quantification in Relational Databases. Information Systems Journal, Elsevier, 28(1):3--32, January 2003.]]
[21]
S. Sarawagi, S. Thomas, and R. Agrawal. Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications. In Proceedings SIGMOD, Seattle, Washington, USA, pages 343--354, June 1998.]]
[22]
S. Sarawagi, S. Thomas, and R. Agrawal. Integrating Association Rule Mining with Relational Database Systems: Alternatives and Implications. Research Report RJ 10107 (91923), IBM Almaden Research Center, San Jose, California, USA, March 1998.]]
[23]
S. Thomas and S. Chakravarthy. Performance Evaluation and Optimization of Join Queries for Association Rule Mining. In Proceedings DaWaK, Florence, Italy, pages 241--250, August--September 1999.]]
[24]
T. Yoshizawa, I. Pramudiono, and M. Kitsuregawa. SQL Based Association Rule Mining Using Commercial RDBMS (IBM DB2 UDB EEE). In Proceedings DaWaK, London, UK, pages 301--306, September 2000.]]
[25]
C. Zhang, J. Naughton, D. DeWitt, Q. Luo, and G. Lohman. On Supporting Containment Queries in Relational Database Management Systems. In Proceedings SIGMOD, Santa Barbara, California, USA, May 2001.]]
[26]
Z. Zheng, R. Kohavi, and L. Mason. Real World Performance of Association Rule Algorithms. In Proceedings SIGKDD, San Francisco, California, USA, pages 401--406, August 2001.]]

Cited By

View all

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
DMKD '03: Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery
June 2003
103 pages
ISBN:9781450374224
DOI:10.1145/882082
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 June 2003

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. association rule discovery
  2. relational divison
  3. set containment join

Qualifiers

  • Article

Conference

DMKD03
Sponsor:

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)1
  • Downloads (Last 6 weeks)0
Reflects downloads up to 04 Jan 2025

Other Metrics

Citations

Cited By

View all

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media