Approximating a collection of frequent sets

Published: 22 August 2004

Abstract

One of the most well-studied problems in data mining is computing the collection of frequent item sets in large transactional databases. One obstacle to the applicability of frequent-set mining is that the output collection can be far too large for users to examine and understand carefully. Even restricting the output to the border of the frequent item-set collection does not help much in alleviating the problem.

In this paper we address the issue of overwhelmingly large output size by introducing and studying the following problem: What are the k sets that best approximate a collection of frequent item sets? Our measure of approximating a collection of sets by k sets is defined to be the size of the collection covered by the k sets, i.e., the part of the collection that is included in one of the k sets. We also specify a bound on the number of extra sets that are allowed to be covered. We examine different problem variants, for which we demonstrate the hardness of the corresponding problems and provide simple polynomial-time approximation algorithms. We give empirical evidence showing that the approximation methods work well in practice.
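
To make the coverage measure concrete, the following is a minimal Python sketch of one simple variant, under stated assumptions: the k sets are chosen from the collection itself, a chosen set "covers" every member of the collection that is a subset of it, and the selection is the standard greedy heuristic for maximum coverage. The function names (`nonempty_subsets`, `covered_by`, `greedy_k_cover`) and the toy data are hypothetical and are not taken from the paper; the paper's own algorithms and problem variants may differ.

```python
from itertools import combinations


def nonempty_subsets(items):
    """Return all non-empty subsets of `items` as frozensets (toy data helper)."""
    items = list(items)
    return {
        frozenset(c)
        for r in range(1, len(items) + 1)
        for c in combinations(items, r)
    }


def covered_by(candidate, collection):
    """Return the members of `collection` that are subsets of `candidate`."""
    return {s for s in collection if s <= candidate}


def greedy_k_cover(collection, k):
    """Greedily pick up to k sets from `collection`, maximizing how many
    members of the collection are subsets of at least one picked set."""
    collection = {frozenset(s) for s in collection}
    cover_sets = {c: covered_by(c, collection) for c in collection}
    chosen, covered = [], set()
    for _ in range(k):
        # Pick the candidate with the largest number of newly covered sets;
        # the sorted item list only makes tie-breaking deterministic.
        best = max(
            cover_sets,
            key=lambda c: (len(cover_sets[c] - covered), sorted(c)),
        )
        gain = cover_sets[best] - covered
        if not gain:
            break  # nothing new can be covered
        chosen.append(best)
        covered |= gain
    return chosen, covered


# Toy collection: every non-empty subset of {a, b, c} and of {d, e}.
collection = nonempty_subsets("abc") | nonempty_subsets("de")
chosen, covered = greedy_k_cover(collection, k=2)
print([sorted(s) for s in chosen], len(covered), "of", len(collection))
```

Because the toy collection is closed under taking non-empty subsets and the candidates come from the collection, no extra sets are covered here. Allowing candidates outside the collection would require also counting the covered sets that are not in the collection and checking them against the bound mentioned in the abstract.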

Published In

KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
August 2004
874 pages
ISBN:1581138881
DOI:10.1145/1014052

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. foundations of data mining
  2. mining frequent itemsets

Qualifiers

  • Article

Conference

KDD04

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
