Approximating a collection of frequent sets

Authors: Foto Afrati, Aristides Gionis, Heikki Mannila

KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 12 - 19

https://doi.org/10.1145/1014052.1014057

Published: 22 August 2004

Abstract

One of the most well-studied problems in data mining is computing the collection of frequent item sets in large transactional databases. One obstacle to the applicability of frequent-set mining is that the output collection can be far too large for users to examine and understand carefully. Even restricting the output to the border of the frequent item-set collection does not help much in alleviating the problem.

In this paper we address the issue of overwhelmingly large output size by introducing and studying the following problem: What are the k sets that best approximate a collection of frequent item sets? Our measure of approximating a collection of sets by k sets is defined to be the size of the collection covered by the k sets, i.e., the part of the collection that is included in one of the k sets. We also specify a bound on the number of extra sets that are allowed to be covered. We examine different problem variants for which we demonstrate the hardness of the corresponding problems and we provide simple polynomial-time approximation algorithms. We give empirical evidence showing that the approximation methods work well in practice.
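The coverage measure described in the abstract can be made concrete with a small Python sketch. It is an illustration under stated assumptions, not the paper's method: candidate sets are drawn from the collection itself, and the selection rule is the standard greedy maximum-coverage heuristic, since the abstract does not say which approximation algorithms the paper uses. All function names (`nonempty_subsets`, `covered`, `extra_sets`, `greedy_k_cover`) are hypothetical.

```python
from itertools import combinations


def nonempty_subsets(items):
    """All non-empty subsets of `items`, as frozensets."""
    items = sorted(items)
    return {frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)}


def covered(collection, chosen):
    """Members of `collection` included in at least one chosen set."""
    return {s for s in collection if any(s <= c for c in chosen)}


def extra_sets(collection, chosen):
    """Non-empty subsets of the chosen sets that are NOT in the collection."""
    spanned = set()
    for c in chosen:
        spanned |= nonempty_subsets(c)
    return spanned - collection


def greedy_k_cover(collection, k):
    """Assumed heuristic: k times, add the member of `collection` that
    covers the most members not covered so far."""
    chosen, done = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        # Iterate in a fixed order so ties are broken reproducibly.
        for cand in sorted(collection, key=sorted):
            gain = sum(1 for s in collection if s not in done and s <= cand)
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:  # everything is already covered
            break
        chosen.append(best)
        done |= covered(collection, [best])
    return chosen, done


# Toy collection: every non-empty subset of {a, b, c} and of {c, d}.
collection = nonempty_subsets("abc") | nonempty_subsets("cd")
chosen, done = greedy_k_cover(collection, 2)
```

On this toy collection the two chosen sets cover every member and introduce no extra sets, because each chosen set's subsets all belong to the collection. Choosing a single larger set such as {a, b, c, d} would also cover everything, but at the cost of extra sets such as {a, d}; this is the trade-off that the abstract's bound on extra sets controls.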

Index Terms

Approximating a collection of frequent sets
1. Information systems
  1. Information systems applications
    1. Data mining
2. Theory of computation
  1. Design and analysis of algorithms

Published In

KDD '04: Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining

August 2004

874 pages

ISBN:1581138881

DOI:10.1145/1014052

General Chairs: Won Kim (Cyber Database Solutions), Ronny Kohavi (Amazon.com)

Program Chairs: Johannes Gehrke (Cornell University), William DuMouchel (AT&T Labs Research)

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

KDD04: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

August 22 - 25, 2004

Seattle, WA, USA

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
