Article

COFI approach for mining frequent itemsets revisited

Authors:

Mohammad El-Hajj,

Osmar R. Zaïane

DMKD '04: Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery

Pages 70-75

https://doi.org/10.1145/1008694.1008706

Published: 13 June 2004

Abstract

The COFI approach for mining frequent itemsets, introduced recently, is an efficient algorithm that was shown to outperform state-of-the-art algorithms on synthetic data. For instance, COFI is not only an order of magnitude faster than the popular FP-Growth while requiring significantly less memory, but it is also very effective with extremely large datasets, more so than any previously reported algorithm. However, COFI has a significant drawback when mining dense transactional databases, as is the case with some real datasets. The algorithm performs poorly in these cases because it ends up generating too many local candidates that are doomed to be infrequent. In this paper, we present a new algorithm, COFI*, for mining frequent itemsets. This novel algorithm uses the same data structure as its predecessor, the COFI-tree, but partitions the patterns in a way that avoids the drawbacks of COFI. Moreover, it uses a pseudo-Oracle to pinpoint the maximal itemsets, from which all frequent itemsets are derived and counted, thereby avoiding the generation of candidates fated to be infrequent. Our implementation, tested on real and synthetic data, shows that the COFI* algorithm outperforms state-of-the-art algorithms, COFI itself among them.
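The abstract's final step, deriving all frequent itemsets from the maximal ones and then counting them, rests on downward closure: every subset of a frequent itemset is itself frequent, and every frequent itemset is a subset of some maximal frequent itemset. The Python sketch below illustrates only that derive-and-count step. It is not the COFI* algorithm: it builds no COFI-trees, uses no pseudo-Oracle, assumes the maximal frequent itemsets are already known, and counts support with a naive scan of the transactions. The function names (`subsets_of_maximals`, `count_supports`, `frequent_from_maximals`) and the toy data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set

Itemset = FrozenSet[str]


def subsets_of_maximals(maximals: Iterable[Set[str]]) -> Set[Itemset]:
    """Return every non-empty subset of the given maximal itemsets, deduplicated.

    By downward closure, if `maximals` is the complete set of maximal frequent
    itemsets, these subsets are exactly the frequent itemsets.
    """
    derived: Set[Itemset] = set()
    for maximal in maximals:
        items = sorted(maximal)
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                derived.add(frozenset(combo))
    return derived


def count_supports(itemsets: Iterable[Itemset],
                   transactions: Iterable[Iterable[str]]) -> Dict[Itemset, int]:
    """Naive support counting: scan every transaction for each itemset."""
    tx_sets = [frozenset(t) for t in transactions]
    return {s: sum(1 for t in tx_sets if s <= t) for s in itemsets}


def frequent_from_maximals(maximals, transactions):
    """Derive all frequent itemsets from the maximal ones, then count them."""
    return count_supports(subsets_of_maximals(maximals), transactions)


if __name__ == "__main__":
    transactions: List[List[str]] = [
        ["A", "B", "C"], ["A", "B"], ["A", "C"], ["B", "C"], ["A", "B", "C", "D"],
    ]
    # Maximal frequent itemsets of this toy data at minimum support 3,
    # supplied by hand here instead of by a pseudo-Oracle.
    maximals = [{"A", "B"}, {"A", "C"}, {"B", "C"}]
    result = frequent_from_maximals(maximals, transactions)
    for itemset, support in sorted(result.items(),
                                   key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

Run on the toy data, this prints the six frequent itemsets `['A'] 4`, `['B'] 4`, `['C'] 4`, `['A', 'B'] 3`, `['A', 'C'] 3` and `['B', 'C'] 3`. Because only subsets of maximal itemsets are enumerated, an itemset such as {A, B, C} (support 2, below the threshold) is never generated as a candidate; the price of this simple sketch is an exhaustive subset enumeration per maximal itemset.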

Index Terms
1. Computing methodologies
2. Information systems
  1. Information systems applications

Published In

DMKD '04: Proceedings of the 9th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery

June 2004

85 pages

ISBN: 158113908X

DOI: 10.1145/1008694

Program Chairs: Gautam Das (Microsoft Research), Bing Liu (University of Illinois at Chicago), Philip S. Yu (IBM T.J. Watson Research Center)

Copyright © 2004 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 June 2004


Qualifiers

Article

Conference

DMKD04: 9th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery 2004

Sponsor: SIGMOD

13 June 2004

Paris, France


Bibliometrics & Citations

Article Metrics

Total Citations: 10
Total Downloads: 622
Downloads (last 12 months): 3
Downloads (last 6 weeks): 1

Reflects downloads up to 06 Nov 2024.

Cited By

Chen J, Zheng H, Li P, Zhang Z, Li H, Liu W (2020). Fuzzy Association Rule Mining Algorithm Based on Load Classifier. Data Science, pp. 178-191. DOI: 10.1007/978-981-15-2810-1_18. Online publication date: 2-Feb-2020.
https://doi.org/10.1007/978-981-15-2810-1_18

Mini T, Nedunchezhian R, Vijayakumar V (2017). Development of an efficient association rule classifier with temporal characteristics and hierarchical partitioning. 2016 Eighth International Conference on Advanced Computing (ICoAC), pp. 19-25. DOI: 10.1109/ICoAC.2017.7951738. Online publication date: Jan-2017.
https://doi.org/10.1109/ICoAC.2017.7951738

Selvi S, Karthikeyan P, Vincent A, Abinaya V, Neeraja G, Deepika R (2017). Text categorization using Rocchio algorithm and random forest algorithm. 2016 Eighth International Conference on Advanced Computing (ICoAC), pp. 7-12. DOI: 10.1109/ICoAC.2017.7951736. Online publication date: Jan-2017.
https://doi.org/10.1109/ICoAC.2017.7951736

Nawapornanan C, Boonjing V (2014). HCG: A new algorithm for mining share-frequent patterns. 2014 International Computer Science and Engineering Conference (ICSEC), pp. 398-402. DOI: 10.1109/ICSEC.2014.6978230. Online publication date: Jul-2014.
https://doi.org/10.1109/ICSEC.2014.6978230

Adnan M, Alhajj R (2011). A Bounded and Adaptive Memory-Based Approach to Mine Frequent Patterns From Very Large Databases. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 41(1), pp. 154-172. DOI: 10.1109/TSMCB.2010.2048900. Online publication date: 1-Feb-2011.
https://dl.acm.org/doi/10.1109/TSMCB.2010.2048900

Baralis E, Cerquitelli T, Chiusano S, Shin S, Ossowski S, Schumacher M, Palakal M, Hung C (2010). A persistent HY-Tree to efficiently support itemset mining on large datasets. Proceedings of the 2010 ACM Symposium on Applied Computing, pp. 1060-1064. DOI: 10.1145/1774088.1774309. Online publication date: 22-Mar-2010.
https://dl.acm.org/doi/10.1145/1774088.1774309

Baralis E, Cerquitelli T, Chiusano S, Grand A (2010). Array-Tree: A persistent data structure to compactly store frequent itemsets. 2010 5th IEEE International Conference Intelligent Systems, pp. 108-113. DOI: 10.1109/IS.2010.5548388. Online publication date: Jul-2010.
https://doi.org/10.1109/IS.2010.5548388

Adnan M, Alhajj R (2009). DRFP-tree. Applied Intelligence, 30(2), pp. 84-97. DOI: 10.1007/s10489-007-0099-2. Online publication date: 1-Apr-2009.
https://dl.acm.org/doi/10.1007/s10489-007-0099-2

El-Hajj M, Zaïane O (2005). Mining with constraints by pruning and avoiding ineffectual processing. Proceedings of the 18th Australian Joint Conference on Advances in Artificial Intelligence, pp. 1001-1004. DOI: 10.1007/11589990_129. Online publication date: 5-Dec-2005.
https://dl.acm.org/doi/10.1007/11589990_129

Zaïane O (2005). Relevance of counting in data mining tasks. Proceedings of the First International Conference on Advanced Data Mining and Applications, pp. 14-18. DOI: 10.1007/11527503_4. Online publication date: 22-Jul-2005.
https://dl.acm.org/doi/10.1007/11527503_4
