research-article

CP-summary: a concise representation for browsing frequent itemsets

Authors:

Ardian Kristanto Poernomo,

Vivekanand Gopalkrishnan

KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

Pages 687 - 696

https://doi.org/10.1145/1557019.1557096

Published: 28 June 2009

Abstract

This paper tackles the problem of summarizing frequent itemsets. We observe that previous notions of summaries cannot be used directly to analyze frequent itemsets. To support analysis, a summary must meet one requirement: analysts should be able to browse all frequent itemsets using only the summary.
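The object being summarized is the complete collection of frequent itemsets, that is, every itemset whose support (the number of transactions containing it) meets a minimum threshold. As background only, and not as the paper's c-profile construction, the following Python sketch enumerates that collection with a textbook Apriori-style level-wise search. The function name `frequent_itemsets`, the use of absolute support counts, and the toy data are illustrative assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset contained in
    at least `min_support` transactions (absolute support count).

    Level-wise, Apriori-style search: a k-itemset is counted only if all
    of its (k-1)-subsets are frequent (downward closure of support).
    """
    db = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of `itemset`.
        return sum(1 for t in db if itemset <= t)

    result = {}
    # Level 1: frequent single items.
    level = []
    for item in sorted({i for t in db for i in t}):
        single = frozenset([item])
        s = support(single)
        if s >= min_support:
            result[single] = s
            level.append(single)

    k = 2
    while level:
        prev = set(level)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        level = []
        for cand in candidates:
            # Prune: skip candidates that have an infrequent (k-1)-subset.
            if all(frozenset(sub) in prev for sub in combinations(cand, k - 1)):
                s = support(cand)
                if s >= min_support:
                    result[cand] = s
                    level.append(cand)
        k += 1
    return result


if __name__ == "__main__":
    # Toy database of five transactions (hypothetical data).
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    found = frequent_itemsets(db, min_support=3)
    # Print itemsets ordered by size, then lexicographically.
    for items, sup in sorted(((tuple(sorted(x)), s) for x, s in found.items()),
                             key=lambda p: (len(p[0]), p[0])):
        print(items, sup)
```

With `min_support=3`, the sketch reports six itemsets: a, b, and c (support 4 each) and ab, ac, and bc (support 3 each). The itemset abc occurs in only two transactions and is excluded; this full output is the kind of collection a summary must let an analyst browse.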

For this purpose, we propose building the summary on a novel formulation, the conditional profile (c-profile). The proposed summary has several features: (1) each profile in the summary can be analyzed independently, (2) it provides an error guarantee (it is ε-adequate), and (3) it produces no false positives or false negatives.

Given this formulation, the next challenge is to produce the most concise summary that satisfies the requirement. We also design an algorithm that is both effective and efficient for this task, and we validate the quality of our approach through extensive experiments.

Implementations of the algorithms are available at www.cais.ntu.edu.sg/~vivek/pubs/cprofile09.

Supplementary Material

JPG File (p687-gopalkrishnan.jpg), 9.48 KB

MP4 File (p687-gopalkrishnan.mp4), 205.12 MB



Index Terms

Information systems → Information systems applications → Data mining



Published In


KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

June 2009

1426 pages

ISBN:9781605584959

DOI:10.1145/1557019

General Chairs: John Elder (Elder Research, Inc., USA), Françoise Soulié Fogelman (KXEN, France)

Program Chairs: Peter Flach (University of Bristol, UK), Mohammed Zaki (RPI, USA)

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]


Publisher

Association for Computing Machinery

New York, NY, United States


Conference

KDD09: The 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

June 28 - July 1, 2009

Paris, France

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%


Article Metrics

Total citations: 8

Total downloads: 689

Downloads (last 12 months): 0

Downloads (last 6 weeks): 0

Reflects downloads up to 01 Jan 2025.

Cited By

C. Liu and L. Chen (2016). Summarizing uncertain transaction databases by Probabilistic Tiles. 2016 International Joint Conference on Neural Networks (IJCNN), pages 4375-4382. Online publication date: Jul 2016.
https://doi.org/10.1109/IJCNN.2016.7727771

W. Toh, K. Choi, and L. Wong (2016). Redhyte: Towards a Self-diagnosing, Self-correcting, and Helpful Analytic Platform. Intelligent Information and Database Systems, pages 3-12. Online publication date: 2016.
https://doi.org/10.1007/978-3-662-49390-8_1

G. Liu, H. Zhang, and L. Wong (2014). A Flexible Approach to Finding Representative Pattern Sets. IEEE Transactions on Knowledge and Data Engineering, 26(7):1562-1574. Online publication date: Jul 2014.
https://doi.org/10.1109/TKDE.2013.27

C. Liu, L. Chen, and C. Zhang (2013). Summarizing probabilistic frequent patterns. Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining (R. Grossman, R. Uthurusamy, I. Dhillon, and Y. Koren, eds.), pages 527-535. Online publication date: 11 Aug 2013.
https://dl.acm.org/doi/10.1145/2487575.2487618

G. Liu, H. Zhang, and L. Wong (2012). Finding minimum representative pattern sets. Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (Q. Yang, D. Agarwal, and J. Pei, eds.), pages 51-59. Online publication date: 12 Aug 2012.
https://dl.acm.org/doi/10.1145/2339530.2339543

Z. Deng and X. Xu (2012). Fast mining erasable itemsets using NC_sets. Expert Systems with Applications: An International Journal, 39(4):4453-4463. Online publication date: 1 Mar 2012.
https://dl.acm.org/doi/10.1016/j.eswa.2011.09.143

R. Jin, Y. Xiang, H. Hong, and K. Huang (2010). Block interaction. Proceedings of the ACM SIGKDD Workshop on Useful Patterns (B. Goethals, N. Tatti, and J. Vreeken, eds.), pages 55-64. Online publication date: 25 Jul 2010.
https://dl.acm.org/doi/10.1145/1816112.1816120

C. Carmichael and C. Leung (2010). CloseViz. Proceedings of the ACM SIGKDD Workshop on Useful Patterns (B. Goethals, N. Tatti, and J. Vreeken, eds.), pages 17-26. Online publication date: 25 Jul 2010.
https://dl.acm.org/doi/10.1145/1816112.1816116
