DOI: 10.1145/1557019.1557096
research-article

CP-summary: a concise representation for browsing frequent itemsets

Published: 28 June 2009

Abstract

This paper tackles the problem of summarizing frequent itemsets. We observe that previous notions of summaries cannot be used directly for analyzing frequent itemsets. To be usable for analysis, a summary must let analysts browse all frequent itemsets using only the summary itself.
For this purpose, we propose building the summary on a novel formulation, the conditional profile (c-profile). The proposed summary has several features: (1) each profile in the summary can be analyzed independently, (2) it provides an error guarantee (it is ε-adequate), and (3) it produces no false positives or false negatives.
Given this formulation, the next challenge is to produce the most concise summary that satisfies this requirement. We also design an algorithm that is both effective and efficient for this task. Extensive experiments justify the quality of our approach.
Implementations of the algorithms are available from www.cais.ntu.edu.sg/~vivek/pubs/cprofile09.

    Published In

    KDD '09: Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
    June 2009
    1426 pages
    ISBN:9781605584959
    DOI:10.1145/1557019
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. concise representations
    2. conditional profile
    3. frequent itemset
    4. summarization

    Conference

    KDD09

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
