Discovery in multi-attribute data with user-defined constraints

Published: 01 June 2002

Abstract

There has been growing interest in mining frequent itemsets in relational data with multiple attributes. A key step in this approach is to select one set of attributes that groups data into transactions and a separate set of attributes that labels data as items. Unsupervised and unrestricted mining, however, is stymied by combinatorial complexity and by the sheer quantity of patterns as the number of attributes grows. In this paper, we focus on leveraging the semantics of the underlying data for mining frequent itemsets. For instance, the data schema usually contains taxonomies, and there are usually functional dependencies among the attributes. Domain knowledge and user preferences can often significantly reduce the exponentially growing mining space. These observations motivate the design of a user-directed data mining framework that allows such domain knowledge to guide the mining process and control the mining strategy. We show examples in which using domain knowledge greatly reduces the computation needed to mine relational data with multiple attributes.
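The setup the abstract describes can be illustrated with a minimal sketch. It is not the paper's framework: the event table, the attribute names (`day`, `host`, `site`, `event`), the functional dependency `host -> site`, and all three helper functions are hypothetical, and the level-wise counting is a naive stand-in for a real frequent-itemset algorithm. It shows one attribute set grouping rows into transactions, another labeling rows as items, and a user-declared functional dependency removing a redundant item attribute before mining.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical event table; attribute names and values are invented for illustration.
ROWS = [
    {"day": 1, "host": "h1", "site": "NY", "event": "cpu_high"},
    {"day": 1, "host": "h2", "site": "NY", "event": "disk_full"},
    {"day": 2, "host": "h1", "site": "NY", "event": "cpu_high"},
    {"day": 2, "host": "h2", "site": "NY", "event": "disk_full"},
    {"day": 3, "host": "h1", "site": "NY", "event": "cpu_high"},
    {"day": 3, "host": "h3", "site": "SF", "event": "link_down"},
]


def to_transactions(rows, group_by, item_by):
    """Group rows into transactions by `group_by`; label each row as an item by `item_by`."""
    txns = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in group_by)
        txns[key].add(tuple(row[a] for a in item_by))
    return list(txns.values())


def frequent_itemsets(txns, min_support):
    """Naive level-wise search: count every k-subset of every transaction."""
    result = {}
    k = 1
    while True:
        counts = defaultdict(int)
        for t in txns:
            for combo in combinations(sorted(t), k):
                counts[combo] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        if not level:  # no frequent k-itemsets, so stop
            return result
        result.update(level)
        k += 1


def prune_item_attrs(item_by, fds):
    """Drop an attribute when a user-declared FD says kept attributes already determine it."""
    kept = list(item_by)
    for lhs, rhs in fds:
        if set(lhs) <= set(kept) and rhs in kept:
            kept.remove(rhs)
    return tuple(kept)


# Assumed domain knowledge: each host lives at exactly one site (host -> site).
item_attrs = prune_item_attrs(("host", "site", "event"), [(("host",), "site")])
txns = to_transactions(ROWS, group_by=("day",), item_by=item_attrs)
for itemset, support in frequent_itemsets(txns, min_support=2).items():
    print(support, itemset)
```

In this sketch, dropping `site` does not change which rows co-occur, because `site` is fully determined by `host`. It only removes a redundant attribute combination from the space of configurations that would otherwise be mined.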

Published In

ACM SIGKDD Explorations Newsletter  Volume 4, Issue 1
June 2002
75 pages
ISSN:1931-0145
EISSN:1931-0153
DOI:10.1145/568574

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 June 2002
Published in SIGKDD Volume 4, Issue 1

Author Tags

  1. association rule
  2. domain knowledge
  3. frequent itemset
  4. multi-attribute

Qualifiers

  • Article
