DOI: 10.1145/502512.502526

Empirical Bayes screening for multi-item associations

Published: 26 August 2001

Abstract

This paper considers the framework of the so-called "market basket problem", in which a database of transactions is mined for the occurrence of unusually frequent item sets. In our case, "unusually frequent" involves estimates of the frequency of each item set divided by a baseline frequency computed as if items occurred independently. The focus is on obtaining reliable estimates of this measure of interestingness for all item sets, even item sets with relatively low frequencies. For example, in a medical database of patient histories, unusual item sets including the item "patient death" (or another serious adverse event) should ideally be flagged with as few as 5 or 10 occurrences of the item set; it is unacceptable to require that item sets occur in as many as 0.1% of millions of patient reports before the data mining algorithm detects a signal. Similar considerations apply in fraud detection applications. Thus we abandon the requirement that interesting item sets must have a relatively large fixed minimal support, and adopt a criterion based on the results of fitting an empirical Bayes model to the item set counts. The model allows us to define a 95% Bayesian lower confidence limit for the "interestingness" measure of every item set, whereupon the item sets can be ranked according to their empirical Bayes confidence limits. For item sets of size J > 2, we also distinguish between multi-item associations that can be explained by the observed J(J-1)/2 pairwise associations, and item sets that are significantly more frequent than their pairwise associations would suggest. Such item sets can uncover complex or synergistic mechanisms generating multi-item associations. This methodology has been applied within the U.S. Food and Drug Administration (FDA) to databases of adverse drug reaction reports and within AT&T to customer international calling histories. We also present graphical techniques for exploring and understanding the modeling results.
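To make the screening idea concrete, the following is a minimal Python sketch, not the paper's implementation. It assumes a single gamma prior with fixed, hand-picked hyperparameters (`prior_shape`, `prior_rate`), whereas the paper fits its empirical Bayes model to the item set counts; all function names are hypothetical. For each item set it computes the observed count N, the baseline count E under independence, and the 5th percentile of the gamma posterior for the ratio, which is one reading of the "95% Bayesian lower confidence limit".

```python
import math
from collections import Counter
from itertools import combinations


def expected_count(itemset, item_freq, n_transactions):
    """Baseline count E if the items occurred independently of each other."""
    p = 1.0
    for item in itemset:
        p *= item_freq[item] / n_transactions
    return n_transactions * p


def gamma_cdf(x, shape, rate):
    """CDF of Gamma(shape, rate), via the power series for the
    regularized lower incomplete gamma function."""
    z = rate * x
    if z <= 0:
        return 0.0
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (shape + n)
        total += term
        n += 1
    return math.exp(shape * math.log(z) - z - math.lgamma(shape)) * total


def gamma_quantile(q, shape, rate):
    """Invert gamma_cdf by bisection."""
    lo, hi = 0.0, 1.0
    while gamma_cdf(hi, shape, rate) < q:
        hi *= 2.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if gamma_cdf(mid, shape, rate) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def screen(transactions, size=2, prior_shape=1.0, prior_rate=1.0):
    """Rank item sets of a given size by a lower posterior limit on N/E.

    Assumed model (a simplification): N ~ Poisson(lam * E) and
    lam ~ Gamma(prior_shape, prior_rate), so the posterior of lam is
    Gamma(prior_shape + N, prior_rate + E).
    """
    n = len(transactions)
    item_freq = Counter(item for t in transactions for item in set(t))
    counts = Counter(
        combo
        for t in transactions
        for combo in combinations(sorted(set(t)), size)
    )
    results = []
    for itemset, observed in counts.items():
        expected = expected_count(itemset, item_freq, n)
        lower = gamma_quantile(0.05, prior_shape + observed, prior_rate + expected)
        results.append((itemset, observed, expected, lower))
    # Highest lower limit first: the most credibly "interesting" item sets.
    results.sort(key=lambda r: r[3], reverse=True)
    return results
```

Because the prior pulls the ratio toward moderate values, an item set with very few occurrences gets a low lower limit even when its raw N/E ratio is large, which is the shrinkage behavior the abstract relies on instead of a fixed minimum support.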

Published In

KDD '01: Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
August 2001
493 pages
ISBN: 158113391X
DOI: 10.1145/502512
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Association rules
  2. Data Mining
  3. Knowledge Discovery
  4. Statistical Models
  5. empirical Bayes methods
  6. gamma-Poisson model
  7. market basket problem
  8. shrinkage estimation

Acceptance Rates

KDD '01 Paper Acceptance Rate 31 of 237 submissions, 13%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
