Article

Relative risk and odds ratio: a data mining perspective

Authors:

Yap-Peng Tan

PODS '05: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

Pages 368 - 377

https://doi.org/10.1145/1065167.1065215

Published: 13 June 2005

Abstract

We are often interested in testing whether a given cause has a given effect. If we cannot specify the nature of the factors involved, such tests are called model-free studies. There are two major strategies for demonstrating associations between risk factors (i.e., patterns) and outcome phenotypes (i.e., class labels). The first is that of prospective study designs, where the analysis is based on the concept of "relative risk": what fraction of the exposed (i.e., having the pattern) or unexposed (i.e., lacking the pattern) individuals have the phenotype (i.e., the class label)? The second is that of retrospective designs, where the analysis is based on the concept of "odds ratio": the odds that a case has been exposed to a risk factor are compared to the odds for a case that has not been exposed. The efficient extraction of patterns that have good relative risk and/or odds ratio has not been previously studied in the data mining context. In this paper, we investigate such patterns. We show that this pattern space can be systematically stratified into plateaus of convex spaces based on their support levels. Exploiting convexity, we formulate a number of sound and complete algorithms to extract the most general and the most specific of such patterns at each support level. We compare these algorithms. We further demonstrate that the most efficient among these algorithms is able to mine these sophisticated patterns at a speed comparable to that of mining frequent closed patterns, which are patterns that satisfy considerably simpler conditions.
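The two measures named in the abstract can be illustrated with a minimal sketch. It assumes the standard 2x2 contingency-table definitions from epidemiology (relative risk as the ratio of risks among exposed and unexposed records, odds ratio as the cross-product ratio), not necessarily the exact formulation used in the paper. The function names, the `(items, is_case)` record layout, and the toy data are hypothetical and chosen only for illustration.

```python
from fractions import Fraction

def contingency(records, pattern):
    """Count the 2x2 table (a, b, c, d) for `pattern`.

    records: iterable of (items, is_case) pairs (assumed layout).
    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    A record is "exposed" when it contains every item of the pattern.
    """
    pattern = frozenset(pattern)
    a = b = c = d = 0
    for items, is_case in records:
        exposed = pattern <= set(items)
        if exposed and is_case:
            a += 1
        elif exposed:
            b += 1
        elif is_case:
            c += 1
        else:
            d += 1
    return a, b, c, d

def relative_risk(a, b, c, d):
    """Risk among exposed divided by risk among unexposed; None if undefined."""
    if a + b == 0 or c + d == 0 or c == 0:
        return None
    return Fraction(a, a + b) / Fraction(c, c + d)

def odds_ratio(a, b, c, d):
    """Cross-product ratio (a*d)/(b*c); None if undefined."""
    if b == 0 or c == 0:
        return None
    return Fraction(a * d, b * c)

# Toy data: each record is (set of items, True if it carries the class label).
records = [
    ({"x", "y"}, True),
    ({"x"}, True),
    ({"x", "z"}, True),
    ({"x"}, False),
    ({"y"}, True),
    ({"z"}, False),
    ({"y", "z"}, False),
    ({"y"}, False),
]

table = contingency(records, {"x"})
print(table, relative_risk(*table), odds_ratio(*table))
```

On this toy data the pattern `{"x"}` covers four records, three of them cases, while the four uncovered records contain one case, so the relative risk is (3/4)/(1/4) = 3 and the odds ratio is (3·3)/(1·1) = 9. Exact fractions are used here only to avoid floating-point noise in small examples.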

Published In

PODS '05: Proceedings of the twenty-fourth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems

June 2005

388 pages

ISBN:1595930620

DOI:10.1145/1065167

General Chair: Georg Gottlob, Vienna University of Technology, Austria
Program Chair: Foto Afrati, National Technical University of Athens, Greece

Copyright © 2005 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Conference

SIGMOD/PODS05

SIGMOD/PODS05: International Conference on Management of Data and Symposium on Principles of Database Systems

June 13 - 15, 2005

Baltimore, Maryland

Acceptance Rates

Overall Acceptance Rate 642 of 2,707 submissions, 24%


