research-article

On the appropriate pattern frequentness measure and pattern generation mode: a critical review

Authors:

Bipin C. Desai

IDEAS '19: Proceedings of the 23rd International Database Applications & Engineering Symposium

Article No.: 32, Pages 1 - 15

https://doi.org/10.1145/3331076.3331125

Published: 10 June 2019

Abstract

Classic pattern mining is a fundamental subject in data mining and big data science. Its goal is to correctly find, from a given dataset, the patterns and their respective intrinsic frequentness. This paper examines two important yet misused instruments, the pattern frequentness measure "support" and the full enumeration pattern generation mode, which cause serious overfitting and thus deviate from the mining goal. A theoretical combined solution to these two critical issues is then proposed. This solution, together with the equilibrium condition introduced in this paper, forms a set of three fundamental rationality check criteria that every mining approach should observe. As a result, the rationality of the mining theory and the reliability of the mining results would be substantially improved over previous work. Together, these promise a significant change toward more effective pattern mining.
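For readers unfamiliar with the two instruments named in the abstract, the following is a minimal sketch of the conventional definitions only, not of the solution the paper proposes. It assumes "support" means the fraction of transactions containing a pattern and "full enumeration" means generating every non-empty sub-itemset of every transaction. The toy dataset and the function names `support` and `enumerate_patterns` are hypothetical and chosen purely for illustration.

```python
from itertools import combinations

# Hypothetical toy dataset: each transaction is a set of items.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
]

def support(pattern, transactions):
    """Conventional support: fraction of transactions containing the pattern."""
    pattern = frozenset(pattern)
    return sum(pattern <= t for t in transactions) / len(transactions)

def enumerate_patterns(transactions):
    """Full enumeration: every non-empty subset of every transaction."""
    patterns = set()
    for t in transactions:
        items = sorted(t)
        for k in range(1, len(items) + 1):
            patterns.update(frozenset(c) for c in combinations(items, k))
    return patterns

all_patterns = enumerate_patterns(transactions)
supports = {p: support(p, transactions) for p in all_patterns}

# Each transaction is counted once for every sub-pattern it contains,
# so the supports of the enumerated patterns sum to more than 1.
total_support = sum(supports.values())
```

On this toy data the seven enumerated patterns have supports summing to 4.0, so support under full enumeration does not behave like a probability distribution over patterns. Whether this is the specific effect the paper analyses is an assumption; the abstract states only that the two instruments cause overfitting.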


Cited By

Wang T. (2022). The pattern frequency distribution theory: a mathematic establishment toward rational and reliable pattern mining. International Journal of Data Science and Analytics 16(1), 43-83. Online publication date: 20-Aug-2022.
https://doi.org/10.1007/s41060-022-00340-1

Index Terms

On the appropriate pattern frequentness measure and pattern generation mode: a critical review
1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
  2. Information systems applications
    1. Data mining


Published In


IDEAS '19: Proceedings of the 23rd International Database Applications & Engineering Symposium

June 2019

364 pages

ISBN:9781450362498

DOI:10.1145/3331076

General Chairs: Bipin C. Desai (Concordia University), Dimosthenis Anagnostopoulos (Harokopio University of Athens)

Program Chairs: Yannis Manolopoulos (Open University of Cyprus), Mara Nikolaidou (Harokopio University of Athens)

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

IDEAS 2019

IDEAS 2019: 23rd International Database Engineering & Applications Symposium

June 10 - 12, 2019

Athens, Greece

Acceptance Rates

Overall Acceptance Rate 74 of 210 submissions, 35%
