DOI: 10.1145/3331076.3331125
Research article

On the appropriate pattern frequentness measure and pattern generation mode: a critical review

Published: 10 June 2019

Abstract

Pattern mining, in its classic case, is a fundamental subject in data mining and big data science. The goal of mining is to correctly find, from a given dataset, the patterns and their respective intrinsic frequentness. This paper examines two important yet misused instruments, the pattern frequentness measure "support" and the full-enumeration pattern generation mode, which cause serious overfitting and thus deviate from the mining goal. A combined theoretical solution to these two critical issues is then proposed. Together with the equilibrium condition introduced in this paper, this solution forms a set of three fundamental rationality-check criteria that every mining approach should observe. In this way, the rationality of the mining theory and the reliability of the mining results would be substantially improved over previous work. Taken together, these promise a significant change towards more effective pattern mining.
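To make the two criticized instruments concrete, here is a minimal Python sketch of the conventional setup: the classic support measure (the fraction of transactions that contain a pattern as a subset) combined with full-enumeration pattern generation, in which every non-empty subset of every transaction becomes a candidate pattern. The toy dataset and the function names (`enumerate_subsets`, `support`, `all_supports`) are hypothetical and chosen only for illustration; this sketch shows the conventional approach the abstract questions, not the combined solution the paper proposes.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]


def enumerate_subsets(transaction: Itemset) -> List[Itemset]:
    """Full-enumeration mode: every non-empty subset of a transaction is a candidate pattern."""
    items = sorted(transaction)
    return [frozenset(c) for k in range(1, len(items) + 1) for c in combinations(items, k)]


def support(pattern: Itemset, transactions: List[Itemset]) -> float:
    """Classic support: fraction of transactions that contain the pattern as a subset."""
    return sum(1 for t in transactions if pattern <= t) / len(transactions)


def all_supports(transactions: List[Itemset]) -> Dict[Itemset, float]:
    """Generate patterns by full enumeration, then compute the support of each one."""
    patterns = set()
    for t in transactions:
        patterns.update(enumerate_subsets(t))
    return {p: support(p, transactions) for p in patterns}


if __name__ == "__main__":
    # Hypothetical toy dataset of three transactions, for illustration only.
    db = [frozenset("ab"), frozenset("abc"), frozenset("c")]
    sups = all_supports(db)
    for p, s in sorted(sups.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print("".join(sorted(p)), round(s, 3))
    print("sum of supports:", round(sum(sups.values()), 3))
```

On this three-transaction dataset, full enumeration yields seven patterns (a, b, c, ab, ac, bc, abc) whose supports sum to 11/3, about 3.67, because the single transaction {a, b, c} is counted toward all seven patterns at once. Support values therefore do not behave as a probability distribution over the generated patterns, which is one way to see the kind of anomaly the author tags refer to; the paper's own analysis and proposed measure are not reproduced here.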

      Published In

      IDEAS '19: Proceedings of the 23rd International Database Applications & Engineering Symposium
      June 2019
      364 pages
      ISBN:9781450362498
      DOI:10.1145/3331076
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. data mining
      2. frequentness measure
      3. overfitting
      4. pattern frequency
      5. pattern mining
      6. probability anomaly
      7. selective pattern generation
      8. underfitting

      Conference

      IDEAS 2019

      Acceptance Rates

      Overall Acceptance Rate 74 of 210 submissions, 35%