Conditional discriminative pattern mining

Published: 01 January 2017

Abstract

Discriminative pattern mining discovers significant patterns that occur with disproportionate frequencies across data sets with different class labels. Although many algorithms have been proposed, one redundancy issue remains unresolved: the discriminative power of many patterns derives mainly from their sub-patterns. In this paper, we introduce a novel notion, the conditional discriminative pattern, to address this issue. To mine conditional discriminative patterns, we propose an effective algorithm called CDPM (Conditional Discriminative Patterns Mining), which generates a set of non-redundant discriminative patterns. Experimental results on real data sets demonstrate that CDPM performs very well at removing redundant patterns derived from significant sub-patterns, yielding a concise set of meaningful discriminative patterns.
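The redundancy issue described in the abstract can be illustrated with a small sketch. This is not the CDPM algorithm, whose details are not given here. It assumes, purely for illustration, that discriminative power is measured as the difference in support between two classes, and that a pattern is checked against a sub-pattern by recomputing that difference only on transactions containing the sub-pattern. The function names and the toy data are hypothetical.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]


def support(pattern: FrozenSet[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of the pattern."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if pattern <= t) / len(transactions)


def support_difference(pattern, pos, neg) -> float:
    """Assumed discriminative measure: support in class pos minus support in class neg."""
    return support(pattern, pos) - support(pattern, neg)


def conditional_difference(pattern, sub_pattern, pos, neg) -> float:
    """Support difference of the pattern, restricted to transactions
    that already contain the sub-pattern (an illustrative conditioning step)."""
    pos_c = [t for t in pos if sub_pattern <= t]
    neg_c = [t for t in neg if sub_pattern <= t]
    return support_difference(pattern, pos_c, neg_c)


# Hypothetical toy data: two class-labeled transaction sets.
pos = [frozenset("ab"), frozenset("ab"), frozenset("a"), frozenset("a")]
neg = [frozenset("ab"), frozenset("a"), frozenset("c"), frozenset("c")]

a, ab = frozenset("a"), frozenset("ab")
diff_a = support_difference(a, pos, neg)               # sub-pattern on its own
diff_ab = support_difference(ab, pos, neg)             # super-pattern, unconditioned
cond_ab = conditional_difference(ab, a, pos, neg)      # super-pattern given {a}
```

In this toy data the pattern {a, b} looks discriminative when considered alone, but once the comparison is restricted to transactions containing {a}, the difference disappears. Under the stated assumptions, that is the sense in which its discriminative power derives from the sub-pattern {a}.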

Published In

Information Sciences: an International Journal  Volume 375, Issue C
January 2017
314 pages

Publisher

Elsevier Science Inc.

United States

Author Tags

  1. Contrast sets
  2. Data mining
  3. Discriminative pattern
  4. Emerging pattern
  5. Subgroup discovery
