A Tree-Based Contrast Set-Mining Approach to Detecting Group Differences

Published: 01 May 2014

Abstract

Understanding differences between groups in a data set is one of the fundamental tasks in data analysis. As relevant applications have accumulated, data-mining methods have been developed specifically to address the problem of group difference detection. Contrast set mining discovers group differences in the form of conjunctions of feature-value pairs, or items. In this paper, we incorporate absolute difference, relative difference, and statistical significance into our definition of a group difference, and we develop a novel method named DIFF that uses a prefix-tree structure to compress the search space, follows a tree traversal procedure to discover the complete set of significant group differences, and employs efficient pruning strategies to expedite the search. We conducted comprehensive experiments to compare our method with existing methods on completeness of results, pruning efficiency, and computational efficiency. The experiments demonstrate that our method guarantees completeness of results and achieves higher pruning efficiency and computational efficiency than STUCCO. In addition, our definition of group difference is more general than that of STUCCO. Our method is also more effective than traditional approaches, such as classification trees, in discovering the complete set of significant group differences.
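
The three criteria named in the abstract (absolute difference, relative difference, and statistical significance) can be illustrated with a minimal sketch. This is not the DIFF algorithm and does not reproduce the paper's exact definitions: the function names, the thresholds `min_abs`, `min_ratio`, and `alpha`, and the choice of a 2x2 chi-square test as the significance test are all assumptions made for illustration. A contrast set is modeled as a dictionary of feature-value pairs, and its support in a group is the fraction of rows matching every pair.

```python
from math import erfc, sqrt

def count_matches(rows, contrast_set):
    """Number of rows that match every feature-value pair in contrast_set."""
    return sum(
        all(row.get(feature) == value for feature, value in contrast_set.items())
        for row in rows
    )

def chi_square_p_value(hits_a, n_a, hits_b, n_b):
    """p-value of a 2x2 chi-square test (1 degree of freedom, no correction)."""
    observed = [[hits_a, n_a - hits_a], [hits_b, n_b - hits_b]]
    row_totals = [n_a, n_b]
    col_totals = [hits_a + hits_b, (n_a - hits_a) + (n_b - hits_b)]
    total = n_a + n_b
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            if total == 0 or row_totals[i] == 0 or col_totals[j] == 0:
                return 1.0  # degenerate table: no evidence of a difference
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, the chi-square tail probability is erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2))

def group_difference(group_a, group_b, contrast_set,
                     min_abs=0.1, min_ratio=2.0, alpha=0.05):
    """Evaluate one contrast set against hypothetical thresholds (assumed values)."""
    hits_a = count_matches(group_a, contrast_set)
    hits_b = count_matches(group_b, contrast_set)
    supp_a = hits_a / len(group_a) if group_a else 0.0
    supp_b = hits_b / len(group_b) if group_b else 0.0
    abs_diff = abs(supp_a - supp_b)
    low, high = sorted((supp_a, supp_b))
    if low > 0:
        ratio = high / low
    else:
        ratio = float("inf") if high > 0 else 1.0
    p_value = chi_square_p_value(hits_a, len(group_a), hits_b, len(group_b))
    return {
        "abs_diff": abs_diff,
        "ratio": ratio,
        "p_value": p_value,
        "is_difference": abs_diff >= min_abs and ratio >= min_ratio and p_value < alpha,
    }

# Toy data (fabricated purely for illustration): 8 of 10 rows match in group A,
# 2 of 10 rows match in group B.
group_a = [{"degree": "PhD"}] * 8 + [{"degree": "BSc"}] * 2
group_b = [{"degree": "PhD"}] * 2 + [{"degree": "BSc"}] * 8
result = group_difference(group_a, group_b, {"degree": "PhD"})
```

A full miner would evaluate many candidate contrast sets rather than one; the abstract states that DIFF organizes that search with a prefix tree and pruning, which this sketch does not attempt.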

Published In

INFORMS Journal on Computing, Volume 26, Issue 2
May 2014
215 pages

Publisher

INFORMS
Linthicum, MD, United States

Publication History

Published: 01 May 2014
Accepted: 01 March 2013
Received: 01 August 2010

Author Tags

1. contrast set mining
2. data mining
3. group difference detection
