DOI: 10.1145/2339530.2339674
Research article

Model mining for robust feature selection

Published: 12 August 2012

Abstract

A common problem with most feature selection methods is that they often produce feature sets--models--that are not stable with respect to slight variations in the training data. Several authors have tried to improve feature selection stability using ensemble methods that aggregate different feature sets into a single model. However, existing ensemble feature selection methods suffer from two main shortcomings: (i) the aggregation treats features independently and does not account for their interactions, and (ii) a single feature set is returned, even though in many applications there may be several, potentially redundant, feature sets with similar information content. In this work we address these two limitations. We present a general framework in which we mine over the different feature models produced from a given dataset in order to extract patterns over those models. We use these patterns to derive more complex feature model aggregation strategies that account for feature interactions, and to identify core and distinct feature models. We conduct an extensive experimental evaluation of the proposed framework and demonstrate its effectiveness on a number of high-dimensional problems from the fields of biology and text mining.
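
To make the setting concrete, here is a minimal Python sketch of the general idea the abstract describes: producing many feature models from resamples of a dataset and then mining patterns over the resulting collection of models. It is not the paper's algorithm; the bootstrap resampling, the univariate F-test scorer, the top-k cutoff, and the frequency thresholds are all illustrative assumptions.

```python
# Minimal sketch (not the authors' exact method): build many feature
# "models" from bootstrap resamples, then mine co-selection patterns
# over the collection of models instead of over a single run.
from collections import Counter
from itertools import combinations

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=100,
                           n_informative=10, random_state=0)

n_models, k = 50, 10          # illustrative choices
models = []                   # one selected feature set per resample
for _ in range(n_models):
    idx = rng.integers(0, len(y), size=len(y))        # bootstrap resample
    sel = SelectKBest(f_classif, k=k).fit(X[idx], y[idx])
    models.append(frozenset(np.flatnonzero(sel.get_support())))

# "Mine" the models: how often is each feature selected, and which
# feature pairs are co-selected (a toy stand-in for richer patterns
# that capture feature interactions)?
single = Counter(f for m in models for f in m)
pairs = Counter(p for m in models for p in combinations(sorted(m), 2))

core = sorted(f for f, c in single.items() if c / n_models >= 0.6)
frequent_pairs = [p for p, c in pairs.items() if c / n_models >= 0.5]
print("core features:", core)
print("frequently co-selected pairs:", frequent_pairs[:10])
```

Counting individual and pairwise selection frequencies is only a toy stand-in for the richer pattern mining the paper proposes, but it illustrates how core features and interacting feature groups can be read off a collection of feature models rather than off a single run of a selector.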

Supplementary Material

JPG File (311a_t_talk_9.jpg)
MP4 File (311a_t_talk_9.mp4)

Published In

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2012
1616 pages
ISBN: 9781450314626
DOI: 10.1145/2339530

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. classification
  2. feature selection
  3. high-dimensional data
  4. model mining
  5. stability

Qualifiers

  • Research-article

Conference

KDD '12
Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
