DOI: 10.1145/2339530.2339674
Research article

Model mining for robust feature selection

Published: 12 August 2012

Abstract

A common problem with most feature selection methods is that they often produce feature sets--models--that are not stable with respect to slight variations in the training data. Several authors have tried to improve feature selection stability using ensemble methods that aggregate different feature sets into a single model. However, existing ensemble feature selection methods suffer from two main shortcomings: (i) the aggregation treats features independently and does not account for their interactions, and (ii) a single feature set is returned, even though in many applications there may be several, potentially redundant, feature sets with similar information content. In this work we address these two limitations. We present a general framework in which we mine over the different feature models produced from a given dataset in order to extract patterns over those models. We use these patterns to derive more complex feature model aggregation strategies that account for feature interactions, and to identify core and distinct feature models. We conduct an extensive experimental evaluation of the proposed framework and demonstrate its effectiveness on a number of high-dimensional problems from the fields of biology and text mining.
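
To make the setting concrete, here is a minimal Python sketch of the general idea the abstract describes: producing many feature models from resamples of a dataset and then mining patterns over the resulting collection of models. It is not the paper's algorithm; the bootstrap resampling, the univariate F-test scorer, the top-k cutoff, and the frequency thresholds are all illustrative assumptions.

```python
# Minimal sketch (not the authors' exact method): build many feature
# "models" from bootstrap resamples, then mine co-selection patterns
# over the collection of models instead of over a single run.
from collections import Counter
from itertools import combinations

import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=100,
                           n_informative=10, random_state=0)

n_models, k = 50, 10          # illustrative choices
models = []                   # one selected feature set per resample
for _ in range(n_models):
    idx = rng.integers(0, len(y), size=len(y))        # bootstrap resample
    sel = SelectKBest(f_classif, k=k).fit(X[idx], y[idx])
    models.append(frozenset(np.flatnonzero(sel.get_support())))

# "Mine" the models: how often is each feature selected, and which
# feature pairs are co-selected (a toy stand-in for richer patterns
# that capture feature interactions)?
single = Counter(f for m in models for f in m)
pairs = Counter(p for m in models for p in combinations(sorted(m), 2))

core = sorted(f for f, c in single.items() if c / n_models >= 0.6)
frequent_pairs = [p for p, c in pairs.items() if c / n_models >= 0.5]
print("core features:", core)
print("frequently co-selected pairs:", frequent_pairs[:10])
```

Counting individual and pairwise selection frequencies is only a toy stand-in for the richer pattern mining the paper proposes, but it illustrates how core features and interacting feature groups can be read off a collection of feature models rather than off a single run of a selector.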

Supplementary Material

JPG File (311a_t_talk_9.jpg)
MP4 File (311a_t_talk_9.mp4)

Published In

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2012
1616 pages
ISBN: 9781450314626
DOI: 10.1145/2339530

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. classification
  2. feature selection
  3. high-dimensional data
  4. model mining
  5. stability

Qualifiers

  • Research-article

Conference

KDD '12
Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
