research-article

Towards efficient and effective discovery of Markov blankets for feature selection

Published: 01 January 2020

Abstract

The Markov blanket (MB), a key concept in a Bayesian network (BN), is essential for large-scale BN structure learning and optimal feature selection. Many MB discovery algorithms that are either efficient or effective have been proposed for addressing high-dimensional data. In this paper, we propose a new algorithm for Efficient and Effective MB discovery, called EEMB. Specifically, given a target feature, the EEMB algorithm discovers the PC (i.e., parents and children) and spouses of the target simultaneously and can distinguish PC from spouses during MB discovery. We compare EEMB with the state-of-the-art MB discovery algorithms using a series of benchmark BNs and real-world datasets. The experiments demonstrate that EEMB is competitive with the fastest MB discovery algorithm in terms of computational efficiency and achieves almost the same MB discovery accuracy as the most accurate of the compared algorithms.
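To make the terms in the abstract concrete, the following minimal sketch reads the PC set and the spouses of a target off a DAG whose structure is already known. It is not the EEMB algorithm, which discovers the MB from data; the function name `markov_blanket`, the parent-map representation, and the toy graph are all assumptions made for illustration only.

```python
from typing import Dict, Set, Tuple


def markov_blanket(parents: Dict[str, Set[str]], target: str) -> Tuple[Set[str], Set[str]]:
    """Return (pc, spouses) of `target` in a known DAG.

    `parents` maps each node to the set of its parents (hypothetical
    representation chosen for this sketch).
    """
    # Parents of the target.
    pa = set(parents.get(target, set()))
    # Children: every node that lists the target among its parents.
    children = {node for node, ps in parents.items() if target in ps}
    pc = pa | children

    # Spouses: other parents of the target's children.
    spouses: Set[str] = set()
    for child in children:
        spouses |= parents[child]
    # Drop the target itself, and drop nodes already counted in PC so
    # that the two returned sets are disjoint.
    spouses -= {target}
    spouses -= pc
    return pc, spouses


# Toy DAG (illustrative only): A -> T, T -> C, B -> C, C -> D
toy_parents = {
    "A": set(),
    "B": set(),
    "T": {"A"},
    "C": {"T", "B"},
    "D": {"C"},
}

pc, spouses = markov_blanket(toy_parents, "T")
mb = pc | spouses
print("PC:", sorted(pc))
print("Spouses:", sorted(spouses))
print("MB:", sorted(mb))
```

In this toy graph, D is a descendant of the target but lies outside its MB, while B enters the MB only as a spouse through the shared child C.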



Published In

Information Sciences: an International Journal  Volume 509, Issue C
Jan 2020
530 pages

Publisher

Elsevier Science Inc.

United States

Author Tags

  1. Markov blanket
  2. Bayesian network
  3. Feature selection
