An effective genetic algorithm-based feature selection method for intrusion detection systems

Published: 01 November 2021

Abstract

The availability of suitable, validated data is a key issue when applying machine learning methods in many domains, and high data dimensionality degrades the performance of learning algorithms. This work aims to design a method that preserves most of the unique information in the data using a minimum number of features. Addressing the feature selection problem in the domain of network security and intrusion detection, this work contributes an enhanced Genetic Algorithm (GA)-based feature selection method, named GA-based Feature Selection (GbFS), to increase classifier accuracy. Securing a network against cyber-attacks is a critical task, and existing defenses need to be strengthened. Machine learning, owing to its proven results, is widely used in developing firewalls and Intrusion Detection Systems (IDSs) that identify new kinds of attacks. Using machine learning algorithms, an IDS can detect intruders by analyzing the network traffic passing through it. This work presents parameter tuning for GA-based feature selection along with a novel fitness function. The enhanced method is tested on three benchmark network traffic datasets, namely CIRA-CIC-DOHBrw-2020, UNSW-NB15, and Bot-IoT, and is compared with standard feature selection methods. Results show that GbFS improves classification accuracy, reaching a maximum of 99.80%.
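The general idea of GA-based feature selection can be illustrated with a minimal sketch. It is not the GbFS method itself: the abstract does not specify the fitness function, operators, or tuned parameters, so everything below is an assumption made for illustration. The sketch encodes each candidate feature subset as a bit mask, scores it with a hypothetical fitness that weights hold-out accuracy against subset size (the weight `alpha`), and evolves the population with tournament selection, single-point crossover, bit-flip mutation, and elitism. A toy nearest-centroid classifier and synthetic data stand in for the real classifiers and the network traffic datasets.

```python
import random
from functools import lru_cache

def make_data(n=200, n_features=10, n_informative=3, seed=0):
    """Synthetic two-class data (assumption): only the first columns carry signal."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([rng.gauss(2.0 * label if j < n_informative else 0.0, 1.0)
                  for j in range(n_features)])
        y.append(label)
    return X, y

def make_fitness(X, y, alpha=0.9, split=0.7):
    """Build a hypothetical fitness: alpha * accuracy + (1 - alpha) * sparsity."""
    cut = int(len(X) * split)
    X_tr, y_tr, X_te, y_te = X[:cut], y[:cut], X[cut:], y[cut:]
    n_features = len(X[0])

    @lru_cache(maxsize=None)  # masks are tuples, so repeated subsets are scored once
    def fitness(mask):
        cols = [j for j, bit in enumerate(mask) if bit]
        if not cols:
            return 0.0  # an empty subset cannot classify anything
        # Nearest-centroid classifier restricted to the selected columns.
        centroids = {}
        for label in set(y_tr):
            rows = [x for x, t in zip(X_tr, y_tr) if t == label]
            centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
        correct = 0
        for x, t in zip(X_te, y_te):
            pred = min(centroids, key=lambda c: sum(
                (x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))
            correct += (pred == t)
        accuracy = correct / len(y_te)
        return alpha * accuracy + (1 - alpha) * (1 - len(cols) / n_features)

    return fitness

def run_ga(fitness, n_features, pop_size=20, generations=15,
           cx_rate=0.8, mut_rate=0.05, seed=1):
    """Evolve bit masks; all parameter values here are illustrative, not tuned."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(n_features))
           for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        nxt = [best]  # elitism: the best mask always survives
        while len(nxt) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            p1 = max(rng.sample(pop, 3), key=fitness)
            p2 = max(rng.sample(pop, 3), key=fitness)
            child = p1
            if rng.random() < cx_rate:  # single-point crossover
                cut = rng.randint(1, n_features - 1)
                child = p1[:cut] + p2[cut:]
            # Bit-flip mutation, applied independently to each gene.
            child = tuple(1 - b if rng.random() < mut_rate else b for b in child)
            nxt.append(child)
        pop = nxt
        best = max(pop, key=fitness)
    return best

if __name__ == "__main__":
    X, y = make_data()
    fitness = make_fitness(X, y)
    best = run_ga(fitness, n_features=len(X[0]))
    print("selected columns:", [j for j, b in enumerate(best) if b])
    print("fitness:", round(fitness(best), 4))
```

Because of elitism, the best fitness never decreases from one generation to the next. In practice the toy classifier would be replaced by the classifier under study, and the hold-out split by cross-validation, which makes each fitness evaluation considerably more expensive.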

Published In

Computers and Security, Volume 110, Issue C
Nov 2021
504 pages

Publisher

Elsevier Advanced Technology Publications
United Kingdom

Author Tags

1. Feature selection
2. Genetic algorithm
3. Intrusion detection
4. Machine learning
5. Data analysis

Qualifiers

• Research-article
