CN113381996B - C & C communication attack detection method based on machine learning - Google Patents

C & C communication attack detection method based on machine learning

Info

Publication number
CN113381996B
Authority
CN
China
Prior art keywords
flow
machine learning
communication
packet
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110637965.1A
Other languages
Chinese (zh)
Other versions
CN113381996A (en)
Inventor
黄丽荣
陈耿生
蔡悦贞
戴宏鹏
黄嘉诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Fufu Information Technology Co Ltd
Original Assignee
China Telecom Fufu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Fufu Information Technology Co Ltd
Priority to CN202110637965.1A
Publication of CN113381996A
Application granted
Publication of CN113381996B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425: Traffic logging, e.g. anomaly detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55: Detecting local intrusion or implementing counter-measures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/50: Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a machine-learning-based method for detecting C & C communication attacks, comprising the following steps: obtaining continuous downlink traffic packets and filtering them so that the distribution of packet lengths is normal; performing session aggregation on the traffic packets according to specified conditions; extracting session traffic features using random cluster sampling and the Apriori algorithm; and calculating the similarity of the aggregated traffic context data using sequence similarity detection that combines edit distance with the Longest Common Subsequence (LCS). The invention can detect undiscovered malware communication without relying on a feature library; when a large number of attack traffic samples are examined, the detection time complexity is low and the detection time is short.

Description

C & C communication attack detection method based on machine learning
Technical Field
The invention relates to the technical field of communication security, in particular to a C & C communication attack detection method based on machine learning.
Background
At present, C & C communication detection takes three main forms: statistical feature detection based on traffic packets, signature detection based on the traffic payload, and supervised machine learning detection based on known malware.
The prior art has several shortcomings in detecting C & C communication attacks. First, existing methods are weak at detecting unpublished or undiscovered malware. Second, their detection effect depends heavily on a feature library, which is not comprehensive. Finally, as the network scenarios of normal users diversify, the traffic attributes of normal users can easily resemble those of malicious traffic; for example, when traffic is judged by packet size and inter-arrival time, the communication of some existing chat software is likely to show characteristics similar to those of malware. The conventional methods are therefore limited in detection accuracy and detection effect for C & C communication.
Each of the three approaches has its own drawback. With statistical feature detection based on traffic packets, malware communication varies with network congestion, and as normal network application scenarios multiply, the statistical features of normal and malicious traffic easily become similar, so the false alarm rate is high. Signature detection based on the traffic payload works well on known malware, but detection fails when a variant changes its signature. Supervised machine learning detection learns mainly from the traffic characteristics of known malware, so its detection effect depends heavily on the coverage of the training set and the soundness of the learning method.
Disclosure of Invention
The invention aims to provide a C & C communication attack detection method based on machine learning.
The technical scheme adopted by the invention is as follows:
The C & C communication attack detection method based on machine learning comprises the following steps:
Step 1, traffic packet filtering: continuous downlink traffic packets are obtained and filtered so that the distribution of packet lengths is normal;
Step 2, traffic session aggregation: session aggregation is performed on the traffic packets according to specified conditions;
Step 3, session traffic features are extracted using random cluster sampling and the Apriori algorithm;
Step 4, the similarity of the aggregated traffic context data is calculated using sequence similarity detection that combines edit distance with the Longest Common Subsequence (LCS);
Step 5, whether C & C communication is abnormal is judged according to whether the context similarity of the session's downlink traffic exceeds a set value.
Further, as a preferred embodiment, step 1 sets a filtering threshold according to the normal distribution of traffic packet lengths and filters out part of the irrelevant traffic.
Further, as a preferred embodiment, step 1 calculates the critical packet length for small traffic packets by setting a packet filtering rate; the final filtering packet length is determined by combining the normal distribution estimate with the set threshold.
Further, as a preferred embodiment, in step 2 session aggregation is performed according to the source address, the source port, the destination address or the destination port.
Further, as a preferred embodiment, when the volume of data processed in step 3 is too large, probability sampling is performed with a reservoir sampling algorithm.
Further, as a preferred embodiment, in step 4 the edit distance is first calculated for each sequence pair, pairs with large distance values are filtered out according to the result, and LCS is then calculated for the remaining pairs.
With the above technical scheme, the invention filters, samples and aggregates network traffic, then checks the context similarity of the aggregated session traffic data to detect whether malware communication is present. The invention has the following advantages: 1. Undiscovered malware traffic can be detected without relying on a feature library. 2. It differs from existing supervised machine learning methods for malware, which learn mainly from the traffic characteristics of known malware and whose detection effect depends heavily on the coverage of the training set and the soundness of the learning method. 3. In C & C communication detection, the similarity detection algorithm based on the downlink payload achieves higher accuracy and recall than traffic packet detection and payload signature detection, and it also has an advantage in detection time: especially when a large number of attack traffic samples are examined, the detection time complexity is lower and the detection time is shorter.
Drawings
The invention is described in further detail below with reference to the drawings and the detailed description.
Fig. 1 is a flow chart of the machine-learning-based C & C communication attack detection method of the invention.
Detailed Description
To make the purposes, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the drawings.
As shown in Fig. 1, the invention discloses a machine-learning-based method for detecting C & C communication attacks, comprising the following steps:
Step 1, traffic packet filtering: continuous downlink traffic packets are acquired. Traffic volumes in current network environments keep growing, while the downlink traffic packets of malware are mostly small. To avoid wasting resources on meaningless analysis and detection of irrelevant traffic, the traffic packets are filtered so that the distribution of packet lengths is normal.
Further, as a preferred embodiment, step 1 sets a filtering threshold according to the normal distribution of traffic packet lengths and filters out part of the irrelevant traffic. Specifically, a packet filtering rate is set to calculate the critical packet length for small traffic packets, and the final filtering packet length is determined by combining the normal distribution estimate with the set threshold.
Step 2, traffic session aggregation: session aggregation is performed on the traffic packets according to specified conditions;
Step 3, session traffic features are extracted using random cluster sampling and the Apriori algorithm;
Step 4, the similarity of the aggregated traffic context data is calculated using sequence similarity detection that combines edit distance with the Longest Common Subsequence (LCS);
Step 5, whether C & C communication is abnormal is judged according to whether the context similarity of the session's downlink traffic exceeds a set value.
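The packet-length filtering of step 1 can be illustrated with a minimal Python sketch. The patent does not give a formula, so everything here is an assumption made for illustration: the function names, the reading of the "packet filtering rate" as the fraction of (larger) packets to discard, and the choice to combine the normal-distribution estimate with a fixed threshold by taking the smaller of the two.

```python
from statistics import NormalDist, fmean, pstdev


def length_cutoff(lengths, filter_rate, max_cutoff):
    """Estimate the packet length at or below which packets are kept.

    Assumptions (not from the patent): `filter_rate` is the fraction of
    packets to discard, and the "comprehensive calculation" is modelled as
    the minimum of the normal-distribution estimate and a fixed threshold.
    """
    mu = fmean(lengths)
    sigma = pstdev(lengths)
    if sigma == 0:
        # All packets have the same length; the quantile is that length.
        return min(mu, max_cutoff)
    # Length below which a fraction (1 - filter_rate) of packets falls,
    # if packet lengths were normally distributed.
    estimated = NormalDist(mu, sigma).inv_cdf(1.0 - filter_rate)
    return min(estimated, max_cutoff)


def filter_small_packets(payloads, filter_rate=0.5, max_cutoff=512):
    """Keep only the small packets, which step 1 treats as relevant."""
    cutoff = length_cutoff([len(p) for p in payloads], filter_rate, max_cutoff)
    return [p for p in payloads if len(p) <= cutoff]
```

A real deployment would need to validate that the observed length distribution is close enough to normal for the quantile estimate to be meaningful; the fixed `max_cutoff` acts as a guard when it is not.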
Further, as a preferred embodiment, in step 2 session aggregation is performed according to the source address, the source port, the destination address or the destination port.
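As a rough sketch of the session aggregation in step 2, packets can be grouped by a key built from the address and port fields. The `Packet` type, its field names and the default use of all four fields as the key are hypothetical; the patent only names the fields and leaves the exact grouping condition open.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical minimal packet record used only for this sketch."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: bytes


def aggregate_sessions(packets,
                       key_fields=("src_ip", "src_port", "dst_ip", "dst_port")):
    """Group payloads into sessions keyed by the chosen header fields.

    Arrival order is preserved within each session so that later steps can
    compare consecutive downlink payloads.
    """
    sessions = defaultdict(list)
    for pkt in packets:
        key = tuple(getattr(pkt, field) for field in key_fields)
        sessions[key].append(pkt.payload)
    return dict(sessions)
```

Passing a shorter `key_fields` tuple, for example `("dst_ip",)`, aggregates on a single field instead of the full 4-tuple.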
Further, as a preferred embodiment, sampling in step 3 means drawing from the population, through a sampling algorithm, a sample that can represent the population. The invention predicts the features of the whole by examining the features of the extracted samples, in order to detect content similarity in the payloads of continuous downlink traffic. Considering that similar content may occur consecutively during an actual attack, a random cluster sampling algorithm is adopted; if the volume of data to be processed is too large, a reservoir sampling algorithm can be used for probability sampling.
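The two sampling strategies mentioned for step 3 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: clusters are taken to be fixed-size runs of consecutive items, and the reservoir sampler is the textbook single-pass variant (Algorithm R). All function and parameter names are hypothetical.

```python
import random


def cluster_sample(items, cluster_size, n_clusters, rng=None):
    """Random cluster sampling: pick whole runs of consecutive items.

    Keeping each run intact preserves the adjacency of payloads, which
    matters when similar content tends to arrive back to back.
    """
    rng = rng or random.Random()
    clusters = [items[i:i + cluster_size]
                for i in range(0, len(items), cluster_size)]
    chosen = rng.sample(range(len(clusters)), min(n_clusters, len(clusters)))
    return [item for idx in sorted(chosen) for item in clusters[idx]]


def reservoir_sample(stream, k, rng=None):
    """Reservoir sampling (Algorithm R): k items, uniformly, in one pass.

    Useful when the stream is too large to hold in memory or its length
    is not known in advance.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

Cluster sampling needs the data in an indexable sequence, while the reservoir sampler accepts any iterable, which is why it suits the large-volume case.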
Further, as a preferred embodiment, in step 4 the edit distance is first calculated for each sequence pair, pairs with large distance values are filtered out according to the result, and LCS is then calculated for the remaining pairs.
Specifically, sequence similarity detection on downlink traffic packets combines two algorithms: solving the Longest Common Subsequence (LCS) of two sequences and calculating their edit distance. LCS determines the similarity of two sequences from the length of their longest common subsequence, which is typically found with a dynamic programming algorithm. Edit distance, also known as Levenshtein distance, is the minimum number of edits required to convert one string into another, where an edit replaces one character with another, inserts a character, or deletes a character.
Because edit distance has low computational time complexity, it is used first to remove irrelevant sequence pairs; because LCS measures similarity more accurately, applying it afterwards makes the detection result more credible.
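The two-stage comparison of step 4 and the threshold decision of step 5 can be sketched in Python as below. The normalisation by the longer sequence length, the `max_distance_ratio` pre-filter value, the averaging over consecutive payload pairs and the `similarity_threshold` default are all assumptions chosen for illustration; the patent only states that distant pairs are filtered first and that similarity is compared with a set value.

```python
def edit_distance(a, b):
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,             # delete x
                            curr[j - 1] + 1,         # insert y
                            prev[j - 1] + (x != y)))  # substitute
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence via dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y
                        else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def context_similarity(a, b, max_distance_ratio=0.5):
    """Edit-distance pre-filter followed by an LCS-based similarity score."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    if edit_distance(a, b) / longest > max_distance_ratio:
        return 0.0  # pair filtered out as unrelated; LCS is skipped
    return lcs_length(a, b) / longest


def is_suspicious(payloads, similarity_threshold=0.8):
    """Flag a session whose consecutive payloads are, on average, similar."""
    pairs = list(zip(payloads, payloads[1:]))
    if not pairs:
        return False
    mean = sum(context_similarity(a, b) for a, b in pairs) / len(pairs)
    return mean > similarity_threshold
```

For example, three payloads such as `b"cmd:run;id=1"`, `b"cmd:run;id=2"` and `b"cmd:run;id=3"` differ by one byte each, so every consecutive pair scores 11/12 and the session is flagged under the default threshold.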
It will be apparent that the embodiments described are some, but not all, of the embodiments of the present application. Embodiments and features of embodiments in this application may be combined with each other without conflict. The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the detailed description of the embodiments of the present application is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.

Claims (4)

1. A C & C communication attack detection method based on machine learning, characterized by comprising the following steps:
step 1, traffic packet filtering: continuous downlink traffic packets are obtained and filtered so that the distribution of packet lengths is normal;
step 2, traffic session aggregation: session aggregation is performed on the traffic packets according to specified conditions;
step 3, session traffic features are extracted using random cluster sampling and the Apriori algorithm;
step 4, the similarity of the aggregated traffic context data is calculated using sequence similarity detection that combines edit distance with the longest common subsequence;
step 5, whether C & C communication is abnormal is judged according to whether the context similarity of the session's downlink traffic exceeds a set value.
2. The machine learning based C & C communication attack detection method of claim 1, wherein: step 1 sets a filtering threshold according to the normal distribution of traffic packet lengths and filters out part of the irrelevant traffic; a packet filtering rate is set to calculate the critical packet length for small traffic packets, and the final filtering packet length is determined by combining the normal distribution estimate with the set threshold.
3. The machine learning based C & C communication attack detection method of claim 1, wherein: in step 2, session aggregation is performed according to the source address, the source port, the destination address or the destination port.
4. The machine learning based C & C communication attack detection method of claim 1, wherein: when the volume of data processed in step 3 is too large, probability sampling is performed with a reservoir sampling algorithm.
CN202110637965.1A 2021-06-08 2021-06-08 C & C communication attack detection method based on machine learning Active CN113381996B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110637965.1A CN113381996B (en) 2021-06-08 2021-06-08 C & C communication attack detection method based on machine learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110637965.1A CN113381996B (en) 2021-06-08 2021-06-08 C & C communication attack detection method based on machine learning

Publications (2)

Publication Number Publication Date
CN113381996A (en) 2021-09-10
CN113381996B (en) 2023-04-28

Family

ID=77576530

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110637965.1A Active CN113381996B (en) 2021-06-08 2021-06-08 C & C communication attack detection method based on machine learning

Country Status (1)

Country Link
CN (1) CN113381996B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104683346A (en) * 2015-03-06 2015-06-03 西安电子科技大学 P2P botnet detection device and method based on flow analysis

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030236995A1 (en) * 2002-06-21 2003-12-25 Fretwell Lyman Jefferson Method and apparatus for facilitating detection of network intrusion
US20100138919A1 (en) * 2006-11-03 2010-06-03 Tao Peng System and process for detecting anomalous network traffic
US9083741B2 (en) * 2011-12-29 2015-07-14 Architecture Technology Corporation Network defense system and framework for detecting and geolocating botnet cyber attacks
CN103297433B (en) * 2013-05-29 2016-03-30 中国科学院计算技术研究所 The HTTP Botnet detection method of data flow Network Based and system
US9870465B1 (en) * 2013-12-04 2018-01-16 Plentyoffish Media Ulc Apparatus, method and article to facilitate automatic detection and removal of fraudulent user information in a network environment
CN103746982B (en) * 2013-12-30 2017-05-31 中国科学院计算技术研究所 A kind of http network condition code automatic generation method and its system
CN106034056B (en) * 2015-03-18 2020-04-24 北京启明星辰信息安全技术有限公司 Method and system for analyzing business safety
US10230739B2 (en) * 2015-06-26 2019-03-12 Board Of Regents, The University Of Texas System System and device for preventing attacks in real-time networked environments
CN106101121B (en) * 2016-06-30 2019-01-22 中国人民解放军防空兵学院 A kind of all-network flow abnormity abstracting method
KR102088299B1 (en) * 2016-11-10 2020-04-23 한국전자통신연구원 Apparatus and method for detecting drdos
CN107665191B (en) * 2017-10-19 2020-08-04 中国人民解放军陆军工程大学 Private protocol message format inference method based on extended prefix tree
CN107733937A (en) * 2017-12-01 2018-02-23 广东奥飞数据科技股份有限公司 A kind of Abnormal network traffic detection method
CN108965248B (en) * 2018-06-04 2021-08-20 上海交通大学 P2P botnet detection system and method based on traffic analysis

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
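牛伟纳; 张小松; 孙恩博; 杨国武; 赵凌园. Two-stage P2P botnet detection method based on flow similarity. Journal of University of Electronic Science and Technology of China. 2017, (06), full text. *
苏欣; 张大方; 罗章琪; 曾彬; 黎文伟. Botnet detection method based on clustering of Command and Control communication channel traffic attributes. Journal of Electronics & Information Technology. 2012, (08), full text. *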

Also Published As

Publication number Publication date
CN113381996A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
CN110519290B (en) Abnormal flow detection method and device and electronic equipment
CN107154950B (en) Method and system for detecting log stream abnormity
CN111935170A (en) Network abnormal flow detection method, device and equipment
CN111355697B (en) Detection method, device, equipment and storage medium for botnet domain name family
CN112887159B (en) Statistical alarm method and device
EP3905084A1 (en) Method and device for detecting malware
CN107666468B (en) Network security detection method and device
CN110798426A (en) Method and system for detecting flood DoS attack behavior and related components
CN113645182B (en) Denial of service attack random forest detection method based on secondary feature screening
CN109660517B (en) Abnormal behavior detection method, device and equipment
CN111654482B (en) Abnormal flow detection method, device, equipment and medium
CN110224852A (en) Network security monitoring method and device based on HTM algorithm
CN111224984B (en) Snort improvement method based on data mining algorithm
CN110086829B (en) Method for detecting abnormal behaviors of Internet of things based on machine learning technology
CN113381996B (en) C & C communication attack detection method based on machine learning
CN113037748A (en) C and C channel hybrid detection method and system
CN112953948A (en) Real-time network transverse worm attack flow detection method and device
CN112235242A (en) C & C channel detection method and system
CN115102758B (en) Method, device, equipment and storage medium for detecting abnormal network flow
CN116405261A (en) Malicious flow detection method, system and storage medium based on deep learning
CN112615713A (en) Detection method and device of hidden channel, readable storage medium and electronic equipment
CN114448699B (en) Data detection method, device, electronic equipment and storage medium
KR20020024508A (en) An Anomaly Detection Method for Network Intrusion Detection
CN115297189B (en) Method and system for reversely analyzing man-machine cooperation fast industrial control protocol
CN111314170B (en) Feature fuzzy P2P protocol identification method based on connection statistical rule analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant