CN110365603A - A self-adaptive network traffic classification method based on 5G network capability exposure

Publication number
CN110365603A
Authority
CN
China
Legal status
Pending
Application number
CN201910579744.6A
Other languages
Chinese (zh)
Inventor
曲桦
赵季红
都鹏飞
段喆琳
崔若星
徐阳
Current Assignee
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Application filed by Xian Jiaotong University
Priority to CN201910579744.6A
Publication of CN110365603A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/022 Capturing of monitoring data by sampling
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2441 Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses a self-adaptive network traffic classification method based on 5G network capability exposure, comprising the following steps: 1) construct the overall data, then extract feature vectors from the overall data; 2) take each feature vector extracted in step 1) as a data sample, compute initial cluster centers from the known class information of the labeled data samples to optimize the k-means algorithm, obtain several k-means center points, and use them to construct an initial center point set M; 3) perform k-means clustering on the network flows using the initial center point set M, and obtain the clustering result from the k clusters and k cluster centers according to an evaluation function; 4) count the labeled network flows in each cluster and classify the network flows according to that count, thereby realizing self-adaptive network traffic classification based on 5G network capability exposure. The method has a short modeling time, low time and space complexity, and a wide range of application.

Description

A self-adaptive network traffic classification method based on 5G network capability exposure
Technical field
The invention belongs to the field of network information and relates to a self-adaptive network traffic classification method based on 5G network capability exposure.
Background art
In recent years the capability exposure market has shown great potential and strong customer demand, and the opening and integration of communication capabilities have become a hot spot for operators' future growth and a focus of 5G network development. The 5G stage marks the shift from network enablement to service enablement: network capabilities are invoked more widely and deeply, and the types and scope of exposed capabilities are greater. However, 5G capability exposure is still at the research and exploration stage at both the standards level and the service-demand level, and survey feedback indicates that product research and development for 5G capability exposure has not yet been carried out in depth. It is therefore highly desirable to carry out research on standards and evolution strategies, studying the development strategy of 5G network capability exposure in depth and thereby promoting the smooth evolution of the service capability platform.
In today's complex network environment, research on network traffic classification is especially important for supervising and controlling network traffic effectively, allocating network bandwidth reasonably, and guaranteeing safe and reliable transmission of network information. At the same time, compared with traditional 3G or 4G networks, a large number of new applications appear in 5G networks, and the unknown-protocol traffic they bring makes the composition of network traffic more complex. According to statistics, network flows belonging to new applications account for 60% of unidentified network flows and 30% of unidentified bits in current networks. If a classifier does not handle these new unknown-protocol flows, the overall accuracy of network traffic classification suffers seriously. The many new applications pose technical problems for traditional traffic classification methods, and new techniques are needed to improve the existing classification schemes and adapt them to today's complex network environment.
At present there are four main kinds of network traffic classification methods: port-based traffic classification, traffic classification based on deep packet inspection (Deep Packet Inspection, DPI), traffic classification using machine learning (Machine Learning, ML) on statistical flow features, and traffic classification based on user behavior features.
Traffic classification technology based on port
In the early days of the Internet, the types and volume of traffic in the network were relatively small. The Internet Assigned Numbers Authority (IANA) assigned fixed port numbers to some common network protocols, so in early traffic classification the application protocol of a flow could be determined from the source and destination port numbers of its packets.
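The port lookup described above can be sketched in a few lines. This is a minimal illustration, not part of the patent: the table `WELL_KNOWN_PORTS` is a small hand-picked excerpt of well-known port assignments, and the function name `classify_by_port` is hypothetical.

```python
# Illustrative excerpt of well-known port assignments (not a complete registry).
WELL_KNOWN_PORTS = {
    20: "ftp-data", 21: "ftp", 22: "ssh", 25: "smtp",
    53: "dns", 80: "http", 110: "pop3", 443: "https",
}

def classify_by_port(src_port: int, dst_port: int) -> str:
    """Return the protocol registered for either port, or 'unknown'."""
    # Check the destination port first: servers usually listen on the registered port.
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "unknown"

print(classify_by_port(51514, 443))    # https
print(classify_by_port(40000, 40001))  # unknown
```

A flow between two unregistered ports comes back as "unknown", which is exactly the case that grows as applications choose ports dynamically.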
Traffic classification technology based on deep packet inspection
The payload of a data packet contains a great deal of information, and DPI classifies traffic using this information. DPI-based traffic classification relies on signatures of specific protocols or applications: the payload data in a network flow is matched against these signatures to obtain the class of the traffic.
Although DPI-based classification has high accuracy, it also has disadvantages: it consumes considerable computing resources, it classifies encrypted data poorly, application signatures are difficult to extract and update, and analyzing payload data may infringe on user privacy.
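Signature matching on the payload can be sketched as follows. The signature table is an assumption made for illustration (three simple regular expressions over the first payload bytes); real DPI engines use much larger, maintained signature sets, and the name `classify_by_payload` is hypothetical.

```python
import re

# Hypothetical signature table: protocol name -> regex over the first payload bytes.
SIGNATURES = {
    "http": re.compile(rb"^(GET|POST|HEAD|PUT|DELETE) \S+ HTTP/1\.[01]"),
    "ssh": re.compile(rb"^SSH-\d\.\d+-"),
    "tls": re.compile(rb"^\x16\x03[\x00-\x04]"),  # handshake record, version 3.x
}

def classify_by_payload(payload: bytes) -> str:
    """Return the first protocol whose signature matches the payload."""
    for protocol, pattern in SIGNATURES.items():
        if pattern.search(payload):
            return protocol
    return "unknown"

print(classify_by_payload(b"GET /index.html HTTP/1.1\r\nHost: example\r\n"))
print(classify_by_payload(b"SSH-2.0-OpenSSH_8.9\r\n"))
```

The sketch also shows the weakness noted above: once the payload is encrypted, only the outer framing (here the TLS record header) can still be matched, and the application inside is invisible.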
Machine learning method based on statistical flow characteristic
In recent years, with the development of artificial intelligence, more and more researchers have begun to use machine learning algorithms to solve the traffic classification problem. Solving traffic classification with machine learning involves two main parts: the training data set and the machine learning algorithm. To build the training data set, training samples are first labeled using a DPI tool, system process monitoring, or manual methods to obtain sample labels; features of the data flows are then extracted from the network flows; finally, a classifier is obtained from the training set and the machine learning algorithm, and the trained classifier can be used to classify network flows.
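The feature extraction step can be illustrated with a small sketch. The patent does not list the statistical features it uses, so the six features below (packet count, byte total, size statistics, inter-arrival gap, duration) and the function name `flow_features` are assumptions chosen only to show the shape of such a feature vector.

```python
from statistics import mean, pstdev

def flow_features(packets):
    """packets: list of (timestamp_seconds, size_bytes) tuples in arrival order."""
    sizes = [size for _, size in packets]
    times = [ts for ts, _ in packets]
    # Inter-arrival gaps; a single-packet flow gets one zero gap.
    gaps = [b - a for a, b in zip(times, times[1:])] or [0.0]
    return {
        "packet_count": len(packets),
        "total_bytes": sum(sizes),
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "mean_gap": mean(gaps),
        "duration": times[-1] - times[0],
    }

features = flow_features([(0.0, 100), (0.5, 300), (1.5, 200)])
print(features)
```

Each flow then becomes one row of a training matrix, with the label obtained from DPI, process monitoring, or manual annotation as described above.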
Traffic classification technology based on user behavior characteristics
The emergence of techniques that encrypt flow features has imposed certain limitations on traffic classification based on statistical flow features. In recent years, researchers have begun to classify network traffic using the different communication behavior patterns of hosts, analyzing network flows with host behavior features such as user connection patterns, connection graphs, and network connection diameter, which has opened up a new approach to traffic classification. Behavior-based traffic classification mainly analyzes the inherent connection characteristics and behavior patterns of network protocols and applications in order to distinguish different kinds of traffic. This approach usually has a long modeling time and high time and space complexity, and its application is subject to certain limitations.
Summary of the invention
The object of the invention is to overcome the shortcomings of the above prior art and provide a self-adaptive network traffic classification method based on 5G network capability exposure that has a short modeling time, low time and space complexity, and a wide range of application.
To achieve the above object, the self-adaptive network traffic classification method based on 5G network capability exposure of the invention comprises the following steps:
1) Process the network incremental data dynamically using an adaptive sliding window, construct the overall data from the original sliding window data and the incremental sliding window data, and then extract feature vectors from the overall data;
2) Take each feature vector extracted in step 1) as a data sample, where some data samples are labeled network flows and the rest are unlabeled network flows; compute initial cluster centers from the known class information of the labeled data samples to optimize the k-means algorithm, obtain several k-means center points, and use them to construct an initial center point set M;
3) Perform k-means clustering on the network flows using the initial center point set M to obtain k clusters and k cluster centers, then obtain the clustering result from the k clusters and k cluster centers according to an evaluation function;
4) Count the labeled network flows in each cluster. When the number of labeled network flows in a cluster is less than a preset network flow threshold, the cluster is an unknown-protocol cluster; when the number of known network flows in the cluster is greater than or equal to the preset network flow threshold, compute the posterior probability of the labeled network flows of each class according to the maximum a posteriori probability formula and take the class with the largest posterior probability as the class of the network flows, thereby realizing self-adaptive network traffic classification based on 5G network capability exposure.
The specific operations of step 1) are as follows:
The network incremental data is processed dynamically using an adaptive window. The original window data and the incremental window data are represented by the matrices $X_1 = [x_1, x_2, \ldots, x_m]$ and $X_2 = [x_{m+1}, x_{m+2}, \ldots, x_{m+r}]$ respectively, so that all data samples can be written as $X = [X_1, X_2]$. Let $S$ be the mutual information matrix of all data samples, $S_1$ that of the original window data, and $S_2$ that of the newly added window data; then the mutual information matrix $S$ of all data samples is:
Using the eigendecomposition of $S_1$, $S_1$ is diagonalized to the identity matrix, i.e.:
Then $S_2$ is projected onto the space spanned by $H_1$, which gives:
Adding formula (1) and formula (2) gives:
The eigendecomposition of the projected matrix is then computed, i.e.:
Substituting formula (5) into formula (4) gives:
From formula (1) and formula (6), the eigendecomposition of the mutual information matrix $S$ of all data samples is obtained; from formula (2):
where $B_i \in \mathbb{R}^{n \times k}$ is the principal component decision matrix of the original data, and $\Lambda_1 \in \mathbb{R}^{m \times k}$ is the matrix formed by the first k selected eigenvalues;
From formula (5), the eigenvalues $\Lambda_2 = [\mu_1, \mu_2, \ldots, \mu_n]$ of $S_2$ and the corresponding eigenvectors $P_2 = [\beta_1, \beta_2, \ldots, \beta_n]$ are found; from the k eigenvalues and eigenvectors, the eigenvalues of $S$ are obtained as:
where m and r are the sample sizes of the historical data and the newly added data, respectively;
The eigenvectors of $S$ are:
$P = H_1 \beta_i$ (9).
The specific operations of step 2) are as follows:
Each feature vector obtained in step 1) is taken as a data sample, where some data samples are labeled network flows and the rest are unlabeled network flows;
Initial cluster centers are computed from the known class information of the labeled data samples to optimize the k-means algorithm, and the k-means center points are computed from the labeled network flows, where
each k-means center point $m_i$ is determined by the labeled network flows $f$ belonging to class $C_i$, and $n_i$ denotes the number of labeled network flows $f$ belonging to class $C_i$; the center points $m_i$ are used to construct the initial center point set M.
The specific operations of step 3) are as follows:
31) Perform k-means clustering on the mixed network flows using the initial center point set M, obtaining k clusters and k cluster centers;
32) Compute the evaluation function from the k clusters and k cluster centers to obtain its value, and reset the set M with the k cluster centers to obtain a new set M;
33) Compute the k vector points in the network flow feature vector set X whose summed distance to all center points in the new set M is largest;
34) According to the density formula, determine the vector point with the highest density among these k vector points, and add it to the new set M;
35) Update k to k+1 and go to step 31), until k is greater than
36) Collect the value of the evaluation function from step 32) at each iteration, select the minimum among all values, obtain the k corresponding to that minimum, and output the clustering result for that k.
The specific operations of step 4) are as follows:
For each cluster $C_i$, count the total number of labeled network flows in the cluster. When this number is less than the preset network flow threshold $\gamma_i$, cluster $C_i$ is judged to be an unknown-protocol cluster; when it is greater than or equal to $\gamma_i$, compute the posterior probability of the labeled network flows of each class, find the largest posterior probability, and assign the cluster to the network flow type corresponding to that largest posterior probability.
The maximum a posteriori probability of the labeled network flows in a cluster is:
where $n_{ij}$ denotes the number of network flows of type j among the labeled network flows in cluster i, and $n_i$ denotes the total number of labeled flows in cluster i.
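The cluster-to-class mapping of step 4) can be sketched as below. The posterior formula itself is not reproduced in the text; from the definitions of $n_{ij}$ and $n_i$ the sketch assumes the posterior of type j in cluster i is the fraction $n_{ij} / n_i$ and picks the type that maximizes it. The function name `map_cluster` and the way the threshold is passed in are hypothetical.

```python
from collections import Counter

def map_cluster(labels_in_cluster, threshold):
    """labels_in_cluster: class labels of the labeled flows that fell into one cluster.
    threshold: preset minimum number of labeled flows for the cluster.
    Returns (class, posterior), or ("unknown", 0.0) for an unknown-protocol cluster."""
    n_i = len(labels_in_cluster)
    if n_i == 0 or n_i < threshold:
        return "unknown", 0.0
    # Assumed posterior: share of each type among the labeled flows of the cluster.
    label, n_ij = Counter(labels_in_cluster).most_common(1)[0]
    return label, n_ij / n_i

print(map_cluster(["http", "http", "dns", "http"], threshold=3))  # ('http', 0.75)
print(map_cluster(["http"], threshold=3))                         # ('unknown', 0.0)
```

All flows in the cluster, labeled or not, would then inherit the returned class.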
The invention has the following beneficial effects:
In specific operation, the method of the invention first processes network incremental data dynamically using an adaptive sliding window to obtain the feature vectors of the overall data, then computes several k-means center points, performs k-means clustering on the network flows using these center points, obtains the clustering result using the evaluation function, and finally determines the class of the network flows from the number of labeled network flows in each cluster using the posterior probability formula. The method is simple and convenient to operate, has high accuracy, a short modeling time, low time and space complexity, and a wide range of application.
Brief description of the drawings
Fig. 1 is a flow chart of the invention;
Fig. 2 is a schematic diagram of feature extraction in the invention.
Detailed description of the embodiments
The invention is described in further detail below with reference to the accompanying drawings:
Referring to Fig. 1 and Fig. 2, the self-adaptive network traffic classification method based on 5G network capability exposure of the invention comprises the following steps:
1) Process the network incremental data dynamically using an adaptive sliding window, construct the overall data from the original sliding window data and the incremental sliding window data, and then extract feature vectors from the overall data;
2) Take each feature vector extracted in step 1) as a data sample, where some data samples are labeled network flows and the rest are unlabeled network flows; compute initial cluster centers from the known class information of the labeled data samples to optimize the k-means algorithm, obtain several k-means center points, and use them to construct an initial center point set M;
3) Perform k-means clustering on the network flows using the initial center point set M to obtain k clusters and k cluster centers, then obtain the clustering result from the k clusters and k cluster centers according to an evaluation function;
4) Count the labeled network flows in each cluster. When the number of labeled network flows in a cluster is less than a preset network flow threshold, the cluster is an unknown-protocol cluster; when the number of known network flows in the cluster is greater than or equal to the preset network flow threshold, compute the posterior probability of the labeled network flows of each class according to the maximum a posteriori probability formula and take the class with the largest posterior probability as the class of the network flows, thereby realizing self-adaptive network traffic classification based on 5G network capability exposure.
The specific operations of step 1) are as follows:
Most data in 5G network traffic are not globally linear; they often follow some nonlinear distribution. Traditional linear dimensionality-reduction algorithms such as principal component analysis and Fisher discriminant analysis perform poorly on nonlinear data. The algorithm proposed in the invention reduces the dimensionality of the data while remaining efficient, and retains most of the information in the original data.
In addition, in future 5G applications, data acquisition devices will continuously collect new data from network flows in real time, and traditional feature extraction algorithms cannot process incremental data quickly. If only the newly added data are processed, without considering the influence of historical data, the algorithm cannot extract features from a global perspective, and the information contained in the extracted data is greatly reduced.
To address this problem, the invention combines historical data with newly added data and processes the network incremental data dynamically using an adaptive window. The original window data and the incremental window data are represented by the matrices $X_1 = [x_1, x_2, \ldots, x_m]$ and $X_2 = [x_{m+1}, x_{m+2}, \ldots, x_{m+r}]$ respectively, so that all data samples can be written as $X = [X_1, X_2]$. Let $S$ be the mutual information matrix of all data samples, $S_1$ that of the original window data, and $S_2$ that of the newly added window data; then the mutual information matrix $S$ of all data samples is:
Using the eigendecomposition of $S_1$, $S_1$ is diagonalized to the identity matrix, i.e.:
Then $S_2$ is projected onto the space spanned by $H_1$, which gives:
Adding formula (1) and formula (2) gives:
The eigendecomposition of the projected matrix is then computed, i.e.:
Substituting formula (5) into formula (4) gives:
From formula (1) and formula (6), the eigendecomposition of the mutual information matrix $S$ of all data samples is obtained; from formula (2):
where $B_i \in \mathbb{R}^{n \times k}$ is the principal component decision matrix of the original data, and $\Lambda_1 \in \mathbb{R}^{m \times k}$ is the matrix formed by the first k selected eigenvalues;
From formula (5), the eigenvalues $\Lambda_2 = [\mu_1, \mu_2, \ldots, \mu_n]$ of $S_2$ and the corresponding eigenvectors $P_2 = [\beta_1, \beta_2, \ldots, \beta_n]$ are found; from the k eigenvalues and eigenvectors, the eigenvalues of $S$ are obtained as:
where m and r are the sample sizes of the historical data and the newly added data, respectively;
The eigenvectors of $S$ are:
$P = H_1 \beta_i$ (9).
The principal component decision matrix is then formed and the data are mapped onto it, which achieves dimensionality reduction; subsequent windows repeat this process.
Because traditional feature extraction algorithms are not suitable for nonlinear data and cannot meet the real-time requirements of industrial big data, the invention proposes a real-time feature extraction algorithm based on mutual information. The pseudo-code of the algorithm is as follows:
Input: raw data set
Output: data set after dimensionality reduction
The raw data set is written into block buffers at a given rate; when the rate exceeds a certain value, buffers are added dynamically.
The sliding window reads data from the buffer with the smallest number.
int id = 1;  /* buffer numbering starts at 1 */
while (buffer[id] != null) do
    read Matrixi;
    if (Matrixi.Id == 1) then
        MIandEigDesposition(Matrixi);
        unitedMatrix = UnitedMatrix(Matrixi);
        output Matrixi * eigVecMatrix
    else
        compute MIMatrix;
        project MIMatrix onto unitedMatrix
        proMatrix = ProjectMatrix(MIMatrix);
        MIandEigDesposition(proMatrix);
        output MIMatrix * eigVecMatrix
    end if
    id = id + 1;
end while
Here Matrix is the data matrix in the window, implemented as a two-dimensional array; the UnitedMatrix() function computes the unitized matrix, and the ProjectMatrix() function computes the projected matrix.
The windows are scanned one by one. For each window, the algorithm first checks whether it is the first window. If it is, the mutual information matrix of the current window is computed and eigendecomposed, the principal component decision matrix is selected, and the original matrix is mapped onto the decision matrix to achieve dimensionality reduction. Otherwise, the mutual information matrix of the current window is computed and projected onto the unitized matrix of the previous window, the eigenvalues and eigenvectors are found according to formula (3) and formula (4), and the principal component decision matrix is formed to achieve dimensionality reduction. The overall flow is shown in Fig. 2.
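The pseudo-code leaves the computation of the mutual information matrix of a window unspecified. One common way to estimate such a matrix, shown here purely as an assumption, is to discretize each feature into equal-width bins and compute the pairwise mutual information between feature columns from the empirical joint counts. The function names and the choice of four bins are hypothetical.

```python
import math
from collections import Counter

def discretize(column, bins=4):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / bins or 1.0  # constant columns fall into a single bin
    return [min(int((v - lo) / width), bins - 1) for v in column]

def mutual_information(a, b):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log((c / n) / ((pa[x] / n) * (pb[y] / n)))
        for (x, y), c in pab.items()
    )

def mi_matrix(samples, bins=4):
    """samples: list of equal-length feature vectors (one window of data).
    Returns the d x d matrix of pairwise mutual information between features."""
    columns = [discretize(list(col), bins) for col in zip(*samples)]
    d = len(columns)
    return [[mutual_information(columns[i], columns[j]) for j in range(d)]
            for i in range(d)]

window = [(i, 2 * i, i % 2) for i in range(8)]
for row in mi_matrix(window):
    print([round(v, 3) for v in row])
```

The resulting matrix is symmetric, so it can be eigendecomposed in the same way as a covariance matrix; unlike covariance, it also captures nonlinear dependence between features, which matches the motivation given above.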
The specific operations of step 2) are as follows:
Each feature vector obtained in step 1) is taken as a data sample, where some data samples are labeled network flows and the rest are unlabeled network flows;
Initial cluster centers are computed from the known class information of the labeled data samples to optimize the k-means algorithm. This reduces the convergence time of k-means and, combined with the iterative addition of center points in the next stage, improves the accuracy of the clustering.
The purpose of a clustering algorithm is to gather data of the same class together and to place data of different classes in different clusters. The classes of the labeled network flows can therefore be used to compute a set of initial center points that roughly fix some cluster ranges in advance. The k-means center points are computed from the labeled network flows, where
each k-means center point $m_i$ is determined by the labeled network flows $f$ belonging to class $C_i$, and $n_i$ denotes the number of labeled network flows $f$ belonging to class $C_i$; the center points $m_i$ are used to construct the initial center point set M.
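The center formula is not reproduced in the text. From the definitions of $m_i$, $f$, and $n_i$, a natural reading, stated here as an assumption, is the class mean $m_i = \frac{1}{n_i} \sum_{f \in C_i} f$. A minimal sketch of building the initial set M under that reading (the function name `initial_centers` is hypothetical):

```python
from collections import defaultdict

def initial_centers(labeled_flows):
    """labeled_flows: list of (feature_vector, class_label) pairs.
    Returns {class_label: mean feature vector}, one initial center per known class."""
    sums = {}
    counts = defaultdict(int)
    for vector, label in labeled_flows:
        if label not in sums:
            sums[label] = [0.0] * len(vector)
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    # Divide each per-class sum by the class count n_i.
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

M = initial_centers([([1.0, 2.0], "http"), ([3.0, 4.0], "http"), ([10.0, 0.0], "dns")])
print(M)  # {'http': [2.0, 3.0], 'dns': [10.0, 0.0]}
```

Starting k-means from these class means, rather than from random points, is what the text credits with the shorter convergence time.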
The specific operations of step 3) are as follows:
31) Perform k-means clustering on the mixed network flows using the initial center point set M, obtaining k clusters and k cluster centers;
32) Compute the evaluation function from the k clusters and k cluster centers to obtain its value, and reset the set M with the k cluster centers to obtain a new set M;
33) Compute the k vector points in the network flow feature vector set X whose summed distance to all center points in the new set M is largest;
34) According to the density formula, determine the vector point with the highest density among these k vector points, and add it to the new set M;
35) Update k to k+1 and go to step 31), until k is greater than
36) Collect the value of the evaluation function from step 32) at each iteration, select the minimum among all values, obtain the k corresponding to that minimum, and output the clustering result for that k.
The distance metric in the improved k-means algorithm is the weighted Euclidean distance, i.e.:
During the iterations of the improved k-means algorithm, k changes from $k_{min} = p$ to $k_{max}$. The value of $k_{min}$ is determined by the number of classes of the input labeled flows, and the value of $k_{max}$ is determined from the empirical maximum for the k-means algorithm summarized and verified in the literature.
The smaller the value of the evaluation function, the closer the sample points within each cluster are to one another, that is, the better the clustering.
The invention selects, from the k points farthest from the current center points, the one with the highest density and adds it as a new center. Choosing points that are far away effectively avoids falling into a local optimum, while maximum density ensures that the point is representative. The density is calculated as:
where $d(x_i, x_j)$ denotes the weighted Euclidean distance between vector points $x_i$ and $x_j$, the combinatorial term denotes the number of pairwise combinations of all vector points, and N is the number of all vector points.
Over the iterations, after representative center points have been added, the evaluation function can be used to determine automatically the best clustering result obtained during the iterations and its corresponding k. This makes the parameter self-adaptive and ensures high accuracy of the output clustering result.
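Steps 31) to 36) can be sketched end to end as follows. Several pieces are assumptions because the corresponding formulas are not reproduced in the text: the weights `w`, the bound `k_max`, the evaluation function (here mean intra-cluster distance divided by the smallest distance between centers), and the density (here the number of points lying within the mean pairwise distance). All function names are hypothetical.

```python
import math
from itertools import combinations

def wdist(x, y, w):
    """Weighted Euclidean distance; the weights w are an assumed input."""
    return math.sqrt(sum(wk * (a - b) ** 2 for wk, a, b in zip(w, x, y)))

def assign(points, centers, w):
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda c: wdist(p, centers[c], w))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, centers, w, iters=20):
    for _ in range(iters):
        clusters = assign(points, centers, w)
        # New center = mean of the cluster; an empty cluster keeps its old center.
        centers = [[sum(col) / len(cl) for col in zip(*cl)] if cl else c
                   for cl, c in zip(clusters, centers)]
    return assign(points, centers, w), centers

def evaluate(clusters, centers, w):
    """Stand-in evaluation function (assumption); smaller is better."""
    intra = sum(wdist(p, c, w) for cl, c in zip(clusters, centers) for p in cl)
    intra /= sum(len(cl) for cl in clusters)
    if len(centers) < 2:
        return intra
    inter = min(wdist(a, b, w) for a, b in combinations(centers, 2))
    return intra / inter if inter > 0 else float("inf")

def search_k(points, init_centers, w, k_max):
    """Returns (best_score, best_k, best_clusters)."""
    mean_d = (sum(wdist(a, b, w) for a, b in combinations(points, 2))
              / math.comb(len(points), 2))

    def density(p):
        # Stand-in density (assumption): points within the mean pairwise distance.
        return sum(1 for q in points if wdist(p, q, w) <= mean_d)

    centers, best = [list(c) for c in init_centers], None
    while len(centers) <= k_max:
        clusters, centers = kmeans(points, centers, w)       # step 31
        score = evaluate(clusters, centers, w)                # step 32
        if best is None or score < best[0]:
            best = (score, len(centers), clusters)            # step 36
        k = len(centers)
        far = sorted(points, key=lambda p: sum(wdist(p, c, w) for c in centers),
                     reverse=True)[:k]                        # step 33
        centers = centers + [list(max(far, key=density))]     # steps 34 and 35
    return best

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11),
          (0, 20), (0, 21), (1, 20), (1, 21)]
score, k, clusters = search_k(points, [[0.5, 0.5], [10.5, 10.5]], [1.0, 1.0], k_max=4)
print(k, [len(c) for c in clusters])
```

In this toy input two classes are labeled but the data contain three groups; the loop starts at k = 2, adds a far and dense point as a third center, and the evaluation function then prefers k = 3 over k = 2 and k = 4, which is the self-adaptive behavior the text describes.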
The specific operations of step 4) are as follows:
For each cluster $C_i$, count the total number of labeled network flows in the cluster. When this number is less than the preset network flow threshold $\gamma_i$, cluster $C_i$ is judged to be an unknown-protocol cluster; when it is greater than or equal to $\gamma_i$, compute the posterior probability of the labeled network flows of each class, find the largest posterior probability, and assign the cluster to the network flow type corresponding to that largest posterior probability.
The maximum a posteriori probability of the labeled network flows in a cluster is:
where $n_{ij}$ denotes the number of network flows of type j among the labeled network flows in cluster i, and $n_i$ denotes the total number of labeled flows in cluster i.
The network flow threshold $\gamma_i$ is:
where $\gamma$ is the proportion of labeled network flows in the mixed input network flows. If the total number of labeled network flows of all types in a cluster is still less than 1/2 of the number of all network flows in the cluster multiplied by $\gamma$, the cluster is temporarily judged to be an unknown-protocol cluster. Because the labeled data are selected at random, a cluster that contains flows of a known (non-unknown) protocol class should contain more labeled network flows than the number of flows in the cluster multiplied by $\gamma$. Allowing for the randomness of clustering results, when the number of labeled network flows in a cluster is less than 1/2 of that amount, the data are considered insufficient to decide the cluster's type from the types of its labeled flows, and the decision would not be representative. Such clusters are therefore temporarily classified as unknown-protocol clusters and are studied further in the system update module.
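The threshold rule described in words above can be sketched as follows. The formula for the threshold is not reproduced in the text, so the sketch follows the prose reading as an assumption: the threshold for a cluster is half of the cluster's flow count multiplied by the labeled ratio of the mixed input. The function and parameter names are hypothetical.

```python
def labeled_flow_threshold(cluster_size: int, labeled_ratio: float) -> float:
    """Assumed threshold: half of the labeled flows expected in a cluster of this size."""
    return 0.5 * labeled_ratio * cluster_size

def is_unknown_cluster(labeled_in_cluster: int, cluster_size: int,
                       labeled_ratio: float) -> bool:
    """True when the cluster has too few labeled flows to be typed reliably."""
    return labeled_in_cluster < labeled_flow_threshold(cluster_size, labeled_ratio)

# With 25% of the input labeled, a 200-flow cluster is expected to hold about
# 50 labeled flows, so the threshold is 25.
print(labeled_flow_threshold(200, 0.25))   # 25.0
print(is_unknown_cluster(24, 200, 0.25))   # True
print(is_unknown_cluster(25, 200, 0.25))   # False
```

Because the threshold scales with the cluster size, large clusters need proportionally more labeled flows before they are mapped to a known protocol.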
With the improved cluster-to-class mapping method, unknown-protocol clusters that traditional semi-supervised traffic classification methods would wrongly assign to some known protocol class can also be identified and extracted. An online classifier trained on such clustering results can be considerably more accurate, and unknown protocols can be extracted online as well.

Claims (6)

1. a kind of self adaptive network traffic classification method open based on 5G network capabilities, which comprises the following steps:
1) network incremental data is handled using adaptive sliding window dynamic, passes through original sliding window data and increment Sliding window data constructs overall data, then extracts to the feature vector of overall data;
2) each feature vector for extracting step 1) is as data sample, wherein a marked network flow of branch's data sample, separately A part of unmarked network flow of data sample, by the known category information of marked data sample calculate initial cluster center with Optimize k-means algorithm, obtain several k-means central points, constructs initial center point set M using each k-means central point;
3) k-means cluster is carried out to network flow using initial center point set M, obtains k cluster and k cluster central point, it is then sharp The result that clusters is obtained according to evaluation function with k cluster and k cluster central point;
4) statistics clusters the number of marked network flow, and the number of marked network flow is less than default network flow threshold value in cluster When, then this clusters as unknown protocol cluster;When the number for the middle known network stream that clusters is more than or equal to default network flow threshold value, then root The posterior probability that marked network flow of all categories is calculated according to maximum a posteriori probability formula, by the corresponding class of maximum posteriori probability value Not as the classification of the network flow, the self adaptive network traffic classification open based on 5G network capabilities is realized.
2. The adaptive network traffic classification method based on 5G network capability exposure according to claim 1, characterized in that the specific operations of step 1) are:
The network incremental data is processed dynamically with an adaptive window. Let the matrices of the original window data and the incremental window data be X1=[x1,x2,…,xm] and X2=[xm+1,xm+2,…,xm+r], respectively, so that all data samples can be written as X=[X1,X2]. Let S be the mutual information matrix of all data samples, S1 the mutual information matrix of the original window data, and S2 the mutual information matrix of the newly added window data; then the mutual information matrix S of all data samples is:
Using the eigendecomposition of S1, S1 is diagonalized to the identity matrix, i.e.
Then S2 is projected onto the space spanned by H1, which gives
Adding formula (1) and formula (2) gives:
Computing the eigendecomposition of the resulting matrix gives:
Substituting formula (5) into formula (4) gives:
From formula (1) and formula (6), the eigendecomposition of the mutual information matrix S of all data samples is obtained; from formula (2):
where Bi∈R^(n×k) is the principal-component decision matrix of the original data and Λ1∈R^(m×k) is the matrix composed of the first k selected eigenvalues;
According to formula (5), the eigenvalues Λ2=[μ1,μ2,…,μn] of S2 and the corresponding eigenvectors P2=[β1,β2,…,βn] are found; from the k eigenvalues and eigenvectors, the eigenvalues of S are obtained as:
where m and r are the sample sizes of the historical data and the newly added data, respectively;
The eigenvectors of S are:
P=H1βi (9).
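The formula images of this claim are not reproduced in this text, so the following is only a minimal sketch of the idea the prose describes: diagonalize S1 to the identity with a matrix H1, project S2 onto the space spanned by H1, decompose the projected matrix, and map its eigenvectors back through H1 as in formula (9). It makes several assumptions that are not in the source: a sample covariance matrix stands in for the mutual information matrix, the combined matrix is taken as the plain sum S1 + S2 (the weighting by m and r is not reproduced), and the names `whiten` and `joint_decompose` are hypothetical.

```python
import numpy as np

def whiten(S1, eps=1e-10):
    """Return H1 such that H1.T @ S1 @ H1 is the identity (S1 symmetric, positive definite)."""
    vals, vecs = np.linalg.eigh(S1)
    # Scale each eigenvector (column) by 1/sqrt of its eigenvalue.
    return vecs / np.sqrt(np.maximum(vals, eps))

def joint_decompose(S1, S2):
    """Project S2 onto the space spanned by H1 and decompose the projection."""
    H1 = whiten(S1)
    S2_proj = H1.T @ S2 @ H1             # S2 in the whitened coordinates
    mu, beta = np.linalg.eigh(S2_proj)   # eigenvalues mu_i, eigenvectors beta_i
    P = H1 @ beta                        # columns P_i = H1 beta_i, as in formula (9)
    return mu, P

# Toy data: n = 4 features, m = 50 historical samples, r = 20 new samples (columns are samples).
rng = np.random.default_rng(0)
X1 = rng.normal(size=(4, 50))
X2 = rng.normal(size=(4, 20))
S1 = X1 @ X1.T / X1.shape[1]   # covariance used as a stand-in (assumption)
S2 = X2 @ X2.T / X2.shape[1]

mu, P = joint_decompose(S1, S2)
D = P.T @ (S1 + S2) @ P        # diagonal: P diagonalizes S1 and S2 at the same time
```

Under these assumptions the columns of P diagonalize S1 and S2 simultaneously, so only the small projected matrix has to be decomposed when a new window arrives; whether this matches the patent's formulas (1) to (8) term by term cannot be confirmed from the text.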
3. The adaptive network traffic classification method based on 5G network capability exposure according to claim 1, characterized in that the specific operations of step 2) are:
Each feature vector obtained in step 1) is taken as a data sample, wherein one part of the data samples are labeled network flows and the other part are unlabeled network flows;
The initial cluster centers are calculated from the known class information of the labeled data samples to optimize the k-means algorithm, and the k-means center points are computed from the labeled network flows, wherein
each k-means center point mi is determined by the labeled network flows f belonging to class Ci, ni denotes the number of labeled network flows f belonging to class Ci, and the initial center point set M is constructed from the k-means center points mi.
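The center-point formula itself is not reproduced in this text. A minimal sketch, assuming each center mi is the mean of the ni labeled flows of class Ci (the natural reading of the symbol definitions, but an assumption), could look as follows. The function name `initial_centers` and the convention that unlabeled flows carry the label -1 are hypothetical.

```python
import numpy as np

def initial_centers(features, labels):
    """Build the initial center set M from labeled flows.

    features: (num_flows, num_features) array of flow feature vectors.
    labels:   integer class per flow, -1 for unlabeled flows (assumed convention).
    Returns the class ids and one center per class (assumed: the class mean).
    """
    classes = np.unique(labels[labels >= 0])
    M = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, M

# Toy example: two labeled classes and one unlabeled flow.
features = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0], [5.0, 5.0]])
labels = np.array([0, 0, 1, 1, -1])
classes, M = initial_centers(features, labels)
```

Seeding k-means with one center per known class means the initial k equals the number of known classes, and the unlabeled flow does not influence the seeds.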
4. The adaptive network traffic classification method based on 5G network capability exposure according to claim 1, characterized in that the specific operations of step 3) are:
31) performing k-means clustering on the mixed network flows using the initial center point set M to obtain k clusters and k cluster center points;
32) calculating the evaluation function from the k clusters and k cluster center points to obtain the value of the evaluation function, and resetting the set M with the k cluster center points to obtain a new set M;
33) calculating the k vector points in the network flow feature vector set X whose sum of distances to all center points in the new set M is largest;
34) determining, according to the density calculation formula, the vector point with the highest density among the k vector points with the largest distance sum, and adding this highest-density vector point to the new set M;
35) updating k to k+1 and returning to step 31), until k is greater than
36) collecting the value of the evaluation function from step 32) of each iteration, selecting the minimum among all evaluation function values, obtaining the k value corresponding to the minimum evaluation function value, and outputting the clustering result corresponding to that k value.
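Steps 31) to 36) can be sketched as a loop that grows the center set by one point per round and keeps the round with the lowest evaluation value. The claim does not reproduce the evaluation function, the density formula, or the upper bound on k, so this sketch substitutes stand-ins that are purely assumptions: a Davies-Bouldin style index as the evaluation function, the inverse mean distance to the nearest neighbors as the density, and a caller-supplied `k_max`. All function names are hypothetical, and the index assumes at least two distinct centers.

```python
import numpy as np

def pairwise_dist(A, B):
    """Euclidean distances between every row of A and every row of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def kmeans(X, centers, n_iter=50):
    """Plain Lloyd iterations started from the given centers (step 31)."""
    centers = centers.copy()
    for _ in range(n_iter):
        assign = pairwise_dist(X, centers).argmin(axis=1)
        new = np.stack([X[assign == j].mean(axis=0) if np.any(assign == j) else centers[j]
                        for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return pairwise_dist(X, centers).argmin(axis=1), centers

def evaluation(X, assign, centers):
    """Stand-in evaluation function (assumption): Davies-Bouldin style index, lower is better."""
    k = len(centers)
    scatter = np.array([np.linalg.norm(X[assign == j] - centers[j], axis=1).mean()
                        if np.any(assign == j) else 0.0 for j in range(k)])
    sep = pairwise_dist(centers, centers)
    np.fill_diagonal(sep, np.inf)
    return ((scatter[:, None] + scatter[None, :]) / sep).max(axis=1).mean()

def density(X, idx, n_neighbors=5):
    """Stand-in density (assumption): inverse mean distance to the nearest neighbors."""
    d = np.sort(pairwise_dist(X[idx], X), axis=1)[:, 1:n_neighbors + 1]
    return 1.0 / (d.mean(axis=1) + 1e-12)

def search_k(X, M, k_max):
    """Steps 31) to 36): grow M one point per round, keep the lowest-scoring round."""
    rounds = []
    while len(M) <= k_max:
        assign, centers = kmeans(X, M)                           # step 31
        rounds.append((evaluation(X, assign, centers), len(centers), assign, centers))
        M = centers                                              # step 32: reset M
        far = np.argsort(pairwise_dist(X, M).sum(axis=1))[-len(M):]  # step 33
        M = np.vstack([M, X[far[np.argmax(density(X, far))]]])   # step 34; step 35: k grows by 1
    return min(rounds, key=lambda r: r[0])                       # step 36

# Toy data: three tight blobs, but only two known classes seed the center set.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.3, size=(40, 2)) for c in ([0, 0], [10, 0], [0, 10])])
M0 = np.array([[0.0, 0.0], [10.0, 0.0]])
best_score, best_k, best_assign, best_centers = search_k(X, M0, k_max=5)
```

In this toy setup the third blob plays the role of an unknown protocol: it has no seed center, so the loop has to discover it by adding a far, dense point to M.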
5. The adaptive network traffic classification method based on 5G network capability exposure according to claim 1, characterized in that the specific operations of step 4) are:
For each cluster Ci, the total number of labeled network flows in the cluster is counted. When this number is less than the preset network flow threshold γi, the cluster Ci is taken as an unknown protocol cluster; when this number is greater than or equal to the preset network flow threshold γi, the posterior probability of the labeled network flows of each class is calculated, the maximum posterior probability value is found, and the cluster is assigned the network flow type corresponding to the maximum posterior probability value.
6. The adaptive network traffic classification method based on 5G network capability exposure according to claim 1, characterized in that the maximum a posteriori probability of the labeled network flows in a cluster is:
where nij denotes the number of network flows of type j among the labeled network flows in cluster i, and ni denotes the total number of labeled flows in cluster i.
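The cluster-to-class mapping of step 4) can be illustrated with a short sketch. The posterior formula is not reproduced in this text; the sketch assumes it is the ratio nij / ni built from the two quantities defined above, which is an assumption. It also uses a single threshold for all clusters instead of a per-cluster γi, and the function name `map_clusters` and the label -1 for unlabeled flows are hypothetical.

```python
import numpy as np

def map_clusters(assign, labels, threshold):
    """Map each cluster to a known class or mark it as an unknown protocol.

    assign:    cluster index per flow.
    labels:    class per flow, -1 for unlabeled flows (assumed convention).
    threshold: minimum number of labeled flows a cluster needs (single value, assumption).
    """
    mapping = {}
    for i in np.unique(assign):
        lab = labels[(assign == i) & (labels >= 0)]   # labeled flows in cluster i
        n_i = lab.size
        if n_i < threshold:
            mapping[int(i)] = "unknown"               # too few labeled flows
            continue
        classes, n_ij = np.unique(lab, return_counts=True)
        posterior = n_ij / n_i                        # assumed form of the posterior
        mapping[int(i)] = int(classes[np.argmax(posterior)])
    return mapping

# Toy example: cluster 2 holds only one labeled flow and falls below the threshold.
assign = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
labels = np.array([0, 0, 1, -1, 1, 1, -1, -1, -1, 0])
mapping = map_clusters(assign, labels, threshold=2)
```

A cluster that falls below the threshold is reported as an unknown protocol rather than being forced into the nearest known class, which is the behavior the description credits for the improved accuracy.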

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910579744.6A CN110365603A (en) 2019-06-28 2019-06-28 A kind of self adaptive network traffic classification method open based on 5G network capabilities

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910579744.6A CN110365603A (en) 2019-06-28 2019-06-28 A kind of self adaptive network traffic classification method open based on 5G network capabilities

Publications (1)

Publication Number Publication Date
CN110365603A true CN110365603A (en) 2019-10-22

Family

ID=68216017

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910579744.6A Pending CN110365603A (en) 2019-06-28 2019-06-28 A kind of self adaptive network traffic classification method open based on 5G network capabilities

Country Status (1)

Country Link
CN (1) CN110365603A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112396090A (en) * 2020-10-22 2021-02-23 国网浙江省电力有限公司杭州供电公司 Clustering method and device for power grid service big data detection and analysis
CN113242207A (en) * 2021-04-02 2021-08-10 河海大学 Iterative clustering network flow abnormity detection method
CN113807373A (en) * 2020-06-11 2021-12-17 中移(苏州)软件技术有限公司 Traffic identification method and device, equipment and storage medium
WO2021258961A1 (en) * 2020-06-22 2021-12-30 南京邮电大学 Network traffic classification method and system based on improved k-means algorithm

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106878073A (en) * 2017-02-14 2017-06-20 南京邮电大学 Network multimedia business semisupervised classification method based on t Distribution Mixed Models
CN107819698A (en) * 2017-11-10 2018-03-20 北京邮电大学 A kind of net flow assorted method based on semi-supervised learning, computer equipment
CN107846326A (en) * 2017-11-10 2018-03-27 北京邮电大学 A kind of adaptive semi-supervised net flow assorted method, system and equipment
CN108537288A (en) * 2018-04-19 2018-09-14 辽宁大学 A kind of real-time feature extraction method based on mutual information
CN109309630A (en) * 2018-09-25 2019-02-05 深圳先进技术研究院 A kind of net flow assorted method, system and electronic equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
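Kong Xiaochen (孔晓晨): "Research on Network Traffic Classification Technology Based on Semi-Supervised Learning", China Master's Theses Full-text Database, Information Science and Technology *
Wang Yan (王妍) et al.: "A Real-Time Feature Extraction Algorithm Based on Mutual Information", Journal of Chinese Computer Systems (小型微型计算机系统) *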

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113807373A (en) * 2020-06-11 2021-12-17 中移(苏州)软件技术有限公司 Traffic identification method and device, equipment and storage medium
CN113807373B (en) * 2020-06-11 2024-02-02 中移(苏州)软件技术有限公司 Traffic identification method and device, equipment and storage medium
WO2021258961A1 (en) * 2020-06-22 2021-12-30 南京邮电大学 Network traffic classification method and system based on improved k-means algorithm
US11570069B2 (en) 2020-06-22 2023-01-31 Nanjing University Of Posts And Telecommunicatins Network traffic classification method and system based on improved K-means algorithm
CN112396090A (en) * 2020-10-22 2021-02-23 国网浙江省电力有限公司杭州供电公司 Clustering method and device for power grid service big data detection and analysis
CN113242207A (en) * 2021-04-02 2021-08-10 河海大学 Iterative clustering network flow abnormity detection method
CN113242207B (en) * 2021-04-02 2022-06-17 河海大学 Iterative clustering network flow abnormity detection method

Similar Documents

Publication Publication Date Title
CN107846326A (en) A kind of adaptive semi-supervised net flow assorted method, system and equipment
CN110365603A (en) A kind of self adaptive network traffic classification method open based on 5G network capabilities
CN109067586B (en) DDoS attack detection method and device
CN110311829A (en) A kind of net flow assorted method accelerated based on machine learning
CN106228398A (en) Specific user's digging system based on C4.5 decision Tree algorithms and method thereof
CN107819698A (en) A kind of net flow assorted method based on semi-supervised learning, computer equipment
CN114553475A (en) Network attack detection method based on network flow attribute directed topology
CN111897733A (en) Fuzzy test method and device based on minimum set coverage
CN109490838A (en) A kind of Recognition Method of Radar Emitters of data base-oriented incompleteness
CN113452802A (en) Equipment model identification method, device and system
CN104680193A (en) Online target classification method and system based on fast similarity network fusion algorithm
CN110334777A (en) A kind of unsupervised attribute selection method of weighting multi-angle of view
CN112183459B (en) Remote sensing water quality image classification method based on evolution multi-objective optimization
WO2020024444A1 (en) Group performance grade recognition method and apparatus, and storage medium and computer device
CN109523514A (en) To the batch imaging quality assessment method of Inverse Synthetic Aperture Radar ISAR
CN109583519A (en) A kind of semisupervised classification method based on p-Laplacian figure convolutional neural networks
CN107392249A (en) A kind of density peak clustering method of k nearest neighbor similarity optimization
CN112383488B (en) Content identification method suitable for encrypted and non-encrypted data streams
CN104468276B (en) Network flow identification method based on random sampling multi-categorizer
CN114897085A (en) Clustering method based on closed subgraph link prediction and computer equipment
CN114666273A (en) Application layer unknown network protocol oriented traffic classification method
CN112633475A (en) Large-scale network burst flow identification model and method and model training method
CN108268478A (en) A kind of unbalanced dataset feature selection approach and device based on ur-CAIM algorithms
CN117633627A (en) Deep learning unknown network traffic classification method and system based on evidence uncertainty evaluation
CN113746707B (en) Encrypted traffic classification method based on classifier and network structure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191022