CN110365603A - Adaptive network traffic classification method based on 5G network capability exposure - Google Patents
- Publication number
- CN110365603A (application CN201910579744.6A)
- Authority
- CN
- China
- Prior art keywords
- network
- cluster
- data
- network flow
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing data switching networks
- H04L43/02—Capturing of monitoring data
- H04L43/022—Capturing of monitoring data by sampling
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L47/00—Traffic control in data switching networks
- H04L47/10—Flow control; Congestion control
- H04L47/24—Traffic characterised by specific attributes, e.g. priority or QoS
- H04L47/2441—Traffic characterised by specific attributes, e.g. priority or QoS relying on flow classification, e.g. using integrated services [IntServ]
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
The invention discloses an adaptive network traffic classification method based on 5G network capability exposure, comprising the following steps: 1) construct the overall data, then extract feature vectors from the overall data; 2) take each feature vector extracted in step 1) as a data sample, compute initial cluster centers from the known class information of the labeled data samples to optimize the k-means algorithm, obtain several k-means center points, and use these center points to construct an initial center set M; 3) perform k-means clustering on the network flows using the initial center set M to obtain k clusters and k cluster centers, then obtain the clustering result from the k clusters and k cluster centers according to an evaluation function; 4) count the labeled network flows in each cluster and classify the network flows according to that count, thereby achieving adaptive network traffic classification based on 5G network capability exposure. The method has a short modeling time, low time and space complexity, and a wide range of application.
Description
Technical field
The invention belongs to the field of network information and relates to an adaptive network traffic classification method based on 5G network capability exposure.
Background technique
In recent years, the capability exposure market has shown great potential and strong customer demand, and the opening and integration of communication capabilities has become a hot spot for operators' future growth and a focus of 5G network development. The 5G stage marks a shift from network enablement to service enablement: network capabilities are invoked more widely and more deeply, and more types and a wider range of capabilities are exposed. However, both the standards and the service requirements for 5G capability exposure are still at the research and exploration stage, and survey feedback indicates that product development for 5G capability exposure has not yet been carried out in depth. It is therefore highly desirable to carry out research on standards and evolution strategies, studying the development strategy of 5G network capability exposure in depth so as to promote the smooth evolution of the service capability platform.
In today's complex network environment, network traffic classification has become especially important for supervising and controlling network traffic effectively, allocating network bandwidth reasonably, and guaranteeing the safe and reliable transmission of network information. Meanwhile, compared with traditional 3G or 4G networks, a large number of new applications appear in 5G networks, and the unknown-protocol traffic they bring makes the composition of network traffic more complex. According to statistics, network flows belonging to new applications currently account for 60% of unidentified network flows and 30% of unidentified bits. Therefore, if a classifier does not handle these novel unknown-protocol flows, the overall accuracy of network traffic classification is seriously affected. The large number of new applications poses technical problems for traditional traffic classification methods, and new techniques are needed to improve the existing classification schemes and adapt them to today's complex network environment.
At present there are four main network traffic classification methods: port-based traffic classification, traffic classification based on deep packet inspection (DPI), traffic classification using machine learning (ML) on statistical flow features, and traffic classification based on user behavior features.
Port-based traffic classification
In the early days of the Internet, both the types and the volume of network traffic were relatively small, and the Internet Assigned Numbers Authority (IANA) assigned fixed port numbers to a number of common network protocols. Early traffic classification could therefore determine the application protocol of a flow by identifying the source and destination port numbers of its packets.
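As an illustration of this approach, the following minimal Python sketch looks up the protocol from the port pair of a packet. The table `WELL_KNOWN_PORTS` and the function `classify_by_port` are hypothetical names introduced here; the table holds only a handful of standard assignments and is not part of the invention.

```python
# Hypothetical lookup table with a few standard port assignments.
# A real classifier would load the full IANA registry instead.
WELL_KNOWN_PORTS = {
    21: "ftp",
    22: "ssh",
    25: "smtp",
    53: "dns",
    80: "http",
    443: "https",
}


def classify_by_port(src_port: int, dst_port: int) -> str:
    """Return the protocol registered for either port, or 'unknown'."""
    # The destination port is checked first because the server side
    # normally listens on the registered port.
    for port in (dst_port, src_port):
        if port in WELL_KNOWN_PORTS:
            return WELL_KNOWN_PORTS[port]
    return "unknown"
```

Traffic on dynamically chosen or non-registered ports falls through to "unknown", which is one reason the later techniques were developed.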
Traffic classification based on deep packet inspection
The payload of a packet contains a large amount of information, and DPI uses this information for classification. DPI-based classification relies on signatures of specific protocols or applications: the payload data in a network flow is matched against these signatures to obtain the class of the flow.
Although DPI-based classification achieves high accuracy, it has several disadvantages: it consumes considerable computing resources, it classifies encrypted data poorly, application signatures are difficult to extract and update, and analyzing payload data may infringe on user privacy.
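Signature matching can be sketched in a few lines of Python. The table `SIGNATURES` and the function `classify_by_payload` are hypothetical; the table contains only a few well-known plaintext protocol prefixes, whereas production DPI engines use far larger signature sets and pattern languages.

```python
# Hypothetical signature table: protocol name -> payload prefixes.
SIGNATURES = {
    "http": (b"GET ", b"POST ", b"HTTP/1."),
    "ssh": (b"SSH-",),
    "bittorrent": (b"\x13BitTorrent protocol",),
}


def classify_by_payload(payload: bytes) -> str:
    """Return the first protocol whose signature prefixes the payload."""
    for protocol, prefixes in SIGNATURES.items():
        # bytes.startswith accepts a tuple of candidate prefixes.
        if payload.startswith(prefixes):
            return protocol
    return "unknown"
```

An encrypted payload matches none of these prefixes, which illustrates the weakness on encrypted traffic noted above.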
Machine learning methods based on statistical flow features
In recent years, with the development of artificial intelligence, more and more researchers have begun to use machine learning algorithms to solve the traffic classification problem. Solving traffic classification with machine learning involves two main parts: a training data set and a machine learning algorithm. To generate the training data set, training samples are first labeled using DPI tools, system process monitoring, or manual methods to obtain sample labels; the features of each data flow are then extracted from the network flow; finally, a classifier is obtained from the training set and the machine learning algorithm, and the trained classifier can be used to classify network flows.
Traffic classification based on user behavior features
The emergence of flow feature encryption techniques limits classification based on statistical flow features. In recent years, researchers have begun to classify network traffic using the different communication behavior patterns of hosts, proposing to analyze network flows with host behavior features such as user connection patterns, connection graphs, and network connection diameter, which opened up a new approach to traffic classification. Behavior-based classification mainly analyzes the inherent characteristics of the connection properties and behavior patterns of network protocols and applications in order to distinguish different kinds of traffic. This approach usually has a long modeling time and high time and space complexity, and its application is subject to certain limitations.
Summary of the invention
The object of the invention is to overcome the shortcomings of the prior art described above and to provide an adaptive network traffic classification method based on 5G network capability exposure. The method has a short modeling time, low time and space complexity, and a wide range of application.
To achieve the above object, the adaptive network traffic classification method based on 5G network capability exposure according to the invention comprises the following steps:
1) process incremental network data dynamically using an adaptive sliding window, construct the overall data from the original sliding-window data and the incremental sliding-window data, and then extract feature vectors from the overall data;
2) take each feature vector extracted in step 1) as a data sample, where some of the data samples are labeled network flows and the rest are unlabeled network flows; compute initial cluster centers from the known class information of the labeled data samples to optimize the k-means algorithm, obtain several k-means center points, and construct an initial center set M from these center points;
3) perform k-means clustering on the network flows using the initial center set M to obtain k clusters and k cluster centers, then obtain the clustering result from the k clusters and k cluster centers according to an evaluation function;
4) count the labeled network flows in each cluster; when the number of labeled network flows in a cluster is less than a preset network flow threshold, the cluster is an unknown-protocol cluster; when the number of labeled network flows in a cluster is greater than or equal to the preset network flow threshold, compute the posterior probability of the labeled network flows of each class according to the maximum a posteriori probability formula and take the class with the largest posterior probability as the class of the network flows, thereby achieving adaptive network traffic classification based on 5G network capability exposure.
The concrete operations of step 1) are as follows:
Incremental network data is processed dynamically using an adaptive window. Let the original window data and the incremental window data be the matrices X_1 = [x_1, x_2, …, x_m] and X_2 = [x_{m+1}, x_{m+2}, …, x_{m+r}], so that all data samples can be written as X = [X_1, X_2]. Let S be the mutual information matrix of all data samples, S_1 the mutual information matrix of the original window data, and S_2 the mutual information matrix of the newly added window data. The mutual information matrix S of all data samples is then:
Using the eigendecomposition of S_1, S_1 is diagonalized to the identity matrix, i.e.:
S_2 is then projected onto the space spanned by H_1, which gives:
Combining formula (1) with formula (2) gives:
The eigendecomposition of the resulting matrix is computed, i.e.:
Substituting formula (5) into formula (4) gives:
From formula (1) and formula (6), the eigendecomposition of the mutual information matrix S of all data samples is obtained; from formula (2):
where B_i ∈ R^{n×k} is the principal component decision matrix of the original data and Λ_1 ∈ R^{m×k} is the matrix formed by the first k selected eigenvalues;
From formula (5), the eigenvalues Λ_2 = [μ_1, μ_2, …, μ_n] of S_2 and the corresponding eigenvectors P_2 = [β_1, β_2, …, β_n] are obtained; from the k eigenvalues and eigenvectors, the eigenvalues of S are:
where m and r are the sample sizes of the historical data and the newly added data, respectively;
The eigenvectors of S are:
P = H_1 β_i (9).
The concrete operations of step 2) are as follows:
Take each feature vector obtained in step 1) as a data sample, where some of the data samples are labeled network flows and the rest are unlabeled network flows;
Compute initial cluster centers from the known class information of the labeled data samples to optimize the k-means algorithm, and compute the k-means center points from the labeled network flows, where:
Here each k-means center point m_i is determined by the labeled network flows f belonging to class C_i, and n_i denotes the number of labeled network flows f belonging to class C_i; the center points m_i are used to construct the initial center set M.
The concrete operations of step 3) are as follows:
31) perform k-means clustering on the mixed network flows using the initial center set M to obtain k clusters and k cluster centers;
32) compute the evaluation function from the k clusters and k cluster centers to obtain its value, and reset the set M with the k cluster centers to obtain a new set M;
33) in the network flow feature vector set X, find the k vector points with the largest sum of distances to all center points in the new set M;
34) according to the density formula, determine the vector point with the highest density among these k vector points and add it to the new set M;
35) update k to k+1 and go to step 31), until k is greater than the maximum value k_max;
36) collect the value of the evaluation function from step 32) for each iteration, select the minimum among all these values, obtain the k corresponding to the minimum value, and output the clustering result for that k.
The concrete operations of step 4) are as follows:
For each cluster C_i, count the total number of labeled network flows in the cluster. When this number is less than the preset network flow threshold γ_i, the cluster C_i is judged to be an unknown-protocol cluster; when it is greater than or equal to γ_i, compute the posterior probability of the labeled network flows of each class, find the maximum posterior probability, and assign the cluster to the network flow type corresponding to that maximum.
The maximum posterior probability of the labeled network flows in a cluster is:
where n_ij denotes the number of network flows of type j among the labeled network flows in cluster i, and n_i denotes the total number of labeled flows in cluster i.
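Given the definitions of n_ij and n_i, one plausible form of this posterior is the fraction of labeled flows in cluster i that belong to type j. The following is a reconstruction under that assumption, not the formula as filed; the symbols y and ŷ_i are introduced here for illustration only:

```latex
P(y = j \mid C_i) = \frac{n_{ij}}{n_i},
\qquad
\hat{y}_i = \arg\max_{j} \frac{n_{ij}}{n_i}
```

For example, a cluster holding 8 labeled flows of which 6 are of type j would give that type a posterior of 6/8 = 0.75 under this reading.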
The invention has the following advantages:
In operation, the adaptive network traffic classification method based on 5G network capability exposure according to the invention first processes incremental network data dynamically using an adaptive sliding window to obtain the feature vectors of the overall data, then computes several k-means center points and uses them to perform k-means clustering on the network flows, obtains the clustering result using the evaluation function, and finally obtains the class of each network flow from the number of labeled network flows in its cluster using the posterior probability formula. The method is simple and convenient to operate, has high accuracy, a short modeling time, and low time and space complexity, and has a wide range of application.
Brief description of the drawings
Fig. 1 is flow chart of the invention;
Fig. 2 is the schematic diagram of feature extraction in the present invention.
Specific embodiment
The invention is described in further detail below with reference to the accompanying drawings:
Referring to Fig. 1 and Fig. 2, the adaptive network traffic classification method based on 5G network capability exposure according to the invention comprises the following steps:
1) process incremental network data dynamically using an adaptive sliding window, construct the overall data from the original sliding-window data and the incremental sliding-window data, and then extract feature vectors from the overall data;
2) take each feature vector extracted in step 1) as a data sample, where some of the data samples are labeled network flows and the rest are unlabeled network flows; compute initial cluster centers from the known class information of the labeled data samples to optimize the k-means algorithm, obtain several k-means center points, and construct an initial center set M from these center points;
3) perform k-means clustering on the network flows using the initial center set M to obtain k clusters and k cluster centers, then obtain the clustering result from the k clusters and k cluster centers according to an evaluation function;
4) count the labeled network flows in each cluster; when the number of labeled network flows in a cluster is less than a preset network flow threshold, the cluster is an unknown-protocol cluster; when the number of labeled network flows in a cluster is greater than or equal to the preset network flow threshold, compute the posterior probability of the labeled network flows of each class according to the maximum a posteriori probability formula and take the class with the largest posterior probability as the class of the network flows, thereby achieving adaptive network traffic classification based on 5G network capability exposure.
The concrete operations of step 1) are as follows:
Most data in 5G network traffic is not globally linear and often follows some form of nonlinear distribution, and traditional linear dimensionality reduction algorithms such as principal component analysis and Fisher discriminant analysis perform poorly on nonlinear data. The algorithm proposed by the invention reduces the dimensionality of the data while remaining efficient and retains most of the information in the original data.
In addition, in future 5G applications, data acquisition equipment will continuously collect new data from network flows in real time, and traditional feature extraction algorithms cannot process incremental data quickly. If only the newly added data is processed and the influence of the historical data is ignored, the algorithm cannot extract features from a global perspective, and the information contained in the extracted data is greatly reduced.
To address this problem, the invention combines historical data with newly added data and processes incremental network data dynamically using an adaptive window. Let the original window data and the incremental window data be the matrices X_1 = [x_1, x_2, …, x_m] and X_2 = [x_{m+1}, x_{m+2}, …, x_{m+r}], so that all data samples can be written as X = [X_1, X_2]. Let S be the mutual information matrix of all data samples, S_1 the mutual information matrix of the original window data, and S_2 the mutual information matrix of the newly added window data. The mutual information matrix S of all data samples is then:
Using the eigendecomposition of S_1, S_1 is diagonalized to the identity matrix, i.e.:
S_2 is then projected onto the space spanned by H_1, which gives:
Combining formula (1) with formula (2) gives:
The eigendecomposition of the resulting matrix is computed, i.e.:
Substituting formula (5) into formula (4) gives:
From formula (1) and formula (6), the eigendecomposition of the mutual information matrix S of all data samples is obtained; from formula (2):
where B_i ∈ R^{n×k} is the principal component decision matrix of the original data and Λ_1 ∈ R^{m×k} is the matrix formed by the first k selected eigenvalues;
From formula (5), the eigenvalues Λ_2 = [μ_1, μ_2, …, μ_n] of S_2 and the corresponding eigenvectors P_2 = [β_1, β_2, …, β_n] are obtained; from the k eigenvalues and eigenvectors, the eigenvalues of S are:
where m and r are the sample sizes of the historical data and the newly added data, respectively;
The eigenvectors of S are:
P = H_1 β_i (9).
The principal component decision matrix is then formed and the data is mapped onto it, which achieves the dimensionality reduction; subsequent windows repeat this process.
Because traditional feature extraction algorithms are not suited to nonlinear data and cannot meet the real-time requirements of industrial big data, the invention proposes a real-time feature extraction algorithm based on mutual information. The pseudocode of the algorithm is as follows:
Input: raw data set
Output: data set after dimensionality reduction
Feed the raw data set into the buffers at a given rate; when the rate exceeds a certain value, add buffers dynamically
The sliding window reads data from the buffer with the smallest number
int id = 1;  /* buffer numbering starts at 1 */
while (buffer[id] != null) do
    read Matrix_i;
    if (Matrix_i.id == 1) then
        MIandEigDecomposition(Matrix_i);
        unitedMatrix = UnitedMatrix(Matrix_i);
        output Matrix_i * eigVecMatrix
    else
        compute MIMatrix;
        project MIMatrix onto unitedMatrix
        proMatrix = ProjectMatrix(MIMatrix);
        MIandEigDecomposition(proMatrix);
        output MIMatrix * eigVecMatrix
    end if
    id = id + 1;
end while
Here Matrix is the data matrix in the window, implemented as a two-dimensional array; the UnitedMatrix() function computes the unitized matrix, and the ProjectMatrix() function computes the projected matrix;
The windows are scanned one by one. For each window, first determine whether it is the first window. If it is, compute the mutual information matrix of the current window, perform eigendecomposition, select the principal component decision matrix, and map the original matrix onto the decision matrix to achieve dimensionality reduction. Otherwise, compute the mutual information matrix of the current window and project it onto the unitized matrix of the previous window, then compute the eigenvalues and eigenvectors according to formula (3) and formula (4), form the principal component decision matrix, and achieve dimensionality reduction. The overall flow is shown in Fig. 2.
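The first-window branch described above can be sketched in Python with NumPy. This is a minimal illustration under several assumptions that the source does not specify: mutual information is estimated from an 8-bin two-dimensional histogram, the window is stored with one sample per row (the transpose of X_1 as written above), and the function names `mutual_information`, `mi_matrix`, and `reduce_window` are hypothetical. The incremental update for later windows is not covered.

```python
import numpy as np


def mutual_information(x, y, bins=8):
    """Histogram estimate (in nats) of the mutual information of two 1-D arrays."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()              # joint probability table
    px = pxy.sum(axis=1, keepdims=True)    # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)    # marginal of y, shape (1, bins)
    nonzero = pxy > 0                      # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nonzero] * np.log(pxy[nonzero] / (px @ py)[nonzero])))


def mi_matrix(window, bins=8):
    """Symmetric matrix of pairwise mutual information between feature columns."""
    n_features = window.shape[1]
    s = np.zeros((n_features, n_features))
    for i in range(n_features):
        for j in range(i, n_features):
            s[i, j] = s[j, i] = mutual_information(window[:, i], window[:, j], bins)
    return s


def reduce_window(window, k):
    """Project a window onto the k eigenvectors of its MI matrix with the largest eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(mi_matrix(window))  # eigh: symmetric input
    top = np.argsort(eigvals)[::-1][:k]                   # indices of the k largest
    return window @ eigvecs[:, top]
```

Calling `reduce_window(window, 2)` on a 200 × 5 window returns a 200 × 2 array; the matrix of selected eigenvectors plays the role of the principal component decision matrix in this sketch.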
The concrete operations of step 2) are as follows:
Take each feature vector obtained in step 1) as a data sample, where some of the data samples are labeled network flows and the rest are unlabeled network flows;
Compute initial cluster centers from the known class information of the labeled data samples to optimize the k-means algorithm and reduce its convergence time; combined with the iterative addition of center points in the next stage, this improves the accuracy of clustering.
Since the purpose of a clustering algorithm is to gather data of the same class together and to separate data of different classes into different clusters, the classes of the labeled network flows can be used to compute a set of initial center points and roughly determine some cluster ranges in advance. The k-means center points are computed from the labeled network flows, where:
Here each k-means center point m_i is determined by the labeled network flows f belonging to class C_i, and n_i denotes the number of labeled network flows f belonging to class C_i; the center points m_i are used to construct the initial center set M.
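A minimal Python sketch of this initialization follows. It assumes that each center m_i is the arithmetic mean of the labeled flows of class C_i, which is consistent with the role of n_i above but is not confirmed by the source; the function name `initial_centers` and the use of `None` for unlabeled flows are illustrative choices.

```python
from collections import defaultdict


def initial_centers(flows, labels):
    """Mean feature vector of the labeled flows of each class.

    flows:  list of equal-length feature vectors
    labels: class name for a labeled flow, None for an unlabeled flow
    """
    sums = {}
    counts = defaultdict(int)
    for vec, label in zip(flows, labels):
        if label is None:          # unlabeled flows do not influence the centers
            continue
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    # Assumed form: m_i = (sum of labeled flows in class C_i) / n_i
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}
```

The values of the returned dictionary form the initial center set M, so the initial k equals the number of labeled classes.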
The concrete operations of step 3) are as follows:
31) perform k-means clustering on the mixed network flows using the initial center set M to obtain k clusters and k cluster centers;
32) compute the evaluation function from the k clusters and k cluster centers to obtain its value, and reset the set M with the k cluster centers to obtain a new set M;
33) in the network flow feature vector set X, find the k vector points with the largest sum of distances to all center points in the new set M;
34) according to the density formula, determine the vector point with the highest density among these k vector points and add it to the new set M;
35) update k to k+1 and go to step 31), until k is greater than the maximum value k_max;
36) collect the value of the evaluation function from step 32) for each iteration, select the minimum among all these values, obtain the k corresponding to the minimum value, and output the clustering result for that k.
The distance metric in the improved k-means algorithm is the weighted Euclidean distance, i.e.:
During the iterations of the improved k-means algorithm, k varies from k_min = p to k_max. The value of k_min is determined by the number of classes of the input labeled flows, and the value of k_max is an empirical maximum for the k-means algorithm, summarized and verified from the literature.
The smaller the value of the evaluation function, the closer the sample points within each cluster are to one another, i.e. the better the clustering.
The invention selects, from the k points farthest from the current center points, the one with the highest density and adds it as a new center: choosing among the farthest points effectively avoids falling into a local optimum, and the highest density ensures that the point is representative. The density is computed as:
where d(x_i, x_j) denotes the weighted Euclidean distance between vector points x_i and x_j, the remaining term denotes the number of pairwise combinations of all vector points, and N is the number of all vector points.
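The selection of the next center can be sketched as follows. Because the density formula itself is not reproduced here, the sketch assumes one common definition: the density of a point is the number of points lying closer to it than the mean pairwise distance over all N(N-1)/2 pairs. The function names `weighted_distance` and `next_center` and the `weights` parameter are hypothetical.

```python
import math
from itertools import combinations


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def next_center(points, centers, weights, k):
    """Pick the densest of the k points with the largest distance sum to the centers."""
    def distance_sum(p):
        return sum(weighted_distance(p, c, weights) for c in centers)

    # The k candidates farthest (in summed distance) from all current centers.
    candidates = sorted(points, key=distance_sum, reverse=True)[:k]

    # Assumed density: neighbours closer than the mean pairwise distance.
    pairs = list(combinations(points, 2))
    mean_dist = sum(weighted_distance(a, b, weights) for a, b in pairs) / len(pairs)

    def density(p):
        return sum(1 for q in points if weighted_distance(p, q, weights) < mean_dist)

    return max(candidates, key=density)
```

With an isolated outlier and a compact group that are both far from the current center, the outlier has the larger distance sum but a low density, so the returned point comes from the compact group.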
Over multiple iterations, after representative center points have been added, the evaluation function can be used to determine automatically the best clustering result obtained during the iterations and its corresponding k. This not only makes the parameter adaptive but also ensures the high accuracy of the output clustering result.
The concrete operations of step 4) are as follows:
For each cluster C_i, count the total number of labeled network flows in the cluster. When this number is less than the preset network flow threshold γ_i, the cluster C_i is judged to be an unknown-protocol cluster; when it is greater than or equal to γ_i, compute the posterior probability of the labeled network flows of each class, find the maximum posterior probability, and assign the cluster to the network flow type corresponding to that maximum.
The maximum posterior probability of the labeled network flows in a cluster is:
where n_ij denotes the number of network flows of type j among the labeled network flows in cluster i, and n_i denotes the total number of labeled flows in cluster i.
The network flow threshold γ_i is:
Here γ_i is the proportion of labeled network flows in the mixed input network flows. That is, if the total number of labeled network flows of all types in a cluster is still less than the total number of network flows in that cluster multiplied by half of γ_i, the cluster is provisionally judged to be an unknown-protocol cluster. Because the labeled data is selected at random, for network flows that do not belong to an unknown-protocol class, the cluster that contains them should hold more than a certain number of labeled network flows. Allowing for the randomness of clustering results, when the number of labeled network flows in a cluster is less than half of that number, the labeled flows in the cluster are considered too few to determine the type of the cluster and the determination would not be representative. Such clusters are therefore provisionally classified as unknown-protocol clusters and are studied further in the system update module.
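The cluster-to-class mapping of step 4) can be sketched in Python as below. Two points are assumptions rather than content of the source: the threshold is taken to be half the labeled ratio multiplied by the cluster size, following the verbal description above, and the posterior of a class is taken to be n_ij / n_i. The names `label_cluster` and `labeled_ratio` are hypothetical.

```python
from collections import Counter


def label_cluster(labels_in_cluster, labeled_ratio):
    """Map one cluster to a class name or to 'unknown'.

    labels_in_cluster: class name for each labeled flow, None for unlabeled flows
    labeled_ratio:     share of labeled flows in the whole mixed input
    """
    labeled = [lab for lab in labels_in_cluster if lab is not None]
    # Assumed threshold: half the labeled ratio times the cluster size.
    threshold = 0.5 * labeled_ratio * len(labels_in_cluster)
    if not labeled or len(labeled) < threshold:
        return "unknown", {}
    counts = Counter(labeled)
    # Assumed posterior: n_ij / n_i for each class j present in the cluster.
    posteriors = {cls: n / len(labeled) for cls, n in counts.items()}
    return max(posteriors, key=posteriors.get), posteriors
```

A cluster of ten flows with two "web" labels and one "video" label, at a labeled ratio of 0.2, has a threshold of 1.0 and is mapped to "web"; a cluster of ten flows with a single label, at a labeled ratio of 0.4, falls below its threshold of 2.0 and is returned as "unknown".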
With this improved cluster-to-class mapping method, unknown-protocol clusters that traditional semi-supervised traffic classification methods would wrongly assign to a known protocol class can also be identified and extracted. An online classifier trained on such clustering results is considerably more accurate, and unknown protocols can be extracted online.
Claims (6)
1. An adaptive network traffic classification method based on 5G network capability exposure, characterized by comprising the following steps:
1) process incremental network data dynamically using an adaptive sliding window, construct the overall data from the original sliding-window data and the incremental sliding-window data, and then extract feature vectors from the overall data;
2) take each feature vector extracted in step 1) as a data sample, where some of the data samples are labeled network flows and the rest are unlabeled network flows; compute initial cluster centers from the known class information of the labeled data samples to optimize the k-means algorithm, obtain several k-means center points, and construct an initial center set M from these center points;
3) perform k-means clustering on the network flows using the initial center set M to obtain k clusters and k cluster centers, then obtain the clustering result from the k clusters and k cluster centers according to an evaluation function;
4) count the labeled network flows in each cluster; when the number of labeled network flows in a cluster is less than a preset network flow threshold, the cluster is an unknown-protocol cluster; when the number of labeled network flows in a cluster is greater than or equal to the preset network flow threshold, compute the posterior probability of the labeled network flows of each class according to the maximum a posteriori probability formula and take the class with the largest posterior probability as the class of the network flows, thereby achieving adaptive network traffic classification based on 5G network capability exposure.
2. the self adaptive network traffic classification method open based on 5G network capabilities according to claim 1, feature exist
In the concrete operations of step 1) are as follows:
Dynamically network incremental data is handled using Adaptive windowing mouth, by raw window data and increment window number
According to matrix being respectively X1=[x1,x2,…,xm] and X2=[xm+1,xm+2,…,xm+r], all data samples are represented by X=
[X1,X2], if the mutual information matrix of all data samples is S, the mutual information matrix of raw window data is S1, increase window number newly
According to mutual information matrix be S2, then the mutual information matrix S of all data samples are as follows:
Using the eigendecomposition of $S_1$, $S_1$ is diagonalized to the identity matrix, i.e.,
Then $S_2$ is projected onto the space spanned by $H_1$, which gives
Adding formula (1) and formula (2) gives:
The eigendecomposition is then computed, i.e.:
Substituting formula (5) into formula (4) gives:
From formula (1) and formula (6), the eigendecomposition of the mutual information matrix $S$ of all data samples is obtained; from formula (2):
where $B_i \in \mathbb{R}^{n \times k}$ is the principal component decision matrix of the original data, and $\Lambda_1 \in \mathbb{R}^{m \times k}$ is the matrix formed by the first k selected eigenvalues;
According to formula (5), the eigenvalues $\Lambda_2 = [\mu_1, \mu_2, \ldots, \mu_n]$ of $S_2$ and the corresponding eigenvectors $P_2 = [\beta_1, \beta_2, \ldots, \beta_n]$ are found, and from these k eigenvalues and eigenvectors the eigenvalues of $S$ are obtained as:
where m and r are the sample sizes of the historical data and the newly added data, respectively;
The eigenvectors of $S$ are:
$P = H_1 \beta_i$ (9).
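The operations in this claim follow the general pattern of simultaneous diagonalization. The block below is only a sketch of that general pattern and not the claim's own formulas: it assumes the unweighted combination $S_1 + S_2$ (the claim additionally involves the sample sizes m and r), assumes $\Lambda_1$ is a square, invertible diagonal matrix, reads $\Lambda_2$ and $P_2$ as the eigenvalues and orthonormal eigenvectors of the projected matrix $H_1^{\top} S_2 H_1$, and introduces $P_1$ as an illustrative symbol for the eigenvector matrix of $S_1$.

```latex
\begin{aligned}
S_1 &= P_1 \Lambda_1 P_1^{\top}, \qquad
H_1 = P_1 \Lambda_1^{-1/2}
\;\Longrightarrow\; H_1^{\top} S_1 H_1 = I, \\
H_1^{\top} S_2 H_1 &= P_2 \,\operatorname{diag}(\mu_1, \ldots, \mu_n)\, P_2^{\top},
\qquad P_2 = [\beta_1, \ldots, \beta_n], \quad P_2^{\top} P_2 = I, \\
(H_1 P_2)^{\top} (S_1 + S_2) (H_1 P_2) &= I + \operatorname{diag}(\mu_1, \ldots, \mu_n).
\end{aligned}
```

Under these assumptions each column $H_1 \beta_i$ diagonalizes the combined matrix, which is the same form as $P = H_1 \beta_i$ in formula (9).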
3. The adaptive network traffic classification method based on 5G network capability exposure according to claim 1, characterized in that the specific operations of step 2) are:
Each feature vector obtained in step 1) is taken as a data sample, where some of the data samples are labeled network flows and the remaining data samples are unlabeled network flows;
The initial cluster centers are calculated from the known category information of the labeled data samples in order to optimize the k-means algorithm, and the k-means center points are calculated from the labeled network flows,
where each k-means center point $m_i$ is determined by the labeled network flows $f$ belonging to category $C_i$, $n_i$ denotes the number of labeled network flows $f$ belonging to category $C_i$, and the initial center point set M is constructed from the k-means center points $m_i$.
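A minimal Python sketch of the seeding step described in this claim may help. It assumes that each center point $m_i$ is the arithmetic mean of the $n_i$ labeled flows of category $C_i$, which is consistent with the definitions given but is not stated explicitly in this text. The function name `initial_centers` and the toy data are hypothetical.

```python
from collections import defaultdict

def initial_centers(labeled_flows):
    """Return {category: center} from (feature_vector, category) pairs.

    Assumption: the center of a category is the mean of its labeled flows.
    """
    sums = {}
    counts = defaultdict(int)
    for features, category in labeled_flows:
        if category not in sums:
            sums[category] = [0.0] * len(features)
        for d, value in enumerate(features):
            sums[category][d] += value
        counts[category] += 1  # n_i: number of labeled flows in this category
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

# Toy labeled flows: two "http" flows and one "dns" flow (illustrative only).
labeled = [([1.0, 2.0], "http"), ([3.0, 4.0], "http"), ([10.0, 0.0], "dns")]
M = initial_centers(labeled)  # initial center point set, one entry per category
```

The resulting set has as many center points as there are labeled categories, so the first clustering pass starts with k equal to the number of known categories.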
4. The adaptive network traffic classification method based on 5G network capability exposure according to claim 1, characterized in that the specific operations of step 3) are:
31) k-means clustering is performed on the mixed network flows using the initial center point set M to obtain k clusters and k cluster center points;
32) the evaluation function is calculated from the k clusters and k cluster center points to obtain the value of the evaluation function, and the set M is reset using the k cluster center points to obtain a new set M;
33) the k vector points in the network flow feature vector set X whose summed distance to all center points in the new set M is largest are calculated;
34) according to the density calculation formula, the vector point with the highest density among those k vector points is determined and added to the new set M;
35) the value of k is updated to k+1 and the procedure returns to step 31), until k is greater than
36) the value of the evaluation function from step 32) is recorded at each iteration, the minimum is chosen from all evaluation function values, the k value corresponding to that minimum is obtained, and the clustering result corresponding to that k value is output.
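Steps 31) to 36) can be sketched in Python as below. Several pieces are assumptions rather than the claimed method: the evaluation function (`evaluate`), the density measure (`density` with a `radius` parameter), and the stopping bound (`k_max`, since the bound in step 35) is not given in this text) are all hypothetical stand-ins, and all function names are illustrative.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, seeds, max_iters=50):
    """Plain Lloyd iterations started from the given seed centers."""
    centers = [list(c) for c in seeds]
    clusters = []
    for _ in range(max_iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        updated = [[sum(col) / len(m) for col in zip(*m)] if m else c
                   for m, c in zip(clusters, centers)]
        if updated == centers:
            break
        centers = updated
    return clusters, centers

def evaluate(clusters, centers):
    """Hypothetical evaluation function (lower is better): mean
    point-to-center distance divided by the smallest center-to-center gap."""
    n = sum(len(m) for m in clusters)
    spread = sum(dist(p, c) for m, c in zip(clusters, centers) for p in m) / n
    gap = min((dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]),
              default=0.0)
    return spread / gap if gap > 0 else float("inf")

def density(p, points, radius):
    """Hypothetical density: number of points within `radius` of p."""
    return sum(1 for q in points if dist(p, q) <= radius)

def adaptive_kmeans(points, M, k_max, radius=1.0):
    """Return (score, k, clusters, centers) for the best k tried."""
    best = None
    M = [list(c) for c in M]
    while len(M) <= k_max:                       # assumed stopping bound
        clusters, centers = kmeans(points, M)    # step 31
        score = evaluate(clusters, centers)      # step 32
        if best is None or score < best[0]:      # step 36: keep the minimum
            best = (score, len(centers), clusters, centers)
        M = centers                              # step 32: reset M
        k = len(M)
        # Step 33: the k points with the largest summed distance to all centers.
        far = sorted(points, key=lambda p: sum(dist(p, c) for c in M),
                     reverse=True)[:k]
        # Step 34: add the densest candidate; step 35: k grows by one.
        M = M + [max(far, key=lambda p: density(p, points, radius))]
    return best

# Toy data: three square blobs, but only two seed centers from labeled flows.
points = [[x + dx, y + dy] for x, y in [(0, 0), (10, 10), (20, 0)]
          for dx in (0, 1) for dy in (0, 1)]
best = adaptive_kmeans(points, M=[[0.5, 0.5], [10.5, 10.5]], k_max=4)
```

Passing the evaluation and density measures as separate functions keeps the loop structure independent of whichever formulas are actually used.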
5. The adaptive network traffic classification method based on 5G network capability exposure according to claim 1, characterized in that the specific operations of step 4) are:
For a cluster $C_i$, the total number of labeled network flows in the cluster is counted. When this number is less than the preset network flow threshold $\gamma_i$, the cluster $C_i$ is marked as an unknown-protocol cluster; when this number is greater than or equal to the preset network flow threshold $\gamma_i$, the posterior probability of the labeled network flows of each category is calculated, the maximum posterior probability value is found, and the cluster is assigned the network flow type corresponding to the maximum posterior probability value.
6. The adaptive network traffic classification method based on 5G network capability exposure according to claim 1, characterized in that the maximum a posteriori probability of the labeled network flows in a cluster is:
where $n_{ij}$ denotes the number of network flows of type j among the labeled network flows in cluster i, and $n_i$ denotes the total number of labeled flows in cluster i.
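The cluster-labeling rule of claims 5 and 6 can be illustrated with a short Python sketch. It assumes the posterior of type j in cluster i is estimated as $n_{ij} / n_i$, which matches the symbol definitions above but is an assumption, since the formula itself is not reproduced in this text. The names `label_cluster`, `UNKNOWN`, and `threshold` are hypothetical.

```python
from collections import Counter

UNKNOWN = "unknown-protocol"  # hypothetical marker for unknown-protocol clusters

def label_cluster(labeled_categories, threshold):
    """Return (category, posterior) for one cluster.

    labeled_categories: categories of the labeled flows that fell into the
    cluster (unlabeled flows are left out). threshold: the preset network
    flow threshold for this cluster.
    """
    n_i = len(labeled_categories)
    if n_i == 0 or n_i < threshold:
        return UNKNOWN, 0.0
    counts = Counter(labeled_categories)       # n_ij for each category j
    category, n_ij = counts.most_common(1)[0]  # category with the largest n_ij
    return category, n_ij / n_i                # assumed posterior n_ij / n_i

# Toy clusters keyed by cluster id (illustrative only).
clusters = {0: ["http", "http", "dns"], 1: ["dns"], 2: []}
result = {cid: label_cluster(cats, threshold=2) for cid, cats in clusters.items()}
```

Every flow in a cluster, labeled or not, would then inherit the cluster's category, while flows in unknown-protocol clusters remain unclassified.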
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910579744.6A CN110365603A (en) | 2019-06-28 | 2019-06-28 | A kind of self adaptive network traffic classification method open based on 5G network capabilities |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110365603A (en) | 2019-10-22 |
Family
ID=68216017
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910579744.6A Pending CN110365603A (en) | 2019-06-28 | 2019-06-28 | A kind of self adaptive network traffic classification method open based on 5G network capabilities |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110365603A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106878073A (en) * | 2017-02-14 | 2017-06-20 | 南京邮电大学 | Network multimedia business semisupervised classification method based on t Distribution Mixed Models |
CN107819698A (en) * | 2017-11-10 | 2018-03-20 | 北京邮电大学 | A kind of net flow assorted method based on semi-supervised learning, computer equipment |
CN107846326A (en) * | 2017-11-10 | 2018-03-27 | 北京邮电大学 | A kind of adaptive semi-supervised net flow assorted method, system and equipment |
CN108537288A (en) * | 2018-04-19 | 2018-09-14 | 辽宁大学 | A kind of real-time feature extraction method based on mutual information |
CN109309630A (en) * | 2018-09-25 | 2019-02-05 | 深圳先进技术研究院 | A kind of net flow assorted method, system and electronic equipment |
Non-Patent Citations (2)
Title |
---|
孔晓晨: "基于半监督学习的网络流量分类技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 * |
王妍 等: "一种基于互信息的实时特征提取算法", 《小型微型计算机系统》 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113807373A (en) * | 2020-06-11 | 2021-12-17 | 中移(苏州)软件技术有限公司 | Traffic identification method and device, equipment and storage medium |
CN113807373B (en) * | 2020-06-11 | 2024-02-02 | 中移(苏州)软件技术有限公司 | Traffic identification method and device, equipment and storage medium |
WO2021258961A1 (en) * | 2020-06-22 | 2021-12-30 | 南京邮电大学 | Network traffic classification method and system based on improved k-means algorithm |
US11570069B2 (en) | 2020-06-22 | 2023-01-31 | Nanjing University Of Posts And Telecommunicatins | Network traffic classification method and system based on improved K-means algorithm |
CN112396090A (en) * | 2020-10-22 | 2021-02-23 | 国网浙江省电力有限公司杭州供电公司 | Clustering method and device for power grid service big data detection and analysis |
CN113242207A (en) * | 2021-04-02 | 2021-08-10 | 河海大学 | Iterative clustering network flow abnormity detection method |
CN113242207B (en) * | 2021-04-02 | 2022-06-17 | 河海大学 | Iterative clustering network flow abnormity detection method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107846326A (en) | A kind of adaptive semi-supervised net flow assorted method, system and equipment | |
CN110365603A (en) | A kind of self adaptive network traffic classification method open based on 5G network capabilities | |
CN109067586B (en) | DDoS attack detection method and device | |
CN110311829A (en) | A kind of net flow assorted method accelerated based on machine learning | |
CN106228398A (en) | Specific user's digging system based on C4.5 decision Tree algorithms and method thereof | |
CN107819698A (en) | A kind of net flow assorted method based on semi-supervised learning, computer equipment | |
CN114553475A (en) | Network attack detection method based on network flow attribute directed topology | |
CN111897733A (en) | Fuzzy test method and device based on minimum set coverage | |
CN109490838A (en) | A kind of Recognition Method of Radar Emitters of data base-oriented incompleteness | |
CN113452802A (en) | Equipment model identification method, device and system | |
CN104680193A (en) | Online target classification method and system based on fast similarity network fusion algorithm | |
CN110334777A (en) | A kind of unsupervised attribute selection method of weighting multi-angle of view | |
CN112183459B (en) | Remote sensing water quality image classification method based on evolution multi-objective optimization | |
WO2020024444A1 (en) | Group performance grade recognition method and apparatus, and storage medium and computer device | |
CN109523514A (en) | To the batch imaging quality assessment method of Inverse Synthetic Aperture Radar ISAR | |
CN109583519A (en) | A kind of semisupervised classification method based on p-Laplacian figure convolutional neural networks | |
CN107392249A (en) | A kind of density peak clustering method of k nearest neighbor similarity optimization | |
CN112383488B (en) | Content identification method suitable for encrypted and non-encrypted data streams | |
CN104468276B (en) | Network flow identification method based on random sampling multi-categorizer | |
CN114897085A (en) | Clustering method based on closed subgraph link prediction and computer equipment | |
CN114666273A (en) | Application layer unknown network protocol oriented traffic classification method | |
CN112633475A (en) | Large-scale network burst flow identification model and method and model training method | |
CN108268478A (en) | A kind of unbalanced dataset feature selection approach and device based on ur-CAIM algorithms | |
CN117633627A (en) | Deep learning unknown network traffic classification method and system based on evidence uncertainty evaluation | |
CN113746707B (en) | Encrypted traffic classification method based on classifier and network structure |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20191022 |
|
RJ01 | Rejection of invention patent application after publication |