CN104063747A

CN104063747A - Performance abnormality prediction method in distributed system and system

Info

Publication number: CN104063747A
Application number: CN201410294472.2A
Authority: CN
Inventors: 曹健; 杨定裕; 仇沂; 顾骅; 沈琪骏; 王烺
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2014-06-26
Filing date: 2014-06-26
Publication date: 2014-09-24

Abstract

The invention relates to a performance anomaly prediction method and system in a distributed system. Historical and real-time performance data are collected through the monitoring system of the distributed environment. Feature values are used to describe the data and to construct patterns of the performance variables, and a classification model is trained by naive Bayes classification. The current data pattern is compared with the historical data patterns to find the historical pattern most similar to it, and a naive Bayes prediction model then predicts whether the current data pattern is in an abnormal state. For performance anomaly prediction in distributed systems, the method and system take the characteristics of the variables into account comprehensively and achieve high accuracy. A machine-learning Bayesian model guides the prediction and detects performance anomalies in real time, and each prediction is evaluated with the previously trained Bayesian model, which raises the confidence of the prediction. The degree of automation is high, and the reliability and practicality of the prediction are improved.

Description

Performance anomaly prediction method and system in a distributed system

Technical Field

The present invention relates to a performance anomaly detection and prediction method and system, and in particular, to a performance anomaly prediction method and system in a distributed system.

Background

In a distributed system, the computers are independent of each other; they may be physically adjacent or geographically dispersed, and are connected by a network or other means to form a whole. From a research point of view, distributed computing has the following characteristics: (1) resource sharing; (2) scalability; (3) fault tolerance; and (4) concurrency.

Monitoring the distributed computing environment is critical if distributed computing is to deliver its full capacity for data processing. The system must coordinate the running tasks and allocate resources reasonably, so that resources are fully utilized and the performance of the whole system improves. Typically, the system uses a scheduler to manage these tasks: the scheduler gathers information about the resources in the system to determine whether they are available, and the scheduling algorithm then prioritizes the tasks and assigns them to available resources based on resource availability, task running time, and similar factors. However, as tasks run, the state of each resource, such as CPU load, free memory, and free disk space, changes continuously. If the system could predict, before scheduling, whether a resource will still be available at a future time, and could avoid using resources during abnormal periods, the scheduling result would be better. It is therefore important to monitor the resources in the system in real time and to detect the precursors of an anomaly before it occurs.

A system performance anomaly is the phenomenon in which the performance of a computer system degrades gradually to an intolerable level because resources are gradually exhausted or operational errors gradually accumulate while software runs. A performance anomaly typically manifests as a system state (e.g., CPU load or memory usage) that can no longer sustain the work of the running applications. Most existing anomaly prediction models are based on regression techniques, whose inherent limitations give each model its own shortcomings: some suit only specific data, and others have large prediction errors. Existing classification-based anomaly prediction models require labels to be assigned to historical data manually, so their degree of automation is low; moreover, they consider the variables only from the perspective of their values and cannot capture the characteristics of the variables comprehensively, which introduces errors into the prediction results.

Disclosure of Invention

The invention aims to provide a performance anomaly prediction method and system in a distributed system that solve two problems: the low degree of automation of performance prediction in distributed environments, and the failure to consider the characteristics of variables comprehensively when looking only at variable values.

To solve the above problems, the present invention provides a performance anomaly prediction method in a distributed system, comprising the following steps:

S1: extracting a target data value from historical performance data obtained by a plurality of monitoring nodes in a monitoring system as a training data source, and calculating the characteristic values of the historical data patterns in the data source;

S2: obtaining, from the characteristic values of each historical data pattern, the prior probability distribution of the historical data patterns under each of multiple states, and counting the probability distribution of each state, thereby training a Bayesian model of the states of each data pattern;

S3: calculating the characteristic values of the current data pattern from real-time performance data acquired by the monitoring system;

S4: finding, among the historical data patterns, the data pattern most similar to the current data pattern;

S5: predicting with the Bayesian model trained in S2, based on the output of S4, to obtain the probability of each of the multiple states;

S6: setting a confidence factor and an anomaly threshold based on the result of S5, and predicting an abnormal state if the confidence factor exceeds the anomaly threshold.

Preferably, the characteristic values include a performance value change amount, a performance value change rate, and a performance value.

Preferably, in S2, the variances of each characteristic value over all historical data patterns are sorted by magnitude and divided into a plurality of subspaces, and the prior probability of each specific state corresponding to each subspace is calculated.

Preferably, in S2, a Bayesian model of each historical data pattern is trained from the characteristic values of that pattern, and the prior probabilities of the multiple states of each pattern are obtained.

Preferably, S4 further includes:

calculating the standard deviation of each characteristic value between the current data pattern and each historical normal pattern; and

taking the historical data pattern whose sum of standard deviations with respect to the current data pattern is smallest as the most similar pattern of the current data pattern.

Preferably, the states are an abnormal state, a warning state and a normal state.

Preferably, S6 further includes setting an alarm threshold: the alarm state is predicted if the confidence factor lies between the alarm threshold and the anomaly threshold, and the normal state is predicted if the confidence factor is smaller than the alarm threshold.

To solve the above problems, the present invention further provides a performance anomaly prediction system in a distributed system, connected to a monitoring system of the distributed system, including:

a historical characteristic value calculation module, which extracts a target data value from historical performance data obtained by a plurality of monitoring nodes in the monitoring system as a training data source and calculates the characteristic values of the historical data patterns in the data source;

a prior probability module, connected to the output of the historical characteristic value calculation module, which obtains from the characteristic values of each historical data pattern the prior probability distribution of the patterns under each state, and counts the probability distribution of each state, thereby training the Bayesian model of each data pattern;

a real-time characteristic value calculation module, which calculates the characteristic values of the current data pattern from real-time performance data acquired by a plurality of monitoring nodes in the monitoring system;

a similar mode module, connected to the outputs of the historical characteristic value calculation module and the real-time characteristic value calculation module, which finds among the historical data patterns the pattern most similar to the current data pattern;

a probability calculation module, which predicts with the Bayesian model trained in the prior probability module, based on the output of the similar mode module, and obtains the probability of each of the multiple states; and

an abnormal alarm module, which sets a confidence factor and an anomaly threshold according to the result of the probability calculation module, and predicts an abnormal state if the confidence factor exceeds the anomaly threshold.
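The data flow between these modules can be sketched as follows. This is a hypothetical Python wiring for illustration, not the implementation described in the patent: the class name `AnomalyPredictionSystem`, its `train`/`predict` methods, and the choice to model each module as a plain callable are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class AnomalyPredictionSystem:
    """Hypothetical wiring of the six modules; each module is a plain callable."""
    historical_features: Callable[[Any], Any]        # history -> historical patterns
    prior_probability: Callable[[Any], Any]          # historical patterns -> trained model
    realtime_features: Callable[[Any], Any]          # real-time data -> current pattern
    similar_mode: Callable[[Any, Any], Any]          # (current, historical) -> most similar SDs
    probability_calculation: Callable[[Any, Any], Dict[str, float]]  # (model, SDs) -> P(status | SDs)
    anomaly_alarm: Callable[[Dict[str, float]], str]  # probabilities -> predicted state
    patterns: Any = field(default=None, init=False)
    model: Any = field(default=None, init=False)

    def train(self, history: Any) -> None:
        """Offline path: historical characteristic values -> prior probability module."""
        self.patterns = self.historical_features(history)
        self.model = self.prior_probability(self.patterns)

    def predict(self, realtime: Any) -> str:
        """Online path: real-time values -> similar mode -> probabilities -> alarm."""
        if self.model is None:
            raise RuntimeError("call train() before predict()")
        current = self.realtime_features(realtime)
        sds = self.similar_mode(current, self.patterns)
        probs = self.probability_calculation(self.model, sds)
        return self.anomaly_alarm(probs)
```

Injecting each module as a callable mirrors the block diagram: one module (for example, the similarity measure) can be swapped without touching the others.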

Preferably, the states include an abnormal state, a warning state and a normal state.

Compared with the prior art, the technical solution of the invention has the following advantages and positive effects:

1) For performance anomaly prediction in a distributed system, the invention analyzes the performance of distributed nodes through characteristic values and data patterns, so that the characteristics of the variables are considered comprehensively and the accuracy is higher;

2) The invention uses a machine-learning Bayesian model to guide the prediction, detects performance anomalies in real time, and evaluates each prediction with the previously trained Bayesian model, which attaches a confidence to every prediction; the degree of automation is high, and the reliability and practicality of the prediction are improved;

3) The invention divides the standard deviations of the characteristic values of the historical data patterns into a plurality of subspaces, uses the subspaces as parameters for training the Bayesian model, and calculates the prior probability of each specific state corresponding to each subspace, which further improves the accuracy of anomaly prediction.

Drawings

FIG. 1 is a flow chart of a performance anomaly prediction method in a distributed system in accordance with the present invention;

FIG. 2 is a system block diagram of a performance anomaly prediction system in a distributed system according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention; all other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

To aid understanding, the embodiments of the present invention are further explained below using specific examples with reference to the drawings; these examples do not limit the embodiments of the present invention.

Example one

Referring to FIG. 1, the present invention provides a performance anomaly prediction method in a distributed system, which mainly includes the following steps:

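S1: A target data value is extracted from the historical performance data obtained by a plurality of monitoring nodes in the monitoring system as the training data source, and the feature values of the historical data patterns in the data source are calculated.

In this embodiment, each data point is described by three feature values: the change value (CV), the change rate (CR), and the performance value (Value, V). The performance value V_{t_i} is the value of the performance metric at time t_i.

The change value is the difference between the performance metric at time t_i and at the preceding time t_{i-1}:

CV(t_{i}) = V_{t_{i}} - V_{t_{i-1}}

where V_{t_i} is the performance value at time t_i (i = 0, 1, …, n) and V_{t_{i-1}} is the performance value at time t_{i-1} (i = 1, …, n).

The change rate is the relative change of the performance metric, equal to the change value divided by the performance value at the current time t_i:

CR(t_{i}) = \frac{CV(t_{i})}{V_{t_{i}}} = \frac{V_{t_{i}} - V_{t_{i-1}}}{V_{t_{i}}}

with t_i and t_{i-1} defined as above.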
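A minimal Python sketch of the three feature values defined above. It assumes the monitored metric arrives as an evenly sampled list of floats; the function name `feature_values` is hypothetical, and returning 0.0 for the change rate when V(t_i) = 0 is an assumption, since the source does not say how that case is handled. The first sample only serves as the predecessor of the second, so features start at i = 1.

```python
from typing import List, Tuple


def feature_values(values: List[float]) -> List[Tuple[float, float, float]]:
    """Return (CV, CR, V) for each time t_i, i = 1..n, of a sampled metric.

    CV(t_i) = V(t_i) - V(t_{i-1})
    CR(t_i) = CV(t_i) / V(t_i)
    """
    features = []
    for i in range(1, len(values)):
        v = values[i]
        cv = v - values[i - 1]
        # Assumption: CR is set to 0.0 when V(t_i) == 0 to avoid division by zero
        cr = cv / v if v != 0 else 0.0
        features.append((cv, cr, v))
    return features
```

For example, `feature_values([50.0, 60.0, 45.0])` yields change values of 10.0 and -15.0, with change rates of about 0.167 and -0.333.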

S2: Based on the feature values obtained in S1, the historical data are divided into a plurality of patterns, and each pattern is labeled with one of three states: abnormal, warning, or normal. The prior probability distribution is then trained from these three states: the probability distribution of the patterns under each state is counted, and a Bayesian model of each pattern is trained. To further improve model accuracy, the features of the patterns are divided into a plurality of subspaces, which serve as the parameters for training the Bayesian model.

In S2, a Bayesian model of each historical data pattern may be trained from the feature values of that pattern, and the prior probabilities of the multiple states of each pattern may be obtained. The states may be an abnormal state, a warning state, and a normal state.

The classification model is built with a naive Bayes classifier. Naive Bayes classification requires its parameters to be mutually independent; each pattern is described by three parameters that are treated as mutually independent, so this requirement is met.

Assume that the current time is t_i. The feature values of all data in the time window from t_{i-L} to t_i constitute the current data pattern, where L is the length of the current data pattern.
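The sliding-window construction of patterns can be sketched as below. It reads "from t_{i-L} to t_i" as inclusive, so each pattern holds L + 1 feature triples; this reading and the helper names `current_pattern` and `all_patterns` are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Feature = Tuple[float, float, float]  # (CV, CR, V) at one time step


def current_pattern(features: Sequence[Feature], i: int, L: int) -> List[Feature]:
    """Pattern ending at time index i: the features from t_{i-L} to t_i inclusive."""
    if L < 0 or i - L < 0 or i >= len(features):
        raise ValueError("window [i-L, i] falls outside the feature series")
    return list(features[i - L : i + 1])


def all_patterns(features: Sequence[Feature], L: int) -> List[List[Feature]]:
    """Every complete pattern of length L + 1 in the series, oldest first."""
    return [current_pattern(features, i, L) for i in range(L, len(features))]
```

Applied to a historical feature series, `all_patterns` produces the candidate historical patterns; applied at the latest index, `current_pattern` gives the pattern to be classified.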

During training, each pattern in the training data is labeled with its state, i.e., a pattern may be represented as (V_{t_1}, V_{t_2}, …, V_{t_n}, Status). From this labeled training dataset, the prior distribution of the patterns under each of the three states can be obtained:

P((SD_CV,SD_CR,SD_V)|status)

where status is one of normal (normal state), alert (alarm state), or abnormal (abnormal state).

This is the probability that the three standard deviations between a pattern and its most similar pattern take the values SD_CV, SD_CR and SD_V, given that the pattern is in state status. The distribution P(status) of the states can also be obtained from the training data.

From these prior probabilities, the probability of a specific state given the observed standard deviations follows from Bayes' rule:

P (status | ({SD}_{CV}, {SD}_{CR}, {SD}_{V})) = \frac{P (({SD}_{CV}, {SD}_{CR}, {SD}_{V}) | status) P (status)}{P (({SD}_{CV}, {SD}_{CR}, {SD}_{V}))}

As mentioned above, the three parameters are independent of each other, so the expression becomes:

\begin{matrix} P (status | ({SD}_{CV}, {SD}_{CR}, {SD}_{V})) \\ = \frac{P ({SD}_{CV} | status) P ({SD}_{CR} | status) P ({SD}_{V} | status) P (status)}{P ({SD}_{CV}) P ({SD}_{CR}) P ({SD}_{V})} \end{matrix}
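A worked sketch of the naive Bayes rule above, using hypothetical probabilities. One deliberate substitution: the denominator is computed as the sum over the three states of P(x | status) P(status), so the posteriors sum to 1, whereas the formula above divides by P(SD_CV) P(SD_CR) P(SD_V). The denominator is the same for every state, so either choice ranks the states identically. The function name `posterior` is hypothetical.

```python
import math
from typing import Dict, Sequence


def posterior(priors: Dict[str, float],
              likelihoods: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Naive Bayes posterior P(status | SD_CV, SD_CR, SD_V).

    likelihoods[status] = (P(SD_CV|status), P(SD_CR|status), P(SD_V|status)).
    Under the independence assumption the class-conditional joint probability
    is the product of the three per-feature likelihoods.
    """
    joint = {s: priors[s] * math.prod(likelihoods[s]) for s in priors}
    evidence = sum(joint.values())  # normalizer shared by all states
    if evidence == 0:
        raise ValueError("all states have zero joint probability")
    return {s: j / evidence for s, j in joint.items()}


# Hypothetical numbers, for illustration only
priors = {"normal": 0.80, "alert": 0.15, "abnormal": 0.05}
likelihoods = {
    "normal": (0.1, 0.2, 0.5),
    "alert": (0.5, 0.4, 0.5),
    "abnormal": (0.3, 0.3, 0.2),
}
post = posterior(priors, likelihoods)
```

Here the joint terms are 0.008 (normal), 0.015 (alert) and 0.0009 (abnormal), so alert receives the largest posterior, about 0.63, even though normal has the largest prior.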

To further improve model accuracy, the variances of each feature value over all historical data patterns can be sorted by magnitude and divided into a plurality of subspaces, and the prior probability of each specific state (abnormal, warning, or normal) corresponding to each subspace is calculated.

The pattern space is divided into a plurality of subspaces, each containing all feature values that fall within one continuous value range; the resulting discrete subspaces are used as the parameters of the naive Bayes classification. For example, let the full value range of SD_CR, the standard deviation of the change rate, be r = [a, b], where a is the minimum and b is the maximum value taken by SD_CR. Dividing this range into m subspaces, each subspace has length:

Δr = (b - a) / m

Each subspace can therefore be represented as:

S_SDCR1 = [a, a+Δr], S_SDCR2 = [a+Δr, a+2Δr], ..., S_SDCRm = [b-Δr, b]
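Mapping a standard deviation to its subspace amounts to fixed-width binning. The sketch below uses 0-based indices (index 0 corresponds to S_SDCR1). Because adjacent closed intervals share an endpoint, it assigns a boundary value to the higher subspace, and it clamps values outside [a, b] to the first or last subspace; both tie-breaking rules and the name `subspace_index` are assumptions not stated in the source.

```python
import math


def subspace_index(x: float, a: float, b: float, m: int) -> int:
    """0-based index of the subspace of [a, b] (m equal subspaces) containing x."""
    if m <= 0 or b <= a:
        raise ValueError("need m > 0 and b > a")
    delta = (b - a) / m  # Δr, the length of each subspace
    k = math.floor((x - a) / delta)
    return min(max(k, 0), m - 1)  # clamp; x == b lands in the last subspace
```

With a = 0, b = 1 and m = 4, Δr = 0.25, so a standard deviation of 0.3 falls into the second subspace (index 1).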

Each observed standard deviation of the change rate is simply placed into the appropriate subspace. Consequently, there is no need to compute the prior probability of each state for every individual standard deviation value; only the prior probability of each state for each subspace is needed:

\begin{matrix} P (status | ({SD}_{CV}, {SD}_{CR}, {SD}_{V})) \\ = \frac{P (S_{SDCVi} | status) P (S_{SDCRj} | status) P (S_{SDVk} | status) P (status)}{P (S_{SDCVi}) P (S_{SDCRj}) P (S_{SDVk})} \end{matrix}

where S_SDCVi is the subspace into which SD_CV, the standard deviation of the change value, falls;

S_SDCRj is the subspace into which SD_CR, the standard deviation of the change rate, falls;

S_SDVk is the subspace into which SD_V, the standard deviation of the performance value, falls;

status is a particular state: normal, alert, or abnormal.
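The subspace likelihoods and the state distribution P(status) can be estimated by counting over the labeled training patterns, as sketched below. Each sample is assumed to be already discretized into subspace indices (i, j, k) for (SD_CV, SD_CR, SD_V), with m subspaces per feature. Laplace (add-alpha) smoothing is an addition not found in the source; it keeps a subspace that was never observed under a state from zeroing out the whole product. The function name `train` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

STATES = ("normal", "alert", "abnormal")
Sample = Tuple[Tuple[int, int, int], str]  # ((i, j, k), status)


def train(samples: List[Sample], m: int, alpha: float = 1.0):
    """Estimate P(status) and P(S_f = idx | status) for the three features
    f = 0 (SD_CV), 1 (SD_CR), 2 (SD_V), each discretized into m subspaces."""
    if not samples or alpha <= 0:
        raise ValueError("need at least one sample and alpha > 0")
    n = len(samples)
    status_counts = Counter(s for _, s in samples)
    prior: Dict[str, float] = {s: status_counts[s] / n for s in STATES}
    # counts[s][f][idx]: how often feature f fell into subspace idx under state s
    counts = {s: [Counter() for _ in range(3)] for s in STATES}
    for idx3, s in samples:
        for f, idx in enumerate(idx3):
            counts[s][f][idx] += 1
    # likelihood[s][f][idx], smoothed so that every entry is strictly positive
    likelihood = {
        s: [[(counts[s][f][idx] + alpha) / (status_counts[s] + alpha * m)
             for idx in range(m)] for f in range(3)]
        for s in STATES
    }
    return prior, likelihood
```

The entries `likelihood[status][f][idx]` play the role of P(S_SDCVi | status), P(S_SDCRj | status) and P(S_SDVk | status) in the formula above, and `prior` supplies P(status).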

S3: The feature values of the current data pattern are calculated from the real-time performance data acquired by the monitoring system. Assume that the current time is t_i; the feature values of all data from t_{i-L} to t_i constitute the current data pattern, where L is the length of the current data pattern.

S4: The data pattern most similar to the current data pattern is found among the historical data patterns.

This step specifically comprises:

S41: calculating the standard deviation of each feature value between the current data pattern and each historical normal pattern;

The data at each time t_i have three features, (CV(t_i), CR(t_i), V(t_i)). Assume that the current time is t_i; all features of the data in the time window from t_{i-L} to t_i constitute the pattern of the current performance metric, where L is the length of the current data pattern.

As shown in FIG. 2, the current pattern is compared with the historical normal patterns to find the historical normal pattern most similar to it, by calculating the standard deviation of each feature between the current data pattern and each historical normal pattern. If a historical data pattern starts at time t_{j-L} and ends at time t_j, the standard deviation of the change value between the current data pattern and that historical pattern is denoted SD_CV(t_j), the standard deviation of the change rate is denoted SD_CR(t_j), and the standard deviation of the performance value is denoted SD_V(t_j). The current data pattern is compared with each historical data pattern in turn.

S42: if the sum of all standard deviations between the current data pattern and a historical data pattern is the smallest, that historical data pattern is taken as the most similar pattern of the current data pattern.

When a pattern in the historical data satisfies the following formula:

{SD}_{CV} (t_{k}) + {SD}_{CR} (t_{k}) + {SD}_{V} (t_{k}) = \min_{j} \{ {SD}_{CV} (t_{j}) + {SD}_{CR} (t_{j}) + {SD}_{V} (t_{j}) \}

where {SD_CV(t_j) + SD_CR(t_j) + SD_V(t_j)} is the set of summed feature standard deviations between the current data pattern and all historical data patterns, and min takes the minimum of that set.

That is, the historical data pattern whose summed standard deviations with respect to the current data pattern are minimal is the most similar pattern of the current data pattern. Thus, for each current data pattern, the most similar pattern in the history can be found, giving:

(SD_CV(t_k), SD_CR(t_k), SD_V(t_k)).
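A sketch of steps S41 and S42. The source does not spell out how the "standard deviation between two patterns" is computed; this sketch assumes the root-mean-square of the element-wise differences between the two feature series, which is zero for identical patterns. It also assumes `history` already holds historical normal patterns of the same length as the current pattern. The names `deviation` and `most_similar` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Feature = Tuple[float, float, float]  # (CV, CR, V)


def deviation(x: Sequence[float], y: Sequence[float]) -> float:
    """Assumed pattern distance: root-mean-square of element-wise differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def most_similar(current: List[Feature],
                 history: List[List[Feature]]
                 ) -> Optional[Tuple[int, Tuple[float, float, float]]]:
    """Return (k, (SD_CV, SD_CR, SD_V)) for the historical pattern k that
    minimizes SD_CV + SD_CR + SD_V with respect to the current pattern."""
    best = None
    for k, hist in enumerate(history):
        if len(hist) != len(current):
            raise ValueError("patterns must have the same length")
        # One standard deviation per feature: f = 0 (CV), 1 (CR), 2 (V)
        sds = tuple(deviation([p[f] for p in current], [h[f] for h in hist])
                    for f in range(3))
        if best is None or sum(sds) < sum(best[1]):
            best = (k, sds)
    return best  # None when history is empty
```

The returned triple is the (SD_CV(t_k), SD_CR(t_k), SD_V(t_k)) that is passed on to the Bayesian model in S5.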

S5: Based on the output of S4, prediction is made with the Bayesian model trained in S2, giving the probability of each of the multiple states.

In this example, guided by the most similar pattern (SD_CV(t_k), SD_CR(t_k), SD_V(t_k)) from S4, the Bayesian model trained in S2 gives the probability of each state of the pattern:

P (status | ({SD}_{CV}, {SD}_{CR}, {SD}_{V})) = \frac{P (({SD}_{CV}, {SD}_{CR}, {SD}_{V}) | status) P (status)}{P (({SD}_{CV}, {SD}_{CR}, {SD}_{V}))}

The state of the pattern is determined from these probabilities. Judging the state accurately makes it possible to capture the precursors of an anomaly and thus to predict it.

S6: A confidence factor and an anomaly threshold are set based on the result of S5, and an abnormal state is predicted if the confidence factor exceeds the anomaly threshold. The method further includes setting an alarm threshold: if the confidence factor lies between the alarm threshold and the anomaly threshold, the alarm state is predicted, and if the confidence factor is smaller than the alarm threshold, the normal state is predicted. An alarm mechanism is also set up, and defensive measures are taken through this preset mechanism after an alarm.

In the present embodiment, for the current pattern (SD_CV, SD_CR, SD_V), the probabilities of the three states are obtained as described above:

P(normal|(SD_CV,SD_CR,SD_V))

P(alert|(SD_CV,SD_CR,SD_V))

P(abnormal|(SD_CV,SD_CR,SD_V))

To determine which state the pattern is in, the probabilities of the three states are compared:

δ₁ = log P(alert|(SD_CV,SD_CR,SD_V)) - log P(normal|(SD_CV,SD_CR,SD_V))

δ₂ = log P(alert|(SD_CV,SD_CR,SD_V)) - log P(abnormal|(SD_CV,SD_CR,SD_V))

If the following conditions are met, the current data pattern is judged to be in the alarm state, meaning that an anomaly may occur:

δ₁ ≥ 0 and δ₂ ≥ 0

δ₁ indicates whether the current data pattern is more likely to be in the alarm state or in the normal state, and δ₂ indicates whether it is more likely to be in the alarm state or in the abnormal state. If both conditions hold, the current data pattern is more likely to be in the alarm state than in either the normal or the abnormal state, and an anomaly is likely to occur next.
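The alarm test compares log-posteriors. A minimal sketch, assuming the three posterior probabilities are already available (for example from the Bayes step) and are strictly positive, since log 0 is undefined; the function names `deltas` and `is_alarm_state` are hypothetical.

```python
import math
from typing import Dict, Tuple


def deltas(post: Dict[str, float]) -> Tuple[float, float]:
    """Return (δ₁, δ₂) = (log P(alert) - log P(normal), log P(alert) - log P(abnormal))."""
    log_alert = math.log(post["alert"])
    return log_alert - math.log(post["normal"]), log_alert - math.log(post["abnormal"])


def is_alarm_state(post: Dict[str, float]) -> bool:
    """True when the alert state is at least as likely as both other states."""
    d1, d2 = deltas(post)
    return d1 >= 0 and d2 >= 0
```

Working in log space also means the Bayes denominator cancels: it is the same for every state, so δ₁ and δ₂ depend only on the numerators.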

When an alarm predicting an anomaly is issued, δ₁ ≥ 0, and the larger δ₁ is, the more likely the pattern is in the alarm state rather than the normal state. Likewise, δ₂ ≥ 0, and the larger δ₂ is, the more likely the pattern is in the alarm state rather than the abnormal state. Hence, the larger |δ₁| and |δ₂| are, the more reliable the prediction, so |δ₁| and |δ₂| can serve as reference indicators of the credibility of the anomaly prediction. Each anomaly prediction is assigned a Confidence Factor (CF), calculated as follows:

CF = δ₁ + δ₂

Clearly, the more likely the pattern is in the alert state, the larger the CF value, so CF is an effective measure of the confidence of an anomaly prediction. The CF value indicates how reliable a prediction is, and the alarm threshold is chosen accordingly: if the confidence factor lies between the alarm threshold and the anomaly threshold, the alarm state is predicted; if it is smaller than the alarm threshold, the normal state is predicted. A preset alarm mechanism takes defensive measures in the alarm and abnormal states to prevent the anomaly or reduce the losses it causes.
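The confidence factor and the three-way threshold decision can be sketched as follows. The threshold values in the usage line are hypothetical, since the source gives no concrete values, and treating a CF exactly equal to the anomaly threshold as "alert" (the source only says "exceeds") is a boundary choice made here. The function names are hypothetical.

```python
def confidence_factor(d1: float, d2: float) -> float:
    """CF = δ₁ + δ₂."""
    return d1 + d2


def predict_state(cf: float, alarm_threshold: float, anomaly_threshold: float) -> str:
    """Map a confidence factor to a predicted state using two thresholds."""
    if alarm_threshold > anomaly_threshold:
        raise ValueError("alarm threshold must not exceed the anomaly threshold")
    if cf > anomaly_threshold:
        return "abnormal"
    if cf >= alarm_threshold:
        return "alert"
    return "normal"


# Hypothetical thresholds: CF = 1.25 + 1.95 = 3.2 exceeds 3.0, so "abnormal"
state = predict_state(confidence_factor(1.25, 1.95), alarm_threshold=1.0, anomaly_threshold=3.0)
```

Keeping the thresholds as parameters lets an operator tune them per metric, which matches the text's point that the alarm threshold is chosen from the observed reliability of predictions.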

Example two

Referring to FIG. 2, the present invention provides a performance anomaly prediction system in a distributed system, which is connected to the monitoring system of the distributed system and mainly includes: a historical characteristic value calculation module, a prior probability module, a real-time characteristic value calculation module, a similar mode module, a probability calculation module, and an abnormal alarm module.

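The historical characteristic value calculation module extracts a target data value from the historical performance data obtained by a plurality of monitoring nodes in the monitoring system as the training data source, and calculates the feature values of the historical data patterns, defined as in Example one:

CV(t_{i}) = V_{t_{i}} - V_{t_{i-1}}

CR(t_{i}) = \frac{CV(t_{i})}{V_{t_{i}}}

where V_{t_i} is the performance value at time t_i (i = 0, 1, …, n) and V_{t_{i-1}} is the performance value at time t_{i-1} (i = 1, …, n).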

Based on the feature values output by the historical characteristic value calculation module, the historical data are divided into a plurality of patterns, and each pattern is labeled with one of three states: abnormal, warning, or normal. The prior probability distribution is then trained from these three states: the probability distribution of the patterns under each state is counted, and a Bayesian model of each pattern is trained. To further improve model accuracy, the features of the patterns are divided into a plurality of subspaces, which serve as the parameters for training the Bayesian model.

In the prior probability module, a Bayesian model of each historical data pattern may be trained from the feature values of that pattern, and the prior probabilities of the multiple states of each pattern are obtained. The states may be an abnormal state, a warning state, and a normal state.

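From the labeled training data, the prior distribution of the patterns under each state is obtained:

P((SD_CV,SD_CR,SD_V)|status)

By Bayes' rule:

P (status | ({SD}_{CV}, {SD}_{CR}, {SD}_{V})) = \frac{P (({SD}_{CV}, {SD}_{CR}, {SD}_{V}) | status) P (status)}{P (({SD}_{CV}, {SD}_{CR}, {SD}_{V}))}

Because the three parameters are independent of each other, this can be expressed as:

\begin{matrix} P (status | ({SD}_{CV}, {SD}_{CR}, {SD}_{V})) \\ = \frac{P ({SD}_{CV} | status) P ({SD}_{CR} | status) P ({SD}_{V} | status) P (status)}{P ({SD}_{CV}) P ({SD}_{CR}) P ({SD}_{V})} \end{matrix}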

The pattern space is divided into a plurality of subspaces, each containing all feature values that fall within one continuous value range; the resulting discrete subspaces are used as the parameters of the naive Bayes classification. For example, let the full value range of SD_CR, the standard deviation of the change rate, be r = [a, b], where a is the minimum and b is the maximum value taken by SD_CR. Dividing this range into m subspaces, each subspace has length:

Δr = (b - a) / m

Each subspace can therefore be represented as:

S_SDCR1 = [a, a+Δr], S_SDCR2 = [a+Δr, a+2Δr], ..., S_SDCRm = [b-Δr, b]

\begin{matrix} P (status | ({SD}_{CV}, {SD}_{CR}, {SD}_{V})) \\ = \frac{P (S_{SDCVi} | status) P (S_{SDCRj} | status) P (S_{SDVk} | status) P (status)}{P (S_{SDCVi}) P (S_{SDCRj}) P (S_{SDVk})} \end{matrix}

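where S_SDCVi is the subspace into which SD_CV, the standard deviation of the change value, falls; S_SDCRj is the subspace into which SD_CR, the standard deviation of the change rate, falls; S_SDVk is the subspace into which SD_V, the standard deviation of the performance value, falls; and status is a specific state: normal (normal state), alert (alarm state), or abnormal (abnormal state).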

The real-time characteristic value calculation module calculates the feature values of the current data pattern from the real-time performance data acquired by a plurality of monitoring nodes in the monitoring system. Assume that the current time is t_i; the feature values of all data from t_{i-L} to t_i constitute the current data pattern, where L is the length of the current data pattern.

The similar mode module is connected with the real-time characteristic value calculating module and the historical characteristic value calculating module and finds a data mode which is most similar to the current data mode from the historical data modes;

Specifically, the similar mode module comprises:

a historical data comparison module, connected to the real-time characteristic value calculation module and the historical characteristic value calculation module, which calculates the standard deviation of each feature value between the current data pattern and each historical normal pattern;

As shown in FIG. 2, the current pattern is compared with the historical normal patterns to find the historical normal pattern most similar to it, by calculating the standard deviation of each feature between the current data pattern and each historical normal pattern. If a historical data pattern starts at time t_{j-L} and ends at time t_j, the standard deviation of the change value between the current data pattern and that historical pattern is denoted SD_CV(t_j), the standard deviation of the change rate is denoted SD_CR(t_j), and the standard deviation of the performance value is denoted SD_V(t_j). The current data pattern is compared with each historical data pattern in turn.

and a minimum variance acquisition module, connected to the output of the historical data comparison module, which takes the historical data pattern whose sum of standard deviations with respect to the current data pattern is smallest as the most similar pattern of the current data pattern.

When a pattern in the historical data satisfies the following formula:

{SD}_{CV} (t_{k}) + {SD}_{CR} (t_{k}) + {SD}_{V} (t_{k}) = \min_{j} \{ {SD}_{CV} (t_{j}) + {SD}_{CR} (t_{j}) + {SD}_{V} (t_{j}) \}

where {SD_CV(t_j) + SD_CR(t_j) + SD_V(t_j)} is the set of summed feature standard deviations between the current data pattern and all historical data patterns, and min takes the minimum of that set. The most similar pattern is thus obtained:

(SD_CV(t_k), SD_CR(t_k), SD_V(t_k)).

The probability calculation module predicts with the Bayesian model trained in the prior probability module, based on the output of the similar mode module, and obtains the probability of each of the multiple states.

In this embodiment, guided by the most similar pattern (SD_CV(t_k), SD_CR(t_k), SD_V(t_k)) from the minimum variance acquisition module, the Bayesian model trained in the prior probability module gives the probability of each state of the pattern:

P (status | ({SD}_{CV}, {SD}_{CR}, {SD}_{V})) = \frac{P (({SD}_{CV}, {SD}_{CR}, {SD}_{V}) | status) P (status)}{P (({SD}_{CV}, {SD}_{CR}, {SD}_{V}))}

The abnormal alarm module sets a confidence factor and an anomaly threshold according to the output of the probability calculation module, and predicts an abnormal state if the confidence factor exceeds the anomaly threshold.

The abnormal alarm module generally also sets an alarm threshold: if the confidence factor lies between the alarm threshold and the anomaly threshold, the alarm state is predicted, and if it is smaller than the alarm threshold, the normal state is predicted. An alarm mechanism is also set up, and defensive measures are taken through this preset mechanism after an alarm.

In the present embodiment, for the current pattern (SD_CV, SD_CR, SD_V), the probabilities of the three states are obtained as described above:

P(normal|(SD_CV,SD_CR,SD_V))

P(alert|(SD_CV,SD_CR,SD_V))

P(abnormal|(SD_CV,SD_CR,SD_V))

δ₁ = log P(alert|(SD_CV,SD_CR,SD_V)) - log P(normal|(SD_CV,SD_CR,SD_V))

δ₂ = log P(alert|(SD_CV,SD_CR,SD_V)) - log P(abnormal|(SD_CV,SD_CR,SD_V))

The current data pattern is judged to be in the alarm state, meaning that an anomaly may occur, if δ₁ ≥ 0 and δ₂ ≥ 0.

When an alarm predicting an anomaly is issued, δ₁ ≥ 0, and the larger δ₁ is, the more likely the pattern is in the alert state rather than the normal state. Likewise, δ₂ ≥ 0, and the larger δ₂ is, the more likely the pattern is in the alarm state rather than the abnormal state. Hence, the larger |δ₁| and |δ₂| are, the more reliable the prediction, so |δ₁| and |δ₂| can serve as reference indicators of the credibility of the anomaly prediction. Each anomaly prediction is assigned a Confidence Factor (CF), calculated as follows:

CF = δ₁ + δ₂

The above description is only for the preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A performance anomaly prediction method in a distributed system is characterized by comprising the following steps:

2. The performance anomaly prediction method in a distributed system according to claim 1, wherein the characteristic values include a performance value change amount, a performance value change rate, and a performance value.

3. The method of claim 1, wherein in S2, the variances of each characteristic value over all historical data patterns are sorted by magnitude and divided into a plurality of subspaces, and the prior probability of each specific state corresponding to each subspace is calculated.

4. The method of claim 3, wherein in step S2, a Bayesian model of each historical data pattern is trained according to the feature values of each historical data pattern, and the prior probabilities of the multiple states of each pattern are obtained.

5. The method of claim 3, wherein the step S4 further comprises:

6. The method of claim 3, wherein the states are an abnormal state, a warning state and a normal state.

7. The method of claim 3, wherein the step S6 further comprises setting an alarm threshold, and predicting an alarm state if the confidence factor is between the alarm threshold and the abnormal threshold, and predicting a normal state if the confidence factor is less than the alarm threshold.

8. A performance anomaly prediction system in a distributed system, coupled to a monitoring system of the distributed system, comprising:

the similar mode module is connected with the output end of the historical characteristic value calculation module and the real-time characteristic calculation module, and finds a data mode which is most similar to the current data mode from the historical data modes;

9. The system of claim 8, wherein the characteristic values include performance value variation, performance value variation rate, and performance value.

10. The system of claim 8, wherein the states include an abnormal state, an alert state, and a normal state.