CN108446741B - Method, system and storage medium for evaluating importance of machine learning hyper-parameter - Google Patents
- Publication number: CN108446741B (application CN201810270934.5A)
- Authority: CN (China)
- Legal status: Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
Abstract
The invention discloses a method, a system and a storage medium for evaluating the importance of machine learning hyper-parameters. Different data sets are acquired from OpenML, meta-features are extracted so that each data set is represented by a meta-feature vector, and performance data of a classification algorithm to be evaluated under different hyper-parameter configurations are collected. In use, meta-features are extracted to represent the target data set, and the distances between meta-feature vectors yield an ordering of the historical data sets by increasing distance from the target data set. The importance of the hyper-parameters is then evaluated from the performance data of the classification algorithm under different hyper-parameter configurations: following that ordering, the proposed Relief and clustering algorithms are executed in turn on the first f historical data sets closest to the target data set, finally yielding a hyper-parameter importance ranking of the classification algorithm to be evaluated that guides its automatic parameter-tuning process. The invention gives the otherwise black-box hyper-parameter tuning of a classification algorithm a measure of guidance, saving time and improving efficiency.
Description
Technical Field
The invention relates to a method, a system and a storage medium for evaluating the importance of machine learning hyper-parameters.
Background
Machine learning provides important technical support for data processing and data classification; however, model selection and parameter tuning remain two major problems troubling users, and automatic machine learning systems have emerged in response. An automatic machine learning system uses automatic machine learning algorithms to perform automatic data preprocessing, automatic algorithm selection and automatic parameter tuning, improving the accuracy of data classification and prediction while freeing users from the heavy tasks of algorithm selection and repeated parameter tuning.
Because the core of automatic machine learning is automatic algorithm selection and automatic hyper-parameter configuration, such systems reduce the machine learning process to a Combined Algorithm Selection and Hyper-parameter optimization (CASH) problem. CASH treats the choice of algorithm as a new root-level hyper-parameter, thereby mapping the problems of selecting an algorithm and selecting hyper-parameter values to the single problem of selecting hyper-parameter values. By likewise treating data preprocessing and feature selection techniques as hyper-parameters, the system can select them automatically. The resulting hyper-parameter optimization problem can then be solved, for example with a classical Bayesian optimization algorithm, improving the accuracy of data classification and prediction.
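By way of illustration only, the following Python sketch (not part of the patent; all algorithm names and value ranges are invented for the example) shows how CASH folds algorithm selection into the hyper-parameter search space as a root-level choice:

```python
import random

# Root-level hyper-parameter "algorithm" plus per-algorithm sub-spaces:
# one joint search space covers algorithm selection and hyper-parameter values.
search_space = {
    "algorithm": ["svm", "random_forest"],
    "svm": {"C": (0.01, 100.0), "kernel": ["rbf", "linear"]},
    "random_forest": {"n_estimators": (10, 500), "max_depth": (2, 32)},
}

def sample_configuration(space, rng):
    """Draw one joint (algorithm, hyper-parameter) configuration."""
    algo = rng.choice(space["algorithm"])
    config = {"algorithm": algo}
    for name, domain in space[algo].items():
        if isinstance(domain, tuple) and all(isinstance(b, int) for b in domain):
            config[name] = rng.randint(*domain)      # integer range
        elif isinstance(domain, tuple):
            config[name] = rng.uniform(*domain)      # continuous range
        else:
            config[name] = rng.choice(domain)        # categorical choice
    return config

print(sample_configuration(search_space, random.Random(0)))
```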
However, the hyper-parameter configuration module of current automatic machine learning systems is either configured entirely by experience or reaches its final result by repeated iteration, adjusting the configurations of many hyper-parameters one by one. The drawbacks are that machine learning time is wasted, that repeated iterations waste computing resources, and that adjusting the configuration of every hyper-parameter regardless of its importance wastes the user's time and effort.
Disclosure of Invention
The invention provides a method, a system and a storage medium for evaluating the importance of machine learning hyper-parameters, aiming at the technical problem of accurately evaluating the hyper-parameter importance of a machine learning algorithm and of using that evaluation to guide automatic hyper-parameter configuration and enhance its interpretability.
As a first aspect of the present invention:
the method for evaluating the importance of the machine learning hyper-parameter comprises the following steps:
step (1): acquiring a plurality of new data sets similar to the target data set type from an open machine learning environment OpenML, and extracting meta-feature vectors for each new data set, so that each new data set is represented by the meta-feature vectors;
collecting data of the performance of a classification algorithm to be evaluated under different hyper-parameter configurations from an open machine learning environment OpenML;
storing the meta-feature vector of each new data set and the performance data corresponding to different hyper-parameter configurations in corresponding historical data sets;
step (2): extracting meta-feature vectors of the target data set to represent the target data set, calculating the distance between the meta-feature vectors of the target data set and the meta-feature vectors of the historical data sets, and obtaining a distance sequence from near to far between the target data set and each historical data set;
step (3): sequentially executing a Relief-Cluster algorithm on the first f historical data sets closest to the target data set: computing the average weight of each hyper-parameter type from the per-type weights obtained by the Relief algorithm, and using these average weights to obtain a preliminary importance-weight ranking of the hyper-parameter types; further verifying the accuracy of the hyper-parameter importance evaluation with a clustering algorithm; and finally obtaining the hyper-parameter importance ranking of the classification algorithm to be evaluated.
The machine learning hyper-parameter importance evaluation method further comprises:
step (4): setting the several top-ranked hyper-parameters according to the obtained hyper-parameter importance ranking of the classification algorithm to be evaluated, and then classifying the data to be classified with the classification algorithm so configured.
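As a hedged illustration of step (4), the sketch below assumes scikit-learn's RandomForestClassifier as the classification algorithm to be evaluated; the importance ranking and the tuned values are placeholders rather than outputs prescribed by the method:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Assumed output of step (3): an importance ranking of hyper-parameter types.
importance_ranking = ["n_estimators", "max_depth", "min_samples_split"]
# Tune only the top-ranked hyper-parameters; leave the rest at their defaults.
tuned_values = {"n_estimators": 300, "max_depth": 12}

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
top = {k: v for k, v in tuned_values.items() if k in importance_ranking[:2]}
clf = RandomForestClassifier(random_state=0, **top).fit(X, y)
print(clf.predict(X[:5]))   # classify the data to be classified
```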
In step (1), each data set D_i is described by a vector V_i composed of F meta-features.
In step (1), the meta-features include: simple meta-features, statistical meta-features of the data set, and significance meta-features;
the simple meta-features include: the number of data set samples, the number of features, the number of categories, or the number of missing values;
the statistical meta-features of the data set include: the mean, the variance, or the kurtosis of the distance vector;
the significance meta-features include: the performance obtained by running a machine learning algorithm on the data set.
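A minimal extractor covering the three meta-feature categories above might look as follows; the exact feature list and the decision-stump landmark are illustrative assumptions, not the patent's prescribed set:

```python
import numpy as np
from scipy.stats import kurtosis
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def meta_features(X, y):
    """Return one meta-feature vector V_i for the data set (X, y)."""
    dists = np.linalg.norm(X - X.mean(axis=0), axis=1)      # distances to the centroid
    return np.array([
        X.shape[0],                                         # number of samples (simple)
        X.shape[1],                                         # number of features (simple)
        len(np.unique(y)),                                  # number of categories (simple)
        float(np.isnan(X).sum()),                           # number of missing values (simple)
        X.mean(),                                           # mean (statistical)
        X.var(),                                            # variance (statistical)
        kurtosis(dists),                                    # kurtosis of the distance vector (statistical)
        cross_val_score(DecisionTreeClassifier(max_depth=1),
                        X, y, cv=3).mean(),                 # landmark performance (significance)
    ])

X, y = load_iris(return_X_y=True)
print(meta_features(X, y))
```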
The performance of the classification algorithm to be evaluated under the different hyper-parameter configurations in step (1) includes: the misclassification rate or the RMSE.
In addition, for many common algorithms the open machine learning environment OpenML already contains very comprehensive performance data under different hyper-parameter configurations on various data sets; that is, for each data set D_i, the hyper-parameter configurations θ_i of the classification algorithm to be evaluated and the corresponding performances y_i are collected.
For the target data set D_N′, the meta-feature vector V_N′ is extracted as its representation. Based on the principle that dissimilar data sets also differ in the hyper-parameter configurations that suit an algorithm, the distance ordering between the target data set and the historical data sets is obtained from the distances between meta-feature vectors. The importance of the hyper-parameters is then evaluated on the first f historical data sets closest to the target data set, using the performance data of the algorithm under different hyper-parameters.
The distance d_pn(D_N′, D_i) between the target data set D_N′ and a historical data set D_i is measured by the distance between their meta-feature vectors:
d_pn(D_N′, D_i) = ||V_N′ - V_i||_pn
where V_N′ is the meta-feature vector of the data set D_N′, V_i is the meta-feature vector of the historical data set D_i, and pn denotes the p-norm.
Comparing the distances between the meta-feature vectors of the target data set and of the historical data sets yields the near-to-far ordering π of the historical data sets, where π(1) indexes the one closest to the target data set.
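A small sketch of this distance-and-ordering step, with toy meta-feature vectors standing in for real ones:

```python
import numpy as np

def nearest_histories(v_target, V_hist, p=2):
    """Order historical data sets by d_pn(D_N', D_i) = ||V_N' - V_i||_p, near to far."""
    d = np.linalg.norm(V_hist - v_target, ord=p, axis=1)
    order = np.argsort(d)                      # order[0] plays the role of pi(1)
    return order, d[order]

V_hist = np.array([[1.0, 0.0], [0.2, 0.1], [5.0, 3.0]])   # toy historical meta-features
order, dists = nearest_histories(np.array([0.0, 0.0]), V_hist)
print(order, dists)    # the first f entries of `order` select the data sets used next
```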
Following the near-to-far ordering π, the Relief-Cluster algorithm is executed sequentially on the first f historical data sets closest to the target data set. First, the importance of the hyper-parameters is evaluated preliminarily from the average weight of each hyper-parameter type obtained by the Relief algorithm; then the accuracy of this evaluation is further verified with the r(C) index of the clustering algorithm. These two steps are repeated m times, the hyper-parameter importance evaluation corresponding to the maximal r(C) index is selected, and the hyper-parameter importance ranking of the classification algorithm to be evaluated is finally obtained and used to guide the automatic parameter tuning of the target data set under that algorithm.
Obtaining the weight of each hyper-parameter type by the Relief algorithm comprises the following steps:
setting a threshold according to the magnitude of the performance data under different hyper-parameter configurations, and dividing the performance data corresponding to the different hyper-parameter configurations in the historical data set into high-performance samples and low-performance samples; the Relief algorithm randomly selects one sample s_i from these data and then selects, from the high-performance samples and from the low-performance samples, the sample nearest to s_i in each;
the sample s_j of the same class as s_i is denoted M, and the sample s_j of the other class is denoted Q; the weight w_h of each hyper-parameter type h is updated according to equation (1):
w_h = w_h - diff(h, s_i, M)/rt + diff(h, s_i, Q)/rt    (1)
diff(h, s_i, M) denotes the difference between the two samples s_i and M in the hyper-parameter h;
diff(h, s_i, Q) denotes the difference between the two samples s_i and Q in the hyper-parameter h;
the difference diff(h, s_i, s_j) between two samples s_i and s_j in the hyper-parameter h is defined as:
if the hyper-parameter h is a categorical (scalar-type) hyper-parameter, diff(h, s_i, s_j) = 0 when s_ih = s_jh and 1 otherwise;
if the hyper-parameter h is a numerical hyper-parameter, diff(h, s_i, s_j) = |s_ih - s_jh| / (max_h - min_h);
where 1 ≤ i ≠ j ≤ m and 1 ≤ h ≤ ph; max_h is the maximum value of the hyper-parameter h in the sample set and min_h its minimum; m is the number of samples; each sample contains ph hyper-parameters; rt is the number of iterations, with rt > 1 to avoid the randomness of a single sampling; s_ih is the value of the hyper-parameter h on sample s_i, and s_jh its value on sample s_j.
As equation (1) shows, a hyper-parameter that contributes strongly to high performance exhibits large differences between classes and small differences within a class, so the weight of a hyper-parameter with discriminating ability is positive.
To avoid the randomness of a single sampling, rt > 1 iterations are performed, yielding the importance-weight ranking of each hyper-parameter type.
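A compact sketch of this Relief step for numerical hyper-parameters follows (equation (1), with the nearest same-class sample M and other-class sample Q chosen by normalized distance); categorical hyper-parameters would use the 0/1 diff instead, and the helper names are assumptions:

```python
import numpy as np

def relief_weights(S, labels, rt=50, rng=None):
    """S: (m, ph) hyper-parameter samples; labels: 1 = high performance, 0 = low."""
    rng = rng or np.random.default_rng(0)
    m, ph = S.shape
    span = S.max(axis=0) - S.min(axis=0)               # max_h - min_h per hyper-parameter
    span[span == 0] = 1.0                              # guard constant hyper-parameters
    w = np.zeros(ph)
    for _ in range(rt):                                # rt > 1 samplings
        i = rng.integers(m)
        diffs = np.abs(S - S[i]) / span                # numerical diff(h, s_i, s_j)
        total = diffs.sum(axis=1)
        same = np.flatnonzero((labels == labels[i]) & (np.arange(m) != i))
        other = np.flatnonzero(labels != labels[i])
        M = same[np.argmin(total[same])]               # nearest same-class sample
        Q = other[np.argmin(total[other])]             # nearest other-class sample
        w += -diffs[M] / rt + diffs[Q] / rt            # equation (1)
    return w

rng = np.random.default_rng(1)
S = rng.random((40, 3))
labels = (S[:, 0] > 0.5).astype(int)                   # performance driven by hyper-parameter 0
print(relief_weights(S, labels))                       # weight 0 should come out largest
```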
Further verifying the accuracy of the hyper-parameter importance evaluation with the clustering algorithm comprises the following steps:
sorting the hyper-parameter types by the importance weights obtained above, clustering on the top K hyper-parameter types, and computing the hyper-parameter importance. Let S be the hyper-parameter sample set, T its size, K the number of classes to which the hyper-parameter samples belong, π_k the probability that a sample belongs to class k, and C_k the actual class label of a hyper-parameter sample; with C a hyper-parameter set, the importance measure r(C) on C is expressed in terms of F(C) and F_i(C),
where F(C) represents the difference between the clustering result on the hyper-parameter set C and the class labels over the whole hyper-parameter sample set, F_i(C) represents that difference within each class, and X_i is the hyper-parameter sample set of the ith class.
The higher the value of r(C), the stronger the correlation between the clustering result and the actual class labels, and the greater the influence of the hyper-parameter set C on the classification. The hyper-parameter importance evaluation corresponding to the maximal r(C) index is selected.
The class labels are the high-performance and low-performance labels.
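Because the r(C) formula itself is not recoverable from this text, the sketch below substitutes scikit-learn's adjusted Rand index as the correlation measure between a clustering on a candidate hyper-parameter subset C and the high/low performance labels; the verification logic is otherwise as described:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def subset_score(S, labels, subset_idx, n_clusters=2):
    """Cluster on the hyper-parameter subset C and score agreement with the labels."""
    clusters = KMeans(n_clusters=n_clusters, n_init=10,
                      random_state=0).fit_predict(S[:, subset_idx])
    return adjusted_rand_score(labels, clusters)   # stands in for r(C): higher = better

# e.g. verify the top-2 hyper-parameters from the Relief weights w:
# top2 = np.argsort(-w)[:2]; print(subset_score(S, labels, top2))
```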
As a second aspect of the present invention,
the machine learning super-parameter importance evaluation system comprises: the computer program product comprises a memory, a processor, and computer instructions stored on the memory and executed on the processor, wherein the computer instructions, when executed by the processor, perform the steps of any of the above methods.
As a third aspect of the present invention,
a computer readable storage medium having computer instructions embodied thereon, which, when executed by a processor, perform the steps of any of the above methods.
The invention has the beneficial effects that:
the invention can accurately evaluate the super-parameter importance of the machine learning algorithm, and is used for guiding the automatic super-parameter configuration and enhancing the interpretability problem of the super-parameter configuration. The super-parameter importance for describing the machine learning algorithm per se provides effective reference and good interpretability for a super-parameter configuration process. The module is used for solving the technical problem of accurately evaluating the super-parameter importance of the machine learning algorithm and using the super-parameter importance to guide automatic super-parameter configuration and enhance the interpretability of the super-parameter configuration.
(1) The method saves resources and time: by providing suitable prior knowledge it reduces the search space, gives the hyper-parameter configuration process a measure of guidance, and escapes the previous complete black-box state.
(2) At the same time, the user can see intuitively which hyper-parameter types have the greater influence on the performance of the algorithm.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application and are not intended to limit the application.
FIG. 1 is a flow chart provided by the present invention;
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
The method makes full use of the many data sets in the open machine learning environment OpenML and of the performance data of each data set under various algorithms. Combined with a meta-learning approach, it computes the distance between the target data set and the historical data sets, obtains the importance ranking of each hyper-parameter type of the classification algorithm to be evaluated with a Relief algorithm and a clustering algorithm, and uses the ranking to guide the automatic parameter tuning of the target data set under that algorithm. The invention provides suitable prior knowledge, reduces the search space, gives the hyper-parameter configuration process a measure of guidance, and escapes the previous complete black-box state; at the same time, the user can see intuitively which hyper-parameter types have the greater influence on the performance of the algorithm.
As shown in fig. 1, the present invention comprises the steps of:
Step A: obtain different data sets in OpenML and extract meta-features from each data set so that each is represented by its meta-features; collect data on the performance y_i (e.g., the misclassification rate or the RMSE) of the classification algorithm to be evaluated under different hyper-parameter configurations θ_i; store the meta-feature vector of each data set and the performance data corresponding to the different hyper-parameter configurations in a historical data set sample library.
The meta-features extracted in step A mainly comprise three parts: simple meta-features (e.g., the number of samples in the data set, the number of features, the number of categories, the number of missing values), statistical meta-features of the data set (e.g., the mean, the variance, the kurtosis of the distance vector), and significance meta-features (e.g., the performance obtained by running a machine learning algorithm on the data set).
Step B: for the target data set used by the user, extract meta-features as its representation and, based on the principle that dissimilar data sets also differ in the hyper-parameter configurations that suit an algorithm, obtain the distance ordering between the target data set and the historical data sets from the distances between meta-feature vectors. On the first f historical data sets closest to the target data set, the importance of the hyper-parameters can then be evaluated from the performance data of the classification algorithm to be evaluated under different hyper-parameters.
In step B, the distance between the target data set D_N′ and a historical data set D_i (i = 1, 2, …, N) is measured by the distance between their meta-feature vectors, using the usual p-norm of the difference of the meta-feature vectors: d_pn(D_N′, D_i) = ||V_N′ - V_i||_pn. Comparing the distances between the meta-feature vectors of the target data set and of the historical data sets yields the near-to-far ordering π of the historical data sets, where π(1) indexes the one closest to the target data set.
Step C: following the near-to-far ordering of the distances between the historical data sets and the target data set, execute the proposed Relief-Cluster algorithm sequentially on the first f historical data sets closest to the target data set. First evaluate the importance of the hyper-parameters preliminarily from the average weight of each hyper-parameter type obtained by the Relief algorithm, then further verify the accuracy of this evaluation with the r(C) index of the clustering algorithm; repeat the two steps m times and select the hyper-parameter importance evaluation corresponding to the maximal r(C) index, finally obtaining the hyper-parameter importance ranking of the classification algorithm to be evaluated, which is then used to guide the automatic parameter tuning of the target data set under that algorithm.
In the present invention, step C specifically includes the following steps:
Step C1: set a threshold according to the magnitude of the performance data under different hyper-parameter configurations, dividing the data into a high-performance class and a low-performance class. The Relief algorithm randomly selects a sample s_i from the hyper-parameter sample set and then selects, from each of the two classes, the sample nearest to s_i. The same-class sample is denoted M and the different-class sample Q; the weight w_h of each hyper-parameter type h is updated according to equation (1):
w_h = w_h - diff(h, s_i, M)/rt + diff(h, s_i, Q)/rt    (1)
In the above formula, the difference between two samples s_i and s_j (1 ≤ i ≠ j ≤ m) in the hyper-parameter h (1 ≤ h ≤ ph) is defined as:
if the hyper-parameter h is a categorical (scalar-type) hyper-parameter, diff(h, s_i, s_j) = 0 when s_ih = s_jh and 1 otherwise;
if the hyper-parameter h is a numerical hyper-parameter, diff(h, s_i, s_j) = |s_ih - s_jh| / (max_h - min_h);
where max_h and min_h are, respectively, the maximum and minimum values of the hyper-parameter h in the sample set.
As equation (1) shows, a hyper-parameter that contributes strongly to high performance should exhibit large differences between categories and small differences within a category, so the weight of a hyper-parameter with discriminating ability should be positive. To avoid the randomness of a single sampling, the above process iterates rt > 1 times.
Step C2: according to the importance-weight ranking of each hyper-parameter type obtained in the previous step, cluster on the top K hyper-parameter types and compute the feature importance. Let S be the hyper-parameter sample set, T its size, K the number of classes to which the hyper-parameter samples belong, π_k the probability that a sample belongs to class k, and C_k the actual class label of a hyper-parameter sample; with C a hyper-parameter subset, the importance measure r(C) on C can be expressed in terms of F(C) and F_i(C),
where F(C) represents the difference between the clustering result on the hyper-parameter set C and the class labels over the whole hyper-parameter sample set, F_i(C) the difference within each class, and X_i the hyper-parameter sample set of the ith class. The higher the r(C) value, the stronger the correlation between the clustering result and the actual class labels, and the greater the influence of the hyper-parameter set C on the classification.
The two steps are iterated m times; the hyper-parameter importance ranking corresponding to the maximal r(C) is selected, and the result is finally used to guide the automatic parameter tuning of the target data set under the classification algorithm to be evaluated.
The flow of the Relief-Cluster algorithm in the invention is as follows:
Input: hyper-parameter sample set S, number of hyper-parameter types hc, number of samplings/iterations rt
Output: cluster evaluation index r(C), hyper-parameter importance weight matrix W
Repeat m times:
- repeat rt times: randomly select a sample s_i from S; from the samples of the same class as s_i, select the nearest neighbor of s_i and denote it M; from the samples of the other class, select the nearest neighbor of s_i and denote it Q; update the hyper-parameter importance weight vector W using equation (1);
- select a hyper-parameter subset of size X;
- cluster the samples on that hyper-parameter subset;
- compute the correlation r(C) between the clustering result and the actual labels;
From the m values of r(C), select the hyper-parameter importance ranking corresponding to the maximum;
End
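Tying the flow above together, a hedged end-to-end sketch (reusing the relief_weights and subset_score helpers sketched earlier) repeats the Relief and clustering steps m times and keeps the ranking whose cluster score is largest:

```python
import numpy as np

def relief_cluster(S, labels, hc=2, rt=50, m=10, seed=0):
    """Return (best score, hyper-parameter importance ranking) over m repetitions."""
    best_score, best_ranking = -np.inf, None
    for rep in range(m):
        w = relief_weights(S, labels, rt=rt,
                           rng=np.random.default_rng(seed + rep))
        ranking = np.argsort(-w)                       # importance ranking, best first
        score = subset_score(S, labels, ranking[:hc])  # cluster on the top-hc subset
        if score > best_score:
            best_score, best_ranking = score, ranking
    return best_score, best_ranking
```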
the above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (18)
1. A classification system for data to be classified based on machine learning hyper-parameter importance evaluation, characterized by comprising:
a historical data set acquisition module configured to: acquiring a plurality of new data sets similar to the target data set type from an open machine learning environment OpenML, and extracting meta-features from each new data set to enable each new data set to be represented by a meta-feature vector;
collecting data of the performance of a classification algorithm to be evaluated under different hyper-parameter configurations from an open machine learning environment OpenML;
storing the meta-feature vector of each new data set and the performance data corresponding to different hyper-parameter configurations in corresponding historical data sets;
a distance sequence acquisition module configured to: extracting meta-feature vectors of the target data set to represent the target data set, calculating the distance between the meta-feature vectors of the target data set and the meta-feature vectors of the historical data sets, and obtaining a distance sequence from near to far between the target data set and each historical data set;
- an output module configured to: sequentially execute a Relief-Cluster algorithm on the first f historical data sets closest to the target data set: compute the average weight of each hyper-parameter type from the per-type weights obtained by the Relief algorithm, and use these average weights to obtain a preliminary importance-weight ranking of the hyper-parameter types; further verify the accuracy of the hyper-parameter importance evaluation with a clustering algorithm; and finally obtain the hyper-parameter importance ranking of the classification algorithm to be evaluated;
- a classification module configured to: set the several top-ranked hyper-parameters according to the obtained hyper-parameter importance ranking of the classification algorithm to be evaluated, and then classify the data to be classified with the classification algorithm so configured.
3. The system of claim 1, wherein the meta-features in the historical data set acquisition module include: simple meta-features, statistical meta-features and significance meta-features of the data set;
the simple meta-features include: the number of data set samples, the number of features, the number of categories, or the number of missing values;
the statistical meta-features of the data set include: the mean, the variance, or the kurtosis of the distance vector;
the significance meta-features include: the performance obtained by running a machine learning algorithm on the data set.
4. The system of claim 1, wherein the performance of the classification algorithm to be evaluated in the historical data set acquisition module under different hyper-parameter configurations comprises: misclassification rate or RMSE.
5. The system of claim 1, wherein the distance d_pn(D_N′, D_i) between the target data set D_N′ and a historical data set D_i is measured by the distance between their meta-feature vectors:
d_pn(D_N′, D_i) = ||V_N′ - V_i||_pn
where V_N′ is the meta-feature vector of the target data set D_N′, V_i is the meta-feature vector of the historical data set D_i, and pn denotes the p-norm;
comparing the distances between the meta-feature vectors of the target data set and of the historical data sets yields the near-to-far ordering π of the historical data sets, where π(1) indexes the one closest to the target data set.
6. The system of claim 1, wherein obtaining the weight of each hyper-parameter type by the Relief algorithm comprises:
setting a threshold according to the magnitude of the performance data under different hyper-parameter configurations, and dividing the performance data corresponding to the different hyper-parameter configurations in the historical data set into high-performance samples and low-performance samples; the Relief algorithm randomly selects a sample s_i from these data and then selects, from the high-performance samples and from the low-performance samples, the sample nearest to s_i in each;
the sample s_j of the same class as s_i is denoted M, and the sample s_j of the other class is denoted Q; the weight w_h of each hyper-parameter type h is updated according to equation (1):
w_h = w_h - diff(h, s_i, M)/rt + diff(h, s_i, Q)/rt    (1)
diff(h, s_i, M) denotes the difference between the two samples s_i and M in the hyper-parameter h;
diff(h, s_i, Q) denotes the difference between the two samples s_i and Q in the hyper-parameter h;
the difference diff(h, s_i, s_j) between two samples s_i and s_j in the hyper-parameter h is defined as:
if the hyper-parameter h is a categorical (scalar-type) hyper-parameter, diff(h, s_i, s_j) = 0 when s_ih = s_jh and 1 otherwise;
if the hyper-parameter h is a numerical hyper-parameter, diff(h, s_i, s_j) = |s_ih - s_jh| / (max_h - min_h);
where 1 ≤ i ≠ j ≤ m and 1 ≤ h ≤ ph; max_h is the maximum value of the hyper-parameter h in the sample set and min_h its minimum; m is the number of samples; each sample contains ph hyper-parameters; rt is the number of iterations, rt > 1; s_ih is the value of the hyper-parameter h on sample s_i, and s_jh its value on sample s_j.
7. A classification system for data to be classified based on machine learning hyper-parameter importance evaluation, characterized by comprising: a memory, a processor, and computer instructions stored on the memory and executable on the processor, the computer instructions, when executed by the processor, performing the steps of:
step (1): acquiring a plurality of new data sets similar to the target data set type from an open machine learning environment OpenML, and extracting meta-features from each new data set to enable each new data set to be represented by a meta-feature vector;
collecting data of the performance of a classification algorithm to be evaluated under different hyper-parameter configurations from an open machine learning environment OpenML;
storing the meta-feature vector of each new data set and the performance data corresponding to different hyper-parameter configurations in corresponding historical data sets;
step (2): extracting meta-feature vectors of the target data set to represent the target data set, calculating the distance between the meta-feature vectors of the target data set and the meta-feature vectors of the historical data sets, and obtaining a distance sequence from near to far between the target data set and each historical data set;
step (3): sequentially executing a Relief-Cluster algorithm on the first f historical data sets closest to the target data set: computing the average weight of each hyper-parameter type from the per-type weights obtained by the Relief algorithm, and using these average weights to obtain a preliminary importance-weight ranking of the hyper-parameter types; further verifying the accuracy of the hyper-parameter importance evaluation with a clustering algorithm; and finally obtaining the hyper-parameter importance ranking of the classification algorithm to be evaluated;
step (4): setting the several top-ranked hyper-parameters according to the obtained hyper-parameter importance ranking of the classification algorithm to be evaluated, and then classifying the data to be classified with the classification algorithm so configured.
9. The system of claim 7, wherein in step (1), the meta-features comprise: simple meta-features, statistical meta-features and significance meta-features of the data set;
the simple meta-features include: the number of data set samples, the number of features, the number of categories, or the number of missing values;
the statistical meta-features of the data set include: the mean, the variance, or the kurtosis of the distance vector;
the significance meta-features include: the performance obtained by running a machine learning algorithm on the data set.
10. The system of claim 7, wherein the performance of the classification algorithm to be evaluated in step (1) under different hyper-parameter configurations comprises: misclassification rate or RMSE.
11. The system of claim 7, wherein the distance d_pn(D_N′, D_i) between the target data set D_N′ and a historical data set D_i is measured by the distance between their meta-feature vectors:
d_pn(D_N′, D_i) = ||V_N′ - V_i||_pn;
where V_N′ is the meta-feature vector of the target data set D_N′, V_i is the meta-feature vector of the historical data set D_i, and pn denotes the p-norm;
comparing the distances between the meta-feature vectors of the target data set and of the historical data sets yields the near-to-far ordering π of the historical data sets, where π(1) indexes the one closest to the target data set.
12. The system of claim 7, wherein obtaining the weight of each hyper-parameter type by the Relief algorithm comprises:
setting a threshold according to the magnitude of the performance data under different hyper-parameter configurations, and dividing the performance data corresponding to the different hyper-parameter configurations in the historical data set into high-performance samples and low-performance samples; the Relief algorithm randomly selects a sample s_i from these data and then selects, from the high-performance samples and from the low-performance samples, the sample nearest to s_i in each;
the sample s_j of the same class as s_i is denoted M, and the sample s_j of the other class is denoted Q; the weight w_h of each hyper-parameter type h is updated according to equation (1):
w_h = w_h - diff(h, s_i, M)/rt + diff(h, s_i, Q)/rt    (1)
diff(h, s_i, M) denotes the difference between the two samples s_i and M in the hyper-parameter h;
diff(h, s_i, Q) denotes the difference between the two samples s_i and Q in the hyper-parameter h;
the difference diff(h, s_i, s_j) between two samples s_i and s_j in the hyper-parameter h is defined as:
if the hyper-parameter h is a categorical (scalar-type) hyper-parameter, diff(h, s_i, s_j) = 0 when s_ih = s_jh and 1 otherwise;
if the hyper-parameter h is a numerical hyper-parameter, diff(h, s_i, s_j) = |s_ih - s_jh| / (max_h - min_h);
where 1 ≤ i ≠ j ≤ m and 1 ≤ h ≤ ph; max_h is the maximum value of the hyper-parameter h in the sample set and min_h its minimum; m is the number of samples; each sample contains ph hyper-parameters; rt is the number of iterations, rt > 1; s_ih is the value of the hyper-parameter h on sample s_i, and s_jh its value on sample s_j.
13. A computer readable storage medium having computer instructions embodied thereon, said computer instructions when executed by a processor performing the steps of:
step (1): acquiring a plurality of new data sets similar to the target data set type from an open machine learning environment OpenML, and extracting meta-features from each new data set to enable each new data set to be represented by a meta-feature vector;
collecting data of the performance of a classification algorithm to be evaluated under different hyper-parameter configurations from an open machine learning environment OpenML;
storing the meta-feature vector of each new data set and the performance data corresponding to different hyper-parameter configurations in corresponding historical data sets;
step (2): extracting meta-feature vectors of the target data set to represent the target data set, calculating the distance between the meta-feature vectors of the target data set and the meta-feature vectors of the historical data sets, and obtaining a distance sequence from near to far between the target data set and each historical data set;
step (3): sequentially executing a Relief-Cluster algorithm on the first f historical data sets closest to the target data set: computing the average weight of each hyper-parameter type from the per-type weights obtained by the Relief algorithm, and using these average weights to obtain a preliminary importance-weight ranking of the hyper-parameter types; further verifying the accuracy of the hyper-parameter importance evaluation with a clustering algorithm; and finally obtaining the hyper-parameter importance ranking of the classification algorithm to be evaluated;
step (4): setting the several top-ranked hyper-parameters according to the obtained hyper-parameter importance ranking of the classification algorithm to be evaluated, and then classifying the data to be classified with the classification algorithm so configured.
15. The medium of claim 13, wherein in step (1), the meta-feature comprises: simple meta-features, statistical meta-features and significance meta-features of the data set;
the simple meta-features include: the number of data set samples, the number of features, the number of categories, or the number of missing values;
the statistical meta-features of the data set include: the mean, the variance, or the kurtosis of the distance vector;
the significance meta-features include: the performance obtained by running a machine learning algorithm on the data set.
16. The medium of claim 13, wherein the performance of the classification algorithm under evaluation in step (1) under different hyper-parameter configurations comprises: misclassification rate or RMSE.
17. The medium of claim 13, wherein the distance d_pn(D_N′, D_i) between the target data set D_N′ and a historical data set D_i is measured by the distance between their meta-feature vectors:
d_pn(D_N′, D_i) = ||V_N′ - V_i||_pn;
where V_N′ is the meta-feature vector of the target data set D_N′, V_i is the meta-feature vector of the historical data set D_i, and pn denotes the p-norm;
comparing the distances between the meta-feature vectors of the target data set and of the historical data sets yields the near-to-far ordering π of the historical data sets, where π(1) indexes the one closest to the target data set.
18. The medium of claim 13, wherein obtaining the weight of each hyper-parameter type by the Relief algorithm comprises:
setting a threshold according to the magnitude of the performance data under different hyper-parameter configurations, and dividing the performance data corresponding to the different hyper-parameter configurations in the historical data set into high-performance samples and low-performance samples; the Relief algorithm randomly selects a sample s_i from these data and then selects, from the high-performance samples and from the low-performance samples, the sample nearest to s_i in each;
the sample s_j of the same class as s_i is denoted M, and the sample s_j of the other class is denoted Q; the weight w_h of each hyper-parameter type h is updated according to equation (1):
w_h = w_h - diff(h, s_i, M)/rt + diff(h, s_i, Q)/rt    (1)
diff(h, s_i, M) denotes the difference between the two samples s_i and M in the hyper-parameter h;
diff(h, s_i, Q) denotes the difference between the two samples s_i and Q in the hyper-parameter h;
the difference diff(h, s_i, s_j) between two samples s_i and s_j in the hyper-parameter h is defined as:
if the hyper-parameter h is a categorical (scalar-type) hyper-parameter, diff(h, s_i, s_j) = 0 when s_ih = s_jh and 1 otherwise;
if the hyper-parameter h is a numerical hyper-parameter, diff(h, s_i, s_j) = |s_ih - s_jh| / (max_h - min_h);
where 1 ≤ i ≠ j ≤ m and 1 ≤ h ≤ ph; max_h is the maximum value of the hyper-parameter h in the sample set and min_h its minimum; m is the number of samples; each sample contains ph hyper-parameters; rt is the number of iterations, rt > 1; s_ih is the value of the hyper-parameter h on sample s_i, and s_jh its value on sample s_j.
Priority Applications (1)
- CN201810270934.5A (priority and filing date 2018-03-29): Method, system and storage medium for evaluating importance of machine learning hyper-parameter
Publications (2)
- CN108446741A, published 2018-08-24
- CN108446741B, published 2020-01-07
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant