CN111695042B - User behavior prediction method and system based on deep walking and ensemble learning - Google Patents
- Publication number
- CN111695042B, CN202010524285.4A
- Authority
- CN
- China
- Prior art keywords
- model
- user
- user behavior
- module
- models
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
- G06Q30/0202—Market predictions or forecasting for commercial activities
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Databases & Information Systems (AREA)
- Business, Economics & Management (AREA)
- Software Systems (AREA)
- Evolutionary Computation (AREA)
- Accounting & Taxation (AREA)
- Computing Systems (AREA)
- Strategic Management (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Computational Linguistics (AREA)
- Finance (AREA)
- Development Economics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Entrepreneurship & Innovation (AREA)
- Molecular Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Biophysics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Probability & Statistics with Applications (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Medical Informatics (AREA)
- Game Theory and Decision Science (AREA)
- Economics (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a user behavior prediction method and system based on deep walking (DeepWalk) and ensemble learning. The method preprocesses the original data set to handle problems such as duplicate, abnormal and redundant records; extracts, from the preprocessed data set, statistical information and activeness information that reflect consumers' behavior habits and preference degrees so as to construct a user portrait for each user; performs a Random Walk on the social network graph structure of the commodities purchased by users to obtain new behavior sequences; and then feeds the context information of each user behavior, obtained with a Word2vec model, into a machine learning model for training and learning, thereby improving the prediction reliability and prediction accuracy of the model.
Description
Technical Field
The invention relates to the technical field of machine identification, in particular to a user behavior prediction method and a user behavior prediction system based on deep walking and ensemble learning.
Background
With the rapid development of internet technology and electronic commerce, more and more people enjoy shopping on the internet to meet their daily needs. Every day, thousands of users purchase commodities on e-commerce online shopping platforms, so analyzing users' historical behaviors with artificial intelligence algorithms to judge whether a user will purchase a commodity is highly significant. For example, researchers have found that by analyzing users' historical shopping data on an e-commerce platform, their preferences and behavior characteristics can be mined, which greatly benefits personalized recommendation, user relationship management and advertisement placement cost. Therefore, using artificial intelligence algorithms to judge, from historical data, whether a user will purchase a commodity is of great research significance.
Machine learning algorithms have become a common means of determining whether a user will purchase or collect merchandise. Research shows that user behavior prediction models are generally established and optimized from two angles: one optimizes the generalization capability of the algorithm model from the perspective of the model algorithm itself; the other improves the generalization capability of the user behavior prediction model by analyzing the user's behavior sequence and building the algorithm model on that basis. In addition, with the rise of ensemble learning, the prior art improves the generalization capability of algorithms by fusing single models. Both approaches have their own advantages, but at present some disadvantages remain, as follows:
(1) Context semantic information of each behavior of the user is not effectively considered when a user behavior sequence is researched, so that the learning capability and the prediction accuracy of a model obtained by training are low;
(2) In the ensemble learning process, most studies adopt a random sampling method to generate training subsets to construct several single classifiers, however, the diversity of the training subsets is not guaranteed, which may result in the degradation of the overall classification performance.
Disclosure of Invention
In order to solve at least one of the above problems, the present invention provides a user behavior prediction method based on deep walking and ensemble learning.
The invention is realized by the following technical scheme:
the user behavior prediction method based on deep walking and ensemble learning comprises the following steps:
s1, acquiring an original data set and preprocessing the original data set;
s2, constructing a user portrait based on the preprocessed data set to form a commodity social network graph structure;
s3, randomly walking the commodity social network diagram structure to obtain new behavior sequence data, and then training the new behavior sequence data by using a Word2vec model to generate an embedding vector;
and S4, inputting the embedding vector into a machine learning model for training to obtain a single user behavior prediction model.
In the invention, the commodity social network graph structure is formed by constructing the user portrait, and by applying the deep walking technique to this social network graph structure, the reliability and precision of user behavior prediction can be improved.
Furthermore, in order to further improve the accuracy and reliability of user behavior prediction, the invention integrates (fuses) the single user behavior prediction model to obtain a fusion model with higher prediction accuracy. The method further comprises the step S5 of fusing two models with the largest difference in the plurality of single user behavior prediction models to obtain the user behavior prediction model.
Preferably, the model fusion step of the invention adopts a model difference measurement method based on MIC and the confusion matrix to realize model fusion, so that the learning capability of the model is improved and better generalization capability is obtained. Step S5 of the present invention specifically includes:
step S51, repeatedly executing the step S3 and the step S4 by adjusting the step length of random walk and the dimensionality of the embedding vector, and constructing and obtaining a plurality of single user behavior prediction models;
s52, selecting n models from a plurality of single user behavior prediction models according to generalization capability; wherein n is a positive integer greater than or equal to 3;
step S53, calculating the maximum information coefficient MIC between every two of the n models, and constructing a confusion matrix for visualization;
and S54, finding out two single models with the minimum similarity on the obtained confusion matrix, and fusing to obtain a user behavior prediction model.
Preferably, in step S2 of the present invention, the user portrait is constructed from three angles, which are the basic information of the user, the activity information of the user, and the statistical information of the user operation behavior.
Preferably, the random walk process in step S3 of the present invention specifically includes: starting from any node of the network graph structure, randomly selecting one from a plurality of points connected with the current node in each step of the migration, and repeating the process continuously until the set migration length is reached, and stopping the migration, thereby obtaining a new piece of user behavior sequence data.
On the other hand, the invention also provides a user behavior prediction system based on deep walking and ensemble learning, and the system comprises a data acquisition module, a preprocessing module, a user portrait module, a random walking module and a training module;
the data acquisition module is used for acquiring original behavior data of a user, constructing an original data set and sending the original data set to the preprocessing module;
the preprocessing module is used for preprocessing the original data set and sending the preprocessed data to the user portrait module;
the user portrait module is used for constructing a user portrait based on the preprocessed data set, forming a commodity social network graph structure and sending the commodity social network graph structure to the walking module;
the walking module is used for randomly walking the commodity social network diagram structure to obtain new behavior sequence data, then training the new behavior sequence data by using a Word2vec model to generate an embedding vector and sending the embedding vector to the training module;
the training module is used for inputting the embedding vector into the machine learning model for training to obtain the single user behavior prediction model.
Preferably, the system of the present invention further comprises: a fusion module;
the fusion module is used for receiving the single user behavior prediction models output by the training module and fusing the two models with the maximum difference to obtain the user behavior prediction model.
The fusion module comprises a selection unit, a calculation unit and a fusion unit;
the selection unit selects n models from a plurality of single user behavior prediction models according to generalization capability; wherein n is a positive integer greater than or equal to 3;
the computing unit computes the maximum information coefficient MIC between every two of the n models, and constructs and visualizes a confusion matrix;
and the fusion unit finds out two single models with the minimum similarity on the obtained confusion matrix for fusion to obtain a user behavior prediction model.
The user portrait module of the invention constructs the user portrait from three angles, namely the basic information of the user, the activity information of the user and the statistical information of the operation behavior of the user.
The random walk module of the invention is configured to perform the following process: starting from any node of the network graph structure, randomly selecting one from a plurality of points connected with the current node in each step of the wandering, and repeating the process continuously until the set wandering length is reached, and stopping the wandering, thereby obtaining a new piece of user behavior sequence data.
The invention has the following advantages and beneficial effects:
1. The method preprocesses the original data set to deal with duplicate, abnormal and redundant records; extracts, from the preprocessed data set, statistical information and activeness information that reflect consumers' behavior habits and preference degrees, and constructs a user portrait for each user; performs a Random Walk on the social network graph structure of the commodities purchased by users to obtain new behavior sequences; and then feeds the context information of each user behavior, obtained with a Word2vec model, into a machine learning model for training and learning, thereby improving the prediction reliability and prediction precision of the model.
2. The method further performs selective integration (fusion) of the obtained single models using the MIC (Maximum Information Coefficient) and confusion matrix method, thereby further enhancing the prediction performance and reliability of the model.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
fig. 1 is a schematic diagram of a user behavior prediction model building process according to a first embodiment of the present invention.
FIG. 2 is a schematic diagram of a random search process according to the present invention.
Fig. 3 is a schematic diagram illustrating a user behavior prediction model construction process according to a second embodiment of the present invention.
FIG. 4 is a flow chart of selective model fusion based on MIC and confusion matrix.
FIG. 5 is a ROC graph.
FIG. 6 is a schematic diagram of the model building process during testing and verification according to the present invention.
FIG. 7 is a comparison of a user profile and a verification set AUC of an original model according to the present invention.
FIG. 8 is a comparison of AUC of a user profile and a test set of original models in accordance with the present invention.
FIG. 9 is a confusion matrix visualization of the present invention.
FIG. 10 is a comparison graph of AUC in the model fusion validation set of the present invention.
FIG. 11 is a comparison of AUC for the model fusion test set of the present invention.
FIG. 12 is a comparison graph of AUC in AUC-ranked fusion validation sets of the present invention.
FIG. 13 is a comparison graph of AUC in the AUC ranking fusion test set of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to examples and accompanying drawings, and the exemplary embodiments and descriptions thereof are only used for explaining the present invention and are not meant to limit the present invention.
Example 1
The embodiment provides a user behavior prediction method based on deep walking and ensemble learning.
This embodiment treats each commodity in a user's purchase behavior sequence as a word and all the commodities as a document, so that Natural Language Processing (NLP) techniques can be used to train word vectors. On the other hand, in the scenario of user purchase behavior sequences, a large amount of graph structure information exists among the data, and this information is very important; the DeepWalk technique is therefore well suited to the purchase behavior network structure in this embodiment. DeepWalk uses the Random Walk technique to walk randomly over the network nodes of the graph and form behavior sequences; each behavior sequence is regarded as a document of commodity words, all the behavior sequence documents are pre-trained with the Word2vec algorithm model, and the DeepWalk technique is added on top of the original model, giving a DeepWalk-based classifier algorithm.
As shown in fig. 1, the method of the present embodiment mainly includes the following steps:
1. acquiring an original data set and preprocessing the original data set;
2. constructing a user portrait based on the preprocessed data set to form a graph structure of the commodities related to the user behavior sequences;
3. and randomly selecting a starting point in a random walk mode in the graph structure, and regenerating the behavior sequence of the commodity. The method specifically comprises the following steps:
the process of accessing other remaining nodes from a certain vertex in the graph structure is called graph traversal, and graph traversal methods are generally two, namely breadth-first search (BFS) and depth-first search (DFS), and are a premise for solving a problem related to a graph topology structure. Breadth-first-search (BFS) traverses its adjacent nodes starting from the starting point, thereby spreading out continuously, giving priority to the amount of information brought by the near-end connection. Starting from a vertex v, the depth-first search (DFS) firstly marks v as a traversed vertex, then selects a vertex u adjacent to v which is not traversed, if u does not exist, the search is terminated, if u exists, DFS is started again from u, and the process is circulated until no vertex exists, and the depth-first search (DFS) utilizes the information quantity implied by the far-end connection. RandomWalk is a depth-first traversal algorithm that can repeatedly access visited nodes. Random Walk (Random Walk) is to continuously and repeatedly Walk in a network graph structure randomly, starting from a specific vertex in the graph structure, randomly selecting one from a plurality of points connected with a current node in each step of the Walk, continuously repeating the process, and stopping the Walk after a set Walk length is reached so as to obtain a piece of sequence data.
4. Inputting the new behavior sequence into a Word2vec model, and training to generate an embedding vector of the commodity. The method specifically comprises the following steps:
the Word2vec algorithm represents semantic information of words by learning a text and then by means of Word vectors, namely, a space where an original Word is located is mapped into a new space, so that semantically similar words are close to each other in distance in the space. Word2vec contains a total of two language algorithm models, the CBOW model and the Skip-gram model. The CBOW model and the Skip-gram model both comprise an input layer, a hidden layer and an output layer, and the CBOW model is a current word w t Context w of t-1 ,w t-2 ,w t+1 ,w t+2 Predict the current word w on the premise of t Whereas the Skip-gram model is just the opposite, when the current word w is known t On the premise of (1), predicting its context w t-1 ,w t-2 ,w t+1 ,w t+2 . Word2vec algorithm provides two optimization algorithms of Hierarchical Softmax and Negative Sampling to reduce training time of Word vectors. Hierarchical Softmax and Negative SampliBoth ng optimization methods use BP neural networks as classification methods. And finally, each word is represented by an N-dimensional vector randomly generated by an algorithm, and the optimal word vector, namely the embedding vector of each word can be obtained after the training of a Wood 2vec algorithm model.
An algorithmic model of learning a word vector predicts the next word given a context word. In the algorithm model framework, each term in the document is projected into a vector space, wherein each term in the document corresponds to a unique column vector in the matrix W, and the position of the column vector is determined by the position of the term in the document. The concatenation or addition of the context word vectors is then used as a feature vector to predict the next word.
Suppose a sentence contains T words w_1, w_2, ..., w_i, ..., w_T. The goal is to maximize the likelihood function

$$L=\prod_{t=k}^{T-k} p(w_t \mid w_{t-k},\ldots,w_{t+k})$$

Taking the logarithm and averaging gives

$$\frac{1}{T}\sum_{t=k}^{T-k}\log p(w_t \mid w_{t-k},\ldots,w_{t+k})$$

The prediction task mainly uses the idea of softmax multi-classification, in which the posterior probability is

$$p(w_t \mid w_{t-k},\ldots,w_{t+k})=\frac{e^{y_{w_t}}}{\sum_i e^{y_i}}$$

Each y_i in the above equation is an unnormalized log-probability and is calculated as

$$y=b+U\,h(w_{t-k},\ldots,w_{t+k};W)$$

where U and b are the softmax parameters and h is built from the context word vectors taken from W by concatenation or averaging.
The word vectors obtained by training the Word2vec algorithm model therefore contain context information. The advantages of the Word2vec algorithm model are that context information is captured and that the data dimensionality is compressed compared with one-hot encoding, which greatly improves the time efficiency of the model.
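As an illustration of this training step, the sketch below uses the gensim Word2Vec implementation referred to in Embodiment 3; the parameter values and commodity identifiers are placeholders rather than values prescribed by the patent:

```python
from gensim.models import Word2Vec

# `sequences` are the commodity behaviour sequences produced by the random walks,
# each one a list of commodity ids treated as "words".
sequences = [["item_a", "item_b", "item_c"], ["item_c", "item_a"]]

w2v = Word2Vec(
    sentences=sequences,
    vector_size=7,   # dimensionality of the embedding vector (the "size" hyper-parameter)
    window=2,        # context window around each commodity
    sg=1,            # 1 = Skip-gram, 0 = CBOW
    negative=5,      # Negative Sampling; use hs=1 instead for Hierarchical Softmax
    min_count=1,
    epochs=10,
)

item_vector = w2v.wv["item_a"]   # embedding vector of one commodity
```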
5. And inputting the embedding vector into a machine learning model for training and learning, thereby obtaining a single user behavior prediction model.
In this embodiment, Random Walk is performed on the commodity graph structure formed after data preprocessing and user portrait construction, walking randomly from each node to obtain new behavior sequences; the new behavior sequences are then trained with the Word2vec algorithm model to obtain embedding vectors, and finally a machine learning model is trained and optimized to obtain the final output. When training the model, two hyper-parameters need to be controlled: the dimensionality of the embedding vector and the step length of the random walk. A higher embedding dimensionality is not always better: too few dimensions may hurt the generalization capability of the model, while too many dimensions may cause the curse of dimensionality, so a grid search is carried out to determine the optimal embedding dimension. The grid search first trains the model with a specified word vector dimension, and on this basis increases the dimension for further training and prediction; if the new model performs better than the previous one, the embedding dimension is increased further, otherwise the optimal embedding dimension has been reached, as shown in fig. 2. Similarly, the optimal value of the random walk step length can be obtained by the same search.
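A minimal sketch of this two-hyper-parameter search is given below; `train_and_score` is a hypothetical helper standing in for the whole walk-generation, Word2vec and classifier pipeline, and the [5, 10) search ranges mirror those used in Embodiment 3:

```python
def search_hyperparameters(train_and_score, sizes=range(5, 10), walk_lengths=range(5, 10)):
    """Try every (embedding dimension, walk length) pair and keep the one with
    the best validation AUC returned by train_and_score(size, walk_length)."""
    best_size, best_walk_length, best_auc = None, None, float("-inf")
    for size in sizes:
        for walk_length in walk_lengths:
            auc = train_and_score(size=size, walk_length=walk_length)
            if auc > best_auc:
                best_size, best_walk_length, best_auc = size, walk_length, auc
    return best_size, best_walk_length, best_auc
```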
Example 2
This embodiment further fuses single models on the basis of Embodiment 1 described above, as shown in fig. 3. The fusion method first measures the difference between the single learners using the Maximum Information Coefficient (MIC), then expresses the differences in the form of a confusion matrix, and finally selects the two single learners with the largest difference for model fusion, so as to obtain more excellent generalization capability.
Wherein:
1. The Maximum Information Coefficient (MIC) measures the degree of correlation between two variables, whether the relationship is linear or non-linear. The calculation of the MIC mainly relies on Mutual Information (MI) and a grid-partition method. MI measures the degree of correlation between two variables: given variable sets A = {a_1, a_2, ..., a_n} and B = {b_1, b_2, ..., b_n}, where n is the number of samples, MI can be defined as

$$MI(A,B)=\sum_{a\in A}\sum_{b\in B} p(a,b)\log\frac{p(a,b)}{p(a)\,p(b)}$$

where p(a, b) is the joint probability density of variables A and B, and p(a) and p(b) are the marginal probability densities of A and B respectively; the joint probability is generally relatively complex to compute. The idea of the MIC is to represent the relationship between the two variables as a scatter plot in a two-dimensional space, divide this space into a certain number of intervals along the x and y directions, and then count how the scatter points fall into each cell; this counting is exactly the joint probability calculation, which solves the difficulty of estimating the joint probability in mutual information. Given a finite set of ordered pairs D = {(a_1, b_1), (a_2, b_2), ..., (a_n, b_n)}, a partition G divides the value ranges of variables A and B into x segments and y segments respectively, i.e. an x × y grid over the coordinate axes. The mutual information MI(A, B) is computed separately for each of the resulting grid partitions; since there are many ways to partition an x × y grid, the maximum MI(A, B) value over the different partition schemes is taken as the mutual information value of partition G, and the maximum mutual information of D under partition G is defined as

$$MI^{*}(D,x,y)=\max MI(D|G)$$

where D|G denotes the partition of data set D over G. Normalizing the maximum mutual information obtained under the different partitions yields the characteristic matrix M(D)_{x,y}, calculated as

$$M(D)_{x,y}=\frac{MI^{*}(D,x,y)}{\log \min\{x,y\}}$$

The Maximum Information Coefficient (MIC) is then defined as

$$MIC(D)=\max_{x\cdot y<B(n)} M(D)_{x,y}$$

where B(n) is the upper bound on the grid size x × y; the literature reports that B(n) = n^{0.6} works best.
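For reference, the MIC described above can be computed with the minepy package; the short sketch below assumes minepy is installed and sets alpha=0.6 to mirror the B(n) = n^0.6 bound (the parameter names belong to minepy, not to the patent):

```python
import numpy as np
from minepy import MINE   # assumes the minepy package is available

def mic(x, y, alpha=0.6, c=15):
    """Maximum Information Coefficient between two 1-D arrays."""
    m = MINE(alpha=alpha, c=c)
    m.compute_score(np.asarray(x, dtype=float), np.asarray(y, dtype=float))
    return m.mic()
```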
2. The confusion matrix expresses the relationship between a model's predictions on the data samples and their true attribute values, and is a common way to evaluate the generalization capability of a classifier. Suppose the classification task has N classes in total, and the data sample set D contains T_0 data records, with each category i containing T_i records (1 ≤ i ≤ N). A classifier C is constructed using a machine learning or deep learning model, and cm_ij denotes the percentage of the category-i data records that classifier C judges to belong to category j among all category-i records, so that a confusion matrix CM(C, D) of dimension N × N is obtained:
The column indices of the elements in the confusion matrix represent the prediction results of classifier C on the data samples, and the row indices represent the true label values of the data samples. The diagonal elements of the confusion matrix give the probability that each class is correctly predicted by classifier C, while the off-diagonal elements give the probabilities of misclassification by classifier C.
In the field of machine learning, if the similarity between two classes is relatively high, the data samples of the two classes are very likely to be predicted as each other's class by a classifier. The confusion matrix row vector CM_i (1 ≤ i ≤ N) indicates the tendency of the category-i data samples to be predicted as each category by the model. Based on the confusion matrix, this embodiment defines a correlation matrix between single classifiers. Assuming there are M single classifiers, the confusion matrix corresponding to each single classifier is converted into a row vector by expanding it row by row, denoted CM(i) (1 ≤ i ≤ M). All the row vectors CM(i) (1 ≤ i ≤ M) are then stacked to form a matrix containing the confusion matrices of all single classifiers, defined as CMS(C, D). Based on this matrix and the Maximum Information Coefficient (MIC), a similarity measurement matrix Q for ensemble learning can be obtained. The Q matrix reflects the correlation between the single classifiers: the smaller the value of Q_ij, the smaller the correlation between the two single classifiers and the larger their difference. Therefore, the Q matrix measures the similarity between the classifiers well and provides a way to find two single classifiers with large differences for ensemble learning. The formula for the Q matrix is shown below.
As shown in fig. 3, the method of the present embodiment further includes the following steps based on the above embodiment 1:
firstly, selecting n user behavior prediction models with relatively strong generalization ability from the models constructed in Embodiment 1; secondly, calculating the maximum information coefficient MIC between every two of these models; thirdly, constructing a confusion matrix and visualizing it; and fourthly, finding, on the obtained confusion matrix, the two single models with the lightest color, i.e. the smallest similarity, for Bagging fusion. The specific flow chart is shown in fig. 4.
When performing model fusion by simple weighted fusion, determining the optimal coefficient of each model is usually the difficulty. The simplest weighted fusion is average weighting: in theory, if model fusion can improve the model at all, then averaging the single models in the fusion will already bring some improvement, but equal weights are not necessarily the optimal coefficients. Instead of simple mean-weighted fusion, a fusion method based on AUC ranking is therefore proposed, exploiting the ranking nature of the AUC. The AUC can be calculated as

$$AUC=\frac{\sum_{i\in\text{positive class}} rank_i-\frac{M(M+1)}{2}}{M\times N}$$

where M is the number of positive samples, N is the number of negative samples, and rank_i is the rank position of the i-th positive sample when all samples are ordered by predicted score. From the above formula, the AUC is essentially determined by a ranking, and this characteristic can be used to calculate the coefficient of each model for fusion. Specifically, the single models are sorted in descending order of their AUC values, each model's predictions are multiplied by the reciprocal of its position in that ordering, and the weighted predictions are summed to obtain the final fusion result. The final value obtained by model fusion is thus the sum, over the single models, of each model's predicted value multiplied by the reciprocal of its position in the AUC ordering.
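One possible reading of this AUC-ranking fusion is sketched below: the AUC is computed with the rank formula above, and each single model's predictions are weighted by the reciprocal of its position in the descending AUC ordering. The per-model weighting is an interpretation of the description, not a formula given verbatim in the text:

```python
import numpy as np

def auc_from_ranks(scores, labels):
    """AUC via the rank formula: rank all samples by score (rank 1 = smallest,
    ties ignored for simplicity) and sum the ranks of the positive samples."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    m = int(np.sum(labels == 1))
    n = int(np.sum(labels == 0))
    return (ranks[labels == 1].sum() - m * (m + 1) / 2) / (m * n)

def auc_rank_fusion(model_scores, model_aucs):
    """Weight each model's predictions by 1 / (its position in the descending
    AUC ordering) and sum the weighted predictions."""
    order = np.argsort(model_aucs)[::-1]              # best-AUC model first
    fused = np.zeros_like(np.asarray(model_scores[0], dtype=float))
    for position, idx in enumerate(order, start=1):
        fused += np.asarray(model_scores[idx], dtype=float) / position
    return fused
```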
Example 3
In this embodiment, the method provided in the above embodiments is tested by taking the background log of a certain bank's shopping APP as an example.
1. Raw data set
The obtained background log of the bank's shopping APP spans one month and mainly comprises the consumption behavior data of 4 thousand users; each row corresponds to one operation record of a user, and the records are sorted by the user's operation time. The relevant fields contained in the data set are shown in table 1.
Table 1 original data set basic information table
The raw data set is preprocessed.
2. User representation construction
Each line of the unprocessed data set is a record of a single operation behavior of a user, i.e. the granularity is an individual operation, whereas what this embodiment needs for predicting whether a user purchases a certain product is information at the granularity of an individual user. Therefore, processing the original data set and constructing the user portrait are critical for obtaining finer-grained features of each user's operation behavior. This embodiment mainly constructs the user portrait by computing user behavior statistics, thereby discovering the users' behavior habits.
The unprocessed data set is grouped by the User-id field and sorted to obtain the statistical information of each user's behavior. The user behavior statistics describe the user's behavior habits from several aspects, mainly including the following:
(1) Basic information of user
This part of the data information consists of the User-id field and the User related variable Userinfo _ X in the original data. The data field does not need to be subjected to additional recombination and calculation, and only needs to be updated for corresponding numerical values when each record of a user is read.
(2) User liveness information
The user's activity information reflects the user's preferences for apps and can be thought of from multiple directions. The user activity information indicators used primarily herein are shown in table 2.
TABLE 2 subscriber liveness information Table
(3) User operation behavior statistical information
The user operation behavior information reflects how the user interacts with the APP and the preference degree of the user for a certain function of the APP. The data field mainly includes the number of times, the proportion, etc. of each operation of the user, and the detailed characteristic field is as shown in table 3.
TABLE 3 user behavior operation statistics Table
3. Evaluation index
The prediction results of the model are represented by a confusion matrix, as shown in table 4.
TABLE 4 confusion matrix
The Accuracy (ACC), precision (Precision), recall (Recall), false positive rate (FPR) and F1-score can be defined from the above confusion matrix, and their calculation formulas are as follows:
ACC=(TP+TN)/(TP+FN+FP+TN)
P=TP/(TP+FP)
R=TP/(TP+FN)
FPR=FP/(TN+FP)
F1-score=(2×P×R)/(P+R)
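These metrics follow directly from the confusion-matrix counts; a small helper illustrating the formulas above might look like this (the function and key names are illustrative, not part of the patent):

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, precision, recall, false positive rate and F1-score from the
    four confusion-matrix counts, matching the formulas above."""
    acc = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fpr = fp / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {"ACC": acc, "P": precision, "R": recall, "FPR": fpr, "F1": f1}
```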
The ROC curve plots recall (Recall) on the ordinate against the false positive rate (FPR) on the abscissa, and the area under the ROC curve is the AUC value, as shown in fig. 5; obviously this area is not larger than 1. Since the ROC curve generally lies above the line y = x, the AUC generally ranges between 0.5 and 1. The original data set contains 17410 records with a purchase and 15590 records without a purchase, so the data samples are unbalanced and accuracy is not an appropriate evaluation index; the AUC is generally taken as the evaluation measure for the user behavior prediction problem, so the evaluation index selected in this embodiment is the AUC. The larger the AUC, the better the model.
In order to verify the effectiveness of the user portrait and of the DeepWalk-based user behavior prediction model, two basic machine learning models, xgboost (xgb) and lightgbm (lgb), are selected as base classifiers in this embodiment and compared with the extended xgb and lgb models that add the user portrait and the DeepWalk technique. For the DeepWalk technique, a random search with a step of 1 is carried out over the range [5, 10) for the walk length walklength, and likewise over the range [5, 10) for the word vector dimension size. In this embodiment, the data preprocessing and user portrait construction are implemented with pandas, numpy and sklearn. xgb and lgb are implemented with the python xgboost and lightgbm packages, respectively. The Word2vec model inside the DeepWalk model is implemented with the python gensim package. The five-fold cross validation used for model validation is implemented with sklearn. The xgb and lgb model parameters are determined by grid search to obtain the optimal parameter values. The structure of the model constructed in this embodiment is shown in fig. 6, where size denotes the dimension of the trained Word2vec embedding vector and walklength denotes the step length of the random walk.
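A hedged sketch of the five-fold cross-validated AUC evaluation described above is shown below, using lightgbm and sklearn; the feature matrix is assumed to already contain the user-portrait and DeepWalk embedding features, and the model parameters are placeholders rather than the grid-searched values:

```python
import numpy as np
import lightgbm as lgb
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

def cross_validated_auc(features, labels, n_splits=5, seed=42):
    """Mean AUC of an lgb classifier over stratified five-fold cross validation.
    `features` is a numpy array of per-user features (construction omitted here)."""
    aucs = []
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, valid_idx in skf.split(features, labels):
        model = lgb.LGBMClassifier(n_estimators=200, learning_rate=0.05)
        model.fit(features[train_idx], labels[train_idx])
        pred = model.predict_proba(features[valid_idx])[:, 1]
        aucs.append(roc_auc_score(labels[valid_idx], pred))
    return float(np.mean(aucs))
```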
Each line record in the original data set describes a single operation of a certain user, whereas what this embodiment needs for predicting whether a user purchases a certain product is information at the granularity of an individual user. Therefore, the original data set must be re-aggregated and computed to build the user portrait, so as to obtain a finer-grained characterization of each user's operation behavior. This embodiment mainly constructs the user portrait by computing user behavior statistics, thereby discovering the users' behavior habits.
As shown in fig. 7, the AUC performance on the validation set is compared for the lgb and xgb models before and after adding the user portrait. The AUC value of the xgb base model (model 2) is 0.0027 higher than that of the lgb base model (model 1) and reaches 0.5219. Among the lgb and xgb models built after adding the user portrait (model 3 and model 4), model 3 is higher than model 4 and reaches 0.7131; the AUC value of model 3 is 0.1939 higher than that of model 1, and the AUC value of model 4 is 0.1905 higher than that of model 2. The experiments show that, whether for the xgb model or the lgb model, building the user portrait brings a great improvement in performance.
FIG. 8 compares, on the test set (i.e. new data), the models with the user portrait added against the original models. The test set contains no training data, so the model predictions are less predictable and show moderate fluctuation. Nevertheless, the AUC values of the lgb and xgb models with the user portrait are both higher than those of the other models: the AUC of model 3 reaches 0.7292 and that of model 4 reaches 0.7238. Compared with the validation set, the AUC values of the lgb and xgb base models are lower, whereas after adding the user portrait the test-set AUC values are higher than on the validation set; this is caused by the uncertainty of the new data, and model 3 and model 4 carry some risk of under-fitting. In general, the models with the user portrait show stronger learning ability than the base models on both the test set and the validation set.
Here class 1 corresponds to models 5-9, class 2 to models 10-14, class 3 to models 15-19, class 4 to models 20-24, class 5 to models 25-29, class 6 to models 30-34, class 7 to models 35-39, class 8 to models 40-44, class 9 to models 45-49, and class 10 to models 50-54. The experiments show that, among the many models obtained by adding the DeepWalk technique on top of the user-portrait models 3 and 4, the generalization capability is improved regardless of the dimension of the Word2vec embedding vector and the value of the random-walk step length. For the models derived from model 3 with DeepWalk, the user behavior prediction model obtained when size=7 and walklength=9 (model 29) reaches an AUC of 0.7431, which is 0.03 higher than the AUC of model 3; the model obtained when size=7 and walklength=8 (model 28) has the lowest AUC among the models evolved from model 3, yet its AUC is still 0.026 higher than that of model 3. The AUC values of the models evolved from model 4 are lower than those of the models evolved from model 3: when size=9 and walklength=6, the obtained model 51 has the highest AUC value, 0.7374, which is 0.025 higher than model 4, while the weakest model is model 53 with an AUC of 0.7342, still 0.021 higher than model 4. In general, the user behavior prediction models using the DeepWalk technique perform better than the other base models. The specific experimental data of each model on the validation set are shown in table 5.
TABLE 5 validation set AUC values for each model
On the verification set, the learning capacity of the models obtained by extending lgb and xgb with DeepWalk is almost the same, with AUC values reaching about 0.74. Among the models extended from lgb, the user behavior prediction model performs best when size=5 and walklength=8 (model 8), with an AUC value of 0.7479, which is 0.0187 higher than the AUC of model 3 on the verification set; the lowest AUC occurs when size=6 and walklength=7 (model 17), reaching 0.7451, only 0.0028 lower than model 8. Among the models extended from xgb, the best performance occurs when size=7 and walklength=6 (model 31), with an AUC value of 0.7466, 0.0228 higher than the AUC of model 4 on the verification set; the lowest AUC occurs when size=6 and walklength=7 (model 23), reaching 0.7437, only 0.0029 lower than model 31. The specific experimental data for each model are shown in table 6. In summary, the DeepWalk-based user behavior prediction models outperform the other models on both the verification set and the test set.
TABLE 6 AUC values for each model of test set
The calculation of the maximum information coefficient and the construction of the confusion matrix used in this embodiment are implemented with pandas and numpy, and the confusion matrix is visualized with matplotlib. When performing model fusion, the first requirement is that each single learner has relatively strong learning ability. In this embodiment, single models with strong generalization ability, namely model 29, model 47, model 16, model 51, model 30, model 54, model 38 and model 20, are selected from the above models for ensemble learning. Maximum Information Coefficient (MIC) values are calculated between these models, and a confusion matrix visualization is constructed as shown in fig. 9. The MIC values between each pair of models are shown in table 7.
TABLE 7 MIC values between models
As shown in fig. 9, a lighter color indicates a greater difference between two models. In this embodiment, three pairs of models with lighter colors and a maximum information coefficient MIC lower than 0.5, namely model 20 with model 30, model 30 with model 29, and model 47 with model 20, are selected for model fusion; among them, model 30 and model 29 have the lightest color in the confusion matrix and the lowest MIC. Two pairs with darker colors and MIC higher than 0.5, namely model 16 with model 51 and model 16 with model 20, are fused as a control experiment. The model fusion mode is Bagging fusion, and the weight of each single model is set to 0.5.
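The Bagging fusion with equal weights of 0.5 described above amounts to a simple weighted average of the two selected models' predicted scores, for example:

```python
import numpy as np

def bagging_fusion(scores_a, scores_b, weight_a=0.5, weight_b=0.5):
    """Weighted average of two single models' predicted scores (0.5 each, as above)."""
    return weight_a * np.asarray(scores_a, dtype=float) + weight_b * np.asarray(scores_b, dtype=float)
```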
FIG. 10 compares the effects on the validation set after model fusion. It can be seen that ensemble learning is effective when two single models whose confusion-matrix colors are lighter, i.e. whose differences are larger, are fused. The fusion of model 30 and model 29 gives the best effect, with an AUC value of 0.7561, because the difference between model 30 and model 29 is the largest; the AUC is improved by 0.0192 and 0.013 over model 30 and model 29, respectively. After model 20 and model 30 are fused, the AUC is improved by 0.008 and 0.0073 over the respective single models, and after model 47 and model 20 are fused, the AUC is improved by 0.0072 and 0.00139, respectively. Conversely, fusing two models with smaller differences works poorly: the AUC values obtained by fusing model 16 with model 51 or model 16 with model 20 are weaker than the expressive capability of the single models. These experiments show that, on the validation set, model fusion yields better learning capability only when single learners with small similarity are fused; the gain is largest when the two most different models are fused, and fusing two similar models does not improve, and may even weaken, the expressive capability of the model. The specific experimental results are shown in table 8.
TABLE 8 model fusion verification set AUC comparison table
Fig. 11 shows the model performance on the test set, i.e. the new data set, after model fusion. It can be seen from the figure that model fusion remains effective on new data. Consistent with the validation-set results, the fusion of model 30 and model 29 gives the highest AUC value, reaching 0.7612, which is 0.0265 and 0.0151 higher than the AUC values of model 30 and model 29 alone, respectively. The AUC after fusing model 20 and model 30 reaches 0.7586, which is 0.0145 and 0.0139 higher than model 20 and model 30 alone, respectively, and the AUC after fusing model 47 and model 20 reaches 0.7566, which is 0.0098 and 0.0209 higher than model 47 and model 20 alone, respectively. The AUC values obtained by fusing model 16 with model 51 or model 16 with model 20 are weaker than the learning ability of the single models. The experimental results on both the validation set and the test set show that the selective model fusion method based on MIC and the confusion matrix can find single learners with large differences for fusion and thus obtain relatively excellent generalization capability, whereas fusing two single models with small differences does not increase, and may even decrease, the expressive capability of the model. The specific experimental results are shown in table 9.
TABLE 9 AUC comparison table of model fusion validation set
FIG. 12 compares, on the validation set, the AUC-ranking-based fusion method with the simple fusion method. As can be seen from the figure, the AUC value of the AUC-ranking-based fusion is higher than that of the ordinary average-weighted fusion, and the expressive capability of the model is stronger. After fusing model 20 and model 30 based on AUC ranking, the AUC value shows the largest improvement over ordinary weighted fusion, reaching 0.7611, an improvement of 0.0025. The AUC after fusing model 30 and model 29 is the highest of the three pairs, reaching 0.7622, which is 0.001 higher than that of the ordinary fusion method. The AUC after fusing model 47 and model 20 is improved by 0.002, although its expressive capability is weaker than that of the first two fused pairs. The experiments show that, on the validation set, the AUC-ranking-based fusion method has stronger learning ability than the ordinary weighted fusion method. The specific experimental results are shown in table 10.
TABLE 10 AUC sorting fusion validation set comparison table
Fig. 13 compares, on the test set, i.e. the new data set, the AUC-ranking-based fusion method with the simple fusion method. As can be seen from the figure, the AUC-ranking-based fusion again yields higher AUC values than the ordinary average-weighted fusion, and the expressive capability of the model is stronger. After AUC ranking, the AUC value obtained by fusing model 20 and model 30 shows the largest improvement over the ordinary weighted fusion methods, reaching 0.7662, an improvement of 0.0076; the AUC after fusing model 30 and model 29 is improved by 0.0031 over the ordinary fusion method, and the AUC after fusing model 47 and model 20 is improved by 0.006, although its expressive capability is weaker than that of the first two fused pairs. The experiments show that the AUC-ranking-based fusion method has stronger expressive capability than the ordinary weighted fusion method on both the validation set and the test set. The specific experimental results are shown in table 11.
TABLE 11 AUC ranking fusion test set comparison Table
The above test results and analysis show that fusing single learners with smaller similarity performs better than any single learner alone, and that ensemble learning using the two single learners with the largest difference performs best. Moreover, the user behavior prediction model obtained by the AUC-ranking-based fusion method performs better than the one obtained by simple weighted fusion.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (10)
1. The user behavior prediction method based on deep walking and ensemble learning is characterized by comprising the following steps of:
s1, acquiring an original data set and preprocessing the original data set;
s2, constructing a user portrait based on the preprocessed data set to form a commodity social network graph structure;
s3, randomly walking the commodity social network diagram structure to obtain new behavior sequence data, and then training the new behavior sequence data by using a Word2vec model to generate an embedding vector;
and S4, inputting the embedding vector into a machine learning model for training to obtain a single user behavior prediction model.
2. The method for predicting user behavior based on deep walking and ensemble learning of claim 1, further comprising:
and S5, fusing two models with the maximum difference in the plurality of single user behavior prediction models obtained through construction to obtain a user behavior prediction model.
3. The method for predicting user behavior based on deep walking and ensemble learning according to claim 2, wherein the step S5 specifically includes:
Step S51, repeatedly executing step S3 and step S4 while adjusting the step length of the random walk and the dimensionality of the embedding vector, so as to construct a plurality of single user behavior prediction models;
Step S52, selecting n models from the plurality of single user behavior prediction models according to generalization capability, wherein n is a positive integer greater than or equal to 3;
Step S53, calculating the maximum information coefficient (MIC) between each pair of the n models, constructing a confusion matrix from the MIC values, and visualizing the confusion matrix;
Step S54, finding the two single models with the minimum similarity in the obtained confusion matrix and fusing them to obtain a user behavior prediction model.
4. The method for predicting user behavior based on deep walking and ensemble learning according to claim 1, 2 or 3, wherein step S2 constructs the user portrait from three aspects, namely the basic information of the user, the activity information of the user, and the statistical information of the user's operation behavior.
5. The method for predicting user behavior based on deep walking and ensemble learning according to claim 1, 2 or 3, wherein the random walk process in step S3 is specifically: starting from any node of the network graph structure, at each step of the walk one node is randomly selected from the nodes connected to the current node; this process is repeated until the set walk length is reached, at which point the walk stops, thereby obtaining a new piece of user behavior sequence data.
6. A user behavior prediction system based on deep walking and ensemble learning, characterized by comprising a data acquisition module, a preprocessing module, a user portrait module, a random walking module and a training module;
the data acquisition module is used for acquiring original behavior data of a user, constructing an original data set and sending the original data set to the preprocessing module;
the preprocessing module is used for preprocessing the original data set and sending the preprocessed data to the user portrait module;
the user portrait module is used for constructing a user portrait based on the preprocessed data set, forming a commodity social network graph structure and sending the commodity social network graph structure to the random walking module;
the random walking module is used for performing random walks on the commodity social network graph structure to obtain new behavior sequence data, then training the new behavior sequence data with a Word2vec model to generate an embedding vector, and sending the embedding vector to the training module;
the training module is used for inputting the embedding vector into a machine learning model for training to obtain a single user behavior prediction model.
7. The user behavior prediction system based on deep walking and ensemble learning according to claim 6, further comprising a fusion module;
the fusion module is used for receiving the single user behavior prediction models output by the training module and fusing the two models with the maximum difference to obtain the user behavior prediction model.
8. The user behavior prediction system based on deep walking and ensemble learning according to claim 7, wherein the fusion module comprises a selection unit, a calculation unit and a fusion unit;
the selection unit selects n models from a plurality of single user behavior prediction models according to generalization capability; wherein n is a positive integer greater than or equal to 3;
the calculation unit calculates the maximum information coefficient (MIC) between each pair of the n models, constructs a confusion matrix from the MIC values, and visualizes the confusion matrix;
the fusion unit finds the two single models with the minimum similarity in the obtained confusion matrix and fuses them to obtain a user behavior prediction model.
9. The user behavior prediction system based on deep walking and ensemble learning according to claim 6, 7 or 8, wherein the user portrait module constructs the user portrait from three aspects, namely the basic information of the user, the activity information of the user, and the statistical information of the user's operation behavior.
10. The user behavior prediction system based on deep walking and ensemble learning according to claim 6, 7 or 8, wherein the random walking module is configured to perform the following process: starting from any node of the network graph structure, at each step of the walk one node is randomly selected from the nodes connected to the current node; this process is repeated until the set walk length is reached, at which point the walk stops, thereby obtaining a new piece of user behavior sequence data.
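The three sketches below are editorial illustrations of the main computational steps recited in the claims above; they are not part of the claimed method or system, and all library choices, record formats, and parameter values are assumptions made for illustration.

First, a minimal sketch of the graph construction in claim 1, step S2, assuming the preprocessed data can be reduced to a mapping from each user to the list of commodity items that user interacted with, and assuming two items are linked whenever the same user interacted with both (the claims do not fix a concrete edge rule):

```python
# Illustrative only: build a commodity social network graph from
# preprocessed (user -> items) behavior records. Edge rule is an assumption:
# two items are connected when at least one user interacted with both.
from itertools import combinations
import networkx as nx

def build_item_graph(user_item_records):
    """user_item_records: dict mapping a user id to the list of item ids
    the user interacted with (clicks, purchases, favourites, ...)."""
    graph = nx.Graph()
    for items in user_item_records.values():
        unique_items = set(items)
        graph.add_nodes_from(unique_items)
        for a, b in combinations(unique_items, 2):
            # accumulate a co-occurrence weight on each item-item edge
            if graph.has_edge(a, b):
                graph[a][b]["weight"] += 1
            else:
                graph.add_edge(a, b, weight=1)
    return graph
```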
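Next, a sketch of the random-walk and embedding step of claims 1, 5 and 10 (step S3), assuming a NetworkX graph and the gensim Word2Vec implementation; walk length, walks per node and vector dimensionality are placeholder values, and varying them corresponds to step S51 of claim 3:

```python
# Illustrative only: random walks over the commodity graph followed by
# Word2vec training to obtain one embedding vector per node.
import random
from gensim.models import Word2Vec

def random_walk(graph, start_node, walk_length):
    """From start_node, randomly pick a neighbour at each step until the
    set walk length is reached (or a dead end is hit), as in claims 5/10."""
    walk = [start_node]
    current = start_node
    for _ in range(walk_length - 1):
        neighbors = list(graph.neighbors(current))
        if not neighbors:
            break
        current = random.choice(neighbors)
        walk.append(current)
    return walk

def build_embeddings(graph, walk_length=10, walks_per_node=5, dim=64):
    """Generate behavior sequences by random walks and train Word2vec on
    them; the resulting vectors are the embeddings fed to step S4."""
    sequences = []
    for node in graph.nodes():
        for _ in range(walks_per_node):
            sequences.append([str(n) for n in random_walk(graph, node, walk_length)])
    model = Word2Vec(sequences, vector_size=dim, window=5, min_count=1, sg=1)
    return {node: model.wv[str(node)] for node in graph.nodes()}
```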
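Finally, a sketch of the MIC-based model-difference measurement and two-model fusion of claims 3 and 8 (steps S53-S54), assuming the minepy implementation of the maximal information coefficient, probability-like prediction vectors from the n single models, and simple averaging as an example fusion rule (the claims do not prescribe a particular fusion rule):

```python
# Illustrative only: pairwise MIC between prediction vectors of n single
# models, then fuse the two least similar (most different) models.
import numpy as np
from minepy import MINE  # assumed MIC implementation

def pairwise_mic(predictions):
    """predictions: list of n 1-D arrays, one per single model (step S53)."""
    n = len(predictions)
    mic_matrix = np.ones((n, n))
    mine = MINE(alpha=0.6, c=15)
    for i in range(n):
        for j in range(i + 1, n):
            mine.compute_score(predictions[i], predictions[j])
            mic_matrix[i, j] = mic_matrix[j, i] = mine.mic()
    return mic_matrix  # the matrix that step S53 visualizes

def fuse_least_similar(predictions):
    """Find the pair with the smallest MIC (largest difference) and average
    their outputs (step S54); averaging is only one possible fusion rule."""
    mic_matrix = pairwise_mic(predictions)
    np.fill_diagonal(mic_matrix, np.inf)  # ignore self-similarity
    i, j = np.unravel_index(np.argmin(mic_matrix), mic_matrix.shape)
    return (np.asarray(predictions[i], dtype=float)
            + np.asarray(predictions[j], dtype=float)) / 2.0
```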
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010524285.4A CN111695042B (en) | 2020-06-10 | 2020-06-10 | User behavior prediction method and system based on deep walking and ensemble learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111695042A CN111695042A (en) | 2020-09-22 |
CN111695042B true CN111695042B (en) | 2023-04-18 |
Family
ID=72480175
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010524285.4A Active CN111695042B (en) | 2020-06-10 | 2020-06-10 | User behavior prediction method and system based on deep walking and ensemble learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111695042B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112270545A (en) * | 2020-10-27 | 2021-01-26 | 上海淇馥信息技术有限公司 | Financial risk prediction method and device based on migration sample screening and electronic equipment |
CN112330016A (en) * | 2020-11-04 | 2021-02-05 | 广东工业大学 | Social network user behavior prediction method based on ensemble learning |
CN113244627B (en) * | 2021-06-24 | 2021-11-19 | 腾讯科技(深圳)有限公司 | Method and device for identifying plug-in, electronic equipment and storage medium |
CN113627653B (en) * | 2021-07-14 | 2023-10-20 | 深圳索信达数据技术有限公司 | Method and device for determining activity prediction strategy of mobile banking user |
CN113781128A (en) * | 2021-10-15 | 2021-12-10 | 北京明略软件系统有限公司 | High-potential consumer identification method, system, electronic device, and medium |
CN114154078A (en) * | 2021-11-01 | 2022-03-08 | 大箴(杭州)科技有限公司 | Information recommendation method and device, electronic equipment and storage medium |
CN114090663B (en) * | 2021-12-08 | 2022-06-21 | 青山信息技术开发(深圳)有限公司 | User demand prediction method applying artificial intelligence and big data optimization system |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107341571A (en) * | 2017-06-27 | 2017-11-10 | 华中科技大学 | A kind of social network user behavior prediction method based on quantization social effectiveness |
CN108920641A (en) * | 2018-07-02 | 2018-11-30 | 北京理工大学 | A kind of information fusion personalized recommendation method |
CN109034960A (en) * | 2018-07-12 | 2018-12-18 | 电子科技大学 | A method of more inferred from attributes based on user node insertion |
CN109191240A (en) * | 2018-08-14 | 2019-01-11 | 北京九狐时代智能科技有限公司 | A kind of method and apparatus carrying out commercial product recommending |
CN109190030A (en) * | 2018-08-22 | 2019-01-11 | 南京工业大学 | Implicit feedback recommendation method fusing node2vec and deep neural network |
CN110162690A (en) * | 2018-10-23 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Determine user to the method and apparatus of the interest-degree of article, equipment and storage medium |
WO2020083020A1 (en) * | 2018-10-23 | 2020-04-30 | 腾讯科技(深圳)有限公司 | Method and apparatus, device, and storage medium for determining degree of interest of user in item |
CN109741112A (en) * | 2019-01-10 | 2019-05-10 | 博拉网络股份有限公司 | A kind of user's purchase intention prediction technique based on mobile big data |
CN110321494A (en) * | 2019-06-26 | 2019-10-11 | 北京交通大学 | Socialization recommended method based on matrix decomposition Yu internet startup disk conjunctive model |
CN111160483A (en) * | 2019-12-31 | 2020-05-15 | 杭州师范大学 | Network relation type prediction method based on multi-classifier fusion model |
Non-Patent Citations (1)
Title |
---|
刘杨涛 (Liu Yangtao). User behavior prediction method based on embedded vectors and recurrent neural networks. 现代电子技术 (Modern Electronics Technique), 2016, Vol. 39, No. 23, pp. 1-5. *
Also Published As
Publication number | Publication date |
---|---|
CN111695042A (en) | 2020-09-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111695042B (en) | User behavior prediction method and system based on deep walking and ensemble learning | |
US8156056B2 (en) | Method and system of classifying, ranking and relating information based on weights of network links | |
CN109241424B (en) | A kind of recommended method | |
Gibert et al. | Graph embedding in vector spaces by node attribute statistics | |
Al Hassanieh et al. | Similarity measures for collaborative filtering recommender systems | |
US8615478B2 (en) | Using affinity measures with supervised classifiers | |
Leung et al. | Generating compact classifier systems using a simple artificial immune system | |
Bueff et al. | Machine learning interpretability for a stress scenario generation in credit scoring based on counterfactuals | |
CN112559900B (en) | Product recommendation method and device, computer equipment and storage medium | |
Twala et al. | Ensemble missing data techniques for software effort prediction | |
US20210019635A1 (en) | Group specific decision tree | |
CN114519508A (en) | Credit risk assessment method based on time sequence deep learning and legal document information | |
CN114819777A (en) | Enterprise sales business analysis and management system based on digital twin technology | |
CN107016416A (en) | The data classification Forecasting Methodology merged based on neighborhood rough set and PCA | |
CN111178986A (en) | User-commodity preference prediction method and system | |
CN111708865A (en) | Technology forecasting and patent early warning analysis method based on improved XGboost algorithm | |
Hartung et al. | Support for the use of hierarchical temporal memory systems in automated design evaluation: A first experiment | |
CN110580261B (en) | Deep technology tracking method for high-tech company | |
CN111325246A (en) | Region selection method and device, computer equipment and storage medium | |
Raffo et al. | CurveML: a benchmark for evaluating and training learning-based methods of classification, recognition, and fitting of plane curves | |
CN116523546B (en) | Method and device for intelligent reader behavior analysis and prediction system data acquisition and analysis | |
Bhardwaj et al. | Predicting User Ratings of Competitive Programming Contests using Decision Tree ML Model | |
Ramanuj et al. | Application of Soft Computing Techniques in Mode Choice Analysis | |
van Velzen | An analysis of imputation techniques under four missingness mechanisms using direct and indirect evaluation combined with computation time | |
Rashnu et al. | Model Evaluation and Anomaly Detection in Temporal Complex Networks using Deep Learning Methods |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||