CN110910207A

CN110910207A - Method and system for improving commodity recommendation diversity

Info

Publication number: CN110910207A
Application number: CN201911042387.6A
Authority: CN
Inventors: 马荣叶
Original assignee: Suning Cloud Computing Co Ltd
Current assignee: Suning Cloud Computing Co Ltd
Priority date: 2019-10-30
Filing date: 2019-10-30
Publication date: 2020-03-24

Abstract

An embodiment of the invention discloses a method for improving commodity recommendation diversity, which addresses the lack of diversity in recommended commodities. The method comprises: obtaining similar users from user data by using a Canopy algorithm and a K-means algorithm; constructing a knowledge graph from the full order data and acquiring the SPUs (Standard Product Units) associated with the user's real-time interest; and obtaining a commodity recommendation list according to the similar users and the SPUs associated with the user's real-time interest.

Description

Method and system for improving commodity recommendation diversity

Technical Field

The invention belongs to the field of commodity recommendation, and particularly relates to a method and a system for improving commodity recommendation diversity.

Background

Commodity recommendation algorithms in this field mainly include: content-based recommendation, data-mining-based recommendation, combined recommendation, and memory-based collaborative filtering.

Content-based recommendation generates a user feature model and then recommends according to that model. Its results are intuitive and easy to understand, it can serve users with special interests, and it requires no domain knowledge, but its range of application is narrow.

Data-mining-based recommendation mainly recommends by generating association rules. It is not limited by the content being recommended and requires no domain knowledge, but the association rules are difficult and time-consuming to generate, and recommendations derived from the rules have a low degree of personalization and insufficient commodity diversity.

Combined recommendation combines two or more recommendation methods according to a specific rule in order to avoid or compensate for the weaknesses of each individual technique. In theory many combinations exist, but not all of them are effective for a specific problem, and the parameters of the combination strategy are difficult to tune.

Memory-based collaborative filtering recommends by computing similar users or similar commodities. Its model update period is short, so changes in user interest are reflected promptly, but it suffers from data sparsity.

In user-based collaborative recall, user similarity is mostly computed with the K-means algorithm, but the cluster number K and the initial centers are difficult to determine, and computing similarity over the full commodity set involves too much computation and too much data.

Because the existing recommendation algorithms apply to a narrow range of content, the cluster number K and the initial centers of K-means are difficult to determine, the parameters of combined algorithms are difficult to tune, processing large amounts of data takes too long, and data sparsity breaks the relationships between commodities, the recommended commodities are monotonous, insufficiently related, and lacking in diversity.

Disclosure of Invention

The invention provides a method and a system for improving commodity recommendation diversity, and solves the problem that recommended commodities lack diversity.

In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:

In a first aspect, an embodiment of the present invention provides a method for improving commodity recommendation diversity, comprising: obtaining similar users from user data by using a Canopy algorithm and a K-means algorithm; constructing a knowledge graph from the full order data and acquiring the SPUs (Standard Product Units) associated with the user's real-time interest; and obtaining a commodity recommendation list according to the similar users and the SPUs associated with the user's real-time interest.

With reference to the first aspect, as a first implementable manner, the method comprises: collecting user demographic attribute data and the user's behavior data on multiple screens, performing data cleaning on both, and calculating the user's long-term and short-term interest preferences; performing coarse clustering on the cleaned user demographic attribute data, the long-term interest preference and the short-term interest preference by using the Canopy algorithm, thereby determining the cluster number K and the initial centers of the K-means algorithm; and obtaining similar users by using the K-means algorithm.

With reference to the first implementable manner of the first aspect, as a second implementable manner, a first distance parameter T1 and a second distance parameter T2 are set, where T1 > T2, and the user set is denoted D. A user is randomly selected from D, and the distance d between that user and each other user in D is computed; users with d less than T1 are placed in the same Canopy as the selected user, and users with d less than T2 are deleted from D. This is repeated until D is empty, at which point the users have been divided into a plurality of Canopies. The number of Canopies is used as the cluster number K of K-means, and the Canopy centers are used as the initial centers of K-means.

With reference to the first aspect, as a third implementable manner, the commodities under the similar users are compared with the SPUs associated with the user's real-time interest to obtain a first recalled SPU; a real-time interest score of the user is calculated from the user's online behavior data; CTR/CVR score prediction is performed on the first recalled SPU according to the user's real-time interest score and the similar users' scores for commodities under the first recalled SPU, after which the recalled commodities are deduplicated and filtered; and the deduplicated and filtered recalled commodities are sorted to obtain the commodity recommendation list.

With reference to the first aspect, as a fourth implementable manner, if the SPU associated with the user's real-time interest cannot be acquired by using the knowledge graph, a recalled commodity set is obtained by using a collaborative filtering combination algorithm; the commodities under the similar users are compared with the recalled commodity set to obtain a second recalled SPU; a real-time interest score of the user is calculated from the user's online behavior data; CTR/CVR score prediction is performed on the second recalled SPU according to the user's real-time interest score and the similar users' scores for commodities under the second recalled SPU, after which the recalled commodities are deduplicated and filtered; and the deduplicated and filtered recalled commodities are sorted to obtain the commodity recommendation list.

In a second aspect, an embodiment of the present invention provides a system for improving recommendation diversity of goods, including:

the user acquisition module is used for acquiring similar users by using a Canopy algorithm and a K-means algorithm according to user data;

the SPU acquisition module is used for constructing a knowledge graph according to the full order data and acquiring an SPU related to the real-time interest of the user;

and the list acquisition module is used for acquiring a commodity recommendation list according to the similar users and the SPU associated with the real-time interest of the users.

With reference to the second aspect, as a first possible implementation manner, the user obtaining module includes:

the data acquisition submodule is used for acquiring user demographic attribute data and behavior data of the user on multiple screens;

the data cleaning submodule is used for cleaning the data of the data acquisition submodule;

the preference calculation submodule is used for calculating the long-term interest preference and the short-term interest preference of the user;

the rough clustering submodule is used for carrying out rough clustering on the cleaned user demographic attribute data, the user long-term interest preference and the user short-term interest preference by utilizing a Canopy algorithm and determining the cluster number K and the initial center of the K-means algorithm;

and the similar user obtaining submodule is used for obtaining similar users by utilizing the K-means algorithm.

With reference to the second aspect, as a second possible implementation manner, the coarse clustering sub-module includes:

the parameter calculation submodule is used for determining the cluster number K and the initial center of the K-means algorithm;

a first distance parameter T1 and a second distance parameter T2 are set, wherein T1 is greater than T2; the user set is denoted D; a user is randomly selected from D, and the distance d between that user and each other user in D is computed;

users with d less than T1 are placed in the same Canopy as the selected user, and users with d less than T2 are deleted from D;

this is repeated until D is empty, at which point the users have been divided into a plurality of Canopies;

the number of Canopies is used as the cluster number K of K-means, and the Canopy centers are used as the initial centers of K-means.

With reference to the second aspect, as a third possible implementation manner, the list obtaining module includes:

the first score calculating submodule is used for calculating the real-time interest score of the user according to the online behavior data of the user;

the first recall submodule is used for comparing the commodities under the similar users with the SPU associated with the real-time interest of the user, as provided by the SPU acquisition module, to obtain a first recalled SPU;

the first score prediction submodule is used for performing CTR/CVR score prediction on the first recalled SPU according to the real-time interest score of the user and the similar users' scores for commodities under the first recalled SPU;

and the first processing submodule is used for deduplicating and filtering the recalled commodities and sorting them to obtain a commodity recommendation list.

With reference to the second aspect, as a fourth possible implementation manner, the list obtaining module includes:

the second score calculating submodule is used for calculating the real-time interest score of the user according to the online behavior data of the user;

the commodity recall module is used for obtaining a recalled commodity set by utilizing a collaborative filtering combination algorithm if the SPU associated with the real-time interest of the user cannot be obtained by utilizing the knowledge graph;

the second recall submodule is used for comparing the commodities under the similar users with the recalled commodity set to obtain a second recalled SPU;

the second score prediction submodule is used for performing CTR/CVR score prediction on the second recalled SPU according to the real-time interest score of the user and the similar users' scores for commodities under the second recalled SPU;

and the second processing submodule is used for deduplicating and filtering the recalled commodities and sorting them to obtain a commodity recommendation list.

The method and system for improving commodity recommendation diversity provided by the embodiments of the invention address the problems that recommended commodities are monotonous, insufficiently related, and lacking in diversity. Compared with the prior art, in the embodiments of the invention, similar users are obtained from user data by using the Canopy algorithm and the K-means algorithm, and the number of Canopies and the centers determined by the Canopy algorithm are used as the cluster number K and the initial centers of the K-means algorithm; this removes the difficulty of parameter tuning, speeds up convergence, and allows large amounts of data to be processed. A knowledge graph is constructed from the full order data and the SPUs associated with the user's real-time interest are acquired; after the association calculation the data volume is reduced, relationships between commodities propagate further, and more categories of commodities can be recalled. The commodity recommendation list is obtained according to the similar users and the SPUs associated with the user's real-time interest, which increases the divergence of recall and the freshness of the recalled commodities, mitigates the breakage of commodity relationships caused by sparse data, and enriches the categories recommended. The method has low computational complexity and performs well on sparse data: it recommends more commodities and more cross-category commodities, and more of those recommendations, including the cross-category ones, match actual outcomes.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flow chart of an embodiment of the present invention.

FIG. 2 is a flow chart of the Canopy algorithm in the embodiment of the present invention.

FIG. 3 is a system block diagram of an embodiment of the invention.

FIG. 4 is a block diagram of a system configuration according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.

As shown in fig. 1, an embodiment of the present invention provides a method for improving recommendation diversity of a product, including:

S110, according to the user data, similar users are obtained by using a Canopy algorithm and a K-means algorithm.

The user data includes: user demographic attribute data and user behavior data across multiple screens. The user demographic attribute data includes, but is not limited to: gender, age, resident address, income, and education level. The behavior data of the user on multiple screens includes but is not limited to: browse, search, buy, comment, and share.

After the user data is obtained, data cleaning is performed. The data cleaning includes smoothing missing values, removing outliers, removing duplicate data, and normalizing the data. Data cleaning aims to improve data quality and reduce the influence of dirty data on the accuracy of the model.
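The cleaning steps listed above can be sketched roughly as follows in Python. The embodiment does not specify how each step is carried out, so the helper names, the use of mean-filling as the smoothing of missing values, the z-score threshold for outliers, and min-max normalization are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def deduplicate(records):
    """Drop exact duplicate records (dicts with hashable values), keeping first occurrences."""
    seen, unique = set(), []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

def clean_column(values, z_max=3.0):
    """Clean one numeric feature column; None marks a missing value.

    Assumed steps: fill missing values with the column mean, drop values
    whose z-score exceeds z_max, then min-max normalize to [0, 1].
    """
    present = [v for v in values if v is not None]
    if not present:
        return []
    fill = mean(present)
    filled = [fill if v is None else v for v in values]
    mu, sigma = mean(filled), pstdev(filled)
    kept = [v for v in filled if sigma == 0 or abs(v - mu) / sigma <= z_max]
    lo, hi = min(kept), max(kept)
    if hi == lo:
        return [0.0 for _ in kept]
    return [(v - lo) / (hi - lo) for v in kept]
```

In practice each feature (age, income, behavior counts and so on) would likely need its own treatment; a single generic column cleaner is used here only to keep the sketch short.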

Real-time interest identification is performed on the user by using the cleaned multi-screen behavior data, a user preference model is established, and the user's long-term and short-term interest preferences are calculated.

The long-term interest preference is the interest space or interest topic reflected by the user's behavior over a longer period (e.g., 1 month, 3 months, 6 months, or even longer). The short-term interest preference is the interest space or interest topic reflected by the user's behavior over a relatively short period (e.g., 7 days, 3 days, or the current moment).
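One simple way to picture the two preferences is as category distributions computed over time windows of different lengths. The sketch below assumes that behavior events are available as (timestamp, category) pairs and that a preference is the normalized category frequency within the window; the embodiment does not define the preference model, so both the data shape and the function name are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

def interest_preference(events, now, window_days):
    """Return the share of each category among events in the last window_days.

    events: iterable of (timestamp, category) pairs (assumed format).
    """
    cutoff = now - timedelta(days=window_days)
    counts = Counter(cat for ts, cat in events if cutoff <= ts <= now)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}

# Usage: the same events yield different short-term and long-term views.
now = datetime(2019, 10, 30)
events = [
    (datetime(2019, 10, 29), "phones"),
    (datetime(2019, 10, 28), "phones"),
    (datetime(2019, 10, 1), "books"),
    (datetime(2019, 8, 1), "books"),
]
short_term = interest_preference(events, now, window_days=7)
long_term = interest_preference(events, now, window_days=30)
```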

The user demographic attribute data, the long-term interest preference and the short-term interest preference are coarsely clustered by using the Canopy clustering algorithm. The coarse clustering comprises: setting a first distance parameter T1 and a second distance parameter T2, where T1 is greater than T2; denoting the user set as D; randomly selecting a user from D and computing the distance d between that user and each other user in D; placing users with d less than T1 in the same Canopy as the selected user, and deleting users with d less than T2 from D; and repeating until D is empty, at which point the users have been divided into Canopies. The number of Canopies is taken as the cluster number K of K-means, and the Canopy centers are taken as the initial centers of K-means.
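The coarse clustering procedure can be sketched as follows. This is a minimal illustration that assumes users are already encoded as numeric feature tuples, that the distance d is Euclidean, and that the randomly selected user serves as the Canopy center; the embodiment fixes none of these choices.

```python
import math
import random

def euclidean(a, b):
    """Euclidean distance between two equal-length feature tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def canopy(points, t1, t2, rng=None):
    """Split points into canopies; returns a list of (center, members) pairs."""
    if t1 <= t2:
        raise ValueError("T1 must be greater than T2")
    rng = rng or random.Random(0)
    remaining = list(points)          # the user set D
    canopies = []
    while remaining:
        # Randomly select a user from D and remove it from D.
        center = remaining.pop(rng.randrange(len(remaining)))
        members = [center]
        still_remaining = []
        for p in remaining:
            d = euclidean(center, p)
            if d < t1:                # loosely close: joins this canopy
                members.append(p)
            if d >= t2:               # not tightly close: stays in D
                still_remaining.append(p)
        remaining = still_remaining
        canopies.append((center, members))
    return canopies

# Usage: K and the initial centers for K-means fall out of the result.
users = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
result = canopy(users, t1=1.0, t2=0.5)
k = len(result)
initial_centers = [center for center, _ in result]
```

A user whose distance lies between T2 and T1 joins the current Canopy but stays in D, so Canopies may overlap; that overlap is what lets the later K-means step refine the boundaries.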

Similar users are then obtained by using the K-means algorithm. K-means performs precise similarity judgment only among users within the same Canopy; users in different Canopies are not compared precisely. The Pearson correlation coefficient is used to measure the similarity between users.
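A minimal sketch of this step is given below. It assumes the initial centers are supplied from the coarse clustering, uses Euclidean distance for the K-means assignment, and then ranks users inside the same cluster by Pearson correlation. How the embodiment actually combines K-means with the Pearson coefficient is not stated, so this split is only one possible reading, and the function names are hypothetical.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0                    # undefined for constant vectors
    return cov / math.sqrt(va * vb)

def kmeans(points, centers, iters=20):
    """Plain K-means started from the given centers (K = len(centers))."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iters):
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        new_centers = []
        for k in range(len(centers)):
            cluster = [p for p, lab in zip(points, labels) if lab == k]
            if cluster:
                new_centers.append(tuple(sum(col) / len(cluster) for col in zip(*cluster)))
            else:
                new_centers.append(centers[k])   # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

def similar_users(i, points, labels):
    """Indices of users in the same cluster as user i, most correlated first."""
    peers = [j for j in range(len(points)) if j != i and labels[j] == labels[i]]
    return sorted(peers, key=lambda j: -pearson(points[i], points[j]))
```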

S120, according to the full order data, a knowledge graph is constructed, and an SPU related to the real-time interest of the user is obtained.

Full order data is collected. The full order data includes, but is not limited to, user ID and SKU (Stock Keeping Unit). Preferably, data cleaning is performed on the full order data. The data cleaning includes smoothing missing values, removing outliers, removing duplicate data, and normalizing the data.

Based on the cleaned full order data, rule-mining relation extraction is used to mine the sales relationships among SKUs. The SKUs are aggregated so that the associated-sales relationships are raised to the SPU (Standard Product Unit) level, which reduces the volume of commodity data. The associated-sales SPU pairs are stored in a database, and the SPU knowledge graph is completed by combining them with business knowledge.

Using the SPU-to-SPU transfer relationships established by the knowledge graph, the SPUs corresponding to the user's real-time interest are looked up, thereby obtaining the SPUs in the knowledge graph that are associated with the user's real-time interest.
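As a rough illustration of S120, the sketch below aggregates SKUs to SPUs, counts SPU pairs that are sold together in the same order, keeps pairs above a support threshold as graph edges, and then follows those edges for a limited number of hops from a seed SPU. The in-memory adjacency dictionary stands in for the database, co-occurrence counting stands in for the rule-mining step, and `min_support` and `max_hops` are hypothetical parameters; business knowledge is not modeled.

```python
from collections import defaultdict, deque
from itertools import combinations

def build_spu_graph(orders, sku_to_spu, min_support=2):
    """Build an undirected SPU graph from orders (each order is a list of SKUs)."""
    pair_counts = defaultdict(int)
    for skus in orders:
        # Aggregate SKUs to the SPU level before counting associations.
        spus = sorted({sku_to_spu[s] for s in skus if s in sku_to_spu})
        for a, b in combinations(spus, 2):
            pair_counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), n in pair_counts.items():
        if n >= min_support:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def associated_spus(graph, seed, max_hops=2):
    """SPUs reachable from seed within max_hops edges (breadth-first)."""
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    seen.discard(seed)
    return seen
```

Following more than one hop is what lets the graph reach SPUs that were never bought together with the seed directly, which is the transfer relationship the description relies on for diversity.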

S130, obtaining a commodity recommendation list according to the similar users and the SPU associated with the real-time interest of the users.

Real-time interest identification is performed on the user's online behavior data: different behaviors correspond to different weights, different behavior durations correspond to different weights, and a plurality of real-time interest scores are calculated by linear combination.
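The weighting scheme can be sketched as below. The behavior weights, the duration weighting function, and the event format of (SPU, behavior, seconds) are all hypothetical values chosen for illustration; the embodiment states only that behaviors and durations carry weights and that the scores are combined linearly.

```python
from collections import defaultdict

# Hypothetical behavior weights; the embodiment does not give values.
BEHAVIOR_WEIGHTS = {"browse": 1.0, "search": 2.0, "buy": 5.0}

def duration_weight(seconds):
    """Hypothetical duration weight that saturates at 1.0 after 60 seconds."""
    return min(seconds / 60.0, 1.0)

def realtime_interest(events):
    """Sum weighted events into one real-time interest score per SPU.

    events: iterable of (spu, behavior, seconds) tuples (assumed format).
    """
    scores = defaultdict(float)
    for spu, behavior, seconds in events:
        scores[spu] += BEHAVIOR_WEIGHTS.get(behavior, 0.0) * duration_weight(seconds)
    return dict(scores)
```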

The SPUs recalled by collaborative filtering over the similar users are compared with the SPUs recalled through the knowledge graph to obtain a first recalled SPU, where the first recalled SPU consists of commodities that exist only under the similar users and not under the SPUs associated with the user's real-time interest.
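Read literally, this comparison is a set difference: keep what appears under the similar users and drop what also appears among the knowledge-graph SPUs. A minimal sketch, with a hypothetical function name and plain string identifiers assumed for SPUs:

```python
def first_recalled_spus(similar_user_spus, kg_spus):
    """SPUs seen under similar users but not among the knowledge-graph SPUs.

    Order of first appearance is preserved and repeats are dropped.
    """
    kg = set(kg_spus)
    return [spu for spu in dict.fromkeys(similar_user_spus) if spu not in kg]
```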

The user's real-time interest score is combined with the similar users' scores for commodities under the first recalled SPU to perform CTR/CVR score prediction (CTR: Click-Through Rate; CVR: Conversion Rate) on the commodities under the first recalled SPU. The recalled commodities are then deduplicated and filtered, and the processed commodities are sorted by CTR/CVR score to obtain the commodity recommendation list.
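The scoring, deduplication, filtering and sorting pipeline might look like the following sketch. The embodiment does not describe the CTR/CVR prediction model, so a logistic function over a weighted sum of the two scores is used purely as a stand-in; `alpha`, `beta`, the `blocked` filter list and the candidate format of (item, SPU) pairs are hypothetical.

```python
import math

def rank_recalled(candidates, interest, peer_score, blocked=(), alpha=0.5, beta=0.5):
    """Score, deduplicate, filter and sort recalled commodities.

    candidates: list of (item_id, spu) pairs, possibly with repeats.
    interest:   real-time interest score per SPU.
    peer_score: similar users' score per item.
    """
    blocked = set(blocked)
    best = {}
    for item, spu in candidates:
        if item in blocked:                       # filtering
            continue
        z = alpha * interest.get(spu, 0.0) + beta * peer_score.get(item, 0.0)
        score = 1.0 / (1.0 + math.exp(-z))        # stand-in for a CTR/CVR model
        if item not in best or score > best[item]:
            best[item] = score                    # deduplication keeps the best score
    # Highest predicted score first; ties broken by item id for determinism.
    return sorted(best, key=lambda item: (-best[item], item))
```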

If the SPU associated with the user's real-time interest cannot be acquired by using the knowledge graph, a recalled commodity set is obtained by using a collaborative filtering combination algorithm, and the commodities under the similar users are compared with the recalled commodity set to obtain a second recalled SPU, where the second recalled SPU consists of commodities that exist only under the similar users and not in the recalled commodity set.
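The choice between the two recall branches can be sketched as a simple fallback. Here `cf_recall` is a hypothetical callable standing in for the collaborative filtering combination algorithm, which the embodiment does not detail; it is only invoked when the knowledge graph yields nothing.

```python
def recall_spus(similar_user_spus, kg_spus, cf_recall):
    """Return (recalled SPUs, branch name) using the knowledge graph or the fallback.

    cf_recall: zero-argument callable returning the fallback recalled set
               (hypothetical stand-in for the collaborative filtering combination).
    """
    if kg_spus:
        reference, branch = set(kg_spus), "first"
    else:
        reference, branch = set(cf_recall()), "second"
    # Keep what exists only under the similar users, in first-seen order.
    recalled = [s for s in dict.fromkeys(similar_user_spus) if s not in reference]
    return recalled, branch
```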

The user's real-time interest score is combined with the similar users' scores for commodities under the second recalled SPU to perform CTR/CVR score prediction on the commodities under the second recalled SPU. The recalled commodities are then deduplicated and filtered, and the processed commodities are sorted by CTR/CVR score to obtain the commodity recommendation list.

Compared with the prior art, in this embodiment similar users are obtained from user data by using the Canopy algorithm and the K-means algorithm, and the number of Canopies and the centers determined by the Canopy algorithm are used as the cluster number K and the initial centers of the K-means algorithm. Because the number of Canopies is the cluster number K, K does not need to be tuned empirically; and because pairwise distances are computed only among samples within a Canopy rather than over the full sample set, the amount of computation is smaller. This removes the difficulty of parameter tuning, speeds up convergence, and allows large amounts of data to be processed. A knowledge graph is constructed from the full order data and the SPUs associated with the user's real-time interest are acquired. The knowledge graph propagates relationships transitively, and since there are far more SKUs than SPUs, working at the SPU level makes the data denser; after the association calculation the data volume is reduced, relationships between commodities propagate further, and more categories of commodities can be recalled. The commodity recommendation list is obtained according to the similar users and the SPUs associated with the user's real-time interest. Recommendations based on collaborative filtering tend to be narrow, whereas the relationships in the knowledge graph are more divergent; combining them increases the divergence of recall and the freshness of the recalled commodities, mitigates the breakage of commodity relationships caused by sparse data, and enriches the categories recommended. The method has low computational complexity and performs well on sparse data: it recommends more commodities and more cross-category commodities, and more of those recommendations, including the cross-category ones, match actual outcomes.

As shown in FIG. 4, an embodiment of the present invention further provides a system for improving the diversity of recommended goods, including:

the user acquisition module is used for acquiring similar users by using a Canopy algorithm and a K-means algorithm according to user data; the SPU acquisition module is used for constructing a knowledge graph according to the full order data and acquiring an SPU related to the real-time interest of the user; and the list acquisition module is used for acquiring a commodity recommendation list according to the similar users and the SPU associated with the real-time interest of the users.

According to an embodiment of the present invention, the user acquisition module includes:

According to an embodiment of the present invention, the coarse clustering submodule includes:

According to an embodiment of the present invention, the list obtaining module includes:

the first recall submodule is used for comparing the commodities under the similar users with the SPU associated with the real-time interest of the user, as provided by the SPU acquisition module, to obtain a first recalled SPU;

The system for improving commodity recommendation diversity provided by the embodiment of the invention addresses the problems that recommended commodities are monotonous, insufficiently related, and lacking in diversity. Compared with the prior art, in this embodiment the user acquisition module obtains similar users from user data by using the Canopy algorithm and the K-means algorithm, and the parameter calculation submodule in the user acquisition module uses the number of Canopies and the centers determined by the Canopy algorithm as the cluster number K and the initial centers of the K-means algorithm. Because the number of Canopies is the cluster number K, K does not need to be tuned empirically; and because pairwise distances are computed only within a Canopy rather than over all samples, the amount of computation is smaller. This removes the difficulty of parameter tuning, speeds up convergence, and allows large amounts of data to be processed. The SPU acquisition module constructs a knowledge graph from the full order data and acquires the SPUs associated with the user's real-time interest. The knowledge graph propagates relationships transitively, and since there are far more SKUs than SPUs, working at the SPU level makes the data denser; after the association calculation the data volume is reduced, relationships between commodities propagate further, and more commodities can be recalled. The list acquisition module obtains the commodity recommendation list according to the similar users and the SPUs associated with the user's real-time interest. Recommendations based on collaborative filtering tend to be narrow, whereas the relationships in the knowledge graph are more divergent; combining them increases the divergence of recall and the freshness of the recalled commodities, mitigates the breakage of commodity relationships caused by sparse data, and enriches the categories recommended. The system has low computational complexity and performs well on sparse data: it recommends more commodities and more cross-category commodities, and more of those recommendations, including the cross-category ones, match actual outcomes.

The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus embodiment, since it is substantially similar to the method embodiment, it is relatively simple to describe, and reference may be made to some descriptions of the method embodiment for relevant points.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for improving commodity recommendation diversity, the method comprising:

acquiring similar users by using a Canopy algorithm and a K-means algorithm according to user data;

constructing a knowledge graph according to the full order data, and acquiring an SPU (Standard Product Unit) associated with the real-time interest of the user;

and acquiring a commodity recommendation list according to the similar users and the SPU associated with the real-time interest of the users.

2. The method for improving the diversity of commodity recommendations according to claim 1, wherein the obtaining similar users using a Canopy algorithm and a K-means algorithm based on user data comprises:

collecting user demographic attribute data and behavior data of a user on multiple screens, and performing data cleaning on the user demographic attribute data and the behavior data;

calculating long-term interest preference and short-term interest preference of the user;

carrying out rough clustering on the cleaned user demographic attribute data, the user long-term interest preference and the user short-term interest preference by using a Canopy algorithm, and determining the cluster number K and the initial center of the K-means algorithm;

and acquiring similar users by using the K-means algorithm.

3. The method for improving the diversity of commodity recommendations according to claim 2, wherein the determining the number K of clusters and the initial center of the K-means algorithm comprises:

setting a first distance parameter T1 and a second distance parameter T2, wherein T1 is greater than T2; denoting the user set as D; randomly selecting a user from D and computing the distance d between that user and each other user in D; placing users with d less than T1 in the same Canopy as the selected user, and deleting users with d less than T2 from D; repeating until D is empty, at which point the users have been divided into a plurality of Canopies; and using the number of Canopies as the cluster number K of K-means and the Canopy centers as the initial centers of K-means.

4. The method as claimed in claim 1, wherein said obtaining a recommendation list of goods according to said similar users and said SPU associated with real-time user interest comprises:

comparing the commodities under the similar users with the SPU associated with the real-time interest of the user to obtain a first recalled SPU;

calculating a real-time interest score of the user according to the online behavior data of the user;

performing CTR/CVR score prediction on the first recalled SPU according to the real-time interest score of the user and the similar users' scores for commodities under the first recalled SPU, and then deduplicating and filtering the recalled commodities;

and sorting the deduplicated and filtered recalled commodities to obtain a commodity recommendation list.

5. The method as claimed in claim 1, wherein said obtaining a recommendation list of goods according to said similar users and said SPU associated with real-time user interest comprises:

if the SPU associated with the real-time interest of the user cannot be acquired by using the knowledge graph, a recalled commodity set is obtained by using a collaborative filtering combination algorithm;

comparing the commodities under the similar users with the recalled commodity set to obtain a second recalled SPU;

performing CTR/CVR score prediction on the second recalled SPU according to the real-time interest score of the user and the similar users' scores for commodities under the second recalled SPU, and then deduplicating and filtering the recalled commodities;

6. A system for improving commodity recommendation diversity, comprising:

7. The system for improving the diversity of recommended goods according to claim 6, wherein the user acquisition module comprises:

8. The system for improving the diversity of recommended goods according to claim 7, wherein the coarse clustering submodule comprises:

9. The system for improving the diversity of the recommendation of the merchandise according to claim 6, wherein the list obtaining module comprises:

10. The system for improving the diversity of the recommendation of the merchandise according to claim 6, wherein the list obtaining module comprises: