CN110910207A - Method and system for improving commodity recommendation diversity - Google Patents


Publication number
CN110910207A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911042387.6A
Other languages
Chinese (zh)
Inventor
马荣叶
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suning Cloud Computing Co Ltd
Original Assignee
Suning Cloud Computing Co Ltd
Application filed by Suning Cloud Computing Co Ltd
Priority to CN201911042387.6A
Publication of CN110910207A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0633 Lists, e.g. purchase orders, compilation or processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiment of the invention discloses a method for improving commodity recommendation diversity, which addresses the lack of diversity in recommended commodities. The method comprises: obtaining similar users from user data by using a Canopy algorithm and a K-means algorithm; constructing a knowledge graph from the full order data and acquiring the SPU (Standard Product Unit) associated with the real-time interest of the user; and obtaining a commodity recommendation list from the similar users and the SPU associated with the real-time interest of the user.

Description

Method and system for improving commodity recommendation diversity
Technical Field
The invention belongs to the field of commodity recommendation, and particularly relates to a method and a system for improving commodity recommendation diversity.
Background
Commodity recommendation algorithms in this field mainly include content-based recommendation, data-mining-based recommendation, combined recommendation, and memory-based collaborative filtering.
Content-based recommendation builds a user feature model and recommends according to that model. The results are intuitive and easy to understand, recommendations can be made for users with unusual tastes, and no domain knowledge is needed, but the method has a narrow range of application.
Data-mining-based recommendation mainly works by generating association rules. It is not limited by the recommended content and needs no domain knowledge, but the association rules are difficult and time-consuming to generate, and the commodities recommended from the rules are weakly personalized and insufficiently diverse.
Combined recommendation combines two or more recommendation methods according to a specific rule, in order to avoid or compensate for the weaknesses of each single technique. In theory many combinations exist, but not every combination is effective for a specific problem, and the parameters of the combination strategy are difficult to tune.
Memory-based collaborative filtering recommends by computing similar users or similar commodities. The model update period is short, so changes in user interest are reflected promptly, but the method suffers from data sparsity.
In user-based collaborative-filtering recall, user similarity is mostly computed with the K-means algorithm, but the cluster number K and the initial centers are difficult to determine, and computing similarity over the full set of commodities involves too much data.
Because the applicable content of these algorithms is narrow, the cluster number K and the initial centers of K-means are difficult to determine, the parameters of combined algorithms are difficult to tune, processing large amounts of data takes too long, and data sparsity breaks the relations between commodities, the recommended commodities are monotonous, weakly related, and lacking in diversity.
Disclosure of Invention
The invention provides a method and a system for improving commodity recommendation diversity, and solves the problem that recommended commodities lack diversity.
In order to achieve the above purpose, the embodiment of the invention adopts the following technical scheme:
In a first aspect, an embodiment of the present invention provides a method for improving commodity recommendation diversity, comprising: obtaining similar users from user data by using a Canopy algorithm and a K-means algorithm; constructing a knowledge graph from the full order data and acquiring the SPU (Standard Product Unit) associated with the real-time interest of the user; and obtaining a commodity recommendation list from the similar users and the SPU associated with the real-time interest of the user.
With reference to the first aspect, as a first implementable manner, the method comprises: collecting user demographic attribute data and behavior data of the user on multiple screens, cleaning the user demographic attribute data and the behavior data, and calculating the long-term interest preference and the short-term interest preference of the user; coarsely clustering the cleaned user demographic attribute data, the long-term interest preference, and the short-term interest preference by using the Canopy algorithm, and determining the cluster number K and the initial centers of the K-means algorithm; and obtaining similar users by using the K-means algorithm.
With reference to the first implementable manner of the first aspect, as a second implementable manner, the method comprises: setting a first distance parameter T1 and a second distance parameter T2, where T1 > T2; letting the user set be D; randomly selecting a user from D and computing the distance d between that user and each other user in D; putting every user with d < T1 into the same Canopy and deleting every user with d < T2 from D; repeating until D is empty, so that the users are divided into a number of Canopies; and using the number of Canopies as the cluster number K of K-means and the centers of the Canopies as the initial centers of K-means.
With reference to the first aspect, as a third implementable manner, the method comprises: comparing the commodities of the similar users with the SPU associated with the real-time interest of the user to obtain a first recalled SPU; calculating a real-time interest score of the user from the online behavior data of the user; performing CTR/CVR score prediction on the first recalled SPU according to the real-time interest score of the user and the similar users' scores for the commodities under the first recalled SPU, and then deduplicating and filtering the recalled commodities; and sorting the deduplicated and filtered commodities to obtain a commodity recommendation list.
With reference to the first aspect, as a fourth implementable manner, if the SPU associated with the real-time interest of the user cannot be acquired from the knowledge graph, the method comprises: obtaining a recalled commodity set by using a collaborative filtering combination algorithm; comparing the commodities of the similar users with the recalled commodity set to obtain a second recalled SPU; calculating a real-time interest score of the user from the online behavior data of the user; performing CTR/CVR score prediction on the second recalled SPU according to the real-time interest score of the user and the similar users' scores for the commodities under the second recalled SPU, and then deduplicating and filtering the recalled commodities; and sorting the deduplicated and filtered commodities to obtain a commodity recommendation list.
In a second aspect, an embodiment of the present invention provides a system for improving recommendation diversity of goods, including:
the user acquisition module is used for acquiring similar users by using a Canopy algorithm and a K-means algorithm according to user data;
the SPU acquisition module is used for constructing a knowledge graph according to the full order data and acquiring an SPU associated with the real-time interest of the user;
and the list acquisition module is used for acquiring a commodity recommendation list according to the similar users and the SPU associated with the real-time interest of the users.
With reference to the second aspect, as a first possible implementation manner, the user obtaining module includes:
the data acquisition submodule is used for acquiring user demographic attribute data and behavior data of the user on multiple screens;
the data cleaning submodule is used for cleaning the data of the data acquisition submodule;
the preference calculation submodule is used for calculating the long-term interest preference and the short-term interest preference of the user;
the rough clustering submodule is used for carrying out rough clustering on the cleaned user demographic attribute data, the user long-term interest preference and the user short-term interest preference by utilizing a Canopy algorithm and determining the cluster number K and the initial center of the K-means algorithm;
and the similar user obtaining submodule is used for obtaining similar users by utilizing the K-means algorithm.
With reference to the second aspect, as a second possible implementation manner, the coarse clustering sub-module includes:
the parameter calculation submodule is used for determining the cluster number K and the initial center of the K-means algorithm;
setting a first distance parameter T1 and a second distance parameter T2, where T1 > T2, letting the user set be D, randomly selecting a user from D, and computing the distance d between that user and each other user in D;
putting every user with d < T1 into the same Canopy, and deleting every user with d < T2 from D;
until the user set D is empty, the users are divided into a plurality of Canopy;
the number of Canopy is used as the cluster number K of K-means, and the center of Canopy is used as the initial center of K-means.
With reference to the second aspect, as a third possible implementation manner, the list obtaining module includes:
the first score calculating submodule is used for calculating the real-time interest score of the user according to the online behavior data of the user;
the first recall submodule is used for comparing the commodities under the similar users with the SPU associated with the real-time interest of the user, as provided by the SPU acquisition module, to obtain a first recalled SPU;
the first score prediction submodule is used for performing CTR/CVR score prediction on the first recalled SPU according to the real-time interest score of the user and the similar users' scores for the commodities under the first recalled SPU;
and the first processing submodule is used for deduplicating and filtering the recalled commodities and sorting them to obtain a commodity recommendation list.
With reference to the second aspect, as a fourth possible implementation manner, the list obtaining module includes:
the second score calculating submodule is used for calculating the real-time interest score of the user according to the online behavior data of the user;
the commodity recall module is used for obtaining a recalled commodity set by using a collaborative filtering combination algorithm if the SPU associated with the real-time interest of the user cannot be obtained from the knowledge graph;
the second recall submodule is used for comparing the commodities under the similar users with the recalled commodity set to obtain a second recalled SPU;
the second score prediction submodule is used for performing CTR/CVR score prediction on the second recalled SPU according to the real-time interest score of the user and the similar users' scores for the commodities under the second recalled SPU;
and the second processing submodule is used for deduplicating and filtering the recalled commodities and sorting them to obtain a commodity recommendation list.
The method and the system for improving commodity recommendation diversity provided by the embodiment of the invention address the problems that recommended commodities are monotonous, weakly related, and lacking in diversity. Compared with the prior art, in the embodiment of the invention similar users are obtained from user data by using the Canopy algorithm and the K-means algorithm. The number of Canopies and the centers determined by the Canopy algorithm are used as the cluster number K and the initial centers of the K-means algorithm, which solves the parameter-tuning problem, speeds up convergence, and allows large amounts of data to be processed. A knowledge graph is constructed from the full order data and the SPU associated with the real-time interest of the user is obtained; after the association calculation the data volume is reduced, commodity relations are transferred more deeply, and more categories of commodities can be recalled. The commodity recommendation list is obtained from the similar users and the SPU associated with the real-time interest of the user, which improves the divergence of the recall and the freshness of the recalled commodities, resolves the broken commodity relations caused by sparse data, and enriches the recommended categories. The method has low computational complexity and performs well when data is sparse: it produces more recommendations and more cross-category recommendations, and more of the recommendations, including cross-category ones, match actual outcomes.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of an embodiment of the present invention.
Fig. 2 is a flow chart of the Canopy algorithm in the embodiment of the present invention.
FIG. 3 is a system block diagram of an embodiment of the invention.
Fig. 4 is a block diagram of a system configuration according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments. The embodiments described are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
As shown in FIG. 1, an embodiment of the present invention provides a method for improving commodity recommendation diversity, comprising:
S110: obtaining similar users from user data by using a Canopy algorithm and a K-means algorithm.
The user data includes: user demographic attribute data and user behavior data across multiple screens. The user demographic attribute data includes, but is not limited to: gender, age, resident address, income, and education level. The behavior data of the user on multiple screens includes but is not limited to: browse, search, buy, comment, and share.
After the user data is obtained, data cleaning is performed. Data cleaning includes smoothing missing values, removing outliers, removing duplicate records, and normalizing the data. Its purpose is to improve data quality and reduce the influence of dirty data on model accuracy.
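The four cleaning steps can be sketched as follows. This is only an illustration under stated assumptions: the patent does not specify how each step is performed, so the record layout, the helper name `clean_records`, mean imputation, the 3-sigma outlier rule, and min-max normalization are all hypothetical choices.

```python
from statistics import mean, pstdev

def clean_records(records, field, z_max=3.0):
    """Clean one numeric field of a list of dict records (hypothetical helper)."""
    # Remove exact duplicate records while preserving order.
    seen, unique = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(r))
    # Smooth missing values: fill None with the mean of observed values (assumed strategy).
    fill = mean(r[field] for r in unique if r[field] is not None)
    for r in unique:
        if r[field] is None:
            r[field] = fill
    # Remove outliers farther than z_max standard deviations from the mean (assumed rule).
    values = [r[field] for r in unique]
    mu, sigma = mean(values), pstdev(values)
    kept = [r for r in unique if sigma == 0 or abs(r[field] - mu) <= z_max * sigma]
    # Normalize the field to [0, 1] with min-max scaling (assumed normalization).
    lo = min(r[field] for r in kept)
    hi = max(r[field] for r in kept)
    for r in kept:
        r[field] = 0.0 if hi == lo else (r[field] - lo) / (hi - lo)
    return kept

raw = [
    {"user": "u1", "age": 20},
    {"user": "u1", "age": 20},    # duplicate record
    {"user": "u2", "age": None},  # missing value
    {"user": "u3", "age": 40},
]
cleaned = clean_records(raw, "age")
```

In practice each feature would likely need its own imputation and outlier policy; a single rule is used here only to keep the sketch short.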
Using the cleaned behavior data of the user on multiple screens, the real-time interest of the user is identified, a user preference model is established, and the long-term interest preference and the short-term interest preference of the user are calculated.
The long-term interest preference is the interest space or interest topic corresponding to the user's behavior over a longer period (e.g., 1 month, 3 months, 6 months, or even longer). The short-term interest preference is the interest space or interest topic corresponding to the user's behavior over a relatively short period (e.g., 7 days, 3 days, or in real time).
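One simple way to picture the two preferences is as category shares computed over different time windows. The sketch below assumes behavior events of the form `(category, days_ago)` and uses 7 days and 90 days as example windows; the event format, the function name, and the share-of-events measure are assumptions, not the patent's preference model.

```python
from collections import Counter

def interest_preference(events, window_days):
    """Return category -> share of the events that fall inside the window."""
    counts = Counter(cat for cat, days_ago in events if days_ago <= window_days)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}

# Hypothetical behavior log: (category, how many days ago the behavior occurred).
events = [("phones", 1), ("phones", 2), ("books", 5),
          ("books", 40), ("books", 60), ("toys", 80)]

short_term = interest_preference(events, window_days=7)   # recent behavior only
long_term = interest_preference(events, window_days=90)   # roughly three months
```

With this log the short-term preference is dominated by "phones" while the long-term preference is dominated by "books", which is the kind of difference the two windows are meant to capture.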
The user demographic attribute data, the long-term interest preference, and the short-term interest preference are coarsely clustered by using the Canopy clustering algorithm. The coarse clustering comprises: setting a first distance parameter T1 and a second distance parameter T2, where T1 > T2; letting the user set be D; randomly selecting a user from D and computing the distance d between that user and each other user in D; putting every user with d < T1 into the same Canopy, and deleting every user with d < T2 from D; and repeating until D is empty, at which point the users have been divided into a number of Canopies. The number of Canopies is taken as the cluster number K of K-means, and the centers of the Canopies are taken as the initial centers of K-means.
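The coarse clustering procedure can be sketched as below. The sketch assumes that each user is already encoded as a numeric feature vector, that the distance is Euclidean, and that a Canopy's center is the mean of its members; the patent fixes none of these, and the function name `canopy` is hypothetical.

```python
import math
import random

def canopy(points, t1, t2, seed=0):
    """Coarse clustering: returns (canopies as lists of point indices, canopy centers)."""
    if not t1 > t2:
        raise ValueError("T1 must be greater than T2")
    rng = random.Random(seed)
    remaining = list(range(len(points)))  # the user set D, as indices
    canopies = []
    while remaining:
        pivot = rng.choice(remaining)     # randomly selected user
        dist = {i: math.dist(points[pivot], points[i]) for i in remaining}
        # Every user with d < T1 joins this Canopy (the pivot itself has d = 0).
        canopies.append([i for i in remaining if dist[i] < t1])
        # Every user with d < T2, and the pivot, is deleted from D.
        remaining = [i for i in remaining if i != pivot and dist[i] >= t2]
    dims = len(points[0])
    # Assumed definition of a Canopy center: the mean of its members.
    centers = [tuple(sum(points[i][k] for i in m) / len(m) for k in range(dims))
               for m in canopies]
    return canopies, centers

users = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (10.0, 10.0), (10.1, 10.0)]
canopies, centers = canopy(users, t1=3.0, t2=1.0)
k = len(canopies)  # used as the cluster number K for K-means
```

Because users with T2 <= d < T1 stay in D, a user can belong to more than one Canopy; that overlap is a normal property of Canopy clustering.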
Similar users are then obtained by using the K-means algorithm. K-means performs precise similarity judgment only among users within the same Canopy; users in different Canopies are not compared precisely. The Pearson correlation coefficient is used to measure the similarity between users.
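A minimal sketch of the two ingredients named here: K-means started from given initial centers (such as Canopy centers), and the Pearson correlation coefficient between two users' feature vectors. How the patent combines Pearson similarity with K-means is not spelled out, so the sketch keeps them as separate helpers; Euclidean assignment in `kmeans` and both function names are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def kmeans(points, centers, iters=20):
    """K-means with caller-supplied initial centers; K is len(centers)."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
                  for p in points]
        # Recompute each center as the mean of its members (keep it if empty).
        new_centers = []
        for k, c in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centers.append(
                tuple(sum(v) / len(members) for v in zip(*members)) if members else c)
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return labels, centers

points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, final_centers = kmeans(points, centers=[(1.0, 1.0), (9.0, 9.0)])
same_taste = pearson([1, 2, 3], [2, 4, 6])
opposite_taste = pearson([1, 2, 3], [3, 2, 1])
```

Starting from reasonable centers means K does not have to be guessed and the loop usually converges in few iterations, which is the benefit the text attributes to the Canopy step.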
S120: constructing a knowledge graph from the full order data, and acquiring the SPU associated with the real-time interest of the user.
The full order data is collected. It includes, but is not limited to, the user ID and the SKU (Stock Keeping Unit). Preferably, the full order data is cleaned; data cleaning includes smoothing missing values, removing outliers, removing duplicate records, and normalizing the data.
Based on the cleaned full order data, rule mining and relation extraction are used to mine the associated-sales relations among SKUs. The SKUs are then aggregated so that the associated-sales relations are raised to the SPU (Standard Product Unit) level, which reduces the volume of commodity data. The associated-sales SPU pairs are stored in a database, and the SPU knowledge graph is completed by combining them with business knowledge.
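The aggregation from SKU-level orders to SPU-level associated-sales pairs can be sketched as follows. Plain co-purchase counting with a minimum support threshold stands in for the rule mining the text mentions; the `sku_to_spu` mapping, the `min_support` parameter, and the function name are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

def spu_pairs(orders, sku_to_spu, min_support=2):
    """orders: list of (user_id, sku). Returns {(spu_a, spu_b): number of users}."""
    # Aggregate each user's SKUs up to the SPU level.
    baskets = {}
    for user_id, sku in orders:
        baskets.setdefault(user_id, set()).add(sku_to_spu[sku])
    # Count how many users bought each unordered SPU pair.
    counts = Counter()
    for spus in baskets.values():
        for pair in combinations(sorted(spus), 2):
            counts[pair] += 1
    # Keep only pairs with enough support (assumed stand-in for rule mining).
    return {pair: n for pair, n in counts.items() if n >= min_support}

sku_to_spu = {"sku1": "phone", "sku2": "phone", "sku3": "case"}  # hypothetical catalog
orders = [("u1", "sku1"), ("u1", "sku3"),
          ("u2", "sku2"), ("u2", "sku3"),
          ("u3", "sku1")]
pairs = spu_pairs(orders, sku_to_spu)
```

Note that `sku1` and `sku2` never co-occur with `sku3` for the same user twice, yet the phone/case relation still reaches the threshold once both SKUs are mapped to one SPU. This illustrates why raising relations to the SPU level makes sparse order data denser.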
The SPU corresponding to the real-time interest of the user is looked up in the knowledge graph, and the SPU-to-SPU transfer relations established by the graph are used to obtain the SPUs associated with the real-time interest of the user.
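If the knowledge graph is viewed as an adjacency list of SPU-to-SPU relations, following the transfer relations amounts to a bounded graph traversal. The sketch below uses breadth-first search with a hop limit; the graph contents, the `max_hops` parameter, and the function name are hypothetical, and the patent does not say how far relations are followed.

```python
from collections import deque

def associated_spus(graph, start, max_hops=2):
    """Return SPUs reachable from `start` within max_hops relations, nearest first."""
    seen = {start}
    queue = deque([(start, 0)])
    result = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                result.append(nxt)
                queue.append((nxt, hops + 1))
    return result

# Hypothetical SPU knowledge graph built from associated-sales pairs.
graph = {
    "phone": ["case", "charger"],
    "case": ["screen film"],
    "screen film": ["cleaning kit"],
}
one_hop = associated_spus(graph, "phone", max_hops=1)
two_hops = associated_spus(graph, "phone", max_hops=2)
```

A larger hop limit reaches SPUs that were never bought together with the starting SPU directly, which is how deeper relation transfer can surface more categories.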
S130: obtaining a commodity recommendation list from the similar users and the SPU associated with the real-time interest of the user.
Real-time interest identification is performed on the online behavior data of the user: different behaviors carry different weights, different behavior durations carry different weights, and the real-time interest scores are calculated as a linear combination.
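The linear combination can be illustrated with a small scoring function. The behavior weights, the per-second duration weight, and the event format `(spu, behavior, seconds)` are invented for the example; the patent states only that behaviors and durations are weighted and combined linearly.

```python
def realtime_interest_scores(events, behavior_weights, duration_weight=0.01):
    """events: list of (spu, behavior, seconds). Returns spu -> interest score."""
    scores = {}
    for spu, behavior, seconds in events:
        # Linear combination of a behavior-type weight and a duration term.
        contribution = behavior_weights[behavior] + duration_weight * seconds
        scores[spu] = scores.get(spu, 0.0) + contribution
    return scores

# Hypothetical weights: stronger intent signals get larger weights.
weights = {"browse": 1.0, "search": 2.0, "buy": 5.0}
session = [("phone", "browse", 100), ("phone", "buy", 0), ("case", "search", 50)]
scores = realtime_interest_scores(session, weights)
```

Here "phone" scores 7.0 (browse 1.0 plus 1.0 for duration, plus buy 5.0) and "case" scores 2.5.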
The SPUs recalled by collaborative filtering over similar users are compared with the SPUs recalled from the knowledge graph to obtain a first recalled SPU. The first recalled SPU covers commodities that exist under the similar users but not under the SPUs associated with the real-time interest of the user.
The real-time interest score of the user is combined with the similar users' scores for the commodities under the first recalled SPU to predict CTR (click-through rate) and CVR (conversion rate) scores for those commodities. The recalled commodities are then deduplicated and filtered, and the remaining commodities are sorted by CTR/CVR score to obtain the commodity recommendation list.
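The recall, scoring, deduplication, filtering, and sorting steps can be sketched end to end. A real system would use a trained CTR/CVR model; here a weighted average of the user's interest score and the similar users' score stands in for it, and the filter simply drops items the user already bought. The function name, the `alpha` weight, and all data are assumptions.

```python
def recommend(similar_user_items, kg_spus, interest, peer_scores,
              purchased, alpha=0.5, top_n=3):
    """similar_user_items: list of (item, spu) gathered from similar users."""
    # First recall: keep items whose SPU is not among the knowledge-graph SPUs.
    recalled = [(item, spu) for item, spu in similar_user_items if spu not in kg_spus]
    ranked, seen = [], set()
    for item, spu in recalled:
        if item in seen or item in purchased:  # deduplicate and filter
            continue
        seen.add(item)
        # Stand-in for CTR/CVR prediction: a weighted average of two signals.
        score = alpha * interest.get(spu, 0.0) + (1 - alpha) * peer_scores.get(item, 0.0)
        ranked.append((score, item))
    # Sort by score, highest first; ties are broken by item id for determinism.
    ranked.sort(key=lambda t: (-t[0], t[1]))
    return [item for _, item in ranked[:top_n]]

items = [("i1", "case"), ("i2", "book"), ("i2", "book"), ("i3", "toy"), ("i4", "book")]
result = recommend(
    items,
    kg_spus={"case"},
    interest={"book": 0.8, "toy": 0.2},
    peer_scores={"i2": 0.5, "i3": 0.9, "i4": 0.9},
    purchased={"i4"},
)
```

In this example `i1` is excluded by the recall rule, the second `i2` is removed as a duplicate, `i4` is filtered as already purchased, and `i2` ranks above `i3`.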
If the SPU associated with the real-time interest of the user cannot be acquired from the knowledge graph, a recalled commodity set is obtained by using a collaborative filtering combination algorithm, and the commodities under the similar users are compared with the recalled commodity set to obtain a second recalled SPU. The second recalled SPU covers commodities that exist under the similar users but not in the recalled commodity set.
The real-time interest score of the user is combined with the similar users' scores for the commodities under the second recalled SPU to predict CTR/CVR scores for those commodities. The recalled commodities are then deduplicated and filtered, and the remaining commodities are sorted by CTR/CVR score to obtain the commodity recommendation list.
Compared with the prior art, in this embodiment similar users are obtained from user data by using the Canopy algorithm and the K-means algorithm. The number of Canopies and the centers determined by the Canopy algorithm serve as the cluster number K and the initial centers of K-means, so K does not need to be tuned by experience. Pairwise distances are computed only among samples within the same Canopy rather than over all samples, so the amount of computation is smaller. This solves the parameter-tuning problem, speeds up convergence, and allows large amounts of data to be processed. A knowledge graph is constructed from the full order data, and the SPU associated with the real-time interest of the user is obtained. The knowledge graph supports relation transfer, and because there are far more SKUs than SPUs, working at the SPU level makes the data denser: after the association calculation the data volume is reduced, commodity relations are transferred more deeply, and more categories of commodities can be recalled. The commodity recommendation list is obtained from the similar users and the SPU associated with the real-time interest of the user. Recommendations based on collaborative filtering tend to be narrow, whereas relations in the knowledge graph are more divergent, so the divergence of the recall and the freshness of the recalled commodities are improved, the broken commodity relations caused by sparse data are resolved, and the recommended categories are richer.
The method has low computational complexity and performs well when data is sparse: it produces more recommendations and more cross-category recommendations, and more of the recommendations, including cross-category ones, match actual outcomes.
As shown in fig. 4, an embodiment of the present invention further provides a system for improving the diversity of recommended goods, including:
the user acquisition module is used for acquiring similar users by using a Canopy algorithm and a K-means algorithm according to user data; the SPU acquisition module is used for constructing a knowledge graph according to the full order data and acquiring an SPU associated with the real-time interest of the user; and the list acquisition module is used for acquiring a commodity recommendation list according to the similar users and the SPU associated with the real-time interest of the user.
According to an embodiment of the present invention, the user acquisition module includes:
the data acquisition submodule is used for acquiring user demographic attribute data and behavior data of the user on multiple screens;
the data cleaning submodule is used for cleaning the data of the data acquisition submodule;
the preference calculation submodule is used for calculating the long-term interest preference and the short-term interest preference of the user;
the rough clustering submodule is used for carrying out rough clustering on the cleaned user demographic attribute data, the user long-term interest preference and the user short-term interest preference by utilizing a Canopy algorithm and determining the cluster number K and the initial center of the K-means algorithm;
and the similar user obtaining submodule is used for obtaining similar users by utilizing the K-means algorithm.
According to an embodiment of the present invention, the coarse clustering submodule includes:
the parameter calculation submodule is used for determining the cluster number K and the initial center of the K-means algorithm;
setting a first distance parameter T1 and a second distance parameter T2, where T1 > T2, letting the user set be D, randomly selecting a user from D, and computing the distance d between that user and each other user in D;
putting every user with d < T1 into the same Canopy, and deleting every user with d < T2 from D;
until the user set D is empty, the users are divided into a plurality of Canopy;
the number of Canopy is used as the cluster number K of K-means, and the center of Canopy is used as the initial center of K-means.
According to an embodiment of the present invention, the list obtaining module includes:
the first score calculating submodule is used for calculating the real-time interest score of the user according to the online behavior data of the user;
the first recall submodule is used for comparing the commodities under the similar users with the SPU associated with the real-time interest of the user, as provided by the SPU acquisition module, to obtain a first recalled SPU;
the first score prediction submodule is used for performing CTR/CVR score prediction on the first recalled SPU according to the real-time interest score of the user and the similar users' scores for the commodities under the first recalled SPU;
and the first processing submodule is used for deduplicating and filtering the recalled commodities and sorting them to obtain a commodity recommendation list.
According to an embodiment of the present invention, the list obtaining module includes:
the second score calculating submodule is used for calculating the real-time interest score of the user according to the online behavior data of the user;
the commodity recall module is used for obtaining a recalled commodity set by using a collaborative filtering combination algorithm if the SPU associated with the real-time interest of the user cannot be obtained from the knowledge graph;
the second recall submodule is used for comparing the commodities under the similar users with the recalled commodity set to obtain a second recalled SPU;
the second score prediction submodule is used for performing CTR/CVR score prediction on the second recalled SPU according to the real-time interest score of the user and the similar users' scores for the commodities under the second recalled SPU;
and the second processing submodule is used for deduplicating and filtering the recalled commodities and sorting them to obtain a commodity recommendation list.
The system for improving commodity recommendation diversity provided by this embodiment of the invention addresses the problems that recommended commodities are homogeneous, insufficiently associated, and lacking in diversity. Compared with the prior art, in this embodiment the user acquisition module acquires similar users from user data by using the Canopy algorithm and the K-means algorithm. The parameter calculation submodule in the user acquisition module takes the number of Canopies and the initial centers determined by the Canopy algorithm as the cluster number K and the initial centers of the K-means algorithm, so K does not need to be adjusted by experience. Distances are calculated pairwise within Canopies, so the distances between all samples need not be calculated; the amount of computation is smaller, the difficulty of parameter tuning is resolved, convergence is faster, and large volumes of data can be processed. The SPU acquisition module constructs a knowledge graph from the full order data and acquires the SPUs associated with the user's real-time interest. Relations in the knowledge graph are transitive, and because the number of SKUs is far larger than the number of SPUs, working at the SPU level makes the data denser and reduces the data volume after the association calculation, so commodity relations propagate deeper and more commodities can be recalled. The list acquisition module acquires a commodity recommendation list according to the similar users and the SPUs associated with the user's real-time interest. Recommendations based on collaborative filtering tend to become narrower, whereas relations in the knowledge graph become more divergent; combining them increases the divergence of recall, brings in fresh commodities, resolves the broken commodity relations caused by sparse data, and yields recommendations that span richer categories. The system has low computational complexity and better recommendation performance on sparse data; it produces more recommendations, more cross-category recommendations, and more effective recommendations and cross-category effective recommendations that are consistent with actual results.
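As an illustration only, the Canopy-then-K-means seeding described above might be sketched as follows in Python. The function names `canopy_centers` and `kmeans`, the use of Euclidean distance, the fixed random seed, and the choice of the randomly selected user as the center of each Canopy are assumptions made for this sketch; the specification does not fix any of them.

```python
import math
import random


def canopy_centers(points, t1, t2, seed=0):
    """Coarse Canopy pass over feature vectors; requires T1 > T2 > 0.

    Returns (centers, canopies), where canopies holds point indices.
    """
    if not t1 > t2 > 0:
        raise ValueError("expected T1 > T2 > 0")
    rng = random.Random(seed)          # fixed seed only for repeatability
    remaining = list(range(len(points)))
    centers, canopies = [], []
    while remaining:
        c = rng.choice(remaining)      # randomly selected user
        dist = {i: math.dist(points[c], points[i]) for i in remaining}
        # users closer than T1 join this Canopy (a user may join several)
        canopies.append([i for i in remaining if dist[i] < t1])
        # users closer than T2 (including the center, d = 0) leave the set
        remaining = [i for i in remaining if dist[i] >= t2]
        centers.append(tuple(points[c]))  # assumption: picked user is the center
    return centers, canopies


def kmeans(points, centers, max_iter=50):
    """Plain Lloyd iterations started from the given centers."""
    centers = [tuple(c) for c in centers]

    def nearest(p):
        return min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))

    for _ in range(max_iter):
        labels = [nearest(p) for p in points]
        updated = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # keep the old center if a cluster ends up empty
            updated.append(tuple(sum(x) / len(members) for x in zip(*members))
                           if members else old)
        if updated == centers:         # converged
            break
        centers = updated
    return [nearest(p) for p in points], centers


# Hypothetical user feature vectors forming two well-separated groups.
users = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
seeds, _ = canopy_centers(users, t1=5.0, t2=3.0)
labels, final_centers = kmeans(users, seeds)   # K is len(seeds)
```

In this sketch K is never passed explicitly: it is simply the number of seeds returned by the Canopy pass, which is the point the paragraph above makes about avoiding manual tuning of K.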
The embodiments in this specification are described in a progressive manner; for identical or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the apparatus embodiment is substantially similar to the method embodiment, it is described relatively briefly, and the relevant points can be found in the description of the method embodiment.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A method for improving commodity recommendation diversity, the method comprising:
acquiring similar users by using a Canopy algorithm and a K-means algorithm according to user data;
constructing a knowledge graph according to the full order data, and acquiring an SPU (Standard Product Unit) associated with the real-time interest of the user;
and acquiring a commodity recommendation list according to the similar users and the SPU associated with the real-time interest of the users.
2. The method for improving the diversity of commodity recommendations according to claim 1, wherein the obtaining similar users using a Canopy algorithm and a K-means algorithm based on user data comprises:
collecting user demographic attribute data and behavior data of a user on multiple screens, and performing data cleaning on the user demographic attribute data and the behavior data;
calculating long-term interest preference and short-term interest preference of the user;
carrying out rough clustering on the cleaned user demographic attribute data, the user long-term interest preference and the user short-term interest preference by using a Canopy algorithm, and determining the cluster number K and the initial center of the K-means algorithm;
and acquiring similar users by using the K-means algorithm.
3. The method for improving the diversity of commodity recommendations according to claim 2, wherein the determining the number K of clusters and the initial center of the K-means algorithm comprises:
setting a first distance parameter T1 and a second distance parameter T2, wherein T1 is greater than T2; setting a user set D; randomly selecting a user from the user set D, and calculating the distance d between the selected user and each other user in the user set D; when d is less than T1, putting the corresponding user into a Canopy, and deleting the users with d less than T2 from the user set D; repeating until the user set D is empty, whereby the users are divided into a plurality of Canopies; the number of Canopies is used as the cluster number K of K-means, and the centers of the Canopies are used as the initial centers of K-means.
4. The method as claimed in claim 1, wherein said obtaining a recommendation list of goods according to said similar users and said SPU associated with real-time user interest comprises:
comparing the commodities under the similar users with the SPU associated with the real-time interest of the user to obtain a first recalled SPU;
calculating a real-time interest score of the user according to the online behavior data of the user;
performing CTR/CVR score prediction on the first recalled SPU according to the real-time interest score of the user and the scores given by the similar users to the commodities under the first recalled SPU, and then performing deduplication and filtering processing on the recalled commodities;
and sorting the recalled commodities which are subjected to the duplicate removal and filtering treatment to obtain a commodity recommendation list.
5. The method as claimed in claim 1, wherein said obtaining a recommendation list of goods according to said similar users and said SPU associated with real-time user interest comprises:
if the SPU associated with the real-time interest of the user cannot be acquired by using the knowledge graph, obtaining a recalled commodity set by using a collaborative filtering combination algorithm;
comparing the commodities under the similar users with the recalled commodity set to obtain a second recalled SPU;
calculating a real-time interest score of the user according to the online behavior data of the user;
performing CTR/CVR score prediction on the second recalled SPU according to the real-time interest score of the user and the scores given by the similar users to the commodities under the second recalled SPU, and then performing deduplication and filtering processing on the recalled commodities;
and sorting the recalled commodities which are subjected to the duplicate removal and filtering treatment to obtain a commodity recommendation list.
6. A system for improving commodity recommendation diversity, comprising:
the user acquisition module is used for acquiring similar users by using a Canopy algorithm and a K-means algorithm according to user data;
the SPU acquisition module is used for constructing a knowledge graph according to the full order data and acquiring an SPU associated with the real-time interest of the user;
and the list acquisition module is used for acquiring a commodity recommendation list according to the similar users and the SPU associated with the real-time interest of the users.
7. The system for improving the diversity of recommended goods according to claim 6, wherein the user acquisition module comprises:
the data acquisition submodule is used for acquiring user demographic attribute data and behavior data of the user on multiple screens;
the data cleaning submodule is used for cleaning the data of the data acquisition submodule;
the preference calculation submodule is used for calculating the long-term interest preference and the short-term interest preference of the user;
the rough clustering submodule is used for carrying out rough clustering on the cleaned user demographic attribute data, the user long-term interest preference and the user short-term interest preference by utilizing a Canopy algorithm and determining the cluster number K and the initial center of the K-means algorithm;
and the similar user obtaining submodule is used for obtaining similar users by utilizing the K-means algorithm.
8. The system for improving the diversity of recommended goods according to claim 7, wherein the coarse clustering submodule comprises:
the parameter calculation submodule is used for determining the cluster number K and the initial center of the K-means algorithm;
setting a first distance parameter T1 and a second distance parameter T2, wherein T1 is greater than T2; setting a user set D; randomly selecting a user from the user set D, and calculating the distance d between the selected user and each other user in the user set D;
when d is less than T1, putting the corresponding user into a Canopy, and deleting the users with d less than T2 from the user set D;
repeating until the user set D is empty, whereby the users are divided into a plurality of Canopies;
the number of Canopies is used as the cluster number K of K-means, and the centers of the Canopies are used as the initial centers of K-means.
9. The system for improving the diversity of the recommendation of the merchandise according to claim 6, wherein the list obtaining module comprises:
the first score calculating submodule is used for calculating the real-time interest score of the user according to the online behavior data of the user;
the first recall submodule is used for comparing the commodities under the similar users with the SPU associated with the real-time interest of the user, as acquired by the SPU acquisition module, to obtain a first recalled SPU;
the first score prediction submodule is used for performing CTR/CVR score prediction on the first recalled SPU according to the real-time interest score of the user and the scores given by the similar users to the commodities under the first recalled SPU;
and the first processing submodule is used for performing deduplication and filtering processing on the recalled commodities, and sorting the recalled commodities to obtain a commodity recommendation list.
10. The system for improving the diversity of the recommendation of the merchandise according to claim 6, wherein the list obtaining module comprises:
the second score calculating submodule is used for calculating the real-time interest score of the user according to the online behavior data of the user;
the commodity recall module is used for obtaining a recalled commodity set by utilizing a collaborative filtering combination algorithm if the SPU associated with the real-time interest of the user cannot be obtained by utilizing the knowledge graph;
the second recall submodule is used for comparing the commodities under the similar users with the recalled commodity set to obtain a second recalled SPU;
the second score prediction submodule is used for performing CTR/CVR score prediction on the second recalled SPU according to the real-time interest score of the user and the scores given by the similar users to the commodities under the second recalled SPU;
and the second processing submodule is used for performing deduplication and filtering processing on the recalled commodities, and sorting the recalled commodities to obtain a commodity recommendation list.
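
The recall flow recited in the claims (knowledge-graph association, comparison with similar users' commodities, fallback when no associated SPU is found, scoring, deduplication, filtering, and sorting) can be sketched roughly as below. This is a hedged illustration, not the claimed implementation: the co-purchase graph construction, the two-hop limit, the product of the two scores standing in for a CTR/CVR model, and every name (`build_spu_graph`, `associated_spus`, `recommend`, `fallback_spus`, `exclude`) are assumptions introduced for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_spu_graph(orders):
    """Assumed graph form: SPUs bought in the same order are linked."""
    graph = defaultdict(set)
    for spus in orders:
        for a, b in combinations(set(spus), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def associated_spus(graph, interest_spus, max_hops=2):
    """SPUs reachable from the interest SPUs within max_hops links."""
    seen = set(interest_spus)
    frontier = set(interest_spus)
    found = set()
    for _ in range(max_hops):
        # follow links one hop further, skipping SPUs already visited
        frontier = {n for s in frontier for n in graph.get(s, ())} - seen
        found |= frontier
        seen |= frontier
    return found


def recommend(graph, interest_spus, interest_score, similar_scores,
              fallback_spus=(), exclude=(), top_n=10):
    """Return a ranked list of SPU ids.

    similar_scores: {spu: score given by similar users}
    fallback_spus:  output of some collaborative-filtering recall (assumed)
    exclude:        SPUs to filter out, e.g. already purchased (assumed)
    """
    candidates = associated_spus(graph, interest_spus)
    if not candidates:                 # graph gave nothing: use the fallback set
        candidates = set(fallback_spus)
    # compare with similar users' commodities; sets also remove duplicates
    recalled = (candidates & set(similar_scores)) - set(exclude)
    # stand-in for CTR/CVR prediction: product of the two scores
    scored = {s: interest_score * similar_scores[s] for s in recalled}
    ranked = sorted(scored, key=lambda s: (-scored[s], s))
    return ranked[:top_n]


# Hypothetical order data and similar-user scores.
orders = [["phone", "case"], ["case", "charger"], ["tv", "soundbar"]]
graph = build_spu_graph(orders)
scores = {"case": 0.9, "charger": 0.5, "tv": 0.8}
ranked = recommend(graph, ["phone"], 1.0, scores)
```

Here "charger" is reached from "phone" only through "case", which illustrates the transitive relation the description attributes to the knowledge graph; an interest SPU absent from the graph falls through to `fallback_spus`.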
CN201911042387.6A 2019-10-30 2019-10-30 Method and system for improving commodity recommendation diversity Pending CN110910207A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911042387.6A CN110910207A (en) 2019-10-30 2019-10-30 Method and system for improving commodity recommendation diversity

Publications (1)

Publication Number Publication Date
CN110910207A (en) 2020-03-24

Family

ID=69816136

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911042387.6A Pending CN110910207A (en) 2019-10-30 2019-10-30 Method and system for improving commodity recommendation diversity

Country Status (1)

Country Link
CN (1) CN110910207A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102956009A (en) * 2011-08-16 2013-03-06 阿里巴巴集团控股有限公司 Electronic commerce information recommending method and electronic commerce information recommending device on basis of user behaviors
CN104641374A (en) * 2012-07-20 2015-05-20 英特托拉斯技术公司 Information targeting systems and methods
CN106919699A (en) * 2017-03-09 2017-07-04 华北电力大学 A kind of recommendation method for personalized information towards large-scale consumer
CN108153792A (en) * 2016-12-02 2018-06-12 阿里巴巴集团控股有限公司 A kind of data processing method and relevant apparatus
CN110148043A (en) * 2019-03-01 2019-08-20 安徽省优质采科技发展有限责任公司 The bid and purchase information recommendation system and recommended method of knowledge based map

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhou Wei et al.: "Spectral clustering algorithm based on Canopy clustering", Computer Engineering & Science *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111767464A (en) * 2020-07-09 2020-10-13 海口科博瑞信息科技有限公司 Course platform content recommendation method, device, equipment and storage medium
CN112102029A (en) * 2020-08-20 2020-12-18 浙江大学 Knowledge graph-based long-tail recommendation calculation method
CN112667885A (en) * 2020-12-04 2021-04-16 四川长虹电器股份有限公司 Matrix decomposition collaborative filtering method and system for coupling social trust information
CN115935068A (en) * 2022-12-12 2023-04-07 杭州洋驼网络科技有限公司 Commodity recommendation method and device for Internet platform
CN115935068B (en) * 2022-12-12 2023-09-05 杭州洋驼网络科技有限公司 Commodity recommendation method and device for Internet platform

Similar Documents

Publication Publication Date Title
CN109299370B (en) Multi-pair level personalized recommendation method
Gensler et al. Listen to your customers: Insights into brand image using online consumer-generated product reviews
CN110910207A (en) Method and system for improving commodity recommendation diversity
CN110069663B (en) Video recommendation method and device
CN110162693A (en) A kind of method and server of information recommendation
CN109492180A (en) Resource recommendation method, device, computer equipment and computer readable storage medium
CN103258025B (en) Generate the method for co-occurrence keyword, the method that association search word is provided and system
TW201501059A (en) Method and system for recommending information
CN104281622A (en) Information recommending method and information recommending device in social media
CN110175895B (en) Article recommendation method and device
CN109447762B (en) Commodity recommendation method and device, server and commodity recommendation system
CN111737418B (en) Method, apparatus and storage medium for predicting relevance of search term and commodity
CN108153792B (en) Data processing method and related device
CN112613953A (en) Commodity selection method, system and computer readable storage medium
CN109241451B (en) Content combination recommendation method and device and readable storage medium
CN111967970B (en) Bank product recommendation method and device based on spark platform
Wang et al. Advertisement click-through rate prediction using multiple criteria linear programming regression model
CN108805598A (en) Similarity information determines method, server and computer readable storage medium
CN110991189A (en) Method and system for generating decision result according to data acquired by acquisition module
US20160171590A1 (en) Push-based category recommendations
CN109977299A (en) A kind of proposed algorithm of convergence project temperature and expert's coefficient
CN117745349A (en) Personalized coupon pushing method and system based on user characteristics
CN111127074B (en) Data recommendation method
von Rimscha It’s not the economy, stupid! External effects on the supply and demand of cinema entertainment
CN113592589A (en) Textile raw material recommendation method and device and processor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200324