CN105335391B - Method and apparatus for processing a search request based on a search engine

Info

Publication number: CN105335391B
Application number: CN201410326142.7A
Authority: CN
Inventors: 崔保良
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2014-07-09
Filing date: 2014-07-09
Publication date: 2019-02-15
Anticipated expiration: 2034-07-09
Also published as: HK1218449A1; CN105335391A

Abstract

The invention discloses a method and apparatus for processing a search request based on a search engine. The method comprises: receiving a current search term input by a user; selecting search terms used by the user from a historical search log to obtain at least one candidate search term corresponding to the current search term; performing offline training using the historical search term set and the search behavior information in the historical search log as training samples to establish a prediction model of user behavior; using the prediction model to perform a correlation check between the candidate search terms and the user behavior on the candidate search terms corresponding to the current search term; and, according to the correlation check, taking the candidate search terms that meet a set condition as recommended search terms corresponding to the current search term, and generating a recommended search term set corresponding to the current search term. The invention solves the technical problem of inaccurate search results.

Description

Search request processing method and device based on search engine

Technical Field

The invention relates to the field of computer internet, in particular to a method and a device for processing a search request based on a search engine.

Background

In e-commerce search, in order to find a desired commodity as quickly as possible, a user may enter a very detailed search term (query) and express the requirement by combining a plurality of word segments (terms). However, the recall mode of a commercial search engine generally treats the word segments input by the user as being in an AND relationship, so situations with no results or few results occur easily. In this case, the most common approach is to rewrite the search term input by the user by means of a search term omission technique, so as to recall more items that satisfy the user's intention.

Search term omission deletes some word segments from the search term by means of a search term transformation technique, so as to obtain a new, shorter rewritten search term (sub_query). The rewritten search term (sub_query) should retain as much of the important information of the original search term as possible, and the commodities found by searching with it should be as numerous as possible while still meeting the user's original shopping intention.

The prior art mainly provides the following two schemes for search term omission:

Scheme 1: the importance of each word segment (term) contained in the search term is calculated, and the words to drop are then selected by ranking the word segments. The specific steps are as follows: first, a search term (comprising a plurality of word segments) input by the user is given; then, the importance of each word segment is calculated by using a logic algorithm; finally, the word segment with the greatest importance is kept, the other word segments are discarded in ascending order of importance, and a sub-search term is generated.
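By way of non-limiting illustration, the first prior-art scheme can be sketched as follows. The function name `drop_terms_by_importance`, the `keep` parameter, and the importance values are hypothetical; how the importance scores are computed is not specified here and is simply taken as an input.

```python
def drop_terms_by_importance(terms, importance, keep=1):
    """Keep the `keep` most important terms and drop the rest.

    `importance` maps each term to a score (assumed to be given).
    Terms are discarded in ascending order of importance, and the
    surviving terms keep their original order in the query.
    """
    ranked = sorted(terms, key=lambda t: importance[t])  # least important first
    dropped = set(ranked[: max(len(terms) - keep, 0)])
    return [t for t in terms if t not in dropped]


# Hypothetical importance scores for illustration only.
example_importance = {"cheap": 0.05, "red": 0.1, "phone": 0.9, "case": 0.6}
sub_query = drop_terms_by_importance(
    ["cheap", "red", "phone", "case"], example_importance, keep=2
)
```

With these example scores, the two least important terms are dropped and the sub-search term consists of the remaining two terms.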

Scheme 2: words are dropped first, and the resulting rewritten search terms are then ranked. The specific steps are as follows: first, a search term (comprising a plurality of word segments) input by the user is given; then, some words are discarded by enumeration or in other ways to generate possible candidate word segment subsets; next, each subset is evaluated by using mutual information, a feature selection method used in text classification to measure correlation; finally, the optimal rewritten search term is generated by using a maximum spanning tree criterion.

In systems that rewrite the search term input by the user, the candidate rewritten search term set generated by the search term omission technique is mainly applied in the following two ways:

Method 1: the optimal rewritten search term, or a combination of the set of optimal rewritten search terms, is used directly to perform a filtering query, and the resulting search result set is sorted by relevance and displayed to the user.

Method 2: the optimal candidate set of rewritten search terms is displayed to the user in a prompt bar, and the user decides, according to his or her own intention, which prompted search term to click in order to obtain search results.

As can be seen from the analysis, the rewriting method adopted in the above conventional application has the following disadvantages:

First, the number of commodities recalled after rewriting cannot be predicted. On the one hand, multiple online attempts can seriously affect the performance and efficiency of the query; on the other hand, although a rewritten search term that loses less information is more accurate, it may fail to recall more search results of good quality, which also affects the user's next decision.

Second, it is not known whether the rewritten search term still satisfies the user's search requirement. Whether the user's search requirement is satisfied can be measured by the user's next operation behavior.

In addition, the above conventional techniques cannot treat the rewritten search term as a meaningful whole: a combination of individually important word segments may have no meaning, or its meaning may even shift.

At present, no effective solution has been proposed for the problem in the prior art that the search term input by the user is processed imperfectly and the search results are therefore inaccurate.

Disclosure of Invention

The embodiment of the invention provides a search request processing method and device based on a search engine, and aims to improve the accuracy of a search result.

According to an aspect of the embodiments of the present invention, there is provided a method for processing a search request based on a search engine, the method including: receiving a current search term input by a user; selecting a search term used by a user from a historical search log, and acquiring at least one candidate search term corresponding to the current search term; taking a historical search term set and search behavior information in a historical search log as training samples to perform offline training, and establishing a prediction model of user behavior; performing correlation verification of the candidate search terms and the user behaviors on the candidate search terms corresponding to the current search terms by using a prediction model; and according to the relevance verification, taking the candidate search term meeting the set condition as a recommended search term corresponding to the current search term, and generating a recommended search term set corresponding to the current search term.

According to another aspect of the embodiments of the present invention, there is also provided a device for processing a search request based on a search engine, the device including: the receiving module is used for receiving a current search term input by a user; the acquisition module is used for selecting a search term used by a user from a historical search log and acquiring at least one candidate search term corresponding to the current search term; the model establishing module is used for performing off-line training by taking a historical search term set and search behavior information in a historical search log as training samples to establish a prediction model of user behaviors; the checking module is used for checking the correlation between the candidate search item and the user behavior on the candidate search item corresponding to the current search item by using the prediction model; and the generating module is used for taking the candidate search terms meeting the set conditions as the recommended search terms corresponding to the current search term according to the relevance verification and generating a recommended search term set corresponding to the current search term.

In the embodiment of the invention, a current search term input by a user is received; a search term used by the user is selected from a historical search log, and at least one candidate search term corresponding to the current search term is acquired; offline training is performed by taking a historical search term set and search behavior information in the historical search log as training samples, and a prediction model of user behavior is established; correlation verification of the candidate search terms and the user behavior is performed on the candidate search terms corresponding to the current search term by using the prediction model; and, according to the correlation verification, the candidate search terms meeting the set condition are taken as recommended search terms corresponding to the current search term, and a recommended search term set corresponding to the current search term is generated. In this way, a prediction model for rewriting the current search term input by the user is provided on the basis of the historical search terms recorded in the historical search log and the historical behavior for each historical search term. By learning the patterns in which users actively rewrite search terms within the sessions of the historical search log, and by extracting training samples with effective features for modeling, the search term currently input by the user can be processed with the resulting prediction model, and a recommended search term set can be determined for it. Because the historical search log also provides the search behavior information of the historical search terms, that is, feedback information on search behavior is fused into the prediction model, better decisions can be made when rewriting the current search term. This improves the speed and accuracy of the user's queries and satisfies the user's original intention to the greatest extent, thereby solving the technical problem of inaccurate search results.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a block diagram of a hardware architecture for implementing a method for processing search requests based on a search engine according to an embodiment of the present invention;

FIG. 2 is a flowchart of a method for processing a search request based on a search engine according to an embodiment of the present invention;

FIG. 3 is a detailed flowchart of a method for processing a search request based on a search engine according to an embodiment of the present invention;

FIG. 4 is a diagram of a processing device for a search request based on a search engine according to a second embodiment of the present invention;

FIG. 5 is a diagram of an alternative search engine based search request processing apparatus according to a second embodiment of the present invention;

FIG. 6 is a diagram of an alternative search engine based search request processing apparatus according to a second embodiment of the present invention;

FIG. 7 is a schematic diagram of an alternative processing apparatus for a search request based on a search engine according to a second embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Example 1

In accordance with an embodiment of the present invention, a method embodiment of a method for processing a search request based on a search engine is provided. It should be noted that the steps illustrated in the flowchart of the accompanying drawings may be performed in a computer system, such as one executing a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different from the one given here.

The method provided by the first embodiment of the present application may be executed in a computer terminal, a mobile terminal, or a similar user computing device. Taking the example of running on a computer terminal, fig. 1 is a block diagram of a hardware structure of a computer terminal 10 and a background search server 30 for running a processing method of a search request based on a search engine according to an embodiment of the present invention. As shown in fig. 1, the computer terminal 10 may include one or more processors 102 (only one is shown in the figure) (the processor 102 may include but is not limited to a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 104 for storing data, and a transmission device 106 for communication function, and the background search server 30 may include one or more search engine processors 301 (only one is shown in the figure) (the search engine processors 301 may include but is not limited to a processing device such as a microprocessor MCU or a programmable logic device FPGA), and a communication device 303. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.

The memory 104 may be used to store software programs and modules of application software, such as the program instructions/modules corresponding to the processing method of the search request based on the search engine in the embodiment of the present invention. The processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, it implements the above processing method of the search request based on the search engine. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission device 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10.

Under the operating environment, the application provides a processing method of a search request based on a search engine as shown in fig. 2. Fig. 2 is a flowchart of a method for processing a search request based on a search engine according to an embodiment of the present invention.

As shown in fig. 2, the method for processing a search request based on a search engine may include the following steps:

Step S20, receiving the current search term input by the user.

The search term in the above step S20 of the present application may be input by the user through an input device of the terminal device. Information for the search term entered by the user may be sent to the processor 102 shown in fig. 1.

Step S22, selecting a search term used by the user from the historical search log to obtain at least one candidate search term corresponding to the current search term.

In the above step S22, the processor 102 shown in fig. 1 may select, according to the search term input by the user, search terms used by the user from the historical search log as samples. At least one of the search terms used by the user is selected as a candidate search term. The candidate search terms are thus a subset of the set of search terms historically entered by the user. Any of the candidate search terms may be displayed in the search bar as a recommended search term for selection by the user.

Step S24, performing offline training by taking the historical search term set and the search behavior information in the historical search log as training samples, and establishing a prediction model of the user behavior. The search behavior information in this step is the behavior information that occurred when the user used the corresponding search term in the past.

Step S24 can be implemented by the search engine processor 301 of the background search server 30 shown in fig. 1 or by the processor 102 shown in fig. 1. Offline training is performed using, as training samples, the historical search term set in the historical search log and the behavior information of the user that occurred on the basis of a search term (such as clicks on search results after searching with that search term), so as to obtain the prediction model of user behavior.

The prediction model is used to predict, based on the currently entered search term, a set of recommended search terms that may be of interest to the user. In addition, the verification processing applied to the candidate search terms by using the prediction model may be a correlation check: each candidate search term is matched against the feedback information of the search behavior determined by the prediction model, and the candidate search terms whose matching degree is greater than or equal to a threshold are extracted as recommended search terms.

The historical search term set consists of search terms used by the user in the past, for example search terms that the user has input within a certain time. During a search, after inputting a search term and searching, the user may search again with a similar replacement search term. That is, if the user is not satisfied with the search results corresponding to an input search term, the user modifies the word segments (terms) of the search term (which may include adding, deleting, and updating word segments) to obtain a new search term, and the system stores the new search term in the historical search term set as an input search term. This is the mode in which the user actively rewrites the search term.

Step S26, performing, by using the prediction model, correlation check of the candidate search term and the user behavior on the candidate search term corresponding to the current search term. In this step, the correlation check is performed with the prediction model of user behavior, which was established from the search terms the user used in the past and the behavior information of the user that occurred on the basis of those search terms.

Step S28, according to the relevance verification, taking the candidate search term meeting the set condition as the recommended search term corresponding to the current search term, and generating a recommended search term set corresponding to the current search term.
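By way of non-limiting illustration, steps S26 and S28 can be sketched as scoring each candidate search term and keeping those that reach a threshold. The names `recommend_search_terms` and `score_fn`, the threshold value 0.5, and the ordering of the output by descending score are assumptions made for this sketch; `score_fn` merely stands in for the trained prediction model, whose form is not fixed here.

```python
def recommend_search_terms(candidates, score_fn, threshold=0.5):
    """Return the candidates whose score is >= threshold, best first.

    `score_fn` is a hypothetical stand-in for the prediction model: it
    maps a candidate search term to a matching degree.
    """
    scored = [(candidate, score_fn(candidate)) for candidate in candidates]
    kept = [(c, s) for c, s in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [c for c, _ in kept]


# Hypothetical matching degrees for illustration only.
example_scores = {"phone case": 0.8, "red case": 0.3, "phone": 0.6}
recommended = recommend_search_terms(list(example_scores), example_scores.get)
```

Here the set condition is modeled as a simple threshold on the matching degree; other conditions (for example, keeping only the top few candidates) could be substituted without changing the overall flow.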

Analysis shows that the above embodiments of the present application make use of the historical search terms recorded in the historical search log and the historical behavior for each historical search term. Through machine learning, the patterns in which users actively rewrote search terms in the historical search log are learned, training samples comprising effective features are extracted, and a prediction model of the search term rewriting pattern is built. The resulting prediction model can then be used to predict the rewriting pattern for the search term currently input by the user, that is, to predict the rewritten search term corresponding to the currently input search term. Because the historical search log also provides the search behavior information of the historical search terms, that is, feedback information on search behavior is fused into the prediction model, better decisions can be made when rewriting the current search term. This improves the speed and accuracy of the user's queries and satisfies the user's original intention to the greatest extent, thereby solving the technical problem of inaccurate search results.

Steps S20 to S28 provided in the above embodiments of the present application may be executed on a computer client. In implementation, the computer client in the above embodiments may be a computer terminal on which a browser or a search client for searching is installed. Analysis shows that the scope of application of the rewriting technique used in the method is not limited to search term omission: the rewriting mode can be extended to any one of, or a mixture of, adding, deleting and replacing word segments (terms), so as to improve the quality of the current search term and the efficiency and experience of the user's search.

The above embodiment is described in detail with reference to the flowchart shown in fig. 3.

As can be seen from fig. 3, the processing method for the search request based on the search engine according to the embodiment of the present application is a scheme for automatically rewriting the search term currently input by the user, and may include two parts, namely an offline model training process and an online real-time prediction process.

The offline model training function obtains the prediction model by performing offline training with the historical search term set and the search behavior information in the historical search log as training samples. This part may comprise the following three implementation steps: a) selecting samples from the historical search log; b) extracting features from the selected samples; c) generating training samples from the extracted features, and performing model training on the training samples.

The online real-time prediction function performs correlation verification processing on the currently received search term based on the trained model, and obtains the search term set recommended by the system for the current search term. It mainly comprises the following two implementation steps: a) initially selecting candidate search terms for the current search term based on the historical search log; b) performing prediction processing on the initially selected candidate search terms based on the trained model, to obtain the recommended search terms corresponding to the current search term.

The above-described function of offline model training and online real-time prediction will be described in detail with reference to fig. 3.

In the above embodiment of the present application, the offline model training function is performed in step S24. It should be noted that the step of establishing the prediction model of the user behavior by performing offline training with the historical search term set and the search behavior information in the historical search log as training samples may include the following steps:

Step S241, reading the historical search log, and acquiring the historical search term set in a predetermined time period and the search behavior information of each historical search term in the historical search term set.

As can be seen from fig. 3, step S241 implements a sample selection process. A historical search log session, which may be stored in the memory 104 shown in fig. 1, refers to the set of a series of actions performed continuously by a user within a predetermined time interval. Each record in a historical search log session generally includes the searching user's cookie, the search behavior information of the user using the search engine, the time at which the search behavior occurred, the location at which the search behavior occurred, and the search term corresponding to the search behavior. The series of actions includes the historical search terms that were input and the search behavior information for each historical search term (such as the search behavior itself, clicks on search results, page turning, page closing, and the like).

In this application, the key in step S241 is to obtain the search term sequence formed by the search terms used by the user within a certain time interval (i.e., the search terms input by the user in the input box of a search page within a predetermined time, such as "computer", "computer keyboard", "computer mouse", "repair computer", etc.), together with the search behavior information (e.g., click feedback information) following the search corresponding to each search term.
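By way of non-limiting illustration, the sample selection of step S241 can be sketched as grouping log records by user cookie and cutting each user's records into sessions wherever the time gap between consecutive searches exceeds a limit. The record layout `(cookie, timestamp_seconds, query)`, the function name `split_sessions`, and the 30-minute gap are assumptions made for this sketch; other fields of the log (behavior information, location) are omitted for brevity.

```python
from collections import defaultdict


def split_sessions(records, max_gap_seconds=1800):
    """Group log records into per-user search term sequences (sessions).

    `records` is an iterable of (cookie, timestamp_seconds, query).
    A new session starts whenever two consecutive searches by the same
    user are more than `max_gap_seconds` apart (an assumed rule).
    """
    by_user = defaultdict(list)
    for cookie, ts, query in records:
        by_user[cookie].append((ts, query))

    sessions = []
    for cookie, events in by_user.items():
        events.sort(key=lambda e: e[0])  # chronological order
        current, last_ts = [], None
        for ts, query in events:
            if last_ts is not None and ts - last_ts > max_gap_seconds:
                sessions.append((cookie, current))
                current = []
            current.append(query)
            last_ts = ts
        if current:
            sessions.append((cookie, current))
    return sessions


# Hypothetical log records for illustration only.
example_records = [
    ("u1", 0, "computer keyboard"),
    ("u1", 60, "computer"),
    ("u1", 10000, "repair computer"),
    ("u2", 5, "computer mouse"),
]
example_sessions = split_sessions(example_records)
```

Each resulting session carries one search term sequence, which is the input to the matching-pair extraction of the next step.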

Step S243, extracting at least one historical search term matching pair < historical search term A, historical search term B > from the historical search term set, wherein the set of word segments (terms) included in historical search term B is a proper subset of the set of word segments (terms) included in historical search term A.

In the above step S243, after the historical search term set in the historical search log is obtained, historical search term matching pairs < A, B > are selected from it such that the word segment (term) set of B is a proper subset of the word segment (term) set of A. The historical search terms in the historical search term set can be stored in the form of a queue, so that the historical search term set comprises a group of historical search term sequences; a forward scan can therefore be used to extract the historical search term matching pairs from the historical search term sequences.
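By way of non-limiting illustration, the forward scan of step S243 can be sketched as follows. The function name `extract_matching_pairs` and the whitespace tokenizer are assumptions made for this sketch (a real system would use its own word segmenter), and "forward" is interpreted here as requiring B to appear later than A in the sequence.

```python
def extract_matching_pairs(query_sequence, tokenize=str.split):
    """Extract pairs (A, B) where terms(B) is a proper subset of terms(A).

    The sequence is scanned forward, so B is always a search term that
    was entered after A in the same sequence (an assumed interpretation).
    """
    pairs = []
    for i, query_a in enumerate(query_sequence):
        terms_a = set(tokenize(query_a))
        for query_b in query_sequence[i + 1:]:
            terms_b = set(tokenize(query_b))
            # `<` on sets tests for a proper subset; skip empty queries.
            if terms_b and terms_b < terms_a:
                pairs.append((query_a, query_b))
    return pairs


example_pairs = extract_matching_pairs(
    ["red phone case", "phone case", "laptop", "phone"]
)
```

Each extracted pair corresponds to one observed instance of a user shortening an earlier search term into a later one.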

Step S245, extracting the characteristics of the historical search term matching pair < historical search term A, historical search term B > and the characteristics of the historical search term B in the historical search term matching pair, and combining them to generate a characteristic set.

Preferably, the characteristics of the historical search term matching pair < historical search term A, historical search term B > in step S245 above include any one or more of the following: behavior characteristics, text characteristics, knowledge base attribute characteristics and statistical characteristics; the characteristics of the historical search term B include any one or more of the following: historical statistical characteristics, word segment (term) combination characteristics, text characteristics, knowledge base attribute characteristics, and part-of-speech characteristics.

It should be noted here that the historical search term A and the historical search term B in the historical search term matching pair are both search terms that the user has input within a certain time, where the historical search term B may be a search term obtained by the user modifying the word segments (terms) of the historical search term A after inputting it. Therefore, as can be seen from fig. 3, the historical search term B may be regarded as a rewritten search term obtained by rewriting the historical search term A, and the historical search term matching pair may also be represented as a query-sub_query pair. Thus, the feature extraction in step S245 consists of feature extraction for the query-sub_query pair and feature extraction for the rewritten search term sub_query itself.

The following describes in detail the characteristics of the query-sub_query pair and of the rewritten search term sub_query in the historical search term matching pair:

1. Characteristics of the historical search term matching pairs (query-sub_query pairs):

Behavior characteristics: the click-through rate (ctr) induced after rewriting, and whether a deal was made.

The behavior characteristics describe the user's behavior in the process of modifying the historical search term A into the historical search term B: the click-through rate (ctr) and whether a deal was made characterize the effectiveness of this rewriting behavior. Introducing this characteristic effectively integrates the behavior information of the rewriting user into the model as an empirical feature. A value cannot always be obtained for this characteristic; where no value is obtained, it is filled in with the mean.
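By way of non-limiting illustration, filling missing behavior values with the mean can be sketched as follows. The function name `fill_missing_with_mean`, the use of `None` to mark a missing value, and the fallback of 0.0 when no value at all was observed are assumptions made for this sketch.

```python
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    # Assumed fallback: 0.0 when nothing was observed at all.
    mean = sum(observed) / len(observed) if observed else 0.0
    return [mean if v is None else v for v in values]


# Hypothetical click-through rates for three matching pairs.
filled_ctr = fill_missing_with_mean([0.25, None, 0.75])
```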

Text characteristics: the prefix overlap proportion, the suffix overlap proportion and the position of the dropped words; the proportion of dropped word segments (terms) and the proportion of dropped length; the ids of the dropped words, the ids of the retained words, etc.

The text characteristics represent the textual difference between the historical search term A and the historical search term B. The prefix overlap proportion, the suffix overlap proportion and the position of the dropped words are used to learn the positions at which users tend to enter important information; the proportion of dropped word segments (terms) and the proportion of dropped length are used to learn whether users tend to retain more words; the ids of the dropped words and the ids of the retained words are important characteristics, from which the words that users consider important can be learned directly.
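By way of non-limiting illustration, several of the text characteristics above can be computed as follows. The exact definitions are not given in the description, so the ones used here are assumptions: overlaps are counted in whole terms and normalized by the number of terms in A, dropped length is measured in characters of dropped terms, queries are tokenized on whitespace, and A is assumed to be non-empty. The names `text_features` and `_common_prefix_len` are hypothetical.

```python
def _common_prefix_len(xs, ys):
    """Number of leading positions at which two term lists agree."""
    n = 0
    for x, y in zip(xs, ys):
        if x != y:
            break
        n += 1
    return n


def text_features(query_a, query_b):
    """Text features of a pair (A, B); A is assumed to be non-empty."""
    a, b = query_a.split(), query_b.split()
    kept = set(b)
    dropped_positions = [i for i, term in enumerate(a) if term not in kept]
    return {
        "prefix_overlap": _common_prefix_len(a, b) / len(a),
        # The suffix overlap is the prefix overlap of the reversed lists.
        "suffix_overlap": _common_prefix_len(a[::-1], b[::-1]) / len(a),
        "dropped_positions": dropped_positions,
        "dropped_term_ratio": len(dropped_positions) / len(a),
        "dropped_length_ratio": sum(len(a[i]) for i in dropped_positions)
        / sum(len(term) for term in a),
    }


example_text_features = text_features("red phone case", "phone case")
```

In this example the first term of A is dropped, so the prefix overlap is zero while the suffix overlap covers the two retained terms.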

Knowledge base attribute characteristics: the types of knowledge base attributes of the dropped and retained word segments (terms), and their proportion information, are enumerated.

Since the word segments in commercial searches generally carry knowledge base attributes such as product (e.g., cell phone), brand (e.g., Huaye), modifier (e.g., red), and the like, adding these characteristics makes it possible to learn which knowledge base attributes users are more likely to retain or discard.
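By way of non-limiting illustration, the knowledge base attribute characteristics can be sketched as the proportion of each attribute type among the dropped terms and among the retained terms. The function name `attribute_features`, the `attribute_of` lookup table, the "unknown" fallback label, and the feature-key format are assumptions made for this sketch.

```python
from collections import Counter


def attribute_features(query_a, query_b, attribute_of):
    """Proportion of each knowledge base attribute type among the terms
    of A that were dropped in B and among those that were retained.

    `attribute_of` is a hypothetical lookup from term to attribute type.
    """
    terms_a = query_a.split()
    kept = set(query_b.split())
    groups = {
        "dropped": [t for t in terms_a if t not in kept],
        "retained": [t for t in terms_a if t in kept],
    }
    features = {}
    for label, terms in groups.items():
        counts = Counter(attribute_of.get(t, "unknown") for t in terms)
        total = sum(counts.values())
        for attribute, n in counts.items():
            features[f"{label}:{attribute}"] = n / total
    return features


# Hypothetical attribute table for illustration only.
example_attributes = {"red": "modifier", "phone": "product", "case": "product"}
example_attribute_features = attribute_features(
    "red phone case", "phone case", example_attributes
)
```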

Part-of-speech characteristics: the part-of-speech types and proportion information of the dropped participles (term) are enumerated.

The part-of-speech characteristics are similar to the knowledge base attributes. Parts of speech are also treated differently in commercial search: for example, nouns, names of people and proper names are generally important, while modal particles, auxiliary words and the like can be ignored.

Statistical characteristics: category prediction similarity; gender intent similarity and other important attribute similarities.

The degree of matching of the important attributes of the search terms is an important index for measuring the similarity of the search terms. These important attributes include category prediction, gender intent, brand, etc.

2. Characteristics of the rewritten search term itself:

Historical statistical characteristics: category entropy of the recall result set; category entropy of the historically clicked commodities; click rate ctr and deal.

These historical statistical features reflect the quality of the rewritten search term sub_query through its historical statistics. Category: the classification name of a commodity; each commodity is attached to a unique category. Category entropy of the result set: the categories and their proportions in the commodity result set retrieved and displayed for a search term over the past 30 days are obtained and the entropy is calculated, which measures how clear the intent of the query result set is. Category entropy of clicked commodities: the categories and their proportions in the set of commodities clicked under a search term over the past 30 days are obtained and the entropy is calculated, which reflects how clear the click intent of the user is under that search term. The click rate ctr and deal represent the ability of the rewritten search term to guide clicks and deals.
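The category entropy described above can be illustrated with a short sketch. It assumes Shannon entropy with a base-2 logarithm (the description does not state the base) and takes as input one category label per displayed or clicked commodity; the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def category_entropy(categories: Iterable[str]) -> float:
    """Shannon entropy (base 2, an assumption) of a category distribution."""
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# A result set concentrated in one category has low entropy (clear intent);
# a result set spread evenly over many categories has high entropy.
focused = category_entropy(["phones", "phones", "phones", "phones"])
scattered = category_entropy(["phones", "cases", "chargers", "toys"])
```

A lower value indicates that the results or clicks concentrate in few categories, which corresponds to a clearer intent.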

Participle (term) combination features: 1) first obtain related information at the participle (term) level: category entropy, degree of freedom, word segmentation weight, and the like; 2) combining related information of participles (term) in the following way: mutual information, sum, standard deviation, arithmetic mean, etc.

Category entropy of participles (term): includes result entropy and click entropy, computed in the same way as the category entropy above, except that the statistics are collected at the granularity of the participle (term). The participle weight marks the degree of importance of a participle (term) in the search term. Once the participle-level information is obtained, it can be mapped to the search term level through some operations, mainly mutual information, sum, standard deviation, arithmetic mean and the like.
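As a hedged illustration of mapping participle-level values to the search term level, the sketch below aggregates one value per participle by sum, arithmetic mean and standard deviation. Mutual information is omitted because it needs co-occurrence statistics not shown here; the function name and the use of the population standard deviation are assumptions.

```python
import math
import statistics
from typing import Dict, List


def combine_term_values(values: List[float]) -> Dict[str, float]:
    """Aggregate per-term values (e.g. term weights) into query-level features.

    Assumes a non-empty list: one value per participle of the search term.
    """
    return {
        "sum": math.fsum(values),
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population standard deviation
    }


# Hypothetical term weights for a three-term rewritten search term.
combined = combine_term_values([0.2, 0.4, 0.6])
```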

Text characteristics: number of participles (term); text length; the length and proportion of alphanumeric strings, etc.

These text features describe the text of the rewritten search term and are used to learn the textual form that users prefer after rewriting.

Knowledge base attribute characteristics: the number and proportion of the various knowledge base attribute types contained. This feature is used to learn the distribution of knowledge base attributes in the rewritten search terms that users tend toward.

Part-of-speech characteristics: the number and proportion of the various part-of-speech types contained. This feature is used to learn the distribution of parts of speech in the rewritten search terms that users tend toward.

In step S247, the search behavior information of the historical search term B is extracted from the search behavior information of each historical search term in the historical search term set.

Step S249, generating a training sample according to the feature set and the search behavior information of the historical search term B. The search behavior information in this step may serve as the target of model training, where the target is whether there is click behavior in the search results for the historical search term B.

It should be noted here that the training sample may be generated by combining the feature set and the search behavior information of the historical search term B, where the combination manner is: the training sample is composed of features and a target, and if the historical search term B is clicked, the target is 1 (positive sample), otherwise the target is 0 (negative sample). For example, after the user searches for "red Huawei mobile phone", the user performs rewriting twice: "red mobile phone": no click; "Huawei mobile phone": a click occurs. We consider (red Huawei mobile phone, red mobile phone) a negative sample and (red Huawei mobile phone, Huawei mobile phone) a positive sample.
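The labelling rule above can be sketched as follows, assuming a session is available as the original search term plus a list of later rewrites with a clicked flag. The `Sample` type, the function name and the input layout are hypothetical; only the proper-subset requirement and the 1/0 target come from this description.

```python
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    query: str       # historical search term A
    sub_query: str   # historical search term B
    label: int       # 1 if B was clicked, otherwise 0


def build_samples(query_terms: List[str],
                  rewrites: List[Tuple[List[str], bool]]) -> List[Sample]:
    """Turn the rewrites of one session into labelled matching pairs."""
    full = set(query_terms)
    samples = []
    for sub_terms, clicked in rewrites:
        # Keep only rewrites whose term set is a proper subset of A's term set.
        if set(sub_terms) < full:
            samples.append(Sample(" ".join(query_terms),
                                  " ".join(sub_terms),
                                  1 if clicked else 0))
    return samples


session = build_samples(
    ["red", "huawei", "phone"],
    [(["red", "phone"], False), (["huawei", "phone"], True)],
)
```

In a full pipeline each sample would carry the extracted feature set rather than the raw strings; the strings are kept here only to make the pairing visible.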

It can be seen that steps S247 to S249 above implement the training sample generation process. To describe the real scene more precisely, the above features may be extracted and combined after segmenting (bucketing) the number of participles (term) of the search term, the text length and the like, so that the model is more accurate.

It should be further noted here that the training samples of the present application may also be obtained in the following ways: (1) Similar search term pairs and their similarity, computed by a search term similarity algorithm such as SimRank, are used for training with the similarity as the target. In this way, the similarity of search terms can be learned from the clicking behavior of users, but the trained query-sub_query pairs have no strong association in time sequence, which is not conducive to simulating a realistic rewriting scene. (2) After the rewriting system has run for a period of time, the actual behavior feedback of users after the rewrites made by the system can be used to construct samples and update the model. This way directly learns the feedback of users on the current rewriting system and is conducive to the adaptive updating of the system.

Step S251, performing model training processing on the training sample by adopting a logistic regression LR model to generate a prediction model.

The LR (logistic regression) model selected in step S251 has good descriptive capability, and the time complexity and space complexity of model training and prediction are low. The offline training generates a weight value for each feature id. Features that can be calculated offline are calculated in advance and stored; during online prediction these features are obtained by table lookup, the weight values corresponding to the feature ids are likewise obtained by table lookup, and the predicted value is then obtained through a simple calculation. Other machine learning models, such as decision trees, SVMs, etc., may also be selected for use in the present invention.
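The "simple calculation" at prediction time can be sketched as a weighted sum over looked-up weights followed by the logistic function. The dictionary-based weight table, the bias term and the feature ids below are assumptions for illustration, not the actual storage format.

```python
import math
from typing import Dict


def predict_click_probability(weights: Dict[str, float],
                              features: Dict[str, float],
                              bias: float = 0.0) -> float:
    """Logistic regression score from a feature-id -> weight lookup table."""
    # Unknown feature ids contribute nothing (weight 0.0), an assumption.
    z = bias + sum(weights.get(fid, 0.0) * value for fid, value in features.items())
    # Numerically stable form of 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical weights learned offline and features looked up online.
weights = {"suffix_overlap": 1.5, "dropped_term_ratio": -2.0}
score = predict_click_probability(weights, {"suffix_overlap": 0.66, "dropped_term_ratio": 0.33})
```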

On the basis of the prediction model generated by the offline model training, the prediction model is used online to realize real-time prediction, that is, to predict in real time the results of the automatic rewriting by the system of the current search term input by the user. The online real-time prediction function is described in detail below.

In the above embodiment of the present application, as can be seen from fig. 3, after the current search term input by the user is received in step S20, it may be determined whether rewriting the current search term is necessary. This can be achieved by the following steps:

Step S301, historical behavior information corresponding to the current search term is obtained by querying the historical search log.

Step S303, when the historical behavior information corresponding to the current search term meets a recommendation condition, performing the step of obtaining at least one candidate search term corresponding to the current search term, where the recommendation condition includes any one or more of the following conditions: the click behavior frequency in the historical behavior information corresponding to the current search term is less than a preset click frequency.

The recommendation condition here may also be one of the following conditions: the number of commodities retrieved by the current search term is less than a preset number; the relevance of the commodities retrieved by the current search term is less than a predetermined value.
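The check in steps S301 to S303 might look like the following sketch. The `QueryStats` record and all threshold values are hypothetical; the description only states that any one or more of the three conditions may trigger rewriting.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    """Historical behavior information for one search term (assumed layout)."""
    click_count: int      # click behavior frequency in the history
    result_count: int     # number of commodities retrieved
    relevance: float      # relevance of the retrieved commodities


def needs_rewrite(stats: QueryStats,
                  min_clicks: int = 5,
                  min_results: int = 10,
                  min_relevance: float = 0.5) -> bool:
    """True if any recommendation condition holds (thresholds are placeholders)."""
    return (stats.click_count < min_clicks
            or stats.result_count < min_results
            or stats.relevance < min_relevance)


# A search term with few results triggers candidate generation.
should_rewrite = needs_rewrite(QueryStats(click_count=20, result_count=3, relevance=0.9))
```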

In the above optional embodiment of the present application, step S22, selecting a search term used by the user from the historical search log and obtaining at least one candidate search term corresponding to the current search term, preliminarily screens the candidate search terms corresponding to the current search term and yields the candidates for the rewritten search term sub_query. It may include the following implementation steps:

Step S221, extracting the search terms containing the current search term from the historical search log to obtain an initial search term set.

Step S223, generating at least one candidate search term according to the at least one initial candidate search term, to obtain a candidate search term set.

Preferably, the step S223 of generating at least one candidate search term according to at least one initial candidate search term may include the following two schemes:

Scheme I: at least one initial candidate search term may be saved directly to form at least one candidate search term.

The method provided by the first scheme is suitable for search behavior on relatively large e-commerce websites. In this application scenario, the historical behavior of users is very rich; therefore, historical search terms that users have actually searched can be selected from the search log as rewrite candidates (sub_candidate), where the participle (term) set of each such historical search term is contained in that of the current search term. In this way, various behavior information of the candidates, such as recalled commodity quality, click rate ctr and conversion rate, can be conveniently obtained offline for model training.

To reduce the performance cost of the first scheme, the candidate search terms generated by the first scheme can be used as a preliminary screening result and further screened by features; see the second scheme for details.

Scheme II:

First, a candidate feature set corresponding to each initial candidate is obtained, where the screening features in the set include: headword identification, weight values of participles and richness of knowledge base attributes.

It should be noted here that headword identification means that core word mining is performed on each preliminarily selected initial candidate to detect whether the most central word of the current search term has been deleted; if the headword is lost, the initial candidate does not meet the requirement, because the meaning of the query may shift. The weight value of a participle (term) can be analyzed according to its knowledge base attribute, part of speech, degree of freedom and category entropy. Richness of knowledge base attributes is the number of knowledge base attribute types contained. The main function of this step is to ensure that rewritten search terms of good quality pass the preliminary selection as far as possible, and to reduce the computation of the subsequent relevance verification as far as possible while preserving accuracy.

And then, performing linear weighted calculation on a plurality of screening features specified in the candidate feature set of each initial candidate search term to obtain the ranking value of each initial candidate search term.

And finally, sorting according to the sorting value of each initial candidate search term, and selecting a predetermined number of initial candidate search terms to obtain a candidate search term set.

The two steps above linearly weight and combine the screening features to calculate a rule score for each initial candidate search term, then sort the candidates by rule score and select a specified number of the best ones to enter the model prediction stage. The model prediction stage then sorts by model score and outputs the final rewriting result for the current search term.
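A minimal sketch of the linear weighted screening follows. The screening feature names, their numeric encoding (for example 1.0 when the headword is kept) and the weights are assumptions; the description only specifies a linear weighted combination followed by sorting and selecting a predetermined number of candidates.

```python
from typing import Dict, List, Tuple


def top_candidates(candidates: Dict[str, Dict[str, float]],
                   weights: Dict[str, float],
                   n: int) -> List[Tuple[str, float]]:
    """Score each candidate by a linear weighted sum and keep the best n."""
    scored = [
        (query, sum(weights.get(name, 0.0) * value for name, value in feats.items()))
        for query, feats in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical screening features for three initial candidates.
candidates = {
    "huawei phone": {"headword_kept": 1.0, "term_weight": 0.8, "attr_richness": 2.0},
    "red phone": {"headword_kept": 1.0, "term_weight": 0.5, "attr_richness": 2.0},
    "red huawei": {"headword_kept": 0.0, "term_weight": 0.6, "attr_richness": 2.0},
}
weights = {"headword_kept": 2.0, "term_weight": 1.0, "attr_richness": 0.5}
selected = top_candidates(candidates, weights, n=2)
```

Here the candidate that lost the headword receives the lowest rule score and is filtered out before the more expensive model prediction.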

It should be noted here that the offline model training realizes that the rewriting process of the search term query is described by using a model, the model is trained based on the historical behavior of the user, and online prediction can be achieved and adaptive updating can be achieved. In the model target, sample selection and feature extraction, the historical behavior feedback information and the statistical information of the user are considered as much as possible, so that the model can describe and predict the real scene more accurately.

It should also be noted here that, regarding the acquisition of candidate search terms, for some very long-tail queries for which no suitable candidate search term can be found in the historical search term set, other algorithms can be used to calculate rewrite candidates. In probabilistic terms, such candidates are likely to have better performance and a better recall rate, and they can still be predicted by the model subsequently, with some adjustment in terms of features. In addition, the preliminary screening of the candidate set can be made more fine-grained, so that the candidate set entering the model training is as good as possible.

In the embodiment of the application, the data statistics and the model training of the search results are performed in a Hadoop cluster, and the program can be implemented in the Java and C++ languages. The online portion may be implemented in the C++ language. The overall flow chart is as follows:

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Example 2

According to the embodiment of the invention, the device for implementing the method embodiment is also provided. Fig. 4 is a schematic diagram of a processing device for a search request based on a search engine according to a second embodiment of the present invention. The device provided by the above embodiment of the application can be operated on the network game client.

As shown in fig. 4, the apparatus may include: a receiving module 40, an obtaining module 42, a model building module 44, a verification module 46 and a generating module 48.

The receiving module 40 is configured to receive a current search term input by a user; an obtaining module 42, configured to select a search term used by a user from a historical search log, and obtain at least one candidate search term corresponding to a current search term; the model establishing module 44 is configured to perform offline training by using the historical search term set and the search behavior information in the historical search log as training samples, and establish a prediction model of user behavior; a checking module 46, configured to perform correlation checking between the candidate search term and the user behavior on the candidate search term corresponding to the current search term by using the prediction model; and a generating module 48, configured to use the candidate search term meeting the set condition as a recommended search term corresponding to the current search term according to the relevance verification, and generate a recommended search term set corresponding to the current search term.

The embodiment of the application provides a prediction model for rewriting a current search term input by a user based on historical search terms recorded in a historical search log and historical behaviors of each historical search term, and by learning a mode of actively rewriting the search term by the user in the historical search log session and extracting a training sample comprising effective characteristics for modeling, the search term input by the current user can be processed by using the prediction model obtained by modeling, so that a recommended search term set is determined for the search term input at present. Because the historical search log also provides the search behavior information of the historical search term, namely the feedback information of the search behavior is fused in the prediction model, better decision can be made on rewriting of the current search term, so that the query speed and accuracy of the user are improved, and the original intention of the user can be met to the greatest extent. And further the technical problem of inaccurate search results is solved.

The receiving module 40, the obtaining module 42, the model building module 44, the verifying module 46 and the generating module 48 provided in the above embodiments of the present application may be executed on a computer client, and in the implementation process, the computer client in the above embodiments may be a browser or a search client computer terminal installed for searching. Analysis shows that the application range of the search term omission technology used in the method is not limited to the search term omission direction, and the rewriting mode of the search term can be expanded to one or a mixed mode of adding, deleting and replacing the participles (term) in the search term, so as to improve the quality of the current search term and improve the efficiency and experience of user search.

It should be noted that the receiving module 40, the obtaining module 42, the model building module 44, the verifying module 46 and the generating module 48 correspond to steps S20 to S28 in the first embodiment, and the five modules are the same as the corresponding steps in the example and application scenarios, but are not limited to the disclosure in the first embodiment. It should be noted that the modules described above as a part of the apparatus may be run in the computer terminal 10 provided in the first embodiment, and may be implemented by software or hardware.

Preferably, as shown in fig. 5, the model building module 44 may include: a reading module 441, a search term extraction module 443, a feature extraction module 445, a search behavior extraction module 447, a combination module 449, and a training module 451.

The reading module 441 is configured to read the historical search log, and obtain the historical search term set in a predetermined time period and the search behavior information of each historical search term in the historical search term set; the search term extraction module 443 is configured to extract at least one historical search term matching pair < historical search term A, historical search term B > from the historical search term set, wherein the set of participles (term) included in the historical search term B is a proper subset of the set of participles (term) included in the historical search term A; the feature extraction module 445 is configured to extract the features of the historical search term matching pair < historical search term A, historical search term B > and the features of the historical search term B in the historical search term matching pair, and combine the features to generate a feature set; the search behavior extraction module 447 is configured to extract the search behavior information of the historical search term B from the search behavior information of each historical search term in the historical search term set; the combination module 449 is configured to generate training samples according to the feature set and the search behavior information of the historical search term B; the training module 451 is configured to perform model training processing on the training samples by using a logistic regression LR model to generate a prediction model.

It should be noted here that the reading module 441, the search term extraction module 443, the feature extraction module 445, the search behavior extraction module 447, the combination module 449, and the training module 451 implement the same examples and application scenarios as the corresponding steps S247 to S249 in the first embodiment, but are not limited to the disclosure of the first embodiment. It should be noted that the modules described above as a part of the apparatus may be run in the computer terminal 10 provided in the first embodiment, and may be implemented by software or hardware.

Preferably, the features of the above-mentioned historical search term matching pair < historical search term A, historical search term B > include any one or more of the following features: behavior characteristics, text characteristics, knowledge base attribute characteristics and statistical characteristics; the features of the historical search term B include any one or more of the following: historical statistical features, participle (term) combination features, text features, knowledge base attribute features, and part-of-speech features.

Preferably, as shown in fig. 6, after performing the function of the receiving module 40, the above-mentioned apparatus of the present application may further run the following functional modules: a query module 411 and an execution module 413.

The query module 411 is configured to query the historical search log to obtain historical behavior information corresponding to the current search term; the execution module 413 is configured to execute the step of obtaining at least one candidate search term corresponding to the current search term when the historical behavior information corresponding to the current search term meets a recommendation condition, where the recommendation condition includes any one or more of the following conditions: the click behavior frequency in the historical behavior information corresponding to the current search term is smaller than the preset click frequency; the number of commodities searched by the current search term is less than a preset number; the relevance of the goods retrieved by the current search term is less than a predetermined value.

It should be noted here that the query module 411 and the execution module 413 implement the same examples and application scenarios as the corresponding steps S301 to S303, but are not limited to the disclosure of the first embodiment. It should be noted that the modules described above as a part of the apparatus may be run in the computer terminal 10 provided in the first embodiment, and may be implemented by software or hardware.

As shown in fig. 7, the obtaining module 42 may include: an extraction module 421 and a deriving module 423.

The extraction module 421 is configured to extract a search term including a current search term from a historical search log to obtain an initial search term set; the deriving module 423 is configured to generate at least one candidate search term according to the at least one initial candidate search term, so as to obtain a candidate search term set.

It should be noted here that the extraction module 421 and the deriving module 423 implement the same examples and application scenarios as the corresponding steps S221 to S223, but are not limited to the disclosure of the above embodiment. It should be noted that the modules described above as a part of the apparatus may be run in the computer terminal 10 provided in the first embodiment, and may be implemented by software or hardware.

Preferably, the deriving module 423 may include: a saving module or a screening processing module.

The saving module is configured to directly save at least one initial candidate search term to form at least one candidate search term.

The screening processing module is configured to perform feature screening processing on each initial candidate, where the screening processing module may include: a sub-obtaining module, configured to obtain a candidate feature set corresponding to each initial candidate, where the screening features in the set include: headword identification, weight values of participles and richness of knowledge base attributes; a computing module, configured to perform linear weighted calculation on a plurality of screening features specified in the candidate feature set of each initial candidate search term to obtain the ranking value of each initial candidate search term; and a selection module, configured to sort according to the ranking value of each initial candidate search term and select a predetermined number of initial candidate search terms to obtain a candidate search term set.

It should be noted here that the sub-modules included in the deriving module 423 implement the same examples and application scenarios as the steps of the first scheme and the second scheme in the first embodiment, but are not limited to the contents disclosed in the first embodiment. It should be noted that the modules described above as a part of the apparatus may be run in the computer terminal 10 provided in the first embodiment, and may be implemented by software or hardware.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, a division of a unit is merely a division of a logic function, and an actual implementation may have another division, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims

1. A method for processing a search request based on a search engine is characterized by comprising the following steps:

receiving a current search term input by a user;

selecting a search term used by a user from a historical search log, and acquiring at least one candidate search term corresponding to the current search term;

taking a historical search term set and search behavior information in the historical search log as training samples to perform offline training, and establishing a prediction model of user behavior;

using the prediction model to carry out correlation check of the candidate search term and the user behavior on the candidate search term corresponding to the current search term;

according to the relevance verification, taking candidate search terms meeting set conditions as recommended search terms corresponding to the current search terms, and generating a recommended search term set corresponding to the current search terms;

wherein performing the relevance check between the candidate search term and the user behavior on the candidate search term corresponding to the current search term by using the prediction model comprises: matching the candidate search terms with the feedback information of the search behavior determined by the prediction model, and extracting the candidate search terms with a matching degree greater than or equal to a threshold value as recommended search terms.

2. The method of claim 1, wherein the step of establishing a predictive model of user behavior by performing offline training using the historical search term set and search behavior information in the historical search logs as training samples comprises:

reading the historical search logs, and acquiring the historical search item set in a preset time period and the search behavior information of each historical search item in the historical search item set;

extracting at least one historical search term matching pair < historical search term A, historical search term B > from the historical search term set, wherein the segmented word set contained in the historical search term B is a proper subset of the segmented word set contained in the historical search term A;

extracting the features of the historical search term matching pair < historical search term A, historical search term B > and the features of the historical search term B in the historical search term matching pair, and combining them to generate a feature set;

extracting the search behavior information of the historical search term B from the search behavior information of each historical search term in the historical search term set;

generating the training sample according to the feature set and the search behavior information of the historical search term B;

and performing model training on the training sample using a logistic regression (LR) model to generate the prediction model.
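
The training flow of claim 2 can be sketched as follows, under stated assumptions. Word segmentation is replaced by whitespace splitting, the feature set is reduced to three toy features, the behavior label is a hypothetical click flag for historical search term B, and the LR model is fitted with plain batch gradient descent. None of these choices is specified by the patent, and all function and variable names are illustrative.

```python
import math
from itertools import permutations


def tokens(term):
    """Segment a search term into a word set (whitespace split as a stand-in)."""
    return frozenset(term.split())


def matching_pairs(history):
    """Return pairs (A, B) where B's word set is a proper subset of A's."""
    return [(a, b) for a, b in permutations(history, 2) if tokens(b) < tokens(a)]


def feature_set(a, b):
    """Toy features: bias, share of A's words kept in B, word count of B."""
    return [1.0, len(tokens(b)) / len(tokens(a)), float(len(tokens(b)))]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def log_loss(weights, samples):
    """Mean logistic loss of the weights over (features, label) samples."""
    total = 0.0
    for x, y in samples:
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(samples)


def train_lr(samples, epochs=500, learning_rate=0.1):
    """Fit logistic regression weights by batch gradient descent."""
    weights = [0.0] * len(samples[0][0])
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for x, y in samples:
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for i, xi in enumerate(x):
                gradient[i] += error * xi
        weights = [w - learning_rate * g / len(samples)
                   for w, g in zip(weights, gradient)]
    return weights


history = ["red cotton dress", "red dress", "dress", "blue shoes"]
clicked = {"red dress": 1, "dress": 0}  # hypothetical behavior label of each B

pairs = matching_pairs(history)
samples = [(feature_set(a, b), clicked[b]) for a, b in pairs]
weights = train_lr(samples)
print(pairs)
print(log_loss([0.0, 0.0, 0.0], samples), "->", log_loss(weights, samples))
```

In practice an off-the-shelf LR implementation and the richer features listed in claim 3 would likely replace the toy pieces here; the sketch only shows how matching pairs, features and behavior labels combine into training samples.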

3. The method of claim 2, wherein the features of the historical search term matching pair < historical search term A, historical search term B > comprise any one or more of the following: behavior features, text features, knowledge base attribute features and statistical features; and the features of the historical search term B comprise any one or more of the following: historical statistical features, word segmentation combination features, text features, knowledge base attribute features and part-of-speech features.

4. The method of claim 1, wherein after receiving the current search term input by the user, the method further comprises:

querying historical behavior information corresponding to the current search term from a historical search log;

executing the step of acquiring at least one candidate search term corresponding to the current search term when the historical behavior information corresponding to the current search term meets a recommendation condition, wherein the recommendation condition includes any one or more of the following: the click behavior frequency in the historical behavior information corresponding to the current search term is less than a preset click frequency; the number of commodities retrieved by the current search term is less than a preset number; the relevance of the commodities retrieved by the current search term is less than a predetermined value.
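
The gating logic of claim 4 can be sketched as a simple predicate. This sketch assumes all three conditions are enabled and that meeting any single one triggers candidate generation, which is only one possible reading of "any one or more". The `HistoricalBehavior` record, its field names, and the default limits are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HistoricalBehavior:
    """Hypothetical summary of the log entries for one search term."""
    click_frequency: int     # number of click behaviors recorded for the term
    commodity_count: int     # number of commodities the term retrieved
    result_relevance: float  # relevance of the retrieved commodities, 0..1


def meets_recommendation_condition(
    behavior: HistoricalBehavior,
    preset_click_frequency: int = 10,
    preset_commodity_count: int = 20,
    preset_relevance: float = 0.6,
) -> bool:
    """True if at least one of the three conditions holds (assumed OR logic)."""
    return (
        behavior.click_frequency < preset_click_frequency
        or behavior.commodity_count < preset_commodity_count
        or behavior.result_relevance < preset_relevance
    )


few_results = HistoricalBehavior(click_frequency=50, commodity_count=3, result_relevance=0.9)
healthy = HistoricalBehavior(click_frequency=50, commodity_count=400, result_relevance=0.9)
print(meets_recommendation_condition(few_results))
print(meets_recommendation_condition(healthy))
```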

5. The method of claim 1, wherein selecting a search term used by a user from a historical search log and acquiring at least one candidate search term corresponding to the current search term comprises:

extracting search terms contained in the current search term from the historical search log to obtain an initial search term set containing at least one initial candidate search term;

and generating the at least one candidate search term from the at least one initial candidate search term to obtain a candidate search term set.

6. The method of claim 5, wherein generating the at least one candidate search term from the at least one initial candidate search term to obtain a set of candidate search terms comprises:

directly saving the at least one initial candidate search term to form the at least one candidate search term; or,

performing feature screening on each initial candidate search term, wherein the feature screening includes: acquiring a candidate feature set corresponding to each initial candidate search term, wherein the screening features in the candidate feature set comprise: a central word identification, a segmented word weight value and a knowledge base attribute richness; performing linear weighted calculation on the plurality of screening features specified in the candidate feature set of each initial candidate search term to obtain a ranking value of each initial candidate search term; and sorting the initial candidate search terms by ranking value and selecting a predetermined number of initial candidate search terms to obtain the candidate search term set.
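
The screening branch of claim 6 amounts to a weighted sum followed by a top-N cut, which can be sketched as below. The feature names (`has_central_word`, `word_weight`, `kb_richness`), the feature values, the weights, and the function name `screen_candidates` are illustrative assumptions; the patent names the three screening features but gives no concrete values or weights.

```python
from typing import Dict, List


def screen_candidates(
    feature_sets: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    top_n: int,
) -> List[str]:
    """Rank candidates by a linear weighted sum of their features, keep top_n."""
    ranking_values = {
        term: sum(weights[name] * features[name] for name in weights)
        for term, features in feature_sets.items()
    }
    ranked = sorted(ranking_values, key=ranking_values.get, reverse=True)
    return ranked[:top_n]


# Hypothetical screening features for three initial candidate search terms.
initial_candidates = {
    "red dress": {"has_central_word": 1.0, "word_weight": 0.6, "kb_richness": 0.5},
    "red cotton": {"has_central_word": 0.0, "word_weight": 0.7, "kb_richness": 0.5},
    "dress": {"has_central_word": 1.0, "word_weight": 0.4, "kb_richness": 0.25},
}
# Hypothetical weights of the linear combination.
screening_weights = {"has_central_word": 0.5, "word_weight": 0.3, "kb_richness": 0.2}

candidate_set = screen_candidates(initial_candidates, screening_weights, top_n=2)
print(candidate_set)
```

With these toy numbers the candidate lacking the central word is ranked last and dropped, which is the kind of effect the central word feature is presumably meant to have.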

7. An apparatus for processing a search request based on a search engine, comprising:

a receiving module, configured to receive a current search term input by a user;

an acquisition module, configured to select a search term used by a user from a historical search log and acquire at least one candidate search term corresponding to the current search term;

a model establishing module, configured to perform offline training by taking a historical search term set and search behavior information in the historical search log as training samples, and to establish a prediction model of user behavior;

a checking module, configured to use the prediction model to perform, on the candidate search term corresponding to the current search term, a relevance check between the candidate search term and the user behavior;

a generating module, configured to take, according to the relevance check, candidate search terms meeting a set condition as recommended search terms corresponding to the current search term, and to generate a recommended search term set corresponding to the current search term;

wherein the checking module is further configured to match the candidate search term against the feedback information of the search behavior determined by the prediction model, and to extract candidate search terms whose matching degree is greater than or equal to a threshold value as recommended search terms.

8. The apparatus of claim 7, wherein the model building module comprises:

a reading module, configured to read the historical search log and acquire the historical search term set within a preset time period and the search behavior information of each historical search term in the historical search term set;

a search term extraction module, configured to extract at least one historical search term matching pair < historical search term A, historical search term B > from the historical search term set, where the segmented word set contained in the historical search term B is a proper subset of the segmented word set contained in the historical search term A;

a feature extraction module, configured to extract the features of the historical search term matching pair < historical search term A, historical search term B > and the features of the historical search term B in the historical search term matching pair, and to combine them to generate a feature set;

a search behavior extraction module, configured to extract the search behavior information of the historical search term B from the search behavior information of each historical search term in the historical search term set;

a combination module, configured to generate the training sample according to the feature set and the search behavior information of the historical search term B;

and a training module, configured to perform model training on the training sample using a logistic regression (LR) model to generate the prediction model.

9. The apparatus of claim 8, wherein the features of the historical search term matching pair < historical search term A, historical search term B > comprise any one or more of the following: behavior features, text features, knowledge base attribute features and statistical features; and the features of the historical search term B comprise any one or more of the following: historical statistical features, word segmentation combination features, text features, knowledge base attribute features and part-of-speech features.

10. The apparatus of claim 7, further comprising:

a query module, configured to query historical behavior information corresponding to the current search term from the historical search log;

an operation module, configured to execute the step of acquiring at least one candidate search term corresponding to the current search term when the historical behavior information corresponding to the current search term meets a recommendation condition, where the recommendation condition includes any one or more of the following: the click behavior frequency in the historical behavior information corresponding to the current search term is less than a preset click frequency; the number of commodities retrieved by the current search term is less than a preset number; the relevance of the commodities retrieved by the current search term is less than a predetermined value.

11. The apparatus of claim 7, wherein the acquisition module comprises:

an extraction module, configured to extract search terms contained in the current search term from the historical search log to obtain an initial search term set;

an obtaining module configured to generate the at least one candidate search term according to at least one initial candidate search term, so as to obtain a candidate search term set.

12. The apparatus of claim 11, wherein the obtaining module comprises:

a saving module for directly saving the at least one initial candidate search term to form the at least one candidate search term; or,

a screening processing module, configured to perform feature screening on each initial candidate search term, where the screening processing module includes: a sub-obtaining module, configured to obtain a candidate feature set corresponding to each initial candidate search term, where the screening features in the candidate feature set include: a central word identification, a segmented word weight value and a knowledge base attribute richness; a calculation module, configured to perform linear weighted calculation on the plurality of screening features specified in the candidate feature set of each initial candidate search term to obtain a ranking value of each initial candidate search term; and a selection module, configured to sort the initial candidate search terms by ranking value and select a predetermined number of initial candidate search terms to obtain the candidate search term set.