CN112699277A - Data detection method and device - Google Patents

Info

Publication number
CN112699277A
Authority
CN
China
Prior art keywords
information
target object
search behavior
query
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911013243.8A
Other languages
Chinese (zh)
Inventor
贺国秀
康杨杨
蒋卓人
孙常龙
张琼
司罗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201911013243.8A
Publication of CN112699277A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9035 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0202 Market predictions or forecasting for commercial activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements
    • G06Q30/0251 Targeted advertisements
    • G06Q30/0255 Targeted advertisements based on user history
    • G06Q30/0256 User search

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Accounting & Taxation (AREA)
  • Development Economics (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data detection method and device. The method comprises: acquiring search behavior information, where the search behavior information indicates the query records entered for a target object; generating a detection model from the search behavior information and candidate object information; and using the detection model to process the target object information within a preset period, together with the corresponding user search behavior information, to predict a detection result. The invention addresses the technical problem of low detection efficiency caused by the limitations of prior-art techniques for detecting concealed commodities.

Description

Data detection method and device
Technical Field
The invention relates to the technical field of internet, in particular to a data detection method and device.
Background
Because of the open nature of retail platforms, merchants can list concealed commodities that have illegal characteristics. Such listings typically describe the commodity with synonyms, or even with words entirely unrelated to it, in order to evade the platform's prevention and control mechanisms. Experienced buyers can nevertheless pick these commodities out of a large number of similar listings through their own search skills, so the commodities still spread to some extent. Because this spread seriously harms both the platform and the shopping experience of its users, the platform urgently needs a more intelligent and efficient way to find concealed commodities as early as possible.
One existing detection method is based on keyword interception and manual review. Business experts compile a dictionary of sensitive commodity terms from the characteristics of concealed commodities and from confirmed concealed-commodity information. Listings intercepted by the sensitive keywords are then verified manually, and the keywords in the dictionary are adjusted and extended as the business changes. Although this method quickly intercepts most concealed commodities, it is inflexible, lags behind sellers' behavior, struggles with new words or new descriptions coined by sellers, and is prone to misjudgment.
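The keyword interception approach, and the weakness noted above, can be illustrated with a minimal sketch. The dictionary contents, the sample titles and the function name intercept are hypothetical and chosen only for illustration; a real sensitive dictionary would be maintained by business experts.

```python
from typing import Iterable

# Hypothetical sensitive dictionary (illustrative entries only).
SENSITIVE_WORDS = frozenset({"cold medicine", "painkiller"})


def intercept(title: str, dictionary: Iterable[str] = SENSITIVE_WORDS) -> bool:
    """Return True if the commodity title contains any sensitive keyword."""
    lowered = title.lower()
    return any(word in lowered for word in dictionary)


titles = [
    "Fast-acting cold medicine, 12 tablets",    # described directly
    "Winter comfort capsules for sleepy days",  # evasive description
]

# Only listings whose text literally contains a dictionary entry are caught.
flagged = [t for t in titles if intercept(t)]
```

In this sketch the second title passes the filter because it avoids every dictionary entry, which is the kind of evasion the text attributes to sellers of concealed commodities.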
Another existing detection method is machine learning based on hand-crafted features: a machine learning classification model is trained on features derived from the commodity's text so as to capture the semantics of that text. However, because the text of a concealed commodity is either obscure or closely resembles that of a normal commodity, such models have difficulty capturing useful semantics. In addition, these models require a large training corpus, while the commodity's own data, such as its title or description, is slow to reflect changes; the trained model is therefore always an "old" model relative to the latest scenario, and its online use does not meet business requirements well.
A further existing detection method is based on deep learning. Its flow is similar to that of the hand-crafted-feature machine learning method, except that the features of the input text are learned automatically by a deep learning model.
No effective solution has yet been proposed for the problem that, in the prior art, the limitations of these techniques lead to low efficiency when detecting concealed commodities.
Disclosure of Invention
The embodiment of the invention provides a data detection method and device, which at least solve the technical problem of low detection efficiency caused by the defects of the technology in the process of carrying out data detection on hidden commodities in the prior art.
According to an aspect of an embodiment of the present invention, there is provided a data detection method, including: acquiring search behavior information, wherein the search behavior information is used for indicating a query record input by a target object; generating a detection model according to the search behavior information and the information of the object to be selected; and processing the target object information in the preset period and the user searching behavior information corresponding to the target object information according to the detection model, and predicting to obtain a detection result.
Optionally, acquiring the search behavior information includes: acquiring the record information of each user's determined target object; extracting the user's query records from the record information; acquiring the query sequence in the query records and the candidate object sequence corresponding to the query sequence; and obtaining the search behavior information from the query sequence and the candidate object sequence.
Optionally, generating the detection model according to the search behavior information and the candidate object information includes: performing vector calculation according to query data in the search behavior information to obtain a query vector matrix; performing vector calculation according to the data of the objects to be selected in the information of the objects to be selected to obtain a vector matrix of the objects to be selected; and generating a detection model according to the query vector matrix and the vector matrix of the object to be selected.
Further, optionally, the method further includes: obtaining semantic information and intention information from the query vector matrix and the candidate object vector matrix; obtaining vectors from the semantic information and the intention information; and concatenating the semantic vector and the intention vector to obtain the label.
Optionally, the method further includes: taking the concatenation of the forward and backward states of the intention information at their last time points as the latent intention.
Optionally, the method further includes: acquiring the last latent semantic state in the semantic information; calculating the similarity between the last latent semantic state and each of the remaining latent semantic states; and pooling all semantic states according to the similarity.
Optionally, the processing, according to the detection model, the target object information in the preset period and the search behavior information of the user corresponding to the target object information, and predicting to obtain the detection result includes: acquiring target object information in a preset period; extracting search behavior information of a user corresponding to the target object information; detecting the target information and the search behavior information according to the detection model, and acquiring the number of target objects which do not meet preset detection conditions in the target object information; and taking the number of the target objects as a detection result.
Optionally, the method further includes: and recommending the commodities according to the popular phrases in the specific time period.
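The optional steps above that form a latent intention by concatenation and pool semantic states by similarity can be sketched as follows. This is only an illustration under stated assumptions: the text does not specify the similarity function or the pooling rule, so dot-product similarity normalized with a softmax and a weighted sum are assumed here, the last state is included in its own pooling, and the function names similarity_pool and latent_intent are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def similarity_pool(states: List[Vector]) -> Vector:
    """Pool all states, weighting each by its similarity to the last state.

    Assumption: dot-product similarity and a softmax-weighted sum.
    """
    last = states[-1]
    weights = softmax([dot(last, s) for s in states])
    return [sum(w * s[i] for w, s in zip(weights, states))
            for i in range(len(last))]


def latent_intent(forward_states: List[Vector],
                  backward_states: List[Vector]) -> Vector:
    """Concatenate the final forward state and the final backward state.

    Assumption: each list is given in processing order, so [-1] is the
    state at the last time point of that direction.
    """
    return forward_states[-1] + backward_states[-1]


states = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
pooled = similarity_pool(states)
intent = latent_intent([[0.1, 0.2]], [[0.3, 0.4]])
```

States that resemble the last state receive larger weights, so the pooled vector here leans toward the first coordinate.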
According to another aspect of the embodiments of the present invention, there is also provided a data detection method, including: acquiring all search behavior information on an online trading platform, wherein the search behavior information is used for indicating a query record input by a target object; generating a detection model according to the search behavior information and the information of the object to be selected; and processing the target object information and the search behavior information corresponding to the target object information according to the detection model, and predicting to obtain a detection result.
According to an aspect of another embodiment of the present invention, there is also provided an apparatus for data detection, including: the acquisition module is used for acquiring search behavior information, wherein the search behavior information is used for indicating the query record input by the target object; the model generation module is used for generating a detection model according to the search behavior information and the information of the object to be selected; and the detection module is used for processing the target object information in the preset period and the search behavior information of the user corresponding to the target object information according to the detection model and predicting to obtain a detection result.
According to another aspect of another embodiment of the present invention, there is also provided an apparatus for data detection, including: the crawling module is used for acquiring all searching behavior information on the online trading platform, wherein the searching behavior information is used for indicating the query records input by the target object; the model generation module is used for generating a detection model according to the search behavior information and the information of the object to be selected; and the detection module is used for processing the target object information and the search behavior information corresponding to the target object information according to the detection model and predicting to obtain a detection result.
According to an aspect of still another embodiment of the present invention, there is further provided a storage medium including a stored program, where the apparatus on which the storage medium is located is controlled to execute the above-mentioned data detection method when the program runs.
According to an aspect of still another embodiment of the present invention, there is further provided an apparatus for data detection, including a storage medium and a processor, where the processor is configured to execute a program stored in the storage medium, where the program executes the method for data detection.
In the embodiments of the invention, the search behavior information of users who search for concealed commodities is introduced: search behavior information is acquired, where the search behavior information indicates the query records entered for a target object; a detection model is generated from the search behavior information and the candidate object information; and the detection model processes the target object information within the preset period, together with the corresponding user search behavior information, to predict a detection result. This avoids the missed detections that occur when a seller deliberately evades the trading platform's data detection rules, improves the detection of concealed commodities, and thereby solves the technical problem of low detection efficiency caused by the limitations of prior-art techniques for detecting concealed commodities.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a block diagram of the hardware structure of a computer terminal (or mobile device) for the data detection method according to embodiment 1 of the present invention;
FIG. 2 is a flow chart of a method of data detection according to embodiment 1 of the present invention;
FIG. 3 is a schematic diagram of the detection of concealed merchandise based on the original recurrent neural network and the tree pruning mechanism according to embodiment 1 of the present invention;
FIG. 4 is a flow chart of a method of data detection according to embodiment 2 of the present invention;
FIG. 5 is a schematic diagram of an apparatus for data detection according to embodiment 3 of the present invention;
FIG. 6 is a schematic diagram of an apparatus for data detection according to embodiment 4 of the present invention;
FIG. 7 is a block diagram of a computer terminal according to embodiment 5 of the present application.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The technical terms related to the embodiments of the present application are as follows:
User search behavior: the efforts a user makes before purchasing a commodity, such as submitting queries and clicking to view commodities.
Berrypicking model: an information-seeking behavior model in which the user, in order to find results that match his or her intent, iteratively revises the query according to the results returned by the search engine.
Recurrent Neural Network (RNN): an artificial neural network that has a tree-like hierarchical structure and whose nodes process the input information recursively in the order in which they are connected.
Tree pruning mechanism: to simplify a decision tree model and avoid overfitting, some subtrees or leaf nodes are cut from the model, and the root node of each removed subtree (or the parent of each removed leaf) becomes a new leaf node.
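As a rough illustration of the pruning idea defined above, the sketch below cuts every subtree below a chosen depth and turns its root into a leaf. The depth-based criterion, the Node class and the function names are assumptions made for this example; the text does not state which pruning criterion the embodiment uses.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    label: str  # class label stored at this node (illustrative)
    children: Dict[str, "Node"] = field(default_factory=dict)

    def is_leaf(self) -> bool:
        return not self.children


def prune(node: Node, max_depth: int, depth: int = 0) -> Node:
    """Return a pruned copy: subtrees below max_depth are cut and their
    roots are kept as new leaf nodes (assumed depth-based criterion)."""
    if depth >= max_depth:
        return Node(node.label)
    return Node(node.label,
                {key: prune(child, max_depth, depth + 1)
                 for key, child in node.children.items()})


def count_nodes(node: Node) -> int:
    return 1 + sum(count_nodes(c) for c in node.children.values())


root = Node("normal", {
    "a": Node("concealed", {"x": Node("concealed"), "y": Node("normal")}),
    "b": Node("normal"),
})
pruned = prune(root, max_depth=1)
```

The original tree is left untouched; the pruned copy keeps the root and its two children, and the former subtree root "a" is now a leaf.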
Example 1
In accordance with an embodiment of the present invention, there is provided a method embodiment of data detection, it being noted that the steps illustrated in the flowchart of the figure may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than here.
The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking the example of being operated on a computer terminal, fig. 1 is a hardware structure block diagram of a computer terminal of a data detection method according to an embodiment of the present invention. As shown in fig. 1, the computer terminal 10 may include one or more (only one shown) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 104 for storing data, and a transmission module 106 for communication functions. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
The memory 104 may be used to store software programs and modules of application software, such as program instructions/modules corresponding to the data detection method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, implements the data detection method of the application program. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission module 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission module 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission module 106 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
In this operating environment, the application provides the data detection method shown in FIG. 2. FIG. 2 is a flowchart of the data detection method according to embodiment 1 of the present invention. As shown in FIG. 2, the method includes the following steps:
Step S202, search behavior information is obtained, wherein the search behavior information is used for indicating the query record input by the target object.
In the above step S202, the search behavior information may represent a user search behavior, that is, an effort made by the user before generating a behavior of purchasing the target object, such as submitting a query, clicking to browse the object to be selected, and the like; the target object may be an item that the user finally purchases.
For example, e-commerce platform A prohibits the buying and selling of all kinds of medicines. Sellers on platform A nevertheless describe medicines with other vocabulary in order to evade the platform's data detection rules and still receive payment. A user who wants to buy cheap cold medicine on platform A will not find any commodity by searching for "cold medicine" directly, so the user's search behavior information instead consists of queries such as "cold", "headache" and the like.
Note that not every user search ends in the purchase of a target object. The user search behavior therefore includes not only the query records in which the target object was found and purchased, but also the query records in which the target object was not found, and even the records in which a query was submitted but no commodity was clicked.
And step S204, generating a detection model according to the search behavior information and the information of the object to be selected.
In the above step S204, the detection model may be a machine learning model, such as a recurrent neural network.
A recurrent neural network is an artificial neural network that has a tree-like hierarchical structure and whose nodes process the input information recursively in the order in which they are connected. Its inputs may be related to one another: when x_1, x_2, x_3, … are input over multiple steps, the intermediate information from each step is stored and passed on to the next step. The result output at each intermediate step is not necessarily the target result and need not be used; only the final output is the required prediction result.
Since there is often a correlation between the previous search behavior information and the next search behavior information of the user, the recurrent neural network is particularly suitable for natural language processing.
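The way intermediate information is carried from one input to the next can be sketched with a deliberately tiny recurrent update. The scalar state, the tanh activation and the weight values w_x and w_h are assumptions chosen to keep the example short; they are not the parameters or the architecture of the embodiment.

```python
import math
from typing import List, Tuple


def rnn_step(x: float, h_prev: float, w_x: float, w_h: float) -> float:
    """One recurrent update: h_t = tanh(w_x * x_t + w_h * h_{t-1})."""
    return math.tanh(w_x * x + w_h * h_prev)


def encode(inputs: List[float], w_x: float = 0.5,
           w_h: float = 0.8) -> Tuple[List[float], float]:
    """Feed x_1, x_2, ... in order; return all states and the final one."""
    h = 0.0
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_x, w_h)  # the stored state is passed on
        states.append(h)
    return states, h


states, final_state = encode([1.0, 0.0, -1.0])
```

Only final_state would be used for prediction in this sketch; the earlier entries of states are the intermediate results that the text says need not be used.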
Specifically, after acquiring the users' search behavior information for the target object, the platform can model it as a Berrypicking search information tree. For each purchase record, the corresponding user search behavior is extracted; the user is taken as the root, the queries submitted by the user as the branches, and the sequence of commodities clicked under each query as the leaves of that branch. Once the Berrypicking search information tree has been constructed, it is encoded with a recurrent neural network.
By mining the Berrypicking search information tree of each purchase record, the user's behavior before purchase is modeled and used to assist in detecting concealed commodities.
It should be noted that the search behavior information may come from the query records of concealed commodities that the platform has already detected, and may also be sampled at random from the query records of normal commodities; the former serve as positive samples and the latter as negative samples for training, which makes the trained model more accurate.
And step S206, processing the target object information in the preset period and the search behavior information of the user corresponding to the target object information according to the detection model, and predicting to obtain a detection result.
In the above step S206, the preset period may be set according to commodity attributes of the target object, such as update time, transaction volume and the group of people harmed; for example, it may be one week or one month.
Although the seller can circumvent the prevention and control mechanism of the platform by changing the description information of the goods, the seller cannot control the search behavior of the buyer. The first query of the buyer is usually a direct description of the concealed goods, and in case of non-ideal search results, the buyer will continuously modify its query to find its target goods. Therefore, a great deal of information is contained in the query sequence and the clicked commodity sequence of the buyer for mining.
Specifically, combining steps S202 to S206 and continuing the example in which e-commerce platform A prohibits the buying and selling of all kinds of medicines: as described above, a user who searches for "cold medicine" directly will not find any commodity. The user's query may therefore become "cool", but the candidate commodities returned for "cool" are sweat towels, air-conditioner wind deflectors, summer quilts and the like. The user clicks on none of them and goes on to enter further query words, such as "headache" and "drowsiness", each of which corresponds to multiple candidate commodities. The cold medicine is finally found among the commodities returned for "drowsiness". In the background, a detection model is generated from the search behavior information and the candidate object information; the model then makes predictions on suspected cold-medicine commodities listed within a period of time and on the corresponding user search behavior information, yielding detection results for those commodities, and a final manual review determines whether each commodity is in fact cold medicine.
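The prediction step over a preset period can be sketched as below. Everything specific here is an assumption for illustration: toy_score is a hypothetical stand-in for the trained detection model, the field names id, listed_on and queries are invented, and the dates, threshold and seven-day period are sample values only.

```python
from datetime import date, timedelta

# Hypothetical queries associated with the evasive searches in the example.
SYMPTOM_QUERIES = {"cool", "headache", "drowsiness"}


def toy_score(obj: dict) -> float:
    """Stand-in for the detection model: share of suspicious queries."""
    queries = obj["queries"]
    if not queries:
        return 0.0
    return sum(1 for q in queries if q in SYMPTOM_QUERIES) / len(queries)


def detect(objects, score, today, period_days=7, threshold=0.5):
    """Score objects listed within the preset period and count those that
    fail the (assumed) detection condition score < threshold."""
    start = today - timedelta(days=period_days)
    recent = [o for o in objects if start <= o["listed_on"] <= today]
    flagged = [o["id"] for o in recent if score(o) >= threshold]
    return len(flagged), flagged


objects = [
    {"id": "a", "listed_on": date(2024, 1, 5),
     "queries": ["cool", "headache", "drowsiness"]},
    {"id": "b", "listed_on": date(2024, 1, 6), "queries": ["summer quilt"]},
    {"id": "c", "listed_on": date(2023, 12, 1), "queries": ["headache"]},
]
count, flagged_ids = detect(objects, toy_score, today=date(2024, 1, 8))
```

The returned count plays the role of the detection result, and the flagged identifiers are what would be handed to manual review in the scenario described above.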
In the embodiments of the invention, the search behavior information of users who search for concealed commodities is introduced: search behavior information is acquired, where the search behavior information indicates the query records entered for a target object; a detection model is generated from the search behavior information and the candidate object information; and the detection model processes the target object information within the preset period, together with the corresponding user search behavior information, to predict a detection result. This avoids the missed detections that occur when a seller deliberately evades the trading platform's data detection rules, improves the detection of concealed commodities, and thereby solves the technical problem of low detection efficiency caused by the limitations of prior-art techniques for detecting concealed commodities.
Optionally, the step S202 of obtaining the search behavior information includes:
Step S2021, acquire the record information of each user's determined target object.
In the above step S2021, the recorded information may be information of at least one target object purchased by the user.
Step S2022, extracts the query record of the user in the record information.
In the above step S2022, the query record may be a series of operations before the user queries the target object, such as inputting a query term, clicking one of the commodities corresponding to the query term, and the like.
Step S2023, obtain the query sequence in the query record and the sequence of the object to be selected in the query sequence.
In the above step S2023, the query sequence may be a sequence formed by all query terms before the user queries the target object; the sequence of the objects to be selected may be a sequence of the objects to be selected corresponding to each query term.
Step S2024, obtaining searching behavior information according to the query sequence and the object sequence to be selected.
And for each purchase record, extracting corresponding user search behaviors, taking the user as a root, taking all query sequences submitted by the user as branches, and taking commodity sequences clicked correspondingly by each query as corresponding leaves. The user search behavior information for each purchase record is organized in the shape of a tree.
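The tree organization described above (user as root, queries as branches, clicked commodities as leaves) can be sketched with two small data classes. The class names, the event-log format and the item identifiers are hypothetical; the text does not specify how the records are stored.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class QueryBranch:
    query: str
    clicked_items: List[str] = field(default_factory=list)  # leaves


@dataclass
class SearchTree:
    user_id: str  # root
    branches: List[QueryBranch] = field(default_factory=list)


def build_search_tree(user_id: str,
                      log: List[Tuple[str, Optional[str]]]) -> SearchTree:
    """Build the tree for one purchase record.

    log is an ordered list of (query, clicked_item) events, where
    clicked_item is None when the query led to no click.
    """
    tree = SearchTree(user_id)
    for query, item in log:
        # Start a new branch whenever the user submits a different query.
        if not tree.branches or tree.branches[-1].query != query:
            tree.branches.append(QueryBranch(query))
        if item is not None:
            tree.branches[-1].clicked_items.append(item)
    return tree


log = [
    ("cool", None),
    ("headache", "item_17"),
    ("drowsiness", "item_42"),
    ("drowsiness", "item_43"),
]
tree = build_search_tree("user_1", log)
```

A query with no clicks, such as "cool" here, still becomes a branch with an empty leaf list, which matches the remark that queries without clicks are part of the search behavior.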
Optionally, the step S204 of generating a detection model according to the search behavior information and the candidate object information includes:
step S20401, vector calculation is carried out according to the query data in the search behavior information, and a query vector matrix is obtained.
Step S20402, vector calculation is carried out according to the data of the objects to be selected in the information of the objects to be selected, and a vector matrix of the objects to be selected is obtained.
Step S20403, a detection model is generated according to the query vector matrix and the candidate vector matrix.
In the above-mentioned steps S20401 to S20403, the query data and the object data to be selected in the search behavior information are subjected to word segmentation, all the occurring words are counted to make a dictionary, then each word segmentation is mapped to a corresponding ID, and a corresponding word embedding vector is found by the ID of each word segmentation, so that each query data and object data to be selected can be represented as a matrix formed by word vectors, that is, a query vector matrix and an object vector matrix to be selected.
It should be noted that the word embedding vector in the above method can be obtained by pre-training or direct random initialization. In addition, BERT may be used to enhance the expression of each word in addition to the traditional word vectors.
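The segmentation, dictionary, ID mapping, and embedding lookup described above can be sketched as follows. Whitespace splitting stands in for a real word segmenter, the embeddings are randomly initialized (one of the two options mentioned above), and all function names and the embedding dimension are illustrative assumptions.

```python
import random


def segment(text):
    # Assumption: whitespace splitting stands in for a real word segmenter.
    return text.lower().split()


def build_vocab(segmented_texts):
    """Count every word that occurs and map it to an integer ID (0 is reserved for unknown words)."""
    vocab = {"<unk>": 0}
    for tokens in segmented_texts:
        for token in tokens:
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def init_embeddings(vocab_size, dim, seed=0):
    """Randomly initialized embedding table; pre-trained vectors could be loaded instead."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def to_matrix(tokens, vocab, embeddings):
    """Represent one query or commodity text as a matrix of word vectors."""
    return [embeddings[vocab.get(token, 0)] for token in tokens]


queries = ["cold medicine strong", "strong cold pills"]
segmented = [segment(q) for q in queries]
vocab = build_vocab(segmented)
embeddings = init_embeddings(len(vocab), dim=4)
query_matrices = [to_matrix(tokens, vocab, embeddings) for tokens in segmented]
```

The same three functions would be applied to the text of the objects to be selected to obtain the candidate object vector matrices.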
BERT (Bidirectional Encoder Representations from Transformers): in the embodiments of the present application, the BERT framework replaces the original tf.nn.embedding_lookup (conventional word vector lookup); the pre-trained BERT model is loaded, the fine-tune parameter is set to true, and the entire model is retrained.
Further, optionally, the method for detecting data provided in the embodiment of the present application further includes:
step S20404, semantic information and intention information are obtained according to the query vector matrix and the candidate vector matrix.
In step S20404, the semantic information may be hidden semantic information of the product, and the intention information may include forward potential user intention information and backward potential user intention information.
It should be noted that the hidden semantic information and the potential user intention information may be encoded and modeled by a recurrent neural network.
Step S20405, a vector is obtained according to the semantic information and the intention information.
In step S20405, the vector may be obtained by an average pooling method, or by an attention pooling method based on query information, where the primary purpose of the average pooling or attention pooling (weighted averaging according to attention) is to compress and extract information. The vectors corresponding to each branch are obtained and these vectors form a matrix. But to obtain the output, further processing of these matrices is required to recompress to a vector.
The importance of all branches in the average pooling is the same; attention pooling, which may be preferred in embodiments of the present application, is achieved by first calculating the attention distribution (i.e., weight distribution) of each branch and then performing a weighted averaging.
The query vector matrix and the candidate object vector matrix can be subjected to average pooling operation, namely, each query vector matrix and the candidate object vector matrix are subjected to vector averaging in the dimension of the word to obtain a new vector, and the new vector represents query information and commodity information respectively. Thus, each query can be represented as a vector, and the corresponding sequence of clicked items can be represented as a matrix. And for each branch, performing average pooling operation on the clicked commodity sequence, and performing attention pooling operation based on query information.
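The two pooling operations can be sketched as follows. The text above does not specify how attention scores are computed, so the dot product between the query vector and each item vector is used here as an assumption; the function names are illustrative as well.

```python
import math


def average_pool(matrix):
    """Average the row vectors of a matrix; every row has the same importance."""
    n = len(matrix)
    return [sum(column) / n for column in zip(*matrix)]


def softmax(scores):
    m = max(scores)                       # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(query_vec, item_vecs):
    """Weighted average of item vectors; weights come from the query (dot-product scoring is assumed)."""
    scores = [sum(q * x for q, x in zip(query_vec, item)) for item in item_vecs]
    weights = softmax(scores)
    dim = len(query_vec)
    pooled = [sum(w * item[d] for w, item in zip(weights, item_vecs)) for d in range(dim)]
    return pooled, weights


word_vectors = [[1.0, 2.0], [3.0, 4.0]]     # word vectors of one query
query_vec = average_pool(word_vectors)      # [2.0, 3.0]
item_vecs = [[1.0, 0.0], [0.0, 1.0]]        # pooled vectors of two clicked items
branch_vec, weights = attention_pool(query_vec, item_vecs)
```

Here the second item receives the larger weight because it aligns better with the query vector, whereas average pooling would weight both items equally.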
And step S20406, splicing is carried out according to the vectors of the semantic information and the intention information to obtain a label.
Each branch of the Berrypicking search information tree consists of a query vector matrix and an object vector matrix to be selected. For each branch, the query vector and the candidate object vector are combined in one of three ways: they are directly spliced, or a fully connected mapping is added after splicing, or a gate mechanism is used to fuse them.
In the embodiment of the application, direct splicing simply concatenates the two features to form a vector of larger dimension;
adding a fully connected mapping after splicing is equivalent to performing one round of information filtering and compression on the directly spliced result;
fusion with a gate mechanism assigns a different combination weight to each dimension of the two vectors, which fuses the two kinds of information in a more fine-grained way.
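The three combination options can be sketched as follows. The tanh activation in the fully connected variant and the convex form g*q + (1-g)*p of the gate are assumptions, since the text only states that the gate gives each dimension its own weight; the weights shown are placeholders, not learned values.

```python
import math


def concat(q, p):
    """Direct splicing: a vector of larger dimension."""
    return q + p


def dense(x, weights, bias):
    """One fully connected layer without activation; weights is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def fully_connected_fusion(q, p, weights, bias):
    """Splice, then map back to a smaller dimension (tanh activation is an assumption)."""
    return [math.tanh(v) for v in dense(concat(q, p), weights, bias)]


def gated_fusion(q, p, weights, bias):
    """Per-dimension gate g in (0, 1); output is g*q + (1-g)*p (one common gate form, assumed)."""
    gate = [1.0 / (1.0 + math.exp(-v)) for v in dense(concat(q, p), weights, bias)]
    return [g * qi + (1.0 - g) * pi for g, qi, pi in zip(gate, q, p)]


q = [1.0, 2.0]                               # query vector
p = [3.0, 4.0]                               # candidate object vector
zero_w = [[0.0] * 4 for _ in range(2)]       # placeholder parameters
zero_b = [0.0, 0.0]
spliced = concat(q, p)
mapped = fully_connected_fusion(q, p, zero_w, zero_b)
gated = gated_fusion(q, p, zero_w, zero_b)
```

With all-zero parameters every gate value is 0.5, so the gated result is the plain average of the two vectors; learned parameters would shift each dimension toward the query or the candidate object.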
In this way, the sequence of the whole branch can be represented as a matrix. Because a traditional recurrent neural network can output only one hidden state, that is, it cannot simultaneously output the hidden semantic information and the potential user intention information of the modeled sequence, this scheme proposes an original recurrent neural network neuron, shown as follows:
[Formula (1): equation images BDA0002244833570000091 to BDA0002244833570000097; the equations are not reproduced in the text]
In formula (1), W represents a parameter matrix, v represents a parameter vector, and b represents a bias vector; all three are values to be learned by the recurrent neural network model. z_t, r_t, and i_t are all computed with sigmoid functions and represent three gates: the semantic fusion gate, the intention fusion gate, and the interaction gate, respectively. h1 represents the hidden semantic information, h2 represents the potential user intention information, and x represents the current input. The symbol in Figure BDA0002244833570000101 represents the intermediate vector prepared to enhance the hidden semantic state; in the same way, the symbol in Figure BDA0002244833570000102 represents the intermediate vector prepared to enhance the implicit intention. The tanh(·) function is a nonlinear activation function that compresses its input to a number between -1 and 1; it is a standard neural network activation function.
Through the series of operations, the hidden semantic information and the potential user intention information of the previous state are better fused with the input of the current state.
According to the method from step S20404 to step S20406, the sequences of all branches, i.e. the above matrices, can be input into the original recurrent neural network of the present application, so as to obtain two corresponding output matrices. Similarly, using the recurrent neural network in both directions, four corresponding output matrices are obtained.
Optionally, the data detection method provided in the embodiment of the present application further includes:
step S20407, the concatenation of the last time point of the forward direction and the backward direction of the intention information is obtained as a potential intention.
Since the calculation result output by the recurrent neural network at each time step is not necessarily the target result, the intermediate outputs may go unused, and only the final output is the required prediction result. Therefore, the scheme takes the concatenation of the last time steps in the forward and backward directions of the potential user intention information as the input to the fully connected layers.
It is easy to note that the above method mines not only the surface semantics of the behaviors in a formalized way but also the potential intention information of the user. The vectors obtained from the hidden semantic information and the potential user intention information are spliced and then passed through two fully connected neural network layers to obtain the final label. The whole framework, from input to output, is an end-to-end neural network that takes cross entropy as the objective function and is trained with stochastic gradient descent.
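The output stage described above (splicing, two fully connected layers, cross-entropy objective) can be sketched as follows. The ReLU activation between the layers, the two-class output, and the parameter names `W1`, `b1`, `W2`, `b2` are assumptions; the training loop with stochastic gradient descent is omitted.

```python
import math


def dense(x, weights, bias):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(h_semantic, intent_fwd_last, intent_bwd_last, params):
    """Splice the semantic vector with the last forward/backward intention states, then apply two layers."""
    features = h_semantic + intent_fwd_last + intent_bwd_last
    hidden = [max(0.0, v) for v in dense(features, params["W1"], params["b1"])]  # ReLU is assumed
    return softmax(dense(hidden, params["W2"], params["b2"]))


def cross_entropy(probs, label):
    """Objective for one sample; label is the index of the true class."""
    return -math.log(probs[label])


params = {                                    # placeholder parameters, not learned values
    "W1": [[0.0] * 4 for _ in range(2)], "b1": [0.0, 0.0],
    "W2": [[0.0] * 2 for _ in range(2)], "b2": [0.0, 0.0],
}
probs = predict([0.2, -0.1], [0.5], [-0.3], params)
loss = cross_entropy(probs, label=1)
```

With untrained zero parameters both classes receive probability 0.5, so the loss equals log 2; training would lower this value on labelled samples.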
Optionally, the data detection method provided in the embodiment of the present application further includes:
step S20408, the last potential semantic state in the semantic information is obtained.
Step S20409, the similarity to the remaining potential semantic states is calculated according to the last potential semantic state.
Step S20410, performing pooling on all semantic states according to the similarity.
Taking the last potential semantic state as a basis, the similarity between it and each of the remaining potential semantic states is calculated and activated first by a logistic function such as the sigmoid function and then by softmax, so that the influence of branches that are completely irrelevant to the last potential semantic state is reduced. This similarity is then used to pool all potential semantic states. Specifically, the formula of the tree pruning mechanism is as follows:
H1l = copy(last(H1)),
pm = softmax(σ(similar(H1l, H1))),
h1* = pm^T · H1, (2)
In formula (2), H1 is the representation of the sequence of branches and leaves; H1l is the matrix, of the same length as H1, obtained by taking the last branch and copying it. In the second formula, the cosine similarity between H1 and H1l is calculated first and is then activated by a sigmoid function followed by a softmax function; pm is the resulting weight distribution of the pruning mechanism. The last formula computes a weighted sum of the representations of all branch-and-leaf sequences using the pruning weights to obtain the final representation h1* of all branches and leaves. In this way, the pruning mechanism can optimize the result of the recurrent neural network.
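Formula (2) can be traced step by step with the following sketch, which works on plain Python lists. The function name `prune_pool` and the handling of zero vectors in the cosine similarity are assumptions for the example.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0  # zero vectors: assumed 0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prune_pool(H1):
    """H1: one semantic state (row vector) per branch, in order."""
    H1l = [H1[-1]] * len(H1)                                  # H1l = copy(last(H1))
    sims = [cosine(last, state) for last, state in zip(H1l, H1)]
    pm = softmax([sigmoid(s) for s in sims])                  # pm = softmax(sigma(similar(H1l, H1)))
    dim = len(H1[0])
    h1_star = [sum(w * state[d] for w, state in zip(pm, H1)) for d in range(dim)]  # pm^T . H1
    return h1_star, pm


H1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]   # the second branch is unrelated to the last one
h1_star, pm = prune_pool(H1)
```

In this example the first and last branches receive equal weights and the unrelated second branch receives a smaller one, which is the damping effect the pruning mechanism is meant to have.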
Optionally, in step S206, the target object information in the preset period and the search behavior information of the user corresponding to the target object information are processed according to the detection model, and predicting to obtain the detection result includes:
in step S2061, target object information in a preset period is acquired.
In the above step S2061, the preset period, for example a week or a month, may also be set according to commodity attributes of the target object, such as update time, transaction amount, and the population that may be harmed. For example, the platform obtains the transaction information of all cold medicines in the last month.
step S2062, the search behavior information of the user corresponding to the target object information is extracted.
Step S2063, detecting the target information and the search behavior information according to the detection model, and acquiring the number of target objects which do not meet the preset detection condition in the target object information.
In step S2063, the preset detection condition may be that the semantic correlation between the target object and the target information is lower than a minimum threshold.
In step S2064, the number of target objects is set as the detection result.
In order to crack down on the spread of concealed commodities in real time, the platform first determines all commodities that had transactions in the several days before the current moment and then extracts the user search behavior data associated with those commodities. Since each commodity is generally purchased by multiple users, at prediction time the number of user search sequences flagged as problematic by the trained classifier is counted for each commodity, and the commodities are then submitted to the platform maintainers in ranked order. Because the search behavior data of users contains a large amount of information, detection efficiency can be greatly improved compared with prior-art models that consider only commodity information.
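The counting and ranking step can be sketched as follows. The record format, the function name `rank_commodities`, and the stand-in classifier are hypothetical; in practice the classifier would be the trained detection model.

```python
from collections import Counter


def rank_commodities(records, classifier):
    """records: (commodity_id, search_sequence) pairs; classifier returns True for a flagged sequence."""
    flagged = Counter()
    for commodity_id, search_sequence in records:
        if classifier(search_sequence):
            flagged[commodity_id] += 1
    return flagged.most_common()           # highest number of flagged sequences first


# Hypothetical records and a stand-in classifier.
records = [
    ("item-a", ["normal query"]),
    ("item-b", ["suspicious query"]),
    ("item-b", ["suspicious query", "normal query"]),
    ("item-a", ["suspicious query"]),
]
ranking = rank_commodities(records, lambda seq: "suspicious query" in seq)
```

Commodities that were never flagged do not appear in the ranking at all, so the list handed to the maintainers contains only candidates for review.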
In the process of predicting or actually making the commodity online, any commodity with a transaction record can be used as an input of the model to judge whether the commodity is an illegal commodity. The selected category can be set by the platform according to the inspection requirement.
Fig. 3 is a schematic diagram of detecting concealed merchandise based on an original recurrent neural network and a tree pruning mechanism according to the embodiment. As shown in fig. 3, after acquiring the search behavior information of the user for the target object, the platform may model it in the form of a Berrypicking search information tree. For each purchase record, the corresponding user search behavior is extracted, with the user as the root, all queries q submitted by the user as branches, and the commodities p clicked for each query as the corresponding leaves. In branch 3, the user only submitted a query but did not click on any goods; in this case, the corresponding leaf representation is simply empty. After the queries of branch 1 to branch 4, the user finds the target commodity (Figure BDA0002244833570000121) in the session of query q4 and purchases it. Therefore, the model no longer considers branches after branch 4.
After the user search behavior of each purchase record is organized into a Berrypicking search information tree, the tree is encoded using the bidirectional recurrent neural network bi-BPTRU. The sequences of all branches are input into the bidirectional recurrent neural network, and four corresponding output matrices are obtained: forward hidden semantic information, forward potential user intention information, backward hidden semantic information, and backward potential user intention information. The embodiment takes the concatenation of the last time steps in the forward and backward directions of the potential user intention as its final representation. Meanwhile, taking the last potential semantic state as a basis, the similarity between it and each of the remaining potential semantic states is calculated and activated first by a logistic function and then by softmax, which yields the weight distribution pm of the pruning mechanism and reduces the influence of branches that are completely irrelevant to the last potential semantic state. All potential semantic states are then pooled using these weights. Finally, the vectors obtained from the hidden semantic information and the potential user intention are spliced and passed through two fully connected neural network layers to obtain the final label. The whole process, from input to output, forms an end-to-end neural network framework that takes cross entropy as the objective function and is trained with stochastic gradient descent.
It should be noted that each branch in fig. 3 consists of a query and the commodity information clicked for that query; the whole tree is the sequence of these branches and leaves. In addition, the representations within each branch are shown in a basically uniform color.
In the embodiment of the invention, the search behavior information of users searching for concealed commodities is introduced: search behavior information is acquired, wherein the search behavior information is used for indicating the query record input by the target object; a detection model is generated according to the search behavior information and the information of the object to be selected; and the target object information in the preset period and the user search behavior information corresponding to the target object information are processed according to the detection model, and the detection result is obtained through prediction, which achieves the purpose of avoiding detection omissions caused by a seller intentionally evading the data detection rules of the transaction platform. In this scheme, the search behavior information is modeled as a Berrypicking search information tree, which enriches the features of the commodity; the tree is then encoded using an original recurrent neural network to mine the hidden semantic information and potential user intention information in it, and the encoding is optimized using a tree pruning mechanism. Meanwhile, because the seller cannot directly change the thinking and search behavior of the buyer, the information implied by the buyer's behavior is far richer than the commodity information alone. The technical effect of detecting concealed commodities is thereby improved, and the technical problem in the prior art of low detection efficiency when data detection is performed on concealed commodities is solved.
In addition, the method for detecting data provided by the embodiment of the application further comprises the following steps: and recommending the commodities according to the popular phrases in the specific time period.
Specifically, when a user enters a currently popular internet buzzword (that is, the search behavior information of the target object in the embodiment of the application) on the e-commerce transaction platform, the detection model recommends commodities on the basis of that buzzword, so that commodity recommendation gains a new dimension and its efficiency is improved.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
According to an embodiment of the present invention, there is also provided a data detection method, which can be applied to each online shopping platform, and fig. 4 is a flowchart of a data detection method according to embodiment 2 of the present invention, and as shown in fig. 4, the method may include the following steps:
step S402, acquiring all search behavior information on the online trading platform, wherein the search behavior information is used for indicating the query records input by the target object.
In the above step S402, the search behavior information may represent a user search behavior, that is, an effort made by the user before generating a behavior of purchasing the target object, such as submitting a query, clicking to browse the target object, and the like; the target object may be an item that the user finally purchases.
It is easy to note that not every user search behavior ends in the purchase of the target object, so the user search behavior includes not only records of queries that successfully found the target object and led to a purchase, but also records of queries that did not find the target object, and even records in which the user only submitted a query without clicking any goods.
And S404, generating a detection model according to the search behavior information and the information of the object to be selected.
In the above step S404, the detection model may be a machine learning model, such as a recurrent neural network.
The recurrent neural network is an artificial neural network that has a tree-like hierarchical structure and whose network nodes process the input information recursively in the order of their connections. Its inputs may be related to one another: the inputs x1, x2, x3, … are fed in over multiple time steps, and the intermediate information at each step is saved and passed on to the next step. The calculation result output at each step is not necessarily the target result and may go unused; only the final output is the desired prediction.
Since there is often a correlation between the previous search behavior information and the next search behavior information of the user, the recurrent neural network is particularly suitable for natural language processing.
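How intermediate information is carried from one input to the next, with only the final output kept as the prediction, can be illustrated by the following scalar sketch. It is a plain tanh recurrence with fixed, hand-picked weights, not the original neuron of this application; the function name and the weight values are assumptions.

```python
import math


def rnn_final_state(inputs, w_in, w_rec, bias=0.0):
    """Feed x1, x2, x3, ... in order; each step reuses the state saved from the previous step."""
    h = 0.0
    states = []
    for x in inputs:
        h = math.tanh(w_in * x + w_rec * h + bias)   # new state depends on input and previous state
        states.append(h)
    return h, states                                 # only h, the last state, is used as the output


final, states = rnn_final_state([0.5, -1.0, 2.0], w_in=0.8, w_rec=0.5)
```

The list `states` holds the per-step results that may go unused, while `final` corresponds to the output that serves as the prediction.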
Specifically, after crawling the search behavior information of the user for the target object, the platform can model the search behavior information in the form of a Berrypicking search information tree. For each purchase record, the corresponding user search behavior is extracted, with the user as the root, all queries submitted by the user as branches, and the commodity sequence clicked for each query as the corresponding leaves. After the Berrypicking search information tree is constructed, the tree is encoded by using a recurrent neural network.
By mining the Berrypicking search information tree in each purchase record, the behavior of the user before purchase is modeled to assist in detecting concealed goods.
It should be noted that the search behavior information may be from the query records of the concealed merchandise that have been successfully detected by the platform, or may be randomly extracted from a plurality of query records of the normal merchandise, where the former is used as a positive sample and the latter is used as a negative sample to train the model, so that the model to be trained is more accurate.
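Assembling the training set from the two sources mentioned above can be sketched as follows. The function name, the label convention (1 for concealed, 0 for normal), and the shuffling step are assumptions for the example.

```python
import random


def build_training_set(concealed_records, normal_records, n_negative, seed=0):
    """Positive samples: records of detected concealed goods; negatives: random draw from normal records."""
    rng = random.Random(seed)
    negatives = rng.sample(normal_records, min(n_negative, len(normal_records)))
    samples = [(r, 1) for r in concealed_records] + [(r, 0) for r in negatives]
    rng.shuffle(samples)                  # mix the classes before training
    return samples


# Hypothetical record identifiers standing in for search information trees.
samples = build_training_set(["c1", "c2"], ["n1", "n2", "n3", "n4"], n_negative=2)
```

Sampling as many negatives as positives keeps the two classes balanced; other ratios are possible and are a design choice not fixed by the text.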
Step S406, processing the target object information and the search behavior information corresponding to the target object information according to the detection model, and predicting to obtain a detection result.
Although the seller can circumvent the prevention and control mechanism of the platform by changing the description information of the goods, the seller cannot control the search behavior of the buyer. The first query of the buyer is usually a direct description of the concealed goods, and in case of non-ideal search results, the buyer will continuously modify its query to find its target goods. Therefore, a great deal of information is contained in the query sequence and the clicked commodity sequence of the buyer for mining.
Optionally, the step S402 of crawling all search behavior information on the online trading platform includes:
step S4021, acquiring the record information of each user determination target object.
In the above step S4021, the recorded information may be information of at least one target object purchased by the user.
Step S4022, extracting the query log of the user in the log information.
In the above step S4022, the query record may be a series of operations before the user queries the target object, such as inputting a query term, clicking one of the commodities corresponding to the query term, and the like.
Step S4023, obtaining the query sequence in the query record and the sequence of the object to be selected in the query sequence.
In the above step S4023, the query sequence may be a sequence formed by all query terms before the user queries the target object; the sequence of the objects to be selected may be a sequence of the objects to be selected corresponding to each query term.
Step S4024, obtaining search behavior information according to the query sequence and the object sequence to be selected.
For each purchase record, the corresponding user search behavior is extracted, with the user as the root, all queries submitted by the user as branches, and the commodity sequence clicked for each query as the corresponding leaves. The user search behavior information for each purchase record is thus organized in the shape of a tree.
Optionally, the step S404 of generating a detection model according to the search behavior information and the candidate object information includes:
step S40401, performing vector calculation according to the query data in the search behavior information to obtain a query vector matrix.
Step S40402, performing vector calculation according to the object data to be selected in the object information to be selected, and obtaining a vector matrix of the object to be selected.
Step S40403, generating a detection model according to the query vector matrix and the candidate vector matrix.
In steps S40401 to S40403, the query data and the object data to be selected in the search behavior information are segmented, all the occurring words are counted to form a dictionary, then each segmented word is mapped to a corresponding ID, and a corresponding word embedding vector is found by using the ID of each segmented word, so that each query data and object data to be selected can be represented as a matrix formed by word vectors, that is, a query vector matrix and an object vector matrix to be selected.
It should be noted that the word embedding vector in the above method can be obtained by pre-training or direct random initialization. In addition, BERT may be used to enhance the expression of each word in addition to the traditional word vectors.
BERT (Bidirectional Encoder Representations from Transformers): in the embodiments of the present application, the BERT framework replaces the original tf.nn.embedding_lookup (conventional word vector lookup); the pre-trained BERT model is loaded, the fine-tune parameter is set to true, and the entire model is retrained.
Further, optionally, the method for detecting data provided in the embodiment of the present application further includes:
step S40404, semantic information and intention information are obtained according to the query vector matrix and the candidate vector matrix.
In step S40404, the semantic information may be hidden semantic information of the commodity, and the intention information may include forward potential user intention information and backward potential user intention information.
It should be noted that the hidden semantic information and the potential user intention information may be encoded and modeled by a recurrent neural network.
Step S40405, obtaining a vector according to the semantic information and the intention information.
In step S40405, the method for obtaining the vector may be an average pooling method, or may be an attention pooling method based on query information, where the primary purpose of the average pooling or attention pooling (weighted averaging according to attention) is to compress and extract information. The vectors corresponding to each branch are obtained and these vectors form a matrix. But to obtain the output, further processing of these matrices is required to recompress to a vector.
The importance of all branches in the average pooling is the same; attention pooling, which may be preferred in embodiments of the present application, is achieved by first calculating the attention distribution (i.e., weight distribution) of each branch and then performing a weighted averaging.
The query vector matrix and the candidate object vector matrix can be subjected to average pooling operation, namely, each query vector matrix and the candidate object vector matrix are subjected to vector averaging in the dimension of the word to obtain a new vector, and the new vector represents query information and commodity information respectively. Thus, each query can be represented as a vector, and the corresponding sequence of clicked items can be represented as a matrix. And for each branch, performing average pooling operation on the clicked commodity sequence, and performing attention pooling operation based on query information.
And S40406, splicing according to the vectors of the semantic information and the intention information to obtain a label.
Each branch of the Berrypicking search information tree consists of a query vector matrix and an object vector matrix to be selected. For each branch, the query vector and the candidate object vector are combined in one of three ways: they are directly spliced, or a fully connected mapping is added after splicing, or a gate mechanism is used to fuse them.
In the embodiment of the application, direct splicing simply concatenates the two features to form a vector of larger dimension;
adding a fully connected mapping after splicing is equivalent to performing one round of information filtering and compression on the directly spliced result;
fusion with a gate mechanism assigns a different combination weight to each dimension of the two vectors, which fuses the two kinds of information in a more fine-grained way.
In this way, the sequence of the whole branch can be represented as a matrix. Because a traditional recurrent neural network can output only one hidden state, that is, it cannot simultaneously output the hidden semantic information and the potential user intention information of the modeled sequence, this scheme proposes an original recurrent neural network neuron, shown as follows:
[Formula (1): equation images BDA0002244833570000171 to BDA0002244833570000177; the equations are not reproduced in the text]
In formula (1), W represents a parameter matrix, v represents a parameter vector, and b represents a bias vector; all three are values to be learned by the recurrent neural network model. z_t, r_t, and i_t are all computed with sigmoid functions and represent three gates: the semantic fusion gate, the intention fusion gate, and the interaction gate, respectively. h1 represents the hidden semantic information, h2 represents the potential user intention information, and x represents the current input. The symbol in Figure BDA0002244833570000178 represents the intermediate vector prepared to enhance the hidden semantic state; in the same way, the symbol in Figure BDA0002244833570000179 represents the intermediate vector prepared to enhance the implicit intention. The tanh(·) function is a nonlinear activation function that compresses its input to a number between -1 and 1; it is a standard neural network activation function. Through this series of operations, the hidden semantic information and the potential user intention information of the previous state are better fused with the input of the current state.
According to the method from step S40404 to step S40406, the sequences of all branches, i.e. the above matrices, can be input into the original recurrent neural network of the present application, so as to obtain two corresponding output matrices. Similarly, using the recurrent neural network in both directions, four corresponding output matrices are obtained.
Optionally, the data detection method provided in the embodiment of the present application further includes:
Step S40407, the concatenation of the last time steps of the intention information in the forward and backward directions is obtained as the potential intention.
Since the result output by the recurrent neural network at an intermediate time step is not necessarily the target result, it may be left unused; only the final output is the required prediction result. Therefore, this scheme takes the concatenation of the last time steps of the potential user intention information in the forward and backward directions as the input of the fully connected layer.
It is easy to see that the above method mines both the surface semantics of the behaviors, in a formalized way, and the potential intention information of the user. The vectors obtained from the hidden semantic information and the potential user intention information are concatenated and passed through two fully connected neural network layers to obtain the final label. The whole framework, from input to output, is an end-to-end neural network that takes cross entropy as the objective function and is trained with stochastic gradient descent.
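The classification head described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the present application: the function names, the ReLU activation between the two fully connected layers, the layer sizes and all weight values are hypothetical, and training by stochastic gradient descent is omitted.

```python
import math


def dense(x, weights, bias):
    """Fully connected layer: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def classify(semantic_vec, intent_fwd, intent_bwd, params):
    """Concatenate the vectors and apply two fully connected layers."""
    # Potential intention: last step of the forward pass joined with
    # the last step of the backward pass (both lists in processing order).
    latent_intent = intent_fwd[-1] + intent_bwd[-1]
    features = semantic_vec + latent_intent
    # ReLU between the layers is an assumption; the source names no activation.
    hidden = [max(0.0, h) for h in dense(features, params["W1"], params["b1"])]
    logits = dense(hidden, params["W2"], params["b2"])
    return features, softmax(logits)


def cross_entropy(probs, label):
    """Cross-entropy objective for a single example."""
    return -math.log(probs[label])


# Hypothetical toy values, chosen only to make the sketch runnable.
params = {
    "W1": [[0.1] * 6, [-0.2] * 6, [0.05] * 6],
    "b1": [0.0, 0.0, 0.0],
    "W2": [[0.3, -0.1, 0.2], [-0.3, 0.1, 0.4]],
    "b2": [0.0, 0.0],
}
intent_fwd = [[0.1, 0.2], [0.3, 0.4]]
intent_bwd = [[0.5, 0.6], [0.7, 0.8]]
semantic_vec = [0.9, 1.0]

features, probs = classify(semantic_vec, intent_fwd, intent_bwd, params)
loss = cross_entropy(probs, 1)
```

Only the last state of each direction enters the feature vector, which mirrors the statement that intermediate outputs of the recurrent network are not used for the prediction.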
Optionally, the data detection method provided in the embodiment of the present application further includes:
Step S40408, the last potential semantic state in the semantic information is obtained.
Step S40409, calculating the similarity with the remaining potential semantic states according to the last potential semantic state.
Step S40410, all semantic states are pooled according to the similarity.
Taking the last potential semantic state as the basis, the similarity between it and each of the remaining potential semantic states is calculated, activated with a logistic function such as the sigmoid function, and then normalized with softmax, so that the influence of branches that are completely unrelated to the last potential semantic state is reduced. This similarity is then used to pool all potential semantic states. Specifically, the formula of the tree pruning mechanism is as follows:
$$H^1_l = \mathrm{copy}(\mathrm{last}(H^1)),$$
$$p_m = \mathrm{softmax}(\sigma(\mathrm{similar}(H^1_l, H^1))),$$
$$h^{1*} = p_m^{T} \cdot H^1, \qquad (2)$$
In formula (2), $H^1$ is the representation of the branch-and-leaf sequences; after the last leaf is taken and copied, $H^1_l$ is a matrix of the same length as $H^1$ built from the representation of the last leaf. In the second formula, the cosine similarity between $H^1$ and $H^1_l$ is calculated first, then activated by a sigmoid function and then by a softmax function; $p_m$ is the resulting distribution of pruning-mechanism weights. The last formula computes a weighted sum of the representations of all branch-and-leaf sequences using the pruning weights, yielding the final representation $h^{1*}$ of all branches and leaves. In this way, the pruning mechanism can optimize the result of the recurrent neural network.
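The three lines of formula (2) can be illustrated with the short sketch below. It assumes that `similar` is a row-wise cosine similarity, as the explanation of the second formula states, and it represents the matrix of semantic states as a list of rows; the function name `prune_pool` and the toy values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def prune_pool(states):
    """Pool all semantic states, weighted by similarity to the last state."""
    last = states[-1]                                  # last(H1), compared with every row
    sims = [cosine(row, last) for row in states]       # similar(H1l, H1)
    weights = softmax([sigmoid(s) for s in sims])      # p_m
    dim = len(last)
    pooled = [sum(w * row[j] for w, row in zip(weights, states))
              for j in range(dim)]                     # h1* = p_m^T . H1
    return weights, pooled


# Toy example: the second state points away from the last one.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
weights, pooled = prune_pool(states)
```

Because the sigmoid output lies strictly between 0 and 1 before softmax is applied, no weight becomes exactly zero: unrelated branches are down-weighted rather than removed.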
Optionally, in step S406, the target object information in the preset period and the search behavior information of the user corresponding to the target object information are processed according to the detection model, and predicting to obtain the detection result includes:
Step S4061, the target object information in a preset period is acquired.
In step S4061, the preset period, for example a week or a month, may be set according to commodity attributes of the target object such as update time, transaction amount, and the group of people who may be harmed. For example, the platform obtains the transaction information of all cold medicines in the last month.
Step S4062, the search behavior information of the user corresponding to the target object information is extracted.
Step S4063, detecting the target information and the search behavior information according to the detection model, and acquiring the number of target objects which do not meet the preset detection condition in the target object information.
In step S4063, the preset detection condition may be that the semantic correlation between the target object and the target information is lower than a minimum threshold.
Step S4064, the number of target objects is used as the detection result.
To crack down on the spread of concealed commodities in real time, the platform first determines all commodities that had transactions in the several days before the current moment, and then extracts the user search behavior data associated with those commodities. Since each commodity is generally purchased by multiple users, at prediction time the number of user search sequences that the trained classifier flags as problematic is counted for each commodity, and the commodities are then submitted to the platform maintainers in ranked order. Because the user search behavior data contains a large amount of information, detection efficiency can be greatly improved compared with prior-art models that consider only commodity information.
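The counting and ranking step can be sketched as follows. The trained classifier is replaced here by a stand-in predicate, and the function name, the data layout and the keyword rule are all hypothetical; only the idea of counting flagged search sequences per commodity and sorting by that count comes from the description above.

```python
from collections import Counter


def rank_suspicious_items(search_sequences, is_problematic):
    """Count flagged search sequences per commodity, most flagged first.

    search_sequences: iterable of (item_id, sequence) pairs, one pair per
    purchase, where sequence is the buyer's list of queries.
    is_problematic: callable standing in for the trained classifier.
    """
    counts = Counter()
    for item_id, sequence in search_sequences:
        if is_problematic(item_id, sequence):
            counts[item_id] += 1
    # most_common() sorts by count in descending order.
    return counts.most_common()


# Stand-in classifier: flags a sequence containing a made-up code word.
def toy_classifier(item_id, sequence):
    return any("codeword" in query for query in sequence)


sequences = [
    ("item-A", ["cold medicine"]),
    ("item-B", ["tea", "codeword tea"]),
    ("item-B", ["codeword"]),
    ("item-C", ["codeword gift"]),
]
ranking = rank_suspicious_items(sequences, toy_classifier)
print(ranking)  # [('item-B', 2), ('item-C', 1)]
```

Commodities with no flagged sequence do not appear in the ranking, so the list handed to maintainers contains only candidates for review.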
During prediction or actual online deployment, any commodity with a transaction record can be used as an input to the model to judge whether it is an illegal commodity. The categories to be inspected can be set by the platform according to its inspection requirements.
It should be noted that, for alternative or preferred embodiments of the present embodiment, reference may be made to the description in embodiment 1, but the present embodiment is not limited to the disclosure in embodiment 1, and is not described herein again.
Example 3
According to an embodiment of the present application, there is also provided an apparatus for data detection. Fig. 5 is a schematic diagram of an apparatus for data detection according to embodiment 3 of the present invention. As shown in fig. 5, the apparatus 500 includes: an obtaining module 502, a model generation module 504, and a detection module 506.
The obtaining module 502 is configured to obtain search behavior information, where the search behavior information is used to indicate a query record input by a target object; the model generation module 504 is configured to generate a detection model according to the search behavior information and the information of the object to be selected; the detection module 506 is configured to process the target object information in the preset period and the search behavior information of the user corresponding to the target object information according to the detection model, and predict a detection result.
Optionally, the obtaining module 502 includes: an acquisition submodule, configured to acquire record information of each user for determining the target object; an extraction module, configured to extract the query record of the user from the record information; a sequence acquisition module, configured to acquire a query sequence in the query record and a sequence of objects to be selected in the query sequence; and an obtaining module, configured to obtain the search behavior information according to the query sequence and the sequence of objects to be selected.
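One possible in-memory representation of the search behavior information assembled by these submodules is sketched below. The record layout and every name in it (`user_id`, `target_object`, `queries`, `text`, `candidates`, `branches`) are hypothetical; the source only states that a query sequence and the sequence of objects to be selected in that query sequence are combined.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QueryStep:
    """One query together with the objects to be selected shown for it."""
    query: str
    candidates: List[str] = field(default_factory=list)


@dataclass
class SearchBehavior:
    """Search behavior that preceded one user's choice of a target object."""
    user_id: str
    target_object: str
    steps: List[QueryStep]

    def branches(self) -> List[List[str]]:
        # Assumed layout: each branch is a query followed by its candidates.
        return [[step.query] + step.candidates for step in self.steps]


def build_search_behavior(record: Dict) -> SearchBehavior:
    """Extract the query record from a raw record and keep its order."""
    steps = [QueryStep(q["text"], list(q.get("candidates", [])))
             for q in record["queries"]]
    return SearchBehavior(record["user_id"], record["target_object"], steps)


# Hypothetical raw record for one purchase.
record = {
    "user_id": "u1",
    "target_object": "item-9",
    "queries": [
        {"text": "cold medicine", "candidates": ["item-1", "item-2"]},
        {"text": "strong cold medicine", "candidates": ["item-9"]},
    ],
}
behavior = build_search_behavior(record)
```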
Optionally, the model generating module 504 includes: the first calculation module is used for carrying out vector calculation according to query data in the search behavior information to obtain a query vector matrix; the second calculation module is used for performing vector calculation according to the data of the objects to be selected in the information of the objects to be selected to obtain a vector matrix of the objects to be selected; and the generating module is used for generating a detection model according to the query vector matrix and the vector matrix of the object to be selected.
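As an illustration of the vector calculation performed by the first and second calculation modules, the sketch below turns a list of texts into a vector matrix. The source does not say how the vectors are computed, so the whitespace tokenization, the embedding table and the averaging of word vectors are all assumptions made for this example.

```python
def to_vector_matrix(texts, embedding, dim):
    """Map each text to one row by averaging the vectors of known tokens."""
    matrix = []
    for text in texts:
        vectors = [embedding[tok] for tok in text.split() if tok in embedding]
        if vectors:
            row = [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]
        else:
            row = [0.0] * dim  # no known token: fall back to a zero vector
        matrix.append(row)
    return matrix


# Hypothetical two-dimensional embedding table.
embedding = {"cold": [1.0, 0.0], "medicine": [0.0, 1.0]}

query_matrix = to_vector_matrix(["cold medicine", "cold", "unknown"], embedding, 2)
candidate_matrix = to_vector_matrix(["medicine"], embedding, 2)
print(query_matrix)  # [[0.5, 0.5], [1.0, 0.0], [0.0, 0.0]]
```

The same function serves both the query data and the data of the objects to be selected, so the two matrices share one vector space.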
Further, optionally, the apparatus for data detection provided in this embodiment of the present application further includes: the information acquisition module is used for acquiring semantic information and intention information according to the query vector matrix and the vector matrix of the object to be selected; the vector acquisition module is used for acquiring vectors according to the semantic information and the intention information; and the splicing module is used for splicing according to the vectors of the semantic information and the intention information to obtain the label.
Optionally, the apparatus for data detection provided in the embodiment of the present application further includes: and the intention acquisition module is used for acquiring the splicing of the last time point of the forward direction and the backward direction of the intention information as a potential intention.
Optionally, the apparatus for data detection provided in the embodiment of the present application further includes: the state acquisition module is used for acquiring the last potential semantic state in the semantic information; the third calculation module is used for calculating the similarity between the last potential semantic state and the rest potential semantic states; and the pooling module is used for pooling all semantic states according to the similarity.
Optionally, the detecting module 506 includes: the object information acquisition module is used for acquiring target object information in a preset period; the extraction submodule is used for extracting the searching behavior information of the user corresponding to the target object information; the detection submodule is used for detecting the target information and the search behavior information according to the detection model and acquiring the number of target objects which do not meet the preset detection condition in the target object information; and the result module is used for taking the number of the target objects as a detection result.
It should be noted here that the obtaining module 502, the model generating module 504, and the detecting module 506 correspond to steps S202 to S206 in embodiment 1. The examples and application scenarios implemented by the three modules are the same as those of the corresponding steps, but are not limited to the disclosure of embodiment 1. It should be noted that the above modules may run in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
Example 4
According to an embodiment of the present application, there is also provided an apparatus for data detection. Fig. 6 is a schematic diagram of an apparatus for data detection according to embodiment 4 of the present invention. As shown in fig. 6, the apparatus 600 includes: a crawling module 602, a model generating module 604, and a detecting module 606.
The crawling module 602 is configured to obtain all pieces of search behavior information on an online transaction platform, where the search behavior information is used to indicate a query record input by a target object; the model generating module 604 is configured to generate a detection model according to the search behavior information and the information of the object to be selected; the detection module 606 is configured to process the target object information and the search behavior information corresponding to the target object information according to the detection model, and predict a detection result.
Optionally, the crawling module 602 includes: an acquisition submodule, configured to acquire record information of each user for determining the target object; an extraction module, configured to extract the query record of the user from the record information; a sequence acquisition module, configured to acquire a query sequence in the query record and a sequence of objects to be selected in the query sequence; and an obtaining module, configured to obtain the search behavior information according to the query sequence and the sequence of objects to be selected.
Optionally, the model generating module 604 includes: the first calculation module is used for carrying out vector calculation according to query data in the search behavior information to obtain a query vector matrix; the second calculation module is used for performing vector calculation according to the data of the objects to be selected in the information of the objects to be selected to obtain a vector matrix of the objects to be selected; and the generating module is used for generating a detection model according to the query vector matrix and the vector matrix of the object to be selected.
Further, optionally, the apparatus for data detection provided in this embodiment of the present application further includes: the information acquisition module is used for acquiring semantic information and intention information according to the query vector matrix and the vector matrix of the object to be selected; the vector acquisition module is used for acquiring vectors according to the semantic information and the intention information; and the splicing module is used for splicing according to the vectors of the semantic information and the intention information to obtain the label.
Optionally, the apparatus for data detection provided in the embodiment of the present application further includes: and the intention acquisition module is used for acquiring the splicing of the last time point of the forward direction and the backward direction of the intention information as a potential intention.
Optionally, the apparatus for data detection provided in the embodiment of the present application further includes: the state acquisition module is used for acquiring the last potential semantic state in the semantic information; the third calculation module is used for calculating the similarity between the last potential semantic state and the rest potential semantic states; and the pooling module is used for pooling all semantic states according to the similarity.
Optionally, the detecting module 606 includes: the object information acquisition module is used for acquiring target object information in a preset period; the extraction submodule is used for extracting the searching behavior information of the user corresponding to the target object information; the detection submodule is used for detecting the target information and the search behavior information according to the detection model and acquiring the number of target objects which do not meet the preset detection condition in the target object information; and the result module is used for taking the number of the target objects as a detection result.
It should be noted here that the crawling module 602, the model generating module 604, and the detecting module 606 correspond to steps S402 to S406 in embodiment 2. The examples and application scenarios implemented by the three modules are the same as those of the corresponding steps, but are not limited to the disclosure of embodiment 2.
Example 5
The embodiment of the invention can provide a computer terminal which can be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute program codes of the following steps in the data detection method of the application program: acquiring search behavior information, wherein the search behavior information is used for indicating a query record input by a target object; generating a detection model according to the search behavior information and the information of the object to be selected; and processing the target object information in the preset period and the user searching behavior information corresponding to the target object information according to the detection model, and predicting to obtain a detection result.
Optionally, the computer terminal includes a storage medium and a processor, and the processor is configured to run a program stored in the storage medium, wherein the program, when run, performs the data detection method according to embodiment 1 or 2.
Optionally, fig. 7 is a block diagram of a computer terminal according to an embodiment of the present invention. As shown in fig. 7, the computer terminal A may include: one or more processors (only one of which is shown), a memory, and a transmission module.
The memory may be configured to store software programs and modules, such as program instructions/modules corresponding to the data detection method and apparatus in the embodiments of the present invention, and the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, so as to implement the data detection method. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory located remotely from the processor, and such remote memory may be connected to terminal A through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call the information and application programs stored in the memory through the transmission module to execute the following steps: acquiring search behavior information, wherein the search behavior information is used for indicating a query record input before a target object is determined; generating a detection model according to the search behavior information and the information of the object to be selected; and processing the target object information in the preset period and the user search behavior information corresponding to the target object information according to the detection model, and predicting to obtain a detection result.
Optionally, the processor may further execute the program code of the following steps: the acquiring of the search behavior information includes: acquiring the record information of each user determined target object; extracting query records of users in the record information; acquiring a query sequence in a query record and a sequence of objects to be selected in the query sequence; and obtaining search behavior information according to the query sequence and the object sequence to be selected.
Optionally, the processor may further execute the program code of the following steps: generating a detection model according to the search behavior information and the candidate object information comprises the following steps: performing vector calculation according to query data in the search behavior information to obtain a query vector matrix; performing vector calculation according to the data of the objects to be selected in the information of the objects to be selected to obtain a vector matrix of the objects to be selected; and generating a detection model according to the query vector matrix and the vector matrix of the object to be selected.
Optionally, the processor may further execute the program code of the following steps: obtaining semantic information and intention information according to the query vector matrix and the vector matrix of the object to be selected; obtaining a vector according to the semantic information and the intention information; and splicing according to the vectors of the semantic information and the intention information to obtain the label.
Optionally, the processor may further execute the program code of the following step: acquiring the concatenation of the last time steps of the intention information in the forward and backward directions as the potential intention.
Optionally, the processor may further execute the program code of the following steps: acquiring the last potential semantic state in the semantic information; calculating the similarity between the last potential semantic state and the remaining potential semantic states; and pooling all semantic states according to the similarity.
Optionally, the processor may further execute the program code of the following steps: processing the target object information in a preset period and the user search behavior information corresponding to the target object information according to the detection model, and predicting to obtain a detection result, includes: acquiring target object information in a preset period; extracting search behavior information of a user corresponding to the target object information; detecting the target information and the search behavior information according to the detection model, and acquiring the number of target objects which do not meet preset detection conditions in the target object information; and taking the number of the target objects as the detection result.
Those skilled in the art can understand that the structure shown in fig. 7 is only illustrative, and the computer terminal may also be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), or a PAD. Fig. 7 does not limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 7, or have a different configuration from that shown in fig. 7.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Example 6
The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store a program code executed by the data detection method provided in embodiment 1 or 2.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring search behavior information, wherein the search behavior information is used for indicating a query record input by a target object; generating a detection model according to the search behavior information and the information of the object to be selected; and processing the target object information in the preset period and the user searching behavior information corresponding to the target object information according to the detection model, and predicting to obtain a detection result.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (13)

1. A method of data detection, comprising:
acquiring search behavior information, wherein the search behavior information is used for indicating a query record input by a target object;
generating a detection model according to the search behavior information and the information of the object to be selected;
and processing the target object information in a preset period and the search behavior information of the user corresponding to the target object information according to the detection model, and predicting to obtain a detection result.
2. The method of claim 1, wherein obtaining search behavior information comprises:
acquiring record information of each user for determining the target object;
extracting the query record of the user in the record information;
acquiring a query sequence in the query record and a sequence of objects to be selected in the query sequence;
and obtaining the search behavior information according to the query sequence and the object sequence to be selected.
3. The method of claim 1 or 2, wherein generating a detection model according to the search behavior information and the information of the object to be selected comprises:
performing vector calculation according to query data in the search behavior information to obtain a query vector matrix;
performing vector calculation according to the data of the objects to be selected in the information of the objects to be selected to obtain a vector matrix of the objects to be selected;
and generating the detection model according to the query vector matrix and the vector matrix of the object to be selected.
4. The method of claim 3, wherein the method further comprises:
obtaining semantic information and intention information according to the query vector matrix and the vector matrix of the object to be selected;
obtaining a vector according to the semantic information and the intention information;
and splicing according to the vectors of the semantic information and the intention information to obtain a label.
5. The method of claim 4, wherein the method further comprises:
and acquiring the splicing of the last time point of the forward direction and the backward direction of the intention information as a potential intention.
6. The method of claim 4, wherein the method further comprises:
acquiring the last potential semantic state in the semantic information;
calculating the similarity between the last potential semantic state and the remaining potential semantic states;
and pooling all semantic states according to the similarity.
7. The method according to claim 3, wherein the processing target object information in a preset period and user search behavior information corresponding to the target object information according to the detection model, and predicting a detection result comprises:
acquiring target object information in the preset period;
extracting search behavior information of a user corresponding to the target object information;
detecting the target information and the search behavior information according to the detection model, and acquiring the number of target objects which do not meet preset detection conditions in the target object information;
and taking the number of the target objects as the detection result.
8. The method of claim 1, wherein the method further comprises: and recommending the commodities according to the popular phrases in the specific time period.
9. A method of data detection, comprising:
acquiring all search behavior information on an online trading platform, wherein the search behavior information is used for indicating a query record input by a target object;
generating a detection model according to the search behavior information and the information of the object to be selected;
and processing the target object information and the search behavior information corresponding to the target object information according to the detection model, and predicting to obtain a detection result.
10. An apparatus for data detection, comprising:
the obtaining module is used for acquiring search behavior information, wherein the search behavior information is used for indicating a query record input by a target object;
the model generating module is used for generating a detection model according to the searching behavior information and the information of the object to be selected;
and the detection module is used for processing the target object information in a preset period and the search behavior information of the user corresponding to the target object information according to the detection model and predicting to obtain a detection result.
11. An apparatus for data detection, comprising:
the crawling module is used for acquiring all search behavior information on an online trading platform, wherein the search behavior information is used for indicating a query record input by a target object;
the model generating module is used for generating a detection model according to the searching behavior information and the information of the object to be selected;
and the detection module is used for processing the target object information and the search behavior information corresponding to the target object information according to the detection model and predicting to obtain a detection result.
12. A storage medium comprising a stored program, wherein the apparatus in which the storage medium is located is controlled to perform the method of data detection of claim 1 or 9 when the program is run.
13. An apparatus for data detection, comprising a storage medium and a processor for executing a program stored in the storage medium, wherein the program when executed performs the method of data detection of claim 1 or 9.
CN201911013243.8A 2019-10-23 2019-10-23 Data detection method and device Pending CN112699277A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911013243.8A CN112699277A (en) 2019-10-23 2019-10-23 Data detection method and device

Publications (1)

Publication Number Publication Date
CN112699277A true CN112699277A (en) 2021-04-23

Family

ID=75505145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911013243.8A Pending CN112699277A (en) 2019-10-23 2019-10-23 Data detection method and device

Country Status (1)

Country Link
CN (1) CN112699277A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070136457A1 (en) * 2005-12-14 2007-06-14 Microsoft Corporation Automatic detection of online commercial intention
CN105069077A (en) * 2015-07-31 2015-11-18 百度在线网络技术(北京)有限公司 Search method and device
CN105335391A (en) * 2014-07-09 2016-02-17 阿里巴巴集团控股有限公司 Processing method and device of search request on the basis of search engine
CN109002849A (en) * 2018-07-05 2018-12-14 百度在线网络技术(北京)有限公司 Method and apparatus for identifying object development stage

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070136457A1 (en) * 2005-12-14 2007-06-14 Microsoft Corporation Automatic detection of online commercial intention
CN105335391A (en) * 2014-07-09 2016-02-17 阿里巴巴集团控股有限公司 Processing method and device of search request on the basis of search engine
CN105069077A (en) * 2015-07-31 2015-11-18 百度在线网络技术(北京)有限公司 Search method and device
CN109002849A (en) * 2018-07-05 2018-12-14 百度在线网络技术(北京)有限公司 Method and apparatus for identifying object development stage

Similar Documents

Publication Publication Date Title
CN110909176B (en) Data recommendation method and device, computer equipment and storage medium
CN109903117B (en) Knowledge graph processing method and device for commodity recommendation
CN109934721A (en) Financial product recommendation method, device, equipment and storage medium
CN101496002B (en) System and method for selecting advertising content for display using the content of online conversations and/or other relevant information
US10691922B2 (en) Detection of counterfeit items based on machine learning and analysis of visual and textual data
CN111159407B (en) Method, apparatus, device and medium for training entity recognition and relation classification model
CN109767318A (en) Loan product recommendation method, device, equipment and storage medium
CN111784455A (en) Article recommendation method and recommendation equipment
CN109992646A (en) Method and device for extracting text labels
CN109584006B (en) Cross-platform commodity matching method based on deep matching model
CN113706251B (en) Model-based commodity recommendation method, device, computer equipment and storage medium
CN109492104A (en) Training method, classification method, system, equipment and medium for intent classification model
Abinaya et al. Enhancing top-N recommendation using stacked autoencoder in context-aware recommender system
CN114861050A (en) Feature fusion recommendation method and system based on neural network
CN111429214B (en) Transaction data-based buyer and seller matching method and device
CN112215629B (en) Multi-target advertisement generating system and method based on constructing adversarial samples
CN115511546A (en) Behavior analysis method, system, equipment and readable medium for E-commerce users
CN113868542B (en) Attention model-based push data acquisition method, device, equipment and medium
CN114693409A (en) Product matching method, device, computer equipment, storage medium and program product
CN115641179A (en) Information pushing method and device and electronic equipment
CN112699277A (en) Data detection method and device
CN116340643A (en) Object recommendation adjustment method and device, storage medium and electronic equipment
CN112115258B (en) Credit evaluation method and device for user, server and storage medium
CN115953217A (en) Commodity grading recommendation method and device, equipment, medium and product thereof
KR20230081388A (en) Customized wine recommendation method based on artificial intelligence and operating server for the method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination