WO2021124933A1 - Information processing system and information processing method - Google Patents

Information processing system and information processing method Download PDF

Info

Publication number
WO2021124933A1
WO2021124933A1 PCT/JP2020/045268 JP2020045268W WO2021124933A1 WO 2021124933 A1 WO2021124933 A1 WO 2021124933A1 JP 2020045268 W JP2020045268 W JP 2020045268W WO 2021124933 A1 WO2021124933 A1 WO 2021124933A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
score
node
network
documents
Prior art date
Application number
PCT/JP2020/045268
Other languages
French (fr)
Japanese (ja)
Inventor
桂太 杉原
Original Assignee
桂太 杉原
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 桂太 杉原 filed Critical 桂太 杉原
Priority to JP2021510244A priority Critical patent/JP6919961B1/en
Publication of WO2021124933A1 publication Critical patent/WO2021124933A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9538Presentation of query results

Definitions

  • This disclosure relates to an information processing system and an information processing method.
  • the page rank is determined higher for web pages linked from more web pages.
  • an adjacency matrix that binary-represents the connection relationship (in other words, the connection state) between web pages with values 0 and 1, and the value 0, 1 that is a modification of the adjacency matrix and other real numbers are used.
  • the matrix included in the components is used.
  • an information processing system includes a document network discriminating unit, a document discriminating unit, a sub-network discriminating unit, and a score calculating unit.
  • the document network discriminating unit is configured to discriminate a document network composed of at least a plurality of documents connected by a weak connection based on data representing a connection relationship between documents.
  • the document discriminating unit is configured to discriminate a specific document having an inlink from two or more documents included in the discriminated document network.
  • the sub-network discriminating unit is configured to discriminate a plurality of sub-networks included in the document network based on the discriminated specific document.
  • the score calculation unit is configured to calculate the scores of each of the plurality of documents constituting the document network by executing individual processing for each of the plurality of determined sub-networks. In the individual processing, the score of each document included in the corresponding subnetwork is calculated.
  • the document network contains one or more duplicate documents.
  • Each duplicate document is a document that belongs to two or more subnetworks out of a plurality of subnetworks.
  • the score calculation unit calculates one score for the corresponding duplicate document by integrating the scores of the corresponding duplicate document in two or more subnetworks for each of the one or more duplicate documents.
  • the information processing system it is possible to appropriately perform scoring of a plurality of documents in a document network including a document having inlinks from two or more documents. Therefore, the information processing system according to one aspect of the present disclosure is very useful for scoring a plurality of documents in a document network having a complicated connection relationship.
  • the sub-network discriminating unit may discriminate a plurality of sub-networks having a specific document as a boundary.
  • the plurality of subnetworks may have at least two or more upstream subnetworks and downstream subnetworks.
  • Two or more upstream subnetworks correspond to two or more inlinks of a particular document, and in each of the upstream networks, the particular document has one corresponding inlink.
  • the downstream subnetwork is connected to the specific document through the outlink that the specific document has.
  • the specific document is a duplicate document that belongs to two or more upstream subnetworks.
  • the score calculation unit calculates one integrated score for a specific document by integrating the scores in the upstream subnetwork of the specific document, and sets the score of each document belonging to the downstream subnetwork as the integrated score of the specific document. It may be calculated based on the standard.
  • the sub-network discriminating unit may discriminate the sub-network for each in-link of the specific document as a plurality of sub-networks.
  • Each of the sub-networks may include a group of documents located upstream of the corresponding inlink, a specific document, and a group of documents located downstream of the outlink of the specific document.
  • the sub-network discriminating unit may discriminate the sub-network for each combination of in-link and out-link possessed by the specific document as a plurality of sub-networks.
  • Each of the sub-networks may include a group of documents located upstream of the inlink corresponding to the combination, a specific document, and a group of documents located downstream of the outlink corresponding to the combination.
  • integration may be achieved by calculating the sum of the scores of the corresponding duplicate documents in two or more subnetworks.
  • integration may be achieved by calculating representative scores of the corresponding duplicate documents in two or more subnetworks.
  • the representative value may be an average value.
  • the individual processing includes the processing of calculating the score of each document contained in the corresponding subnetting using the Hermitian adjacency matrix based on the connection relationship between the documents in the corresponding subnetting. You may.
  • the individual processing is virtually out to the trailing document by adding a dummy document to the trailing document that does not have an outlink and is included in the corresponding subnetworks. It may include a process of changing the corresponding subnetwork so as to provide a link.
  • the individual processing may include a processing that defines an Hermitian adjacency matrix based on the connection relationship between documents in the modified subnetwork.
  • the Hermitian adjacency matrix is an N-by-N Hermitian matrix based on the connection relationships between the documents D [m] (1 ⁇ m (integer) ⁇ N) that make up the corresponding subnetworks. May be good.
  • the component h (p, q) in the p-th row and q-th column has a link from the document D [p] to the document D [q] and the document D [q] to the document D [p].
  • a value of 1 when a link to is present, and a value when neither the link from document D [p] to document D [q] nor the link from document D [q] to document D [p] is present.
  • the value + i (i is an imaginary unit). Shown, when there is no link from document D [p] to document D [q] but there is a link from document D [q] to document D [p], the diagonal component is zero, indicating the value -i. It may correspond to the Elmeat matrix of.
  • the individual processing may include a process of transforming the Hermitian adjacency matrix to define a special Hermitian adjacency matrix and using the eigenvectors of the special Hermitian adjacency matrix to calculate the score of each document contained in the corresponding subnetwork. ..
  • the Hermitian adjacency matrix may be deformed so that when each component of the eigenvector is placed on the complex plane, all the components fall within the angle range of ⁇ / 2 radians on the complex plane.
  • a plurality of documents are scored using a special Hermitian adjacency matrix corresponding to the Hermitian adjacency matrix in which the connection relationship between documents can be represented by four values of 1,0, + i, and -i. .. Therefore, it is not necessary to determine a virtual connection relationship from all documents to all documents, and scoring of each document based on the connection relationship between documents can be realized more appropriately than before.
  • a computer program may be provided.
  • the computer program may be a computer program for operating the computer as at least one of the document network discrimination unit, the document discrimination unit, the subnetwork discrimination unit, and the score calculation unit included in the above-mentioned information processing system. ..
  • an information processing method executed by a computer may be provided.
  • the information processing method may include determining a document network composed of a plurality of documents connected by at least a weak connection based on data representing a connection relationship between documents.
  • the information processing method may include discriminating a specific document having an inlink from two or more documents included in the discriminated document network.
  • the information processing method may include discriminating a plurality of sub-networks included in the document network based on the discriminated specific document.
  • each of the plurality of documents constituting the document network is executed by executing a process of calculating the score of each document included in the corresponding subnetwork as an individual process for each of the plurality of determined subnetworks. It may include calculating the score of.
  • the document network may include one or more duplicate documents that belong to two or more subnetworks among the plurality of subnetworks. Calculating the scores for each of the documents that make up the document network corresponds to each of the duplicate documents by integrating the scores of the corresponding duplicate documents in two or more subnetworks. It may include calculating one score for duplicate documents.
  • an information processing method can be provided that includes at least part of the procedure performed by the information processing system described above.
  • an information processing system may be provided that includes a processor and a memory containing instructions for causing the processor to perform a particular process.
  • the specific process may be a process corresponding to the above-mentioned information processing method.
  • FIG. 11A is an explanatory diagram regarding the arrangement of each node on the complex plane
  • FIG. 11B is a diagram illustrating a score calculation method.
  • It is a flowchart which shows the 2nd part of the score calculation process.
  • 26A and 26B are diagrams showing a subgraph of the third document network in the third embodiment. It is a figure which shows the example of the 4th document network. It is a figure which shows the example of the 5th document network. 29A and 29B are diagrams showing a subgraph of the fourth document network. It is a flowchart which shows the sub-processing which the 2nd scoring unit executes in 4th Embodiment. 6 is a flowchart showing a sub-process executed by the second scoring unit in the fifth embodiment. It is explanatory drawing about the special Hermitian adjacency matrix in 5th Embodiment. It is explanatory drawing about the special Hermitian adjacency matrix in 5th Embodiment.
  • the information processing system 1 of the present embodiment shown in FIG. 1 is configured to provide the user terminal 5 with a list of documents corresponding to the search query in response to the search query input from the user terminal 5.
  • the document is a web document, specifically a web page.
  • the information processing system 1 functions as a search engine that can be used from the user terminal 5 through the communication network.
  • the communication network is, for example, the Internet.
  • the information processing system 1 includes a calculation unit 10, a storage unit 20, and a communication unit 30.
  • the arithmetic unit 10 includes a processor 11 and a memory 15.
  • the storage unit 20 stores computer programs and data executed by the processor 11.
  • the storage unit 20 can include either a hard disk drive or a solid state drive.
  • the communication unit 30 includes a communication interface capable of communicating with the user terminal 5.
  • the calculation unit 10 realizes a search function by executing a process according to a computer program stored in the storage unit 20. Specifically, the process for realizing the search function is executed by the processor 11.
  • the information processing system 1 briefly shown in FIG. 1 may be composed of one or more cooperating server devices.
  • the calculation unit 10 functions as the crawler 110, the indexer 120, the query processing unit 130, and the query response unit 140 shown in FIG. 2, and the storage unit 20 functions as the page repository 210 and the index storage unit 220. It is realized by doing.
  • the crawler 110 is configured to collect a plurality of web pages existing in a communication network, like a well-known crawler.
  • the web pages collected by the crawler 110 are stored in the page repository 210.
  • the indexer 120 is configured to analyze and index each web page stored in the page repository 210. Indexing generates index data for each web page.
  • the index data for each web page is stored in the index storage unit 220.
  • Each index data includes a content index and a structural index.
  • the content index contains information on the corresponding web page keywords, titles, and key texts.
  • the structure index contains information that represents the hyperlink structure of the corresponding web page.
  • a group of index data represents the connection relationship between web pages.
  • the query processing unit 130 receives a search query from a user and extracts a group of related pages, which is a set of web pages corresponding to the search query, from all the web pages. All the web pages referred to here correspond to a group of web pages found in the communication network by the crawler 110 and whose index data is registered in the index storage unit 220.
  • the query processing unit 130 extracts from all the web pages a set of web pages including the vocabulary corresponding to the search query as a related page group based on the content index of the web page stored in the index storage unit 220. To do.
  • the extracted related page group information is provided to the query response unit 140.
  • the query response unit 140 transmits a search result list in which the related page groups are arranged in the order of page rank to the user terminal 5 as response data for the search query based on the information of the related page group provided.
  • Each related page is ranked higher as the web page has a higher degree of relevance and importance to the search query, and is placed at the top of the search result list.
  • the search result list is configured as a web page having links to the listed related pages, similar to the response data from a conventional search engine.
  • the link referred to here is a so-called hyperlink.
  • the query response unit 140 includes a first scoring unit 141, a second scoring unit 143, a ranking unit 145, and an output unit 147.
  • the first scoring unit 141 scores each of the related pages for the related page group corresponding to the search query based on the degree of relevance to the search query of the page content.
  • the first scoring unit 141 is configured to give a content score as the first score to each of the related pages.
  • the second scoring unit 143 operates independently of the search query, and is configured to give each of the web pages collected by the crawler 110 an important score based on the connection relationship between the web pages as the second score. Will be done.
  • the second score is calculated so that the web page that is estimated to be more important from the connection relationship between the web pages shows a larger value.
  • the second score is higher for web pages with many links, for web pages linked from web pages with high important scores, and for web pages with links from web pages with few links to other web pages. Is shown.
  • the ranking unit 145 is based on the first score calculated by the first scoring unit 141 for each of the related pages and the second score calculated by the second scoring unit 143 for each of the related pages. It is configured to calculate the page rank of each related page.
  • each page rank of the related page corresponds to the weighted sum of the first score and the second score.
  • the page rank of each related page may be understood as an overall score that includes a content score based on a search query and an important score based on the connection relationship between web pages.
  • the output unit 147 sends a search query as a search result list, which is a page list in which related pages corresponding to the search query are arranged in descending order of page rank based on the page rank of each related page calculated by the ranking unit 145. It is transmitted to the original user terminal 5.
  • the second scoring unit 143 periodically executes the process shown in FIG. 4, and based on the latest index data stored in the index storage unit 220, at least a weakly concatenated web page. For each group, the second score of each web page belonging to the corresponding web page group is calculated.
  • the second scoring unit 143 extracts one or more document networks from all the web pages (S110). By referring to the index data stored in the index storage unit 220, the second scoring unit 143 extracts at least each of the web page groups connected by a weak connection as one document network among all the web pages. can do.
  • a document network consists of at least a group of web pages that are weakly linked.
  • a network consisting of at least a group of nodes connected by a weak connection follows a link from any one node of the group of nodes belonging to the network to the remaining nodes when the connection direction of the link between the nodes is ignored. Corresponds to a reachable network.
  • the document network is composed of a group of web pages that are at least indirectly connected to the remaining web pages when any one of the web pages belonging to the document network ignores the connection direction of the link.
  • FIGS. 5 and 6 show examples of two different document networks.
  • the document network is represented as a directed graph.
  • One circle in FIGS. 5 and 6 corresponds to one node, in other words, one web page.
  • the arrow in the figure indicates that a link (hyperlink) to the web page corresponding to the end point of the arrow is formed on the web page corresponding to the start point of the arrow. That is, it means that it is possible to move from the web page corresponding to the start point of the arrow to the web page corresponding to the end point of the arrow via a link.
  • Each web page in the document network shown in FIGS. 5 and 6 is clearly at least indirectly connected to other web pages in the document network when the direction of the arrow is ignored.
  • each of a plurality of web pages in one document network is also referred to as a k-th web page by using the number k shown in a circle in the figure.
  • Each web page in the document network is also referred to as a node.
  • the k-th node means the k-th web page.
  • the second scoring unit 143 selects one or more of the extracted document networks as the document network to be processed. After that, the second scoring unit 143 executes the score calculation process shown in FIG. 7 in order to calculate the second score of each node in the document network to be processed (S130).
  • the second scoring unit 143 repeatedly executes the score calculation process until the score calculation process is executed for all the document networks (S120-S140). That is, the second scoring unit 143 sequentially selects each document network as a processing target (S120), and executes a score calculation process for the selected document network to be processed (S130).
  • the second scoring unit 143 finishes the score calculation process for all the document networks (Yes in S140), the process shown in FIG. 4 ends. In this way, the second scoring unit 143 calculates the second score of each web page constituting the corresponding document network for each document network. The calculated second score is provided to the ranking unit 145.
  • the second scoring unit 143 determines the tip node having no inlink in the document network to be processed (S210).
  • a node with an in-link means a node in which a link to this node is formed in another node.
  • a web page with an in-link means a web page in which a link (hyperlink) that can move to this web page is formed in another web page.
  • a node that does not have an inlink is also referred to as a "leading node”.
  • the nodes having an inlink are the second, third, fourth, fifth, sixth, seventh, and eighth nodes, and the node having no inlink is the first node.
  • the nodes having no inlink in FIG. 6 are the first, second, fourth, tenth, thirteenth, and fourteenth nodes.
  • a dummy node DP that does not have an inlink is added to the document network. Specifically, a dummy node DP having outlinks to all the nodes in the document network is added to the document network.
  • a node with an outlink means a node with a link to another node.
  • a web page with an outlink means a web page in which a link (hyperlink) that can be moved to another web page is formed.
  • a node that does not have an outlink is also referred to as a "rear end node”.
  • the second scoring unit 143 When a dummy node DP having no in-link is added to the document network, the second scoring unit 143 considers the document network to which the dummy node DP is added as a document network to be processed, and adds the dummy node DP. Is determined as the tip node.
  • the second scoring unit 143 determines a join node having a plurality of inlinks.
  • One join node means one node having a plurality of inlinks.
  • the document network shown in FIG. 5 does not have a join node.
  • the join nodes in the document network shown in FIG. 6 are the third, sixth, seventh, twelfth, and fifteenth nodes indicated by double circles.
  • the third node has an inlink from the first node and an inlink from the second node.
  • the second scoring unit 143 executes the processing in S250.
  • the second scoring unit 143 executes the process of S240.
  • the second scoring unit 143 calculates the score of each node in the document network to be processed by using the Hermitian adjacency matrix H corresponding to the document network.
  • the calculated score of each node is a score based on the connection relationship between the nodes.
  • the second scoring unit 143 outputs the score of each node calculated in S240 to the ranking unit 145 as the second score of each web page (S245). After that, the score calculation process shown in FIG. 7 is completed.
  • the second scoring unit 143 can calculate the score of each node by the same method as the international application PCT / JP2018 / 026560 filed on July 13, 2018 by the same applicant. Specifically, the second scoring unit 143 can calculate the score of each node by executing the sub-processing shown in FIG.
  • each of the nodes constituting the document network to be processed is expressed as node D [m].
  • the variable m takes an integer value from the values 1 to N (1 ⁇ m ⁇ N).
  • N is the number of nodes N of the document network to be processed.
  • Node D [m] corresponds to the mth node in the corresponding document network, i.e. the mth web page.
  • the second scoring unit 143 When the sub-processing shown in FIG. 8 is started, the second scoring unit 143 generates the Hermitian adjacency matrix H corresponding to the document network to be processed (S1010). Specifically, the second scoring unit 143 generates the Hermitian adjacency matrix H in which the connection relationships between the nodes in the document network to be processed are represented by the values 1, 0, + i, and ⁇ i. Here, i represents an imaginary unit.
  • the Hermitian adjacency matrix H is a matrix of N rows and N columns (NxN) corresponding to the number N of nodes of the document network to be processed, and each component takes any value of 1, 0, + i, or -i. It is a matrix.
  • the expression "component h (p, q)" in the following indicates the component of the p-th row and q-th column in the Hermitian adjacency matrix H.
  • the corresponding component h (p, q) has a value of 0. Is set to. Therefore, the diagonal component h (p, p) of the Hermitian adjacency matrix H has a value of zero.
  • the corresponding component h (p, q) is the value + i. Is set to.
  • the corresponding component h (p, q) is the value- Set to i.
  • the second scoring unit 143 sets the value of each component h (p, q) as described above according to the connection relationship between the nodes of the document network to be processed, and generates the Hermitian adjacency matrix H (S1010).
  • the Hermitian adjacency matrix H is a Hermitian matrix.
  • a dummy node DP is added to each rear-end node having no outlink in the document network to be processed before the Hermitian adjacency matrix H is generated.
  • the dummy node DP added to the rear end node is a node that has one inlink from the rear end node but does not have an outlink.
  • the document network shown in FIG. 9 is a document network in which a dummy node DP is added to each of the fifth and eighth nodes having no outlink in the document network shown in FIG.
  • the Hermitian adjacency matrix H is generated for the document network to be processed to which the dummy node DP is added in this way.
  • the second scoring unit 143 generates a special Hermitian adjacency matrix H1 which is a modification of the generated Hermitian adjacency matrix H.
  • the transformation is performed so that when each component of the eigenvector V of the special Hermitian adjacency matrix H1 is arranged in the complex plane, all of the components fall within the angular range of ⁇ / 2 radians.
  • the second scoring unit 143 calculates the first correction amount C1 and the second correction amount C2.
  • the value of the parameter n included in the first correction amount C1 and the second correction amount C2 is defined as a natural number equal to or more than the maximum value of the number N of nodes in one or more document networks extracted in S110.
  • a dummy node DP may be added to the document network as described above.
  • the number of nodes N is the number of nodes in the document network including the dummy node DP.
  • the purpose of keeping all of the components within the ⁇ / 2 radian angle range is to ensure that all of the components fit within one quadrant on the complex plane.
  • the parameter n is set to a small value within the range where this purpose can be achieved.
  • the above-mentioned second correction amount C2 contributes to the adjustment of the angle range, and the first correction amount C1 helps to avoid changing the absolute value of the matrix component by the second correction amount C2.
  • the second scoring unit 143 After calculating the first correction amount C1 and the second correction amount C2, the second scoring unit 143 replaces the component of the value + i in the Hermitian adjacency matrix H with the value C1 (C2 + i), and replaces the component showing the value ⁇ i with the component showing the value ⁇ i. Replace with the value C1 (C2-i).
  • the second scoring unit 143 further sets the value C1 (C2 + i) of the component in each row in the Hermitian adjacency matrix H after the substitution, the number of components showing the value C1 (C2 + i) in the same row, and the number of components showing the value 1. Change to the value ⁇ C1 (C2 + i) / W ⁇ divided by the sum of W of.
  • the second scoring unit 143 further sets the value C1 (C2-i) of the component changed to the value ⁇ C1 (C2 + i) / W ⁇ and the component located symmetrically across the diagonal component to the value ⁇ C1. Change to the complex conjugate ⁇ C1 (C2-i) / W ⁇ of (C2 + i) / W ⁇ .
  • the second scoring unit 143 generates the Hermitian matrix defined by such substitution and modification as the special Hermitian adjacency matrix H1.
  • FIG. 1 A specific example of the transformation procedure from the Hermitian adjacency matrix H to the special Hermitian adjacency matrix H1 is shown in FIG.
  • the component that takes the value + i and the component that takes the value 1 are the total W1.
  • each component indicating the value + i in the p1 row in the Hermitian adjacency matrix H is changed to the value ⁇ C1 (C2 + i) / W1 ⁇ .
  • each component indicating the value + i in row p2 in the Hermitian adjacency matrix H is changed to the value ⁇ C1 (C2 + i) / W2 ⁇ .
  • the value of the component indicating the value -i is changed to the complex conjugate of the component at a symmetrical position with the diagonal component in between.
  • the value of the component h (p1, q1) showing the value ⁇ C1 (C2 + i) / W1 ⁇ and the component h (q1, p1) symmetrically across the diagonal component is ⁇ C1 (C2-i). ) / W1 ⁇ .
  • the value of the component h (p2, q2) showing the value ⁇ C1 (C2 + i) / W2 ⁇ and the component h (q2, p2) located symmetrically across the diagonal component is ⁇ C1 (C2-C2-). i) / W2 ⁇ is changed.
  • the second scoring unit 143 calculates the eigenvalue and the eigenvector V of the special Hermitian adjacency matrix H1 generated in S1020. Since the special Hermitian adjacency matrix H1 is a matrix of NxN, the eigenvector V is an N-dimensional vector containing N components.
  • each component of the eigenvector V corresponding to the eigenvalue with the maximum absolute value is represented using V [m].
  • Each component V [m] (1 ⁇ m ⁇ N) of the eigenvector V corresponds to the node D [m] (1 ⁇ m ⁇ N) constituting the document network.
  • the second scoring unit 143 sets each component V [m] (1 ⁇ m ⁇ N) of the eigenvector V corresponding to the maximum eigenvalue of the absolute value of the special Hermitian adjacency matrix H1 to the start point node of the document network. Divide by the corresponding component E.
  • the starting node is the s node D [s]
  • the start point node corresponds to the node to which the lowest score should be given among the nodes in the document network.
  • the start point node is the tip node corresponding to the combination of the tip node and the trailing node that can move according to the direction of the link in the document network to be processed and has the largest number of nodes from the tip node to the trailing node. Can be set.
  • the component of the eigenvector V corresponding to the start point node is converted to the value 1.
  • the eigenvector V after division is expressed as the eigenvector V1.
  • the score of each node D [m] is calculated (S1050).
  • the second scoring unit 143 rotationally transforms each component V1 [m] (1 ⁇ m ⁇ N) on the complex plane. Specifically, in the second scoring unit 143, each component V1 of the eigenvector V1 is arranged so that the component located on the first quadrant side on the complex plane is located on the fourth quadrant side by an angle ⁇ 1 from the real axis. [M] (1 ⁇ m ⁇ N) is rotated on the complex plane.
  • the component V1 [s] corresponding to the start point node is on the real axis of the complex plane.
  • Each of the black and white circles in FIGS. 11A and 11B corresponds to one of the components of the eigenvector V1, in other words, one of the nodes in the document network, and the black circle corresponds to the starting page.
  • the component corresponding to the start point node is rotationally moved so as to be located on the complex plane so as to be located on the fourth quadrant side by an angle ⁇ 1 from the real axis.
  • This rotational transformation is performed to shift the starting node from the real axis to the fourth quadrant for proper scoring.
  • the angle ⁇ 1 is set to a small angle in which all the components of the eigenvector V1 are still located in the fourth quadrant even by the rotation transformation.
  • the angle ⁇ 1 may be zero as long as the scoring is not adversely affected.
  • Each component Vc [m] (1 ⁇ m ⁇ N) of the eigenvector Vc is a complex number.
  • each node is calculated based on the position of each component Vc [m] (1 ⁇ m ⁇ N) on the complex plane.
  • each component Vc [m] (1 ⁇ m ⁇ N) of the eigenvector Vc after rotation conversion is also expressed as a score reference value Vc [m] (1 ⁇ m ⁇ N) of each node.
  • the score reference value Vc [m] is the score reference value of the m-th node, and is used for scoring the m-node.
  • the score reference value Vc [m] (1 ⁇ m ⁇ N) of each node corresponds to V1 [m] (1 ⁇ m ⁇ N).
  • the function arg (x) described below in the present specification may be understood as an argument on the complex plane of the complex number x.
  • x is, for example, Vc [m].
  • the angle ⁇ [m] of the component Vc [m] shown in FIG. 11B from the real axis on the complex plane to the fourth quadrant is equal to ⁇ 2 ⁇ -arg (Vc [m]) ⁇ .
  • expressed below means the absolute value of the complex number x.
  • x Vc [m]
  • the second scoring unit 143 sets the score equivalent value Z [m] (1 ⁇ m ⁇ N) of each node to the complex plane of the score reference value Vc [m] (1 ⁇ m ⁇ N) of each node.
  • the values d1 and d2 are any real numbers greater than zero. As d1 is larger, the value of Z [m] becomes larger according to the smaller number of outlinks at each point from the start point node. The larger d2 is, the larger the value of Z [m] is according to the distance from the starting node.
  • the second scoring unit 143 calculates the score of each node D [m] (1 ⁇ m ⁇ N) in the document network based on the score equivalent value Z [m].
  • Z0 is, for example, the minimum value of Z [m] in the entire document network. In this case, the score of the node D [m] indicating the smallest Z [m] is zero.
  • Z0 may have a value of zero. That is, the Z0 term may be omitted.
  • the score X of each node calculated in this way is output to the ranking unit 145 as the second score of each web page.
  • the second scoring unit 143 may generate the special Hermitian adjacency matrix H2 shown in FIG. 12 in place of the above-mentioned special Hermitian adjacency matrix H1 in S1020.
  • the special Hermitian adjacency matrix H2 shown in FIG. 12 is an example of the special Hermitian adjacency matrix H2 corresponding to the Hermitian adjacency matrix H shown in the upper part of FIG.
  • the second scoring unit 143 replaces the component of the value + i in the Hermitian adjacency matrix H with the value C1 (C2 + i), and replaces the component indicating the value ⁇ i with the value C1 (C2-i). ) Can be replaced.
  • the second scoring unit 143 can further perform the following processes A and B.
  • the second scoring unit 143 divides the value C1 (C2 + i) in the components of each row in the Hermitian adjacency matrix H after substitution by the value C1 (C2 + i) and the number W of the components indicating the value 1 in the same row ⁇
  • the value C1 (C2-i) in the component at a symmetrical position with the component changed to C1 (C2 + i) / W ⁇ and further changed to the value ⁇ C1 (C2 + i) / W ⁇ with the diagonal component in between.
  • the second scoring unit 143 multiplies the value C1 (C2-i) in the components of each row in the replaced Hermitian adjacency matrix H by the number Z of the components showing C1 (C2-i) and the value 1 in the same row. Change to the value ⁇ C1 (C2-i) Z ⁇ , and further change to the value ⁇ C1 (C2-i) Z ⁇ . (C2 + i) can be changed to the complex conjugate ⁇ C1 (C2 + i) Z ⁇ of the value ⁇ C1 (C2-i) Z ⁇ .
  • the special Hermitian adjacency matrix H2 is generated.
  • the second scoring unit 143 may execute the process B after the execution of the process A, may execute the process A after the execution of the process B, or may execute the process A and the process B in parallel. You may do it. Regardless of which mode the process A and the process B are executed, the same special Hermitian adjacency matrix H2 is generated.
  • the value Z1 in the special Hermitian adjacency matrix H2 shown in FIG. 12 takes the value ⁇ i from the components h (p3, 1), h (p3, 2), ..., H (p3, N) in the third row.
  • the value Z2 is a component that takes the value -i and a component that takes the value 1 among the total N components h (p4, 1), h (p4, 2), ..., H (p4, N) in the p4th row.
  • the third line may be understood as the line in which the value ⁇ C1 (C2-i) Z1 / W1 ⁇ is shown in FIG.
  • the fourth line may be understood as the line in which the value ⁇ C1 (C2-i) Z2 / W2 ⁇ is shown in FIG.
  • the second scoring unit 143 can execute the processing of S1030-S1050 by using the special Hermitian adjacency matrix H2 calculated in this way in place of the special Hermitian adjacency matrix H1.
  • the second scoring unit 143 determines the number of layers J of the join nodes included in the document network to be processed.
  • the first joining node that appears when moving between the nodes according to the direction of the link from the tip node is defined as the 0th layer joining node.
  • join node that appears next to the 0th layer join node is defined as the 1st layer join node
  • the join node that appears next to the jth layer join node is defined as the (j + 1) join node (j is an integer greater than or equal to 0). Is).
  • the number of join nodes in the document network is J.
  • the 0th layer connection node is the 3rd node and the 12th node
  • the 1st layer connection node is the 6th node
  • the 2nd layer connection node is the 7th node.
  • the third layer join node is the fifteenth node.
  • the number of layers J of the join nodes in the document network shown in FIG. 6 is 4.
  • the highest layer number among the available layer numbers is assigned to the corresponding join node.
  • the seventh node is not a first layer join node but a second layer join node.
  • the subgraph determined in S270 is composed of the subgraph SG1 consisting of the first node and the third node, and the second node and the third node, as shown in FIG. Subgraph SG2, subgraph SG3 consisting of tenth node, eleventh node, twelfth node, and twentieth node, and subgraph SG4 consisting of thirteenth node and twelfth node.
  • the subgraph is discriminated for each combination of the tip node and the 0th layer join node.
  • the second scoring unit 143 executes the same processing as that shown in FIG. 8 for each of the subgraphs determined in S270 (S280). As a result, the score reference value and the score of each node in the subgraph are calculated for each subgraph.
  • the second scoring unit 143 can sequentially select the determined subgraphs as processing targets and execute the processing shown in FIG.
  • the subgraph to be processed is treated in the same manner as the “document network to be processed” in the description of FIG. 8, and the score reference value and the score of each node in the subgraph are calculated.
  • the subgraph to be processed is the subgraph SG3 consisting of the 10th node, the 11th node, the 12th node, and the 20th node
  • the 12th node and the 20th node having no outlink.
  • a dummy node DP is added to the node (see FIG. 14). Dummy node DP may be added to the addition target node in the same number as the number of outlinks that the addition target node has before subgraphing.
  • the Hermitian adjacency matrix H corresponding to the subgraph to which the dummy node DP is added is generated.
  • the score reference values and scores of the tenth node, the eleventh node, the twelfth node, and the twentieth node are calculated.
  • the above-mentioned value Z0 can be set so that the score of the tenth node, which is the tip node, becomes zero, for example.
  • the second scoring unit 143 integrates the score reference value and the score of the j-layer connecting node overlapping between the subgraphs (S285).
  • the second scoring unit 143 synthesizes the score reference values of the j-layer connected nodes overlapping between the subgraphs as follows, and for each of the j-layer connected nodes in the document network to be processed. Calculate the only score reference value and score.
  • the second scoring unit 143 has the highest angle ⁇ from the real axis on the complex plane among the score reference values for each subgraph of the j-layer connection node with respect to one j-layer connection node. Determine a large score reference value.
  • the score reference value in each subgraph of the j-layer join node is rotated on the complex plane so that the angle ⁇ overlaps with the maximum score reference value on the complex plane.
  • the value Z0 is Z [m in the subgraph having the maximum score reference value at the angle ⁇ for the j-layer connecting node. ] Can be set to the minimum value.
  • the value Z0 may be zero, as described above.
  • the second scoring unit 143 performs vector synthesis on the complex plane with respect to one j-layer connecting node without overlapping the score reference values for each subgraph of the j-layer connecting node.
  • the only score reference value may be determined for the j-layer join node.
  • the second scoring unit 143 executes the above processing for each of the j-layer connected nodes in S285, and the second scoring unit 143 executes the above processing for each of the j-layer connected nodes.
  • the score reference value Vx and the score X are calculated.
  • the second scoring unit 143 calculates the score reference value Vx and the corresponding score X by integrating the score reference values Vc between the subgraphs regarding the j-layer join node for each j-layer join node.
  • the second scoring unit 143 increments the value of the variable j by 1 (S290), and in the subsequent S300, the second scoring unit 143 has the value of the variable j less than the number of layers J. Judge whether or not.
  • the second scoring unit 143 executes the process of S410 (see FIG. 19). On the other hand, if it is determined that the value of the variable j is less than the number of layers J (Yes in S300), the second scoring unit 143 executes the process of S310 (see FIG. 15).
  • the second scoring unit 143 determines the subgraph from the tip node to the j-layer join node.
  • the subgraph determined here is a subgraph in which no other join node is included between the tip node and the j-th layer join node.
  • a subgraph including one tip node and one join node corresponding to the combination is determined.
  • the subgraph determined in S310 is a subgraph SG5 including a fourth node, a fifth node, and a sixth node, as shown in FIG.
  • the second scoring unit 143 executes the same processing as in S280 for each of the subgraphs determined in S310 (Yes in S320). S330). As a result, the score reference value and the score of each node in the subgraph are calculated for each subgraph (S330). After that, the second scoring unit 143 executes the process of S340.
  • the second scoring unit 143 does not execute the processing of S330, but executes the processing of S340.
  • the second scoring unit 143 sets the variable f to a value of zero.
  • the second scoring unit 143 determines the subgraph from the f-layer connection node to the j-layer connection node.
  • the subgraph determined here is a subgraph in which no other join node is included between the f-layer join node and the j-layer join node.
  • a subgraph including one f-layer connection node and one j-layer connection node corresponding to the combination is determined.
  • the f-layer join node corresponds to the front end node
  • the j-layer join node corresponds to the rear end node.
  • the subgraph determined in S350 is the subgraph SG6 including the third node and the sixth node shown in FIG.
  • the second scoring unit 143 executes the processing of S380 without executing the processing of S370.
  • the second scoring unit 143 executes the same processing as in S280 for each of the subgraphs determined in S350. As a result, the score reference value and the score of each node in the subgraph are calculated for each subgraph (S370).
  • the second scoring unit 143 further corrects the score reference value and score of each node in the calculated subgraph according to the already calculated score reference value and score of the f-layer connection node.
  • the second scoring unit 143 modifies the score reference values and scores of the remaining nodes in the subgraph based on the score reference values and scores of the f-layer connected node for which the score reference values and scores have already been calculated. To do.
  • the second scoring unit 143 when the score reference value of the f-layer join node calculated before the processing of S370 is Va, the second scoring unit 143 has the score reference value of each node in the subgraph. Is modified as follows.
  • the second scoring unit 143 rotates the score reference value of each node before modification calculated in S370 on the complex plane so that the score reference value of the f-layer connection node matches the above Va.
  • the score reference value of each node when rotated in this way is determined as the corrected score reference value.
  • the second scoring unit 143 can calculate the modified score X of each node in the subgraph based on the score reference value Vc of each node after modification.
  • the score X can be calculated so that the score X of the f-layer join node is the same as the score before modification.
  • the second scoring unit 143 sets the score of each node in the subgraph as follows. Modify to.
  • the second scoring unit 143 adds the score of each node before modification calculated in S370 by the difference between the score of the f-layer connection node before modification calculated in S370 and the above Xa. As a result, the second scoring unit 143 corrects the score of each node in the subgraph so that the score of the f-layer connecting node before the correction calculated in S370 matches the above Xa.
  • the second scoring unit 143 increments the value of the variable f by 1 (S380). After that, the second scoring unit 143 determines whether or not the value of the variable f is less than the value of the variable j (S390). If an affirmative decision is made here (Yes in S390), the second scoring unit 143 executes the process of S350. If affirmative determination is made (No in S390), the second scoring unit 143 executes the process of S400.
  • the second scoring unit 143 of the corresponding j-layer join node calculated by the processing of S330 and S370 for each of the j-layer join nodes overlapping between the subgraphs determined in S310 and S350.
  • the score reference value and the score for each subgraph are integrated in the same manner as in the process of S285.
  • the second scoring unit 143 calculates a score reference value Vx that integrates the score reference value Vc for each subgraph of the corresponding j-layer join node for each of the j-layer join nodes, and uses the score reference value Vx as the score reference value Vx.
  • the score X Zx-Z0 based on the corresponding score equivalent value Zx is calculated.
  • the second scoring unit 143 increments the value of the variable j by 1 (S290) and executes the processes of S300 to S400.
  • the second scoring unit 143 calculates the score reference value and the score of each node up to the (J-1) layer connection node by repeatedly executing S300 to S400 while incrementing the value of the variable j.
  • the second scoring unit 143 After calculating the score reference value and the score of each node up to the (J-1) layer connection node, the second scoring unit 143 makes a negative judgment in S300 and executes the process of S410 (see FIG. 19). ).
  • the second scoring unit 143 performs the processing related to the subgraph SG1 (see FIG. 13) and the processing related to the subgraph SG2 to perform the score reference values and scores of the first, second, and third nodes. Is calculated.
  • the second scoring unit 143 further calculates the score reference values and scores of the tenth, eleventh, twelfth, thirteenth, and twentieth nodes by executing the processing related to the subgraph SG3 and the processing related to the subgraph SG4.
  • the second scoring unit 143 executes the process related to the subgraph SG5 (see FIG. 16), further executes the process related to the subgraph SG6, and executes the processes related to the subgraph SG6, and the fourth, fifth, and sixth. Calculate the node score reference value and score.
  • the second scoring unit 143 executes the process related to the subgraph SG7 (see FIG. 17), further executes the process related to the subgraph SG8, and calculates the score reference value and the score of the seventh node. To do.
  • the second scoring unit 143 executes the processing related to the subgraph SG9 (see FIG. 18), and further executes the processing related to the subgraph SG10, and the 8th, 14th, and 15th nodes. Score The reference value and score are calculated.
  • the second scoring unit 143 discriminates a non-circular subgraph starting from the joined node and ending at the trailing end node of the unjoined node.
  • the subgraph identified in S410 is the subgraph SG11 composed of the 15th, 16th, 17th, 18th, and 19th nodes shown in FIG.
  • the second scoring unit 143 executes the processing of S440 without executing the processing of S430.
  • the second scoring unit 143 calculates the score reference value and the score of each node in the subgraph for each subgraph determined in S410. (S430).
  • the second scoring unit 143 modifies the score reference value and the score of each node in the subgraph based on the already calculated score and the score reference value of the combined node, as in the process of S370. In this way, the score and score reference value of each node in the subgraph are determined.
  • the second scoring unit 143 determines the subgraph of the circulatory system.
  • the subgraph identified in S440 is the subgraph SG12 composed of the sixth, seventh, eighth, and ninth nodes shown in FIG.
  • the second scoring unit 143 executes the process of S500 without executing the process of S460-S490.
  • the second scoring unit 143 sets the score reference value and score of each node in the subgraph for each subgraph determined in S440 in S460-S490. calculate.
  • the second scoring unit 143 executes the processing of S470-S480 for all the subgraphs determined in S440 (Yes in S490)
  • the second scoring unit 143 executes the processing of S500.
  • the second scoring unit 143 selects one of the subgraphs determined in S440. In S470, the second scoring unit 143 determines a common additional score for each node in the subgraph based on the score group of the nodes for which the score has already been calculated in the selected subgraph.
  • the second scoring unit 143 determines the maximum value of the score group in the above subgraph as the added score. According to the second example of S470, the second scoring unit 143 determines the average value of the score group as the added score.
  • the second scoring unit 143 adds the determined added score to the score of each node in the selected subgraph to correct the score of each node.
  • the score can be regarded as zero and the above-determined addition score can be added.
  • the second scoring unit 143 executes the process of S500 when the score of the node in each subgraph is corrected in this way. Before the processing of S500 is executed, the scores of all the nodes in the document network are determined.
  • the second scoring unit 143 outputs the score of each node in the determined document network to the ranking unit 145 as the second score of each web page. After that, the score calculation process is completed.
  • the information processing system 1 of the first embodiment described above can be modified as follows.
  • the second scoring unit 143 may generate the Hermitian adjacency matrix H and calculate the score reference value and the score without placing the dummy node DP for the node without the outline.
  • the second scoring unit 143 modifies the score reference value Vc [m] of each node as a value excluding the influence of the outlink destination of the corresponding node, and modifies the score reference value Vc *.
  • the score equivalent value Z [m]
  • the second scoring unit 143 modifies the score reference value Vc [m] of each node as a value excluding the influence from the inlink source of the corresponding node, and modifies the score reference value Vc.
  • the score equivalent value Z [m]
  • d1 ⁇ ⁇ 2 ⁇ -arg (Vc * [m] corresponding to each node ]) ⁇ d2 may be calculated.
  • the second modification and the third modification can be implemented by generating the Hermitian adjacency matrix H without placing a dummy node DP for a node having no outline.
  • Hermitian adjacency matrices H1 and H2 corresponding to the Hermitian adjacency matrix H in which the connection relationship between web pages is expressed by four values of 1,0, + i, and ⁇ i are used. Score multiple web pages. Therefore, it is not necessary to establish a virtual connection relationship from all web pages to all web pages, and scoring / ranking of each web page based on the connection relationship between web pages should be realized more appropriately than before. Can be done.
  • the output unit 147 can provide the user terminal 5 with an appropriate search result list based on the connection relationship between the web pages based on the second score from the second scoring unit 143.
  • the information processing system 1 of the second embodiment is configured in the same manner as the information processing system 1 of the first embodiment, except that a score calculation process having a content different from that of the first embodiment is executed in S130. Therefore, in the following, only the score calculation process executed by the second scoring unit 143 in S130 will be described. It may be understood that the configuration of the information processing system 1 of the second embodiment, which is not described below, is the same as that of the first embodiment.
  • the second scoring unit 143 executes the score calculation process shown in FIG. 22 when calculating the second score of each node included in the document network to be processed selected in S120 in S130.
  • the second scoring unit 143 determines the tip node having no inlink included in the document network to be processed (S610). For example, when the document network to be processed is the document network illustrated in FIG. 23, the second scoring unit 143 determines the first node and the eighth node as the tip node.
  • the second scoring unit 143 selects one of the advanced nodes included in the document network (S620) and determines the subgraph including the selected advanced node (S630).
  • the discriminated subgraph is a subgraph consisting of the selected tip node and all the nodes in the document network that can be moved from this tip node according to the direction of the link.
  • the second scoring unit 143 executes the same processing as the processing shown in FIG. 8 when the subgraph determined in S630 is regarded as the document network to be processed, and the score reference value of each node in the subgraph is executed. And the score is calculated (S640).
  • the second scoring unit 143 can generate the Hermitian adjacency matrix H and calculate the score reference value and the score without arranging the dummy node DP at the rear end node.
  • the second scoring unit 143 arranges the score reference value of the tip node in the subgraph at the point of the value 1 on the real axis or the specific point on the fourth quadrant close to the real axis in the complex plane.
  • the score reference value of each node in the subgraph can be calculated.
  • the specific point may be a point obtained by rotating a point having a value of 1 on the real axis toward the fourth quadrant by an angle ⁇ 1.
  • the second scoring unit 143 selects all the leading nodes and determines whether or not the processing of S640 has been executed. If a negative determination is made in S650, the second scoring unit 143 changes the selected tip node (S620) and determines the subgraph of the changed tip node (S630). Then, the score reference value and the score of each node in the subgraph are calculated based on the Hermitian adjacency matrix H of the discriminated subgraph (S640).
  • the second scoring unit 143 calculates the score reference value and the score of each node based on the corresponding Hermitian adjacency matrix H for each subgraph corresponding to each of the advanced nodes included in the document network. In other words, the second scoring unit 143 determines the subgraph for each inlink of the connecting node, and calculates the score reference value and the score of each node based on the corresponding Hermitian adjacency matrix H for each subgraph. After that, the second scoring unit 143 makes an affirmative judgment in S650 and executes the process of S660.
  • the subgraph includes nodes that overlap with other subgraphs, but in S640, a score reference value and a score are calculated for each of the overlapping nodes for each subgraph.
  • the second scoring unit 143 calculates a value obtained by integrating the scores of the corresponding nodes in each subgraph as the second score of each node in the document network. Specifically, the second scoring unit 143 calculates the second score of one node as the total value of the scores in each subgraph of the node.
  • the second scoring unit 143 may calculate the second score of one node based on the composite vector of the score reference values in each subgraph of the node.
  • the second scoring unit 143 can output the calculated score equivalent value Zx as the second score of the corresponding node.
  • the information processing system 1 can appropriately perform scoring / ranking of a plurality of web pages related to a document network having a join node by using the Hermitian adjacency matrix H.
  • the second embodiment may be modified in the same manner as the first embodiment. That is, in the first embodiment, the modified example relating to the calculation of the score described as the first, second, and third modified examples may be applied to the second embodiment.
  • the information processing system 1 of the third embodiment is configured in the same manner as the information processing system 1 of the first embodiment, except that a score calculation process having a content different from that of the first embodiment is executed in S130. Therefore, in the following, only the score calculation process executed by the second scoring unit 143 in S130 will be described. It may be understood that the configuration of the information processing system 1 of the third embodiment, which is not described below, is the same as that of the first embodiment.
  • the second scoring unit 143 executes the score calculation process shown in FIG. 25 when calculating the second score of each node included in the document network to be processed selected in S120 in S130. ..
  • the second scoring unit 143 determines the front node having no inlink and the rear node having no outlink included in the document network to be processed (S710). ).
  • the second scoring unit 143 calculates the total of the outlinks and inlinks when the document network is represented by a directed graph, and the link when the document network is represented by an undirected graph. Nodes with different numbers are formally identified as trailing nodes. If there is no corresponding node, the second scoring unit 143 adds a dummy node DP having inlinks from all the nodes in the document network to the document network, and determines the dummy node DP as the rearmost node. ..
  • the second scoring unit 143 determines the first and eighth nodes as the tip nodes, and the fifth, seventh, and ninth nodes. Determine the node as the trailing node.
  • the second scoring unit 143 determines the first node as the front node and the ninth node as the rear node. Determine.
  • the second scoring unit 143 determines the subgraph for each combination of the front end node and the rear end node included in the document network (S720).
  • a subgraph is a subgraph consisting of a group of nodes that can move from the front end node to the rear end node according to the direction of the link.
  • the subgraph for each combination of the front-end node and the rear-end node may be understood as a subgraph for each combination of in-link and out-link of the join node.
  • the second scoring unit 143 has three subgraphs SG31, SG32, SG33, and FIG. 26B in which the tip node shown in FIG. 26A is the first node.
  • the three subgraphs SG34, SG35, and SG36 whose tip node is the eighth node are discriminated.
  • the second scoring unit 143 has the first node and the rearmost node as the front node passing through the first outlink of the third node shown in FIG. 29A.
  • the subgraph which is the ninth node and the subgraph whose front end node passing through the second outlink of the third node shown in FIG. 29B is the first node and the rear end node is the ninth node are discriminated from each other.
  • the second scoring unit 143 selects one of the determined subgraphs (S730), and executes the same processing as that shown in FIG. 8 when the selected subgraph is regarded as the document network to be processed. , Calculate the first tentative score of each node in the subgraph (S740).
  • Vc [m] is a score reference value of the mth node.
  • d3 is any real number greater than 0. The larger d3 is, the larger the first provisional score Xp1 [m] is according to the distance (the number of links) from the tip node having no inlink.
  • the second scoring unit 143 selects each of the subgraphs in order (S730) until the processing of S740 is executed for all the subgraphs (No in S750), and the processing of S740 is executed. As a result, the second scoring unit 143 calculates the first provisional score of each node in the subgraph for each subgraph (S740).
  • the second scoring unit 143 sets the second tentative score of the node as the second tentative score of the node for each node in the document network.
  • the average value is calculated (S760).
  • the second provisional score of one node is the sum of the first provisional scores of the corresponding node divided by the number of subgraphs to which the corresponding node belongs.
  • the second scoring unit 143 calculates the third provisional score of each node based on the second provisional score of each node in the document network (S770).
  • the second scoring unit 143 uses the third provisional score Xp3 [m] of the m-node, the second provisional score Xp2 [m] of the m-node, and the score reference value Vc [m] to formulate the formula.
  • Xp3 [m] Xp2 [m] ⁇
  • the second scoring unit 143 calculates the second score of each node in the document network using the third provisional score of each node (S780). Specifically, the second scoring unit 143, the second score of the m nodes in the document network, the third temporary score Xp3 value obtained by dividing [m] to a value M d4 Xp3 [m] / M d4 calculate.
  • M is the product of the number of outlinks of each node that can reach the mth node from the rearmost node.
  • scoring / ranking of a plurality of web pages can be appropriately performed using the Hermitian adjacency matrix H for a document network having a join node.
  • the information processing system 1 of the fourth embodiment is different from the first embodiment in that the second scoring unit 143 executes the sub-processing shown in FIG. 30 instead of the sub-processing shown in FIG.
  • the information processing system 1 of the fourth embodiment is basically the same as that of the first embodiment in other respects. Therefore, in the following, a configuration different from that of the first embodiment of the information processing system 1 of the fourth embodiment will be selectively described, and the description of the same configuration as that of the first embodiment will be omitted.
  • the sub-processing shown in FIG. 30 is executed in S240. Further, in S280, S330, S370, and S430, the same processing as the sub-processing shown in FIG. 30 is executed when calculating the score reference value and the score.
  • the second scoring unit 143 When the sub-processing shown in FIG. 30 is started, the second scoring unit 143 generates the Hermitian adjacency matrix H corresponding to the document network to be processed (S1110), similarly to S1010.
  • the second scoring unit 143 After that, the second scoring unit 143 generates a special Hermitian adjacency matrix H3 which is a modification of the generated Hermitian adjacency matrix H (S1120).
  • the special Hermitian adjacency matrix H3 is generated by substituting all the diagonal components in the special Hermitian adjacency matrix H1 generated in S1020 from a value 0 to a value -1.
  • the second scoring unit 143 generates a special Hermitian adjacency matrix H1 by transforming the Hermitian adjacency matrix H generated in S1110 in the same manner as in the processing in S1020, and the diagonal components in the special Hermitian adjacency matrix H1. By substituting everything from value 0 to value -1, the special Hermitian adjacency matrix H3 can be generated.
  • the special Hermitian adjacency matrix H3 generated in S1120 is a matrix of N rows and N columns (NxN) corresponding to the number N of nodes of the document network to be processed.
  • the column vector B generated in S1130 corresponds to a matrix of N rows and 1 column, and the column vector B indicates a value according to whether or not each component is a node having an inlink. Is generated in.
  • the second scoring unit 143 sets the component corresponding to the node having no inlink to the value -1, and sets the component corresponding to the node having the inlink to the value 0. Generate vector B.
  • the second scoring unit 143 solves each component u [m] of the score reference vector U corresponding to the solution of the simultaneous equations by solving the following simultaneous equations including the special Hermitian adjacency matrix H3 and the column vector B. (1 ⁇ m ⁇ N) is obtained.
  • the score reference vector U corresponds to an N-by-1 matrix like the column vector B.
  • the m-th row component of the score reference vector U is represented by u [m]
  • the m-th row component of the column vector B is represented by b [m]
  • the special Hermitian adjacency matrix H3 is represented by H'.
  • the second scoring unit 143 determines the score reference value Vc [m] of each node based on each component u [m] of the score reference vector U calculated in S1140.
  • the correction value u [m] / E (1 ⁇ m ⁇ N) can be corrected so as to be rotationally moved from the real axis to the fourth quadrant side by an angle ⁇ 1 on the complex plane. This correction value can be determined as the score reference value Vc [m].
  • the second scoring unit 143 sets the score equivalent value Z [m] (1 ⁇ m ⁇ N) of each node based on the score reference value Vc [m] (1 ⁇ m ⁇ N) of each node.
  • the value Z [m]
  • ⁇ ⁇ 2 ⁇ -arg (Vc [m]) ⁇ is calculated.
  • the second scoring unit 143 further calculates the score X of each node D [m] (1 ⁇ m ⁇ N) in the document network based on the score equivalent value Z [m] (S1160).
  • Z0 is, for example, the minimum value of Z [m] in the entire document network.
  • Z0 may have a value of zero.
  • the score X of each node is calculated in this way.
  • the treatment of the calculated score X is the same as that of the first embodiment.
  • the score X of each node can be calculated using the Hermitian adjacency matrix H without calculating the eigenvalues and the eigenvectors as in the first embodiment.
  • the information processing system 1 of the fifth embodiment is different from the first embodiment in that the second scoring unit 143 executes the sub-processing shown in FIG. 31 instead of the sub-processing shown in FIG.
  • the information processing system 1 of the fifth embodiment is basically the same as that of the first embodiment in other respects. Therefore, in the following, a configuration different from that of the first embodiment of the information processing system 1 of the fifth embodiment will be selectively described, and the description of the same configuration as that of the first embodiment will be omitted.
  • the sub-processing shown in FIG. 31 is executed in S240. Further, in S280, S330, S370, and S430, the same processing as the sub-processing shown in FIG. 31 is executed when calculating the score reference value and the score.
  • the second scoring unit 143 When the sub-processing shown in FIG. 31 is started, the second scoring unit 143 generates the Hermitian adjacency matrix H corresponding to the document network to be processed (S1210), similarly to S1010.
  • the second scoring unit 143 After that, the second scoring unit 143 generates a special Hermitian adjacency matrix H4 which is a modification of the generated Hermitian adjacency matrix H (S1220).
  • the second scoring unit 143 can replace all the components of the value + i with the value 0 in the Hermitian adjacency matrix H. Further, all the components of the value ⁇ i can be replaced with the value C1 (C2-i).
  • the exemplary Hermitian adjacency matrix H shown in the upper part of FIG. 32 is replaced with the matrix H (1) shown in the lower part of FIG. 32.
  • the second scoring unit 143 further includes, in the above matrix H (1) , a column corresponding to a node having two or more outlinks, in other words, the number of components having a value ⁇ i in the Hermitian adjacency matrix H is 2.
  • the component of the value C1 (C2-i) in the above column is replaced with the value C1 (C2-i) / R.
  • the value R is the number of outlinks and corresponds to the number of components having the value ⁇ i in the corresponding column of the Hermitian adjacency matrix H.
  • the second scoring unit 143 further replaces all the diagonal components in the matrix H (2) with a value -1 as shown in the lower part of FIG. 33 to generate a special Hermitian adjacency matrix H4. In the subsequent S1230, the second scoring unit 143 generates the column vector B as in the process in S1130.
  • the second scoring unit 143 solves the simultaneous equations including the special Hermitian adjacency matrix H4 and the column vector B in the same manner as the processing in S1140, so that each component of the score reference vector U corresponding to the solution of the simultaneous equations u [m] (1 ⁇ m ⁇ N) is obtained (S1240).
  • the processing in S1240 is the same as the processing in S1140 except that the special Hermitian adjacency matrix H4 is used instead of the special Hermitian adjacency matrix H3.
  • the component corresponding to the tip node having no inlink shows the real number 1.
  • the second scoring unit 143 determines the score reference value Vc [m] of each node based on each component u [m] of the score reference vector U calculated in S1240. According to the present embodiment, the second scoring unit 143 can determine the score reference value Vc [m] of each node to be the same value as the corresponding component u [m] of the score reference vector U.
  • the second scoring unit 143 sets the score equivalent value Z [m] (1 ⁇ m ⁇ N) of each node based on the score reference value Vc [m] (1 ⁇ m ⁇ N) of each node.
  • the value Z [m] according to the following equation is calculated.
  • the value d is any real number greater than zero.
  • the value d may be the value 1.
  • the second scoring unit 143 further calculates the score X of each node D [m] (1 ⁇ m ⁇ N) in the document network based on the score equivalent value Z [m] (S1260).
  • Z0 is, for example, the minimum value of Z [m] in the entire document network.
  • Z0 may have a value of zero.
  • the score X of each node is calculated in this way.
  • the treatment of the calculated score X is the same as that of the first embodiment.
  • the score X of each node can be calculated using the Hermitian adjacency matrix H without obtaining the eigenvalues and the eigenvectors as in the first embodiment.
  • the present disclosure is not limited to the above-described embodiments.
  • the disclosure may apply to scoring documents with link / citation relationships that are not limited to web documents.
  • the technical ideas according to the fourth embodiment and the fifth embodiment may be applied to the second embodiment or the third embodiment.
  • the technique of adding a dummy node DP having no inlink and having an outlink to all the nodes in the document network to the document network having no advanced node is the first to the first embodiment. 5 Can be applied to embodiments. Furthermore, the dummy node DP having no inlink may be added to the document network having the advanced node.
  • the technique of adding a dummy node DP having inlinks from all the nodes in the document network to the document network having no rear end node and having no outlink is the first technique. It can be applied from the embodiment to the fifth embodiment. Furthermore, the dummy node DP having no outlink may be added to the document network having the trailing node.
  • the functions of one component in the above embodiment may be distributed to a plurality of components.
  • the functions of the plurality of components may be integrated into one component.
  • Some of the configurations of the above embodiments may be omitted.
  • the embodiments of the present disclosure are all aspects contained in the technical idea identified from the wording described in the claims.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

According to one aspect of the present disclosure, a document network consisting of a plurality of documents that are connected by at least a weak connection is identified. A particular document included in the document network and having an in-link from at least two documents is identified. A plurality of sub-networks are identified on the basis of the particular document. A score for each of the plurality of documents constituting the document network is calculated by executing individual processing on each of the sub-networks. In the individual processing, a score for each document included in a corresponding sub-network is calculated. For each duplicate document belonging to two or more sub-networks, the scores of the corresponding duplicate documents in the two or more sub-networks are merged.

Description

情報処理システム及び情報処理方法Information processing system and information processing method 関連出願への相互参照Cross-reference to related applications
 本国際出願は、令和1年12月20日に日本国特許庁に出願された日本国特許出願第2019-230822号に基づく優先権を主張するものであり、日本国特許出願第2019-230822号の全内容を本国際出願に参照により援用する。 This international application claims priority based on Japanese Patent Application No. 2019-230822 filed with the Japan Patent Office on December 20, 1991, and Japanese Patent Application No. 2019-230822. The entire contents of the issue are incorporated in this international application by reference.
 本開示は、情報処理システム及び情報処理方法に関する。 This disclosure relates to an information processing system and an information processing method.
 ウェブページのランク付けを行う技術が既に知られている(特許文献1参照)。この技術の単純な例では、ページランクを、多くのウェブページからリンクされるウェブページほど高く判定する。ページランクの計算には、ウェブページ間の接続関係(換言すれば接続状態)を値0,1で二値表現した隣接行列、及び、隣接行列を変形した値0,1と他の実数とを成分に含む行列が用いられる。 The technology for ranking web pages is already known (see Patent Document 1). In a simple example of this technique, the page rank is determined higher for web pages linked from more web pages. To calculate the page rank, an adjacency matrix that binary-represents the connection relationship (in other words, the connection state) between web pages with values 0 and 1, and the value 0, 1 that is a modification of the adjacency matrix and other real numbers are used. The matrix included in the components is used.
特開2017-102712号公報Japanese Unexamined Patent Publication No. 2017-102712
 上述の隣接行列に基づきページランクを判定する従来方法では、ウェブページ間の実際の接続関係に加え、全てのウェブページから全てのウェブページへの仮想的な接続関係を措定する必要がある。このため、ウェブページの良好なランク付けを行うことが難しい。 In the conventional method of determining the page rank based on the above-mentioned adjacency matrix, it is necessary to determine the virtual connection relationship from all web pages to all web pages in addition to the actual connection relationship between web pages. For this reason, it is difficult to give a good ranking of web pages.
 そこで、本開示の一側面によれば、従来よりも適切に複数文書のスコアリングを行うことが可能な技術を提供できることが望ましい。 Therefore, according to one aspect of the present disclosure, it is desirable to be able to provide a technique capable of scoring a plurality of documents more appropriately than before.
 本開示の一側面によれば、情報処理システムが提供される。情報処理システムは、文書ネットワーク判別部と、文書判別部と、サブネットワーク判別部と、スコア算出部と、を備える。 According to one aspect of the disclosure, an information processing system is provided. The information processing system includes a document network discriminating unit, a document discriminating unit, a sub-network discriminating unit, and a score calculating unit.
 文書ネットワーク判別部は、文書間の接続関係を表すデータに基づき、少なくとも弱連結で連結された複数の文書で構成される文書ネットワークを判別するように構成される。文書判別部は、判別された文書ネットワークに含まれる、二つ以上の文書からのインリンクを有する特定文書を判別するように構成される。 The document network discriminating unit is configured to discriminate a document network composed of at least a plurality of documents connected by a weak connection based on data representing a connection relationship between documents. The document discriminating unit is configured to discriminate a specific document having an inlink from two or more documents included in the discriminated document network.
 サブネットワーク判別部は、判別された特定文書を基準に、文書ネットワークに含まれる複数のサブネットワークを判別するように構成される。スコア算出部は、判別された複数のサブネットワークのそれぞれに対する個別処理を実行することにより、文書ネットワークを構成する複数の文書のそれぞれのスコアを算出するように構成される。個別処理では、対応するサブネットワークに含まれる各文書のスコアが算出される。 The sub-network discriminating unit is configured to discriminate a plurality of sub-networks included in the document network based on the discriminated specific document. The score calculation unit is configured to calculate the scores of each of the plurality of documents constituting the document network by executing individual processing for each of the plurality of determined sub-networks. In the individual processing, the score of each document included in the corresponding subnetwork is calculated.
 文書ネットワークには、重複文書が一つ以上含まれる。重複文書のそれぞれは、複数のサブネットワークのうちの二つ以上のサブネットワークに属する文書である。スコア算出部は、一つ以上の重複文書のそれぞれに関して、対応する重複文書の二つ以上のサブネットワークでのスコアを統合することにより、対応する重複文書に対する一つのスコアを算出する。 The document network contains one or more duplicate documents. Each duplicate document is a document that belongs to two or more subnetworks out of a plurality of subnetworks. The score calculation unit calculates one score for the corresponding duplicate document by integrating the scores of the corresponding duplicate document in two or more subnetworks for each of the one or more duplicate documents.
 本開示の一側面に係る情報処理システムによれば、二つ以上の文書からのインリンクを有する文書が含まれる文書ネットワークにおける複数文書のスコアリングを適切に実行できる。従って、本開示の一側面に係る情報処理システムは、複雑な接続関係を有する文書ネットワークにおける複数文書のスコアリングに大変役立つ。 According to the information processing system according to one aspect of the present disclosure, it is possible to appropriately perform scoring of a plurality of documents in a document network including a document having inlinks from two or more documents. Therefore, the information processing system according to one aspect of the present disclosure is very useful for scoring a plurality of documents in a document network having a complicated connection relationship.
 本開示の一側面によれば、サブネットワーク判別部は、特定文書を境界に有する複数のサブネットワークを判別してもよい。複数のサブネットワークは、二つ以上の上流サブネットワークと、下流サブネットワークとを少なくとも有し得る。二つ以上の上流サブネットワークは、特定文書が有する二つ以上のインリンクに対応し、上流ネットワークのそれぞれでは、特定文書が、対応する一つのインリンクを有する。下流サブネットワークは、特定文書が有するアウトリンクを通じて特定文書と接続される。 According to one aspect of the present disclosure, the sub-network discriminating unit may discriminate a plurality of sub-networks having a specific document as a boundary. The plurality of subnetworks may have at least two or more upstream subnetworks and downstream subnetworks. Two or more upstream subnetworks correspond to two or more inlinks of a particular document, and in each of the upstream networks, the particular document has one corresponding inlink. The downstream subnetwork is connected to the specific document through the outlink that the specific document has.
 この場合、特定文書は、二つ以上の上流サブネットワークに属する重複文書である。スコア算出部は、特定文書の上流サブネットワークでのスコアを統合することにより、特定文書に対して一つの統合スコアを算出し、下流サブネットワークに属する各文書のスコアを、特定文書の統合スコアを基準に算出してもよい。 In this case, the specific document is a duplicate document that belongs to two or more upstream subnetworks. The score calculation unit calculates one integrated score for a specific document by integrating the scores in the upstream subnetwork of the specific document, and sets the score of each document belonging to the downstream subnetwork as the integrated score of the specific document. It may be calculated based on the standard.
 本開示の一側面によれば、サブネットワーク判別部は、複数のサブネットワークとして、特定文書が有するインリンク毎のサブネットワークを判別してもよい。サブネットワークのそれぞれは、対応するインリンクより上流に位置する文書群と、特定文書と、特定文書が有するアウトリンクより下流に位置する文書群と、を備え得る。 According to one aspect of the present disclosure, the sub-network discriminating unit may discriminate the sub-network for each in-link of the specific document as a plurality of sub-networks. Each of the sub-networks may include a group of documents located upstream of the corresponding inlink, a specific document, and a group of documents located downstream of the outlink of the specific document.
 本開示の一側面によれば、サブネットワーク判別部は、複数のサブネットワークとして、特定文書が有するインリンク及びアウトリンクの組合せ毎のサブネットワークを判別してもよい。サブネットワークのそれぞれは、組合せに対応するインリンクより上流に位置する文書群と、特定文書と、組合せに対応するアウトリンクより下流に位置する文書群と、を備え得る。 According to one aspect of the present disclosure, the sub-network discriminating unit may discriminate the sub-network for each combination of in-link and out-link possessed by the specific document as a plurality of sub-networks. Each of the sub-networks may include a group of documents located upstream of the inlink corresponding to the combination, a specific document, and a group of documents located downstream of the outlink corresponding to the combination.
 本開示の一側面によれば、統合は、対応する重複文書の、二つ以上のサブネットワークでのスコアの合計を算出することにより実現されてもよい。本開示の一側面によれば、統合は、対応する重複文書の、二つ以上のサブネットワークでのスコアの代表値を算出することにより実現されてもよい。代表値は、平均値であってもよい。 According to one aspect of the disclosure, integration may be achieved by calculating the sum of the scores of the corresponding duplicate documents in two or more subnetworks. According to one aspect of the disclosure, integration may be achieved by calculating representative scores of the corresponding duplicate documents in two or more subnetworks. The representative value may be an average value.
 本開示の一側面によれば、個別処理は、対応するサブネットワークにおける文書間の接続関係に基づくエルミート隣接行列を用いて、対応するサブネットワークに含まれる各文書のスコアを算出する処理を含んでいてもよい。 According to one aspect of the present disclosure, the individual processing includes the processing of calculating the score of each document contained in the corresponding subnetting using the Hermitian adjacency matrix based on the connection relationship between the documents in the corresponding subnetting. You may.
 本開示の一側面によれば、個別処理は、対応するサブネットワークに含まれる、アウトリンクを有さない後端文書に対して、ダミー文書を付加することにより、後端文書に仮想的にアウトリンクを設けるように、対応するサブネットワークを変更する処理を含んでいてもよい。個別処理は、変更後のサブネットワークにおける文書間の接続関係に基づくエルミート隣接行列を定義する処理を含んでもよい。 According to one aspect of the present disclosure, the individual processing is virtually out to the trailing document by adding a dummy document to the trailing document that does not have an outlink and is included in the corresponding subnetworks. It may include a process of changing the corresponding subnetwork so as to provide a link. The individual processing may include a processing that defines an Hermitian adjacency matrix based on the connection relationship between documents in the modified subnetwork.
 本開示の一側面では、エルミート隣接行列は、対応するサブネットワークを構成する文書D[m](1≦m(整数)≦N)間の接続関係に基づくN行N列のエルミート行列であってもよい。 In one aspect of the disclosure, the Hermitian adjacency matrix is an N-by-N Hermitian matrix based on the connection relationships between the documents D [m] (1 ≤ m (integer) ≤ N) that make up the corresponding subnetworks. May be good.
 エルミート隣接行列は、第p行第q列の成分h(p,q)が、文書D[p]から文書D[q]へのリンクが存在し且つ文書D[q]から文書D[p]へのリンクが存在するとき、値1を示し、文書D[p]から文書D[q]へのリンク及び文書D[q]から文書D[p]へのリンクのいずれも存在しないとき、値0を示し、文書D[p]から文書D[q]へのリンクが存在するが文書D[q]から文書D[p]へのリンクが存在しないとき、値+i(iは虚数単位)を示し、文書D[p]から文書D[q]へのリンクが存在しないが文書D[q]から文書D[p]へのリンクが存在するとき、値-iを示す、対角成分がゼロのエルミート行列に対応してもよい。 In the Hermeet adjacency matrix, the component h (p, q) in the p-th row and q-th column has a link from the document D [p] to the document D [q] and the document D [q] to the document D [p]. A value of 1 when a link to is present, and a value when neither the link from document D [p] to document D [q] nor the link from document D [q] to document D [p] is present. When 0 is shown and there is a link from document D [p] to document D [q] but there is no link from document D [q] to document D [p], the value + i (i is an imaginary unit). Shown, when there is no link from document D [p] to document D [q] but there is a link from document D [q] to document D [p], the diagonal component is zero, indicating the value -i. It may correspond to the Elmeat matrix of.
 個別処理は、エルミート隣接行列を変形して特殊エルミート隣接行列を定義し、特殊エルミート隣接行列の固有ベクトルを用いて、対応するサブネットワークに含まれる各文書のスコアを算出する処理を含んでいてもよい。 The individual processing may include a process of transforming the Hermitian adjacency matrix to define a special Hermitian adjacency matrix and using the eigenvectors of the special Hermitian adjacency matrix to calculate the score of each document contained in the corresponding subnetwork. ..
 エルミート隣接行列は、固有ベクトルの各成分を仮に複素平面上に配置したときに、全ての成分が複素平面においてπ/2ラジアンの角度範囲内に収まるように変形されてもよい。 The Hermitian adjacency matrix may be deformed so that when each component of the eigenvector is placed on the complex plane, all the components fall within the angle range of π / 2 radians on the complex plane.
 本開示の一側面によれば、文書間の接続関係を1,0,+i,-iの4値で表現可能なエルミート隣接行列に対応する特殊エルミート隣接行列を用いて複数の文書をスコアリングする。このため、全文書から全文書への仮想的な接続関係を措定する必要がなく、文書間の接続関係に基づく各文書のスコアリングを従来よりも適切に実現することができる。 According to one aspect of the present disclosure, a plurality of documents are scored using a special Hermitian adjacency matrix corresponding to the Hermitian adjacency matrix in which the connection relationship between documents can be represented by four values of 1,0, + i, and -i. .. Therefore, it is not necessary to determine a virtual connection relationship from all documents to all documents, and scoring of each document based on the connection relationship between documents can be realized more appropriately than before.
 本開示の一側面によれば、コンピュータプログラムが提供されてもよい。コンピュータプログラムは、上述した情報処理システムが備える文書ネットワーク判別部と、文書判別部と、サブネットワーク判別部と、スコア算出部の少なくとも一つとして、コンピュータを機能させるためのコンピュータプログラムであってもよい。 According to one aspect of the present disclosure, a computer program may be provided. The computer program may be a computer program for operating the computer as at least one of the document network discrimination unit, the document discrimination unit, the subnetwork discrimination unit, and the score calculation unit included in the above-mentioned information processing system. ..
 本開示の一側面によれば、コンピュータにより実行される情報処理方法が提供されてもよい。情報処理方法は、文書間の接続関係を表すデータに基づき、少なくとも弱連結で連結された複数の文書で構成される文書ネットワークを判別することを含んでいてもよい。 According to one aspect of the present disclosure, an information processing method executed by a computer may be provided. The information processing method may include determining a document network composed of a plurality of documents connected by at least a weak connection based on data representing a connection relationship between documents.
 情報処理方法は、判別された文書ネットワークに含まれる、二つ以上の文書からのインリンクを有する特定文書を判別することを含んでいてもよい。情報処理方法は、判別された特定文書を基準に、文書ネットワークに含まれる複数のサブネットワークを判別することを含んでいてもよい。 The information processing method may include discriminating a specific document having an inlink from two or more documents included in the discriminated document network. The information processing method may include discriminating a plurality of sub-networks included in the document network based on the discriminated specific document.
 情報処理方法は、判別された複数のサブネットワークのそれぞれに対する個別処理として、対応するサブネットワークに含まれる各文書のスコアを算出する処理を実行することにより、文書ネットワークを構成する複数の文書のそれぞれのスコアを算出することを含んでいてもよい。 In the information processing method, each of the plurality of documents constituting the document network is executed by executing a process of calculating the score of each document included in the corresponding subnetwork as an individual process for each of the plurality of determined subnetworks. It may include calculating the score of.
 文書ネットワークには、複数のサブネットワークのうちの二つ以上のサブネットワークに属する文書である重複文書が一つ以上含まれていてもよい。文書ネットワークを構成する複数の文書のそれぞれのスコアを算出することは、一つ以上の重複文書のそれぞれに関して、対応する重複文書の二つ以上のサブネットワークでのスコアを統合することにより、対応する重複文書に対する一つのスコアを算出することを含んでいてもよい。 The document network may include one or more duplicate documents that belong to two or more subnetworks among the plurality of subnetworks. Calculating the scores for each of the documents that make up the document network corresponds to each of the duplicate documents by integrating the scores of the corresponding duplicate documents in two or more subnetworks. It may include calculating one score for duplicate documents.
 本開示の一側面によれば、上述した情報処理システムが実行する手順の少なくとも一部を備える情報処理方法が提供され得る。 According to one aspect of the present disclosure, an information processing method can be provided that includes at least part of the procedure performed by the information processing system described above.
 本開示の一側面によれば、プロセッサと、プロセッサに特定の処理を実行させるための命令を含むメモリと、を備える情報処理システムが提供されてもよい。特定の処理は、上述の情報処理方法に対応する処理であり得る。 According to one aspect of the present disclosure, an information processing system may be provided that includes a processor and a memory containing instructions for causing the processor to perform a particular process. The specific process may be a process corresponding to the above-mentioned information processing method.
第1実施形態の情報処理システムの構成を表すブロック図である。It is a block diagram which shows the structure of the information processing system of 1st Embodiment. 情報処理システムに関する機能ブロック図である。It is a functional block diagram about an information processing system. クエリ応答部の詳細を表す機能ブロック図である。It is a functional block diagram which shows the detail of a query response part. 第2スコアリング部が実行する処理を表すフローチャートである。It is a flowchart which shows the process which the 2nd scoring part executes. 結合ノードを含まない第1の文書ネットワークを示す図である。It is a figure which shows the 1st document network which does not include a join node. 結合ノードを含む第2の文書ネットワークを示す図である。It is a figure which shows the 2nd document network which contains a join node. 第2スコアリング部が実行するスコア算出処理の第1部分を表すフローチャートである。It is a flowchart which shows the 1st part of the score calculation process executed by the 2nd scoring part. 第2スコアリング部が実行する副処理を表すフローチャートである。It is a flowchart which shows the sub-processing which the 2nd scoring unit executes. ダミーノードが付加された第1の文書ネットワークを示す図である。It is a figure which shows the 1st document network which attached the dummy node. 特殊エルミート隣接行列の第1例を示す図である。It is a figure which shows the 1st example of the special Hermitian adjacency matrix. 図11Aは、各ノードの複素平面上の配置に関する説明図であり、図11Bは、スコアの算出方法を説明した図である。FIG. 11A is an explanatory diagram regarding the arrangement of each node on the complex plane, and FIG. 11B is a diagram illustrating a score calculation method. 特殊エルミート隣接行列の第2例を示す図である。It is a figure which shows the 2nd example of the special Hermitian adjacency matrix. 先端ノードから第0層結合ノードに導かれるサブグラフを示す図である。It is a figure which shows the subgraph which leads from the tip node to the 0th layer connection node. ダミーノードが付加されたサブグラフを示す図である。It is a figure which shows the subgraph to which the dummy node is added. スコア算出処理の第2部分を表すフローチャートである。It is a flowchart which shows the 2nd part of the score calculation process. 第1層結合ノードに導かれるサブグラフを示す図である。It is a figure which shows the subgraph which leads to the 1st layer connection node. 第2層結合ノードに導かれるサブグラフを示す図である。It is a figure which shows the subgraph which leads to the 2nd layer connection node. 第3層結合ノードに導かれるサブグラフを示す図である。It is a figure which shows the subgraph which leads to the 3rd layer connection node. スコア算出処理の第3部分を表すフローチャートである。It is a flowchart which shows the 3rd part of the score calculation process. 結合ノードから後端ノードに導かれる非循環型のサブグラフを示す図である。It is a figure which shows the non-circular type subgraph which leads from a join node to a rear end node. 循環型のサブグラフを示す図である。It is a figure which shows the circulation type subgraph. 第2実施形態のスコア算出処理を表すフローチャートである。It is a flowchart which shows the score calculation process of 2nd Embodiment. 第3の文書ネットワークの例を示す図である。It is a figure which shows the example of the 3rd document network. 図24A及び図24Bは、第2実施形態における第3の文書ネットワークのサブグラフを示す図である。24A and 24B are diagrams showing a subgraph of the third document network in the second embodiment. 第3実施形態のスコア算出処理を表すフローチャートである。It is a flowchart which shows the score calculation process of 3rd Embodiment. 図26A及び図26Bは、第3実施形態における第3の文書ネットワークのサブグラフを示す図である。26A and 26B are diagrams showing a subgraph of the third document network in the third embodiment. 第4の文書ネットワークの例を示す図である。It is a figure which shows the example of the 4th document network. 第5の文書ネットワークの例を示す図である。It is a figure which shows the example of the 5th document network. 図29A及び図29Bは、第4の文書ネットワークのサブグラフを示す図である。29A and 29B are diagrams showing a subgraph of the fourth document network. 第4実施形態において第2スコアリング部が実行する副処理を表すフローチャートである。It is a flowchart which shows the sub-processing which the 2nd scoring unit executes in 4th Embodiment. 第5実施形態において第2スコアリング部が実行する副処理を表すフローチャートである。6 is a flowchart showing a sub-process executed by the second scoring unit in the fifth embodiment. 第5実施形態における特殊エルミート隣接行列に関する説明図である。It is explanatory drawing about the special Hermitian adjacency matrix in 5th Embodiment. 第5実施形態における特殊エルミート隣接行列に関する説明図である。It is explanatory drawing about the special Hermitian adjacency matrix in 5th Embodiment.
 1…情報処理システム、5…ユーザ端末、10…演算部、11…プロセッサ、15…メモリ、20…記憶部、30…通信部、110…クローラ、120…インデクサ、130…クエリ処理部、140…クエリ応答部、141…第1スコアリング部、143…第2スコアリング部、145…ランク付け部、147…出力部、210…ページリポジトリ、220…インデックス記憶部。 1 ... Information system, 5 ... User terminal, 10 ... Computational unit, 11 ... Processor, 15 ... Memory, 20 ... Storage unit, 30 ... Communication unit, 110 ... Crawler, 120 ... Indexer, 130 ... Query processing unit, 140 ... Query response unit, 141 ... 1st scoring unit, 143 ... 2nd scoring unit, 145 ... ranking unit, 147 ... output unit, 210 ... page repository, 220 ... index storage unit.
 本開示の例示的実施形態を、以下に図面を参照しながら説明する。 An exemplary embodiment of the present disclosure will be described below with reference to the drawings.
 [第1実施形態]
 図1に示す本実施形態の情報処理システム1は、ユーザ端末5から入力される検索クエリに応答して、ユーザ端末5に検索クエリに対応する文書のリストを提供するように構成される。文書は、ウェブ文書、具体的にはウェブページである。情報処理システム1は、通信ネットワークを通じてユーザ端末5から利用可能な検索エンジンとして機能する。通信ネットワークは、例えばインターネットである。
[First Embodiment]
The information processing system 1 of the present embodiment shown in FIG. 1 is configured to provide the user terminal 5 with a list of documents corresponding to the search query in response to the search query input from the user terminal 5. The document is a web document, specifically a web page. The information processing system 1 functions as a search engine that can be used from the user terminal 5 through the communication network. The communication network is, for example, the Internet.
 情報処理システム1は、演算部10と、記憶部20と、通信部30とを備える。演算部10は、プロセッサ11及びメモリ15を備える。記憶部20は、プロセッサ11により実行されるコンピュータプログラム及びデータを記憶する。記憶部20は、ハードディスクドライブ及びソリッドステートドライブの一方を備えることができる。 The information processing system 1 includes a calculation unit 10, a storage unit 20, and a communication unit 30. The arithmetic unit 10 includes a processor 11 and a memory 15. The storage unit 20 stores computer programs and data executed by the processor 11. The storage unit 20 can include either a hard disk drive or a solid state drive.
 通信部30は、ユーザ端末5と通信可能な通信インタフェースを備える。演算部10は、記憶部20に記憶されたコンピュータプログラムに従う処理を実行することにより、検索機能を実現する。検索機能を実現するための処理は、具体的には、プロセッサ11により実行される。図1に簡略的に示される情報処理システム1は、一つ以上の協働するサーバ装置群で構成され得る。 The communication unit 30 includes a communication interface capable of communicating with the user terminal 5. The calculation unit 10 realizes a search function by executing a process according to a computer program stored in the storage unit 20. Specifically, the process for realizing the search function is executed by the processor 11. The information processing system 1 briefly shown in FIG. 1 may be composed of one or more cooperating server devices.
 検索機能は、演算部10が、図2に示すクローラ110、インデクサ120、クエリ処理部130、及び、クエリ応答部140として機能し、記憶部20が、ページリポジトリ210、及びインデックス記憶部220として機能することにより実現される。 In the search function, the calculation unit 10 functions as the crawler 110, the indexer 120, the query processing unit 130, and the query response unit 140 shown in FIG. 2, and the storage unit 20 functions as the page repository 210 and the index storage unit 220. It is realized by doing.
 クローラ110は、周知のクローラと同様に、通信ネットワークに存在する複数のウェブページを収集するように構成される。クローラ110により収集されたウェブページは、ページリポジトリ210に蓄積される。 The crawler 110 is configured to collect a plurality of web pages existing in a communication network, like a well-known crawler. The web pages collected by the crawler 110 are stored in the page repository 210.
 インデクサ120は、ページリポジトリ210に蓄積された各ウェブページを解析してインデックス化するように構成される。インデックス化により、ウェブページ毎のインデックスデータが生成される。ウェブページ毎のインデックスデータは、インデックス記憶部220に記憶される。 The indexer 120 is configured to analyze and index each web page stored in the page repository 210. Indexing generates index data for each web page. The index data for each web page is stored in the index storage unit 220.
 各インデックスデータは、内容インデックス、及び構造インデックスを含む。内容インデックスは、対応するウェブページのキーワード、タイトル、及びキーとなる文章の情報を含む。構造インデックスは、対応するウェブページのハイパーリンク構造を表す情報を含む。インデックスデータの一群は、ウェブページ間の接続関係を表す。 Each index data includes a content index and a structural index. The content index contains information on the corresponding web page keywords, titles, and key texts. The structure index contains information that represents the hyperlink structure of the corresponding web page. A group of index data represents the connection relationship between web pages.
 クエリ処理部130は、ユーザからの検索クエリを受け付け、検索クエリに対応するウェブページの集合である関連ページ群を、全ウェブページの中から抽出する。ここでいう全ウェブページは、クローラ110により通信ネットワーク内で見つけられ、インデックス記憶部220にインデックスデータが登録されたウェブページ群に対応する。 The query processing unit 130 receives a search query from a user and extracts a group of related pages, which is a set of web pages corresponding to the search query, from all the web pages. All the web pages referred to here correspond to a group of web pages found in the communication network by the crawler 110 and whose index data is registered in the index storage unit 220.
 具体的に、クエリ処理部130は、インデックス記憶部220が記憶するウェブページの内容インデックスに基づき、検索クエリに対応する語彙を含むウェブページの集合を関連ページ群として、全ウェブページの中から抽出する。抽出された関連ページ群の情報は、クエリ応答部140に提供される。 Specifically, the query processing unit 130 extracts from all the web pages a set of web pages including the vocabulary corresponding to the search query as a related page group based on the content index of the web page stored in the index storage unit 220. To do. The extracted related page group information is provided to the query response unit 140.
 クエリ応答部140は、提供される関連ページ群の情報に基づき、関連ページ群をページランク順に配列した検索結果リストを、検索クエリに対する応答データとして、ユーザ端末5に送信する。 The query response unit 140 transmits a search result list in which the related page groups are arranged in the order of page rank to the user terminal 5 as response data for the search query based on the information of the related page group provided.
 関連ページの夫々は、検索クエリとの関連度及び重要度が高いウェブページほど上位にランク付けされ、検索結果リストの上位に配置される。検索結果リストは、従来の検索エンジンからの応答データと同様に、リストアップされた関連ページへのリンクを有したウェブページとして構成される。ここで言うリンクは、所謂ハイパーリンクである。 Each related page is ranked higher as the web page has a higher degree of relevance and importance to the search query, and is placed at the top of the search result list. The search result list is configured as a web page having links to the listed related pages, similar to the response data from a conventional search engine. The link referred to here is a so-called hyperlink.
 クエリ応答部140は、図3に示すように、第1スコアリング部141と、第2スコアリング部143と、ランク付け部145と、出力部147とを備える。第1スコアリング部141は、検索クエリに対応する関連ページ群について、関連ページの夫々を、ページコンテンツの検索クエリとの関連度に基づいてスコアリングする。具体的に、第1スコアリング部141は、関連ページの夫々に、第1スコアとして、内容得点を与えるように構成される。 As shown in FIG. 3, the query response unit 140 includes a first scoring unit 141, a second scoring unit 143, a ranking unit 145, and an output unit 147. The first scoring unit 141 scores each of the related pages for the related page group corresponding to the search query based on the degree of relevance to the search query of the page content. Specifically, the first scoring unit 141 is configured to give a content score as the first score to each of the related pages.
 第2スコアリング部143は、検索クエリとは独立して動作し、クローラ110により収集されたウェブページの夫々に、第2スコアとして、ウェブページ間の接続関係に基づく重要得点を与えるように構成される。 The second scoring unit 143 operates independently of the search query, and is configured to give each of the web pages collected by the crawler 110 an important score based on the connection relationship between the web pages as the second score. Will be done.
 第2スコアは、ウェブページ間の接続関係から重要度が高いと推定されるウェブページほど大きな値を示すように算出される。第2スコアは、多くのリンクが集まるウェブページほど、高い重要得点を持つウェブページからリンクされるウェブページほど、他のウェブページへのリンクの少ないウェブページからのリンクを持つウェブページほど大きな値を示す。 The second score is calculated so that the web page that is estimated to be more important from the connection relationship between the web pages shows a larger value. The second score is higher for web pages with many links, for web pages linked from web pages with high important scores, and for web pages with links from web pages with few links to other web pages. Is shown.
 ランク付け部145は、第1スコアリング部141が関連ページの夫々に対して算出した第1スコアと、第2スコアリング部143が関連ページの夫々に対して算出した第2スコアとに基づき、関連ページの夫々のページランクを算出するように構成される。 The ranking unit 145 is based on the first score calculated by the first scoring unit 141 for each of the related pages and the second score calculated by the second scoring unit 143 for each of the related pages. It is configured to calculate the page rank of each related page.
 一例によれば、関連ページの夫々のページランクは、第1スコアと第2スコアとの重み付け和に対応する。例えば、第1スコアX1、第2スコアX2、及び、0から1の間の値を採る重み付け係数αを用いて、各関連ページのページランクYは、式Y=α・X1+(1-α)・X2に従って算出され得る。各関連ページのページランクは、検索クエリに基づく内容得点とウェブページ間の接続関係に基づく重要得点とを成分に含む全体得点として理解されてもよい。 According to one example, each page rank of the related page corresponds to the weighted sum of the first score and the second score. For example, using the first score X1, the second score X2, and the weighting coefficient α that takes a value between 0 and 1, the page rank Y of each related page is expressed by the formula Y = α · X1 + (1-α). -Can be calculated according to X2. The page rank of each related page may be understood as an overall score that includes a content score based on a search query and an important score based on the connection relationship between web pages.
 出力部147は、検索クエリに対応する関連ページ群を、ランク付け部145により算出された各関連ページのページランクに基づき、ページランクの高い順に並べたページリストを、検索結果リストとして検索クエリ送信元のユーザ端末5に送信する。 The output unit 147 sends a search query as a search result list, which is a page list in which related pages corresponding to the search query are arranged in descending order of page rank based on the page rank of each related page calculated by the ranking unit 145. It is transmitted to the original user terminal 5.
 具体的には、第2スコアリング部143は、図4に示す処理を定期的に実行することにより、インデックス記憶部220が記憶する最新のインデックスデータに基づき、少なくとも弱連結で連結されたウェブページ群毎に、対応するウェブページ群に属する各ウェブページの第2スコアを算出する。 Specifically, the second scoring unit 143 periodically executes the process shown in FIG. 4, and based on the latest index data stored in the index storage unit 220, at least a weakly concatenated web page. For each group, the second score of each web page belonging to the corresponding web page group is calculated.
 図4に示す処理を開始すると、第2スコアリング部143は、全ウェブページの中から、一つ以上の文書ネットワークを抽出する(S110)。第2スコアリング部143は、インデックス記憶部220が記憶するインデックスデータを参照することにより、全ウェブページの中で、少なくとも弱連結で連結されたウェブページ群のそれぞれを、一つの文書ネットワークとして抽出することができる。一つの文書ネットワークは、少なくとも弱連結で連結されたウェブページ群から構成される。 When the process shown in FIG. 4 is started, the second scoring unit 143 extracts one or more document networks from all the web pages (S110). By referring to the index data stored in the index storage unit 220, the second scoring unit 143 extracts at least each of the web page groups connected by a weak connection as one document network among all the web pages. can do. A document network consists of at least a group of web pages that are weakly linked.
 少なくとも弱連結で連結されたノード群から構成されるネットワークは、ノード間のリンクの接続方向を無視したときに、そのネットワークに属するノード群の任意の一つのノードから、残りのノードにリンクをたどって到達可能なネットワークに対応する。 A network consisting of at least a group of nodes connected by a weak connection follows a link from any one node of the group of nodes belonging to the network to the remaining nodes when the connection direction of the link between the nodes is ignored. Corresponds to a reachable network.
 すなわち、文書ネットワークは、その文書ネットワークに属するウェブページの任意の一つが、リンクの接続方向を無視したときに、残りのウェブページと少なくとも間接的に接続されるウェブページ群から構成される。 That is, the document network is composed of a group of web pages that are at least indirectly connected to the remaining web pages when any one of the web pages belonging to the document network ignores the connection direction of the link.
 図5及び図6は、異なる二つの文書ネットワークの例を示す。文書ネットワークは、有向グラフとして表現される。図5及び図6における一つの円は、一つのノード、換言すれば一つのウェブページに対応する。同図における矢印は、矢印の始点に対応するウェブページに、矢印の終点に対応するウェブページへのリンク(ハイパーリンク)が形成されていることを示す。即ち、矢印の始点に対応するウェブページから矢印の終点に対応するウェブページにリンクを介して移動可能であることを意味する。 5 and 6 show examples of two different document networks. The document network is represented as a directed graph. One circle in FIGS. 5 and 6 corresponds to one node, in other words, one web page. The arrow in the figure indicates that a link (hyperlink) to the web page corresponding to the end point of the arrow is formed on the web page corresponding to the start point of the arrow. That is, it means that it is possible to move from the web page corresponding to the start point of the arrow to the web page corresponding to the end point of the arrow via a link.
 図5及び図6に示される文書ネットワーク内の各ウェブページは、明らかに、矢印の方向を無視したとき、文書ネットワーク内の他のウェブページと少なくとも間接的に接続されている。以下では、一つの文書ネットワーク内の複数のウェブページのそれぞれを、図において円内に示される数字kを用いて、第kウェブページとも表現する。文書ネットワーク内の各ウェブページのことを、ノードとも表現する。第kノードは、第kウェブページを意味する。 Each web page in the document network shown in FIGS. 5 and 6 is clearly at least indirectly connected to other web pages in the document network when the direction of the arrow is ignored. In the following, each of a plurality of web pages in one document network is also referred to as a k-th web page by using the number k shown in a circle in the figure. Each web page in the document network is also referred to as a node. The k-th node means the k-th web page.
 S110に続くS120において、第2スコアリング部143は、上記抽出した一つ以上の文書ネットワークのうちの一つを、処理対象の文書ネットワークに選択する。その後、第2スコアリング部143は、処理対象の文書ネットワーク内の各ノードの第2スコアを算出するために、図7に示すスコア算出処理を実行する(S130)。 In S120 following S110, the second scoring unit 143 selects one or more of the extracted document networks as the document network to be processed. After that, the second scoring unit 143 executes the score calculation process shown in FIG. 7 in order to calculate the second score of each node in the document network to be processed (S130).
 第2スコアリング部143は、全ての文書ネットワークに対して、スコア算出処理を実行するまで、スコア算出処理を繰返し実行する(S120-S140)。すなわち、第2スコアリング部143は、各文書ネットワークを順に処理対象に選択し(S120)、選択した処理対象の文書ネットワークに対するスコア算出処理を実行する(S130)。 The second scoring unit 143 repeatedly executes the score calculation process until the score calculation process is executed for all the document networks (S120-S140). That is, the second scoring unit 143 sequentially selects each document network as a processing target (S120), and executes a score calculation process for the selected document network to be processed (S130).
 第2スコアリング部143は、全ての文書ネットワークに対するスコア算出処理を終了すると(S140でYes)、図4に示す処理を終了する。第2スコアリング部143は、このようにして、文書ネットワーク毎に、対応する文書ネットワークを構成する各ウェブページの第2スコアを算出する。算出された第2スコアは、ランク付け部145に提供される。 When the second scoring unit 143 finishes the score calculation process for all the document networks (Yes in S140), the process shown in FIG. 4 ends. In this way, the second scoring unit 143 calculates the second score of each web page constituting the corresponding document network for each document network. The calculated second score is provided to the ranking unit 145.
 図7に示すスコア算出処理(S130)を開始すると、第2スコアリング部143は、処理対象の文書ネットワークにおける、インリンクを持たない先端ノードを判別する(S210)。 When the score calculation process (S130) shown in FIG. 7 is started, the second scoring unit 143 determines the tip node having no inlink in the document network to be processed (S210).
 インリンクを持つノードは、このノードへのリンクが他のノードにおいて形成されたノードを意味する。換言すれば、インリンクを持つウェブページは、このウェブページに移動可能なリンク(ハイパーリンク)が他のウェブページにおいて形成されたウェブページを意味する。以下では、インリンクを持たないノードのことを「先端ノード」とも表現する。 A node with an in-link means a node in which a link to this node is formed in another node. In other words, a web page with an in-link means a web page in which a link (hyperlink) that can move to this web page is formed in another web page. In the following, a node that does not have an inlink is also referred to as a "leading node".
 図5においてインリンクを持つノードは、第2、第3、第4、第5、第6、第7、及び第8ノードであり、インリンクを持たないノードは、第1ノードである。図6においてインリンクを持たないノードは、第1、第2、第4、第10、第13、及び第14ノードである。 In FIG. 5, the nodes having an inlink are the second, third, fourth, fifth, sixth, seventh, and eighth nodes, and the node having no inlink is the first node. The nodes having no inlink in FIG. 6 are the first, second, fourth, tenth, thirteenth, and fourteenth nodes.
 処理対象の文書ネットワークが、先端ノードを有さない文書ネットワークである場合、文書ネットワークには、インリンクを持たないダミーノードDPが付加される。具体的には、文書ネットワーク内の全てのノードへのアウトリンクを持つダミーノードDPが文書ネットワークに付加される。 When the document network to be processed is a document network that does not have a leading node, a dummy node DP that does not have an inlink is added to the document network. Specifically, a dummy node DP having outlinks to all the nodes in the document network is added to the document network.
 アウトリンクを持つノードは、他ノードへのリンクを持つノードを意味する。換言すれば、アウトリンクを持つウェブページは、他のウェブページに移動可能なリンク(ハイパーリンク)が形成されたウェブページを意味する。以下では、アウトリンクを持たないノードのことを「後端ノード」とも表現する。 A node with an outlink means a node with a link to another node. In other words, a web page with an outlink means a web page in which a link (hyperlink) that can be moved to another web page is formed. In the following, a node that does not have an outlink is also referred to as a "rear end node".
 文書ネットワークに、インリンクを持たないダミーノードDPが付加された場合、第2スコアリング部143は、ダミーノードDPが付加された文書ネットワークを、処理対象の文書ネットワークとみなし、付加したダミーノードDPを、先端ノードと判別する。 When a dummy node DP having no in-link is added to the document network, the second scoring unit 143 considers the document network to which the dummy node DP is added as a document network to be processed, and adds the dummy node DP. Is determined as the tip node.
 S210に続くS220において、第2スコアリング部143は、複数のインリンクを持つ結合ノードを判別する。一つの結合ノードは、複数のインリンクを持つ一つのノードのことを意味する。 In S220 following S210, the second scoring unit 143 determines a join node having a plurality of inlinks. One join node means one node having a plurality of inlinks.
 図5に示す文書ネットワークには、結合ノードがない。図6に示す文書ネットワークにおける結合ノードは、二重丸で示される第3、第6、第7、第12、及び第15ノードである。例えば、第3ノードは、第1ノードからのインリンクと、第2ノードからのインリンクと、を有する。 The document network shown in FIG. 5 does not have a join node. The join nodes in the document network shown in FIG. 6 are the third, sixth, seventh, twelfth, and fifteenth nodes indicated by double circles. For example, the third node has an inlink from the first node and an inlink from the second node.
 S220での処理により、処理対象の文書ネットワークが結合ノードを有することが判明した場合(S230でYes)、第2スコアリング部143は、S250の処理を実行する。処理対象の文書ネットワークが結合ノードを有さないことが判明した場合、第2スコアリング部143は、S240の処理を実行する。 When the processing in S220 reveals that the document network to be processed has a join node (Yes in S230), the second scoring unit 143 executes the processing in S250. When it is found that the document network to be processed does not have a join node, the second scoring unit 143 executes the process of S240.
 S240において、第2スコアリング部143は、処理対象の文書ネットワーク内の各ノードのスコアを、文書ネットワークに対応するエルミート隣接行列Hを用いて算出する。算出される各ノードのスコアは、ノード間の接続関係に基づくスコアである。 In S240, the second scoring unit 143 calculates the score of each node in the document network to be processed by using the Hermitian adjacency matrix H corresponding to the document network. The calculated score of each node is a score based on the connection relationship between the nodes.
 第2スコアリング部143は、S240で算出した各ノードのスコアを、各ウェブページの第2スコアとしてランク付け部145に出力する(S245)。その後、図7に示すスコア算出処理を終了する。 The second scoring unit 143 outputs the score of each node calculated in S240 to the ranking unit 145 as the second score of each web page (S245). After that, the score calculation process shown in FIG. 7 is completed.
 S240において、第2スコアリング部143は、同一出願人によって2018年7月13日に出願された国際出願PCT/JP2018/026560と同様の手法で、各ノードのスコアを算出することができる。具体的には、第2スコアリング部143は、図8に示す副処理を実行することにより、各ノードのスコアを算出することができる。 In S240, the second scoring unit 143 can calculate the score of each node by the same method as the international application PCT / JP2018 / 026560 filed on July 13, 2018 by the same applicant. Specifically, the second scoring unit 143 can calculate the score of each node by executing the sub-processing shown in FIG.
 以下では、スコアの算出方法を説明するために、処理対象の文書ネットワークを構成するノードのそれぞれを、ノードD[m]と表現する。変数mは、値1からNまでの整数値を採る(1≦m≦N)。Nは、処理対象の文書ネットワークのノード数Nである。ノードD[m]は、対応する文書ネットワークにおける第mノード、すなわち第mウェブページに対応する。 In the following, in order to explain the score calculation method, each of the nodes constituting the document network to be processed is expressed as node D [m]. The variable m takes an integer value from the values 1 to N (1 ≦ m ≦ N). N is the number of nodes N of the document network to be processed. Node D [m] corresponds to the mth node in the corresponding document network, i.e. the mth web page.
 図8に示す副処理を開始すると、第2スコアリング部143は、処理対象の文書ネットワークに対応するエルミート隣接行列Hを生成する(S1010)。具体的には、第2スコアリング部143は、処理対象の文書ネットワーク内のノード間の接続関係を、値1,0,+i,-iで表すエルミート隣接行列Hを生成する。ここでiは、虚数単位を表す。 When the sub-processing shown in FIG. 8 is started, the second scoring unit 143 generates the Hermitian adjacency matrix H corresponding to the document network to be processed (S1010). Specifically, the second scoring unit 143 generates the Hermitian adjacency matrix H in which the connection relationships between the nodes in the document network to be processed are represented by the values 1, 0, + i, and −i. Here, i represents an imaginary unit.
 エルミート隣接行列Hは、処理対象の文書ネットワークのノード数Nに対応したN行N列(NxN)の行列であり、各成分が、値1,0,+i,-iのいずれかの値を採る行列である。以下における表現「成分h(p,q)」は、エルミート隣接行列Hにおける第p行第q列の成分を示す。 The Hermitian adjacency matrix H is a matrix of N rows and N columns (NxN) corresponding to the number N of nodes of the document network to be processed, and each component takes any value of 1, 0, + i, or -i. It is a matrix. The expression "component h (p, q)" in the following indicates the component of the p-th row and q-th column in the Hermitian adjacency matrix H.
 処理対象の文書ネットワークにおいて、ノードD[p]からノードD[q]へのリンクが存在し且つノードD[q]からノードD[p]へのリンクが存在するとき、対応する成分h(p,q)は、値1に設定される。 When there is a link from node D [p] to node D [q] and a link from node D [q] to node D [p] in the document network to be processed, the corresponding component h (p) , Q) is set to the value 1.
 ノードD[p]からノードD[q]へのリンク及びノードD[q]からノードD[p]へのリンクのいずれもが存在しないとき、対応する成分h(p,q)は、値0に設定される。従って、エルミート隣接行列Hの対角成分h(p,p)は、値ゼロである。 When neither the link from node D [p] to node D [q] nor the link from node D [q] to node D [p] exists, the corresponding component h (p, q) has a value of 0. Is set to. Therefore, the diagonal component h (p, p) of the Hermitian adjacency matrix H has a value of zero.
 ノードD[p]からノードD[q]へのリンクが存在するがノードD[q]からノードD[p]へのリンクが存在しないとき、対応する成分h(p,q)は、値+iに設定される。ノードD[p]からノードD[q]へのリンクが存在しないがノードD[q]からノードD[p]へのリンクが存在するとき、対応する成分h(p,q)は、値-iに設定される。 When there is a link from node D [p] to node D [q] but no link from node D [q] to node D [p], the corresponding component h (p, q) is the value + i. Is set to. When there is no link from node D [p] to node D [q] but there is a link from node D [q] to node D [p], the corresponding component h (p, q) is the value- Set to i.
 第2スコアリング部143は、処理対象の文書ネットワークのノード間の接続関係に従って、上述したように各成分h(p,q)の値を設定し、エルミート隣接行列Hを生成する(S1010)。 The second scoring unit 143 sets the value of each component h (p, q) as described above according to the connection relationship between the nodes of the document network to be processed, and generates the Hermitian adjacency matrix H (S1010).
Figure JPOXMLDOC01-appb-M000001
 上述した規則に従って各成分h(p,q)の値が設定される場合、第p行第q列の成分h(p,q)と対角成分を挟んで対称的な位置にある第q行第p列の成分h(q,p)は、成分h(p,q)の複素共役である。従って、エルミート隣接行列Hは、エルミート行列である。
Figure JPOXMLDOC01-appb-M000001
When the value of each component h (p, q) is set according to the above-mentioned rule, the qth row located symmetrically with the component h (p, q) in the p-th row and q-th column. The component h (q, p) in the first column is a complex conjugate of the component h (p, q). Therefore, the Hermitian adjacency matrix H is a Hermitian matrix.
 S1010では、エルミート隣接行列Hを生成する前に、処理対象の文書ネットワークにおけるアウトリンクを持たない各後端ノードに対し、ダミーノードDPが付加される。 In S1010, a dummy node DP is added to each rear-end node having no outlink in the document network to be processed before the Hermitian adjacency matrix H is generated.
 後端ノードに付加されるダミーノードDPは、図9に示されるように、後端ノードからのインリンクを一つ持つが、アウトリンクを持たないノードである。図9に示す文書ネットワークは、図5に示す文書ネットワークにおいてアウトリンクを持たない第5及び第8ノードのそれぞれに、ダミーノードDPが付加された文書ネットワークである。S1010では、このようにダミーノードDPが付加された処理対象の文書ネットワークに対して、エルミート隣接行列Hが生成される。 As shown in FIG. 9, the dummy node DP added to the rear end node is a node that has one inlink from the rear end node but does not have an outlink. The document network shown in FIG. 9 is a document network in which a dummy node DP is added to each of the fifth and eighth nodes having no outlink in the document network shown in FIG. In S1010, the Hermitian adjacency matrix H is generated for the document network to be processed to which the dummy node DP is added in this way.
 続くS1020において、第2スコアリング部143は、上記生成したエルミート隣接行列Hを変形した特殊エルミート隣接行列H1を生成する。変形は、特殊エルミート隣接行列H1の固有ベクトルVの各成分を複素平面に配置したときに、成分の全てがπ/2ラジアンの角度範囲に収まるように行われる。変形に際して、第2スコアリング部143は、第1補正量C1及び第2補正量C2を算出する。 In the following S1020, the second scoring unit 143 generates a special Hermitian adjacency matrix H1 which is a modification of the generated Hermitian adjacency matrix H. The transformation is performed so that when each component of the eigenvector V of the special Hermitian adjacency matrix H1 is arranged in the complex plane, all of the components fall within the angular range of π / 2 radians. At the time of deformation, the second scoring unit 143 calculates the first correction amount C1 and the second correction amount C2.
Figure JPOXMLDOC01-appb-M000002
 第1補正量C1及び第2補正量C2に含まれるパラメータnの値は、S110で抽出される一つ以上の文書ネットワークにおけるノード数Nの最大値以上の自然数に定められる。文書ネットワークには、上述したようにダミーノードDPが付加され得る。この場合、ノード数Nは、ダミーノードDPを含む文書ネットワーク内のノード数である。パラメータnの値が、このように定められることで、固有ベクトルVの成分の全ては、π/2ラジアンの角度範囲に収まる。
Figure JPOXMLDOC01-appb-M000002
The value of the parameter n included in the first correction amount C1 and the second correction amount C2 is defined as a natural number equal to or more than the maximum value of the number N of nodes in one or more document networks extracted in S110. A dummy node DP may be added to the document network as described above. In this case, the number of nodes N is the number of nodes in the document network including the dummy node DP. By defining the value of the parameter n in this way, all the components of the eigenvector V fall within the angle range of π / 2 radians.
 パラメータnの値が大きいほど、固有ベクトルVの成分は、π/2ラジアンの角度範囲より小さい角度範囲内に収まる。成分の全てをπ/2ラジアンの角度範囲に収めることの目的は、成分の全てが複素平面上の一つの象限内に収まるようにするためである。第2スコアの良好な算出のために、パラメータnは、この目的が達成可能な範囲で、小さい値に定められる。上述の第2補正量C2は、角度範囲の調整に寄与し、第1補正量C1は、第2補正量C2によって行列成分の絶対値が変化するのを回避するのに役立つ。 The larger the value of the parameter n, the more the component of the eigenvector V falls within the angle range smaller than the angle range of π / 2 radians. The purpose of keeping all of the components within the π / 2 radian angle range is to ensure that all of the components fit within one quadrant on the complex plane. For good calculation of the second score, the parameter n is set to a small value within the range where this purpose can be achieved. The above-mentioned second correction amount C2 contributes to the adjustment of the angle range, and the first correction amount C1 helps to avoid changing the absolute value of the matrix component by the second correction amount C2.
 第2スコアリング部143は、第1補正量C1及び第2補正量C2の算出後、エルミート隣接行列Hにおける値+iの成分を、値C1(C2+i)に置換し、値-iを示す成分を値C1(C2-i)に置換する。第2スコアリング部143は更に、当該置換後のエルミート隣接行列Hにおける各行の成分の値C1(C2+i)を、同じ行において値C1(C2+i)を示す成分の数及び値1を示す成分の数の和Wで除算した値{C1(C2+i)/W}に変更する。 After calculating the first correction amount C1 and the second correction amount C2, the second scoring unit 143 replaces the component of the value + i in the Hermitian adjacency matrix H with the value C1 (C2 + i), and replaces the component showing the value −i with the component showing the value −i. Replace with the value C1 (C2-i). The second scoring unit 143 further sets the value C1 (C2 + i) of the component in each row in the Hermitian adjacency matrix H after the substitution, the number of components showing the value C1 (C2 + i) in the same row, and the number of components showing the value 1. Change to the value {C1 (C2 + i) / W} divided by the sum of W of.
 第2スコアリング部143は更に、値{C1(C2+i)/W}に変更された成分と対角成分を挟んで対称的な位置にある成分の値C1(C2-i)を、値{C1(C2+i)/W}の複素共役{C1(C2-i)/W}に変更する。第2スコアリング部143は、このような置換及び変更によって定義されるエルミート行列を、特殊エルミート隣接行列H1として生成する。 The second scoring unit 143 further sets the value C1 (C2-i) of the component changed to the value {C1 (C2 + i) / W} and the component located symmetrically across the diagonal component to the value {C1. Change to the complex conjugate {C1 (C2-i) / W} of (C2 + i) / W}. The second scoring unit 143 generates the Hermitian matrix defined by such substitution and modification as the special Hermitian adjacency matrix H1.
 エルミート隣接行列Hから特殊エルミート隣接行列H1への変形手順の具体例が図10に示される。例えば、第p1行における合計N個の成分h(p1,1),h(p1,2),…,h(p1,N)の内、値+iを採る成分及び値1を採る成分が合計W1個である場合には、エルミート隣接行列Hにおける第p1行の値+iを示す各成分は、値{C1(C2+i)/W1}に変更される。 A specific example of the transformation procedure from the Hermitian adjacency matrix H to the special Hermitian adjacency matrix H1 is shown in FIG. For example, of the total N components h (p1,1), h (p1,2), ..., H (p1, N) in the first row, the component that takes the value + i and the component that takes the value 1 are the total W1. In the case of the number, each component indicating the value + i in the p1 row in the Hermitian adjacency matrix H is changed to the value {C1 (C2 + i) / W1}.
 第p2行における合計N個の成分h(p2,1),h(p2,2),…,h(p2,N)の内、値+iを採る成分及び値1を採る成分が合計W2個である場合には、エルミート隣接行列Hにおける第p2行の値+iを示す各成分は、値{C1(C2+i)/W2}に変更される。 Of the total N components h (p2,1), h (p2,2), ..., H (p2, N) in the second row, the component that takes the value + i and the component that takes the value 1 are W2 in total. In some cases, each component indicating the value + i in row p2 in the Hermitian adjacency matrix H is changed to the value {C1 (C2 + i) / W2}.
 更に、値-iを示す成分の値は、対角成分を挟んで対称的な位置にある成分の複素共役に変更される。例えば、値{C1(C2+i)/W1}を示す成分h(p1,q1)と対角成分を挟んで対称的な位置にある成分h(q1,p1)の値は、{C1(C2-i)/W1}に変更される。同様に、値{C1(C2+i)/W2}を示す成分h(p2,q2)と対角成分を挟んで対称的な位置にある成分h(q2,p2)の値は、{C1(C2-i)/W2}に変更される。 Furthermore, the value of the component indicating the value -i is changed to the complex conjugate of the component at a symmetrical position with the diagonal component in between. For example, the value of the component h (p1, q1) showing the value {C1 (C2 + i) / W1} and the component h (q1, p1) symmetrically across the diagonal component is {C1 (C2-i). ) / W1}. Similarly, the value of the component h (p2, q2) showing the value {C1 (C2 + i) / W2} and the component h (q2, p2) located symmetrically across the diagonal component is {C1 (C2-C2-). i) / W2} is changed.
 続くS1030において、第2スコアリング部143は、S1020で生成した特殊エルミート隣接行列H1の固有値及び固有ベクトルVを算出する。特殊エルミート隣接行列H1がNxNの行列であることから、固有ベクトルVは、N個の成分を含むN次元ベクトルである。 In the following S1030, the second scoring unit 143 calculates the eigenvalue and the eigenvector V of the special Hermitian adjacency matrix H1 generated in S1020. Since the special Hermitian adjacency matrix H1 is a matrix of NxN, the eigenvector V is an N-dimensional vector containing N components.
 以下では、絶対値最大の固有値に対応する固有ベクトルVの各成分をV[m]を用いて表す。変数mは値1から値Nまでの整数値を採る。即ち、固有ベクトルVは、V={V[1],V[2],…,V[N]}である。固有ベクトルVの各成分V[m](1≦m≦N)は、文書ネットワークを構成するノードD[m](1≦m≦N)に対応する。 Below, each component of the eigenvector V corresponding to the eigenvalue with the maximum absolute value is represented using V [m]. The variable m takes an integer value from the value 1 to the value N. That is, the eigenvector V is V = {V [1], V [2], ..., V [N]}. Each component V [m] (1 ≦ m ≦ N) of the eigenvector V corresponds to the node D [m] (1 ≦ m ≦ N) constituting the document network.
 続くS1040において、第2スコアリング部143は、特殊エルミート隣接行列H1の絶対値最大の固有値に対応する固有ベクトルVの各成分V[m](1≦m≦N)を、文書ネットワークの始点ノードに対応する成分Eで除算する。始点ノードが第sノードD[s]であるとき、成分Eは、固有ベクトルVの第s成分V[s]である(E=V[s])。 In the following S1040, the second scoring unit 143 sets each component V [m] (1 ≦ m ≦ N) of the eigenvector V corresponding to the maximum eigenvalue of the absolute value of the special Hermitian adjacency matrix H1 to the start point node of the document network. Divide by the corresponding component E. When the starting node is the s node D [s], the component E is the s component V [s] of the eigenvector V (E = V [s]).
 始点ノードは、文書ネットワークにおけるノードのうち、最も小さいスコアを付与すべきノードに対応する。始点ノードは、処理対象の文書ネットワーク内でリンクの向きに従って移動可能な先端ノードと後端ノードとの組合せのうち、先端ノードから後端ノードまでのノード数が最も多い組合せに対応する先端ノードに設定され得る。 The start point node corresponds to the node to which the lowest score should be given among the nodes in the document network. The start point node is the tip node corresponding to the combination of the tip node and the trailing node that can move according to the direction of the link in the document network to be processed and has the largest number of nodes from the tip node to the trailing node. Can be set.
 固有ベクトルVの各成分V[m](1≦m≦N)が成分Eで除算されると、始点ノードに対応する固有ベクトルVの成分は、値1に変換される。以下では、除算後の固有ベクトルVを、固有ベクトルV1と表現する。固有ベクトルV1は、V1={V[1]/E,V[2]/E,…,V[s]/E=1,…,V[N]/E}である。除算により、始点ノードに対応する固有ベクトルV1の成分は、複素平面において、実軸上に配置される。 When each component V [m] (1 ≦ m ≦ N) of the eigenvector V is divided by the component E, the component of the eigenvector V corresponding to the start point node is converted to the value 1. In the following, the eigenvector V after division is expressed as the eigenvector V1. The eigenvector V1 is V1 = {V [1] / E, V [2] / E, ..., V [s] / E = 1, ..., V [N] / E}. By division, the components of the eigenvector V1 corresponding to the start node are placed on the real axis in the complex plane.
 S1040での処理を終えると、第2スコアリング部143は、除算後の固有ベクトルV1の各成分V1[m]=V[m]/E(1≦m≦N)に基づいて、文書ネットワーク内の各ノードD[m]のスコアを算出する(S1050)。 After finishing the processing in S1040, the second scoring unit 143 in the document network is based on each component V1 [m] = V [m] / E (1 ≦ m ≦ N) of the eigenvector V1 after division. The score of each node D [m] is calculated (S1050).
 S1050において、第2スコアリング部143は、各成分V1[m](1≦m≦N)を、複素平面上で回転変換する。具体的に、第2スコアリング部143は、複素平面上において、最も第1象限側に位置する成分が、実軸から角度θ1だけ第4象限側に位置するように、固有ベクトルV1の各成分V1[m](1≦m≦N)を複素平面上において回転させる。 In S1050, the second scoring unit 143 rotationally transforms each component V1 [m] (1 ≦ m ≦ N) on the complex plane. Specifically, in the second scoring unit 143, each component V1 of the eigenvector V1 is arranged so that the component located on the first quadrant side on the complex plane is located on the fourth quadrant side by an angle θ1 from the real axis. [M] (1 ≦ m ≦ N) is rotated on the complex plane.
 図11Aによれば、始点ノードに対応する成分V1[s]が複素平面の実軸上にある。図11A及び図11Bにおける黒丸及び白丸の夫々は、固有ベクトルV1の成分の一つ、換言すれば、文書ネットワーク内のノードの一つに対応し、黒丸は、始点ページに対応する。 According to FIG. 11A, the component V1 [s] corresponding to the start point node is on the real axis of the complex plane. Each of the black and white circles in FIGS. 11A and 11B corresponds to one of the components of the eigenvector V1, in other words, one of the nodes in the document network, and the black circle corresponds to the starting page.
 上記回転変換によって、始点ノードに対応する成分は、図11Bに示すように、複素平面上で実軸から角度θ1だけ第4象限側に位置するように回転移動する。この回転変換は、適切なスコアリングを目的として、始点ノードを実軸から第4象限側にずらすために実行される。角度θ1は、回転変換によっても、固有ベクトルV1の全成分が依然として第4象限に位置する小さい角度に定められる。スコアリングに悪影響がなければ、角度θ1はゼロであってもよい。 By the above rotation conversion, as shown in FIG. 11B, the component corresponding to the start point node is rotationally moved so as to be located on the complex plane so as to be located on the fourth quadrant side by an angle θ1 from the real axis. This rotational transformation is performed to shift the starting node from the real axis to the fourth quadrant for proper scoring. The angle θ1 is set to a small angle in which all the components of the eigenvector V1 are still located in the fourth quadrant even by the rotation transformation. The angle θ1 may be zero as long as the scoring is not adversely affected.
 回転変換後の固有ベクトルV1のことを、以下では、固有ベクトルVc={Vc[1],Vc[2],…,Vc[s],…,Vc[N]}と表現する。固有ベクトルVcの各成分Vc[m](1≦m≦N)は、複素数である。 The eigenvector V1 after the rotation conversion is expressed below as the eigenvector Vc = {Vc [1], Vc [2], ..., Vc [s], ..., Vc [N]}. Each component Vc [m] (1 ≦ m ≦ N) of the eigenvector Vc is a complex number.
 S1050では、各成分Vc[m](1≦m≦N)の複素平面上の位置に基づいて、各ノードのスコアを算出する。以下では、回転変換後の固有ベクトルVcの各成分Vc[m](1≦m≦N)のことを、各ノードのスコア基準値Vc[m](1≦m≦N)とも表現する。スコア基準値Vc[m]は、第mノードのスコア基準値であり、第mノードのスコアリングに用いられる。角度θ1がゼロであるとき、各ノードのスコア基準値Vc[m](1≦m≦N)は、V1[m](1≦m≦N)に一致する。 In S1050, the score of each node is calculated based on the position of each component Vc [m] (1 ≦ m ≦ N) on the complex plane. In the following, each component Vc [m] (1 ≦ m ≦ N) of the eigenvector Vc after rotation conversion is also expressed as a score reference value Vc [m] (1 ≦ m ≦ N) of each node. The score reference value Vc [m] is the score reference value of the m-th node, and is used for scoring the m-node. When the angle θ1 is zero, the score reference value Vc [m] (1 ≦ m ≦ N) of each node corresponds to V1 [m] (1 ≦ m ≦ N).
 本明細書において以下に記載される関数arg(x)は、複素数xの複素平面上の偏角であると理解されてよい。xは、例えば、Vc[m]である。図11Bに示される成分Vc[m]の複素平面上の実軸から第4象限への角度θ[m]は、{2π-arg(Vc[m])}に等しい。以下で表現する|x|は、複素数xの絶対値を意味する。x=Vc[m]である場合、|x|は、図11Bに示すVc[m]の複素平面上の長さL[m]に対応する。 The function arg (x) described below in the present specification may be understood as an argument on the complex plane of the complex number x. x is, for example, Vc [m]. The angle θ [m] of the component Vc [m] shown in FIG. 11B from the real axis on the complex plane to the fourth quadrant is equal to {2π-arg (Vc [m])}. | X | expressed below means the absolute value of the complex number x. When x = Vc [m], | x | corresponds to the length L [m] of Vc [m] shown in FIG. 11B on the complex plane.
 S1050において、第2スコアリング部143は、各ノードのスコア相当値Z[m](1≦m≦N)として、各ノードのスコア基準値Vc[m](1≦m≦N)の複素平面における実軸からの距離に対応する値Z[m]=L[m]・θ[m](1≦m≦N)=|Vc[m]|・{2π-arg(Vc[m])}を算出する。 In S1050, the second scoring unit 143 sets the score equivalent value Z [m] (1 ≦ m ≦ N) of each node to the complex plane of the score reference value Vc [m] (1 ≦ m ≦ N) of each node. Value Z [m] = L [m] · θ [m] (1 ≦ m ≦ N) = | Vc [m] | · {2π-arg (Vc [m])} Is calculated.
 別例としてスコア相当値Z[m]は、式Z[m]=|Vc[m]|d1・{2π-arg(Vc[m])}d2に従って算出されてもよい。値d1,d2は、ゼロより大きい任意の実数である。d1が大きいほど、始点ノードからの各点のアウトリンク数の少なさに応じて、Z[m]の値は大きくなる。d2が大きいほど始点ノードからの距離に応じて、Z[m]の値は大きくなる。 As another example, the score equivalent value Z [m] may be calculated according to the formula Z [m] = | Vc [m] | d1 · {2π-arg (Vc [m])} d2. The values d1 and d2 are any real numbers greater than zero. As d1 is larger, the value of Z [m] becomes larger according to the smaller number of outlinks at each point from the start point node. The larger d2 is, the larger the value of Z [m] is according to the distance from the starting node.
 その後、第2スコアリング部143は、文書ネットワーク内の各ノードD[m](1≦m≦N)のスコアを、スコア相当値Z[m]に基づいて算出する。S1050において、ノードD[m]に対応するスコアXは、X=Z[m]-Z0に従って算出される。Z0は、例えば、文書ネットワーク全体におけるZ[m]の最小値である。この場合、最も小さいZ[m]を示すノードD[m]のスコアは、値ゼロである。Z0は、値ゼロであってもよい。すなわち、Z0の項はなくてもよい。 After that, the second scoring unit 143 calculates the score of each node D [m] (1 ≦ m ≦ N) in the document network based on the score equivalent value Z [m]. In S1050, the score X corresponding to the node D [m] is calculated according to X = Z [m] −Z0. Z0 is, for example, the minimum value of Z [m] in the entire document network. In this case, the score of the node D [m] indicating the smallest Z [m] is zero. Z0 may have a value of zero. That is, the Z0 term may be omitted.
 S245では、このようにして算出された各ノードのスコアXが、各ウェブページの第2スコアとしてランク付け部145に出力される。 In S245, the score X of each node calculated in this way is output to the ranking unit 145 as the second score of each web page.
 別例として、第2スコアリング部143は、S1020で上述の特殊エルミート隣接行列H1に代えて、図12に示す特殊エルミート隣接行列H2を生成してもよい。図12に示される特殊エルミート隣接行列H2は、図10上段に示すエルミート隣接行列Hに対応する特殊エルミート隣接行列H2の例である。 As another example, the second scoring unit 143 may generate the special Hermitian adjacency matrix H2 shown in FIG. 12 in place of the above-mentioned special Hermitian adjacency matrix H1 in S1020. The special Hermitian adjacency matrix H2 shown in FIG. 12 is an example of the special Hermitian adjacency matrix H2 corresponding to the Hermitian adjacency matrix H shown in the upper part of FIG.
 第2スコアリング部143は、特殊エルミート隣接行列H2の生成に際して、エルミート隣接行列Hにおける値+iの成分を、値C1(C2+i)に置換し、値-iを示す成分を値C1(C2-i)に置換することができる。第2スコアリング部143は更に、次の処理A及び処理Bを行うことができる。 When the special Hermitian adjacency matrix H2 is generated, the second scoring unit 143 replaces the component of the value + i in the Hermitian adjacency matrix H with the value C1 (C2 + i), and replaces the component indicating the value −i with the value C1 (C2-i). ) Can be replaced. The second scoring unit 143 can further perform the following processes A and B.
 (処理A)
 第2スコアリング部143は、置換後のエルミート隣接行列Hにおける各行の成分内の値C1(C2+i)を、同じ行において値C1(C2+i)及び値1を示す成分の数Wで除算した値{C1(C2+i)/W}に変更し、更に、値{C1(C2+i)/W}に変更された成分と対角成分を挟んで対称的な位置にある成分内の値C1(C2-i)を、値{C1(C2+i)/W}の複素共役{C1(C2-i)/W}に変更することができる。
(Process A)
The second scoring unit 143 divides the value C1 (C2 + i) in the components of each row in the Hermitian adjacency matrix H after substitution by the value C1 (C2 + i) and the number W of the components indicating the value 1 in the same row { The value C1 (C2-i) in the component at a symmetrical position with the component changed to C1 (C2 + i) / W} and further changed to the value {C1 (C2 + i) / W} with the diagonal component in between. Can be changed to the complex conjugate {C1 (C2-i) / W} of the value {C1 (C2 + i) / W}.
 (処理B)
 第2スコアリング部143は、置換後のエルミート隣接行列Hにおける各行の成分内の値C1(C2-i)を、同じ行においてC1(C2-i)及び値1を示す成分の数Zで乗算した値{C1(C2-i)Z}に変更し、更に、値{C1(C2-i)Z}に変更された成分と対角成分を挟んで対称的な位置にある成分内の値C1(C2+i)を、値{C1(C2-i)Z}の複素共役{C1(C2+i)Z}に変更することができる。
(Process B)
The second scoring unit 143 multiplies the value C1 (C2-i) in the components of each row in the replaced Hermitian adjacency matrix H by the number Z of the components showing C1 (C2-i) and the value 1 in the same row. Change to the value {C1 (C2-i) Z}, and further change to the value {C1 (C2-i) Z}. (C2 + i) can be changed to the complex conjugate {C1 (C2 + i) Z} of the value {C1 (C2-i) Z}.
 このような置換及び変更によって、特殊エルミート隣接行列H2は生成される。第2スコアリング部143は、処理Aの実行後、処理Bを実行してもよいし、処理Bの実行後、処理Aを実行してもよいし、処理A及び処理Bを同時並行的に実行してもよい。いずれの態様で処理A及び処理Bを実行しても、同じ特殊エルミート隣接行列H2が生成される。 By such substitution and modification, the special Hermitian adjacency matrix H2 is generated. The second scoring unit 143 may execute the process B after the execution of the process A, may execute the process A after the execution of the process B, or may execute the process A and the process B in parallel. You may do it. Regardless of which mode the process A and the process B are executed, the same special Hermitian adjacency matrix H2 is generated.
 図12に示される特殊エルミート隣接行列H2における値Z1は、第p3行における成分h(p3,1),h(p3,2),…,h(p3,N)の内、値-iを採る成分及び値1を採る成分の数に対応する。値Z2は、第p4行における合計N個の成分h(p4,1),h(p4,2),…,h(p4,N)の内、値-iを採る成分及び値1を採る成分の数に対応する。第p3行は、図12において値{C1(C2-i)Z1/W1}が示される行と理解されてよい。第p4行は、図12において値{C1(C2-i)Z2/W2}が示される行と理解されてよい。 The value Z1 in the special Hermitian adjacency matrix H2 shown in FIG. 12 takes the value −i from the components h (p3, 1), h (p3, 2), ..., H (p3, N) in the third row. Corresponds to the number of components and components that take a value of 1. The value Z2 is a component that takes the value -i and a component that takes the value 1 among the total N components h (p4, 1), h (p4, 2), ..., H (p4, N) in the p4th row. Corresponds to the number of. The third line may be understood as the line in which the value {C1 (C2-i) Z1 / W1} is shown in FIG. The fourth line may be understood as the line in which the value {C1 (C2-i) Z2 / W2} is shown in FIG.
 第2スコアリング部143は、このように算出した特殊エルミート隣接行列H2を、特殊エルミート隣接行列H1に代えて用いて、S1030-S1050の処理を実行することができる。 The second scoring unit 143 can execute the processing of S1030-S1050 by using the special Hermitian adjacency matrix H2 calculated in this way in place of the special Hermitian adjacency matrix H1.
 S250(図7参照)において、第2スコアリング部143は、処理対象の文書ネットワークに含まれる結合ノードの層数Jを判別する。本実施形態では、先端ノードからリンクの向きに従ってノード間を移動したときに、最初に現れる結合ノードが第0層結合ノードと定義される。 In S250 (see FIG. 7), the second scoring unit 143 determines the number of layers J of the join nodes included in the document network to be processed. In the present embodiment, the first joining node that appears when moving between the nodes according to the direction of the link from the tip node is defined as the 0th layer joining node.
 第0層結合ノードの次に現れる結合ノードが第1層結合ノードと定義され、第j層結合ノードの次に現れる結合ノードが第(j+1)結合ノードと定義される(jは0以上の整数である)。この定義に従えば、文書ネットワーク内に、第(J-1)層の結合ノードまでが存在するとき、文書ネットワーク内における結合ノードの層数はJである。 The join node that appears next to the 0th layer join node is defined as the 1st layer join node, and the join node that appears next to the jth layer join node is defined as the (j + 1) join node (j is an integer greater than or equal to 0). Is). According to this definition, when there are up to the join nodes of the (J-1) layer in the document network, the number of join nodes in the document network is J.
 図6に示す文書ネットワークによれば、第0層結合ノードは、第3ノード及び第12ノードであり、第1層結合ノードは、第6ノードであり、第2層結合ノードは、第7ノードであり、第3層結合ノードは、第15ノードである。図6に示す文書ネットワークにおける結合ノードの層数Jは、4である。 According to the document network shown in FIG. 6, the 0th layer connection node is the 3rd node and the 12th node, the 1st layer connection node is the 6th node, and the 2nd layer connection node is the 7th node. The third layer join node is the fifteenth node. The number of layers J of the join nodes in the document network shown in FIG. 6 is 4.
 この説明から理解できるように、先端ノードに依存して複数の層番号を採り得る結合ノードに関しては、採り得る層番号のうちの最大の層番号が、対応する結合ノードに割り当てられる。第7ノードは、第1層結合ノードではなく、第2層結合ノードである。 As can be understood from this explanation, for a join node that can take a plurality of layer numbers depending on the tip node, the highest layer number among the available layer numbers is assigned to the corresponding join node. The seventh node is not a first layer join node but a second layer join node.
 S250の処理後、第2スコアリング部143は、j=0に設定し(S260)、先端ノードから第j層(すなわち第0層)結合ノードに導かれるサブグラフを判別する(S270)。 After the processing of S250, the second scoring unit 143 sets j = 0 (S260) and determines the subgraph guided from the tip node to the jth layer (that is, the 0th layer) connecting node (S270).
 図6に示す文書ネットワークの例によれば、S270で判別されるサブグラフは、図13に示すように、第1ノードと第3ノードとからなるサブグラフSG1と、第2ノードと第3ノードとからなるサブグラフSG2と、第10ノード、第11ノード、第12ノード、及び第20ノードからなるサブグラフSG3と、第13ノード及び第12ノードからなるサブグラフSG4である。S270において、サブグラフは、先端ノードと第0層結合ノードとの組み合わせ毎に判別される。 According to the example of the document network shown in FIG. 6, the subgraph determined in S270 is composed of the subgraph SG1 consisting of the first node and the third node, and the second node and the third node, as shown in FIG. Subgraph SG2, subgraph SG3 consisting of tenth node, eleventh node, twelfth node, and twentieth node, and subgraph SG4 consisting of thirteenth node and twelfth node. In S270, the subgraph is discriminated for each combination of the tip node and the 0th layer join node.
 第2スコアリング部143は、S270で判別したサブグラフのそれぞれに関して、図8に示す処理と同様の処理を実行する(S280)。これにより、サブグラフ毎に、サブグラフ内の各ノードのスコア基準値及びスコアを算出する。 The second scoring unit 143 executes the same processing as that shown in FIG. 8 for each of the subgraphs determined in S270 (S280). As a result, the score reference value and the score of each node in the subgraph are calculated for each subgraph.
 S280において、第2スコアリング部143は、判別されたサブグラフを順に処理対象に選択して、図8に示す処理を実行することができる。ここでは、処理対象のサブグラフが、図8の説明における「処理対象の文書ネットワーク」と同様に扱われて、サブグラフ内の各ノードのスコア基準値及びスコアが算出される。 In S280, the second scoring unit 143 can sequentially select the determined subgraphs as processing targets and execute the processing shown in FIG. Here, the subgraph to be processed is treated in the same manner as the “document network to be processed” in the description of FIG. 8, and the score reference value and the score of each node in the subgraph are calculated.
 例えば、処理対象のサブグラフが、第10ノード、第11ノード、第12ノード、及び第20ノードからなるサブグラフSG3である場合には、このサブグラフにおいて、アウトリンクを有さない第12ノード及び第20ノードに対しダミーノードDPが付加される(図14参照)。付加対象のノードには、サブグラフ化前において付加対象のノードが有するアウトリンクの数と同数、ダミーノードDPが付加され得る。 For example, when the subgraph to be processed is the subgraph SG3 consisting of the 10th node, the 11th node, the 12th node, and the 20th node, in this subgraph, the 12th node and the 20th node having no outlink. A dummy node DP is added to the node (see FIG. 14). Dummy node DP may be added to the addition target node in the same number as the number of outlinks that the addition target node has before subgraphing.
 S280では、このようにダミーノードDPが付加されたサブグラフに対応するエルミート隣接行列Hが生成される。このエルミート隣接行列Hに対応する特殊エルミート隣接行列H1又は特殊エルミート隣接行列H2に基づいて、第10ノード、第11ノード、第12ノード、及び第20ノードのスコア基準値及びスコアが算出される。上述の値Z0は、例えば先端ノードである第10ノードのスコアがゼロとなるように設定され得る。 In S280, the Hermitian adjacency matrix H corresponding to the subgraph to which the dummy node DP is added is generated. Based on the special Hermitian adjacency matrix H1 or the special Hermitian adjacency matrix H2 corresponding to the Hermitian adjacency matrix H, the score reference values and scores of the tenth node, the eleventh node, the twelfth node, and the twentieth node are calculated. The above-mentioned value Z0 can be set so that the score of the tenth node, which is the tip node, becomes zero, for example.
 S280において、サブグラフ毎のスコア基準値及びスコアを算出すると、第2スコアリング部143は、サブグラフ間で重複する第j層結合ノードのスコア基準値及びスコアを統合する(S285)。 When the score reference value and the score for each subgraph are calculated in S280, the second scoring unit 143 integrates the score reference value and the score of the j-layer connecting node overlapping between the subgraphs (S285).
 S285において、第2スコアリング部143は、サブグラフ間で重複する第j層結合ノードのスコア基準値を、次のように合成して、処理対象の文書ネットワークにおける第j層結合ノードのそれぞれに対し唯一のスコア基準値及びスコアを算出する。 In S285, the second scoring unit 143 synthesizes the score reference values of the j-layer connected nodes overlapping between the subgraphs as follows, and for each of the j-layer connected nodes in the document network to be processed. Calculate the only score reference value and score.
 具体的には、第2スコアリング部143は、一つの第j層結合ノードに関して、当該第j層結合ノードのサブグラフ毎のスコア基準値のうち、複素平面上において実軸からの角度θが最も大きいスコア基準値を判別する。その角度θが最大のスコア基準値と複素平面上で重なるように、第j層結合ノードの各サブグラフにおけるスコア基準値を複素平面上で回転させる。 Specifically, the second scoring unit 143 has the highest angle θ from the real axis on the complex plane among the score reference values for each subgraph of the j-layer connection node with respect to one j-layer connection node. Determine a large score reference value. The score reference value in each subgraph of the j-layer join node is rotated on the complex plane so that the angle θ overlaps with the maximum score reference value on the complex plane.
 第2スコアリング部143は、複素平面上で重なった各スコア基準値をベクトル合成し、一つの第j層結合ノードに対して唯一のスコア基準値を、その合成ベクトルに決定する。この唯一のスコア基準値Vxに基づいて、一つの第j層結合ノードに対応するスコア相当値Zx=|Vx|d1・{2π-arg(Vx)}d2を算出する。第j層結合ノードのスコアXは、X=Zx-Z0に従って算出され得る。 The second scoring unit 143 vector-synthesizes each of the overlapping score reference values on the complex plane, and determines the only score reference value for one j-layer connecting node as the composite vector. Based on this unique score reference value Vx, the score equivalent value Zx = | Vx | d1 · {2π-arg (Vx)} d2 corresponding to one j-layer connection node is calculated. The score X of the j-layer join node can be calculated according to X = Zx−Z0.
 サブグラフ間で重複する第j層結合ノードに対して共通する一つのスコアXを与えるために、値Z0は、第j層結合ノードについて上記角度θが最大のスコア基準値を有するサブグラフにおけるZ[m]の最小値に設定され得る。あるいは、値Z0は、上述したように、値ゼロであってもよい。 In order to give a common score X to the j-layer connecting node overlapping between the subgraphs, the value Z0 is Z [m in the subgraph having the maximum score reference value at the angle θ for the j-layer connecting node. ] Can be set to the minimum value. Alternatively, the value Z0 may be zero, as described above.
 別例として、第2スコアリング部143は、一つの第j層結合ノードに関して、第j層結合ノードのサブグラフ毎のスコア基準値を重ねないまま複素平面上においてベクトル合成することで、一つの第j層結合ノードに対して唯一のスコア基準値を決定してもよい。 As another example, the second scoring unit 143 performs vector synthesis on the complex plane with respect to one j-layer connecting node without overlapping the score reference values for each subgraph of the j-layer connecting node. The only score reference value may be determined for the j-layer join node.
 第2スコアリング部143は、文書ネットワークに複数の第j層結合ノードが存在する場合、S285において、第j層結合ノードのそれぞれに対して上述の処理を実行し、各第j層結合ノードのスコア基準値Vx及びスコアXを算出する。これにより第2スコアリング部143は、第j層結合ノード毎に、第j層結合ノードに関するサブグラフ間のスコア基準値Vcを統合したスコア基準値Vx及び対応するスコアXを算出する。 When a plurality of j-layer connected nodes exist in the document network, the second scoring unit 143 executes the above processing for each of the j-layer connected nodes in S285, and the second scoring unit 143 executes the above processing for each of the j-layer connected nodes. The score reference value Vx and the score X are calculated. As a result, the second scoring unit 143 calculates the score reference value Vx and the corresponding score X by integrating the score reference values Vc between the subgraphs regarding the j-layer join node for each j-layer join node.
 S285での処理を終えると、第2スコアリング部143は、変数jの値を1インクリメントする(S290)、続くS300において、第2スコアリング部143は、変数jの値が、層数J未満であるか否かを判断する。 When the processing in S285 is completed, the second scoring unit 143 increments the value of the variable j by 1 (S290), and in the subsequent S300, the second scoring unit 143 has the value of the variable j less than the number of layers J. Judge whether or not.
 変数jの値が層数J以上であると判断すると(S300でNo)、第2スコアリング部143は、S410(図19参照)の処理を実行する。一方、変数jの値が層数J未満であると判断すると(S300でYes)、第2スコアリング部143は、S310(図15参照)の処理を実行する。 When it is determined that the value of the variable j is equal to or greater than the number of layers J (No in S300), the second scoring unit 143 executes the process of S410 (see FIG. 19). On the other hand, if it is determined that the value of the variable j is less than the number of layers J (Yes in S300), the second scoring unit 143 executes the process of S310 (see FIG. 15).
 S310において、第2スコアリング部143は、先端ノードから第j層結合ノードまでのサブグラフを判別する。ここで判別されるサブグラフは、先端ノードから第j層結合ノードまでの間に、他の結合ノードが含まれないサブグラフである。S310では、先端ノードと第j層結合ノードとの組み合わせ毎に、組み合わせに対応する一つの先端ノードと一つの結合ノードとを含むサブグラフが判別される。 In S310, the second scoring unit 143 determines the subgraph from the tip node to the j-layer join node. The subgraph determined here is a subgraph in which no other join node is included between the tip node and the j-th layer join node. In S310, for each combination of the tip node and the j-th layer join node, a subgraph including one tip node and one join node corresponding to the combination is determined.
 図6に示す文書ネットワークの例によれば、S310で判別されるサブグラフは、図16に示すように、第4ノードと、第5ノードと、第6ノードとからなるサブグラフSG5である。 According to the example of the document network shown in FIG. 6, the subgraph determined in S310 is a subgraph SG5 including a fourth node, a fifth node, and a sixth node, as shown in FIG.
 S310での処理によって、該当するサブグラフが存在することが判明した場合(S320でYes)、第2スコアリング部143は、S310で判別されたサブグラフのそれぞれに関して、S280と同様の処理を実行する(S330)。これにより、サブグラフ毎に、サブグラフ内の各ノードのスコア基準値及びスコアを算出する(S330)。その後、第2スコアリング部143は、S340の処理を実行する。 When the processing in S310 reveals that the corresponding subgraph exists (Yes in S320), the second scoring unit 143 executes the same processing as in S280 for each of the subgraphs determined in S310 (Yes in S320). S330). As a result, the score reference value and the score of each node in the subgraph are calculated for each subgraph (S330). After that, the second scoring unit 143 executes the process of S340.
 S310の処理によって、該当するサブグラフが存在しないことが判明した場合(S320でNo)、第2スコアリング部143は、S330の処理を実行せず、S340の処理を実行する。 If it is found by the processing of S310 that the corresponding subgraph does not exist (No in S320), the second scoring unit 143 does not execute the processing of S330, but executes the processing of S340.
 S340において、第2スコアリング部143は、変数fを値ゼロに設定する。続くS350において、第2スコアリング部143は、第f層結合ノードから第j層結合ノードへのサブグラフを判別する。ここで判別されるサブグラフは、第f層結合ノードから第j層結合ノードまでの間に、他の結合ノードが含まれないサブグラフである。 In S340, the second scoring unit 143 sets the variable f to a value of zero. In the following S350, the second scoring unit 143 determines the subgraph from the f-layer connection node to the j-layer connection node. The subgraph determined here is a subgraph in which no other join node is included between the f-layer join node and the j-layer join node.
 S350では、第f層結合ノードと第j層結合ノードとの組み合わせ毎に、組み合わせに対応する一つの第f層結合ノードと一つの第j層結合ノードとを含むサブグラフが判別される。サブグラフにおいて第f層結合ノードは、先端ノードに対応し、第j層結合ノードは、後端ノードに対応する。 In S350, for each combination of the f-layer connection node and the j-layer connection node, a subgraph including one f-layer connection node and one j-layer connection node corresponding to the combination is determined. In the subgraph, the f-layer join node corresponds to the front end node, and the j-layer join node corresponds to the rear end node.
 図6に示す文書ネットワークの例によれば、f=0及びg=1であるとき、S350で判別されるサブグラフは、図16に示す第3ノードと第6ノードとからなるサブグラフSG6である。 According to the example of the document network shown in FIG. 6, when f = 0 and g = 1, the subgraph determined in S350 is the subgraph SG6 including the third node and the sixth node shown in FIG.
 S350での処理によって、該当するサブグラフが存在しないことが判明した場合(S360でNo)、第2スコアリング部143は、S370の処理を実行することなく、S380の処理を実行する。 When it is found that the corresponding subgraph does not exist by the processing in S350 (No in S360), the second scoring unit 143 executes the processing of S380 without executing the processing of S370.
 一方、該当するサブグラフが存在することが判明した場合(S360でYes)、第2スコアリング部143は、S350で判別されたサブグラフのそれぞれに対しS280と同様の処理を実行する。それにより、サブグラフ毎に、サブグラフ内の各ノードのスコア基準値及びスコアを算出する(S370)。 On the other hand, when it is found that the corresponding subgraph exists (Yes in S360), the second scoring unit 143 executes the same processing as in S280 for each of the subgraphs determined in S350. As a result, the score reference value and the score of each node in the subgraph are calculated for each subgraph (S370).
 S370において、第2スコアリング部143は更に、算出したサブグラフ内の各ノードのスコア基準値及びスコアを、既に計算されている第f層結合ノードのスコア基準値及びスコアに応じて修正する。 In S370, the second scoring unit 143 further corrects the score reference value and score of each node in the calculated subgraph according to the already calculated score reference value and score of the f-layer connection node.
 サブグラフ内の第f層結合ノードのスコア基準値及びスコアは、S370の処理前に計算されている。例えば、f=0であるときの第0層結合ノードのスコア基準値及びスコアは、S285で計算される。S370において、第2スコアリング部143は、既にスコア基準値及びスコアが計算された第f層結合ノードのスコア基準値及びスコアを基準に、サブグラフ内の残りのノードのスコア基準値及びスコアを修正する。 The score reference value and score of the f-layer connection node in the subgraph are calculated before the processing of S370. For example, the score reference value and the score of the 0th layer connection node when f = 0 are calculated in S285. In S370, the second scoring unit 143 modifies the score reference values and scores of the remaining nodes in the subgraph based on the score reference values and scores of the f-layer connected node for which the score reference values and scores have already been calculated. To do.
 S370の第1例によれば、第2スコアリング部143は、S370の処理前に算出されている第f層結合ノードのスコア基準値がVaである場合、サブグラフ内の各ノードのスコア基準値を次のように修正する。 According to the first example of S370, when the score reference value of the f-layer join node calculated before the processing of S370 is Va, the second scoring unit 143 has the score reference value of each node in the subgraph. Is modified as follows.
 すなわち、第2スコアリング部143は、S370で算出した修正前の各ノードのスコア基準値を、第f層結合ノードのスコア基準値が上記Vaと一致するように、複素平面上で回転させる。このようにして回転させたときの各ノードのスコア基準値を、修正後のスコア基準値として決定する。 That is, the second scoring unit 143 rotates the score reference value of each node before modification calculated in S370 on the complex plane so that the score reference value of the f-layer connection node matches the above Va. The score reference value of each node when rotated in this way is determined as the corrected score reference value.
 第2スコアリング部143は、修正後の各ノードのスコア基準値Vcに基づいて、サブグラフ内の各ノードの修正後のスコアXを算出することができる。スコアXは、第f層結合ノードのスコアXが修正前のスコアと同じになるように算出され得る。 The second scoring unit 143 can calculate the modified score X of each node in the subgraph based on the score reference value Vc of each node after modification. The score X can be calculated so that the score X of the f-layer join node is the same as the score before modification.
 S370の第2例によれば、第2スコアリング部143は、S370の処理前に算出されている第f層結合ノードのスコアがXaである場合、サブグラフ内の各ノードのスコアを次のように修正する。 According to the second example of S370, when the score of the f-layer join node calculated before the processing of S370 is Xa, the second scoring unit 143 sets the score of each node in the subgraph as follows. Modify to.
 すなわち、第2スコアリング部143は、S370で算出した修正前の第f層結合ノードのスコアと上記Xaとの差分だけ、S370で算出した修正前の各ノードのスコアを加算する。これにより、第2スコアリング部143は、S370で算出した修正前の第f層結合ノードのスコアが上記Xaと一致するように、サブグラフ内の各ノードのスコアを修正する。 That is, the second scoring unit 143 adds the score of each node before modification calculated in S370 by the difference between the score of the f-layer connection node before modification calculated in S370 and the above Xa. As a result, the second scoring unit 143 corrects the score of each node in the subgraph so that the score of the f-layer connecting node before the correction calculated in S370 matches the above Xa.
 S370での処理を終えると、第2スコアリング部143は、変数fの値を1インクリメントする(S380)。その後、第2スコアリング部143は、変数fの値が、変数jの値未満であるか否かを判断する(S390)。ここで肯定判断すると(S390でYes)、第2スコアリング部143は、S350の処理を実行する。肯定判断すると(S390でNo)、第2スコアリング部143は、S400の処理を実行する。 When the processing in S370 is completed, the second scoring unit 143 increments the value of the variable f by 1 (S380). After that, the second scoring unit 143 determines whether or not the value of the variable f is less than the value of the variable j (S390). If an affirmative decision is made here (Yes in S390), the second scoring unit 143 executes the process of S350. If affirmative determination is made (No in S390), the second scoring unit 143 executes the process of S400.
 S400において、第2スコアリング部143は、S310,S350で判別されたサブグラフ間で重複する第j層結合ノードのそれぞれに関して、S330,S370の処理で算出された、対応する第j層結合ノードのサブグラフ毎のスコア基準値及びスコアを、S285の処理と同様に統合する。 In S400, the second scoring unit 143 of the corresponding j-layer join node calculated by the processing of S330 and S370 for each of the j-layer join nodes overlapping between the subgraphs determined in S310 and S350. The score reference value and the score for each subgraph are integrated in the same manner as in the process of S285.
 すなわち、第2スコアリング部143は、第j層結合ノードのそれぞれに関し、対応する第j層結合ノードのサブグラフ毎のスコア基準値Vcを統合したスコア基準値Vxを算出し、スコア基準値Vxに対応するスコア相当値Zxに基づくスコアX=Zx-Z0を算出する。 That is, the second scoring unit 143 calculates a score reference value Vx that integrates the score reference value Vc for each subgraph of the corresponding j-layer join node for each of the j-layer join nodes, and uses the score reference value Vx as the score reference value Vx. The score X = Zx-Z0 based on the corresponding score equivalent value Zx is calculated.
 その後、第2スコアリング部143は、変数jの値を1インクリメントして(S290)、S300~S400の処理を実行する。第2スコアリング部143は、変数jの値をインクリメントしながら、S300~S400を繰返し実行することにより、第(J-1)層結合ノードまでの各ノードのスコア基準値及びスコアを算出する。 After that, the second scoring unit 143 increments the value of the variable j by 1 (S290) and executes the processes of S300 to S400. The second scoring unit 143 calculates the score reference value and the score of each node up to the (J-1) layer connection node by repeatedly executing S300 to S400 while incrementing the value of the variable j.
 第(J-1)層結合ノードまでの各ノードのスコア基準値及びスコアを算出し終えると、第2スコアリング部143は、S300において否定判断して、S410の処理を実行する(図19参照)。 After calculating the score reference value and the score of each node up to the (J-1) layer connection node, the second scoring unit 143 makes a negative judgment in S300 and executes the process of S410 (see FIG. 19). ).
 S410より前の処理の流れを、図6に示す文書ネットワークの例に基づいて具体的に説明する。第2スコアリング部143は、g=0であるとき、サブグラフSG1(図13参照)に関する処理、及びサブグラフSG2に関する処理の実行により、第1、第2、及び第3ノードのスコア基準値及びスコアを算出する。 The flow of processing before S410 will be specifically described based on the example of the document network shown in FIG. When g = 0, the second scoring unit 143 performs the processing related to the subgraph SG1 (see FIG. 13) and the processing related to the subgraph SG2 to perform the score reference values and scores of the first, second, and third nodes. Is calculated.
 第2スコアリング部143は更に、サブグラフSG3に関する処理、及びサブグラフSG4に関する処理の実行により、第10、第11、第12、第13、及び第20ノードのスコア基準値及びスコアを算出する。 The second scoring unit 143 further calculates the score reference values and scores of the tenth, eleventh, twelfth, thirteenth, and twentieth nodes by executing the processing related to the subgraph SG3 and the processing related to the subgraph SG4.
 その後、g=1のプロセスにおいて、第2スコアリング部143は、サブグラフSG5(図16参照)に関する処理を実行し、更には、サブグラフSG6に関する処理を実行し、第4、第5、及び第6ノードのスコア基準値及びスコアを算出する。 After that, in the process of g = 1, the second scoring unit 143 executes the process related to the subgraph SG5 (see FIG. 16), further executes the process related to the subgraph SG6, and executes the processes related to the subgraph SG6, and the fourth, fifth, and sixth. Calculate the node score reference value and score.
 g=2のプロセスにおいて、第2スコアリング部143は、サブグラフSG7(図17参照)に関する処理を実行し、更には、サブグラフSG8に関する処理を実行し、第7ノードのスコア基準値及びスコアを算出する。 In the process of g = 2, the second scoring unit 143 executes the process related to the subgraph SG7 (see FIG. 17), further executes the process related to the subgraph SG8, and calculates the score reference value and the score of the seventh node. To do.
 g=3のプロセスにおいて、第2スコアリング部143は、サブグラフSG9(図18参照)に関する処理を実行し、更には、サブグラフSG10に関する処理を実行し、第8、第14、及び第15ノードのスコア基準値及びスコアを算出する。 In the process of g = 3, the second scoring unit 143 executes the processing related to the subgraph SG9 (see FIG. 18), and further executes the processing related to the subgraph SG10, and the 8th, 14th, and 15th nodes. Score The reference value and score are calculated.
 S410(図19参照)において、第2スコアリング部143は、結合ノードから始まり、非結合ノードの後端ノードで終わる非循環型のサブグラフを判別する。図6に示す文書ネットワークの例によれば、S410で判別されるサブグラフは、図20に示す、第15、第16、第17、第18、及び第19ノードからなるサブグラフSG11である。 In S410 (see FIG. 19), the second scoring unit 143 discriminates a non-circular subgraph starting from the joined node and ending at the trailing end node of the unjoined node. According to the example of the document network shown in FIG. 6, the subgraph identified in S410 is the subgraph SG11 composed of the 15th, 16th, 17th, 18th, and 19th nodes shown in FIG.
 S410での処理によって、該当するサブグラフが存在しないことが判明した場合(S420でNo)、第2スコアリング部143は、S430の処理を実行せずに、S440の処理を実行する。一方、該当するサブグラフが存在することが判明した場合(S420でYes)、第2スコアリング部143は、S410で判別されたサブグラフ毎に、サブグラフ内の各ノードのスコア基準値及びスコアを算出する(S430)。 When it is found that the corresponding subgraph does not exist by the processing in S410 (No in S420), the second scoring unit 143 executes the processing of S440 without executing the processing of S430. On the other hand, when it is found that the corresponding subgraph exists (Yes in S420), the second scoring unit 143 calculates the score reference value and the score of each node in the subgraph for each subgraph determined in S410. (S430).
 S430において、第2スコアリング部143は、S370の処理と同様、サブグラフ内の各ノードのスコア基準値及びスコアを、既に計算されている結合ノードのスコア及びスコア基準値に基づいて修正する。このようにして、サブグラフ内の各ノードのスコア及びスコア基準値を決定する。 In S430, the second scoring unit 143 modifies the score reference value and the score of each node in the subgraph based on the already calculated score and the score reference value of the combined node, as in the process of S370. In this way, the score and score reference value of each node in the subgraph are determined.
 続くS440において、第2スコアリング部143は、循環系のサブグラフを判別する。図6に示す文書ネットワークの例によれば、S440で判別されるサブグラフは、図21に示す、第6、第7、第8、及び第9ノードからなるサブグラフSG12である。 In the following S440, the second scoring unit 143 determines the subgraph of the circulatory system. According to the example of the document network shown in FIG. 6, the subgraph identified in S440 is the subgraph SG12 composed of the sixth, seventh, eighth, and ninth nodes shown in FIG.
 循環系のサブグラフが存在しない場合(S450でNo)、第2スコアリング部143は、S460-S490の処理を実行せずに、S500の処理を実行する。一方、循環系のサブグラフが存在する場合(S450でYes)、第2スコアリング部143は、S460-S490において、S440で判別されたサブグラフ毎に、サブグラフ内の各ノードのスコア基準値及びスコアを算出する。第2スコアリング部143は、S440で判別されたすべてのサブグラフに関してS470-S480の処理を実行すると(S490でYes)、S500の処理を実行する。 When the circulatory system subgraph does not exist (No in S450), the second scoring unit 143 executes the process of S500 without executing the process of S460-S490. On the other hand, when there is a circulatory system subgraph (Yes in S450), the second scoring unit 143 sets the score reference value and score of each node in the subgraph for each subgraph determined in S440 in S460-S490. calculate. When the second scoring unit 143 executes the processing of S470-S480 for all the subgraphs determined in S440 (Yes in S490), the second scoring unit 143 executes the processing of S500.
 S460において、第2スコアリング部143は、S440で判別されたサブグラフの一つを選択する。S470において、第2スコアリング部143は、選択したサブグラフ内において、既にスコアが算出されているノードのスコア群に基づき、サブグラフ内の各ノードに対する共通の加算スコアを決定する。 In S460, the second scoring unit 143 selects one of the subgraphs determined in S440. In S470, the second scoring unit 143 determines a common additional score for each node in the subgraph based on the score group of the nodes for which the score has already been calculated in the selected subgraph.
 S470の第1例によれば、第2スコアリング部143は、上記サブグラフ内のスコア群の最大値を、加算スコアに決定する。S470の第2例によれば、第2スコアリング部143は、上記スコア群の平均値を、加算スコアに決定する。 According to the first example of S470, the second scoring unit 143 determines the maximum value of the score group in the above subgraph as the added score. According to the second example of S470, the second scoring unit 143 determines the average value of the score group as the added score.
 S480において、第2スコアリング部143は、決定した加算スコアを、選択したサブグラフ内の各ノードのスコアに加算して、各ノードのスコアを修正する。加算前にスコアが算出されていないノードに対しては、スコアがゼロであるとみなして、上記決定した加算スコアを加算することができる。 In S480, the second scoring unit 143 adds the determined added score to the score of each node in the selected subgraph to correct the score of each node. For the nodes for which the score has not been calculated before the addition, the score can be regarded as zero and the above-determined addition score can be added.
 第2スコアリング部143は、このようにして各サブグラフ内のノードのスコアを修正すると、S500の処理を実行する。S500の処理が実行される前に、文書ネットワーク内のすべてのノードのスコアが決定される。 The second scoring unit 143 executes the process of S500 when the score of the node in each subgraph is corrected in this way. Before the processing of S500 is executed, the scores of all the nodes in the document network are determined.
 S500において、第2スコアリング部143は、決定された文書ネットワーク内の各ノードのスコアを、各ウェブページの第2スコアとしてランク付け部145に出力する。その後、スコア算出処理を終了する。 In S500, the second scoring unit 143 outputs the score of each node in the determined document network to the ranking unit 145 as the second score of each web page. After that, the score calculation process is completed.
 以上に説明した第1実施形態の情報処理システム1は、次のように変形され得る。第1変形例として、第2スコアリング部143は、アウトリンクのないノードに対してダミーノードDPを置かずに、エルミート隣接行列Hを生成し、スコア基準値及びスコアを算出してもよい。 The information processing system 1 of the first embodiment described above can be modified as follows. As a first modification, the second scoring unit 143 may generate the Hermitian adjacency matrix H and calculate the score reference value and the score without placing the dummy node DP for the node without the outline.
 第2変形例として、第2スコアリング部143は、各ノードのスコア基準値Vc[m]を、対応するノードのアウトリンク先の影響を除いた値として修正し、修正したスコア基準値Vc[m]を、修正前のスコア基準値Vc[m]に代えてを用いて、各ノードに対応するスコア相当値Z[m]=|Vc[m]|d1・{2π-arg(Vc[m])}d2を算出してもよい。 As a second modification, the second scoring unit 143 modifies the score reference value Vc [m] of each node as a value excluding the influence of the outlink destination of the corresponding node, and modifies the score reference value Vc *. Using [m] instead of the uncorrected score reference value Vc [m], the score equivalent value Z [m] = | Vc * [m] | d1 · {2π-arg (Vc) corresponding to each node. * [M])} d2 may be calculated.
 第3変形例として、第2スコアリング部143は、各ノードのスコア基準値Vc[m]を、対応するノードのインリンク元からの影響を除いた値として修正し、修正したスコア基準値Vc[m]を、修正前のスコア基準値に代えてを用いて、各ノードに対応するスコア相当値Z[m]=|Vc[m]|d1・{2π-arg(Vc[m])}d2を算出してもよい。第2変形例及び第3変形例は、第1変形例と同様、アウトリンクのないノードに対してダミーノードDPを置かずに、エルミート隣接行列Hを生成して実施され得る。 As a third modification, the second scoring unit 143 modifies the score reference value Vc [m] of each node as a value excluding the influence from the inlink source of the corresponding node, and modifies the score reference value Vc. * Using [m] instead of the score reference value before correction, the score equivalent value Z [m] = | Vc * [m] | d1 · {2π-arg (Vc * [m] corresponding to each node ])} d2 may be calculated. Similar to the first modification, the second modification and the third modification can be implemented by generating the Hermitian adjacency matrix H without placing a dummy node DP for a node having no outline.
 本実施形態の情報処理システム1によれば、ウェブページ間の接続関係を1,0,+i,-iの4値で表現したエルミート隣接行列Hに対応する特殊エルミート隣接行列H1,H2を用いて複数のウェブページをスコアリングする。このため、全ウェブページから全ウェブページへの仮想的な接続関係を措定する必要がなく、ウェブページ間の接続関係に基づく各ウェブページのスコアリング/ランク付けを従来よりも適切に実現することができる。 According to the information processing system 1 of the present embodiment, special Hermitian adjacency matrices H1 and H2 corresponding to the Hermitian adjacency matrix H in which the connection relationship between web pages is expressed by four values of 1,0, + i, and −i are used. Score multiple web pages. Therefore, it is not necessary to establish a virtual connection relationship from all web pages to all web pages, and scoring / ranking of each web page based on the connection relationship between web pages should be realized more appropriately than before. Can be done.
 本実施形態によれば、結合ノードを有する文書ネットワークにおいても、複数のウェブページのスコアリング/ランク付けを、エルミート隣接行列Hを用いて適切に実行できる。従って、出力部147は、第2スコアリング部143からの第2スコアに基づき、ウェブページ間の接続関係に基づいた適切な検索結果リストを、ユーザ端末5に提供することができる。 According to this embodiment, even in a document network having a join node, scoring / ranking of a plurality of web pages can be appropriately performed using the Hermitian adjacency matrix H. Therefore, the output unit 147 can provide the user terminal 5 with an appropriate search result list based on the connection relationship between the web pages based on the second score from the second scoring unit 143.
 [第2実施形態]
 続いて、第2実施形態の情報処理システム1を説明する。第2実施形態の情報処理システム1は、第1実施形態とは異なる内容のスコア算出処理がS130において実行されることを除けば、第1実施形態の情報処理システム1と同様に構成される。従って、以下では、第2スコアリング部143が、S130で実行するスコア算出処理の説明のみをする。以下において言及しない第2実施形態の情報処理システム1の構成は、第1実施形態と同一であると理解されてよい。
[Second Embodiment]
Subsequently, the information processing system 1 of the second embodiment will be described. The information processing system 1 of the second embodiment is configured in the same manner as the information processing system 1 of the first embodiment, except that a score calculation process having a content different from that of the first embodiment is executed in S130. Therefore, in the following, only the score calculation process executed by the second scoring unit 143 in S130 will be described. It may be understood that the configuration of the information processing system 1 of the second embodiment, which is not described below, is the same as that of the first embodiment.
 第2実施形態において、第2スコアリング部143は、S120で選択した処理対象の文書ネットワークに含まれる各ノードの第2スコアをS130において算出する際、図22に示すスコア算出処理を実行する。 In the second embodiment, the second scoring unit 143 executes the score calculation process shown in FIG. 22 when calculating the second score of each node included in the document network to be processed selected in S120 in S130.
 図22に示すスコア算出処理を開始すると、第2スコアリング部143は、処理対象の文書ネットワークに含まれるインリンクを持たない先端ノードを判別する(S610)。例えば、処理対象の文書ネットワークが、図23に例示される文書ネットワークである場合、第2スコアリング部143は、第1ノード及び第8ノードを先端ノードとして判別する。 When the score calculation process shown in FIG. 22 is started, the second scoring unit 143 determines the tip node having no inlink included in the document network to be processed (S610). For example, when the document network to be processed is the document network illustrated in FIG. 23, the second scoring unit 143 determines the first node and the eighth node as the tip node.
 その後、第2スコアリング部143は、文書ネットワークに含まれる先端ノードの一つを選択し(S620)、選択した先端ノードを含むサブグラフを判別する(S630)。判別されるサブグラフは、選択した先端ノードと、この先端ノードからリンクの向きに従って移動可能な文書ネットワーク内のすべてのノードとからなるサブグラフである。 After that, the second scoring unit 143 selects one of the advanced nodes included in the document network (S620) and determines the subgraph including the selected advanced node (S630). The discriminated subgraph is a subgraph consisting of the selected tip node and all the nodes in the document network that can be moved from this tip node according to the direction of the link.
 図23に示される文書ネットワークの例によれば、選択された先端ノードが第1ノードである場合、S630では、図24Aに示すように、第8ノードを除く第1ノードから第9ノードまでのノードからなるサブグラフSG21が判別される。選択された先端ノードが第8ノードである場合、S630では、図24Bに示すように、第3ノードから第9ノードまでのノードからなるサブグラフSG22が判別される。 According to the example of the document network shown in FIG. 23, when the selected leading node is the first node, in S630, as shown in FIG. 24A, from the first node to the ninth node excluding the eighth node. The subgraph SG21 composed of nodes is determined. When the selected leading node is the eighth node, in S630, as shown in FIG. 24B, the subgraph SG22 composed of the nodes from the third node to the ninth node is determined.
 その後、第2スコアリング部143は、S630で判別されたサブグラフを、処理対象の文書ネットワークとみなしたときの図8に示す処理と同様の処理を実行し、サブグラフ内の各ノードのスコア基準値及びスコアを算出する(S640)。 After that, the second scoring unit 143 executes the same processing as the processing shown in FIG. 8 when the subgraph determined in S630 is regarded as the document network to be processed, and the score reference value of each node in the subgraph is executed. And the score is calculated (S640).
 S640において、第2スコアリング部143は、後端ノードにダミーノードDPを配置せずに、エルミート隣接行列Hを生成し、スコア基準値及びスコアを算出することができる。第2スコアリング部143は、サブグラフ内における先端ノードのスコア基準値を、複素平面において、実軸上の値1の点、又は実軸に近い第4象限上の特定点に配置するように、サブグラフにおける各ノードのスコア基準値を算出することができる。特定点は、実軸上の値1の点を、角度θ1だけ第4象限側に回転させた点であり得る。 In S640, the second scoring unit 143 can generate the Hermitian adjacency matrix H and calculate the score reference value and the score without arranging the dummy node DP at the rear end node. The second scoring unit 143 arranges the score reference value of the tip node in the subgraph at the point of the value 1 on the real axis or the specific point on the fourth quadrant close to the real axis in the complex plane. The score reference value of each node in the subgraph can be calculated. The specific point may be a point obtained by rotating a point having a value of 1 on the real axis toward the fourth quadrant by an angle θ1.
 S640に続くS650において、第2スコアリング部143は、すべての先端ノードを選択して、S640の処理を実行したか否かを判断する。S650において否定判断すると、第2スコアリング部143は、選択する先端ノードを変更して(S620)、変更後の先端ノードのサブグラフを判別する(S630)。そして、判別したサブグラフのエルミート隣接行列Hに基づいて、サブグラフ内の各ノードのスコア基準値及びスコアを算出する(S640)。 In S650 following S640, the second scoring unit 143 selects all the leading nodes and determines whether or not the processing of S640 has been executed. If a negative determination is made in S650, the second scoring unit 143 changes the selected tip node (S620) and determines the subgraph of the changed tip node (S630). Then, the score reference value and the score of each node in the subgraph are calculated based on the Hermitian adjacency matrix H of the discriminated subgraph (S640).
 第2スコアリング部143は、このようにして、文書ネットワーク内に含まれる先端ノードのそれぞれに対応するサブグラフ毎に、対応するエルミート隣接行列Hに基づく各ノードのスコア基準値及びスコアを算出する。換言すれば、第2スコアリング部143は、結合ノードのインリンク毎のサブグラフを判別し、サブグラフ毎に、対応するエルミート隣接行列Hに基づく各ノードのスコア基準値及びスコアを算出する。その後、第2スコアリング部143は、S650で肯定判断して、S660の処理を実行する。 In this way, the second scoring unit 143 calculates the score reference value and the score of each node based on the corresponding Hermitian adjacency matrix H for each subgraph corresponding to each of the advanced nodes included in the document network. In other words, the second scoring unit 143 determines the subgraph for each inlink of the connecting node, and calculates the score reference value and the score of each node based on the corresponding Hermitian adjacency matrix H for each subgraph. After that, the second scoring unit 143 makes an affirmative judgment in S650 and executes the process of S660.
 サブグラフ内には、他のサブグラフと重複するノードが含まれるが、S640では、重複するノードのそれぞれに対し、サブグラフ毎に、スコア基準値及びスコアが算出される。 The subgraph includes nodes that overlap with other subgraphs, but in S640, a score reference value and a score are calculated for each of the overlapping nodes for each subgraph.
 S660において、第2スコアリング部143は、文書ネットワーク内の各ノードの第2スコアとして、対応するノードの各サブグラフでのスコアを統合した値を算出する。具体的に、第2スコアリング部143は、一つのノードの第2スコアを、そのノードの各サブグラフでのスコアを合計した値として算出する。 In S660, the second scoring unit 143 calculates a value obtained by integrating the scores of the corresponding nodes in each subgraph as the second score of each node in the document network. Specifically, the second scoring unit 143 calculates the second score of one node as the total value of the scores in each subgraph of the node.
 あるいは、第2スコアリング部143は、一つのノードの第2スコアを、そのノードの各サブグラフでのスコア基準値の合成ベクトルに基づいて算出してもよい。第2スコアリング部143は、ノード毎に、対応するノードの各サブグラフでのスコア基準値の合成ベクトルを、対応するノードの唯一のスコア基準値Vxとして用いて、式Zx=|Vx|d1・{2π-arg(Vx)}d2に従って、スコア基準値Vxに対応するスコア相当値Zxを算出することができる。第2スコアリング部143は、算出したスコア相当値Zxを、対応するノードの第2スコアとして出力することができる。 Alternatively, the second scoring unit 143 may calculate the second score of one node based on the composite vector of the score reference values in each subgraph of the node. The second scoring unit 143 uses the composite vector of the score reference values in each subgraph of the corresponding node as the only score reference value Vx of the corresponding node for each node, and uses the equation Zx = | Vx | d1. According to {2π-arg (Vx)} d2 , the score equivalent value Zx corresponding to the score reference value Vx can be calculated. The second scoring unit 143 can output the calculated score equivalent value Zx as the second score of the corresponding node.
 上述の第2実施形態によっても、情報処理システム1は、結合ノードを有する文書ネットワークに関する複数のウェブページのスコアリング/ランク付けを、エルミート隣接行列Hを用いて適切に実行することができる。 Also according to the second embodiment described above, the information processing system 1 can appropriately perform scoring / ranking of a plurality of web pages related to a document network having a join node by using the Hermitian adjacency matrix H.
 第2実施形態は、第1実施形態と同様に変形されてもよい。すなわち、第1実施形態において、第1、第2、及び第3変形例として説明したスコアの算出に係る変形例は、第2実施形態に適用されてもよい。 The second embodiment may be modified in the same manner as the first embodiment. That is, in the first embodiment, the modified example relating to the calculation of the score described as the first, second, and third modified examples may be applied to the second embodiment.
 [第3実施形態]
 続いて、第3実施形態の情報処理システム1を説明する。第3実施形態の情報処理システム1は、第1実施形態とは異なる内容のスコア算出処理がS130において実行されることを除けば、第1実施形態の情報処理システム1と同様に構成される。従って、以下では、第2スコアリング部143が、S130で実行するスコア算出処理の説明のみをする。以下において言及しない第3実施形態の情報処理システム1の構成は、第1実施形態と同一であると理解されてよい。
[Third Embodiment]
Subsequently, the information processing system 1 of the third embodiment will be described. The information processing system 1 of the third embodiment is configured in the same manner as the information processing system 1 of the first embodiment, except that a score calculation process having a content different from that of the first embodiment is executed in S130. Therefore, in the following, only the score calculation process executed by the second scoring unit 143 in S130 will be described. It may be understood that the configuration of the information processing system 1 of the third embodiment, which is not described below, is the same as that of the first embodiment.
 第3実施形態において、第2スコアリング部143は、S120で選択した処理対象の文書ネットワークに含まれる各ノードの第2スコアをS130において算出する際に、図25に示すスコア算出処理を実行する。 In the third embodiment, the second scoring unit 143 executes the score calculation process shown in FIG. 25 when calculating the second score of each node included in the document network to be processed selected in S120 in S130. ..
 図25に示すスコア算出処理を開始すると、第2スコアリング部143は、処理対象の文書ネットワーク内に含まれるインリンクを持たない先端ノード、及びアウトリンクを持たない後端ノードを判別する(S710)。 When the score calculation process shown in FIG. 25 is started, the second scoring unit 143 determines the front node having no inlink and the rear node having no outlink included in the document network to be processed (S710). ).
 アウトリンクを持たない後端ノードがない場合、第2スコアリング部143は、文書ネットワークを有向グラフで表現したときのアウトリンク及びインリンクの合計と、文書ネットワークを無向グラフで表現したときのリンク数とが異なるノードを、形式に後端ノードと判別する。該当するノードがない場合、第2スコアリング部143は、文書ネットワーク内のすべてのノードからのインリンクを有するダミーノードDPを文書ネットワークに追加して、そのダミーノードDPを後端ノードと判別する。 When there is no trailing node that does not have an outlink, the second scoring unit 143 calculates the total of the outlinks and inlinks when the document network is represented by a directed graph, and the link when the document network is represented by an undirected graph. Nodes with different numbers are formally identified as trailing nodes. If there is no corresponding node, the second scoring unit 143 adds a dummy node DP having inlinks from all the nodes in the document network to the document network, and determines the dummy node DP as the rearmost node. ..
 処理対象の文書ネットワークが、図23に例示される文書ネットワークである場合、第2スコアリング部143は、第1及び第8ノードを、先端ノードとして判別し、第5、第7、及び第9ノードを、後端ノードとして判別する。処理対象の文書ネットワークが、図27及び図28に例示される文書ネットワークである場合、第2スコアリング部143は、第1ノードを、先端ノードとして判別し、第9ノードを、後端ノードとして判別する。 When the document network to be processed is the document network exemplified in FIG. 23, the second scoring unit 143 determines the first and eighth nodes as the tip nodes, and the fifth, seventh, and ninth nodes. Determine the node as the trailing node. When the document network to be processed is the document network exemplified in FIGS. 27 and 28, the second scoring unit 143 determines the first node as the front node and the ninth node as the rear node. Determine.
 その後、第2スコアリング部143は、文書ネットワークに含まれる、先端ノードと後端ノードとの組合せ毎のサブグラフを判別する(S720)。サブグラフは、先端ノードから後端ノードまでリンクの向きに従って移動可能なノードの一群からなるサブグラフである。先端ノードと後端ノードとの組合せ毎のサブグラフは、結合ノードが有するインリンク及びアウトリンクの組合せ毎のサブグラフと理解されてもよい。 After that, the second scoring unit 143 determines the subgraph for each combination of the front end node and the rear end node included in the document network (S720). A subgraph is a subgraph consisting of a group of nodes that can move from the front end node to the rear end node according to the direction of the link. The subgraph for each combination of the front-end node and the rear-end node may be understood as a subgraph for each combination of in-link and out-link of the join node.
 図23に示される文書ネットワークの例によれば、S720において、第2スコアリング部143は、図26Aに示す先端ノードが第1ノードである三つのサブグラフSG31、SG32、SG33、及び、図26Bに示す先端ノードが第8ノードである三つのサブグラフSG34、SG35、SG36を判別する。 According to the example of the document network shown in FIG. 23, in S720, the second scoring unit 143 has three subgraphs SG31, SG32, SG33, and FIG. 26B in which the tip node shown in FIG. 26A is the first node. The three subgraphs SG34, SG35, and SG36 whose tip node is the eighth node are discriminated.
 図27に示される文書ネットワークの例によれば、S720において、第2スコアリング部143は、図29Aに示す第3ノードの第一のアウトリンクを通る先端ノードが第1ノード及び後端ノードが第9ノードであるサブグラフと、図29Bに示す第3ノードの第二のアウトリンクを通る先端ノードが第1ノード及び後端ノードが第9ノードであるサブグラフとを判別する。 According to the example of the document network shown in FIG. 27, in S720, the second scoring unit 143 has the first node and the rearmost node as the front node passing through the first outlink of the third node shown in FIG. 29A. The subgraph which is the ninth node and the subgraph whose front end node passing through the second outlink of the third node shown in FIG. 29B is the first node and the rear end node is the ninth node are discriminated from each other.
 その後、第2スコアリング部143は、判別したサブグラフの一つを選択し(S730)、選択したサブグラフを、処理対象の文書ネットワークとみなしたときの図8に示す処理と同様の処理を実行し、サブグラフ内の各ノードの第1仮スコアを算出する(S740)。 After that, the second scoring unit 143 selects one of the determined subgraphs (S730), and executes the same processing as that shown in FIG. 8 when the selected subgraph is regarded as the document network to be processed. , Calculate the first tentative score of each node in the subgraph (S740).
 S740において、第2スコアリング部143は、第mノードの第1仮スコアXp1[m]を、式Xp1[m]={(2π-arg(Vc[m]))/(π/2n)}d3に従って算出することができる。Vc[m]は、第mノードのスコア基準値である。d3は、0より大きい任意の実数である。d3が大きいほど、第1仮スコアXp1[m]は、インリンクを持たない先端ノードからの距離(リンクの数)に応じて大きくなる。 In S740, the second scoring unit 143 sets the first provisional score Xp1 [m] of the m node to the equation Xp1 [m] = {(2π-arg (Vc [m])) / (π / 2n)}. It can be calculated according to d3. Vc [m] is a score reference value of the mth node. d3 is any real number greater than 0. The larger d3 is, the larger the first provisional score Xp1 [m] is according to the distance (the number of links) from the tip node having no inlink.
 第2スコアリング部143は、すべてのサブグラフに関してS740の処理を実行するまで(S750でNo)、サブグラフのそれぞれを順に選択し(S730)、S740の処理を実行する。これにより、第2スコアリング部143は、サブグラフ毎に、当該サブグラフ内の各ノードの第1仮スコアを算出する(S740)。 The second scoring unit 143 selects each of the subgraphs in order (S730) until the processing of S740 is executed for all the subgraphs (No in S750), and the processing of S740 is executed. As a result, the second scoring unit 143 calculates the first provisional score of each node in the subgraph for each subgraph (S740).
 すべてのサブグラフに関して、S740の処理を実行すると(S750でYes)、第2スコアリング部143は、文書ネットワーク内のノード毎に、当該ノードの第2仮スコアとして、当該ノードの第1仮スコアの平均値を算出する(S760)。一つのノードの第2仮スコアは、対応するノードの第1仮スコアの合計を、対応するノードが属するサブグラフの数で除算した値である。 When the processing of S740 is executed for all the subgraphs (Yes in S750), the second scoring unit 143 sets the second tentative score of the node as the second tentative score of the node for each node in the document network. The average value is calculated (S760). The second provisional score of one node is the sum of the first provisional scores of the corresponding node divided by the number of subgraphs to which the corresponding node belongs.
 その後、第2スコアリング部143は、文書ネットワーク内の各ノードの第2仮スコアに基づいて、各ノードの第3仮スコアを算出する(S770)。S770において、第2スコアリング部143は、第mノードの第3仮スコアXp3[m]を、第mノードの第2仮スコアXp2[m]及びスコア基準値Vc[m]を用いて、式Xp3[m]=Xp2[m]・|Vc[m]|に従い算出する。 After that, the second scoring unit 143 calculates the third provisional score of each node based on the second provisional score of each node in the document network (S770). In S770, the second scoring unit 143 uses the third provisional score Xp3 [m] of the m-node, the second provisional score Xp2 [m] of the m-node, and the score reference value Vc [m] to formulate the formula. Calculate according to Xp3 [m] = Xp2 [m] · | Vc [m] |.
 その後、第2スコアリング部143は、文書ネットワーク内の各ノードの第2スコアを、各ノードの第3仮スコアを用いて算出する(S780)。具体的には、第2スコアリング部143は、文書ネットワーク内の第mノードの第2スコアを、第3仮スコアXp3[m]を値Md4で除算した値Xp3[m]/Md4に算出する。ここで、Mは、後端ノードから第mノードに到達可能な各ノードのアウトリンクの数の積である。 After that, the second scoring unit 143 calculates the second score of each node in the document network using the third provisional score of each node (S780). Specifically, the second scoring unit 143, the second score of the m nodes in the document network, the third temporary score Xp3 value obtained by dividing [m] to a value M d4 Xp3 [m] / M d4 calculate. Here, M is the product of the number of outlinks of each node that can reach the mth node from the rearmost node.
 上述の第3実施形態によっても、結合ノードを有する文書ネットワークに関し、複数のウェブページのスコアリング/ランク付けを、エルミート隣接行列Hを用いて適切に実行することができる。 Also according to the third embodiment described above, scoring / ranking of a plurality of web pages can be appropriately performed using the Hermitian adjacency matrix H for a document network having a join node.
 [第4実施形態]
 続いて、第4実施形態の情報処理システム1を説明する。第4実施形態の情報処理システム1は、第2スコアリング部143が、図8に示す副処理に代えて図30に示す副処理を実行する点で、第1実施形態とは異なる。一方、第4実施形態の情報処理システム1は、その他の点で基本的に第1実施形態と同じである。従って、以下では、第4実施形態の情報処理システム1の第1実施形態とは異なる構成を選択的に説明し、第1実施形態とは同一構成の説明を省略する。
[Fourth Embodiment]
Subsequently, the information processing system 1 of the fourth embodiment will be described. The information processing system 1 of the fourth embodiment is different from the first embodiment in that the second scoring unit 143 executes the sub-processing shown in FIG. 30 instead of the sub-processing shown in FIG. On the other hand, the information processing system 1 of the fourth embodiment is basically the same as that of the first embodiment in other respects. Therefore, in the following, a configuration different from that of the first embodiment of the information processing system 1 of the fourth embodiment will be selectively described, and the description of the same configuration as that of the first embodiment will be omitted.
 図30に示す副処理は、S240で実行される。更に、S280,S330,S370,S430では、スコア基準値及びスコアの算出に際し、図30に示す副処理と同様の処理が実行される。 The sub-processing shown in FIG. 30 is executed in S240. Further, in S280, S330, S370, and S430, the same processing as the sub-processing shown in FIG. 30 is executed when calculating the score reference value and the score.
 図30に示す副処理を開始すると、第2スコアリング部143は、S1010と同様に、処理対象の文書ネットワークに対応するエルミート隣接行列Hを生成する(S1110)。 When the sub-processing shown in FIG. 30 is started, the second scoring unit 143 generates the Hermitian adjacency matrix H corresponding to the document network to be processed (S1110), similarly to S1010.
 その後、第2スコアリング部143は、上記生成したエルミート隣接行列Hを変形した特殊エルミート隣接行列H3を生成する(S1120)。特殊エルミート隣接行列H3は、S1020において生成される特殊エルミート隣接行列H1における対角成分の全てを、値0から値-1に置換することによって生成される。 After that, the second scoring unit 143 generates a special Hermitian adjacency matrix H3 which is a modification of the generated Hermitian adjacency matrix H (S1120). The special Hermitian adjacency matrix H3 is generated by substituting all the diagonal components in the special Hermitian adjacency matrix H1 generated in S1020 from a value 0 to a value -1.
 すなわち、第2スコアリング部143は、S1110で生成したエルミート隣接行列HをS1020での処理と同様に変形することにより、特殊エルミート隣接行列H1を生成し、特殊エルミート隣接行列H1における対角成分の全てを、値0から値-1に置換することによって、特殊エルミート隣接行列H3を生成することができる。 That is, the second scoring unit 143 generates a special Hermitian adjacency matrix H1 by transforming the Hermitian adjacency matrix H generated in S1110 in the same manner as in the processing in S1020, and the diagonal components in the special Hermitian adjacency matrix H1. By substituting everything from value 0 to value -1, the special Hermitian adjacency matrix H3 can be generated.
 続くS1130において、第2スコアリング部143は、列ベクトルBを生成する。S1120で生成される特殊エルミート隣接行列H3は、処理対象の文書ネットワークのノード数Nに対応したN行N列(NxN)の行列である。 In the following S1130, the second scoring unit 143 generates the column vector B. The special Hermitian adjacency matrix H3 generated in S1120 is a matrix of N rows and N columns (NxN) corresponding to the number N of nodes of the document network to be processed.
 S1130で生成される列ベクトルBは、N行1列の行列に対応し、列ベクトルBは、各成分が、対応するノードがインリンクを有するノードであるか否かに応じた値を示すように生成される。 The column vector B generated in S1130 corresponds to a matrix of N rows and 1 column, and the column vector B indicates a value according to whether or not each component is a node having an inlink. Is generated in.
 具体的に、第2スコアリング部143は、インリンクを有さないノードに対応する成分を値-1に設定し、インリンクを有するノードに対応する成分を値0に設定するように、列ベクトルBを生成する。 Specifically, the second scoring unit 143 sets the component corresponding to the node having no inlink to the value -1, and sets the component corresponding to the node having the inlink to the value 0. Generate vector B.
 続くS1140において、第2スコアリング部143は、特殊エルミート隣接行列H3及び列ベクトルBを含む次の連立方程式を解くことにより、連立方程式の解に対応するスコア基準ベクトルUの各成分u[m](1≦m≦N)を求める。スコア基準ベクトルUは、列ベクトルBと同様にN行1列の行列に対応する。下式では、スコア基準ベクトルUの第m行成分をu[m]で表現し、列ベクトルBの第m行成分をb[m]で表し、特殊エルミート隣接行列H3を、H’で表す。 In the following S1140, the second scoring unit 143 solves each component u [m] of the score reference vector U corresponding to the solution of the simultaneous equations by solving the following simultaneous equations including the special Hermitian adjacency matrix H3 and the column vector B. (1 ≦ m ≦ N) is obtained. The score reference vector U corresponds to an N-by-1 matrix like the column vector B. In the following equation, the m-th row component of the score reference vector U is represented by u [m], the m-th row component of the column vector B is represented by b [m], and the special Hermitian adjacency matrix H3 is represented by H'.
Figure JPOXMLDOC01-appb-M000003
 例えば、第1ノードのみがインリンクを有さないノード数が5の例示的かつ単純な文書ネットワークを想定する。この場合、上記連立方程式は、例示的な特殊エルミート隣接行列H3を用いて、次のように表され得る。
Figure JPOXMLDOC01-appb-M000003
For example, assume an exemplary and simple document network in which only the first node has no inlinks and the number of nodes is 5. In this case, the simultaneous equations can be expressed as follows using the exemplary Hermitian adjacency matrix H3.
Figure JPOXMLDOC01-appb-M000004
 続くS1150において、第2スコアリング部143は、S1140で算出したスコア基準ベクトルUの各成分u[m]に基づいて、各ノードのスコア基準値Vc[m]を決定する。
Figure JPOXMLDOC01-appb-M000004
In the subsequent S1150, the second scoring unit 143 determines the score reference value Vc [m] of each node based on each component u [m] of the score reference vector U calculated in S1140.
 第2スコアリング部143は、第1実施形態における固有ベクトルV1と同様に、スコア基準ベクトルUの各成分u[m](1≦m≦N)を、文書ネットワークの始点ノードに対応する成分E=u[s]で除算することによって、補正することができる。その補正値u[m]/E(1≦m≦N)を、複素平面上で角度θ1だけ実軸から第4象限側に回転移動させるように補正することができる。この補正値を、スコア基準値Vc[m]に決定することができる。 Similar to the eigenvector V1 in the first embodiment, the second scoring unit 143 sets each component u [m] (1 ≦ m ≦ N) of the score reference vector U to the component E = corresponding to the start point node of the document network. It can be corrected by dividing by u [s]. The correction value u [m] / E (1 ≦ m ≦ N) can be corrected so as to be rotationally moved from the real axis to the fourth quadrant side by an angle θ1 on the complex plane. This correction value can be determined as the score reference value Vc [m].
 続くS1160において、第2スコアリング部143は、各ノードのスコア相当値Z[m](1≦m≦N)として、各ノードのスコア基準値Vc[m](1≦m≦N)に基づいた値Z[m]=|Vc[m]|・{2π-arg(Vc[m])}を算出する。第1実施形態と同様に、スコア相当値Z[m]は、式Z[m]=|Vc[m]|d1・{2π-arg(Vc[m])}d2に従って算出されてもよい。 In the following S1160, the second scoring unit 143 sets the score equivalent value Z [m] (1 ≦ m ≦ N) of each node based on the score reference value Vc [m] (1 ≦ m ≦ N) of each node. The value Z [m] = | Vc [m] | · {2π-arg (Vc [m])} is calculated. Similar to the first embodiment, the score equivalent value Z [m] may be calculated according to the formula Z [m] = | Vc [m] | d1 · {2π-arg (Vc [m])} d2.
 第2スコアリング部143は更に、文書ネットワーク内の各ノードD[m](1≦m≦N)のスコアXを、スコア相当値Z[m]に基づいて算出する(S1160)。第2スコアリング部143は、第1実施形態と同様に、ノードD[m]に対応するスコアXを、X=Z[m]-Z0に従って算出することができる。Z0は、例えば文書ネットワーク全体におけるZ[m]の最小値である。Z0は、値ゼロであってもよい。 The second scoring unit 143 further calculates the score X of each node D [m] (1 ≦ m ≦ N) in the document network based on the score equivalent value Z [m] (S1160). The second scoring unit 143 can calculate the score X corresponding to the node D [m] according to X = Z [m] −Z0, as in the first embodiment. Z0 is, for example, the minimum value of Z [m] in the entire document network. Z0 may have a value of zero.
 本実施形態では、このように各ノードのスコアXが算出される。算出されたスコアXの扱いは、第1実施形態と同様である。本実施形態によれば、第1実施形態のように固有値及び固有ベクトルを算出することなく、エルミート隣接行列Hを用いて各ノードのスコアXを算出することができる。 In this embodiment, the score X of each node is calculated in this way. The treatment of the calculated score X is the same as that of the first embodiment. According to the present embodiment, the score X of each node can be calculated using the Hermitian adjacency matrix H without calculating the eigenvalues and the eigenvectors as in the first embodiment.
 [第5実施形態]
 続いて、第5実施形態の情報処理システム1を説明する。第5実施形態の情報処理システム1は、第2スコアリング部143が、図8に示す副処理に代えて図31に示す副処理を実行する点で、第1実施形態とは異なる。一方、第5実施形態の情報処理システム1は、その他の点で基本的に第1実施形態と同じである。従って、以下では、第5実施形態の情報処理システム1の第1実施形態とは異なる構成を選択的に説明し、第1実施形態とは同一構成の説明を省略する。
[Fifth Embodiment]
Subsequently, the information processing system 1 of the fifth embodiment will be described. The information processing system 1 of the fifth embodiment is different from the first embodiment in that the second scoring unit 143 executes the sub-processing shown in FIG. 31 instead of the sub-processing shown in FIG. On the other hand, the information processing system 1 of the fifth embodiment is basically the same as that of the first embodiment in other respects. Therefore, in the following, a configuration different from that of the first embodiment of the information processing system 1 of the fifth embodiment will be selectively described, and the description of the same configuration as that of the first embodiment will be omitted.
 図31に示す副処理は、S240で実行される。更に、S280,S330,S370,S430では、スコア基準値及びスコアの算出に際し、図31に示す副処理と同様の処理が実行される。 The sub-processing shown in FIG. 31 is executed in S240. Further, in S280, S330, S370, and S430, the same processing as the sub-processing shown in FIG. 31 is executed when calculating the score reference value and the score.
 図31に示す副処理を開始すると、第2スコアリング部143は、S1010と同様に、処理対象の文書ネットワークに対応するエルミート隣接行列Hを生成する(S1210)。 When the sub-processing shown in FIG. 31 is started, the second scoring unit 143 generates the Hermitian adjacency matrix H corresponding to the document network to be processed (S1210), similarly to S1010.
 その後、第2スコアリング部143は、上記生成したエルミート隣接行列Hを変形した特殊エルミート隣接行列H4を生成する(S1220)。特殊エルミート隣接行列H4を生成するために、第2スコアリング部143は、エルミート隣接行列Hにおいて、値+iの成分を全て値0に置き換えることができる。更に、値-iの成分を、全て値C1(C2-i)に置き換えることができる。この置換により、図32上段に示す例示的なエルミート隣接行列Hは、図32下段に示す行列H(1)に置換される。 After that, the second scoring unit 143 generates a special Hermitian adjacency matrix H4 which is a modification of the generated Hermitian adjacency matrix H (S1220). In order to generate the special Hermitian adjacency matrix H4, the second scoring unit 143 can replace all the components of the value + i with the value 0 in the Hermitian adjacency matrix H. Further, all the components of the value −i can be replaced with the value C1 (C2-i). By this substitution, the exemplary Hermitian adjacency matrix H shown in the upper part of FIG. 32 is replaced with the matrix H (1) shown in the lower part of FIG. 32.
 第2スコアリング部143は更に、上記行列H(1)において、アウトリンクの数が2以上のノードに対応する列、換言すれば、エルミート隣接行列Hにおいて値-iを有する成分の数が2以上である列における値C1(C2-i)の成分を、値C1(C2-i)/Rに置換する。値Rは、アウトリンクの数であり、エルミート隣接行列Hの対応する列において値-iを有する成分の個数に対応する。この置換により、図32下段に示される行列H(1)は、図33上段に示される行列H(2)に置換される。 The second scoring unit 143 further includes, in the above matrix H (1) , a column corresponding to a node having two or more outlinks, in other words, the number of components having a value −i in the Hermitian adjacency matrix H is 2. The component of the value C1 (C2-i) in the above column is replaced with the value C1 (C2-i) / R. The value R is the number of outlinks and corresponds to the number of components having the value −i in the corresponding column of the Hermitian adjacency matrix H. By this substitution, the matrix H (1) shown in the lower part of FIG. 32 is replaced with the matrix H (2) shown in the upper part of FIG. 33.
 第2スコアリング部143は更に、図33下段に示すように、行列H(2)における対角成分を全て値-1に置換して、特殊エルミート隣接行列H4を生成する。続くS1230において、第2スコアリング部143は、S1130での処理と同様に、列ベクトルBを生成する。 The second scoring unit 143 further replaces all the diagonal components in the matrix H (2) with a value -1 as shown in the lower part of FIG. 33 to generate a special Hermitian adjacency matrix H4. In the subsequent S1230, the second scoring unit 143 generates the column vector B as in the process in S1130.
 その後、第2スコアリング部143は、S1140での処理と同様に、特殊エルミート隣接行列H4及び列ベクトルBを含む連立方程式を解くことにより、連立方程式の解に対応するスコア基準ベクトルUの各成分u[m](1≦m≦N)を求める(S1240)。 After that, the second scoring unit 143 solves the simultaneous equations including the special Hermitian adjacency matrix H4 and the column vector B in the same manner as the processing in S1140, so that each component of the score reference vector U corresponding to the solution of the simultaneous equations u [m] (1 ≦ m ≦ N) is obtained (S1240).
 S1240での処理は、特殊エルミート隣接行列H3に代えて、特殊エルミート隣接行列H4が用いられる点を除けば、S1140での処理と同じである。連立方程式の解において、インリンクを持たない先端ノードに対応する成分は、実数1を示す。 The processing in S1240 is the same as the processing in S1140 except that the special Hermitian adjacency matrix H4 is used instead of the special Hermitian adjacency matrix H3. In the solution of simultaneous equations, the component corresponding to the tip node having no inlink shows the real number 1.
 続くS1250において、第2スコアリング部143は、S1240で算出したスコア基準ベクトルUの各成分u[m]に基づいて、各ノードのスコア基準値Vc[m]を決定する。本実施形態によれば、第2スコアリング部143は、各ノードのスコア基準値Vc[m]を、スコア基準ベクトルUの、対応する成分u[m]と同じ値に決定することができる。 In the following S1250, the second scoring unit 143 determines the score reference value Vc [m] of each node based on each component u [m] of the score reference vector U calculated in S1240. According to the present embodiment, the second scoring unit 143 can determine the score reference value Vc [m] of each node to be the same value as the corresponding component u [m] of the score reference vector U.
 続くS1260において、第2スコアリング部143は、各ノードのスコア相当値Z[m](1≦m≦N)として、各ノードのスコア基準値Vc[m](1≦m≦N)に基づいた次式に従う値Z[m]を算出する。値dは、ゼロより大きい任意の実数である。値dは、値1であってもよい。 In the following S1260, the second scoring unit 143 sets the score equivalent value Z [m] (1 ≦ m ≦ N) of each node based on the score reference value Vc [m] (1 ≦ m ≦ N) of each node. The value Z [m] according to the following equation is calculated. The value d is any real number greater than zero. The value d may be the value 1.
Figure JPOXMLDOC01-appb-M000005
 第2スコアリング部143は更に、文書ネットワーク内の各ノードD[m](1≦m≦N)のスコアXを、スコア相当値Z[m]に基づいて算出する(S1260)。第2スコアリング部143は、第1実施形態と同様に、ノードD[m]に対応するスコアXを、X=Z[m]-Z0に従って算出することができる。Z0は、例えば、文書ネットワーク全体におけるZ[m]の最小値である。Z0は、値ゼロであってもよい。
Figure JPOXMLDOC01-appb-M000005
The second scoring unit 143 further calculates the score X of each node D [m] (1 ≦ m ≦ N) in the document network based on the score equivalent value Z [m] (S1260). The second scoring unit 143 can calculate the score X corresponding to the node D [m] according to X = Z [m] −Z0, as in the first embodiment. Z0 is, for example, the minimum value of Z [m] in the entire document network. Z0 may have a value of zero.
 本実施形態では、このように各ノードのスコアXが算出される。算出されたスコアXの扱いは、第1実施形態と同様である。本実施形態によれば、第1実施形態のように固有値及び固有ベクトルを求めることなく、エルミート隣接行列Hを用いて、各ノードのスコアXを算出することができる。 In this embodiment, the score X of each node is calculated in this way. The treatment of the calculated score X is the same as that of the first embodiment. According to the present embodiment, the score X of each node can be calculated using the Hermitian adjacency matrix H without obtaining the eigenvalues and the eigenvectors as in the first embodiment.
 以上に、本開示の例示的実施形態を説明したが、本開示は、上述の実施形態に限定されない。本開示は、ウェブ文書に限定されないリンク/引用関係を持つ文書のスコアリングに適用されてもよい。第4実施形態及び第5実施形態に係る技術的思想は、第2実施形態又は第3実施形態に適用されてもよい。 Although the exemplary embodiments of the present disclosure have been described above, the present disclosure is not limited to the above-described embodiments. The disclosure may apply to scoring documents with link / citation relationships that are not limited to web documents. The technical ideas according to the fourth embodiment and the fifth embodiment may be applied to the second embodiment or the third embodiment.
 先端ノードを有さない文書ネットワークに、インリンクを持たないダミーノードDPであって、文書ネットワーク内の全てのノードへのアウトリンクを持つダミーノードDPを付加する技術は、第1実施形態から第5実施形態に適用され得る。更に言えば、このインリンクを持たないダミーノードDPは、先端ノードを有する文書ネットワークに付加されてもよい。 The technique of adding a dummy node DP having no inlink and having an outlink to all the nodes in the document network to the document network having no advanced node is the first to the first embodiment. 5 Can be applied to embodiments. Furthermore, the dummy node DP having no inlink may be added to the document network having the advanced node.
 同様に、後端ノードを有さない文書ネットワークに、文書ネットワーク内のすべてのノードからのインリンクを有するダミーノードDPであって、アウトリンクを持たないダミーノードDPを付加する技術は、第1実施形態から第5実施形態に適用され得る。更に言えば、このアウトリンクを持たないダミーノードDPは、後端ノードを有する文書ネットワークに付加されてもよい。 Similarly, the technique of adding a dummy node DP having inlinks from all the nodes in the document network to the document network having no rear end node and having no outlink is the first technique. It can be applied from the embodiment to the fifth embodiment. Furthermore, the dummy node DP having no outlink may be added to the document network having the trailing node.
 上記実施形態における1つの構成要素が有する機能は、複数の構成要素に分散して設けられてもよい。複数の構成要素が有する機能は、1つの構成要素に統合されてもよい。上記実施形態の構成の一部は、省略されてもよい。特許請求の範囲に記載の文言から特定される技術思想に含まれるあらゆる態様が本開示の実施形態である。
 
The functions of one component in the above embodiment may be distributed to a plurality of components. The functions of the plurality of components may be integrated into one component. Some of the configurations of the above embodiments may be omitted. The embodiments of the present disclosure are all aspects contained in the technical idea identified from the wording described in the claims.

Claims (16)

  1.  文書間の接続関係を表すデータに基づき、少なくとも弱連結で連結された複数の文書で構成される文書ネットワークを判別するように構成される文書ネットワーク判別部と、
     判別された前記文書ネットワークに含まれる、二つ以上の文書からのインリンクを有する特定文書を判別するように構成される文書判別部と、
     判別された前記特定文書を基準に、前記文書ネットワークに含まれる複数のサブネットワークを判別するように構成されるサブネットワーク判別部と、
     判別された前記複数のサブネットワークのそれぞれに対する個別処理を実行することにより、前記文書ネットワークを構成する前記複数の文書のそれぞれのスコアを算出するように構成されるスコア算出部であって、前記個別処理では、対応するサブネットワークに含まれる各文書のスコアを算出するスコア算出部と、
     を備え、
     前記文書ネットワークには、前記複数のサブネットワークのうちの二つ以上のサブネットワークに属する文書である重複文書が一つ以上含まれ、
     前記スコア算出部は、一つ以上の前記重複文書のそれぞれに関して、対応する重複文書の前記二つ以上のサブネットワークでのスコアを統合することにより、前記対応する重複文書に対する一つのスコアを算出する情報処理システム。
    A document network discriminator configured to discriminate at least a document network composed of a plurality of documents linked by a weak connection based on data representing a connection relationship between documents.
    A document discriminating unit configured to discriminate a specific document having an inlink from two or more documents included in the discriminated document network, and a document discriminating unit.
    A sub-network discriminating unit configured to discriminate a plurality of sub-networks included in the document network based on the discriminated specific document.
    A score calculation unit configured to calculate the scores of the plurality of documents constituting the document network by executing individual processing for each of the determined plurality of subnetworks. In the process, a score calculation unit that calculates the score of each document included in the corresponding subnetwork, and
    With
    The document network includes one or more duplicate documents that belong to two or more of the plurality of subnetworks.
    The score calculation unit calculates one score for the corresponding duplicate document by integrating the scores of the corresponding duplicate document in the two or more subnetworks for each of the one or more duplicate documents. Information processing system.
  2.  請求項1記載の情報処理システムであって、
     前記サブネットワーク判別部は、前記特定文書を境界に有する複数のサブネットワークを判別し、
     前記複数のサブネットワークは、前記特定文書が有する二つ以上のインリンクに対応する二つ以上の上流ネットワークであって、上流ネットワークのそれぞれでは、前記特定文書が対応する一つのインリンクを有する、二つ以上の上流サブネットワークと、前記特定文書が有するアウトリンクを通じて前記特定文書と接続される下流サブネットワークとを少なくとも有し、
     前記特定文書は、前記二つ以上の上流サブネットワークに属する前記重複文書であり、
     前記スコア算出部は、前記特定文書の前記上流サブネットワークでのスコアを統合することにより、前記特定文書に対して一つの統合スコアを算出し、前記下流サブネットワークに属する各文書のスコアを、前記特定文書の前記統合スコアを基準に算出する情報処理システム。
    The information processing system according to claim 1.
    The sub-network discriminating unit discriminates a plurality of sub-networks having the specific document as a boundary, and discriminates between them.
    The plurality of subnetworks are two or more upstream networks corresponding to two or more inlinks possessed by the specific document, and each of the upstream networks has one inlink corresponding to the specific document. It has at least two or more upstream subnetworks and at least a downstream subnet network that is connected to the particular document through the outlinks that the particular document has.
    The specific document is the duplicate document belonging to the two or more upstream subnetworks.
    The score calculation unit calculates one integrated score for the specific document by integrating the scores of the specific document in the upstream subnetwork, and obtains the score of each document belonging to the downstream subnetwork. An information processing system that calculates based on the integrated score of a specific document.
  3.  請求項1記載の情報処理システムであって、
     前記サブネットワーク判別部は、前記複数のサブネットワークとして、前記特定文書が有するインリンク毎のサブネットワークであって、対応するインリンクより上流に位置する文書群と、前記特定文書と、前記特定文書が有するアウトリンクより下流に位置する文書群と、を備えるインリンク毎のサブネットワークを判別する情報処理システム。
    The information processing system according to claim 1.
    The sub-network discriminating unit is a sub-network for each in-link possessed by the specific document as the plurality of sub-networks, and is a document group located upstream of the corresponding in-link, the specific document, and the specific document. An information processing system that discriminates a group of documents located downstream of the outlinks of the document and a subnetwork for each inlink.
  4.  請求項1記載の情報処理システムであって、
     前記サブネットワーク判別部は、前記複数のサブネットワークとして、前記特定文書が有するインリンク及びアウトリンクの組合せ毎のサブネットワークであって、前記組合せに対応するインリンクより上流に位置する文書群と、前記特定文書と、前記組合せに対応するアウトリンクより下流に位置する文書群と、を備える組合せ毎のサブネットワークを判別する情報処理システム。
    The information processing system according to claim 1.
    The sub-network discriminating unit is a sub-network for each combination of in-link and out-link possessed by the specific document as the plurality of sub-networks, and includes a group of documents located upstream of the in-link corresponding to the combination. An information processing system that discriminates a sub-network for each combination including the specific document and a document group located downstream from the outlink corresponding to the combination.
  5.  請求項3又は請求項4記載の情報処理システムであって、
     前記統合は、前記対応する重複文書の、前記二つ以上のサブネットワークでのスコアの合計を算出することにより、又は、前記二つ以上のサブネットワークでのスコアの代表値を算出することにより実現される情報処理システム。
    The information processing system according to claim 3 or 4.
    The integration is achieved by calculating the sum of the scores of the corresponding duplicate documents in the two or more subnetworks, or by calculating the representative value of the scores in the two or more subnetworks. Information processing system to be done.
  6.  請求項1~請求項5のいずれか一項記載の情報処理システムであって、
     前記個別処理は、前記対応するサブネットワークにおける文書間の接続関係に基づくエルミート隣接行列を用いて、前記対応するサブネットワークに含まれる各文書のスコアを算出する処理を含む情報処理システム。
    The information processing system according to any one of claims 1 to 5.
    The individual process is an information processing system including a process of calculating a score of each document included in the corresponding subnetwork by using an Hermitian adjacency matrix based on the connection relationship between documents in the corresponding subnetwork.
  7.  前記個別処理は、前記対応するサブネットワークに含まれる、アウトリンクを有さない後端文書に対して、ダミー文書を付加することにより、前記後端文書に仮想的にアウトリンクを設けるように、前記対応するサブネットワークを変更し、変更後のサブネットワークにおける文書間の接続関係に基づくエルミート隣接行列を定義する処理を含む請求項6記載の情報処理システム。 In the individual processing, a dummy document is added to the trailing document that does not have an outlink included in the corresponding subnetwork so that the trailing document is virtually provided with an outlink. The information processing system according to claim 6, further comprising a process of changing the corresponding subnetwork and defining an Elmeat adjacency matrix based on the connection relationship between documents in the changed subnetwork.
  8.  前記エルミート隣接行列は、前記対応するサブネットワークを構成する文書D[m](1≦m(整数)≦N)間の接続関係に基づくN行N列のエルミート行列であって、第p行第q列の成分h(p,q)が、文書D[p]から文書D[q]へのリンクが存在し且つ文書D[q]から文書D[p]へのリンクが存在するとき、値1を示し、文書D[p]から文書D[q]へのリンク及び文書D[q]から文書D[p]へのリンクのいずれも存在しないとき、値0を示し、文書D[p]から文書D[q]へのリンクが存在するが文書D[q]から文書D[p]へのリンクが存在しないとき、値+i(iは虚数単位)を示し、文書D[p]から文書D[q]へのリンクが存在しないが文書D[q]から文書D[p]へのリンクが存在するとき、値-iを示す、対角成分がゼロのエルミート行列に対応する請求項6又は請求項7記載の情報処理システム。 The Elmeet adjacency matrix is an N-by-N-column Elmeat matrix based on the connection relationship between the documents D [m] (1 ≦ m (integer) ≦ N) constituting the corresponding subnetwork, and is the p-th row. The component h (p, q) in column q is a value when there is a link from document D [p] to document D [q] and there is a link from document D [q] to document D [p]. 1 is indicated, and when neither the link from the document D [p] to the document D [q] nor the link from the document D [q] to the document D [p] exists, the value 0 is indicated and the document D [p] is indicated. When there is a link from document D [q] but no link from document D [q] to document D [p], a value + i (i is an imaginary unit) is shown, and document D [p] to document Claim 6 corresponding to an Elmeat matrix with zero diagonal, indicating a value −i, where there is no link to D [q] but there is a link from document D [q] to document D [p]. Alternatively, the information processing system according to claim 7.
  9.  前記個別処理は、前記エルミート隣接行列を変形して特殊エルミート隣接行列を定義し、前記特殊エルミート隣接行列の固有ベクトルを用いて前記対応するサブネットワークに含まれる各文書のスコアを算出する処理を含み、前記エルミート隣接行列は、前記固有ベクトルの各成分を仮に複素平面上に配置したときに、全ての成分が前記複素平面においてπ/2ラジアンの角度範囲内に収まるように変形される請求項8記載の情報処理システム。 The individual processing includes a process of transforming the Hermitian adjacency matrix to define a special Hermitian adjacency matrix and calculating the score of each document included in the corresponding subnetwork using the eigenvectors of the special Hermitian adjacency matrix. The Hermitian adjacency matrix according to claim 8, wherein when each component of the eigenvector is tentatively arranged on the complex plane, all the components are deformed so as to be within the angle range of π / 2 radian in the complex plane. Information processing system.
  10.  プロセッサと、
     前記プロセッサに特定の処理を実行させるための命令を含むメモリと、
     を備え、前記特定の処理は、
     文書間の接続関係を表すデータに基づき、少なくとも弱連結で連結された複数の文書で構成される文書ネットワークを判別することと、
     判別された前記文書ネットワークに含まれる、二つ以上の文書からのインリンクを有する特定文書を判別することと、
     判別された前記特定文書を基準に、前記文書ネットワークに含まれる複数のサブネットワークを判別することと、
     判別された前記複数のサブネットワークのそれぞれに対する個別処理として、対応するサブネットワークに含まれる各文書のスコアを算出する処理を実行することにより、前記文書ネットワークを構成する前記複数の文書のそれぞれのスコアを算出することと、
     を含み、
     前記文書ネットワークには、前記複数のサブネットワークのうちの二つ以上のサブネットワークに属する文書である重複文書が一つ以上含まれ、
     前記文書ネットワークを構成する前記複数の文書のそれぞれのスコアを算出することは、一つ以上の前記重複文書のそれぞれに関して、対応する重複文書の前記二つ以上のサブネットワークでのスコアを統合することにより、前記対応する重複文書に対する一つのスコアを算出することを含む情報処理システム。
    With the processor
    A memory containing instructions for causing the processor to perform a specific process, and
    The specific process is
    Determining a document network consisting of at least multiple documents linked by weakly linked documents based on data representing the connection relationship between documents.
    Distinguishing a specific document having an inlink from two or more documents included in the discriminated document network, and
    To discriminate a plurality of sub-networks included in the document network based on the discriminated specific document.
    As an individual process for each of the determined plurality of sub-networks, a process of calculating the score of each document included in the corresponding sub-network is executed, so that each score of the plurality of documents constituting the document network is executed. To calculate and
    Including
    The document network includes one or more duplicate documents that belong to two or more of the plurality of subnetworks.
    To calculate the score of each of the plurality of documents constituting the document network is to integrate the scores of the corresponding duplicate documents in the two or more subnetworks for each of the one or more duplicate documents. An information processing system comprising calculating one score for the corresponding duplicate document.
  11.  コンピュータにより実行される情報処理方法であって、
     文書間の接続関係を表すデータに基づき、少なくとも弱連結で連結された複数の文書で構成される文書ネットワークを判別することと、
     判別された前記文書ネットワークに含まれる、二つ以上の文書からのインリンクを有する特定文書を判別することと、
     判別された前記特定文書を基準に、前記文書ネットワークに含まれる複数のサブネットワークを判別することと、
     判別された前記複数のサブネットワークのそれぞれに対する個別処理として、対応するサブネットワークに含まれる各文書のスコアを算出する処理を実行することにより、前記文書ネットワークを構成する前記複数の文書のそれぞれのスコアを算出することと、
     を含み、
     前記文書ネットワークには、前記複数のサブネットワークのうちの二つ以上のサブネットワークに属する文書である重複文書が一つ以上含まれ、
     前記文書ネットワークを構成する前記複数の文書のそれぞれのスコアを算出することは、一つ以上の前記重複文書のそれぞれに関して、対応する重複文書の前記二つ以上のサブネットワークでのスコアを統合することにより、前記対応する重複文書に対する一つのスコアを算出することを含む情報処理方法。
    An information processing method executed by a computer
    Determining a document network consisting of at least multiple documents linked by weakly linked documents based on data representing the connection relationship between documents.
    Distinguishing a specific document having an inlink from two or more documents included in the discriminated document network, and
    To discriminate a plurality of sub-networks included in the document network based on the discriminated specific document.
    As an individual process for each of the determined plurality of sub-networks, a process of calculating the score of each document included in the corresponding sub-network is executed, so that each score of the plurality of documents constituting the document network is executed. To calculate and
    Including
    The document network includes one or more duplicate documents that belong to two or more of the plurality of subnetworks.
    To calculate the score of each of the plurality of documents constituting the document network is to integrate the scores of the corresponding duplicate documents in the two or more subnetworks for each of the one or more duplicate documents. An information processing method comprising calculating one score for the corresponding duplicate document.
  12.  請求項11記載の情報処理方法であって、
     前記複数のサブネットワークを判別することは、前記特定文書を境界に有する複数のサブネットワークを判別することを含み、
     前記複数のサブネットワークは、前記特定文書が有する二つ以上のインリンクに対応する二つ以上の上流ネットワークであって、上流ネットワークのそれぞれでは、前記特定文書が対応する一つのインリンクを有する、二つ以上の上流サブネットワークと、前記特定文書が有するアウトリンクを通じて前記特定文書と接続される下流サブネットワークとを少なくとも有し、
     前記特定文書は、前記二つ以上の上流サブネットワークに属する前記重複文書であり、
     前記文書ネットワークを構成する前記複数の文書のそれぞれのスコアを算出することは、前記特定文書の前記上流サブネットワークでのスコアを統合することにより、前記特定文書に対して一つの統合スコアを算出し、前記下流サブネットワークに属する各文書のスコアを、前記特定文書の前記統合スコアを基準に算出することを含む情報処理方法。
    The information processing method according to claim 11.
    Distinguishing the plurality of subnetworks includes discriminating a plurality of subnetworks having the specific document as a boundary.
    The plurality of subnetworks are two or more upstream networks corresponding to two or more inlinks possessed by the specific document, and each of the upstream networks has one inlink corresponding to the specific document. It has at least two or more upstream subnetworks and at least a downstream subnet network that is connected to the particular document through the outlinks that the particular document has.
    The specific document is the duplicate document belonging to the two or more upstream subnetworks.
    To calculate the score of each of the plurality of documents constituting the document network, one integrated score is calculated for the specific document by integrating the scores of the specific document in the upstream subnetwork. , An information processing method including calculating the score of each document belonging to the downstream subnetwork based on the integrated score of the specific document.
  13.  請求項11記載の情報処理方法であって、
     前記複数のサブネットワークを判別することは、前記特定文書が有するインリンク毎のサブネットワークであって、対応するインリンクより上流に位置する文書群と、前記特定文書と、前記特定文書が有するアウトリンクより下流に位置する文書群と、を備えるインリンク毎のサブネットワークを判別することを含む情報処理方法。
    The information processing method according to claim 11.
    Distinguishing the plurality of sub-networks is a sub-network for each in-link possessed by the specific document, and is a document group located upstream of the corresponding in-link, the specific document, and an out of the specific document. An information processing method including determining a group of documents located downstream of a link and a subnetwork for each in-link including the document group.
  14.  請求項11記載の情報処理方法であって、
     前記複数のサブネットワークを判別することは、前記特定文書が有するインリンク及びアウトリンクの組合せ毎のサブネットワークであって、前記組合せに対応するインリンクより上流に位置する文書群と、前記特定文書と、前記組合せに対応するアウトリンクより下流に位置する文書群と、を備える組合せ毎のサブネットワークを判別することを含む情報処理方法。
    The information processing method according to claim 11.
    Distinguishing the plurality of sub-networks is a sub-network for each combination of in-link and out-link of the specific document, and the document group located upstream of the in-link corresponding to the combination and the specific document. An information processing method including determining a sub-network for each combination including a document group located downstream from the outlink corresponding to the combination.
  15.  請求項13又は請求項14記載の情報処理方法であって、
     前記統合は、前記対応する重複文書の、前記二つ以上のサブネットワークでのスコアの合計を算出することにより、又は、前記二つ以上のサブネットワークでのスコアの代表値を算出することにより実現される情報処理方法。
    The information processing method according to claim 13 or 14.
    The integration is achieved by calculating the sum of the scores of the corresponding duplicate documents in the two or more subnetworks, or by calculating the representative value of the scores in the two or more subnetworks. Information processing method to be performed.
  16.  請求項1~請求項9のいずれか一項記載の情報処理システムが備える前記文書ネットワーク判別部と、前記文書判別部と、前記サブネットワーク判別部と、前記スコア算出部として、コンピュータを機能させるためのコンピュータプログラム。
     
    To make a computer function as the document network discriminating unit, the document discriminating unit, the sub-network discriminating unit, and the score calculating unit included in the information processing system according to any one of claims 1 to 9. Computer program.
PCT/JP2020/045268 2019-12-20 2020-12-04 Information processing system and information processing method WO2021124933A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2021510244A JP6919961B1 (en) 2019-12-20 2020-12-04 Information processing system and information processing method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2019230822 2019-12-20
JP2019-230822 2019-12-20

Publications (1)

Publication Number Publication Date
WO2021124933A1 true WO2021124933A1 (en) 2021-06-24

Family

ID=76477331

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/045268 WO2021124933A1 (en) 2019-12-20 2020-12-04 Information processing system and information processing method

Country Status (2)

Country Link
JP (1) JP6919961B1 (en)
WO (1) WO2021124933A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6285999B1 (en) * 1997-01-10 2001-09-04 The Board Of Trustees Of The Leland Stanford Junior University Method for node ranking in a linked database
JP2007511815A (en) * 2003-10-20 2007-05-10 テレノール アーアスアー Backward and forward denormalized link weight analysis method, system, and computer program product
WO2019106878A1 (en) * 2017-11-28 2019-06-06 桂太 杉原 Information processing system, information processing method, and computer program

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6285999B1 (en) * 1997-01-10 2001-09-04 The Board Of Trustees Of The Leland Stanford Junior University Method for node ranking in a linked database
JP2007511815A (en) * 2003-10-20 2007-05-10 テレノール アーアスアー Backward and forward denormalized link weight analysis method, system, and computer program product
WO2019106878A1 (en) * 2017-11-28 2019-06-06 桂太 杉原 Information processing system, information processing method, and computer program

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SUGIHARA, KEITA: "Using Complex Numbers in Website Ranking Calculations: A Non-ad Hoc Alternative to Google's PageRank", JOURNAL OF SOFTWARE, vol. 14, no. 2, February 2019 (2019-02-01), pages 58 - 64, XP055835924 *

Also Published As

Publication number Publication date
JPWO2021124933A1 (en) 2021-12-23
JP6919961B1 (en) 2021-08-18

Similar Documents

Publication Publication Date Title
Ma et al. A tensorized transformer for language modeling
Zhang et al. Optimal margin distribution machine
US9262484B2 (en) Method for identifying network similarity by matching neighborhood topology
Rahangdale et al. Deep neural network regularization for feature selection in learning-to-rank
Cheng et al. A subregion division based multi-objective evolutionary algorithm for SVM training set selection
CN105404619B (en) A kind of Semantic Web Services cluster mask method based on similarity
CN112989842A (en) Construction method of universal embedded framework of multi-semantic heterogeneous graph
Ruiz et al. Benchmarking multivariate time series classification algorithms
Xu et al. Correlated features synthesis and alignment for zero-shot cross-modal retrieval
Ismi et al. Deep learning for protein secondary structure prediction: Pre and post-AlphaFold
Wang et al. Localized graph collaborative filtering
Xu et al. Classifier ensemble based on multiview optimization for high-dimensional imbalanced data classification
JP6919961B1 (en) Information processing system and information processing method
Song et al. Domain adaptive network embedding
Peng et al. EGRC-Net: Embedding-Induced Graph Refinement Clustering Network
Zhang et al. Two trades is not baffled: Condense graph via crafting rational gradient matching
See et al. Cryptensor: A resource-shared co-processor to accelerate convolutional neural network and polynomial convolution
WO2019106878A1 (en) Information processing system, information processing method, and computer program
CN115564017A (en) Model data processing method, electronic device and computer storage medium
Kaur et al. Comparative analysis of rank aggregation techniques for metasearch using genetic algorithm
JP6502592B1 (en) INFORMATION PROCESSING SYSTEM, INFORMATION PROCESSING METHOD, AND COMPUTER PROGRAM
Hu et al. Quantum error-correcting codes from the quantum construction X
Xu et al. GLAC-GCN: Global and Local Topology-Aware Contrastive Graph Clustering Network
Ma et al. Essential multi-view graph learning for clustering
JP2012038023A (en) Query generation device and operation method therefor

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021510244

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20902685

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20902685

Country of ref document: EP

Kind code of ref document: A1