Module:PropertyPath
The PropertyPath module is an implemetation and extension of SPARQL (Q54871) 's PropertyPath in Lua for Wikidata and Wikibase client.
There is however a few difference with SPARQL regular property path and this one are:
- this implementation adds a feature to deal with qualifiers in paths
- unlike full sparql paths, we need to provide and item for the starting point of the path
- we can use Wikidata's identifiers for property Ids, including property labels in the main language of the client wiki
- the operator "^" cannot work with lua because of the limitations of the lua mw.wikibase API
Pahs
[edit]Introduction example
[edit]A path to find the grandchildren.
"child/child"
will be our first path. Used with a template that exploits this modules we can expand this path starting from an item, say the item about the greek God Zeus :
- Anteros (Q572133)
- Deimos (Q106240)
- Eros (Q121973)
- Phobos (Q105019)
- Harmonia (Q978079)
- Himeros (Q1425549)
- Phlegyas (Q831583)
- Antiope (Q580862)
- Adrestia (Q4217252)
- Lycaon (Q3268334)
- ...(?)
- The templates shows as a first result (when writing this line) Anteros (Q572133) . It's here because the Zeus item has a statement and on Ares (Q40901) there is a statement . Actually Zeus has a lot of grandchildren and you should not see all of them with this template without adding a parameter to add to the limit.
Similarly, if we want the mothers of Zeus children we can use a path child/mother
:
- Hera (Q38012)
- Dione (Q199923)
- Leto (Q103107)
- Hera (Q38012)
- Leto (Q103107)
- Maia (Q205233)
- Demeter (Q40730)
- Styx (Q542758)
- Semele (Q776139)
- Demeter (Q40730)
- ...(?)
The next section will briefly describe the set of operators we can put in a path. "/", which we exemplified here, is one of them, but there is several others that can be combined with each other to build more complex paths.
Operators
[edit]In the following, an item path will be a sequence of items and (one or two) properties. An item path may be ended with a wikidata value. For example
Q1001) ; instance of (P31) ; human (Q5) ; subclass of (P279) ; person (Q215627)
or Zeus (Q34201) ; child (P40) ; Ares (Q40901) ; child (P40) ; Anteros (Q572133)
will be paths. It's possible to create item paths from Wikidata statements starting from an item and following the statements' main property and main values, but also as we will see later by following the qualifiers snaks of some property. In this case our item paths could look like Fyodor Dostoyevsky (Q991) ; place of birth (P19) ; Saint Petersburg (Q656) ; official name (P1448) ; start time (P580) ;
We will discover the operation of "matching" an item path with a path. First we need to define and operation beetween a property and an item path : "eating". We will say that a property can "eat" P a path if the first property of a path is P. The result will be a path shortened up to this property - takin a property and a path :
- Path =
Fyodor Dostoyevsky (Q991) ; place of birth (P19) ; Saint Petersburg (Q656) ; official name (P1448) ; start time (P580) ;
- P=place of birth (P19)
P can "eat" the path which will result to Saint Petersburg (Q656) ; official name (P1448) ; start time (P580)
. Another property could not eat this path.
The operation of "eating" is used to construct the operation of "matching" an item path to a property path - it finds all the item paths we can construct on Wikidata following statements from a starting item that matches a part.
A path matches an item path if the components of a path can entirely eat the item path - except for one item or value.
This module performs the operation of finding the item paths in Wikidata that can be totally eaten by a path pattern. Additionally, all parts of a path must eat something in the sense defined in the following section.
For example : the path Mexico ; shares border with ; USA ; shares border with ; Canada
will be a match for the pattern shares border with/shares border with
but won't be a match for shares border with
nor for shares border with/shares border with/shares border with
. The first one because there will be parts of the path that are not eaten, and the second because the third shares border with
has nothing to eat.
We will extend this "eating" operation to more complex path in the following.
A path is constructed using the following operators and properties :
- simple property
- A simple property is either a property Pid or a property label like
P31
orinstance of
. A property "eats" a path as explained above. If using the label of the property like in "instance of", only the label of a property can be used and not an alias. The language of the label also matters : it must be written in the language of the main wiki (english for wikidata, french for french wikipedia, ...) /
- The sequence operator. See the initial example. This operator is one basic operation to build a complex path from simpler path. if P1 is a path and P2 is a path then
P1/P2
is also a path.P1/P2
will eat an item path IP if P1 eats IP at first, then if P2 eats the result of P1 eating IP. |
- The alternative operator. This operator is one other basic operation to build a complex path from simpler path. if P1 is a path and P2 is a path then
P1|P2
is also a path.P1|P2
will eat an item path IP if either P1 eats IP or P2 eats IP. ?
- an operator that make the preceding path optional. It always eats something. For example the pattern
share borders with?
starting from Mexico will match both theMexico
path and theMexico ; shares border with ; USA
one. *
- an operator that, like preceding one, always matches. It allows to match a path of indefinite size if it can be eaten by repeating the "eat" operation as many times as we like on the path that is preceded by the operator. For example the code
(father|mother)*
will find all the ancestors of the starting point in a genealogical tree, including itself. It will find the parents of someone (the father and the mother) then the parents of the parents and so on. +
P+
is equivalent toP/P*
. Like the*
operator except it must be at least one math ofP
.>
- This operator is related to qualifier snaks matching. For example the path
union of>of
will eat the qualifier "of" snaks of the statements "union of" of an item. The values from which the next operator of the path will start is the value of the snak. - It also can be used either at the very beginning of the path, if the starting point of the path is a statement and not an item.
!
- This operator is a way to define property that will not be eaten. It is followed by either a single property or a list of properties separated by a
|
put into parenthesis. for example!(instance of|subclass of)
will eat any (one-propertied) path likeDonald Duck ; Father ; Picsou
, but neitherGhandi ; instance of ; human
norhuman ; subclass of ; animal
. It can also be used with the>
operator to exclude a set of qualifiers for matching. {number} and {min, max}
- This operator is used to specify that a path can match several times, but with a bound.
P{4}
is equivalent toP/P/P/P
, that is P must eat exactly four times something, andP{4,6}
must match between 4 and six times. It's equivalent toP/P/P/P/P?/P?
Results
[edit]This module manipulates and generate a datastructure called "ResultNode" that represent nodes on a matching path (a path that is totally eaten by a path pattern).
The path is designed such as the result is a linked list of Nodes. Those nodes represents a path (say <human> ; <P31> ; <Q5> ; <P279> ; <Q42>
) in backward order. Actually each node can be interpreted as either a snak or a statement (like the <P279> ; <Q42>) in the usual lua Wikidata data model (it's actually a snak or a statement which is adjoined a few functions and attributes. So a result node can be used as a snak or a statement in the functions of Module:Wikidata.
The previous snak or statement can be retrieved thanks trough the "parent" field of the node.
Interface
[edit]- Main functions
- iterate
path (string), start_item (string, entity object) -> iterator (on result node)
- function that takes a path and a start item and returns an iterable object, usable in lua for loops that iterates over the set of paths matching the pattern, each represented by a ResultNode (see above)
Code
local parser = require "Module:PropertyPath/parser"
local datastructure = require "Module:PropertyPath/Path"
local results = require "Module:PropertyPath/ResultStructure"
local props = require "Module:Properties"
local iter = require "Module:Iterators"
local path = {}
--------------
-- TODO :
-- * Update the "between" path to handle it better epsilon paths
-- * Test full path rendering
-- *
--------------
-- Definition of a PropertyPath class
local PropertyPath = {}
PropertyPath.__index = PropertyPath
--[[ Datastructure for the paths that will match a path pattern
A path matching the pattern "subclass of*" will be a chain of statements and snaks nodes.
If we got statements of the form (no qualifiers here, just subject with the main statement snak) :
* <human> <subclass of> <ape>
* <ape> <subclass of> <mammal>
* <mammal> <subclass of> <animal>
a matching path like "<human> -> <ape> -> <mammal> -> <animal>" will be reprensented by a linked list of "ResultNode" objects.
A result node object is a mw.wikibase "statement" standard object augmented with a few methods and a link that goes from the statement or snak to the previous node in the path.
{
<mammal> <subclass of> <animal>
"parent" = {
<ape> <subclass of> <mammal>
"parent" = {
<human> <subclass of> <ape>
"parent" = EpsilonRNode(<human>, "parent" = nil)
}
}
}
--]]
local ResultNode = results.ResultNode
local StatementRNode = results.StatementRNode
local QualifierRNode = results.QualifierRNode
local EpsilonRNode = results.EpsilonRNode
------------------------------------------------------------------------------------------------------
local function iterate_on_snaks(
start_rnode,
property_filter_criteria,
snak_map_iterator,
rnode_type)
assert(snak_map_iterator)
return iter.pair_map(
iter.flatten(
iter.select_vals(
iter.pair_filter(
snak_map_iterator,
property_filter_criteria
)
),
iter.on_vals
),
function(value) return rnode_type:create(value, start_rnode) end
)
end
-- creates an iterator that will iterate over all the statements
-- of a specific property of an item
local function iterate_on_statement(start_rnode, property_filter_criteria)
local item = mw.wikibase.getEntity(start_rnode:item_value())
return iterate_on_snaks(
start_rnode,
property_filter_criteria,
iter.on_pairs(item.claims),
StatementRNode
)
end
local function iterate_on_statement_qualifier(statement, qualifier_filter_criteria)
if statement.qualifiers then
return iterate_on_snaks(
statement,
qualifier_filter_criteria,
iter.on_pairs(statement.qualifiers),
QualifierRNode
)
else
-- is there a qualifier table when the statement has no qualifier snak ??
mw.log(" maybe a statement with no qualifier, or a bug")
return function() return nil end
end
end
local iterate_on_statement_from_property = function(start_rnode, pid)
local claims = mw.wikibase.getBestStatements(
start_rnode:item_value(),
props.normalize(pid)
) or {}
return iter.pair_map(
iter.pair_filter(iter.on_pairs(claims), function(key, val) return true end),
function(key, value) return StatementRNode:create(value, start_rnode) end
)
end
local function iterate_on_star(start_rnode, child_pnode, depth, iterated, max)
-- start_rnode : the result node from which we will iterate
-- child_pnode : the path within the star operator (for example P31/P31 if our node is (P31/P31)*
iterated = iterated or {} -- iterated is the store of already iterated starting points items to avoid infinite loops
-- max : the max number of iteration depth to go, nil for no limit
depth = depth or 1
--[[
In pseudo code using a « yield » operator, the algorithm would be
algo star(startnode)
for each value v which match child_pnode from startnode
yield v
for each value vchild in star(v)
yield vchild
end for
end
end
But we can’t use a yield operator if the « coroutine » module on lua is not activated.
So we must translate this into something more complicated.
Luckily the approach to write iterators in term of composition seems to pay off and
it seem possible to write code structurally similar to this algorithm thanks to the
« flatten » iterator and a recursive closure that creates iterator to handle the
recursivity implied by the « star » operator nature.
--]]
function creator()
return function(start_rnode)
local depth_overflow = not (not max or depth < max)
if start_rnode:has_an_item() and not iterated[start_rnode:item_value()] and not depth_overflow then
iterated[start_rnode:item_value()] = true
return iterate_on_star(start_rnode, child_pnode, depth + 1, iterated, max)
else
return function() end
end
end
end
return iter.chain(
iter.singleton(start_rnode),
iter.flatten(
child_pnode:iterate(start_rnode),
creator()
)
)
end
local iterate_on_plus = function(start_rnode, child_pnode, max_depth)
local first = true
iterated = iterated or {}
return iter.flatten(
child_pnode:iterate(start_rnode),
function(rnode)
return iterate_on_star(rnode, child_pnode, 1, iterated, max_depth)
end
)
end
--[[
Test :
p.test("Q5", "subclass of+") -- at the time writing, "Q215627" is the only direct superclass of human. It does not show up, but there is numerous superclass in the result
--]]
--[[ an iteraton to handle "/" operator sequences, for example « P31/P279* »
"creators" is a table of functions that needs to create iterators of child nodes.
In our example, the first cretors element will be a function that takes an item object and
will return an iterator over P31-statements of this item
the second one will create an iterator over the path « P279* » and so on.
The resulting iteratior will iterate on each elements of the second iterator starting from each iterator over the second one
for each elements in the first one.
--]]
local function iterate_on_iterator_creators(start_rnode, creators, i)
i = i or 1
if not(tonumber(i)) then i = 1 end
-- main iterator : the iterator that will iterate on the values on this node of the path
local main_iterator = creators[i]:iterate(start_rnode)
if i < #creators then
--trying to initialize the iterator for the next node with a value of the current one, if we can
local rnode = main_iterator()
while rnode and not(rnode:has_an_item()) do
rnode = main_iterator()
end
-- could not initialize the next iterator with a proper item ; returnun the empty iterator function
if not rnode then return function() return end end
-- we found a proper value to iterate on for the next node in the path
-- final iterator : the iterator that will iterate
-- on elems that will be returned by each iterations
-- on the iterator created by the main client
local final_iterator = iterate_on_iterator_creators(rnode, creators, i+1)
return function()
while final_iterator ~= nil do
-- pulling the element from the next node iterator in the sequence
local final_elem = final_iterator()
if final_elem then
return final_elem
else
-- we pulled the last elem for this value, getting a new value
-- for this node path and regenerate the next node iterator to pull new final values
local rnode_value = main_iterator()
-- return the element pulled from the next node iterator
-- if the property has item datatype is not a special value and has the right snaktype
-- as we can't continue path on other kind of values
if rnode_value then
if rnode_value:has_an_item() then
final_iterator = iterate_on_iterator_creators(rnode_value, creators, i+1)
end
else
--we're over, no next value for this node to continue the path
return
end
end
end
end
elseif i == #creators then
return main_iterator
end
end
--[[ JSBach : Q1339 ;
Testing with :
test("Q1339", "child/child")
wikidata query equivalent query :
select ?grandchild where {
wd:Q1339 wdt:P40/wdt:P40 ?grandchild
}
Adam : wd:Q70899
test("Q70899", "child/child/child")
wikidata query equivalent query :
select ?grandgrandchild where {
wd:Q70899 wdt:P40/wdt:P40/wdt:P40 ?grandgrandchild
}
--]]
local iterate_on_alternatives = function(start_rnode, pnodes)
local i=1
local current_iter = pnodes[i]:iterate(start_rnode)
return function ()
-- loop to go to next iterator if there is empty one in the list
while true do
local res = current_iter()
-- res is an iterator itself ; getting its result
if res then
return res
else
i = i + 1
if i <= #pnodes then
-- following to next iterator and resume loop
current_iter = pnodes[i]:iterate(start_rnode)
else
-- no current iterator : ending
return nil
end
end
end
end
end
--[[
Adam's father or mother : no value of course
p.test('Q70899', "P22|P25")
JS Bach's
p.test("Q1339", "P22|P25")
--]]
local function iterate_on_nodes_beetween(start_rnode, pnode, min, max)
seq = {}
i = 1
while i <= min do
table.insert(seq, pnode)
i = i + 1
end
local sequence_obj = {}
function sequence_obj:iterate(next_rnode)
return iterate_on_iterator_creators(next_rnode, seq, min)
end
if max then
local star_obj = {}
function star_obj:iterate(next_rnode)
return iterate_on_star(next_rnode, pnode, 1, iterated, max-min)
end
return iterate_on_iterator_creators(
start_rnode,
{
sequence_obj,
star_obj
}
)
else
return sequence_obj:iterate()
end
end
local function iterate_maybe(start_rnode, pnode)
iterator = pnode:iterate(start_rnode)
local self_done = false
return function()
if not self_done then
val = iterator()
if val then return val else
self_done = true
return start_rnode
end
end
end
end
function PropertyPath:new(str)
local obj = {["path"]=str}
setmetatable(obj, self)
local ast = parser.parse_path(str)
assert(ast, "parser did not ruturn a node")
obj.node = ast
return obj
end
local function entityId(entity)
if type(entity) == 'string' then
return entity
end
return entity.id
end
local function norm_start_point(start_point)
if type(start_point) == "string" then
return EpsilonRNode:create(start_point)
elseif type(start_point) == "table" then
if start_point["claims"] ~= nil then
-- assume this is an item or entity object
return EpsilonRNode:create(start_point.id)
elseif start_point["is_RNode"] then
return start_point
elseif start_point["qualifiers"] or start_point["mainsnak"] then
local itemid = string.gmatch(start_point.id, "^.*[^$]")() -- extract the item id from the starting statement
return StatementRNode:create(start_point, EpsilonRNode:create(itemid))
end
end
mw.logObject(start_point)
error("from function norm_start_point of module PropertyPath : wrong type for start_point", tostring(start_point)) -- TODO : Log a better error
end
function PropertyPath:iterate(start_point)
start_point = norm_start_point(start_point)
return self.node:iterate(start_point)
end
local PropertyNode = datastructure.PropertyNode
local AlternativeNode = datastructure.AlternativeNode
local SequenceNode = datastructure.SequenceNode
local QualifiedStatementNode = datastructure.QualifiedStatementNode
local NegatedPropertySetNode = datastructure.NegatedPropertySetNode
local PlusNode = datastructure.PlusNode
local StarNode = datastructure.StarNode
local BetweenNode = datastructure.BetweenNode
local MaybeNode = datastructure.MaybeNode
local QualifierSnakNode = datastructure.QualifierSnakNode
function PropertyNode:iterate(rnode)
return iterate_on_statement_from_property(rnode, self.property)
end
--[[
test("Q5", "subclass of")
--]]
function AlternativeNode:iterate(rnode)
return iterate_on_alternatives(rnode, self.nodes)
end
function NegatedPropertySetNode:iterate(rnode)
return iterate_on_statement(rnode,
function (property, val) return self:matches(property) end
)
end
--[[
test("Q90, ""!(P150)")
--]]
function SequenceNode:iterate(rnode)
return iterate_on_iterator_creators(rnode, self.nodes)
end
function QualifiedStatementNode:iterate(rnode)
local statement_iterator = iterate_on_statement(
rnode,
function (key, value)
return self.property:matches(key)
end
)
local qualifier_iterator_creator = function(statement)
return iterate_on_statement_qualifier(
statement,
function (key, value) return self.qualifier:matches(key) end
)
end
return iter.flatten(statement_iterator, qualifier_iterator_creator)
end
--[[ to test with :
p.test("Q79529", "union of>of")
p.test("Q105019",'P22{1,6}'
--]]
function QualifierSnakNode:iterate(statementnode)
return iterate_on_statement_qualifier(
statementnode,
function (key, value) return self:matches(key) end
)
end
--[[ to test with :
for x in p.iterate("Q79529", "union of") do p.test(x, ">of") end
--]]
function StarNode:iterate(rnode)
return iterate_on_star(rnode, self.node)
end
function PlusNode:iterate(rnode)
return iterate_on_plus(rnode, self.node)
end
function BetweenNode:iterate(rnode)
return iterate_on_nodes_beetween(rnode, self.node, self.min, self.max)
end
function MaybeNode:iterate(rnode)
return iterate_maybe(rnode, self.node)
end
-- returns an iterator on the result set of a path from a specific node
-- ppath acn either be a string representing a path or a compiled path
function path.iterate(start_node, ppath)
if start_node == nil then error("the start node is mandatory to get result on a path, it is nil") end
if type(ppath) == "table" then
return ppath:iterate(start_node)
else
return path.PropertyPath:new(ppath):iterate(start_node)
end
end
-- function that return a boolean
-- true if there is a path matching ppath from start_node that ends with the value "value"
-- (currently only works if "value" is a Qid string)
function path.matches(start_node, ppath, value)
for val in path.iterate(start_node, ppath) do
if val:item_value() == value then
return true
end
end
return false
end
----------------------------
--[[
p.test("Q5", "P279")
p.test(mw.wikibase.getEntity("Q5"), "P279")
for x in p.iterate(mw.wikibase.getEntity("Q5"), "P279") do p.test(x, "P279") end -- test if we can continue iteration of an RNode object
Complex test :
p.test("Q27929033","P1552>!()/P31") => OK
p.test("Q27929033","subclass of/P1552>!()/P31") => NOK
--]]
function path.test(start_point, ppath)
for x in path.iterate(start_point, ppath) do
mw.log("woot")
if x then
mw.log(x:item_value())
end
end
end
----------------------------
path.PropertyPath = PropertyPath
return path