
[Bug] Quantity field has bad data (\n)
Closed, Resolved
Public

Description

Somehow, on certain entities the quantity field ends up with bad data, e.g. see:

https://www.wikidata.org/wiki/Special:EntityData/Q1427761.ttl?flavor=dump

The quantity field on P1083 is "+16100\n", which is obviously wrong. The same appears in the JSON:

https://www.wikidata.org/wiki/Special:EntityData/Q1427761.json

There's the quantity field:

    "snaktype": "value",
    "property": "P1083",
    "datavalue": {
        "value": {
            "amount": "+16100\n",
            "unit": "1",
            "upperBound": "+16100\n",
            "lowerBound": "+16100\n"
        },
        "type": "quantity"
    }

This is obviously not correct, and the API should catch this and fix the data.
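For illustration, here is a minimal sketch of how such values could be spotted and cleaned on the client side. It assumes a quantity value shaped like the fragment above; the helper names `find_dirty_fields` and `cleaned` are hypothetical and are not part of Wikibase or its API.

```python
# Keys of a quantity value that should hold a plain signed decimal string.
NUMERIC_KEYS = ("amount", "upperBound", "lowerBound")


def find_dirty_fields(value):
    """Return the numeric keys whose string carries leading/trailing whitespace."""
    return [
        key for key in NUMERIC_KEYS
        if key in value and value[key] != value[key].strip()
    ]


def cleaned(value):
    """Return a copy of the quantity value with whitespace stripped from numeric keys."""
    return {
        key: (text.strip() if key in NUMERIC_KEYS else text)
        for key, text in value.items()
    }


# The quantity value as reported for P1083 on Q1427761.
value = {
    "amount": "+16100\n",
    "unit": "1",
    "upperBound": "+16100\n",
    "lowerBound": "+16100\n",
}

print(find_dirty_fields(value))  # ['amount', 'upperBound', 'lowerBound']
print(cleaned(value)["amount"])  # +16100
```

This only works around the symptom for a consumer of the data; it does not replace validation on the server.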

Event Timeline

Smalyshev raised the priority of this task from to Needs Triage.
Smalyshev updated the task description.
Smalyshev added a project: Wikidata.
Smalyshev subscribed.
Lydia_Pintscher renamed this task from "Quantity field has bad data (\n)" to "[Bug] Quantity field has bad data (\n)". Aug 31 2015, 9:33 AM
Lydia_Pintscher triaged this task as High priority.
Lydia_Pintscher set Security to None.
Lydia_Pintscher added a subscriber: daniel.

Also, editing P1083 on Q1427761 seems to be broken too: the edit field just does not appear.

thiemowmde subscribed.

Patches for review:

For your information, the reason is a bogus edge case in PCRE, see http://www.regular-expressions.info/anchors.html#realend. This edge case does not exist in ECMA/JavaScript.
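The edge case can be reproduced outside PHP as well. Python's `re` module treats `$` the same way (it matches at the end of the string and also just before a final newline), so the sketch below uses it as a stand-in. The pattern is a hypothetical signed-integer check; the actual Wikibase pattern is not shown in this thread.

```python
import re

# Hypothetical validation pattern, anchored with "^" and "$".
LOOSE = re.compile(r"^[+-]\d+$")


def loose_is_valid(text):
    """Return True if the "$"-anchored pattern accepts the text."""
    return LOOSE.search(text) is not None


print(loose_is_valid("+16100"))      # True
print(loose_is_valid("+16100\n"))    # True: "$" matches before the final newline
print(loose_is_valid("+16100\n\n"))  # False: only one trailing newline is tolerated
```

A value with exactly one trailing newline passes the check, which is how "+16100\n" could end up stored.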

@thiemowmde thanks for digging this up. What the fuck, PCRE?! I bet this causes security holes in thousands of web apps. Maybe worth mentioning at Perl Jam 3... https://media.ccc.de/v/32c3-7130-the_perl_jam_2

Wow, this is messed up. You could use the D modifier though:

https://3v4l.org/33Gks
http://php.net/manual/en/reference.pcre.pattern.modifiers.php

D (PCRE_DOLLAR_ENDONLY)
If this modifier is set, a dollar metacharacter in the pattern matches only at the end of the subject string. Without this modifier, a dollar also matches immediately before the final character if it is a newline (but not before any other newlines). This modifier is ignored if m modifier is set. There is no equivalent to this modifier in Perl.

The existence of this modifier shows somebody already encountered this WTF :)
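As a rough illustration of what an end-only anchor changes, here is a sketch in Python rather than PHP. Python's `re` has no D modifier, so the sketch uses `\Z`, which in Python matches only at the very end of the string. Both patterns are hypothetical and differ only in the end anchor.

```python
import re

# Hypothetical signed-integer patterns; only the end anchor differs.
DOLLAR = re.compile(r"^[+-]\d+$")     # "$" also matches before a final "\n"
END_ONLY = re.compile(r"^[+-]\d+\Z")  # "\Z" matches only at the very end (Python semantics)


def accepts(pattern, text):
    """Return True if the compiled pattern matches the text."""
    return pattern.search(text) is not None


for text in ("+16100", "+16100\n"):
    print(repr(text), accepts(DOLLAR, text), accepts(END_ONLY, text))
# '+16100' True True
# '+16100\n' True False
```

Note that the anchors are named differently in PCRE itself: there `\z` is the strict end-of-subject anchor, while `\Z` still allows a trailing newline.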