SVG translate tool replaces all fields with "$1" (style element needs at least one trailing character)
Open, Low | Public | 3 Estimated Story Points

Description

See the file history of https://commons.wikimedia.org/wiki/File:Conic_Sections.svg.

I reverted the file to an old version. I cleared the cache, cleared cookies on this site, and got the same bug. It displays correctly for a second and then flickers away to something wrong. There was no way to edit the file correctly.

I tried to revert to an even older version and got the same issue.

Browser: Chrome

image.png (731×1 px, 91 KB)

Event Timeline

The same problem happened to that file again.

Magog reported the original error on 8 Jan 2021; the day before SVG Translate was used to add the language ca, but $ strings appeared instead. The addition of a ca translation should be trivial because many ca translations have been added in the past.

The file history shows that both of Magog's reverts worked. Some cache timing issue may have persisted.

On 12 April 2021, SVG Translate was used to add the language id, and it also resulted in the $ strings.

The bad 12 April 2021 version has improper langtags sr_Latn and sr_Cryl. Those bad langtags were introduced in the 4 and 10 September 2020 uses of SVG Translate.

So this bug may be related to SVG Translate using non-IETF langtags. See T271000.

SVG Translate uses MediaWiki language identifiers rather than IETF langtags. See T279874 and T125073.

On 13 April 2021, Sarang corrected the improper sr language tags.

I tried running SVG Translate on Sarang's version, but SVG Translate states, "This file does not have any labels available for translation. Please pick another image."

The direct URL gives the same message:
https://svgtranslate.toolforge.org/File:Conic_Sections.svg

I ran the W3C validator; it was happy except that no character set had been declared.

Checking the current SVG file shows that it is missing the XML processing instruction. The processing instruction was apparently removed during the optimization of 16 Nov 2020.

I added the XML PI and tried again, but no luck. I may be running into SVG Translate's private cache, so I'll drop the investigation for now.

$ strings were also reported in T231143.

I've purged Conic Sections.svg and waited half a day, but SVG Translate is still working on the old file (with sr_EC langtags).

SVG Translate is still using the old file with sr_EC langtags.

Tried again. SVG Translate may be adding the sr_EC langtags, so I need another way to test this problem.

If sr-EC rewrites are being done, they are changing the sequence.

The file does not have tspan elements.

None of the characteristic SVG Translate identifiers have been added to the file.

The xmlns="http://www.w3.org/2000/svg" looks right.

Uploaded a Test.svg file with only text, but cannot purge Commons cache after several failed attempts.

Purge finally succeeded.

Here's the file I used for the test (with sr-ec):

<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 2100 2100">
<title>Trouble Test</title>
<g fill="none" stroke="#000" font-family="sans-serif" font-size="150">
<switch fill="#a00" transform="translate(960.1 456.77)">
<text systemLanguage="cy">Hyperbola</text>
<text systemLanguage="de">Hyperbel</text>
<text systemLanguage="en">hyperbola</text>
<text systemLanguage="es">hipérbola</text>
<text systemLanguage="fa">هذلولی</text>
<text systemLanguage="fi">Hyperbeli</text>
<text systemLanguage="fr">hyperbole</text>
<text systemLanguage="he">היפרבולה</text>
<text systemLanguage="hi">अतिपरवलय</text>
<text systemLanguage="ja">双曲線</text>
<text systemLanguage="mk">хипербола</text>
<text systemLanguage="no">hyperbel</text>
<text systemLanguage="pt">hipérbole</text>
<text systemLanguage="ru">гипербола</text>
<text systemLanguage="sr-el">hiperbola</text>
<text systemLanguage="sr-ec">хипербола</text>
<text systemLanguage="sv">hyperbel</text>
<text systemLanguage="tr">Hiperbol</text>
<text systemLanguage="uk">гіпербола</text>
<text>hyperbola</text></switch>
</g>
</svg>

SVG Translate found the text and offered to translate. I downloaded the SVG to get:

<?xml version="1.0" encoding="UTF-8"?>
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 2100 2100">
<title>Trouble Test</title>
<g fill="none" stroke="#000" font-family="sans-serif" font-size="150">
<switch fill="#a00" transform="translate(960.1 456.77)"><text systemLanguage="sr-el" id="trsvg35"><tspan id="trsvg15">hiperbola</tspan></text><text systemLanguage="sr-ec" id="trsvg36"><tspan id="trsvg16">хипербола</tspan></text>
<text systemLanguage="cy" id="trsvg21"><tspan id="trsvg1">Hyperbola</tspan></text>
<text systemLanguage="de" id="trsvg22"><tspan id="trsvg2">Hyperbel</tspan></text>
<text systemLanguage="en" id="trsvg23"><tspan id="trsvg3">hyperbola</tspan></text>
<text systemLanguage="es" id="trsvg24"><tspan id="trsvg4">hipérbola</tspan></text>
<text systemLanguage="fa" id="trsvg25"><tspan id="trsvg5">هذلولی</tspan></text>
<text systemLanguage="fi" id="trsvg26"><tspan id="trsvg6">Hyperbeli</tspan></text>
<text systemLanguage="fr" id="trsvg27"><tspan id="trsvg7">hyperbole</tspan></text>
<text systemLanguage="he" id="trsvg28"><tspan id="trsvg8">היפרבולה</tspan></text>
<text systemLanguage="hi" id="trsvg29"><tspan id="trsvg9">अतिपरवलय</tspan></text>
<text systemLanguage="ja" id="trsvg30"><tspan id="trsvg10">双曲線</tspan></text>
<text systemLanguage="mk" id="trsvg31"><tspan id="trsvg11">хипербола</tspan></text>
<text systemLanguage="no" id="trsvg32"><tspan id="trsvg12">hyperbel</tspan></text>
<text systemLanguage="pt" id="trsvg33"><tspan id="trsvg13">hipérbole</tspan></text>
<text systemLanguage="ru" id="trsvg34"><tspan id="trsvg14">гипербола</tspan></text>


<text systemLanguage="sv" id="trsvg37"><tspan id="trsvg17">hyperbel</tspan></text>
<text systemLanguage="tr" id="trsvg38"><tspan id="trsvg18">Hiperbol</tspan></text>
<text systemLanguage="uk" id="trsvg39"><tspan id="trsvg19">гіпербола</tspan></text>
<text id="trsvg40"><tspan id="trsvg20">hyperbola</tspan></text></switch>
</g>
</svg>

Notice the re-sorting of the sr langtags and the insertion of identifiers and tspan elements.

This is not what I'm seeing on failed attempts to translate File:Conic Sections.svg. In those, I'm seeing re-sorted sr_EC and sr_EL langtags.
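Those three changes (language order, added tspan elements, added identifiers) can be checked mechanically on a downloaded file. The following is a minimal sketch in Python and is not part of SVG Translate; the function name summarize_switch_text is hypothetical, and the trsvg id prefix is simply what appears in the output above.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for elements in the SVG namespace.
SVG_NS = "{http://www.w3.org/2000/svg}"


def summarize_switch_text(svg_source):
    """Summarize the <text> children of every <switch> in an SVG string.

    Returns (langs, has_tspan, has_trsvg_ids):
      langs         -- systemLanguage values in document order (None = fallback)
      has_tspan     -- True if any such <text> has a direct <tspan> child
      has_trsvg_ids -- True if any such <text> has an id starting with "trsvg"
    """
    root = ET.fromstring(svg_source.encode("utf-8"))
    langs = []
    has_tspan = False
    has_trsvg_ids = False
    for switch in root.iter(SVG_NS + "switch"):
        for text in switch.findall(SVG_NS + "text"):
            langs.append(text.get("systemLanguage"))
            if text.find(SVG_NS + "tspan") is not None:
                has_tspan = True
            if (text.get("id") or "").startswith("trsvg"):
                has_trsvg_ids = True
    return langs, has_tspan, has_trsvg_ids


# A shortened version of the processed file shown above.
processed = (
    '<svg xmlns="http://www.w3.org/2000/svg"><switch>'
    '<text systemLanguage="sr-el" id="trsvg35"><tspan id="trsvg15">hiperbola</tspan></text>'
    '<text systemLanguage="cy" id="trsvg21"><tspan id="trsvg1">Hyperbola</tspan></text>'
    '<text id="trsvg40"><tspan id="trsvg20">hyperbola</tspan></text>'
    '</switch></svg>'
)
print(summarize_switch_text(processed))
```

Running the same function on the file before and after SVG Translate touches it makes it easy to see whether only the order changed or whether the identifiers and tspan elements were also written.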

I edited Test.svg to use locale strings, but now must await the cache.

Uploaded test file File:SVG Translate Test - bad langtags.svg

Ran SVG Translate on that file, and it worked.

https://svgtranslate.toolforge.org/File:SVG_Translate_Test_-_bad_langtags.svg

It resorted Serbian, but kept the underscores, added tspan and identifiers. Not the failing case, so something else is the trigger.

I uploaded the current Conic Sections.svg file but without its style element.

https://svgtranslate.toolforge.org/File:SVG_Translate_Test_-_Conic_Sections.svg

SVG Translate displays text. Serbian translations are not accessible but other translations are.

The original style element was:

<style>.B{stroke:#6c5d53;stroke-dasharray:25,12.5,6.25,12.5;stroke-width:6.25}.E{stroke-width:8.91;fill:#f95}.F{stroke-width:9.37;fill:#ff8080}.G{fill:#aaf}</style>

I removed the style element because everything else in Conic Sections.svg looks generic.

Added the style element back with CRLFs; I hope the cache times out so it can be tested.

Tried new version, and it worked.

The style element looked like this:

<style>
.B{stroke:#6c5d53;stroke-dasharray:25,12.5,6.25,12.5;stroke-width:6.25}
.E{stroke-width:8.91;fill:#f95}
.F{stroke-width:9.37;fill:#ff8080}
.G{fill:#aaf}
</style>

Restoring the style element to one line, with a CRLF before and after it; I must wait for the cache before I can try it.

So the style element tickles the bug.

If the SVG has a one-line style element,

<style>.B{stroke:#6c5d53;stroke-dasharray:25,12.5,6.25,12.5;stroke-width:6.25}.E{stroke-width:8.91;fill:#f95}.F{stroke-width:9.37;fill:#ff8080}.G{fill:#aaf}</style>

then SVG Translate says, "This file does not have any labels available for translation."

If the SVG has a multi-line style element,

<style>
.B{stroke:#6c5d53;stroke-dasharray:25,12.5,6.25,12.5;stroke-width:6.25}
.E{stroke-width:8.91;fill:#f95}
.F{stroke-width:9.37;fill:#ff8080}
.G{fill:#aaf}
</style>

then SVG Translate is happy.

I suspect that SVG Translate reads the SVG file, walks the DOM tree at least to the extent that it will reorder the Serbian langtags but before it adds any of its identifiers or tspan elements, and then attempts to parse the style block (why?). The code probably throws an exception during that parse, SVG Translate catches the exception, and presumes there is no text. If the style block has some extra characters (newlines), then the parse does not throw an exception, and SVG Translate offers the translations.

The parsing failure is probably due to parsing a # literal when the optional final ; statement separator is absent. I'll test that by adding the semicolons.

A one-character fix is needed.

https://github.com/wikimedia/svgtranslate/blob/master/src/Model/Svg/SvgFile.php at line 175.

Its regex requires the final CSS block's } to be followed by at least one non-open-brace character ([^{]+$).

The failure case above has no characters following the last CSS }, so makeTranslationReady() will return false.

Change + to * and it should be good to go.
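The difference between the two quantifiers can be reproduced outside SVG Translate. This is a minimal sketch that transliterates the regex into Python's re module, on the assumption that re and PHP's preg_match treat this particular pattern the same way; the names ORIGINAL and PROPOSED and the shortened CSS strings are made up for the illustration.

```python
import re

# Pattern as it appears in SvgFile.php: at least one trailing character required.
ORIGINAL = re.compile(r'^([^{]+\{[^}]*\})*[^{]+$')
# Proposed one-character change: trailing characters become optional.
PROPOSED = re.compile(r'^([^{]+\{[^}]*\})*[^{]*$')

# Shortened versions of the two style elements' text content.
one_line = ".B{stroke:#6c5d53}.G{fill:#aaf}"
multi_line = "\n.B{stroke:#6c5d53}\n.G{fill:#aaf}\n"

for name, css in (("one-line", one_line), ("multi-line", multi_line)):
    print(name,
          "original:", bool(ORIGINAL.match(css)),
          "proposed:", bool(PROPOSED.match(css)))
```

Only the one-line case under the original pattern fails to match, which is the case that makes makeTranslationReady() return false.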

The log file should have "File {file} has CSS too complex to parse". That message should have been given to the user.

The regex should test that the trailing characters contain neither a left brace nor a right brace. A right brace may appear with CSS at-sign rules.

Working case: the style element has trailing whitespace.

Failing case: the style element has no characters following the last close brace.

It no longer displays $0 strings but rather claims "This file does not have any labels available for translation. Please pick another image."

Glrx renamed this task from SVG translate tool replaces all fields with "$1" to SVG translate tool replaces all fields with "$1" (style element needs at least one trailing character). Jul 12 2022, 9:54 PM

@TheDJ @Samwilson

This issue has a 1-character fix.

The original file has been fixed to avoid the error. To check, see that this link works:

The failure mode has changed. It no longer shows $nn strings but rather complains with:

  • This file does not have any labels available for translation. Please pick another image.

This file will trigger the bug

will show the bug. Some earlier versions of the file will work.

Link to run SVG Translate:

The original SVG has the following style element:

Conic Sections.svg
<style>.B{stroke:#6c5d53;stroke-dasharray:25,12.5,6.25,12.5;stroke-width:6.25}.E{stroke-width:8.91;fill:#f95}.F{stroke-width:9.37;fill:#ff8080}.G{fill:#aaf}</style>

As explained above, the error is in https://github.com/wikimedia/svgtranslate/blob/master/src/Model/Svg/SvgFile.php at line 175 (the first preg_match below; the code starts at line 169):

SVGFile.php
$styles = $this->document->getElementsByTagName('style');
$styleLength = $styles->length;
for ($i = 0; $i < $styleLength; $i++) {
    $style = $styles->item($i);
    $CSS = $style->textContent;
    if (false !== strpos($CSS, '#')) {
        if (!preg_match('/^([^{]+\{[^}]*\})*[^{]+$/', $CSS)) {
            // Can't easily understand the CSS to check it, so exit
            $this->logFileProblem('File {file} has CSS too complex to parse');
            return false;
        }
        $selectors = preg_split('/\{[^}]+\}/', $CSS);
        foreach ($selectors as $selector) {
            if (false !== strpos($selector, '#')) {
                // IDs in CSS will break when we clone things, should be classes
                $this->logFileProblem('File {file} has IDs in CSS');
                return false;
            }
        }
    }
}

The code is looking at all the style elements and testing for an id selector (e.g., #foo). SVG Translate will give up if there is such a selector. If it finds a # in the CSS block, then it tries a more careful examination at line 175. It tries to match the basic pattern of 0 or more

  • selectors { rules }

Where the selectors pattern is everything but a { and the rules pattern is everything but a }. That simple parsing will work as long as braces are not nested. Such nesting will happen if there is an @media rule. If there is nesting, then the braces will not match up.
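The selector test that follows the preg_match can be sketched the same way. This is a Python transliteration of the preg_split loop, not the tool's code; the function name selectors_with_ids and the #legend example are hypothetical.

```python
import re

# Same pattern the PHP code hands to preg_split: one { ... } rule body.
RULE_BODY = re.compile(r'\{[^}]+\}')


def selectors_with_ids(css):
    """Return the selector chunks (text outside rule bodies) that contain '#'."""
    return [chunk for chunk in RULE_BODY.split(css) if '#' in chunk]


# Colour literals sit inside the rule bodies, so they are split away.
colours_only = ".B{stroke:#6c5d53}.E{fill:#f95}"
# An id selector sits outside the rule body, so it survives the split.
id_selector = "#legend text{fill:#a00}"

print(selectors_with_ids(colours_only))
print(selectors_with_ids(id_selector))
```

A # inside a colour value is enough to trigger the careful examination, but only a # in the selector position makes the loop report "File {file} has IDs in CSS".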

The pattern checks for 0 or more rule sets, and then it checks that the characters following the last rule set do not contain any {. There can be substantial text at the end because the CSS could have comments (/* ... */). The pattern checks the trailing characters to the end of the string with

  • [^{]+$

The problem is that there may not be any trailing characters. The original style element had no trailing characters. Consequently, the preg_match will fail, the CSS will be declared as too complex, and SVG Translate will refuse to translate.

Do not demand at least one character after the last rule set: The pattern should end with

  • [^{]*$

I might make the pattern also reject any trailing } to make assurance double sure.

Consider a style element that has an @media rule:

  • @media printer { text { fill: black; } }

That rule is presumably beyond the ad hoc parser, yet the original regex will succeed.

The preg_match will match

  1. @media printer
  2. text { fill: black; (match is confused by nested braces)

and the trailing string will be } rather than just spaces. That is not a left brace, so it will be absorbed by the [^{]+$ pattern.
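The @media case can be checked with a short sketch, again transliterated to Python's re on the assumption that it behaves like preg_match for this pattern. STRICTER is one possible spelling of the "also reject a trailing }" idea, not a pattern taken from the SVG Translate source.

```python
import re

# Pattern from SvgFile.php.
ORIGINAL = re.compile(r'^([^{]+\{[^}]*\})*[^{]+$')
# Hypothetical stricter tail: optional, and no brace of either kind.
STRICTER = re.compile(r'^([^{]+\{[^}]*\})*[^{}]*$')

nested = "@media printer { text { fill: black; } }"
flat = ".B{stroke:#6c5d53}.G{fill:#aaf}"

m = ORIGINAL.match(nested)
# The group stops at the first }, so the leftover is what [^{]+$ absorbs.
leftover = nested[m.end(1):]
print(repr(m.group(1)), repr(leftover))

print("stricter accepts nested:", bool(STRICTER.match(nested)))
print("stricter accepts flat:", bool(STRICTER.match(flat)))
```

With the stricter tail, the nested rule is reported as too complex while the flat one-line style element is still accepted.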