Tweak the frequency of AbuseFilter validations
Closed, Resolved (Public)

Description

After the deployment of AbuseFilter validations, @ssastry reported a spike in requests to Parsoid to serialize fragments. The spike is triggered by the validations that run on autosave events. A long translation can produce a very large number of autosaves, and running this validation on every one of them is probably not efficient.

We need to think of some smart ways to reduce the frequency of validations.

Event Timeline

I have the following rough idea:

  1. Translation units sent to the cxsave API for saving can carry a boolean validate flag. If it is true, the unit is validated; otherwise validation is skipped.
  2. The translation storage module is responsible for setting that flag to true or false.
  3. Do not set the flag to true for section headings or for sections that are too small (move the size check logic to the client side).
  4. Set the validate flag to true on every 10th autosave (10 is an example).
  5. Set the validate flag to true if the section has a validation error.
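
The rules in the list above could be sketched roughly as follows. This is only an illustration of the decision logic, not the code from the actual patch: the names `SectionState` and `next_validate_flag`, the `MIN_SECTION_LENGTH` threshold, and the precedence between rules 3 and 5 are all assumptions made for the example, and it is written in Python for brevity rather than in the client's own language.

```python
from dataclasses import dataclass

VALIDATE_EVERY = 10      # example interval from the proposal
MIN_SECTION_LENGTH = 50  # hypothetical "too small" threshold


@dataclass
class SectionState:
    """Per-section bookkeeping kept by the (hypothetical) storage module."""
    is_heading: bool
    text_length: int
    autosave_count: int = 0
    has_validation_error: bool = False


def next_validate_flag(section: SectionState) -> bool:
    """Count one more autosave and return the validate flag for it."""
    section.autosave_count += 1
    # Rule 3: never validate headings or very small sections.
    if section.is_heading or section.text_length < MIN_SECTION_LENGTH:
        return False
    # Rule 5: keep validating while the section has a known error.
    if section.has_validation_error:
        return True
    # Rule 4: otherwise validate only every Nth autosave.
    return section.autosave_count % VALIDATE_EVERY == 0
```

With these assumptions, a normal section that is autosaved 20 times is validated twice instead of 20 times, while a section with a known validation error is rechecked on every save until the error is cleared.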

A potential quick win: do not save unchanged sections right after article is loaded.
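
One way to picture this quick win is to remember a digest of each section as it was loaded or last saved, and to queue only sections whose content differs. The sketch below is a hedged illustration: `SaveTracker` and its methods are hypothetical names, not part of the cxsave API or the existing storage module.

```python
import hashlib


class SaveTracker:
    """Tracks the last loaded or saved content of each section (hypothetical helper)."""

    def __init__(self):
        self._saved = {}  # section id -> digest of last known content

    @staticmethod
    def _digest(html: str) -> str:
        return hashlib.sha1(html.encode("utf-8")).hexdigest()

    def mark_saved(self, section_id: str, html: str) -> None:
        """Record content as already stored (call on load and after each save)."""
        self._saved[section_id] = self._digest(html)

    def pending(self, sections: dict) -> dict:
        """Return only the sections whose content changed since load or last save."""
        return {
            sid: html
            for sid, html in sections.items()
            if self._saved.get(sid) != self._digest(html)
        }
```

Right after an article is loaded, `pending()` returns an empty dict for untouched sections, so no save request (and no validation) would be issued for them.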

The idea of deleting the record from the drafts table on the first save to the cx_corpora table depends on this save of all sections. :/

Can we add a feature flag that indicates whether it is coming from the drafts table or the corpora table?

The approach I outlined above will solve the main issue, which is the frequency of validations. The validate flag of a translation unit becomes true on its 10th edit.

Change 277480 had a related patch set uploaded (by Santhosh):
Reduce the frequency of AbuseFilter Validations

https://gerrit.wikimedia.org/r/277480

Change 277480 merged by jenkins-bot:
Reduce the frequency of AbuseFilter Validations

https://gerrit.wikimedia.org/r/277480

HTML-to-wikitext API request rate from the Parsoid performance dashboard:

pasted_file (598×1 px, 46 KB)

The fix was deployed to production on March 24, and the graph above shows an immediate decrease in the API request rate.