
Determine storage requirements for stashing parsoid output for VE edits
Closed, Resolved, Public

Description

The storage backend used for stashing Parsoid output for VE edits in the page/html endpoint needs to be configurable. However, the requirements for persistence and latency are still unclear.
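The following is a minimal sketch of what "configurable" could mean here. It assumes the stash is a plain key/value store with a per-entry TTL and that the backend is chosen by name from configuration. `StashBackend`, `InMemoryStash`, `make_stash` and the `stash_type` config key are all hypothetical and do not come from MediaWiki's actual ParsoidOutputStash code.

```python
import time
from abc import ABC, abstractmethod
from typing import Callable, Dict, Optional, Tuple


class StashBackend(ABC):
    """Hypothetical key/value interface that the stash is configured with."""

    @abstractmethod
    def set(self, key: str, value: bytes, ttl_seconds: int) -> None:
        """Store value under key; it may be discarded after ttl_seconds."""

    @abstractmethod
    def get(self, key: str) -> Optional[bytes]:
        """Return the stored value, or None if it is missing or expired."""


class InMemoryStash(StashBackend):
    """Single-process backend for illustration only; it has no replication."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # key -> (expiry timestamp, value)
        self._data: Dict[str, Tuple[float, bytes]] = {}

    def set(self, key: str, value: bytes, ttl_seconds: int) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)

    def get(self, key: str) -> Optional[bytes]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily drop expired entries on read
            return None
        return value


# Registry of available backends. A real deployment would register e.g. a
# MySQL- or Cassandra-backed class here (hypothetical, not shown).
BACKENDS: Dict[str, Callable[[], StashBackend]] = {"memory": InMemoryStash}


def make_stash(config: Dict[str, str]) -> StashBackend:
    """Instantiate the backend named by the hypothetical 'stash_type' key."""
    return BACKENDS[config["stash_type"]]()
```

In this sketch, editing code would call `stash.set(key, html, 24 * 60 * 60)` when a VE edit starts and `stash.get(key)` when it is saved. A `None` from `get` at save time is the kind of failure the retention requirement below is concerned with.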

Outcome

  • On the Cassandra keyspace used by RESTbase for stashing edits, we are seeing about 100 writes per second across all wikis (but only about 10 reads/s, indicating that 90% of edits are abandoned)
  • At a TTL of 24h, this amounts to about 7 million entries at any given time
  • Assuming an average of 20 KB per HTML blob, this works out to about 140 GB.
  • Since this is essentially a key/value store, not much extra space is needed for indexes.
  • The storage requirement will be multiplied by the replication factor.
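The estimate above can be reproduced with a few lines of arithmetic. This sketch assumes decimal units (1 KB = 1,000 bytes), which is what makes 7 million × 20 KB come out at 140 GB. It also assumes a replication factor of 3 purely as an example, since the task does not state one. The task's figures are rounded: a steady 100 writes/s over 24 hours gives about 8.6 million live entries (roughly 173 GB at 20 KB each). The 7 million / 140 GB numbers are therefore best read as order-of-magnitude estimates.

```python
SECONDS_PER_DAY = 24 * 60 * 60
KB = 1_000  # decimal units, so that 7 million x 20 KB = 140 GB as in the task
GB = 1_000_000_000


def live_entries(writes_per_sec: float, ttl_seconds: int) -> float:
    """Steady state: each write lives for one TTL, so rate x TTL entries exist at once."""
    return writes_per_sec * ttl_seconds


def storage_bytes(entries: float, avg_blob_bytes: int, replication_factor: int = 1) -> float:
    """Raw blob storage; index overhead is ignored for a key/value store."""
    return entries * avg_blob_bytes * replication_factor


def abandoned_fraction(writes_per_sec: float, reads_per_sec: float) -> float:
    """Share of stashed entries that are never read back (edit not saved)."""
    return 1 - reads_per_sec / writes_per_sec


if __name__ == "__main__":
    print(f"abandoned: {abandoned_fraction(100, 10):.0%}")  # abandoned: 90%
    print(f"entries at exactly 100/s: {live_entries(100, SECONDS_PER_DAY):,.0f}")  # 8,640,000
    task_entries = 7_000_000  # the task's rounded figure
    print(f"raw: {storage_bytes(task_entries, 20 * KB) / GB:.0f} GB")  # raw: 140 GB
    # Replication factor 3 is an assumed example, not a number from the task.
    print(f"with RF=3: {storage_bytes(task_entries, 20 * KB, 3) / GB:.0f} GB")  # with RF=3: 420 GB
```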

Backend tech choice:

  • Replication requirement: we need the stashed data to be available across DCs. Candidate tech: Memcached via mcrouter, Cassandra, MySQL (Redis as well, but it is being phased out).
  • Retention requirement: if stashed data vanishes, edits fail, which directly impacts users. We don't want that. Candidate tech: Cassandra, MySQL.
  • Performance requirement: high write rate. Candidate tech: Memcached via mcrouter, Cassandra.
  • Space requirement: we need hundreds of GB with no unexpected eviction. Candidate tech: Cassandra, MySQL.
  • Ease of deployment/maintenance: use what we have. Candidate tech: Memcached via mcrouter, MySQL.

Given the requirements above, the choice is between Cassandra and MySQL. Cassandra would require significant effort (bundling and deploying a driver, implementing an adapter, setting up and running the Cassandra cluster). Using the ParserCache MySQL cluster only requires a small config change. So we should try MySQL first, and pivot to Cassandra if needed.
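The selection can be sketched as set filtering over the candidate lists. The task does not weight the requirements, so the split below is an interpretation. Replication, retention and space are treated as hard constraints, and the survivors are ranked by ease of deployment ahead of write performance. Under that assumption the sketch reproduces the conclusion: MySQL first, Cassandra as the fallback. MySQL is absent from the performance candidates, which is why a pivot may be needed. Redis is left out because the task notes it is being phased out.

```python
from typing import Dict, List, Set

# Candidate tech per requirement, taken from the list in the task.
REQUIREMENTS: Dict[str, Set[str]] = {
    "replication": {"Memcached", "Cassandra", "MySQL"},
    "retention": {"Cassandra", "MySQL"},
    "performance": {"Memcached", "Cassandra"},
    "space": {"Cassandra", "MySQL"},
    "ease": {"Memcached", "MySQL"},
}

# Assumed split: requirements a backend must meet vs. ones used only for ranking.
HARD = ["replication", "retention", "space"]
SOFT_BY_PRIORITY = ["ease", "performance"]


def shortlist(reqs: Dict[str, Set[str]], hard: List[str]) -> Set[str]:
    """Backends that appear in the candidate set of every hard requirement."""
    return set.intersection(*(reqs[name] for name in hard))


def rank(candidates: Set[str], reqs: Dict[str, Set[str]], soft: List[str]) -> List[str]:
    """Order candidates lexicographically: meeting a higher-priority soft requirement wins."""
    def key(tech: str):
        # False sorts before True, so satisfied requirements come first;
        # the name is a final tie-breaker for a deterministic order.
        return tuple(tech not in reqs[name] for name in soft) + (tech,)
    return sorted(candidates, key=key)


if __name__ == "__main__":
    short = shortlist(REQUIREMENTS, HARD)
    print(sorted(short))                               # ['Cassandra', 'MySQL']
    print(rank(short, REQUIREMENTS, SOFT_BY_PRIORITY))  # ['MySQL', 'Cassandra']
```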

See T308511: [SPIKE] Determine necessity of edit session continuity during data center switchovers

Related Objects

Status | Subtype | Assigned
Stalled | - | None
In Progress | - | None
Open | - | None
Resolved | - | None
Open | - | None
Resolved | - | Jgiannelos
Resolved | - | daniel
Resolved | - | Clement_Goubert
Declined | - | None
Resolved | - | hnowlan
In Progress | - | None
Resolved | - | daniel
Resolved | - | daniel
Resolved | - | daniel
Open | - | MSantos
Resolved | - | MSantos
Resolved | - | MSantos
Resolved | - | ROdonnell-WMF
Resolved | BUG REPORT | MSantos
Resolved | BUG REPORT | daniel
Resolved | BUG REPORT | daniel
Open | BUG REPORT | None
Invalid | - | None
Resolved | - | daniel
Resolved | - | BPirkle
In Progress | - | daniel
Duplicate | - | None
Stalled | - | daniel
Resolved | - | daniel
Resolved | - | daniel
Resolved | - | daniel
In Progress | - | daniel
Resolved | - | daniel
Open | - | daniel
Open | - | None
Resolved | - | daniel
Resolved | - | daniel
Resolved | - | daniel
Declined | - | None
Open | - | None
Open | - | Jgiannelos
Resolved | - | BPirkle
Resolved | - | Jgiannelos
Open | - | None
Open | - | None

Event Timeline

Change 802584 had a related patch set uploaded (by Daniel Kinzler; author: Daniel Kinzler):

[mediawiki/core@master] ParsoidOutputStash: make storage backend configurable.

https://gerrit.wikimedia.org/r/802584

Change 802584 merged by jenkins-bot:

[mediawiki/core@master] ParsoidOutputStash: make storage backend configurable.

https://gerrit.wikimedia.org/r/802584

daniel claimed this task.
daniel updated the task description.

See summary in task description