
Consider improving quota workflow
Closed, Resolved; Public

Description

The current restrictive quotas, combined with the slow, manual quota-increase workflow described at https://phabricator.wikimedia.org/project/view/4834/, are not well suited to a platform that volunteers use to build tools.

These volunteers will typically build these tools in their free time, and "please fill a form and wait for a week" will lead to tools not being built, or tools circumventing the quotas (e.g. by creating a second tool account).
In my case, this was the first weekend where I had time to look at migrating tools to Kubernetes in the last 2 months, and I won't have much time in the upcoming weekends. The low 'deployments.apps' quota (T306322) and server ulimits (T306307) were a significant hindrance to moving wikibugs over.

Two suggested alternative approaches:

  • Lenient quotas with monitoring -- increase all quotas by 10x and manually reach out to maintainers when they use more than the current quota.
  • Self-service quota increase with manual verification afterwards.
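
To make the first suggestion concrete, here is a minimal sketch of the check it implies. The function name, the status strings, and the example numbers are all hypothetical; nothing here reflects how Toolforge actually enforces or monitors quotas.

```python
def classify_usage(usage, current_quota, factor=10):
    """Classify a tool's usage under a 'lenient quota with monitoring' scheme.

    Hypothetical helper: the enforced quota is `factor` times the current
    one, and anything above the current quota only triggers a manual
    follow-up with the maintainer instead of a hard failure.
    """
    raised_quota = current_quota * factor
    if usage > raised_quota:
        return "blocked"             # still over the raised (hard) quota
    if usage > current_quota:
        return "contact maintainer"  # allowed, but flagged for outreach
    return "ok"


# Illustrative numbers only: a made-up current quota of 3 deployments.
for deployments in (2, 5, 31):
    print(deployments, classify_usage(deployments, current_quota=3))
```

The point of the design is that exceeding today's quota stops being a blocker for the volunteer and becomes a signal for the admins.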

Event Timeline

Indeed, the default CPU quota seems very low (2.0 for both requests and limits). I'd be in favour of much higher default limits (ideally double the requests). For context, webservices default to 0.15 requests / 0.5 limits, and toolforge-jobs defaults to 0.25 / 0.5. For some reason we didn't raise that quota when reviewing the quotas for initial toolforge-jobs use.
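
A rough worked example of those numbers, assuming the quota is a namespace-wide sum over all containers (as with a Kubernetes ResourceQuota) and that every container runs with the defaults quoted above. The 4.0 limits quota in the second call is only one reading of "double the requests", not an agreed value.

```python
from decimal import Decimal

# Figures quoted in the comment above, in CPU cores.
CURRENT_QUOTA = {"requests": Decimal("2.0"), "limits": Decimal("2.0")}
CONTAINER_DEFAULTS = {
    "webservice": {"requests": Decimal("0.15"), "limits": Decimal("0.5")},
    "toolforge-jobs": {"requests": Decimal("0.25"), "limits": Decimal("0.5")},
}


def max_containers(defaults, quota):
    """Return how many containers with these defaults fit in the quota.

    The requests total and the limits total must both stay within the
    quota, so the smaller of the two counts is the binding one.
    """
    by_requests = int(quota["requests"] // defaults["requests"])
    by_limits = int(quota["limits"] // defaults["limits"])
    return min(by_requests, by_limits)


# Assumption for illustration: limits quota raised to double the requests quota.
DOUBLED_LIMITS = {"requests": Decimal("2.0"), "limits": Decimal("4.0")}

for kind, defaults in CONTAINER_DEFAULTS.items():
    print(kind,
          max_containers(defaults, CURRENT_QUOTA),
          max_containers(defaults, DOUBLED_LIMITS))
```

Under these assumptions the limits quota is the binding constraint: four default containers of either kind exhaust it, while the requests quota alone would allow 13 webservices or 8 jobs. Doubling only the limits quota raises the ceiling to 8 in both cases.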

@valhallasw Sorry to hear you ran into limitations during your migration! We can consider revising the default. What level would have worked for you?

@Majavah It sounds like you are suggesting defaults of:

1 CPU and 1G by default per container, user expandable to up to 2 CPU and 8G of memory

Do I have this correct?

It's great to hear from you, @valhallasw! We discussed this in our weekly meeting today. The general consensus is that users almost /never/ hit our (admittedly low) default quotas, and that the low quotas have the advantage of making us a very unappealing target for potential abuse (e.g. coin miners).

The good news is that the quota approval process has been streamlined recently, so typically it only takes a day or two to get a quota bump. We're updating the docs to reflect this, and I'd also encourage people to just yell for help on IRC if they're impatient and/or blocked for lack of headroom.

aborrero claimed this task.