I think @BTullis' idea is great - there's a few unknowns regarding how to keep what runs via airflow in sync with what runs on k8s via helm charts, but that's something I'd rather invest my time in than writing some shim to run mwscript on a physical host. That was a solution I proposed to reduce effort overall but I think it makes sense for multiple reasons, first of all the operational toil of the DE SRE team, to go with his proposal.

Wed, Oct 23, 1:38 PM · Data Products, Data-Platform-SRE, MW-on-K8s, Dumps-Generation, Release-Engineering-Team, serviceops

Joe added a comment to T377497: Functional replacement for importImages.php on Kubernetes.

In T377497#10240939, @Urbanecm_WMF wrote:

For example, video2commons (as one of the major sources of Server-side-upload-request) first attempts to upload the file itself. If it receives an error such as backend-fail-internal, it prompts the user to request a server side upload instead (see source for more details). Of course, if we can identify and fix all of those bugs, that would be amazing and far better than figuring out how to continue doing server side uploads.

Wed, Oct 23, 4:06 AM · serviceops, MW-on-K8s

Tue, Oct 22

Joe added a comment to T377497: Functional replacement for importImages.php on Kubernetes.

In T377497#10248513, @Pppery wrote:

That documentation isn't quite accurate. The goal of server-side uploads as they are used today is to work around the fact that uploads of large files are flaky for various reasons, and more likely to flake the larger the file gets.

Tue, Oct 22, 4:41 PM · serviceops, MW-on-K8s

Mon, Oct 21

Joe removed a subtask for T369480: [EPIC] FY 24/25 WE 4.3.4 Improve our existing tooling to allow quicker reaction times to ongoing attacks.: T372095: Create new content and structure for requestctl tool documentation.

Mon, Oct 21, 9:34 AM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe added a subtask for T377699: [EPIC] FY 24/25 WE 4.3.7 Roll out a user-friendly web application that enables assisted editing and creation of requestctl rules: T372095: Create new content and structure for requestctl tool documentation.

Mon, Oct 21, 9:34 AM · User-Joe, Epic, conftool, Traffic

Joe edited parent tasks for T372095: Create new content and structure for requestctl tool documentation, added: T377699: [EPIC] FY 24/25 WE 4.3.7 Roll out a user-friendly web application that enables assisted editing and creation of requestctl rules; removed: T369480: [EPIC] FY 24/25 WE 4.3.4 Improve our existing tooling to allow quicker reaction times to ongoing attacks..

Mon, Oct 21, 9:34 AM · Tech-Docs-Team, Documentation, User-CDanis, User-Joe, conftool, Traffic

Joe changed the status of T376877: Deprecate sync, add apply command to requestctl from Open to In Progress.

Mon, Oct 21, 9:34 AM · Patch-For-Review, Epic, User-CDanis, User-Joe, conftool, Traffic

Joe changed the status of T376877: Deprecate sync, add apply command to requestctl, a subtask of T374723: Coexistence of the requestctl CLI tool and of the web interface, from Open to In Progress.

Mon, Oct 21, 9:33 AM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe removed a subtask for T371782: Create simple web view of requestctl status: T374723: Coexistence of the requestctl CLI tool and of the web interface.

Mon, Oct 21, 9:32 AM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe added a subtask for T377699: [EPIC] FY 24/25 WE 4.3.7 Roll out a user-friendly web application that enables assisted editing and creation of requestctl rules: T374723: Coexistence of the requestctl CLI tool and of the web interface.

Mon, Oct 21, 9:32 AM · User-Joe, Epic, conftool, Traffic

Joe edited parent tasks for T374723: Coexistence of the requestctl CLI tool and of the web interface, added: T377699: [EPIC] FY 24/25 WE 4.3.7 Roll out a user-friendly web application that enables assisted editing and creation of requestctl rules; removed: T371782: Create simple web view of requestctl status.

Mon, Oct 21, 9:32 AM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe removed a subtask for T369480: [EPIC] FY 24/25 WE 4.3.4 Improve our existing tooling to allow quicker reaction times to ongoing attacks.: T310009: Make it easier to create a new requestctl object.

Mon, Oct 21, 9:31 AM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe added a subtask for T377699: [EPIC] FY 24/25 WE 4.3.7 Roll out a user-friendly web application that enables assisted editing and creation of requestctl rules: T310009: Make it easier to create a new requestctl object.

Mon, Oct 21, 9:31 AM · User-Joe, Epic, conftool, Traffic

Joe edited parent tasks for T310009: Make it easier to create a new requestctl object, added: T377699: [EPIC] FY 24/25 WE 4.3.7 Roll out a user-friendly web application that enables assisted editing and creation of requestctl rules; removed: T369480: [EPIC] FY 24/25 WE 4.3.4 Improve our existing tooling to allow quicker reaction times to ongoing attacks..

Mon, Oct 21, 9:31 AM · Traffic, SRE-Sprint-Week-Sustainability-March2023, Sustainability (Incident Followup), conftool

Joe triaged T377699: [EPIC] FY 24/25 WE 4.3.7 Roll out a user-friendly web application that enables assisted editing and creation of requestctl rules as High priority.

Mon, Oct 21, 9:29 AM · User-Joe, Epic, conftool, Traffic

Joe created T377699: [EPIC] FY 24/25 WE 4.3.7 Roll out a user-friendly web application that enables assisted editing and creation of requestctl rules.

Mon, Oct 21, 9:28 AM · User-Joe, Epic, conftool, Traffic

Joe closed T369480: [EPIC] FY 24/25 WE 4.3.4 Improve our existing tooling to allow quicker reaction times to ongoing attacks. as Resolved.

Mon, Oct 21, 9:19 AM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe added a comment to T369480: [EPIC] FY 24/25 WE 4.3.4 Improve our existing tooling to allow quicker reaction times to ongoing attacks..

Most of the user stories for this epic have been implemented. Some were out of scope for this quarter's hypothesis and will be the focus of the new hypothesis for the new quarter.

Mon, Oct 21, 9:19 AM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe closed T371782: Create simple web view of requestctl status as Resolved.

Mon, Oct 21, 9:15 AM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe closed T371782: Create simple web view of requestctl status, a subtask of T369480: [EPIC] FY 24/25 WE 4.3.4 Improve our existing tooling to allow quicker reaction times to ongoing attacks., as Resolved.

Mon, Oct 21, 9:15 AM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe added a comment to T371782: Create simple web view of requestctl status.

The basic read-only interface is published, we can solve this task and open a new one for the read-write interface

Mon, Oct 21, 9:15 AM · Epic, User-CDanis, User-Joe, conftool, Traffic

Fri, Oct 18

Joe added a comment to T377497: Functional replacement for importImages.php on Kubernetes.

In T377497#10240939, @Urbanecm_WMF wrote:

In T377497#10240682, @Joe wrote:

[...]
And finally, by far my favourite option:

Given now uploads by url are async, just raise the file size limit for those to more than 5 GB and stop doing server side uploads, which is an archaic idea that generates toil work for a small and precious slice of our community members

Can you clarify this, please? I do not understand how would raising this limit help to avoid server side uploads.

Fri, Oct 18, 1:04 PM · serviceops, MW-on-K8s

Joe added a comment to T377497: Functional replacement for importImages.php on Kubernetes.

I should add, this is yet another example of how the unmaintained and substantially abandoned parts of MediaWiki, like the file uploads and manipulation stack, are a constant source of technical debt and slow down every single migration/change we want to make to MediaWiki.

Fri, Oct 18, 6:19 AM · serviceops, MW-on-K8s

Joe added a comment to T377497: Functional replacement for importImages.php on Kubernetes.

I have thought of a few options for this:

Fri, Oct 18, 6:16 AM · serviceops, MW-on-K8s

Thu, Oct 17

Joe added a comment to T377218: Review past work to simulate a DDoS using Locust.

In T377218#10230170, @Volans wrote:

Thu, Oct 17, 1:21 PM · Infrastructure-Foundations, WMF-NDA

Mon, Oct 14

Joe closed T370745: Integrate requestctl haproxy rules into our TLS terminator, a subtask of T369606: Allow integrating requestctl rules into haproxy, as Resolved.

Mon, Oct 14, 1:47 PM · User-CDanis, User-Joe, conftool, Traffic

Joe closed T370745: Integrate requestctl haproxy rules into our TLS terminator as Resolved.

Mon, Oct 14, 1:47 PM · Patch-For-Review, User-CDanis, User-Joe, conftool, Traffic

Fri, Oct 11

Joe created T376996: scap deploy --init should not require a reason for the deployment.

Fri, Oct 11, 1:17 PM · Scap, Release-Engineering-Team

Joe created T376995: scap deploy --init fails if the deployment server is not primary for mediawiki deployments.

Fri, Oct 11, 1:12 PM · Scap, Release-Engineering-Team

Joe added a comment to T368389: WE4.3.1 - IP traffic.

In T368389#10221202, @SCherukuwada wrote:

Questions:

How can MW code use this model?

Fri, Oct 11, 1:05 PM · Knowledge-Integrity, OKR-Work, Research (FY2024-25-Research-July-September)

Joe added a comment to T376976: Remove memory limits from critical cluster components (calico).

In T376976#10220538, @akosiaris wrote:

We 've already discussed this in a 1on1 and just for transparency's sake, this finds me in agreement.

Fri, Oct 11, 8:11 AM · Patch-For-Review, Prod-Kubernetes, Kubernetes, Sustainability (Incident Followup), serviceops

Thu, Oct 10

Joe created T376877: Deprecate sync, add apply command to requestctl.

Thu, Oct 10, 8:59 AM · Patch-For-Review, Epic, User-CDanis, User-Joe, conftool, Traffic

Tue, Oct 8

Joe created P69501 A patchy server.

Tue, Oct 8, 5:05 PM

Mon, Oct 7

Joe closed T371783: Create an audit log for conftool, a subtask of T369480: [EPIC] FY 24/25 WE 4.3.4 Improve our existing tooling to allow quicker reaction times to ongoing attacks., as Resolved.

Mon, Oct 7, 3:53 PM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe closed T371783: Create an audit log for conftool as Resolved.

Mon, Oct 7, 3:53 PM · Epic, User-CDanis, User-Joe, conftool

Joe changed the status of T374723: Coexistence of the requestctl CLI tool and of the web interface from Open to In Progress.

Conftool2git is now running in production; the git synced repository is cloned across all puppetservers and could in theory be served (minus the requestctl actions) via config-master if anyone wanted to take a look.

Mon, Oct 7, 12:22 PM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe changed the status of T374723: Coexistence of the requestctl CLI tool and of the web interface, a subtask of T371782: Create simple web view of requestctl status, from Open to In Progress.

Mon, Oct 7, 12:21 PM · Epic, User-CDanis, User-Joe, conftool, Traffic

Thu, Oct 3

Joe added a comment to T352650: Migrate current-generation dumps to run from our containerized images.

In T352650#10179932, @dr0ptp4kt wrote:

@Joe checking - is Q2 FY 24-25 still looking good for Service Ops containerization work? Or is it looking more like Q3? Either way, any guesses on rough range of dates in which this effort might start?

Thu, Oct 3, 9:36 AM · Data Products, Data-Platform-SRE, MW-on-K8s, Dumps-Generation, Release-Engineering-Team, serviceops

Joe added a comment to T374830: Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org.

Also, having read all the task, I'd bet the problem is first of all a network instability, as @bd808 has suggested.

Thu, Oct 3, 8:26 AM · User-aborrero, Cloud-VPS, cloud-services-team, Continuous-Integration-Infrastructure, Release-Engineering-Team (Seen), User-brennen, ci-test-error (WMF-deployed Build Failure)

Joe added a comment to T374830: Various CI jobs failing with: Could not resolve host: gerrit.wikimedia.org.

Please next time you have a problem with name resolution, run dig with +trace as well, so that we can see what the authdns is actually saying.

Thu, Oct 3, 8:20 AM · User-aborrero, Cloud-VPS, cloud-services-team, Continuous-Integration-Infrastructure, Release-Engineering-Team (Seen), User-brennen, ci-test-error (WMF-deployed Build Failure)

Tue, Oct 1

Joe added a comment to T376140: Setting new password at wikitech does not work.

In T376140#10191908, @Ladsgroup wrote:

FWIW, lots of people could reset their passwords, that worksTM. I need to figure out what's causing this.

Tue, Oct 1, 3:04 PM · wikitech.wikimedia.org

Sep 24 2024

Joe closed T375472: Evaluate CockroachDB as a replacement to mariadb for wikidatawiki as Invalid.

Reason: besides the AI blurb, I don't think this is a reasonable task to open if someone isn't going to do the work themselves, or has the resources to do the work.

Sep 24 2024, 9:36 AM

Joe closed T375472: Evaluate CockroachDB as a replacement to mariadb for wikidatawiki, a subtask of T375470: Evaluate switch to a distributed SQL database for the Wikidatawiki cluster, as Invalid.

Sep 24 2024, 9:35 AM

Sep 23 2024

Joe added a comment to T374887: Log messages at ERROR level on http channel: Special:ExtensionDistributor unable to connect to https://graphite.wikimedia.org.

I'm not sure what is this extension doing, but http requests to our edge (so, using the public graphite url) will for sure fail, and even if we use the internal url, we'd need to open a firewall rule for it.

Sep 23 2024, 1:11 PM · Unstewarded-production-error, Observability-Metrics, Patch-For-Review, Wikimedia-production-error, Grafana, ExtensionDistributor

Joe added a comment to T374888: Log messages at ERROR level on http channel: Special:Book unable to connect to https://tools.pediapress.com.

Looks like a request to an external source from MediaWiki that doesn't make use of the url-downloader as a proxy, thus it's firewalled.

Sep 23 2024, 1:09 PM · Patch-For-Review, Wikimedia-production-error, Collection

Sep 19 2024

Joe added a comment to T368654: Determine which API we should use to fetch Lexeme data from Wikidata when specified in the function-orchestrator.

In T368654#10160566, @Lucas_Werkmeister_WMDE wrote:

In T368654#10156498, @Joe wrote:

Yeah that is slightly surprising

I guess your “slightly surprising” is our “pretty important for site performance” ;) I don’t know if we track the cache hit rate anywhere, but the total request volume to Special:EntityData is in the millions per day. If you’re not using that cache, then I agree you probably want a more restrictive limit on those calls 👍

Sep 19 2024, 3:32 PM · Patch-For-Review, Wikidata Dev Team, OKR-Work, Wikidata, Abstract Wikipedia team (25Q1 (Jul–Sep)), WikiLambda, function-orchestrator

Joe closed T375059: Requestctl sync writes unchanged objects as Resolved.

Sep 19 2024, 8:31 AM · serviceops, conftool

Sep 18 2024

Joe added a comment to T368654: Determine which API we should use to fetch Lexeme data from Wikidata when specified in the function-orchestrator.

An update after the meeting with SRE, Security and the wikifunctions people:

Sep 18 2024, 11:44 AM · Patch-For-Review, Wikidata Dev Team, OKR-Work, Wikidata, Abstract Wikipedia team (25Q1 (Jul–Sep)), WikiLambda, function-orchestrator

Joe added a comment to T368654: Determine which API we should use to fetch Lexeme data from Wikidata when specified in the function-orchestrator.

In T368654#10143463, @Lucas_Werkmeister_WMDE wrote:

In T368654#10137263, @Joe wrote:

Special pages are not cached at the edge, so there is no caching for that url, independently of indicating a revision or not:

$ curl -Is https://www.wikidata.org/wiki/Special:EntityData/Q42.json  | grep cache-control
cache-control: private, s-maxage=0, max-age=0, must-revalidate
$ curl -Is https://www.wikidata.org/wiki/Special:EntityData/Q42.json?revision=1600533266  | grep cache-control
cache-control: private, s-maxage=0, max-age=0, must-revalidate

Some caching seems to be happening:

$ curl -Is 'https://www.wikidata.org/wiki/Special:EntityData/Q42.json'  | grep x-cache
x-cache: cp3073 miss, cp3073 pass
x-cache-status: pass
$ curl -Is 'https://www.wikidata.org/wiki/Special:EntityData/Q42.json?revision=1600533266'  | grep x-cache
x-cache: cp3073 miss, cp3073 hit/1
x-cache-status: hit-front

Sep 18 2024, 10:47 AM · Patch-For-Review, Wikidata Dev Team, OKR-Work, Wikidata, Abstract Wikipedia team (25Q1 (Jul–Sep)), WikiLambda, function-orchestrator

Joe claimed T375059: Requestctl sync writes unchanged objects.

Sep 18 2024, 8:44 AM · serviceops, conftool

Joe triaged T375059: Requestctl sync writes unchanged objects as High priority.

Setting to high and not UBN because we don't have this version installed anywhere.

Sep 18 2024, 8:34 AM · serviceops, conftool

Joe updated the task description for T375059: Requestctl sync writes unchanged objects.

Sep 18 2024, 8:33 AM · serviceops, conftool

Joe created T375059: Requestctl sync writes unchanged objects.

Sep 18 2024, 8:32 AM · serviceops, conftool

Joe added a comment to T374662: PHP web requests running for multiple hours.

In T374662#10154543, @Krinkle wrote:

When the excimer timeout exception is thrown, MediaWiki replaces the response with an error page. However, the reason for having excimer at this level, is so that we can render a skinned and localised error page, which includes various bits of information from the database (e.g. sitenotice, notification count, etc). It also involves running whatever deferred updates were queued post-send.

As such, it is not impossible for an excimer timeout to happen "at the right time" but also have long execution, if there are badly behaved features queueing code to happen post-send.

Sep 18 2024, 7:23 AM · Discovery-Search, MediaWiki-Platform-Team (Radar), serviceops, CirrusSearch

Sep 13 2024

Joe added a comment to T371782: Create simple web view of requestctl status.

In T371782#10114735, @TBurmeister wrote:

Hi @Joe, do you have a screenshot or any mockups of what the web UI for this tool will look like, or a list of all the functionality the tool will implement? I saw in a recent Asana update that the web interface, while still under construction, can already read/update/delete any requestctl object, and that you're planning to implement the ability to create objects and to commit changes. In working on T372095, it would be helpful for me to know:

Which user actions or workflows documented at https://wikitech.wikimedia.org/wiki/Requestctl may become irrelevant due to being handled behind the scenes by the tool?

Which user actions or workflows would potentially still require command line examples even after the web application is deployed?

Are there any new user actions or workflows that will be made possible thanks to the new UI?

Thanks!

Sep 13 2024, 3:41 PM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe added a comment to T374723: Coexistence of the requestctl CLI tool and of the web interface.

I would personally much prefer the second option, and I'd rather focus on keeping a good audit log one can find on logstash rather than working to keep the current git structure. While i understand it could be useful to inspect quickly when/how a change was made, I feel like the major pain point all users of requestctl, expert or otherwise, point out is the trial and error of creating a new object in an editor. I am confident the experience using the web interface is much less stressful even in this initial version, but it would keep getting better as we add new functions in the coming quarters.

Sep 13 2024, 3:39 PM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe created T374723: Coexistence of the requestctl CLI tool and of the web interface.

Sep 13 2024, 3:28 PM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe added a comment to T374591: Installer Task abstraction.

Also, if you have a hierarchy of tasks that you want to run in a coordinated way, you should always account for the possibility one step fails for external reasons (for instance, the database master crashes suddenly), or for software bugs.

Sep 13 2024, 8:05 AM · MW-1.44-notes (1.44.0-wmf.1; 2024-10-29), Patch-For-Review, MediaWiki-Installer

Joe added a comment to T374591: Installer Task abstraction.

I think that when you have a list of tasks to execute that have side effects like creating databases and tables, you want the task to be idempotent, or - alternatively - to have a rollback mechanism in case of failure.

Sep 13 2024, 8:01 AM · MW-1.44-notes (1.44.0-wmf.1; 2024-10-29), Patch-For-Review, MediaWiki-Installer

Joe added a comment to T372604: Turn up PHP 8.1-flavored mw-debug k8s deployment.

In T372604#10142394, @Scott_French wrote:
Alas, among the ways mw-debug is special, it configures an extra NodePort service that skips envoy (i.e., direct to apache) and in theory permits serving via HTTP for testing / benchmarking purposes:
service:
  deployment: production
  expose_http: true
  port:
    nodePort: 8444
See, e.g., https://gerrit.wikimedia.org/r/703739.

As I forgot about this, I failed to override the nodePort on the "next" release to a distinct value, which of course failed to apply:
Service "mediawiki-next" is invalid: spec.ports[0].nodePort: Invalid value: 8444: provided port is already allocated
Two options come to mind:

I can simply re-spin https://gerrit.wikimedia.org/r/1071945 with a nodePort: 8453 override in values-next.yaml (port doesn't seem to be used for anything else per code-search and follows the same +4000 convention).

I'm not 100% sure this is serving any useful purpose at the moment, given that any HTTP request directed at a wiki would receive a 302 to the canonical HTTPS URI. If that's the case, I can remove this service from mw-debug.

@akosiaris - Any preference among these options? Or @Joe if you have any context to share on current applicability of the original use case, that would be greatly appreciated as well.

Sep 13 2024, 7:42 AM · Patch-For-Review, serviceops

Joe added a comment to T374662: PHP web requests running for multiple hours.

A big difference between php-fpm and HHVM is that PHP doesn't have natively a wall clock based timeout. So for instance max_execution_time only refers to time spent executing php bytecode, so not for things like db queries or http requests.

Sep 13 2024, 7:30 AM · Discovery-Search, MediaWiki-Platform-Team (Radar), serviceops, CirrusSearch

Joe closed T373449: Extract an api class for requestctl, a subtask of T371782: Create simple web view of requestctl status, as Resolved.

Sep 13 2024, 6:36 AM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe closed T373449: Extract an api class for requestctl as Resolved.

Sep 13 2024, 6:36 AM · Epic, User-CDanis, User-Joe, conftool, Traffic

Sep 11 2024

Joe updated subscribers of T368654: Determine which API we should use to fetch Lexeme data from Wikidata when specified in the function-orchestrator.

If you want to use change dispatching from wikidata, which is a proven mechanism, then you'd probably be better off keeping the lexeme data within the wiki structure, and pass it to the orchestrator as a parameter of the function. That would allow you to re-parse the wiki page using the normal mechanism that's already established for wikis, and solve a lot of problems for you (including I think how to fetch the items, but I'd let @LucasWerkmeister comment on that).

Sep 11 2024, 1:25 PM · Patch-For-Review, Wikidata Dev Team, OKR-Work, Wikidata, Abstract Wikipedia team (25Q1 (Jul–Sep)), WikiLambda, function-orchestrator

Joe added a comment to T368654: Determine which API we should use to fetch Lexeme data from Wikidata when specified in the function-orchestrator.

In T368654#9942854, @DMartin-WMF wrote:

In Wikidata:Data_access, it says:

The following URL formats are used by the user interface and by the query service updater, respectively, so if you use one of the same URL formats there’s a good chance you’ll get faster (cached) responses:

https://www.wikidata.org/wiki/Special:EntityData/Q42.json?revision=1600533266 (JSON)

Does that mean we would have to specify a specific revision to get the benefit of caching?

Special pages are not cached at the edge, so there is no caching for that url, independently of indicating a revision or not:

$ curl -Is https://www.wikidata.org/wiki/Special:EntityData/Q42.json  | grep cache-control
cache-control: private, s-maxage=0, max-age=0, must-revalidate
$ curl -Is https://www.wikidata.org/wiki/Special:EntityData/Q42.json?revision=1600533266  | grep cache-control
cache-control: private, s-maxage=0, max-age=0, must-revalidate

Sep 11 2024, 1:18 PM · Patch-For-Review, Wikidata Dev Team, OKR-Work, Wikidata, Abstract Wikipedia team (25Q1 (Jul–Sep)), WikiLambda, function-orchestrator

Sep 10 2024

Joe added a comment to T374394: TypeError: tests.values().toArray is not a function (Wikicurious_beat_en).

In T374394#10133179, @Ladsgroup wrote:

In T374394#10132257, @jeremyb wrote:

That's not possible, the whole point of the API query is to decide whether or not to show the banner. (on enwiki the categories I'm filtering on for article (ns0) views are on the corresponding Talk. (ns1) so they're also not available in wgCategories.) I was worried more about performance for the API requests. I tested pretty thoroughly on a few devices so I was less worried about JS errors but of course WP has an enormous variety of clients.

CN banners work on a different level, one API request might be fast but if you add it to let's say enwiki CN, you will trigger a flood of several thousand requests a second every second. Reaching billions more requests to production in a day. That can easily bring down everything specially since API requests don't get CDN cached.

If something is not possible natively in CN, it's better not to circumvent it by API calls.

Sep 10 2024, 10:41 AM · Wikimedia-production-error, Wikimedia-CentralNotice-Administration

Sep 9 2024

Joe reopened T374318: Wikifunctions is down as "In Progress".

For the record, the cause was a relatively aggressive crawler filling up all resources. While we've rate-limited this bot, I think we should use robots.txt to ban crawling from most pages.

Sep 9 2024, 1:10 PM · Traffic, Abstract Wikipedia team

Sep 5 2024

Joe added a comment to T338761: Bouncehandler is broken.

I'm finally seeing bounces get processed in logstash https://logstash.wikimedia.org/goto/3d34190bb82088f19669b0c66331d7c9

Sep 5 2024, 9:45 AM · MW-1.43-notes (1.43.0-wmf.23; 2024-09-17), SRE, MediaWiki-Engineering, Wikimedia-Hackathon-2024, Observability-Metrics, Grafana, MediaWiki-extensions-BounceHandler

Joe added a comment to T338761: Bouncehandler is broken.

I still see the errors in the logs, and it's baffling. In fact, I've tried the command now listed in exim's configuration:

Sep 5 2024, 7:52 AM · MW-1.43-notes (1.43.0-wmf.23; 2024-09-17), SRE, MediaWiki-Engineering, Wikimedia-Hackathon-2024, Observability-Metrics, Grafana, MediaWiki-extensions-BounceHandler

Joe added a comment to T338761: Bouncehandler is broken.

The check is done using Webrequest::getIP() which uses REMOTE_ADDR as a source for the address, and then overrides that using X-Forwarded-For only for trusted proxies.

Sep 5 2024, 6:16 AM · MW-1.43-notes (1.43.0-wmf.23; 2024-09-17), SRE, MediaWiki-Engineering, Wikimedia-Hackathon-2024, Observability-Metrics, Grafana, MediaWiki-extensions-BounceHandler

Aug 30 2024

Joe triaged T371782: Create simple web view of requestctl status as Medium priority.

Aug 30 2024, 1:35 PM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe added a comment to T371633: Burst of GuzzleHttp Exception for http://localhost:6025/call/constraint-regex-checker.

Let me add some prospect, as I've heard people are complaining about this.

Aug 30 2024, 1:21 PM · User-ItamarWMDE, Wikidata Dev Team, Wikidata, wmde-wikidata-tech, Wikimedia-production-error, Shellbox, serviceops, Wikibase-Quality-Constraints, Deployments

Restricted Application added a project to T371633: Burst of GuzzleHttp Exception for http://localhost:6025/call/constraint-regex-checker: User-ItamarWMDE.

In T371633#10102614, @ItamarWMDE wrote:

Is this somehow related to T362084: [SW] [WBQC] shellbox-constraints returning 500 on preg_match error?

Aug 30 2024, 1:16 PM · User-ItamarWMDE, Wikidata Dev Team, Wikidata, wmde-wikidata-tech, Wikimedia-production-error, Shellbox, serviceops, Wikibase-Quality-Constraints, Deployments

Aug 28 2024

Joe awarded T373526: Migrate the ownership of Docker images in production-images repo to mailing lists a Like token.

Aug 28 2024, 2:24 PM · User-Elukey, Data-Platform-SRE, Machine-Learning-Team, serviceops, Infrastructure-Foundations

Aug 27 2024

Joe changed the status of T373449: Extract an api class for requestctl from Open to In Progress.

Aug 27 2024, 3:06 PM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe changed the status of T373449: Extract an api class for requestctl, a subtask of T371782: Create simple web view of requestctl status, from Open to In Progress.

Aug 27 2024, 3:06 PM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe updated the task description for T373449: Extract an api class for requestctl.

Aug 27 2024, 3:02 PM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe created T373449: Extract an api class for requestctl.

Aug 27 2024, 3:01 PM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe changed the status of T371782: Create simple web view of requestctl status from Open to In Progress.

Aug 27 2024, 2:59 PM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe changed the status of T371782: Create simple web view of requestctl status, a subtask of T369480: [EPIC] FY 24/25 WE 4.3.4 Improve our existing tooling to allow quicker reaction times to ongoing attacks., from Open to In Progress.

Aug 27 2024, 2:58 PM · Epic, User-CDanis, User-Joe, conftool, Traffic

Aug 26 2024

Joe added a comment to T292322: Support large files in Shellbox.

For the record, the reason we wanted to support large file uploads was not to worsen the performance of upload-by-url, which has since been fixed by making the process asynchronous. Better performance handling large files would be welcome, though.

Aug 26 2024, 5:29 AM · MW-1.43-notes, MW-1.44-notes (1.44.0-wmf.2; 2024-11-05), Patch-For-Review, MW-1.38-notes (1.38.0-wmf.21; 2022-02-07), SRE-swift-storage, Shellbox, serviceops, MW-on-K8s

Aug 8 2024

Joe closed T322523: Check confd last index in a mw-on-k8s world as Declined.

I don't think we really need this, because I can't remember one episode in which this check has acutally fired and it wasn't expected/a false positive.

Aug 8 2024, 8:48 AM · Observability-Metrics, MW-on-K8s, User-fgiunchedi

Joe closed T322523: Check confd last index in a mw-on-k8s world, a subtask of T314118: Reduce IRC flood/spam during incidents, as Declined.

Aug 8 2024, 8:48 AM · Observability-Alerting, serviceops-radar, User-fgiunchedi, SRE

Joe added a comment to T371885: Gaps in Grafana graphs using Thanos.

Now prometheus only reports scraping the correct ports https://prometheus-eqiad.wikimedia.org/k8s/targets?scrapePool=k8s-pods-metrics&search=statsd-exporter

Aug 8 2024, 6:57 AM · SRE Observability (FY2024/2025-Q1), serviceops, MW-on-K8s, Grafana, Observability-Metrics

Joe added a comment to T371885: Gaps in Grafana graphs using Thanos.

In T371885#10049340, @Scott_French wrote:

Edit: Whoops, I completely missed T371885#10048618 onward before posting this. In any case, question still stands re: the annotation :)

For the endpoints marked down: it looks as if prometheus is scraping both container ports - i.e., 9102 (correct) and 9125 (statsd listen port, incorrect).

Not sure if that could somehow cause problems like those described in the task description, but it would at least explain T371885#10048571.

I wonder if we need to add an explicit prometheus.io/port annotation to ensure only 9102 is scraped?

Aug 8 2024, 6:13 AM · SRE Observability (FY2024/2025-Q1), serviceops, MW-on-K8s, Grafana, Observability-Metrics

Aug 6 2024

Joe closed T369606: Allow integrating requestctl rules into haproxy as Resolved.

Aug 6 2024, 9:41 AM · User-CDanis, User-Joe, conftool, Traffic

Joe closed T369606: Allow integrating requestctl rules into haproxy, a subtask of T369480: [EPIC] FY 24/25 WE 4.3.4 Improve our existing tooling to allow quicker reaction times to ongoing attacks., as Resolved.

Aug 6 2024, 9:41 AM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe added a comment to T370934: Build and publish multiple MediaWiki production images for a given set of PHP versions.

I think we need to be able to pass to the release script a set of base images for mediawiki and web, and build the final images for each of those pairs.

Aug 6 2024, 5:39 AM · Release-Engineering-Team (Doing 😎), Patch-For-Review, Kubernetes, Deployments

Aug 5 2024

Joe added a comment to T368064: Some mw-api-int traffic is going cross-DC.

I should add, wgLocalHTTPProxy and all those mechanisms are hacks, and if we want to do such things the right way, we should have a proper configuration table for read-only and read-write paths for different domains, and then instruct the application writers to use one or the other. I'd avoid replicating the hack we have at the traffic layer, because say one request gets erroneously sent to the wrong datacenter - now instead of paying 30 ms in total of penalty, we'll pay 30 ms per query to the mysql master.

Aug 5 2024, 1:16 PM · MediaWiki-Platform-Team (Radar), serviceops

Joe added a comment to T368064: Some mw-api-int traffic is going cross-DC.

There's quite a bit of incorrect information in the wall of text above, but to actually keep it short:
We went with just pointing to the read-write api because there's no system, within MediaWiki, to split requests between write and read and we didn't want to add ad-hoc logic to the service mesh just for that.

Aug 5 2024, 1:16 PM · MediaWiki-Platform-Team (Radar), serviceops

Joe claimed T371783: Create an audit log for conftool.

Aug 5 2024, 5:51 AM · Epic, User-CDanis, User-Joe, conftool

Joe created T371783: Create an audit log for conftool.

Aug 5 2024, 5:50 AM · Epic, User-CDanis, User-Joe, conftool

Joe created T371782: Create simple web view of requestctl status.

Aug 5 2024, 5:47 AM · Epic, User-CDanis, User-Joe, conftool, Traffic

Joe triaged T370745: Integrate requestctl haproxy rules into our TLS terminator as High priority.

Aug 5 2024, 5:42 AM · Patch-For-Review, User-CDanis, User-Joe, conftool, Traffic

Jul 30 2024

Joe added a comment to T371144: support the haproxy silent-drop hysteresis gadget from requestctl.

Thanks for the thorough explanation! I know the traffic folks were a bit worried about controlling stick tables from requestctl but I think this format is ok.

Jul 30 2024, 8:21 AM · Patch-For-Review, User-CDanis, User-Joe, conftool, Traffic

Joe (Giuseppe Lavagetto)
Spy

Projects (22)
View All

Calendar

Today

Tomorrow

Friday

User Details

Recent Activity
View All

Yesterday

Wed, Oct 23

Tue, Oct 22

Mon, Oct 21

Fri, Oct 18

Thu, Oct 17

Mon, Oct 14

Fri, Oct 11

Thu, Oct 10

Tue, Oct 8

Mon, Oct 7

Thu, Oct 3

Tue, Oct 1

Sep 24 2024

Sep 23 2024

Sep 19 2024

Sep 18 2024

Sep 13 2024

Sep 11 2024

Sep 10 2024

Sep 9 2024

Sep 5 2024

Aug 30 2024

Aug 28 2024

Aug 27 2024

Aug 26 2024

Aug 8 2024

Aug 6 2024

Aug 5 2024

Jul 30 2024

Joe (Giuseppe Lavagetto)Spy

Projects (22)View All

Calendar

Today

Tomorrow

Friday

User Details

Recent ActivityView All

Yesterday

Wed, Oct 23

Tue, Oct 22

Mon, Oct 21

Fri, Oct 18

Thu, Oct 17

Mon, Oct 14

Fri, Oct 11

Thu, Oct 10

Tue, Oct 8

Mon, Oct 7

Thu, Oct 3

Tue, Oct 1

Sep 24 2024

Sep 23 2024

Sep 19 2024

Sep 18 2024

Sep 13 2024

Sep 11 2024

Sep 10 2024

Sep 9 2024

Sep 5 2024

Aug 30 2024

Aug 28 2024

Aug 27 2024

Aug 26 2024

Aug 8 2024

Aug 6 2024

Aug 5 2024

Jul 30 2024

Joe (Giuseppe Lavagetto)
Spy

Projects (22)
View All

Recent Activity
View All