User Details
- User Since: Oct 3 2014, 5:57 AM (525 w, 5 d)
- Availability: Available
- LDAP User: Giuseppe Lavagetto
- MediaWiki User: GLavagetto (WMF)
Wed, Oct 23
I think @BTullis' idea is great - there are a few unknowns regarding how to keep what runs via Airflow in sync with what runs on k8s via helm charts, but that's something I'd rather invest my time in than writing some shim to run mwscript on a physical host. The shim was a solution I proposed to reduce overall effort, but I think it makes sense to go with his proposal for multiple reasons, first of all reducing the operational toil of the DE SRE team.
Mon, Oct 21
Most of the user stories for this epic have been implemented. Some were out of scope for this quarter's hypothesis and will be the focus of the new hypothesis for next quarter.
The basic read-only interface is published; we can resolve this task and open a new one for the read-write interface.
Fri, Oct 18
I should add that this is yet another example of how the unmaintained and substantially abandoned parts of MediaWiki, like the file upload and manipulation stack, are a constant source of technical debt and slow down every single migration or change we want to make to MediaWiki.
I have thought of a few options for this:
Mon, Oct 7
Conftool2git is now running in production; the git-synced repository is cloned across all puppetservers and could in theory be served (minus the requestctl actions) via config-master if anyone wanted to take a look.
Thu, Oct 3
Also, having read the whole task, I'd bet the problem is first of all network instability, as @bd808 has suggested.
Please, next time you have a problem with name resolution, run dig with +trace as well, so that we can see what the authdns servers are actually returning.
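For illustration, something along these lines (the hostname and nameserver below are just examples):

$ dig +trace en.wikipedia.org A                  # walk the delegation from the root servers down to the authoritative ones
$ dig @ns0.wikimedia.org en.wikipedia.org A      # or ask one of the authoritative servers directly

The +trace output makes it easy to tell whether a wrong answer is coming from the authdns itself or from a stale or broken recursive resolver in between.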
Sep 24 2024
Reason: besides the AI blurb, I don't think this is a reasonable task to open if the reporter isn't going to do the work themselves or doesn't have the resources to get the work done.
Sep 23 2024
I'm not sure what this extension is doing, but HTTP requests to our edge (so, using the public Graphite URL) will certainly fail, and even if we used the internal URL, we'd need to open a firewall rule for it.
This looks like a request from MediaWiki to an external source that doesn't go through the url-downloader proxy, so it's blocked by the firewall.
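For reference, a quick way to check whether an external endpoint is reachable through the proxy, run from a production host (I'm quoting the url-downloader address from memory, so treat it as an assumption and verify it against the actual $wgHTTPProxy value):

$ curl -sI -x http://url-downloader.wikimedia.org:8080 https://example.org/ | head -n1

The same curl without -x would just hang or be rejected, since direct outbound HTTP from the MediaWiki servers is firewalled.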
Sep 18 2024
An update after the meeting with SRE, Security, and the Wikifunctions people:
Setting to high and not UBN because we don't have this version installed anywhere.
Sep 13 2024
I would personally much prefer the second option, and I'd rather focus on keeping a good audit log one can find in Logstash than on working to keep the current git structure. While I understand it could be useful for quickly inspecting when and how a change was made, I feel the major pain point all users of requestctl, expert or otherwise, point out is the trial and error of creating a new object in an editor. I am confident the experience of using the web interface is much less stressful even in this initial version, and it will keep getting better as we add new functions in the coming quarters.
Also, if you have a hierarchy of tasks that you want to run in a coordinated way, you should always account for the possibility that one step fails, either for external reasons (for instance, the database master crashes suddenly) or because of software bugs.
I think that when you have a list of tasks to execute that have side effects, like creating databases and tables, you want each task to be idempotent or, alternatively, to have a rollback mechanism in case of failure.
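As a minimal sketch of what I mean by idempotent (the database and table names here are made up):

$ mysql -e "CREATE DATABASE IF NOT EXISTS mydb"
$ mysql mydb -e "CREATE TABLE IF NOT EXISTS mytable (id INT UNSIGNED PRIMARY KEY)"

Re-running either step after a partial failure is a harmless no-op, so the whole task list can simply be retried from the top instead of needing manual cleanup.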
A big difference between php-fpm and HHVM is that PHP doesn't natively have a wall-clock-based timeout. For instance, max_execution_time only counts time spent executing PHP bytecode, not time spent on things like DB queries or HTTP requests.
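A quick way to see this behaviour (on Linux the limit effectively counts CPU time spent inside the engine, not wall-clock time):

$ php -d max_execution_time=1 -r 'sleep(5); echo "still alive\n";'
still alive

sleep(), just like waiting on a DB query or an HTTP response, doesn't consume PHP execution time, so the 1-second limit never fires.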
Sep 11 2024
If you want to use change dispatching from Wikidata, which is a proven mechanism, then you'd probably be better off keeping the lexeme data within the wiki structure and passing it to the orchestrator as a parameter of the function. That would allow you to re-parse the wiki page using the normal mechanism that's already established for wikis, and it would solve a lot of problems for you (including, I think, how to fetch the items, but I'll let @LucasWerkmeister comment on that).
Special pages are not cached at the edge, so there is no caching for that URL, regardless of whether a revision is indicated:
$ curl -Is https://www.wikidata.org/wiki/Special:EntityData/Q42.json | grep cache-control
cache-control: private, s-maxage=0, max-age=0, must-revalidate
$ curl -Is https://www.wikidata.org/wiki/Special:EntityData/Q42.json?revision=1600533266 | grep cache-control
cache-control: private, s-maxage=0, max-age=0, must-revalidate
Sep 9 2024
For the record, the cause was a relatively aggressive crawler using up all available resources. While we've rate-limited this bot, I think we should use robots.txt to disallow crawling of most pages.
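Something along these lines, purely as a sketch (the exact paths to keep crawlable would need their own discussion; Allow isn't in the original robots.txt spec, but the major crawlers honour it):

User-agent: *
Allow: /wiki/Main_Page
Disallow: /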
Sep 5 2024
I'm finally seeing bounces get processed in logstash https://logstash.wikimedia.org/goto/3d34190bb82088f19669b0c66331d7c9
I still see the errors in the logs, and it's baffling. In fact, I've tried the command now listed in exim's configuration:
The check is done using WebRequest::getIP(), which uses REMOTE_ADDR as the source of the address and then overrides it with X-Forwarded-For only for trusted proxies.
Aug 30 2024
Let me add some perspective, as I've heard people are complaining about this.
Aug 26 2024
For the record, the reason we wanted to support large file uploads was not the poor performance of upload-by-url, which has since been fixed by making the process asynchronous. Better performance when handling large files would be welcome, though.
Aug 8 2024
I don't think we really need this, because I can't remember a single episode in which this check actually fired and it wasn't expected or a false positive.
Now Prometheus only reports scraping the correct ports: https://prometheus-eqiad.wikimedia.org/k8s/targets?scrapePool=k8s-pods-metrics&search=statsd-exporter
Aug 6 2024
I think we need to be able to pass the release script a set of base images for mediawiki and web, and build the final images for each of those pairs.
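To make that concrete, here is roughly what I have in mind; the script name, flags and image tags below are all hypothetical:

# one final image per (mediawiki base, web base) pair
for pair in "php8.1-bookworm:web-bookworm" "php7.4-bullseye:web-bullseye"; do
  mw_base="${pair%%:*}"
  web_base="${pair#*:}"
  ./make-container-release --mediawiki-base "$mw_base" --web-base "$web_base"
done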
Aug 5 2024
I should add that wgLocalHTTPProxy and all those mechanisms are hacks, and if we want to do this the right way, we should have a proper configuration table of read-only and read-write paths for the different domains, and then instruct the application writers to use one or the other. I'd avoid replicating the hack we have at the traffic layer, because if, say, one request gets erroneously sent to the wrong datacenter, then instead of paying a 30 ms penalty in total, we'll pay 30 ms per query to the MySQL master.
There's quite a bit of incorrect information in the wall of text above, but to actually keep it short:
We went with just pointing to the read-write API because there's no system within MediaWiki to split requests between reads and writes, and we didn't want to add ad-hoc logic to the service mesh just for that.
Jul 30 2024
Thanks for the thorough explanation! I know the Traffic folks were a bit worried about controlling stick tables from requestctl, but I think this format is OK.