Southparkfan (Ferran Tufan)
InfoSec/infrastructure/networking guru

Projects (16)

User Details

User Since
Oct 12 2014, 7:12 AM (525 w, 4 d)
Availability
Available
IRC Nick
Southparkfan
LDAP User
Southparkfan
MediaWiki User
Southparkfan [ Global Accounts ]

The guy watching South Park all day. Used to be a student in Computer Science (specialised in InfoSec, open source infrastructures, computer networking, and ethical security & network research). Formerly employed in InfoSec as well.

Macro southparkfan-approves: Approved by Wikimedia

Recent Activity

Sep 7 2024

Southparkfan added a comment to T374272: asw2-d2-eqid <-> asw2-d4-eqiad vcp link flapping.

As partially noted on IRC as well: the flapping has been going on for a while, there didn't seem to be any critical hosts in D4 (assuming the line card numbering matches the physical racks properly, in all VCs) and hence it was not Klaxon-worthy to me. Nevertheless, they're still production hosts running on a switch with interface issues, sometimes for up to two hours. And unless the eqiad VC cabling is different from a perfect spine-leaf topology, this means the D4 asw only had one remaining uplink, which is an issue.

Sep 7 2024, 3:15 AM · ops-eqiad, SRE, DC-Ops, Infrastructure-Foundations, netops

Sep 5 2024

Southparkfan updated the task description for T374173: Grafana LDAP sync script broken, seems to cause login issues for users recently added to LDAP groups.
Sep 5 2024, 11:14 PM · Grafana, observability
Southparkfan created T374173: Grafana LDAP sync script broken, seems to cause login issues for users recently added to LDAP groups.
Sep 5 2024, 11:11 PM · Grafana, observability
Southparkfan added a member for netops: Southparkfan.
Sep 5 2024, 3:57 PM

Sep 4 2024

Southparkfan added a comment to T373518: Grant Access to ldap/nda for Southparkfan.

I can confirm the ldap group has been added to my account.

Sep 4 2024, 12:34 PM · SRE, LDAP-Access-Requests

Sep 3 2024

Southparkfan added a comment to T373518: Grant Access to ldap/nda for Southparkfan.

For the record: there seem to be a few IdP-related issues in Netbox (T373702), but despite that, this LDAP access request is still valid.

Sep 3 2024, 4:17 PM · SRE, LDAP-Access-Requests
Southparkfan added a comment to T373702: Unable to log in to Netbox.

do we understand properly what changed since the earlier task was closed?

Maybe this?

https://gerrit.wikimedia.org/r/c/operations/puppet/+/932231

Sep 3 2024, 3:51 PM · Patch-For-Review, Infrastructure-Foundations, CAS-SSO, netbox

Aug 28 2024

Southparkfan added a comment to T373518: Grant Access to ldap/nda for Southparkfan.

Hello @Southparkfan, please send your full name, mailing address, and email address to [email protected] and I will send the NDA agreement to you. Thanks!

Email has been sent!

Aug 28 2024, 6:16 PM · SRE, LDAP-Access-Requests
Southparkfan updated Southparkfan.
Aug 28 2024, 3:11 PM
Southparkfan updated the task description for T373518: Grant Access to ldap/nda for Southparkfan.
Aug 28 2024, 1:02 PM · SRE, LDAP-Access-Requests
Southparkfan created T373518: Grant Access to ldap/nda for Southparkfan.
Aug 28 2024, 1:00 PM · SRE, LDAP-Access-Requests

Aug 27 2024

Southparkfan added a comment to T372161: Publish, and maintain ASPA records for valid AS14907 upstreams.

[...]

However, the ASPA record is yet another duplicate of the transit_provider list in Homer, and the export policies defined in our aut-num objects. Our export policies in the IRRs already do not match what's in Homer (for instance, we removed a few Transit providers during the knams migration, but they're still present in our export policies), and adding an ASPA record will make it even harder to stay in sync. Perhaps we can script something that checks whether the IRRs and ASPA record(s) still match the source of truth, being Homer?

I think step 1 is to update the doc https://wikitech.wikimedia.org/wiki/Add_transit_provider to add the steps you mentioned, as well as the steps to take when removing a transit.

Ta da: https://wikitech.wikimedia.org/w/index.php?title=Adding_and_removing_transit_providers&diff=2218856&oldid=2042295. Can you verify this is correct? There are lots of references to private Phabricator tasks, and of course I have never dealt with WMF's transit providers before.

Before working on automating the verification (that we didn't forget any step) or the actual implementation, we should look at using Netbox as source of truth.

As far as I understand, the actual circuits are already managed in Netbox, but the Homer template for the Transit group (which is what actually manages the BGP sessions on the CRs) is expanded based on yaml configuration in config/devices.yaml. On top of that, we have site-wide policies, and BFD + FlowSpec configuration for transit providers in config/common.yaml. Using Netbox to manage the AS-specific import/export policies does not really make sense (the policies are free-form text), and I'm not sure what Netbox data model is suitable for modelling BGP sessions. Something I can think of is some kind of CI/test that checks whether the Homer transit and transit_provider keys contain ASes that are not "transit" Providers in Netbox, to at least have some kind of Netbox-Homer verification, and said data may also be useful for cross-checking the IRR databases and ASPA objects. If Wikimedia has some custom data model for any of the use cases listed, let me know!
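The cross-check described above could look roughly like the sketch below. It only compares sets of AS numbers. The function name and the sample data are hypothetical, the ASNs come from the documentation range, and loading the real lists (Homer YAML, the Netbox API, the published ASPA object) is deliberately left out.

```python
def check_transit_consistency(homer_asns, netbox_asns, aspa_asns):
    """Compare transit ASNs from three sources, treating Homer as source of truth.

    All three arguments are plain iterables of AS numbers; how they are
    extracted from Homer, Netbox, and the ASPA object is not shown here.
    """
    homer, netbox, aspa = set(homer_asns), set(netbox_asns), set(aspa_asns)
    return {
        # Transits configured on the routers but not marked as such in Netbox
        "in_homer_not_netbox": sorted(homer - netbox),
        # Transits configured on the routers but missing from the ASPA record
        "in_homer_not_aspa": sorted(homer - aspa),
        # Stale providers still authorised in the ASPA record
        "in_aspa_not_homer": sorted(aspa - homer),
    }


# Hypothetical sample data (documentation ASNs, not real providers)
homer = [64500, 64501, 64502]
netbox = [64500, 64501]
aspa = [64500, 64501, 64503]

report = check_transit_consistency(homer, netbox, aspa)
for name, asns in report.items():
    print(name, asns)
```

A CI job could fail whenever any of the three lists is non-empty. The same comparison would work for the IRR export policies once their AS list has been extracted.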

Aug 27 2024, 10:57 PM · netops, Infrastructure-Foundations
Southparkfan added a comment to T372158: Apply egress Source Address Validation on the Wikimedia core routers.

However, in reality, it should be possible to reject all IP packets where the source IP is not part of the IP prefixes that the Foundation has been assigned (i.e. prefix lists production{4,6}, which are a superset of the publicly routable LVS service IPs).

We would need to at least permit traffic from the transit interface IPs, as they do BGP to their peers, v6 link local for neighbor discovery, some land GRE tunnels, etc. Not sure what is the cleanest way for that, maybe using an apply-path like for bgp-sessions ? Ideally we wouldn't have to list them all :)

Yep, that would be ideal. Unfortunately, one `prefix-list` can only have one apply-path, so we cannot take the union of BGP peer IPs and interface/tunnel IPs. Furthermore, these apply-paths do not seem to support regexes(?), so we cannot craft a regex that only matches interfaces that are part of the external-links group ("set policy-options prefix-list egress-ranges4 apply-groups xxxx" is valid, but apply-groups is meant for "inheriting configuration data", not so much for retrieving IP addresses from interfaces assigned to this...?).
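To make the intended filter logic concrete, here is a minimal Python sketch of the egress decision: permit a packet only if its source address falls within an allowed range. The prefixes are documentation ranges standing in for production{4,6} and the exceptions mentioned above. They are assumptions for illustration, not the real prefix lists, and this is not router configuration.

```python
import ipaddress

# Hypothetical stand-ins; the real lists would come from the router config.
ALLOWED_SOURCES = [
    ipaddress.ip_network(prefix)
    for prefix in (
        "192.0.2.0/24",     # stand-in for prefix list production4
        "2001:db8::/32",    # stand-in for prefix list production6
        "198.51.100.0/31",  # stand-in for a transit interface subnet (BGP)
        "fe80::/10",        # IPv6 link-local, needed for neighbor discovery
    )
]


def egress_permitted(source):
    """Return True if a packet with this source IP may leave the network."""
    addr = ipaddress.ip_address(source)
    return any(
        addr.version == net.version and addr in net for net in ALLOWED_SOURCES
    )


print(egress_permitted("192.0.2.10"))   # inside the production4 stand-in
print(egress_permitted("203.0.113.5"))  # not in any allowed range
```

The awkward part discussed above is not this check itself, but generating the exception entries automatically instead of listing them by hand.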

Aug 27 2024, 5:12 PM · Infrastructure-Foundations, netops

Aug 12 2024

Southparkfan added a comment to T372161: Publish, and maintain ASPA records for valid AS14907 upstreams.

Follow-up from IRC: Wikimedia uses the Hosted RPKI, but we assume the ARIN portal just doesn't support anything other than ROAs. There is an ASPA record for AS11358, whose ASN is controlled by ARIN, and we think they use Hybrid RPKI, where ARIN still hosts the RPKI objects through their Repository Publication Service. Technically, Krill could both act as the CA and create ASPA records, and it is possible that AS11358 and others are doing this.

Aug 12 2024, 3:44 PM · netops, Infrastructure-Foundations

Aug 9 2024

Southparkfan created T372161: Publish, and maintain ASPA records for valid AS14907 upstreams.
Aug 9 2024, 4:02 PM · netops, Infrastructure-Foundations
Southparkfan updated subscribers of T372158: Apply egress Source Address Validation on the Wikimedia core routers.
Aug 9 2024, 3:41 PM · Infrastructure-Foundations, netops
Southparkfan created T372158: Apply egress Source Address Validation on the Wikimedia core routers.
Aug 9 2024, 3:40 PM · Infrastructure-Foundations, netops

Jul 24 2024

Southparkfan updated the task description for T327742: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm.
Jul 24 2024, 4:25 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan closed T370461: Remove or replace deployment-sessionstore04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation), a subtask of T327742: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm, as Resolved.
Jul 24 2024, 4:24 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan closed T370461: Remove or replace deployment-sessionstore04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) as Resolved.

sessionstorage04 is no longer.

Jul 24 2024, 4:24 PM · Patch-For-Review, cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure

Jul 23 2024

Southparkfan added a comment to T370461: Remove or replace deployment-sessionstore04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation).

Couldn't upgrade Buster to 4.x, because there are no packages in buster-wikimedia. Installing Cassandra was a rather interesting process.

Jul 23 2024, 4:55 PM · Patch-For-Review, cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan added a comment to T370461: Remove or replace deployment-sessionstore04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation).

I didn't get a response in -sre, but Andrew has provided me with extra information.

Jul 23 2024, 12:13 PM · Patch-For-Review, cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure

Jul 22 2024

Southparkfan updated the task description for T327742: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm.
Jul 22 2024, 4:05 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan added a comment to T370461: Remove or replace deployment-sessionstore04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation).

Puppet fails to install the Cassandra instance:

Error: 'install -o cassandra -g cassandra -m 750 -d /var/lib/cassandra/data' returned 1 instead of one of [0]
Error: /Stage[main]/Cassandra/Cassandra::Instance[default]/Exec[install-/var/lib/cassandra/data]/returns: change from 'notrun' to ['0'] failed: 'install -o cassandra -g cassandra -m 750 -d /var/lib/cassandra/data' returned 1 instead of one of [0] (corrective)
Error: 'install -o cassandra -g cassandra -m 750 -d /var/lib/cassandra/data' returned 1 instead of one of [0]
Error: /Stage[main]/Cassandra/Cassandra::Instance[default]/Exec[install-/var/lib/cassandra/data]/returns: change from 'notrun' to ['0'] failed: 'install -o cassandra -g cassandra -m 750 -d /var/lib/cassandra/data' returned 1 instead of one of [0] (corrective)
Jul 22 2024, 4:04 PM · Patch-For-Review, cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan added a comment to T370461: Remove or replace deployment-sessionstore04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation).

Had to delete sessionstorage05 (bookworm) due to T357791, will replace with a bullseye instance for Cassandra

Jul 22 2024, 3:43 PM · Patch-For-Review, cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan closed T370459: Remove or replace deployment-push-notifications01.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) as Resolved.
Jul 22 2024, 3:30 PM · Patch-For-Review, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, cloud-services-team
Southparkfan closed T370459: Remove or replace deployment-push-notifications01.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation), a subtask of T327742: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm, as Resolved.
Jul 22 2024, 3:30 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan updated the task description for T327742: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm.
Jul 22 2024, 3:20 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan closed T370582: Remove or replace deployment-mwmaint02.deployment-prep.eqiad1.wikimedia.cloud as Resolved.
Jul 22 2024, 3:14 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan closed T370582: Remove or replace deployment-mwmaint02.deployment-prep.eqiad1.wikimedia.cloud, a subtask of T327742: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm, as Resolved.
Jul 22 2024, 3:14 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan closed T361386: Remove or replace deployment-parsoid12.deployment-prep.eqiad1.wikimedia.cloud as Resolved.
Jul 22 2024, 3:14 PM · cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan closed T361386: Remove or replace deployment-parsoid12.deployment-prep.eqiad1.wikimedia.cloud, a subtask of T327742: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm, as Resolved.
Jul 22 2024, 3:14 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan closed T370466: Remove or replace deployment-urldownloader03.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation), a subtask of T327742: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm, as Resolved.
Jul 22 2024, 3:01 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan closed T370466: Remove or replace deployment-urldownloader03.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) as Resolved.
Jul 22 2024, 3:01 PM · Patch-For-Review, Cloud-VPS (Debian Buster Deprecation), cloud-services-team, Beta-Cluster-Infrastructure
Southparkfan closed T361387: Replace or delete deployment-mediawiki[11-12].deployement-prep.eqiad1.wikimedia.cloud, a subtask of T327742: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm, as Resolved.
Jul 22 2024, 2:57 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan closed T361387: Replace or delete deployment-mediawiki[11-12].deployement-prep.eqiad1.wikimedia.cloud as Resolved.
Jul 22 2024, 2:57 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, cloud-services-team

Jul 20 2024

Southparkfan added a comment to T370461: Remove or replace deployment-sessionstore04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation).

Great, can't ssh into my new instance:

$ ssh deployment-sessionstore05.deployment-prep.eqiad1.wikimedia.cloud
Connection closed by UNKNOWN port 65535
Jul 20 2024, 3:51 PM · Patch-For-Review, cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan claimed T370461: Remove or replace deployment-sessionstore04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation).
Jul 20 2024, 3:36 PM · Patch-For-Review, cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan updated subscribers of T370460: Remove or replace deployment-restbase04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation).

@Jgiannelos hey! Is deployment-restbase-bullseye (created by you last year) ready to take over the work from restbase04? Other than changing the references to restbase04 in Horizon hiera and LabsServices.php, and in the changeprop Chart (deployment--charts), it should be possible to switch, although the restbase service is not listening to port 7231 on -bullseye - any idea what's wrong?

Jul 20 2024, 3:36 PM · Patch-For-Review, Cloud-VPS (Debian Buster Deprecation), cloud-services-team, Beta-Cluster-Infrastructure
Southparkfan added a comment to T361381: Replace deployment-maps-master01 with a Bullseye or Bookworm instance.

@hnowlan I see you have created deployment-maps-master02. Other than possibly replacing the old master in https://github.com/wikimedia/maps-kartotherian-deploy/blob/master/scap/environments/beta/targets, is there anything needed before deleting master01?

Jul 20 2024, 3:23 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan added a comment to T370466: Remove or replace deployment-urldownloader03.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation).

As soon as the above changes have been merged, urldownloader03 can be deleted.

Jul 20 2024, 3:17 PM · Patch-For-Review, Cloud-VPS (Debian Buster Deprecation), cloud-services-team, Beta-Cluster-Infrastructure
Southparkfan claimed T370466: Remove or replace deployment-urldownloader03.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation).
Jul 20 2024, 3:15 PM · Patch-For-Review, Cloud-VPS (Debian Buster Deprecation), cloud-services-team, Beta-Cluster-Infrastructure
Southparkfan updated subscribers of T370465: Remove or replace deployment-snapshot03.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation).

@BTullis I see you have created deployment-snapshot05 (Bullseye), although this new host was not part of https://gerrit.wikimedia.org/r/c/operations/dumps/scap/+/1008451, and neither is it part of the mediawiki-installation dsh group. Do we have to add snapshot05 to your 'scap' repository as well, or is it fine to just add it to the dsh group?

Jul 20 2024, 3:03 PM · Patch-For-Review, Data-Platform-SRE (2024.07.29 - 2024.08.16), Cloud-VPS (Debian Buster Deprecation), cloud-services-team, Beta-Cluster-Infrastructure
Southparkfan added a comment to T370582: Remove or replace deployment-mwmaint02.deployment-prep.eqiad1.wikimedia.cloud.

Given that Puppet does not have a flag to stop the periodic MediaWiki jobs, I had to disable Puppet on mwmaint03 and kill the jobs myself (just like the DC switchover cookbooks do). Can be re-enabled as soon as mwmaint02 is gone (deleted + removed from dsh groups).

Jul 20 2024, 2:55 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan closed T370462: Remove or replace deployment-shellbox.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation), a subtask of T327742: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm, as Resolved.
Jul 20 2024, 2:46 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan closed T370462: Remove or replace deployment-shellbox.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) as Resolved.
Jul 20 2024, 2:46 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, cloud-services-team
Southparkfan added a comment to T370462: Remove or replace deployment-shellbox.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation).

It looks like the shellbox container is broken on the new host:

root@deployment-shellbox01:~# /usr/bin/docker run --rm=true  --env-file /etc/shellbox/env -p 8081:8081 -v shellbox:/etc/shellbox  -v /run/shared:/run/shared  -v /srv/shellbox/config/:/srv/app/config  -v /srv/shellbox/src:/srv/app/src --name spftest docker-registry.wikimedia.org/wikimedia/mediawiki-libs-shellbox:2024-06-13-133425-video --nodaemonize
[20-Jul-2024 14:43:13] ERROR: unable to bind listening socket for address '/run/shared/fpm-www.sock': Permission denied (13)
[20-Jul-2024 14:43:13] ERROR: unable to bind listening socket for address '/run/shared/fpm-www.sock': Permission denied (13)
[20-Jul-2024 14:43:13] ERROR: FPM initialization failed
[20-Jul-2024 14:43:13] ERROR: FPM initialization failed
Jul 20 2024, 2:44 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, cloud-services-team
Southparkfan claimed T370462: Remove or replace deployment-shellbox.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation).
Jul 20 2024, 2:26 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, cloud-services-team
Southparkfan updated the task description for T327742: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm.
Jul 20 2024, 2:25 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan claimed T370582: Remove or replace deployment-mwmaint02.deployment-prep.eqiad1.wikimedia.cloud.
Jul 20 2024, 2:09 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan created T370582: Remove or replace deployment-mwmaint02.deployment-prep.eqiad1.wikimedia.cloud.
Jul 20 2024, 2:09 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan claimed T361386: Remove or replace deployment-parsoid12.deployment-prep.eqiad1.wikimedia.cloud.

deployment-parsoid14 has been installed with a Bullseye image.

Jul 20 2024, 2:04 PM · cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan updated subscribers of T370458: Remove or replace poolcounter06.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation).

Upgrade to bullseye/bookworm blocked due to T332015. @MoritzMuehlenhoff, can I help you to get poolcounter-prometheus-exporter imported to bullseye and/or bookworm (preferably both)?

Jul 20 2024, 1:43 PM · cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan claimed T370458: Remove or replace poolcounter06.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation).
Jul 20 2024, 1:38 PM · cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan added a comment to T370459: Remove or replace deployment-push-notifications01.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation).

^ after merging this change, deployment-push-notifications01 can be replaced with a Bookworm instance.

Jul 20 2024, 1:34 PM · Patch-For-Review, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, cloud-services-team
Southparkfan claimed T370459: Remove or replace deployment-push-notifications01.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation).
Jul 20 2024, 1:25 PM · Patch-For-Review, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, cloud-services-team
Southparkfan added a comment to T361387: Replace or delete deployment-mediawiki[11-12].deployement-prep.eqiad1.wikimedia.cloud.

mediawiki11 and mediawiki12 are no longer in use, but still receive scap deployments. As soon as the two changes above have been merged, we can delete these instances.

Jul 20 2024, 1:19 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, cloud-services-team
Southparkfan updated subscribers of T369915: Rebuild or delete deployment-docker-mobileapps01.

@Jgiannelos I see you have created mobileapps02 with a Bullseye image. Is mobileapps01 ready for removal?

Jul 20 2024, 12:32 PM · cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure

Jul 19 2024

Southparkfan closed T370487: Remove or replace deployment-jobrunner04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation), a subtask of T327742: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm, as Resolved.
Jul 19 2024, 2:50 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan closed T370487: Remove or replace deployment-jobrunner04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation) as Resolved.
Jul 19 2024, 2:49 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan added a comment to T370487: Remove or replace deployment-jobrunner04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation).

Done (volume has been deleted as well)

Jul 19 2024, 2:48 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan closed T369919: Replace deployment-ircd02 with a Bullseye or Bookworm host as Resolved.

Done :)

Jul 19 2024, 2:39 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, cloud-services-team
Southparkfan closed T369919: Replace deployment-ircd02 with a Bullseye or Bookworm host, a subtask of T327742: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm, as Resolved.
Jul 19 2024, 2:39 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan added a comment to T370487: Remove or replace deployment-jobrunner04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation).

deployment-jobrunner04 has been shut down (had to reboot due to errors in Jenkins). As soon as https://gerrit.wikimedia.org/r/1055394 and https://gerrit.wikimedia.org/r/1055412 are merged, we can delete that instance.

Jul 19 2024, 11:09 AM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan created T370510: Increase RAM and volume quotas for deployment-prep.
Jul 19 2024, 10:43 AM · Beta-Cluster-Infrastructure, Cloud-VPS (Quota-requests)
Southparkfan claimed T361387: Replace or delete deployment-mediawiki[11-12].deployement-prep.eqiad1.wikimedia.cloud.
Jul 19 2024, 10:31 AM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, cloud-services-team
Southparkfan updated subscribers of T369914: Rebuild or delete deployment-docker-cpjobqueue01.

Instance is offline, seems to be superseded by deployment-changeprop-1.deployment-prep.eqiad1.wikimedia.cloud per T357476#9540192. @Urbanecm_WMF, do you agree we can delete deployment-docker-cpjobqueue01?

Jul 19 2024, 9:22 AM · Cloud-VPS (Debian Buster Deprecation), cloud-services-team, Beta-Cluster-Infrastructure
Southparkfan closed T369913: Rebuild or delete deployment-docker-changeprop01 as Invalid.

Instance does not exist anymore?

Jul 19 2024, 9:22 AM · cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan closed T369913: Rebuild or delete deployment-docker-changeprop01, a subtask of T327742: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm, as Invalid.
Jul 19 2024, 9:20 AM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan changed the status of T369919: Replace deployment-ircd02 with a Bullseye or Bookworm host from Open to In Progress.

After merging https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/1055306, the irc.beta.wmflabs.org RRset can be removed, and the floating IP can be removed from deployment-ircd03 as well; if you want to test the IRC server, you can run irssi on a Cloud VPS instance.

Jul 19 2024, 9:16 AM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, cloud-services-team
Southparkfan changed the status of T369919: Replace deployment-ircd02 with a Bullseye or Bookworm host, a subtask of T327742: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm, from Open to In Progress.
Jul 19 2024, 9:16 AM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure

Jul 18 2024

Southparkfan updated the task description for T327742: Migrate deployment-prep away from Debian Buster to Bullseye/Bookworm.
Jul 18 2024, 11:28 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan claimed T370487: Remove or replace deployment-jobrunner04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation).
Jul 18 2024, 11:27 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan created T370487: Remove or replace deployment-jobrunner04.deployment-prep.eqiad1.wikimedia.cloud (Buster deprecation).
Jul 18 2024, 11:27 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan added a comment to T369913: Rebuild or delete deployment-docker-changeprop01.
Jul 18 2024, 11:22 PM · cloud-services-team, Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure
Southparkfan added a comment to T369919: Replace deployment-ircd02 with a Bullseye or Bookworm host.

Loop fixed by setting profile::base::remove_python2_on_bullseye: false on prefix level, also done in production.

Jul 18 2024, 10:25 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, cloud-services-team
Southparkfan added a comment to T369919: Replace deployment-ircd02 with a Bullseye or Bookworm host.

role::mw_rc_irc seems to work fine on a Bullseye box, except for a loop.

Jul 18 2024, 10:20 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, cloud-services-team
Southparkfan claimed T369919: Replace deployment-ircd02 with a Bullseye or Bookworm host.
Jul 18 2024, 9:43 PM · Cloud-VPS (Debian Buster Deprecation), Beta-Cluster-Infrastructure, cloud-services-team

Jun 14 2024

Southparkfan added a comment to T367522: Cloud VPS "auditlogging" project Buster deprecation.

Relevant: T127717#9671526 (i.e. can be deleted without issues, but preferably only before other instances are deleted)

Jun 14 2024, 8:00 PM · Cloud-VPS (Debian Buster Deprecation)

May 10 2024

Southparkfan added a comment to T351418: Upgrade from ISC-DHCP Server to KEA-DHCP Server.

Thanks for your help, Riccardo! Given current time constraints, I'm afraid most of this work will take multiple months, but nevertheless, to see whether kea_python still works with the Kea packages provided by Debian, I felt it was time to bite the bullet and build the bindings manually.

May 10 2024, 5:08 PM · Infrastructure-Foundations

Apr 17 2024

Southparkfan added a comment to T351418: Upgrade from ISC-DHCP Server to KEA-DHCP Server.

Thank you for your reply! My comments:

[...]

  1. The current setup that by default doesn't offer an IP to DHCP requests is by design, in response to data loss on servers that rebooted into PXE by error (the force PXE bit didn't get cleared by the BIOS or similar), see T251416 for context. This is easily fixable having a Netbox field that is empty by default and the reimage/dhcp cookbooks will set it to the PXE image to be installed next and will reset it after the DHCP step is done. The automation should surely not give PXE fields if that field is not set, probably not even an IP for safety reasons (to be discussed).

As I understand it, no server in production VLANs (that is: starting with {analytics,private,public} - excluding frack infrastructure?) should rely on DHCP for any purpose other than reimaging, because the IPv4 address will be set statically in d-i. For that reason, I can see why we would like to refuse DHCP requests if no syslinux path is provided by NetBox. I wouldn't classify it as a security measure against malevolent administrators, but rather as a failsafe to mitigate the impact of operator error.
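The failsafe could be sketched as below: a host only gets a DHCP answer while a reimage is pending. The `Host` class and the `pxe_image` field are hypothetical stand-ins for whatever Netbox field ends up being used, and this is not Kea hook code. Whether to withhold the IP address as well is still "to be discussed" in the quoted proposal; the sketch assumes the stricter option.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Host:
    name: str
    ip: str
    # Hypothetical Netbox field: empty by default, set by the reimage
    # cookbook and reset once the DHCP step is done.
    pxe_image: Optional[str] = None


def dhcp_response(host):
    """Return the DHCP options for a host, or None to ignore the request."""
    if not host.pxe_image:
        # No reimage pending: stay silent, so a server that wrongly
        # rebooted into PXE cannot start an installer and wipe its disks.
        return None
    return {"ip": host.ip, "boot_file": host.pxe_image}


idle = Host("example1001", "192.0.2.10")
reimaging = Host("example1002", "192.0.2.11", pxe_image="bookworm-installer/pxelinux.0")
print(dhcp_response(idle))
print(dhcp_response(reimaging))
```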

Apr 17 2024, 11:47 AM · Infrastructure-Foundations

Apr 9 2024

Southparkfan updated subscribers of T351418: Upgrade from ISC-DHCP Server to KEA-DHCP Server.

@ayounsi and I have discussed my first findings, and we thought it made sense to share them here.

Apr 9 2024, 7:41 PM · Infrastructure-Foundations

Mar 28 2024

Southparkfan added a comment to T127717: Move Cloud VPS auth.logs to central logging.

@Southparkfan We're trying to reduce use of Buster in cloud-vps, and two servers in 'auditlogging' are running Buster: syslog-server-04 and syslog-client04. My recollection is that they're redundant now that server-05 and client05 exist (and are running bookworm) -- is that right? Can the 04 VMs be removed?

The purpose of having syslog servers on multiple operating systems is to verify compatibility. As you might have seen, sometimes, rsyslog requires OS-specific changes to work properly.

If you don't mind potentially breaking Buster compatibility in the future, or if we should remove support right away, then these servers are OK to go.

That's a good point. We'll save these for a bit later in the Buster deprecation cycle. Thanks!

Mar 28 2024, 11:43 PM · Cloud-VPS, cloud-services-team, User-dcaro, Sustainability (Incident Followup)
Southparkfan added a comment to T127717: Move Cloud VPS auth.logs to central logging.

@Southparkfan We're trying to reduce use of Buster in cloud-vps, and two servers in 'auditlogging' are running Buster: syslog-server-04 and syslog-client04. My recollection is that they're redundant now that server-05 and client05 exist (and are running bookworm) -- is that right? Can the 04 VMs be removed?

Mar 28 2024, 8:53 PM · Cloud-VPS, cloud-services-team, User-dcaro, Sustainability (Incident Followup)

Mar 27 2024

Southparkfan added a comment to T351418: Upgrade from ISC-DHCP Server to KEA-DHCP Server.

Haven't made a lot of progress on this, unfortunately. Scheduled for April.

Mar 27 2024, 12:46 PM · Infrastructure-Foundations

Nov 20 2023

Southparkfan claimed T351418: Upgrade from ISC-DHCP Server to KEA-DHCP Server.

I'll work on this.

Nov 20 2023, 12:49 PM · Infrastructure-Foundations

Nov 15 2023

Southparkfan added a comment to T351181: syslog tls clients failing to connect to centrallog2002 post puppet7 migration.

Production migration from the gnutls driver to the openssl driver can be tracked in T324623.

Nov 15 2023, 3:05 PM · Patch-For-Review, SRE-tools, Puppet-Core, Puppet (Puppet 7.0), Infrastructure-Foundations, SRE
Southparkfan added a parent task for T324623: Switch rsyslog from gtls to ossl: T351181: syslog tls clients failing to connect to centrallog2002 post puppet7 migration.
Nov 15 2023, 3:01 PM · User-MoritzMuehlenhoff, Cloud-VPS, cloud-services-team, Patch-For-Review, SRE, observability, User-dcaro
Southparkfan added a subtask for T351181: syslog tls clients failing to connect to centrallog2002 post puppet7 migration: T324623: Switch rsyslog from gtls to ossl.
Nov 15 2023, 3:01 PM · Patch-For-Review, SRE-tools, Puppet-Core, Puppet (Puppet 7.0), Infrastructure-Foundations, SRE

Nov 3 2023

Southparkfan updated Southparkfan.
Nov 3 2023, 8:03 PM

Oct 13 2023

Southparkfan added a comment to T348837: Investigate IPVS IPIP encapsulation support.

Alternative to consider: injecting REDIRECTs for traffic meant for a VIP. See the second section at http://www.linuxvirtualserver.org/docs/arp.html. I haven't tested it and it requires some sort of Netfilter implementation on the realservers, but it avoids MTU-related issues (when tunneling traffic). Never mind, the ARP problem is solved at Wikimedia by not announcing ARP. MTU is a challenge when using any type of encapsulation (in this case IPIP), but that's a different issue :)
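To make the MTU concern concrete, here is a rough calculation of the overhead IPIP adds. It assumes IPv4-in-IPv4 with no IP or TCP options, and the 1500-byte link MTU is only an example value, not a statement about the actual Wikimedia network:

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
TCP_HEADER_BYTES = 20   # TCP header without options


def ipip_inner_mtu(link_mtu: int) -> int:
    """Largest inner packet that fits once the outer IPv4 header is added."""
    return link_mtu - IPV4_HEADER_BYTES


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload for a given IPv4 MTU (no options assumed)."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


link_mtu = 1500  # example value only
inner_mtu = ipip_inner_mtu(link_mtu)
print(inner_mtu)           # 1480
print(tcp_mss(link_mtu))   # 1460
print(tcp_mss(inner_mtu))  # 1440
```

Under these assumptions, a full-size 1500-byte packet no longer fits inside the tunnel, which is why encapsulation usually goes together with a larger link MTU or a lower MSS.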

Oct 13 2023, 9:34 AM · Patch-For-Review, SRE, Traffic

Oct 3 2023

Southparkfan added a subtask for T348075: Ingest Cloud VPS audit logs into production logging pipeline: T127717: Move Cloud VPS auth.logs to central logging.
Oct 3 2023, 10:29 PM · Security, Cloud-VPS, observability
Southparkfan added a parent task for T127717: Move Cloud VPS auth.logs to central logging: T348075: Ingest Cloud VPS audit logs into production logging pipeline.
Oct 3 2023, 10:29 PM · Cloud-VPS, cloud-services-team, User-dcaro, Sustainability (Incident Followup)
Southparkfan created T348075: Ingest Cloud VPS audit logs into production logging pipeline.
Oct 3 2023, 10:28 PM · Security, Cloud-VPS, observability

Aug 5 2023

Southparkfan created T343628: Cannot set up standalone puppetmaster due to stray ruby process at port 8140.
Aug 5 2023, 1:50 PM · Cloud-VPS

May 12 2023

Southparkfan added a comment to T336428: Reimaging lvs2012 fails as the host is unreachable from cumin2002.

Ok I think I see what the issue is. Looking at the kernel docs they state that "the max value from conf/{all,interface}/rp_filter is used when doing source validation on the {interface}."

This effectively means that this setting:

net.ipv4.conf.all.rp_filter = 1

Nullifies the per-interface setting on eno12399np0:

net.ipv4.conf.eno12399np0.rp_filter = 0

(...)
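The rule quoted from the kernel docs can be sketched in a few lines. The function name is made up for illustration; the mode numbers follow the kernel's ip-sysctl documentation (0 = no source validation, 1 = strict, 2 = loose):

```python
def effective_rp_filter(all_value: int, iface_value: int) -> int:
    """Return the rp_filter mode applied on an interface.

    The kernel uses the maximum of conf/all/rp_filter and
    conf/<interface>/rp_filter when doing source validation.
    """
    return max(all_value, iface_value)


# Values from the comment above: all = 1, eno12399np0 = 0.
print(effective_rp_filter(1, 0))  # 1: strict mode still applies
```

Because the maximum wins, setting the per-interface value to 0 has no effect while `all` is 1; disabling validation on one interface would require `all` to be 0 as well.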

May 12 2023, 6:58 PM · SRE, Infrastructure-Foundations, Traffic

Feb 1 2023

Southparkfan added a comment to T127717: Move Cloud VPS auth.logs to central logging.

I have expanded https://wikitech.wikimedia.org/wiki/Portal:Cloud_VPS/Admin/Auth_logging. The 'known limitations' section shows there is enough work to do, but to avoid a never-ending task, I am fine with resolving this task when T127717#8505600 has been applied to Cloud VPS. I find the lack of monitoring to be a blocker too, though.

Feb 1 2023, 7:30 PM · Cloud-VPS, cloud-services-team, User-dcaro, Sustainability (Incident Followup)

Dec 14 2022

Southparkfan added a comment to T325128: git: detected dubious ownership in repository at '/srv/mediawiki-staging'.

Standalone puppetmasters are also affected by this Git update:

$ git push -f project_puppetmaster HEAD:production
Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
remote: fatal: detected dubious ownership in repository at '/var/lib/git/operations/puppet'
remote: To add an exception for this directory, call:
remote: 
remote: 	git config --global --add safe.directory /var/lib/git/operations/puppet
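As a rough illustration of what triggers this error: newer Git versions refuse to operate on a repository directory owned by a different user than the one running Git. The sketch below is a simplified approximation of that ownership check for Unix systems, not Git's actual implementation, and the function name is hypothetical:

```python
import os
import tempfile


def owned_by_current_user(path: str) -> bool:
    """Approximate the ownership check: directory owner == effective UID."""
    return os.stat(path).st_uid == os.geteuid()


# A directory we create ourselves is owned by us, so it passes the check.
with tempfile.TemporaryDirectory() as repo_dir:
    print(owned_by_current_user(repo_dir))  # True
```

On a standalone puppetmaster, the push is presumably handled by a different user than the owner of /var/lib/git/operations/puppet, which is the situation the `safe.directory` exception in the output above is meant for.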
Dec 14 2022, 5:45 PM · Release-Engineering-Team (Radar), SRE, Beta-Cluster-Infrastructure

Dec 7 2022

Southparkfan added a comment to T324623: Switch rsyslog from gtls to ossl.

I have tested https://gerrit.wikimedia.org/r/c/operations/puppet/+/865731 by using rsyslog-openssl on one syslog client and one syslog server running buster, plus one syslog client and one syslog server running bullseye. All works as expected.

Dec 7 2022, 8:05 PM · User-MoritzMuehlenhoff, Cloud-VPS, cloud-services-team, Patch-For-Review, SRE, observability, User-dcaro
Southparkfan added a comment to T127717: Move Cloud VPS auth.logs to central logging.

Status: we chose #3 (Let's Encrypt via acme-chief). We've gotten stuck on a bug in the gnutls driver for rsyslog: T324623

Dec 7 2022, 12:16 AM · Cloud-VPS, cloud-services-team, User-dcaro, Sustainability (Incident Followup)