bking (Brian King)
Senior Site Reliability Engineer, Search Platform Team

User Details

User Since: Dec 15 2021, 9:19 PM (158 w, 6 d)
Availability: Available
LDAP User: Unknown
MediaWiki User: BKing (WMF) [ Global Accounts ]

Recent Activity

Fri, Dec 20

bking added a comment to T374967: wdqs-categories migration: decide where to migrate.

Note that if we do want to run in Kubernetes, we need to build a wdqs image, similar to what WMDE does here

Fri, Dec 20, 6:01 PM · Wikidata, Wikidata-Query-Service, Data-Platform-SRE
bking added a comment to T382601: Object storage quota increase request for search project.

Thanks for the update @Andrew . Per IRC conversation, I've changed this request from the wikidata-query project to search, as that project doesn't have a hyphen. Let us know if you are able to fulfill this request.

Fri, Dec 20, 5:42 PM · Wikidata, Cloud-VPS (Quota-requests), Wikidata-Query-Service, Data-Platform-SRE
bking updated the task description for T382601: Object storage quota increase request for search project.
Fri, Dec 20, 5:42 PM · Wikidata, Cloud-VPS (Quota-requests), Wikidata-Query-Service, Data-Platform-SRE
bking renamed T382601: Object storage quota increase request for search project from Object storage quota increase request for wikidata-query project to Object storage quota increase request for search project.
Fri, Dec 20, 5:40 PM · Wikidata, Cloud-VPS (Quota-requests), Wikidata-Query-Service, Data-Platform-SRE
bking edited projects for T382601: Object storage quota increase request for search project, added: Cloud-VPS (Quota-requests); removed Cloud-Services.
Fri, Dec 20, 4:13 PM · Wikidata, Cloud-VPS (Quota-requests), Wikidata-Query-Service, Data-Platform-SRE
bking updated the task description for T382601: Object storage quota increase request for search project.
Fri, Dec 20, 4:13 PM · Wikidata, Cloud-VPS (Quota-requests), Wikidata-Query-Service, Data-Platform-SRE
bking created T382601: Object storage quota increase request for search project.

The Cloud-Services project tag is not intended to have any tasks. Please check the list on https://phabricator.wikimedia.org/project/profile/832/ and replace it with a more specific project tag to this task. Thanks!

Fri, Dec 20, 4:11 PM · Wikidata, Cloud-VPS (Quota-requests), Wikidata-Query-Service, Data-Platform-SRE

Thu, Dec 19

bking created T382542: Improve pybal config validation.
Thu, Dec 19, 9:12 PM · Data-Platform-SRE, PyBal
bking closed T381241: ProbeDown - Search Platform still getting these tickets, we need this to stop as Resolved.

I checked the alert files on the Prometheus hosts using cumin:

Thu, Dec 19, 3:45 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20)

Wed, Dec 18

bking added a comment to T374916: Port Categories lag / ping checks to Prometheus/Alertmanager.

We're now shipping the metrics correctly (thanks volans and dcausse ).

Wed, Dec 18, 3:38 PM · Patch-For-Review, Data-Platform-SRE (2024.11.30 - 2024.12.20), SRE Observability (FY2024/2025-Q2), MW-1.43-notes (1.43.0-wmf.25; 2024-10-01), Wikidata, Discovery-Search, Wikidata-Query-Service, Observability-Alerting

Tue, Dec 17

bking added a comment to T374916: Port Categories lag / ping checks to Prometheus/Alertmanager.

I ran a pcap on wdqs1011 and I can confirm that the prometheus hosts are scraping the correct port.

Tue, Dec 17, 10:38 PM · Patch-For-Review, Data-Platform-SRE (2024.11.30 - 2024.12.20), SRE Observability (FY2024/2025-Q2), MW-1.43-notes (1.43.0-wmf.25; 2024-10-01), Wikidata, Discovery-Search, Wikidata-Query-Service, Observability-Alerting
bking added a comment to T374916: Port Categories lag / ping checks to Prometheus/Alertmanager.

Per today's DPE SRE standup, I've grabbed this ticket and will try to move it forward.

Tue, Dec 17, 9:20 PM · Patch-For-Review, Data-Platform-SRE (2024.11.30 - 2024.12.20), SRE Observability (FY2024/2025-Q2), MW-1.43-notes (1.43.0-wmf.25; 2024-10-01), Wikidata, Discovery-Search, Wikidata-Query-Service, Observability-Alerting
bking claimed T374916: Port Categories lag / ping checks to Prometheus/Alertmanager.
Tue, Dec 17, 8:59 PM · Patch-For-Review, Data-Platform-SRE (2024.11.30 - 2024.12.20), SRE Observability (FY2024/2025-Q2), MW-1.43-notes (1.43.0-wmf.25; 2024-10-01), Wikidata, Discovery-Search, Wikidata-Query-Service, Observability-Alerting
bking closed T364233: add https://imagehash-sparql.wmcloud.org/sparql endpoint to wikidata federated query whitelists, a subtask of T197530: [tracking] federation query issues on Wikidata Query Server, as Resolved.
Tue, Dec 17, 7:08 PM · Discovery-ARCHIVED, Wikidata-Query-Service, Wikidata, Tracking-Neverending
bking closed T364233: add https://imagehash-sparql.wmcloud.org/sparql endpoint to wikidata federated query whitelists as Resolved.

@Zache I ran your example query against commons-query and it appears to be giving a response. As such, I'm going to resolve this ticket. That being said: if it does not work, respond here and the ticket will re-open.

Tue, Dec 17, 7:08 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), WMFI-imagehash, Wikidata, Wikimedia-Hackathon-2024, Wikidata-Query-Service
bking claimed T364233: add https://imagehash-sparql.wmcloud.org/sparql endpoint to wikidata federated query whitelists.
Tue, Dec 17, 6:56 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), WMFI-imagehash, Wikidata, Wikimedia-Hackathon-2024, Wikidata-Query-Service
bking moved T378757: Jupyter and Analytics Client Enhancements Phase 1: Ensure acceptable storage performance with CephFS from Backlog - project to In Progress on the Data-Platform-SRE (2024.11.30 - 2024.12.20) board.
Tue, Dec 17, 3:37 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20)

Mon, Dec 16

bking added a comment to T381919: Supermicro: unable to set boot order after using Redfish to boot once.

Per IRC conversation with @elukey , I just wanted to let y'all know that I successfully reimaged cloudelastic1012 just now. No Puppet 5, no CSR or any other errors.

Mon, Dec 16, 9:49 PM · Infrastructure-Foundations
bking claimed T376150: Prepare hosts to serve wdqs-internal-main & wdqs-internal-scholarly.
Mon, Dec 16, 8:35 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Wikidata
bking moved T376150: Prepare hosts to serve wdqs-internal-main & wdqs-internal-scholarly from Backlog - project to In Progress on the Data-Platform-SRE (2024.11.30 - 2024.12.20) board.
Mon, Dec 16, 8:35 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Wikidata
bking added a comment to T378757: Jupyter and Analytics Client Enhancements Phase 1: Ensure acceptable storage performance with CephFS.

Since benchmarking is not my forte, I've created a benchmarking plan that sticks as closely as possible to Brendan Gregg's recommendations . Feel free to review/update the document as necessary.

Mon, Dec 16, 7:11 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20)
bking changed the status of T378757: Jupyter and Analytics Client Enhancements Phase 1: Ensure acceptable storage performance with CephFS, a subtask of T260386: Improve the JupyterHub services and use CAS/SSO, from Open to In Progress.
Mon, Dec 16, 7:05 PM · Data-Platform-SRE, Epic, Data-Engineering, Data-Engineering-Jupyter, User-MoritzMuehlenhoff, CAS-SSO
bking changed the status of T378757: Jupyter and Analytics Client Enhancements Phase 1: Ensure acceptable storage performance with CephFS, a subtask of T378735: Jupyter and Analytics Client Enhancements Phase 1: enable shared home directories on the stat servers umbrella task, from Open to In Progress.
Mon, Dec 16, 7:05 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Ceph
bking changed the status of T378757: Jupyter and Analytics Client Enhancements Phase 1: Ensure acceptable storage performance with CephFS from Open to In Progress.
Mon, Dec 16, 7:05 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20)

Thu, Dec 12

bking moved T371994: Deploy the HDFS synchronizer (blunderbuss) service to the dse-k8s cluster from In Progress to Blocked/Waiting on the Data-Platform-SRE (2024.11.30 - 2024.12.20) board.

@Jelto see above, Puppet laid down the ferm rules when the patch you reviewed was merged, but ferm didn't actually load them, even when I reloaded ferm.service.

Thu, Dec 12, 9:47 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Patch-For-Review, Data-Engineering (Q2 2024 October 1st - December 31th)
bking edited P71709 blunderbuss curls.
Thu, Dec 12, 8:53 PM · Data-Platform-SRE
bking updated the title for P71709 blunderbuss curls from blunderbuss curl from cumin2002 to blunderbuss curls.
Thu, Dec 12, 8:53 PM · Data-Platform-SRE
bking created P71709 blunderbuss curls.
Thu, Dec 12, 8:23 PM · Data-Platform-SRE

Tue, Dec 10

bking updated the task description for T381918: Monitor commons-query.wikidata.org.
Tue, Dec 10, 7:55 PM · Data-Platform-SRE
bking created T381918: Monitor commons-query.wikidata.org.
Tue, Dec 10, 7:53 PM · Data-Platform-SRE
bking reopened T350793: move commons-query.wikimedia.org and query.wikidata.org to kubernetes as "In Progress".
Tue, Dec 10, 7:40 PM · Patch-For-Review, Data-Platform-SRE (2024.11.30 - 2024.12.20), User-ItamarWMDE, Wikidata, wmde-wikidata-tech, Wikidata Query UI, GitLab (Pipeline Services Migration🐤), collaboration-services
bking reopened T350793: move commons-query.wikimedia.org and query.wikidata.org to kubernetes, a subtask of T300171: Move micro sites from Ganeti to Kubernetes and from Gerrit to GitLab, as In Progress.
Tue, Dec 10, 7:40 PM · Patch-For-Review, GitLab (Pipeline Services Migration🐤), collaboration-services
bking closed T350793: move commons-query.wikimedia.org and query.wikidata.org to kubernetes as Invalid.
Tue, Dec 10, 7:39 PM · Patch-For-Review, Data-Platform-SRE (2024.11.30 - 2024.12.20), User-ItamarWMDE, Wikidata, wmde-wikidata-tech, Wikidata Query UI, GitLab (Pipeline Services Migration🐤), collaboration-services
bking closed T350793: move commons-query.wikimedia.org and query.wikidata.org to kubernetes, a subtask of T300171: Move micro sites from Ganeti to Kubernetes and from Gerrit to GitLab, as Invalid.
Tue, Dec 10, 7:39 PM · Patch-For-Review, GitLab (Pipeline Services Migration🐤), collaboration-services
bking lowered the priority of T350793: move commons-query.wikimedia.org and query.wikidata.org to kubernetes from Medium to Low.
Tue, Dec 10, 7:39 PM · Patch-For-Review, Data-Platform-SRE (2024.11.30 - 2024.12.20), User-ItamarWMDE, Wikidata, wmde-wikidata-tech, Wikidata Query UI, GitLab (Pipeline Services Migration🐤), collaboration-services
bking moved T350793: move commons-query.wikimedia.org and query.wikidata.org to kubernetes from Blocked/Waiting to In Progress on the Data-Platform-SRE (2024.11.30 - 2024.12.20) board.
Tue, Dec 10, 5:56 PM · Patch-For-Review, Data-Platform-SRE (2024.11.30 - 2024.12.20), User-ItamarWMDE, Wikidata, wmde-wikidata-tech, Wikidata Query UI, GitLab (Pipeline Services Migration🐤), collaboration-services
bking updated Other Assignee for T350793: move commons-query.wikimedia.org and query.wikidata.org to kubernetes, added: bking.
Tue, Dec 10, 5:56 PM · Patch-For-Review, Data-Platform-SRE (2024.11.30 - 2024.12.20), User-ItamarWMDE, Wikidata, wmde-wikidata-tech, Wikidata Query UI, GitLab (Pipeline Services Migration🐤), collaboration-services
bking added a comment to T350793: move commons-query.wikimedia.org and query.wikidata.org to kubernetes.

Hello. This is Brian and I'm an SRE for the Search Platform team.

Tue, Dec 10, 5:53 PM · Patch-For-Review, Data-Platform-SRE (2024.11.30 - 2024.12.20), User-ItamarWMDE, Wikidata, wmde-wikidata-tech, Wikidata Query UI, GitLab (Pipeline Services Migration🐤), collaboration-services
bking closed T379330: Create pybal pools for wdqs-internal-main and wdqs-internal-scholarly, a subtask of T376150: Prepare hosts to serve wdqs-internal-main & wdqs-internal-scholarly, as Resolved.
Tue, Dec 10, 4:49 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Wikidata
bking closed T379330: Create pybal pools for wdqs-internal-main and wdqs-internal-scholarly as Resolved.

@RKemper I believe this one is finished, but not 100% sure. Moving to "Needs Review" so you can take a look. Feel free to close if everything is done.

Tue, Dec 10, 4:49 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Wikidata
bking updated Other Assignee for T379330: Create pybal pools for wdqs-internal-main and wdqs-internal-scholarly, added: bking.
Tue, Dec 10, 4:47 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Wikidata
bking assigned T379023: Create WDQS split endpoints for internal traffic and reconfigure clients to use the new endpoints to RKemper.
Tue, Dec 10, 4:47 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Wikidata

Mon, Dec 9

bking added a comment to T378368: Q2:rack/setup/install cloudelastic101[12].

@elukey I'm fine with focusing our efforts on UEFI, it seems like the best use of our time. Ping me in IRC if I can do anything to help test.

Mon, Dec 9, 7:22 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), SRE, ops-eqiad, Discovery-Search, DC-Ops
bking moved T378030: Q2:rack/setup/install wdqs102[567] from Backlog - operations to Done on the Data-Platform-SRE (2024.11.30 - 2024.12.20) board.

It took a few tries, but wdqs1025 is now running off UEFI. I left some notes here about my experience. Closing...

Mon, Dec 9, 6:29 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), wmde-wikidata-tech, Wikidata, Wikidata-Query-Service, SRE, Discovery-Search, ops-eqiad, DC-Ops

Wed, Dec 4

bking added a comment to P71557 conftool diff between known good gerrit change Ifdd752d043283d80bee805be9fab2aef433477c8.
diff -U0 _srv_config-master_pybal_codfw_wdqs-internal-scholarly.tmpl _srv_config-master_pybal_codfw_wdqs-scholarly.tmpl
-{{range $node := ls "/pools/codfw/wdqs-internal-scholarly/wdqs/"}}{{ $key := printf "/pools/codfw/wdqs-internal-scholarly/wdqs/%s" $node }}{{ $data := json (getv $key) }}
+{{range $node := ls "/pools/codfw/wdqs-scholarly/wdqs-scholarly/"}}{{ $key := printf "/pools/codfw/wdqs-scholarly/wdqs-scholarly/%s" $node }}{{ $data := json (getv $key) }}
Wed, Dec 4, 7:28 PM · Data-Platform-SRE
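
The two template lines in the diff above differ only in the path passed to `ls`. As a rough illustration (not part of the original paste), the sketch below extracts that path from each line and reports which segments differ. The function names are hypothetical, and the idea that the segments after `/pools/` are datacenter, cluster, and service is an assumption made for readability.

```python
import re

# Matches the path argument of the template's `ls "..."` call.
LS_RE = re.compile(r'ls "(/pools/[^"]+)"')


def pool_path_parts(template_line):
    """Return the segments of the path passed to ls, or None if absent."""
    match = LS_RE.search(template_line)
    if match is None:
        return None
    return match.group(1).strip("/").split("/")


def differing_segments(old_line, new_line):
    """Return (index, old, new) for each path segment that differs."""
    old_parts = pool_path_parts(old_line) or []
    new_parts = pool_path_parts(new_line) or []
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(old_parts, new_parts))
        if a != b
    ]


old = '{{range $node := ls "/pools/codfw/wdqs-internal-scholarly/wdqs/"}}'
new = '{{range $node := ls "/pools/codfw/wdqs-scholarly/wdqs-scholarly/"}}'
for index, a, b in differing_segments(old, new):
    print(index, a, "->", b)
```

Applied to the two lines from the paste, this flags the third and fourth path segments, which is where the pool and service names diverge.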
bking created P71557 conftool diff between known good gerrit change Ifdd752d043283d80bee805be9fab2aef433477c8.
Wed, Dec 4, 7:24 PM · Data-Platform-SRE
bking moved T371061: Update CirrusSearch dashboards to use new metrics/refresh dashboards from Done to Needs Review on the Data-Platform-SRE (2024.11.30 - 2024.12.20) board.
Wed, Dec 4, 3:38 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Discovery-Search, CirrusSearch
bking closed T371061: Update CirrusSearch dashboards to use new metrics/refresh dashboards, a subtask of T359033: EPIC: Convert CirrusSearch metrics to statslib, as Resolved.
Wed, Dec 4, 3:38 PM · Data-Platform-SRE, MW-1.43-notes (1.43.0-wmf.13; 2024-07-09), Observability-Metrics, Epic, Discovery-Search (Current work), CirrusSearch
bking closed T371061: Update CirrusSearch dashboards to use new metrics/refresh dashboards, a subtask of T350597: Audit and prioritize metrics for conversion to statslib that are used for graphite-based alerting, as Resolved.
Wed, Dec 4, 3:38 PM · SRE Observability (FY2024/2025-Q1), User-fgiunchedi, Observability-Metrics
bking closed T371061: Update CirrusSearch dashboards to use new metrics/refresh dashboards as Resolved.

All dashboards have been migrated to Prometheus metrics. As such, I'm closing this ticket.

Wed, Dec 4, 3:38 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Discovery-Search, CirrusSearch
bking closed T371061: Update CirrusSearch dashboards to use new metrics/refresh dashboards, a subtask of T369148: Replace usage of StatsdDataFactory with StatsFactory , as Resolved.
Wed, Dec 4, 3:38 PM · Observability-Metrics, Discovery-Search (Current work), CirrusSearch
bking updated the task description for T371061: Update CirrusSearch dashboards to use new metrics/refresh dashboards.
Wed, Dec 4, 3:37 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Discovery-Search, CirrusSearch

Tue, Dec 3

bking added a comment to T378738: Jupyter and Analytics Client Enhancements Phase 1: Estimate storage needs/provision CephFS storage for user directories.

No problem...as you said, we can come back to this when the time is right.

Tue, Dec 3, 11:35 PM · Data-Platform-SRE
bking added a comment to T381241: ProbeDown - Search Platform still getting these tickets, we need this to stop.

Search Platform shouldn't be getting these alerts anymore; Data Platform SRE should be the sole responder. I thought I fixed this in T379182 , but it appears I need to take another look.

Tue, Dec 3, 4:46 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20)
bking changed the status of T381241: ProbeDown - Search Platform still getting these tickets, we need this to stop from Open to In Progress.
Tue, Dec 3, 4:44 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20)
bking renamed T381241: ProbeDown - Search Platform still getting these tickets, we need this to stop from ProbeDown to ProbeDown - Search Platform still getting these tickets, we need this to stop.
Tue, Dec 3, 4:43 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20)

Mon, Dec 2

bking moved T381283: wdqs1025 fails to PXE boot, NIC shows "no link" in DRAC web UI from Backlog - project to Blocked/Waiting on the Data-Platform-SRE (2024.11.30 - 2024.12.20) board.
Mon, Dec 2, 6:34 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), wmde-wikidata-tech, Wikidata, Wikidata-Query-Service, SRE, Discovery-Search, ops-eqiad, DC-Ops
bking renamed T381283: wdqs1025 fails to PXE boot, NIC shows "no link" in DRAC web UI from wdqs1025 fails to PXE boot to wdqs1025 fails to PXE boot, NIC shows "no link" in DRAC web UI.
Mon, Dec 2, 4:47 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), wmde-wikidata-tech, Wikidata, Wikidata-Query-Service, SRE, Discovery-Search, ops-eqiad, DC-Ops
bking created T381283: wdqs1025 fails to PXE boot, NIC shows "no link" in DRAC web UI.
Mon, Dec 2, 4:08 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), wmde-wikidata-tech, Wikidata, Wikidata-Query-Service, SRE, Discovery-Search, ops-eqiad, DC-Ops

Nov 27 2024

bking closed T379182: ProbeDown - wdqs1015/migrate LDF alerts to Data Platform SRE as Resolved.

Per the above merge, Data Platform SRE is the new recipient of these alerts. Search Platform will no longer receive them. Closing...

Nov 27 2024, 4:55 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29)
bking closed T219507: Create cookbook to reindex into elasticsearch / cirrus, a subtask of T203943: Spicerack cookbooks TODO list, as Resolved.
Nov 27 2024, 4:51 PM · SRE-tools, User-jijiki, User-Joe
bking closed T219507: Create cookbook to reindex into elasticsearch / cirrus, a subtask of T251149: [epic] Ryan's onboarding to the Search Platform team, as Resolved.
Nov 27 2024, 4:51 PM · Discovery-Search (Current work), Epic
bking closed T219507: Create cookbook to reindex into elasticsearch / cirrus as Resolved.
Nov 27 2024, 4:51 PM · Discovery-Search (Current work), Data-Platform-SRE (2024.11.09 - 2024.11.29), SRE-tools, SRE

Nov 26 2024

bking moved T378034: Q2:rack/setup/install elastic211[0-5] from Blocked/Waiting to Done on the Data-Platform-SRE (2024.11.09 - 2024.11.29) board.
Nov 26 2024, 10:49 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29), SRE, Discovery-Search, ops-codfw, DC-Ops
bking created T380939: decommission elastic20[55-60].
Nov 26 2024, 10:48 PM · decommission-hardware
bking created T380937: decommission cloudelastic100[5-6] : Don't decommission until we have cloudelastic101[12]!.
Nov 26 2024, 10:26 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), decommission-hardware
bking created T380934: decommission wdqs200[7-8].
Nov 26 2024, 10:15 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), decommission-hardware
bking added a comment to T380835: Exclude zram devices from disk health checks.

This is an interesting one...zram0 is a compressed RAMdisk, so it should not be in scope for any SMART (hard drive health) checks. I believe we are the first at WMF to use zRAM, so we'll probably need to find the SMART monitoring config and exclude zram devices.

Nov 26 2024, 7:47 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), sre-alert-triage
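
To illustrate the exclusion described in the comment above, here is a minimal sketch of filtering zram devices out of a list of block device names before any SMART check runs. It is not the actual WMF monitoring config: the function name and the `zramN` naming pattern are assumptions for this example.

```python
import re

# Assumption: zram devices are named zram0, zram1, ... as in the comment above.
ZRAM_RE = re.compile(r"^zram\d+$")


def smart_candidates(devices):
    """Return the device names that remain in scope for SMART checks."""
    return [name for name in devices if not ZRAM_RE.match(name)]


print(smart_candidates(["sda", "sdb", "zram0", "nvme0n1"]))
```

A real fix would live wherever the health check builds its device list; the point is only that compressed RAM disks have no SMART data to report, so they are dropped by name.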

Nov 25 2024

bking moved T219507: Create cookbook to reindex into elasticsearch / cirrus from Ready for Dev -- SRE/Ops to Needs Reporting on the Discovery-Search (Current work) board.
Nov 25 2024, 7:50 PM · Discovery-Search (Current work), Data-Platform-SRE (2024.11.09 - 2024.11.29), SRE-tools, SRE
bking edited projects for T219507: Create cookbook to reindex into elasticsearch / cirrus, added: Discovery-Search (Current work); removed Discovery-Search.
Nov 25 2024, 7:49 PM · Discovery-Search (Current work), Data-Platform-SRE (2024.11.09 - 2024.11.29), SRE-tools, SRE
bking moved T219507: Create cookbook to reindex into elasticsearch / cirrus from Backlog - project to Done on the Data-Platform-SRE (2024.11.09 - 2024.11.29) board.
Nov 25 2024, 7:49 PM · Discovery-Search (Current work), Data-Platform-SRE (2024.11.09 - 2024.11.29), SRE-tools, SRE
bking updated subscribers of T219507: Create cookbook to reindex into elasticsearch / cirrus.

Per IRC conversation with @dcausse , we now have an alternate way of reindexing that does not involve cookbooks . As such, we can close out this ticket.

Nov 25 2024, 7:49 PM · Discovery-Search (Current work), Data-Platform-SRE (2024.11.09 - 2024.11.29), SRE-tools, SRE
bking edited projects for T219507: Create cookbook to reindex into elasticsearch / cirrus, added: Data-Platform-SRE (2024.11.09 - 2024.11.29); removed Data-Platform-SRE.
Nov 25 2024, 7:49 PM · Discovery-Search (Current work), Data-Platform-SRE (2024.11.09 - 2024.11.29), SRE-tools, SRE
bking updated the task description for T371061: Update CirrusSearch dashboards to use new metrics/refresh dashboards.
Nov 25 2024, 7:38 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Discovery-Search, CirrusSearch
bking renamed T379182: ProbeDown - wdqs1015/migrate LDF alerts to Data Platform SRE from ProbeDown - wdqs1015 to ProbeDown - wdqs1015/migrate LDF alerts to Data Platform SRE.
Nov 25 2024, 5:51 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29)
bking closed T380346: ProbeDown as Invalid.

Closing as a duplicate of T379182

Nov 25 2024, 5:49 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29)
bking added a comment to T379182: ProbeDown - wdqs1015/migrate LDF alerts to Data Platform SRE.

This alert has cleared, although the WDQS hosts appear frequently in the "slow but successful probes" on the linked dashboards. If this keeps up, we might try restarting the services, but for now I don't think we need to take any actions on that front.

Nov 25 2024, 3:37 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29)
bking updated the task description for T378735: Jupyter and Analytics Client Enhancements Phase 1: enable shared home directories on the stat servers umbrella task.
Nov 25 2024, 3:22 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Ceph
bking added a comment to T378735: Jupyter and Analytics Client Enhancements Phase 1: enable shared home directories on the stat servers umbrella task.

I've created the cephs.home.meta and cephs.home.data pools as required by step 1 (ref https://wikimedia.slack.com/archives/C055QGPTC69/p1732203662967269 )

Nov 25 2024, 3:20 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Ceph
bking created T380752: Migrate Relforge to Opensearch.
Nov 25 2024, 2:44 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Patch-For-Review, Research, Discovery-Search (Current work)

Nov 22 2024

bking created P71118 Puppet-merge output T380555.
Nov 22 2024, 9:59 PM
bking updated Other Assignee for T379329: Create puppet config for wdqs-internal-main and wdqs-internal-scholarly roles, added: bking.
Nov 22 2024, 9:07 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Patch-For-Review, Wikidata
bking created T380608: Address categories migration for internal graph split endpoints.
Nov 22 2024, 3:32 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Wikidata, Data-Platform
bking created T380594: Inform current wdqs-internal consumers about new internal graph split endpoints.
Nov 22 2024, 2:35 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20), Wikidata, Wikidata-Query-Service

Nov 21 2024

bking created T380529: Bring elastic211[0-5] into production clusters.
Nov 21 2024, 6:48 PM · Data-Platform-SRE (2024.11.30 - 2024.12.20)

Nov 20 2024

bking created P71106 Kerberos actions T371994.
Nov 20 2024, 6:58 PM
bking added a comment to T303011: Automate Search Platform opensearch plugin Debian package build process.

I failed to re-open this in December, re-opening now for the same reasons.

Nov 20 2024, 6:20 PM · Data-Platform-SRE
bking renamed T303011: Automate Search Platform opensearch plugin Debian package build process from Automate elastic plugin pkg build process to Automate Search Platform opensearch plugin Debian package build process.
Nov 20 2024, 6:19 PM · Data-Platform-SRE
bking closed T380395: Build/release opensearch-plugins Debian package, a subtask of T372769: Adjust cirrus-integration-test-runner to boot opensearch, as Resolved.
Nov 20 2024, 6:04 PM · MW-1.44-notes (1.44.0-wmf.8; 2024-12-17), Patch-For-Review, Discovery-Search (Current work)
bking closed T380395: Build/release opensearch-plugins Debian package as Resolved.

As the tasks above have been completed, I'll go ahead and close out this ticket.

Nov 20 2024, 6:04 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29)
bking updated the task description for T380395: Build/release opensearch-plugins Debian package.
Nov 20 2024, 6:03 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29)
bking claimed T380395: Build/release opensearch-plugins Debian package.
Nov 20 2024, 5:51 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29)
bking created T380395: Build/release opensearch-plugins Debian package.
Nov 20 2024, 5:48 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29)

Nov 19 2024

bking closed T378835: Test 1G NIC compatibility, default to TFTP in sre.hosts.reimage cookbook as Resolved.

As @elukey and @Volans are addressing the issue in T363576 , I'm going to go ahead and close this one out. Thanks to you both for your help!

Nov 19 2024, 9:39 PM · DC-Ops, Infrastructure-Foundations
bking moved T380278: High priority: Disk space expansion on an-launcher1002 from Backlog - project to Done on the Data-Platform-SRE (2024.11.09 - 2024.11.29) board.
Nov 19 2024, 9:36 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29), SRE, ops-eqiad, DC-Ops
bking updated Other Assignee for T380278: High priority: Disk space expansion on an-launcher1002, added: bking.
Nov 19 2024, 9:35 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29), SRE, ops-eqiad, DC-Ops
bking added a comment to T380278: High priority: Disk space expansion on an-launcher1002.

Thank you very much @Jclark-ctr !

Nov 19 2024, 9:35 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29), SRE, ops-eqiad, DC-Ops
bking edited projects for T380278: High priority: Disk space expansion on an-launcher1002, added: Data-Platform-SRE (2024.11.09 - 2024.11.29); removed Data-Platform-SRE.
Nov 19 2024, 9:27 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29), SRE, ops-eqiad, DC-Ops
xcollazo awarded T380278: High priority: Disk space expansion on an-launcher1002 a Pterodactyl token.
Nov 19 2024, 8:31 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29), SRE, ops-eqiad, DC-Ops
bking created T380278: High priority: Disk space expansion on an-launcher1002.
Nov 19 2024, 3:05 PM · Data-Platform-SRE (2024.11.09 - 2024.11.29), SRE, ops-eqiad, DC-Ops