Nova Resource:Deployment-prep/SAL
2024-08-13
- 23:59 bd808: Added BetaDevOpsBot as a service account with admin rights for OpenTofu automation tasks
2024-08-12
- 21:15 bd808: Added BryanDavis (self) as a project admin
2024-07-24
- 16:02 Southparkfan: moved sessionstorage/kask from sessionstorage04 to sessionstorage06 T370461
2024-07-23
- 16:55 Southparkfan: cancel kask maintenance, not going to perform switchover yet, see https://phabricator.wikimedia.org/T370461
- 16:11 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 16:10 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 16:09 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 16:07 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 16:05 andrew@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=99)
- 16:05 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 15:57 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 15:55 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 15:54 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 15:52 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 15:52 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 15:50 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 15:50 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 15:48 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 15:48 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 15:46 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 15:45 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 15:43 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 15:43 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 15:41 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 15:41 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 15:39 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 15:38 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 15:36 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 15:35 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 15:34 Southparkfan: starting kask maintenance - T370461
- 15:33 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 15:33 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 15:31 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 15:31 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 15:29 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 15:28 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 15:26 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 15:26 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 15:24 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 15:24 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 15:22 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 15:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 15:12 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 15:11 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 15:09 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
2024-07-22
- 17:07 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 17:05 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 17:02 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 17:00 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 15:30 Southparkfan: remove deployment-push-notifications01 - T370459
- 15:11 Southparkfan: remove deployment-parsoid12 - T361386
- 15:10 Southparkfan: remove deployment-mwmaint02 T370582
- 15:02 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 15:00 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 15:00 Southparkfan: delete deployment-urldownloader03 T370466
- 14:57 Southparkfan: delete deployment-mediawiki11 and deployment-mediawiki12 (incl PuppetDB data + volumes) T361387
- 14:43 Southparkfan: fix /srv/git/operations/puppet yet again (T364492) via chown -R gitpuppet:gitpuppet on .git/, then use 'pgit' (gitpuppet wrapper) to reset to oot branch
- 14:10 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 14:08 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
- 14:06 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.openstack.rebuild_dbinstance (exit_code=0)
- 14:04 andrew@cloudcumin1001: START - Cookbook wmcs.openstack.rebuild_dbinstance
2024-07-20
- 15:52 Southparkfan: add deployment-sessionstore05 (bookworm) - T370461
- 15:15 Southparkfan: add deployment-urldownloader04 (bookworm) - T370466
- 14:46 Southparkfan: deleted deployment-shellbox (buster) T370462
- 14:34 Southparkfan: add deployment-shellbox01 T370462
- 14:19 Southparkfan: err, adding deployment-mwmaint03, I meant - T370582
- 14:19 Southparkfan: add deployment-maint03, replacing the Buster instance T370582
2024-07-19
- 16:56 Southparkfan: switching trafficserver backends from mediawiki11 and 12 to 13 and 14 - T361387
- 14:47 Southparkfan: delete deployment-jobrunner04 (buster), replaced by 05 (bullseye) T370487
- 14:38 Southparkfan: remove floating IP for deployment-ircd03 T369919
- 11:03 Southparkfan: switched over cpjobqueue (running on deployment-changeprop-1) to deployment-jobrunner05 T370487
- 09:28 Southparkfan: create deployment-jobrunner05 with Bullseye image, T370487
2024-07-18
- 22:49 Southparkfan: deleted deployment-irc02 (buster), released its floating IP, deactivated & cleaned on puppetserver-1, removed irc-next.beta.wmcloud.org A RR - T369919
- 22:05 Southparkfan: add deployment-ircd03 (bullseye) with floating IP and irc-next.beta.wmcloud.org - T369919
2024-07-06
- 15:15 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance deployment-memc10
- 15:14 andrew@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance deployment-memc10
- 15:14 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance deployment-memc09
- 15:14 andrew@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance deployment-memc09
- 15:13 andrew@cloudcumin1001: END (PASS) - Cookbook wmcs.vps.remove_instance (exit_code=0) for instance deployment-memc08
- 15:13 andrew@cloudcumin1001: START - Cookbook wmcs.vps.remove_instance for instance deployment-memc08
2024-06-25
- 13:27 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=1)
- 13:14 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_project_to_ovs
2024-06-24
- 23:02 bd808: Removed matanya's "reader" right per T368330
2024-06-18
- 10:17 taavi@cloudcumin1001: END (FAIL) - Cookbook wmcs.openstack.migrate_project_to_ovs (exit_code=1)
- 09:38 taavi: set deployment-db11 as writable after reboot
- 09:04 taavi@cloudcumin1001: START - Cookbook wmcs.openstack.migrate_project_to_ovs
2024-06-07
- 11:14 pmiazga: proceeding with soft restart deployment-puppetserver-1
- 10:31 pmiazga: deployment-puppetserver-1 - in /srv/git/operations/puppet cherry-picked I477c4b to test wikivoyage.beta.wmcloud.org domain handling - T355281
2024-06-06
- 18:31 pmiazga: T355281 testing mediawiki-config patch Idcd9cd, executed `scap sync-world`
- 17:56 pmiazga: Executed "mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=aawiki pl wikivoyage plwikivoyage pl.wikivoyage.beta.wmcloud.org" to add a new wiki - Polish Wikivoyage on beta.wmcloud.org domain
- 17:49 pmiazga: T355281 updated DNS zones and hiera configs - added *.m.wikipedia.beta.wmcloud.org, *.wikivoyage.beta.wmcloud.org and *.m.wikivoyage.beta.wmcloud.org domains
2024-06-04
- 11:03 pmiazga: added beta.wmcloud.org and *.wikipedia.beta.wmcloud.org definitions to SNI section in deployment-acme-chief and lets-encrypt section in deployment-cache in hiera config on horizon.
- 10:34 pmiazga: Live debugging of puppet. Pulled Ifd37f0 to puppetserver-1. Additionally fixed ownership of /srv/git/operations/puppet to `gitpuppet:gitpuppet` to solve problems with git pull.
2024-06-02
- away: T366415 removed upload.beta.wmflabs.org from hiera
2024-03-28
- 22:27 tgr: added toyofuku to deployment-prep
2024-03-14
- 21:31 andrewbogott: shutting down deployment-puppetdb03, deployment-puppetdb04, deployment-puppetmaster04. These have been replaced with new puppet infra and can be deleted in a couple of weeks if all is well.
2024-03-01
- 01:03 bd808: Added RLazarus as a project member
2023-11-30
- 17:09 tgr: added Dreamy_Jazz to members
2023-11-16
- 23:44 TheresNoTime: Add `DJackson` access, T351433
2023-10-27
- 21:26 tgr: set up deployment-rdb01 for redis (T340908)
2023-10-23
- 09:49 godog: turn off deployment-prometheus05 - T344974
2023-09-29
- 13:46 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
- 13:38 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console
- 13:33 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
- 13:30 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console
- 12:33 wm-bot2: dcaro@urcuchillay END (PASS) - Cookbook wmcs.openstack.cloudvirt.vm_console (exit_code=0)
- 12:31 wm-bot2: dcaro@urcuchillay START - Cookbook wmcs.openstack.cloudvirt.vm_console
2023-08-21
- 15:30 godog: shut prometheus05 - T344582
2023-07-28
- 18:05 andrewbogott: removing ::lvs::realserver::realserver_ips: from hiera for deployment-restbase-bullseye.deployment-prep.eqiad1.wikimedia.cloud because it's preventing puppet from compiling
- 17:57 andrewbogott: deleting deployment-thcipriani.deployment-prep.eqiad1.wikimedia.cloud, no longer used according to Tyler
- 17:38 andrewbogott: deleting deployment-mdb02.deployment-prep.eqiad1.wikimedia.cloud, seems abandoned by a departed staff member
- 17:18 andrewbogott: but not fixing deployment-mdb02.deployment-prep.eqiad1.wikimedia.cloud because it seems like mariadb was never set up there in the first place
- 17:16 andrewbogott: fixed miscellaneous puppet issues on four or five hosts
2023-07-05
- 10:59 fabfur: shutting down and removing unused deployment-cache-text07 and deployment-cache-upload07
2023-06-28
- 09:54 fabfur: removed (text|upload) instance references from wgCdnServersNoPurge (T327742)
2023-06-27
- 11:36 fabfur: removing old (text|upload) instance references from hieradata (horizon)
2023-06-26
- 12:41 fabfur: switch floating IP from deployment-cache-text07 to deployment-cache-text08 (bullseye upgrade: T327742) (fix sec group)
- 12:25 fabfur: reverted floating IP switch
- 12:14 fabfur: switch floating IP from deployment-cache-text07 to deployment-cache-text08 (bullseye upgrade: T327742)
2023-06-22
- 11:02 fabfur: switch floating IP from deployment-cache-upload07 to deployment-cache-upload08 (bullseye upgrade: T327742)
2023-06-16
- 13:50 vgutierrez: replaced buster acme-chief[03,04] with bullseye acme-chief[05,06]
2023-04-23
- 16:55 Krinkle: Fix profile::tlsproxy::envoy::global_cert_name in Horizon for webperf host to use '%{facts.fqdn}' instead of performance.discovery.wmnet as the latter doesn't resolve / would be an invalid cert for https://deployment-webperf21, ref T291015
2023-02-24
- 15:27 andrewbogott: deleting long-shutoff stretch instance deployment-imagescaler03.deployment-prep.eqiad1.wikimedia.cloud -- T289883
2023-01-19
- 19:02 andrewbogott: deleting long-shutdown stretch instances: deployment-echostore01, deployment-ms-fe03, deployment-prometheus02
2023-01-18
- 15:31 andrewbogott: shutting down deployment-imagescaler03 as it is long-overdue for replacement. See T294148 for details.
- 10:10 arturo: bump trove quotas (T326674)
2023-01-09
- 16:30 wm-bot2: Increased quotas by 6 cores (T326568) - cookbook ran by arturo@nostromo
2022-11-23
- 22:42 urandom: accidentally deleted deployment-sessionstore04
2022-11-09
- 14:56 andrewbogott: fixed puppet breakage on several instances
2022-10-31
- 15:56 andrewbogott: shutting down deployment-echostore01, deployment-ms-be0[56], deployment-mdb01, deployment-prometheus02, deployment-wikifeeds01 as per https://phabricator.wikimedia.org/T306068
2022-10-17
- 09:41 wm-bot2: Increased quotas by 4 cores (T320932) - cookbook ran by arturo@nostromo
2022-09-21
- 09:46 andrewbogott: removed some stray whitespace in /var/lib/git/operations/puppet that was preventing rebase on deployment-puppetmaster04.deployment-prep.eqiad.wmflabs
2022-08-29
- 21:25 inflatador: ES6->7 upgrade in beta-cluster T315604
2022-06-24
- 20:52 taavi: added `denisse` as a member
2022-06-20
- 16:30 urbanecm: add sgimeno as a project member (Growth engineer with need for access)
2022-05-25
- 18:20 TheresNoTime: samtar@deployment-mwmaint02:~$ mwscript resetUserEmail.php --wiki=wikidatawiki Mahir256 [snip] T309230
2022-05-23
- 19:21 inflatador: Deleted deployment-elastic0[5-7] in favor of newer bullseye hosts T299797
2022-05-16
- 19:31 inflatador: bking@deployment-elastic07 halted deployment-elastic07 in beta ES cluster; will decom on Friday T299797
- 19:03 inflatador: bking@deployment-elastic06 halted deployment-elastic06 in beta ES cluster; will decom on Friday T299797
2022-05-14
- 20:25 urbanecm: add TheresNoTime (samtar) as a project member per request
2022-05-13
- 18:58 inflatador: bking@deployment-elastic05 halted deployment-elastic05 in beta ES cluster; will decom in 1 wk T299797
2022-05-12
- 22:09 inflatador: bking@deployment-elastic05 banned deployment-elastic05 from beta ES cluster in preparation for decom T299797
2021-11-08
- 09:32 majavah: Remove rvogel from project members per IRC request
2021-10-05
- 12:03 majavah: root@deployment-cache-text06:~# systemctl restart sssd # T286502
2021-07-28
- 17:53 andrewbogott: rebooting deployment-logstash03 as it's in an inconsistent config state
2021-05-10
- 14:04 CFisch_WMDE: Improve comment around ReferencePreviews beta cluster default (T271206)
- 14:04 CFisch_WMDE: Forward renamed config name for improved template search features (T277028)
2021-05-05
- 14:17 CFisch_WMDE: Disable ReferencePreviews beta mode on beta labs (T271206)
2021-05-03
- 13:55 CFisch_WMDE: enable new search features for the template dialog (T271802)
2021-04-20
- 07:19 CFisch_WMDE: enable changes to the descriptions in the VE transclusion dialog (T273425)
- 07:17 CFisch_WMDE: enable suggested values parameter in TemplateData and VisualEditor (T271825)
2021-04-13
- 17:00 halfak: failed deploy to ORES (connection to host failed)
- 16:57 halfak: deploying ores f08a3cb
- 07:46 awight: enable syntax highlighting line numbering on all namespaces (T267911)
2021-03-22
- 11:36 dcaro: Created subzone svc.deployment-prep.eqiad1.wikimedia.cloud. (T276624)
- 11:34 dcaro: Created subzone beta.wmcloud.org (T276624)
2021-03-10
- 10:16 arturo: briefly stopping deployment-puppetdb03 to disable VMX CPU flag
2021-03-09
- 13:32 arturo: hard-reboot deployment-db05 because of issues related to T276922
- 12:34 arturo: briefly rebooting VM deployment-db05, we need to reboot its hypervisor cloudvirt1038 and it failed to migrate to another one
2021-03-01
- 14:41 andrewbogott: changed profile::redis::multidc::discovery from 'false' to "" to comply with strict typing in the deployment-memc puppet prefix.
2020-12-23
- 19:03 balloons: resized deployment-puppetdb03 to g2.cores2.ram4.disk40 (T270420)
2020-12-16
- 22:00 mutante: adjusted 'puppet prefix' deployment-jobrunner to use "role::beta::mediawiki::jobrunner" instead of "role::mediawiki::jobrunner" - goes together with gerrit:649707 - no instance currently exists called 'deployment-jobrunner'
2020-11-11
- 09:50 awight: metawiki: Promoting User:Jan Dittrich (WMDE) into centralnoticeadmin...
2020-11-09
- 22:18 awight: metawiki: Promoting User:Jan Dittrich into centralnoticeadmin...
2020-10-29
- 17:24 andrewbogott: signing pending puppet certs for deployment-mediawiki-07.deployment-prep.eqiad1.wikimedia.cloud and deployment-mediawiki-09.deployment-prep.eqiad1.wikimedia.cloud
- 17:23 andrewbogott: signing pending puppet certs for deployment-kafka* nodes
- 16:17 andrewbogott: removing jkroll as a project member; the registered email is invalid so probably this user is no longer involved
2020-10-07
- 10:43 godog: move swift settings out of horizon and into puppet's hieradata
2020-08-04
- 22:00 mdholloway: deleted deployment-chromium01
2020-07-20
- 19:35 halfak: deploying ores f3c44be
2020-07-16
- 17:15 andrewbogott: added "profile::java::egd_source: /dev/random" to project-wide hiera since lack of that setting seems to be breaking puppet in a lot of places
2020-07-14
- 15:28 bd808: Silenced prometheus alerts for 7d
2020-06-24
- 18:56 halfak: deploying ORES 1b87365
2020-06-23
- 17:15 bstorm_: to fix puppet on several hosts, setting profile::java::hardened_tls: false in project puppet on horizon
- 17:09 bstorm_: restarted postgresql on deployment-puppetdb03
- 17:05 bstorm_: restarted puppetdb.service on deployment-puppetdb03
2020-06-12
- 08:15 awight: Granted dewiki-beta sysop to Adamw via commandline
- 07:57 awight: Update QuickSurveys config
2020-06-08
- 18:38 hauskatze: Rebooting deployment-logstash0[2,3]
2020-05-07
- 12:34 mutante: removing role::labs::lvm::srv from deployment servers since this is now included in role:deployment_server and should never have been a role in the first place
- 12:07 mutante: - puppet still broken on deployment_servers due to unrelated pre-existing issues, also no alerts about it in shinken
- 12:04 mutante: - puppet broken on deployment_servers - fix deployed in https://gerrit.wikimedia.org/r/c/operations/puppet/+/594932
2020-04-20
- 19:48 halfak: Deploying ORES 514f94a
- 19:16 halfak: Deploying ORES ac2eb2f
- 18:52 halfak: Deploying ORES a5a5bce
- 15:41 halfak: Deploying ORES 5d977f4
2020-03-30
- 19:18 andrewbogott: restarting puppetdb on deployment-puppetdb03 and restarted apache2 on deployment-puppetmaster04 but puppet runs still fail everywhere
2020-02-14
- 21:53 andrewbogott: moving deployment-puppetdb02 and deployment-puppetmaster03 off of cloudvirt1014 (which will be drained next week anyway)
2020-02-07
- 16:57 halfak: deploying ores a6f4f14
2020-01-24
- 21:15 halfak: deploying ores 283f627
2020-01-23
- 21:50 halfak: deploying ores 039251f (reverting to last good state)
- 15:31 halfak: deploying ores 283f627a
2020-01-08
- 16:26 halfak: deploying ores 039251f
2020-01-02
- 10:40 dcausse: created missing elastic indices: T241487
2019-12-27
- 12:24 hauskatze: Rebooting deployment-cpjobqueue
- 11:13 andrewbogott: migrating deployment-aqs03 to cloudvirt1009 in response to T241313
2019-12-18
- 19:13 halfak: deploying ores 80b1e62
2019-12-03
- 17:33 kevinbazira: deploying ores 6dd1fef
2019-11-11
- 18:28 MaxSem: Nuked HHVM and php7 tags on all beta wikis T75181
2019-11-08
- 17:46 MaxSem: Upgraded php7.2 on deployment-mwmaint01, was too old for MW
2019-10-31
- 20:29 tgr: importing a bunch of pages from production cswiki via importDump.php for T236823 (for reals now)
- 01:14 tgr: importing a bunch of pages from production cswiki via importDump.php for T236823
2019-09-25
- 23:01 andrewbogott: moving deployment-mwmaint01 and deployment-ircd to cloudvirt1021
- 15:15 andrewbogott: moving deployment-snapshot01 to cloudvirt1021
- 15:02 andrewbogott: moving deployment-dumps-puppetmaster02 to cloudvirt1021
2019-09-12
- 16:36 halfak: deploying ores 7d45b80
2019-08-25
2019-08-23
- 07:12 rxy: Applied SQL queries per phab:T231058#5433197
2019-08-06
- 17:09 accraze: deploying ores d08fa62
2019-08-05
- 22:24 accraze: deploying ores 4270244
2019-07-31
- 14:21 andrewbogott: moving deployment-sca02 to cloudvirt1030
- 12:59 andrewbogott: moving deployment-elastic05, deployment-kafka-main-2, deployment-mx02, deployment-webperf11 to new cloudvirts
2019-07-24
- 10:55 hauskatze: Dry-running extensions/AbuseFilter/maintenance/fixOldLogEntries.php refs. T228655
2019-07-11
- 20:30 mutante: add project member cdanis
2019-07-03
- 21:01 accraze: deploying ores 676f7ba
2019-06-25
- 15:51 awight: restart php7.2-fpm for wikidiff2 upgrade (T223391)
2019-06-05
- 19:34 andrewbogott: moving deployment-imagescaler03 to cloudvirt1029
2019-05-01
- 17:53 halfak: deploying ores:52e9759
2019-04-29
- 14:04 godog: add dsharpe user
2019-04-17
- 17:17 andrewbogott: cherry-picking https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/504580/ to move off of soon-to-be-shutdown dns recursors
2019-03-25
- 14:46 mateusbs17: sunset deployment-maps03
2019-03-14
- 19:14 ebernhardson: restart logstash on deployment-logstash2 to re-read and re-create apifeatureusage template
2019-03-13
- 18:09 ebernhardson: restart elasticsearch on deployment-elastic* to deploy apifeature usage fix (T183156)
2019-03-06
- 15:58 andrewbogott: deleting deployment-prometheus01 on Filippo's advice
2019-02-26
- 22:52 ebernhardson: delete logstash logs in /var/log/logstash generated prior to 2019
- 22:51 ebernhardson: restart logstash on deployment-logstash2 while hacking around to see why apifeatureusage doesn't work
2019-02-18
- 11:45 arturo: manually start deployment-db03 per Krenair request
2019-02-11
- 10:47 godog: shut deployment-prometheus01, unused now
2019-02-06
- 22:34 shdubsh: Deploy node-exporter 0.17 T213708
- 14:12 godog: shut off deployment-prometheus01 - T215272
- 14:00 godog: switch beta-prometheus to deployment-prometheus02 - T215272
2019-02-05
- 20:07 ebernhardson: jobrunner port 9006 is firewalled, revert to 9005 and created T215339 to fix job queue in beta cluster
- 19:36 ebernhardson: Update profile::cpjobqueue::{jobrunner,videoscaler}_host in horizon hiera from port 9005 to 9006 to match new restrictions in gerrit.wikimedia.org/r/481866
2019-02-04
- 21:48 ebernhardson: restart logstash on deployment-logstash2
2019-01-31
- 12:05 arturo: VM instances deployment-deploy01,deployment-deploy02,deployment-fluorine02,deployment-kafka-jumbo-2,deployment-kafka-main-1,deployment-maps04,deployment-mcs01,deployment-mediawiki-09,deployment-memc04,deployment-ms-be03,deployment-ms-fe02,deployment-parsoid09,deployment-sca04,deployment-webperf12, were stopped briefly due to issue in hypervisor (T215012)
2019-01-08
- 19:53 mutante: adjusting puppet config on deployment-mwmaint01. remove "mediawiki_maintenance" role from "other classes" section and apply "mediawiki::maintenance" instead after role rename in gerrit:479131 for consistency with other mediawiki:: roles
2019-01-07
- 19:32 awight: T212530: ORES revscoring 2.3.0
2018-11-23
- 13:55 dcausse: restarted elasticsearch on all deployment-elastic0X nodes (search broken on the beta cluster)
2018-11-20
- 23:45 mutante: deployment-deploy01 edited /srv/deployment/iegreview/iegreview/.git/DEPLOY_HEAD - replaced deployment-tin with deployment-deploy1 to fix scap cloning / puppet
- 23:01 mutante: deployment-deploy01 edited /srv/deployment/scholarships/scholarships/.git - replaced deployment-tin with deployment-deploy1 to fix scap / cloning of scholarships app
- 13:55 andrewbogott: deleting deployment-redis05 and deployment-redis06 as per Giuseppe, "we're not using the old jobqueue, we should remove those vms"
2018-11-14
- 18:48 andrewbogott: moving deployment-mediawiki-07 to labvirt1008
- 18:31 andrewbogott: moving deployment-chromium01 to labvirt1009
- 18:06 andrewbogott: moving deployment-mx02 to labvirt1003
- 18:05 andrewbogott: migrating deployment-snapshot01 to labvirt1001
2018-11-13
- 22:19 andrewbogott: moving deployment-urldownloader02 to labvirt1012
- 21:59 andrewbogott: moving deployment-deploy02 to another labvirt
- 21:55 andrewbogott: moving deployment-webperf12 to a new labvirt
- 21:50 andrewbogott: moving deployment-dumps-puppetmaster02 to a new labvirt
- 21:43 andrewbogott: moving deployment-elastic05 to a new labvirt to clear out labvirt1016
- 13:01 arturo: a puppet refactor for the aptly module may have caused some puppet issues. Should be solved now
2018-11-02
- 14:04 Krenair: made onimisionipe a projectadmin per request in -cloud
2018-11-01
- 14:48 andrewbogott: moving deployment-redis05 to labvirt1012
- 14:47 Krenair: shut off deployment-redis05 for migration to new physical host
2018-10-31
- 21:16 Krenair: remove horizon hiera config for deployment-redis0[56] to unbreak puppet and remove old redis0[12] instance IPs T208040
- 19:55 andrewbogott: moving deployment-elastic06 to labvirt1012
- 19:40 andrewbogott: moving deployment-cpjobqueue to labvirt1012 to help clear out labvirt1017
- 19:11 andrewbogott: moving deployment-kafka-jumbo-1 to labvirt1012 to help clear out labvirt1017
- 18:54 andrewbogott: moving deployment-kafka-main-2 to labvirt1012 to help clear out labvirt1017
- 13:23 godog: enable statsd reporting for swift
2018-10-22
- 00:27 Krenair: Added gtirloni as a member per T207474 - I imagine he'll want to get in to look at shinken-related things
2018-10-02
- 08:56 godog: bounce logstash
2018-09-23
- 01:05 andrewbogott: rebooted deployment-maps03; OOM and also T205195
2018-08-09
- 01:42 awight: T201518: ORES, fawiki wp10, misc updates
2018-07-16
- 21:10 awight: ran namespaceDupes.php on beta enwiki
2018-06-30
- 20:40 Krenair: ran git gc on deployment-tin:/srv/mediawiki to free up space
2018-06-12
- 17:40 halfak: deploying ores 36037b6
2018-06-11
- 21:47 halfak: deploying ores 6ee8775
2018-06-04
- 15:44 awight: ORES: Fix T194322
2018-05-19
- 10:56 Krenair: amended uncommitted changes into HEAD commit (notified author) so I can unbreak puppet updates, also removed my old POC secure redirect puppet patch
2018-05-09
- 22:53 awight: ORES: wheels fixups
- 21:31 awight: Bump ORES wheels
- 21:04 awight: ORES: drafttopic in beta
2018-04-25
- 18:55 awight: ORES: Revscoring 2.2.2
2018-04-20
- 00:46 awight: roll back ORES beta to master
- 00:08 awight: Push ORES git-lfs to look at stuff
2018-04-16
- 18:58 awight: Update ORES editquality; T185903
2018-04-13
- 00:53 awight: ORES: Test large file in LFS
2018-04-12
- 23:01 awight: Try gerrit-based submodules for ORES, T180627
2018-04-11
- 21:04 awight: ORES experiment with git-lfs, T180627
2018-04-09
- 19:50 awight: Redundant virtualenv for ORES
- 18:06 awight: Restore to ORES master branch
- 17:17 awight: Test git-lfs in ORES
2018-04-04
- 21:24 awight: Roll back beta ORES
- 20:12 awight: Try dsh scap config for ORES
2018-03-21
- 23:25 RoanKattouw: Created maps security group for port 6533; removed port 6533 from sca security group
- 23:22 bd808: Raised security group quota from 20 to 40
2018-03-19
- 16:57 awight: ORES beta service is restored.
- 16:45 awight: Put ORES-beta back onto master branch
- 16:43 awight: ORES-beta has been down since January.
2018-03-15
- 15:47 awight: ORES with git-lfs, scap config
- 00:21 awight: ORES with git-lfs
2018-03-14
- 23:30 awight: new ORES submodule, pre-git-lfs
2018-03-13
- 18:08 awight: Enable Extension:JADE, T176333
2018-03-04
- 02:13 Krenair: Regenerated captcha images for T164047
2018-02-09
- 00:59 bd808: Removed Yuvipanda at user request (T186289)
2018-01-29
- 23:24 awight: Experiment with versioned ORES venv, T181071
2018-01-24
- 23:14 Krenair: armed keyholder on deployment-cumin using deployment-puppetmaster02:/var/lib/git/labs/private/files/ssh/tin/cumin_rsa.passphrase - this seems to have fixed cumin
2018-01-11
- 17:50 tgr: added Groovier1 to project members for T158909
2018-01-01
- 23:46 Krenair: ran `mwscript extensions/ORES/maintenance/CheckModelVersions.php --wiki=sqwiki` on deployment-tin for T183862
2017-12-21
- 16:54 awight: Update ORES to eb0f776bb
2017-12-20
- 18:00 RoanKattouw: Importing dump from deployment-db03 on deployment-db04
- 15:31 RoanKattouw: Restarting dump again, failed due to lack of disk space
- 15:07 RoanKattouw: Dropped invalid view labswiki.updates, restarting dump
- 14:59 RoanKattouw: Dumping all databases on deployment-db03 so I can restore replication on deployment-db04. This may cause MediaWiki writes to fail while the dump runs
2017-12-19
- 20:10 RoanKattouw: (Earlier today) Depooled deployment-db04, it needs fixing after replication broke badly. It's out of sync with deployment-db03, where I manually fixed inconsistencies
- 18:11 awight: Update beta ORES service to f109792
- 17:00 awight: Disable ORES UI for beta wikidatawiki, T183266
2017-12-13
- 17:24 awight: Install aspell-is for ORES
- 17:06 awight: Deploy ORES service b67bba7
2017-12-11
- 19:04 andrewbogott: upgraded deployment-puppetmaster02 to puppet v4
2017-12-06
- 21:43 awight: Update ORES to 42cf532
2017-12-04
2017-11-30
- 19:00 bd808: Testing stashbot fix for double phab logging (T181731)
- 17:49 anomie: Finished running cleanupUsersWithNoId.php on Beta Cluster for T181731
- 16:58 anomie: Running cleanupUsersWithNoId.php on Beta Cluster, see T181731
2017-11-29
- 21:27 awight: Update ores submodule, for RevIdScorer statistics
- 14:32 chasemp: git pull on /var/lib/git/labs/private and resolve one merge conflict. (the root key file is too old here)
2017-11-28
- 17:42 awight: Remove stale ORES customizations for the beta cluster.
2017-10-24
- 17:59 madhuvishy: Ran `sudo cumin -b 5 --backend openstack "project:deployment-prep" "apt-get install git --yes"`
2017-10-04
- 13:19 andrewbogott: migrating 'deployment-kafka-jumbo-1' to labvirt1017
2017-09-14
- 19:37 tgr: updated PrivateSettings.php for T175868
2017-09-05
- 19:34 gilles: deployed PrivateSettings.php change to add Thumbor username to Swift configuration
2017-06-13
- 18:47 andrewbogott: root@deployment-salt02:~# salt "*" cmd.run "apt-get -y install facter"
2017-05-19
- 19:05 mutante: fixing role class config on deployment-phab* (remove role::phabricator::main, add role::phabricator_server in context prefix "deployment-phab"; remove again from instance level for phab-01)
- 18:40 mutante: deployment-phab01 still has puppet error "Could not find class role::phabricator::main" and that should simply be removed from it, but i can NOT find it in Horizon, i checked instance config, project config, the "Other" section, the "All classes" tab. Because it's gone. But how do i fix the instance config then?
- 18:39 mutante: applying role::phabricator_server on instance deployment-phab01 (it had error, could not find role::phabricator::main and the name changed in role/profile conversion)
2017-03-29
- 18:41 ebernhardson: upgrading elasticsearch and kibana to 5.1.2 on deployment-logstash2 to test puppet+integration prior to prod deployment
2017-03-27
- 17:02 ebernhardson: cherry pick https://gerrit.wikimedia.org/r/344964 to puppetmaster to test upgrade to logstash 5.x
2017-03-20
- 20:51 andrewbogott: migrating deployment-urldownloader to labvirt1013
- 20:45 andrewbogott: migrating deployment-pdf01 to labvirt1011
- 20:14 andrewbogott: migrating deployment-puppetmaster02 to a different labvirt
2017-03-15
- 09:10 addshore: addshore@deployment-tin mwscript extensions/Cognate/maintenance/populateCognatePages.php --wiki=hewiktionary
- 09:10 addshore: addshore@deployment-tin mwscript extensions/Cognate/maintenance/populateCognatePages.php --wiki=dewiktionary
- 09:08 addshore: addshore@deployment-tin mwscript extensions/Cognate/maintenance/populateCognatePages.php --wiki=enwiktionary
- 08:56 addshore: addshore@deployment-tin mwscript extensions/Cognate/maintenance/populateCognatePages.php --wiki=enwiktionary // (ParameterTypeException, T160503)
- 08:50 addshore: addshore@deployment-tin mwscript extensions/Cognate/maintenance/populateCognateSites.php --wiki=enwiktionary --site-group=wiktionary // (3 sites added)
- 08:49 addshore: addshore@deployment-tin mwscript extensions/Wikidata/extensions/Wikibase/lib/maintenance/populateSitesTable.php --wiki=enwiktionary --force-protocol=https --load-from=https://deployment.wikimedia.beta.wmflabs.org/w/api.php
- 08:49 addshore: addshore@deployment-tin mwscript sql.php --wiki=enwiktionary "TRUNCATE sites; TRUNCATE site_identifiers;"
- 08:44 addshore: addshore@deployment-tin mwscript extensions/Wikidata/extensions/Wikibase/lib/maintenance/populateSitesTable.php --wiki=enwiktionary --force-protocol=https
- 08:43 addshore: addshore@deployment-tin mwscript extensions/Cognate/maintenance/populateCognateSites.php --wiki=dewiktionary --site-group=wiktionary // (0 sites added)
- 08:43 addshore: addshore@deployment-tin mwscript extensions/Cognate/maintenance/populateCognateSites.php --wiki=enwiktionary --site-group=wiktionary // (1 site added)
2017-03-06
- 19:04 addshore: mwscript sql.php --wiki=aawiki "CREATE DATABASE cognate_wiktionary"
2017-03-01
- 19:09 addshore: "mwscript extensions/WikimediaMaintenance/addWiki.php --wiki=aawiki he wiktionary hewiktionary he.wiktionary.beta.wmflabs.org" T158628
2017-02-02
- 00:52 tgr: added mhurd as member
2017-01-23
- 07:15 _joe_: cherry-picking the move of base to profile::base
2017-01-19
- 22:11 Krenair: added bunch of others to the same group per request. we should figure out how to make this process sane somehow
- 22:06 Krenair: added nuria to deploy-service group on deployment-tin
2017-01-17
- 17:51 urandom: re-enabling puppet on deployment-restbase02
- 17:47 urandom: re-enabling puppet on deployment-restbase01
2017-01-11
- 18:07 urandom: restarting restbase cassandra nodes
- 18:01 urandom: disabling puppet on restbase cassandra nodes to experiment with prometheus exporter
2017-01-08
- 05:20 Krenair: deployment-stream: live hacked /usr/lib/python2.7/dist-packages/socketio/handler.py a bit (added apostrophes) to try to make rcstream work
2017-01-04
- 21:30 mutante: deployment-cache-text-04 - running acme-setup command to debug .. Creating CSR /etc/acme/csr/beta_wmflabs_org.pem
- 21:26 Krenair: trying to troubleshoot puppet by stopping nginx then letting puppet start it
- 21:05 mutante: deployment-cache-text04 stopping nginx service, running puppet to debug dependency issue
2016-12-19
- 21:21 andrewbogott: and also python-functools32_3.2.3.2-3~bpo8+1_all.deb
- 21:20 andrewbogott: upgrading to python-jsonschema_2.5.1-5~bpo8+1_all.deb on deployment-eventlogging03
- 20:51 andrewbogott: upgrading to python-requests_2.12.3-1_all.deb ./python-urllib3_1.19.1-1_all.deb on deployment-mediawiki04 and deployment-tin
2016-12-04
- 15:26 Krenair: Found a git-sync-upstream cron on deployment-mx for some reason... commented for now, but wtf was this doing on a MX server?
2016-11-23
- 15:04 Krenair: fixed puppet on deployment-cache-text04 by manually enabling experimental apt repo, see T150660
2016-11-16
- 20:02 Krenair: mysql master back up, root identity is now unix socket based rather than password
- 19:57 Krenair: taking mysql master down to fix perms
- 07:52 Krenair: the new mysql root password for -db04 is at /tmp/newmysqlpass as well as in a new file in the puppetmaster's labs/private.git
2016-11-09
- 20:27 Krenair: removed default SSH access from production host 208.80.154.135, the old gallium IP
2016-11-03
- 05:04 Krenair: beginning to move the rest of beta to the new puppetmaster
2016-11-02
- 18:51 Krenair: armed keyholder on -tin and -mira
- 18:50 Krenair: started mysql on -db boxes to bring beta back online
2016-11-01
- 22:22 Krenair: started mysql on -db03 to hopefully pull us out of read-only mode
- 22:21 Krenair: started mysql on -db04
- 22:19 Krenair: stopped and started udp2log-mw on -fluorine02
- 22:00 Krenair: started moving nodes back to the new puppetmaster
- 02:55 Krenair: Managed to mess up the deployment-puppetmaster02 cert, had to move those nodes back
2016-10-31
- 20:57 Krenair: moving some nodes to deployment-puppetmaster02
- 16:57 bd808: Added Niharika29 as project member
2016-10-27
- 18:46 bd808: Testing dual page wiki logging by stashbot. (check #3)
- 18:36 bd808: Testing dual page wiki logging by stashbot. (second attempt)
- 18:14 bd808: Testing dual page wiki logging by stashbot.
2016-10-24
- 14:51 Krenair: T142288: Shut off -pdf02 and -conftool
2016-10-10
- 21:41 Krenair: restarted keyholder-proxy on -tin to make check_keyholder happy with the extra key that was active but unconfigured
- 21:11 Krenair: fixed puppet on -restbase01/-restbase02 by setting up deployment of cassandra/twcs on deployment-tin
- 20:56 Krenair: fixed puppet on -tin/-mira by restarting puppetmaster for base_path scap change
- 15:45 dcausse: deployment-elastic0[5-8]: reduce the number of replicas to 1 max for all indices
2016-10-03
- 15:40 Krenair: upgraded cache-upload04 to varnish4. hieradata is set on the prefix deployment-cache-upload
2016-09-28
- 22:33 Krenair: Rebooting deployment-ms-be01 - T146947, T141673
2016-09-26
- 23:13 Krenair: Rebooting deployment-aqs01 for T141673
2016-09-20
- 20:16 Krenair: enabled trusty-backports on deployment-puppetmaster
2016-09-14
- 15:21 godog: cherry-pick https://gerrit.wikimedia.org/r/#/c/310557/ on puppet master
2016-09-13
- 20:47 Krenair: Created SRV record _etcd._tcp.beta.wmflabs.org for etcd/confd
2016-09-11
- 20:35 Krenair: started cron service on deployment-salt02 again, seems it got killed Tue 2016-08-30 13:42:39 UTC - hopefully this will fix the puppet staleness alert
2016-09-08
- 02:25 Krenair: deployed the latest version of mediawiki/services/parsoid/deploy.git to get https://gerrit.wikimedia.org/r/#/c/309001/ see T144884
2016-08-30
- 23:20 Krenair: removed 'project_id' key from deployment-restbase02's metadata to fix compatibility with the new labsprojectfrommetadata code
- 18:09 yuvipanda: reboot deployment-kafka03 seems to be stuck
2016-08-19
- 00:39 Krenair: deployment-fluorine is now deployment-fluorine02 running jessie with the old precise packages shoehorned in
2016-08-12
- 19:20 Krenair: that fixed it, upload.beta is back up
- 19:14 Krenair: rebooting deployment-cache-upload04, it's stuck in https://phabricator.wikimedia.org/T141673 and varnish is no longer working there afaict, so trying to bring upload.beta.wmflabs.org back up
2016-08-02
- 14:02 gehel: rebooting deployment-elastic06 (unresponsive to SSH and Salt)
- 02:51 Krenair: https://deployment.wikimedia.beta.wmflabs, https://meta.wikimedia.beta.wmflabs, and their mobile variants now also have valid certs and TLS redirects.
- 01:12 Krenair: Proper SSL certificate up at https://upload.beta.wmflabs.org - HTTP has been changed to force TLS redirect.
2016-08-01
- 20:58 Krenair: deleted 2014/2015 files from deployment-stream:/var/log/diamond to get space on /var and stop it warning
2016-07-27
- 06:07 Tim: fixed broken puppet git checkout on deployment-puppetmaster, updated
2016-07-13
- 20:45 Krenair: RIP NFS
2016-07-11
- 23:24 Krenair: Unmounted /data/project (NFS) on all active hosts (mediawiki0[1-3], jobrunner01, tmh01), leaving just deployment-upload (shutoff, to schedule for deletion soon) - T64835
2016-07-09
- 00:46 Krenair: T64835: `mwscript extensions/WikimediaMaintenance/filebackend/setZoneAccess.php zerowiki --backend=local-multiwrite --private`
- 00:46 Krenair: T64835: `foreachwikiindblist "% all-labs.dblist - private.dblist" extensions/WikimediaMaintenance/filebackend/setZoneAccess.php --backend=local-multiwrite`
- 00:46 Krenair: T64835: Live-hacked some temporary swift config in
2016-06-27
- 22:32 eberhardson: deployed gerrit.wikimedia.org/r/296279 to puppetmaster to test kibana4 role
2016-06-25
- 03:24 Krenair: Changed eventbus key in secrets (from being a symlink to eventlogging to being a new random key) so check_keyholder works again
2016-06-22
- 22:23 Krenair: Installed netpbm on all deployment-mediawiki* hosts to fix ProofreadPage thumbnailing. I wonder if we should include the puppet mediawiki::packages::multimedia class on these hosts really
2016-06-13
- 16:06 Krenair: Rebooted deployment-ircd, it was stuck somehow
- 13:53 yuvipanda: kicked deployment-salt via nova for Krenair
- 13:35 Krenair: Fixed puppet on -tin by symlinking eventbus key to eventlogging in -puppetmaster:/var/lib/git/labs/private/modules/secret/secrets/keyholder
2016-06-01
- 02:14 Krenair: Started redis-server on deployment-rcstream to stop MW hhvm.log spam
2016-05-09
- 15:39 andrewbogott: migrating deployment-fluorine to labvirt1009
2016-05-03
- 01:42 Krenair: ran package updates on deployment-parsoid06 so that exim4 would start so puppet will run
2016-05-02
- 09:54 gehel: restart elasticsearch cluster to ensure multicast configuration is disabled (T110236)
2016-04-13
- 20:37 Krenair: doing the same with -redis02
- 20:26 Krenair: corrected deployment-cxserver03:/etc/puppet/puppet.conf puppetmaster to use .deployment-prep as part of dns name
2016-04-10
- 06:04 Krenair: deleted some large files under deployment-mediawiki01:/var/log/nutcracker to free up space on /
2016-04-09
- 16:08 Krenair: (same for -conf03, -sentry01, -redis01, -upload - some of these are now fully fixed and some are better than they were before)
- 15:59 Krenair: mostly fixed puppet on deployment-sca02 by changing /etc/puppet/puppet.conf to use project name as part of puppetmaster's hostname
- 15:56 Krenair: fixed broken /etc/puppet/puppet.conf on deployment-cache-text04 (it started with a copy of the file for the labs central puppetmaster and then had the correct version pointing to the project's puppetmaster)
- 15:47 Krenair: reenabled puppet on eventlogging04 as no reason was provided for disabling, first run successful
2016-03-30
- 13:35 Reedy: upgrade hhvm on deployment-mediawiki03 and reboot
- 12:16 gehel: restarting varnish on deployment-cache-text04
2016-03-29
- 13:40 Amir1: Added ores-related classes and roles
2016-03-25
- 20:23 Krenair: started redis-server on deployment-redis01
- 20:23 Krenair: repaired centralauth.spoofuser table on deployment-db1
- 20:23 Krenair: fiddled around with puppet on deployment-cache-text04 earlier to fix certs etc.
- 07:38 tgr: restarting memcached
2016-03-18
- 18:13 gehel: activating automatic deployment of portals (https://gerrit.wikimedia.org/r/#/c/276397/)
2016-03-08
- 02:26 ori: Updating HHVM on deployment-mediawiki02
2016-03-01
- 16:54 gehel: fixed a stalled rebase on deployment-puppetmaster:/var/lib/operations/puppet
2016-02-18
- 13:24 gehel: upgrading elasticsearch to 1.7.5 on cirrus-browser-bot
2016-02-17
- 23:57 mobrovac: added Ppchelko to the list of members
2016-02-15
- 09:16 gehel: re-enabling puppet on elastic05 (https://phabricator.wikimedia.org/T126891)
2016-02-12
- 16:33 gehel: starting to ship logs from elasticsearch to logstash (https://gerrit.wikimedia.org/r/#/c/269100/)
2016-02-11
- 15:16 gehel: fixed deployment-puppetmaster rebase conflict by removing commit 814f12bc - author is informed
2016-02-08
- 06:10 tgr: set $wgAuthenticationTokenVersion on beta cluster (test run for T124440)
2016-02-05
- 15:57 gehel: restarting deployment-elastic07
- 01:42 Tim: cherry-picked https://gerrit.wikimedia.org/r/#/c/268022/ to local puppet master as suggested by hashar. Seems to work.
2016-01-30
- 02:57 Krenair: Restarted varnish on cache-text04 for T125282
2015-12-02
- 00:31 tgr: updated rsvg on appserver to 2.40.11 - https://phabricator.wikimedia.org/T112421
2015-11-04
- 00:06 Krenair: Synchronized portals: https://gerrit.wikimedia.org/r/#/c/250851/
2015-10-09
- 21:51 ori: Accidentally clobbered /etc/init.d/mysql on deployment-db1, causing deployment-prep failures. Restored now.
2015-09-16
- 20:39 cscott: updated OCG to version 4032a596ce6eb442b02cc6ee9b79263b1eb23275
2015-09-14
- 19:18 cscott: updated OCG to version 5811056e28f2bc6408b6da96095352ab381bb11f
- 12:04 dcausse: restarting elasticsearch (deployment-elastic0[5-8]) to deploy new plugins
2015-08-25
- 14:42 andrewbogott: moving deployment-cache-mobile04 to labvirt1004
2015-08-12
- 20:45 urandom: restarted restbase on deployment-restbase01 (dead)
2015-08-05
- 14:33 godog: update deployment-restbase02 to openjdk8 T104887
- 14:18 godog: update deployment-restbase01 to openjdk8 T104887
June 29
- 13:17 dcausse: restarting Elasticsearch to pick up new plugin versions
June 23
- 13:31 cscott: fixed salt on deployment-pdf02, restarted OCG there.
- 05:44 cscott: stopped OCG service on deployment-pdf02, see https://phabricator.wikimedia.org/T103473
- 05:20 cscott: updated OCG to version d7c698d5bf730d34057945e912ac75dc542dd788 ; restarted service.
- 03:58 cscott: stopped OCG on beta; redis 2.8.x is causing the service to crash on startup.
June 22
- 21:58 andrewbogott: re-enabling puppet on deployment-videoscaler01 because no reason was given for disabling
- 20:42 cscott: updated OCG to version b482144f5bd8b427bcc64a3dd287247195aa1951
June 4
- 20:29 ori: upgrading hhvm-fss from 1.1.4 to 1.1.5, has fix for T101395
May 29
- 14:07 moritzm: upgrade java on deployment-restbase0[12] to the 7u79 security update
May 28
- 08:46 godog: test es-tool restart-fast on deployment-elastic05
May 27
- 21:15 AaronSchulz: populated jobqueue:aggregator:s-wikis:v2 with 1000 fake wiki keys for load testing
- 21:07 AaronSchulz: Deployed https://gerrit.wikimedia.org/r/#/c/208852/
- 21:07 AaronSchulz: Deleted 4G of logs on jobrunner01
May 24
- 18:39 YuviKTM: purged old logs kept on NFS
May 20
- 20:58 cscott: updated OCG to version ca4f64852de5b1de782b292b50038fbd2dd84266
May 18
- 15:17 andrewbogott: rebooting deployment-logstash1
May 15
- 20:50 andrewbogott: rebooted deployment-bastion due to inconsistent run state after suspend/resume
May 13
- 21:08 cscott: updated OCG to version c7c75e5b03ad9096571dc6dbfcb7022c924ccb4f
May 2
- 00:51 yuvipanda: created deployment-boomboom to test
April 29
- 21:03 andrewbogott: suspending and shrinking disks of many instances
April 28
- 20:57 YuviPanda: KILL KILL KILL DEPLOYMENT-LUCID-SALT WITH FIRE AND BRIMSTONE AND BAD THINGS
April 27
- 08:01 _joe_: installed hhvm 3.6 on deployment-mediawiki02
April 24
- 14:25 _joe_: installing hhvm 3.6.1 on deployment-mediawiki01
April 23
- 17:19 andrewbogott: rebooting deployment-parsoidcache02 because it seems troubled
April 22
- 12:48 andrewbogott: migrating to new labvirt nodes
April 21
- 08:33 _joe_: rollback installation of hhvm 3.6
- 08:09 _joe_: installing HHVM 3.6 and the corresponding extensions on deployment-mediawiki01
April 9
- 20:11 mutante: fixed apt sources lists on deployment-bastion (T95541)
March 30
- 22:33 Josve05a: manually start mysql on db1 and db2
- 21:57 YuviPanda: reboot all instances from virt1000
March 23
- 20:41 cscott: updated OCG to version 11f096b6e45ef183826721f5c6b0f933a387b1bb
March 18
- 13:45 mobrovac: added restbase security group
- 13:35 YuviPanda: made mobrovac projectadmin
- 13:34 YuviPanda: added mobrovac to project
March 16
- 18:46 manybubbles: upgraded Elasticsearch on deployment-logstash1
March 11
- 18:47 YuviPanda: created deployment-mediawiki03
February 27
- 11:12 YuviPanda: start mysql on deployment-db1
February 26
- 11:53 YuviPanda: created deployment-parsoid01-test to test patch to use role::parsoid on labs
February 18
- 13:04 _joe_: installed new version of the hhvm extensions packages
February 17
- 23:18 Krenair: Started mysql on deployment-db1; beta now appears much less broken than before
February 6
- 20:07 ^d: scratch that, I rebuilt it as precise. why did I do that?
- 20:03 ^d: rebuilt deployment-elastic05 with new partition scheme
February 5
- 12:48 YuviPanda: cherry-picking https://gerrit.wikimedia.org/r/188798 on scap on deployment-prep
- 12:28 YuviPanda: killed chown on deployment-bastion, running directly on NFS server
- 12:13 YuviPanda: running time sudo chown -R www-data:www-data upload7/ on /data/project
- 12:10 YuviPanda: stopped jobrunner on jobrunner01
- 11:53 YuviPanda: running git-sync-upstream on deployment-salt to pick up latest ops/puppet changes
- 11:52 _joe_: converting the web user to www-data
- 11:44 YuviPanda: deleted mediawiki03 instance, holdover from security testing from long, long ago
- 11:41 YuviPanda: disabled puppet on mediawiki01, 02, jobrunner01, bastion and salt
February 4
- 13:56 YuviPanda: created deployment-jobrunner01, trusty instance
- 13:51 YuviPanda: deleted deployment-jobrunner01, trusty version coming up
- 11:35 YuviPanda: created instance deployment-mediawiki02
- 11:26 YuviPanda: deleted instance deployment-mediawiki02
- 06:37 YuviPanda: created deployment-mediawiki01 host
- 06:34 YuviPanda: killed deployment-mediawiki01 host. FOREEVERRR
February 2
- 13:37 yuvipanda: added mx record to beta.wmflabs.org, for https://phabricator.wikimedia.org/T88215 via LDAP
January 27
- 18:15 andrewbogott: upgrading libc6 on all instances from deployment-salt
January 20
- 02:30 YuviPanda: created deployment-mediawiki04 to test roles
January 7
- 16:25 YuviPanda: added milimetric to NDA sudo’ers groups
December 29
- 22:24 MaxSem: Created a DNS entry for m.wikidata.beta.wmflabs.org
December 22
- 12:40 _joe_: upgrading HHVM to the latest version
December 16
- 16:52 manybubbles: elasticsearch restart finished
- 16:48 mutante: deployment-db2 is down
- 16:48 manybubbles: restarting beta's elasticsearch servers to pick up a new version of a plugin. won't interfere with current downtime.
December 13
- 17:10 bd808: Many strange puppet and scap failures in beta that look to be related to DNS failures
- 16:03 bd808: Starting work on phab:T78076 to renumber apache users in beta
December 11
- 22:47 cscott: updated OCG to version bfc3812ef346c9f767135b339cedd123a1bcac98
December 6
- 05:05 ori: upgrade hhvm-tidy to 0.1-2
December 3
- 21:33 cscott: updated OCG to version 08e94b19c3f17e699d7e53d9605f65c58e17ea0e
December 2
- 17:09 _joe_: upgrading HHVM to its latest version
- 17:08 andrewbogott: this is a test message
December 1
- 21:50 cscott-split: updated OCG to version a06e7c186796a6ee5d5af81e93688520abdf2596
November 26
- 20:47 cscott: updated OCG to version 7d8f2b8bd496464041e3ef9c092732457cc8f7ef
November 24
- 15:16 YuviPanda: modified local hack to account for 47dcefb74dd4faf8afb6880ec554c7e087aa947b
- 14:58 YuviPanda: cherry-picked 3e45c538978710113e6e28e9d533bf8d18c159a6 and 9d4614a8a352c78505212fd6e9d2a7be6d2e4927 to deployment-salt puppetmaster, restoring local hacks
November 19
- 21:19 anomie: Cherry-picked https://gerrit.wikimedia.org/r/#/c/173336/3 to Beta
November 17
- 20:37 YuviPanda: cleaned out logs on deployment-bastion
- 16:48 YuviPanda: delete deployment-analytics01, a tortoise from an ancient time.
- 05:17 YuviPanda: force apt-get install -f to unstuck puppet
- 04:49 YuviPanda: clean up coredump on deployment-prep
November 16
- 00:38 YuviPanda: uncherrypick https://gerrit.wikimedia.org/r/#/c/173634/ because OMG CODE
- 00:14 YuviPanda: cherry-pick https://gerrit.wikimedia.org/r/#/c/173634/ on deployment-salt
- 00:01 YuviPanda: cherry-pick https://gerrit.wikimedia.org/r/#/c/173510/ on deployment-prep to make memc03 run puppet
November 14
- 20:02 anomie: Cherry-picking https://gerrit.wikimedia.org/r/#/c/173336/ for testing in logstash
November 13
- 10:11 YuviPanda: cherry pick https://gerrit.wikimedia.org/r/#/c/172967/1 to test https://bugzilla.wikimedia.org/show_bug.cgi?id=73263
November 12
- 18:16 YuviPanda: cherry picking https://gerrit.wikimedia.org/r/#/c/172776/ on labs puppetmaster to see if it fixes issues in the cache machines
November 11
- 17:13 cscott: removed old ocg cronjobs on deployment-pdf0x; see https://bugzilla.wikimedia.org/show_bug.cgi?id=73166
November 10
- 22:37 cscott: rsync'ed .git from pdf01 to pdf02 to resolve git-deploy issues on pdf02 (git fsck on pdf02 reported lots of errors)
- 21:41 cscott: updated OCG to version d9855961b18f550f62c0b20da70f95847a215805 (skipping deployment-pdf02)
- 21:39 cscott: deployment-pdf02 is not responding to git-deploy for OCG
November 5
- 06:14 ori: restarted hhvm on beta app servers
November 3
- 22:07 cscott: updated OCG to version 5834af97ae80382f3368dc61b9d119cef0fe129b
October 29
- 18:55 ori: upgraded hhvm on beta labs to 3.3.0+dfsg1-1+wm1
October 28
- 23:47 RoanKattouw: ...which was a no-op
- 23:46 RoanKattouw: Updating puppet repo on deployment-salt puppet master
- 21:36 RoanKattouw: Creating deployment-parsoid05 as a replacement for the totally broken deployment-parsoid04 (also as a trusty instance rather than precise)
- 21:06 RoanKattouw: Rebooting deployment-parsoid04, wasn't responding to ssh
October 27
- 20:23 cscott: updated OCG to version 60b15d9985f881aadaa5fdf7c945298c3d7ebeac
October 22
- 21:10 arlolra: updated OCG to version e977e2c8ecacea2b4dee837933cc2ffdc6b214cb
October 8
- 22:04 subbu: updated OCG to version def24eca
October 7
- 22:50 cscott: updated OCG to version c778ea8b898f8ad8c2b7ad9de78a75469e7ed061
October 6
- 23:13 YuviPanda: killed extra log files in deployment-bastion
- 21:44 cscott: updated OCG to version bbdf4c6400cfbbc6030114ad16e1a6f7025eab2c
- 15:36 cscott: updated OCG to version aee3712b352f51f96569de0bcccf3facf654e688
October 3
- 19:51 manybubbles: performing rolling restart of elasticsearch nodes to pick up preview of accelerated regex plugin for testing at larger-than-mylaptop-scale
- 14:02 manybubbles: rebuilding beta's simplewiki cirrus *index*
- 14:02 manybubbles: rebuilding beta's simplewiki cirrus inde
October 1
- 20:13 cscott: updated OCG to version 48c495e3656f528abe636ce0cd7562270505534f
- 16:40 bd808: Added Gilles to under_NDA sudoers group
September 30
- 22:00 bd808: Cleaned deleted instances out of salt and trebuchet redis
- 20:26 bd808: Converted deployment-rsync02 to use local puppet & salt masters
- 15:36 bd808: enabling puppet and forcing run on deployment-mediawiki03
- 15:34 bd808: enabling puppet and forcing run on deployment-mediawiki02
- 15:28 bd808: enabling puppet and forcing run on deployment-mediawiki01
September 29
- 22:45 Reedy: re-enabled beta-scap-eqiad
- 21:34 Reedy: disabled "beta-scap-eqiad" until things are fixed
- 21:24 Reedy: deleted l10n cache on deployment-rsync01 to attempt to run sync-common manually
- 21:22 Reedy: deployment-rsync01 hard drive is far too small
- 17:57 cscott: updated OCG to version 89d8f29a24295b05d0643abe976fea83b56575c9
- 06:58 ori: Configured Beta cluster to use redis for session storage
- 06:57 ori: Created deployment-redis02 and converted it to use local puppet & salt masters
- 05:23 ori: Created deployment-redis01 and converted it to use local puppet & salt masters
September 28
- 14:38 andrewbogott: cherry-picked https://gerrit.wikimedia.org/r/#/c/163464/ onto deployment-salt to fix a puppet compile failure.
- 14:38 andrewbogott: edited and re-cherry-picked roan's citoid patch into beta because the previous version was breaking puppet
September 26
- 06:34 cscott: updated OCG to version f3a6c1cbba118d4a5e1aa019937dc50159fc823d
September 25
- 22:48 RoanKattouw: Fixed permissions of deployment-bastion:/srv/deployment/mathoid/mathoid/.git/deploy (needed g+w)
- 11:36 _joe_: updated hhvm to fix most bugs, also cherry-picked https://gerrit.wikimedia.org/r/#/c/162839/
September 24
- 23:00 bd808: Updated bash with salt
- 20:52 cscott: updated OCG to version 48acb8a2031863e35fad9960e48af60a3618def9
September 23
- 20:14 cscott: updated OCG to version 1cf9281ec3e01d6cbb27053de9f2423582fcc156
- 17:37 AaronSchulz: Initialized bloom cache on betalabs, enabled it, and populated it for enwiki
September 22
- 16:08 ori: updating HHVM to 3.3.0-20140918+wmf1
September 20
- 14:43 andrewbogott: moving deployment-pdf02 to virt1009
- 00:36 mutante: raised instance quota to 43
September 19
- 00:26 cscott: updated OCG to version ce16f7adb60d7c77409e2e11ba0e5d6cce6955d5
September 16
- 15:44 godog: testing scap change from https://gerrit.wikimedia.org/r/#/c/160668/
- 02:46 cscott: updated OCG to version 188a3c221d927bd0601ef5e1b0c0f4a9d1cdbd31
September 15
- 21:44 andrewbogott: migrating deployment-videoscaler01 to virt1002
- 21:41 andrewbogott: migrating deployment-sentry2 to virt1002
- 21:40 cscott: *skipped* deploy of OCG, due to deployment-salt issues
- 21:19 bd808: Added Matanya to under_NDA sudoers group (bug 70864)
September 12
- 12:24 _joe_: set up hiera, noop as expected
September 11
- 16:31 YuviPanda: Delete deployment-graphite instance
- 02:29 mutante: raised instance quota by 1 to 42
September 10
- 08:14 Krinkle: bits.beta.wmflabs.org is down with 503 Service Unavailable (http://bits.beta.wmflabs.org/en.wikipedia.beta.wmflabs.org/load.php)
September 9
- 20:08 cscott: updated OCG to version c9a2b4cf2502479eeabed07ab2de728695d96e46
September 7
- 23:48 bd808: Added John F. Lewis to under_NDA sudo policy (bug 70539)
- 23:29 bd808: Promoted John F. Lewis to project admin (bug 70539)
- 23:26 bd808: Added Jalexander as project member (bug 70539)
September 5
- 17:54 bd808: Purged varnish cache on deployment-cache-bits01 -- sudo varnishadm ban req.url '~' /
- 16:00 YuviPanda: unfuck puppet on deployment-salt, puppet is stupid and does not properly report failed events on last_run_summary.yaml if there's a syntax error or a resource conflict. So I've to read last_run_report and do things with *that* instead now
- 15:49 YuviPanda: deliberately fucking up puppet to see if icinga complains
- 09:52 _joe_: cherry-picked I6ec53da483bebfa375eba2383cbf60123ff1ce26, it works
September 4
- 16:06 bd808: Manually cleaned bogus LocalRenameUserJob jobs from redis
- 13:54 _joe_: stopped puppet on the appservers but mw03, testing an apache change
- 05:28 legoktm: stopping jobrunner on deployment-jobrunner01
- 05:22 legoktm: restarted jobrunner on deployment-jobrunner01
- 05:14 bd808: Bad jobs in job queue filled up /var on jobrunner01 and killed jobrunner script. Leaving down for now until I find out how to delete the bad jobs.
- 01:41 bd808: Killed old jobs-loop.sh processes on deployment-jobrunner01
- 01:24 bd808: Many jobrunner errors like "wikiversions-labs.cdb has no version entry for `amwiki`" with various wiki names
- 01:23 bd808|AWAY: Started jobrunner service manually on jobrunner01.
- 00:44 bd808: Puppet run on deployment-jobrunner01 failing with what seem to be dns issues (getaddrinfo: Name or service not known when Trebuchet is running)
- 00:35 bd808: Puppet run on deployment-jobrunner01 failing with what seem to be dns issues (getaddrinfo: Name or service not known)
September 3
- 15:02 bd808: _joe_ rolled out a new hhvm package ~5 hours ago
- 15:01 bd808: morebots is back thanks to petan
- 14:50 bd808: logmsgbot down apparently
September 2
- 15:34 bd808: False alarm. SSL is borked in beta and we know that
- 15:29 bd808: `curl -vL -H 'Host: en.wikipedia.beta.wmflabs.org' localhost` works from deployment-cache-text02
- 15:27 bd808: https://en.wikipedia.beta.wmflabs.org/ returning ERR_CONNECTION_REFUSED (is varnish down?)
August 29
- 22:56 bd808: Got puppet to run cleanly on deployment-mediawiki03. Should be ready for serving traffic.
- 22:39 bd808: Fixed a merge conflict in operations/puppet on deployment-salt
- 21:46 bd808: Forced install of "right" version of libvips-tools on mediawiki03 `sudo apt-get install libvips-tools=7.38.5-2`
- 08:40 hashar: rebooting deployment-cache-mobile03 (kernel up)
August 28
- 21:32 bd808: Added "Greg Grossmeier" to UnderNDA sudoers group
- 17:12 bd808: Changed centralauth db to rename labswiki -> deploymentwiki
- 16:49 bd808: CentralAuth looks broken on http://deployment.wikimedia.beta.wmflabs.org/
- 16:49 bd808: Apache vhosts look good again
- 16:34 bd808: Restarted varnishes on deployment-cache-text02
- 16:13 andrewbogott: merging a patch that renames 'labswiki' to 'deploymentwiki'
- 09:21 hashar: resetting git repository in /data/project/apache/conf to point to the betaclusterbranch of operations/mediawiki-config.git discarded all local hacks in the process
August 27
- 23:03 hashar: Blacklisting the security audit IP again on deployment-cache bits01 mobile03 and text02
- 22:53 hashar: removed the blackhole ip route from deployment-cache-text02 and deployment-cache-mobile03
- 22:48 hashar: the IP is a known security audit. See Chris Steipp.
- 22:46 hashar: blackholed an IP address on deployment-cache-text02 and deployment-cache-mobile03, it was causing hundreds of requests per second and overloaded the beta cluster. Use route -n to find the IP
- 22:37 hashar: restarting udp2log-mw on deployment-bastion. It keeps crashing since fairly recently
- 22:26 bd808: when restarting varnish on deployment-cache-text02, don't forget that there are 2 varnish services (varnish and varnish-frontend)
- 22:19 bd808: restarted varnish (again) on deployment-cache-text02
- 22:10 bd808: restarted varnish on deployment-cache-text02
- 16:22 bd808: killing `apt-get update` process running on deployment-bastion since Jun 13
- 14:59 bd808: Resolved puppet git merge conflict on deployment-salt
- 14:49 bd808: Moved hhvm core dumps to /data/project/hhvm-cores
- 14:42 bd808: Root drive full on deployment-mediawiki02; hhvm core files are the culprit
August 25
- 23:47 ori: stopping hhvm/apache on deployment-mediawiki02 to replace debug build of hhvm with release build
- 21:44 bd808: Deployed scap 116027f (Make sync-common update l10n cdb files by default)
- 18:30 ori: deployment-mediawiki02: cleared /tmp; running puppet
- 15:05 hashar: mediawiki02 rm /tmp/hhvm*.core . Filed as bug 69979
- 15:01 hashar: mediawiki02 rm /tmp/mw-cache-master/conf*
- 15:01 hashar: mediawiki02 has mw conf caches under /tmp/mw-cache-master/ and since that partition is filled up, that ends up with conf caches being null file
- 15:00 hashar: mediawiki02 rm /var/log/upstart/hhvm*
- 14:53 hashar: mediawiki02 : removed /var/lib/puppet/state/agent_catalog_run.lock
- 14:46 hashar: restarting udp2log-mw service on -bastion. It is stalled for some reason
- 14:42 hashar: on mediawiki02 , clearing out some /var/log/upstart/hhvm.* log files see bug 69976
- 14:34 hashar: mediawiki02 / partition is 100% full
August 22
- 20:21 hashar: udp2log are back in /data/project/logs. The udp2log-mw service stalled for some reason.
- 20:08 ori: ran 'git pull' on deployment-salt:/srv/var-lib/git/operations/puppet
- 19:59 hashar: restarting udp2log-mw service on deployment-bastion
- 19:59 hashar: bits yielding 503
- 00:41 bd808: cherry-picked scap change https://gerrit.wikimedia.org/r/#/c/155677/ for testing
August 21
- 21:49 bd808: Trebuchet happier after all the salt-minion restarts; still have deleted hosts showing in the expected minion list for scap deploys
- 21:01 twentyafterfour: Started salt-minion on deployment-redis01
- 21:01 bd808: Started salt-minion on deployment-upload
- 21:00 bd808: Started salt-minion on deployment-fluoride
- 21:00 bd808: Started salt-minion on deployment-db1
- 20:59 bd808: Started salt-minion on deployment-elastic01
- 20:59 twentyafterfour: Started salt-minion on deployment-eventlogging02
- 20:58 bd808: Started salt-minion on deployment-elastic02
- 20:58 bd808: Started salt-minion on deployment-elastic03
- 20:57 bd808: Started salt-minion on deployment-elastic04
- 20:57 bd808: Started salt-minion on deployment-analytics01
- 20:55 bd808: Started salt-minion on deployment-cache-upload02
- 20:54 bd808: Started salt-minion on deployment-memc04
- 20:54 bd808: Started salt-minion on deployment-parsoid04
- 20:49 bd808: Started salt-minion on deployment-memc05
- 20:48 bd808: Started salt-minion on deployment-db2
- 20:48 twentyafterfour: Started salt-minion on deployment-cache-text02
- 20:47 twentyafterfour: Started salt-minion on deployment-memc03
- 20:46 bd808: Started salt-minion on deployment-cxserver01
- 20:12 bd808: List of broken salt minions can be obtained with `sudo salt-run manage.down` on deployment-salt
- 19:55 bd808: Fixed salt on deployment-memc02
- 19:52 bd808: Salt minions are broken all over beta. Hung grain-ensure calls, hung test.ping calls, downed minions
- 19:50 bd808: Killed dozens of grain-ensure calls and started salt-minion on deployment-cache-mobile03
- 19:47 bd808: Killed hung salt-call and started salt-minion on deployment-cache-bits01
- 19:28 bd808: Deployed cherry-pick of Iea7217a for scap
- 19:27 bd808: Restarted salt-minion on deployment-jobrunner01 & deployment-videoscaler01
- 19:27 bd808: Killed rogue salt-master process on deployment-bastion
- 19:26 bd808: Deleted salt keys for retired apache0[12] minions
- 00:13 bd808: Upgraded elasticsearch to 1.3.2 on deployment-logstash1
August 19
- 16:11 hashar: deleted /usr/local/apache/common-local symlink, made it a directory and retriggered https://integration.wikimedia.org/ci/job/beta-scap-eqiad/17887/console
- 16:03 bd808: Removed local changes to /usr/local/apache/conf/wmflabs-logging.conf on deployment-mediawiki02; logs back to nfs share
- 15:52 bd808: Changed apache logging level from debug to notice on deployment-mediawiki02 in /usr/local/apache/conf/wmflabs-logging.conf
- 15:47 bd808: Changed apache logging level from debug to warn on deployment-mediawiki02
- 15:44 bd808: /var full on deployment-mediawiki02; deleting 572M /var/log/apache2/debug.log.1
- 15:03 hashar: Killed some stalled scap / rsync process on deployment-bastion that were preventing https://integration.wikimedia.org/ci/job/beta-scap-eqiad/ from acquiring the lock.
- 14:17 hashar: huge rsync in progress on bastion
- 14:00 hashar: On bastion reverted the symlink on bastion and manually created directory /usr/local/apache/common-local
- 13:55 hashar_: On bastion, deleting /usr/local/apache/common-local and symlink it to /srv/common-local
August 18
- 22:22 ^d: dropped apache01/02 instances, unused and need the resources
- 18:23 manybubbles: finished upgrading elasticsearch in beta - everything seems ok so far
- 18:15 bd808: Restarted salt-minion on deployment-mediawiki01 & deployment-rsync01
- 18:15 bd808: Ran `sudo pkill python` on deployment-rsync01 to kill hundreds of grain-ensure processes
- 18:12 bd808: Ran `sudo pkill python` on deployment-mediawiki01 to kill hundreds of grain-ensure processes
- 18:10 manybubbles: finally restarting beta's elasticsearch servers now that they have new jars
- 17:56 bd808: Manually ran trebuchet fetches on deployment-elastic0*
- 17:49 bd808: Forcing puppet run on deployment-elastic01
- 17:47 godog: upgraded hhvm on mediawiki02 to 3.3-dev+20140728+wmf5
- 17:44 bd808: Trying to restart minions again with `salt '*' -b 1 service.restart salt-minion`
- 17:39 bd808: Restarting minions via `salt '*' service.restart salt-minion`
- 17:38 bd808: Restarted salt-master service on deployment-salt
- 17:19 bd808: 16:37 Restarted Apache and HHVM on deployment-mediawiki02 to pick up removal of /etc/php5/conf.d/mail.ini (logged in prod SAL by mistake)
- 16:59 manybubbles|lunc: upgrading Elasticsearch in beta to 1.3.2
- 16:11 bd808: Manually applied https://gerrit.wikimedia.org/r/#/c/141287/12/templates/mail/exim4.minimal.erb on deployment-mediawiki02 and restarted exim4 service
- 15:28 bd808: Puppet failing for deployment-mathoid due to duplicate definition error in trebuchet config
- 15:15 bd808: Reinstated puppet patch to depool deployment-mediawiki01 and forced puppet run on all deployment-cache-* hosts
- 15:04 bd808: Puppet run failing on deployment-mediawiki01 (apache won't start); Puppet disabled on deployment-mediawiki02 ('reason not specified') Probably needs to wait until Giuseppe is back from vacation for fixing.
- 15:00 bd808: Rebooting deployment-eventlogging02 via wikitech; console filling with OOM killer messages and puppet runs failing with "Cannot allocate memory - fork(2)"
- 14:29 bd808: Forced puppet run on deployment-cache-upload02
- 14:27 bd808: Forced puppet run on deployment-cache-text02
- 14:24 bd808: Forced puppet run on deployment-cache-mobile03
- 14:20 bd808: Forced puppet run on deployment-cache-bits01
August 17
- 22:58 bd808: Attempting to reboot deployment-cache-bits01.eqiad.wmflabs via wikitech
- 22:56 bd808: deployment-cache-bits01.eqiad.wmflabs not allowing ssh access and wikitech console full of OOM killer messages
August 15
- 21:57 legoktm: set $wgVERPsecret in PrivateSettings.php
- 21:42 hashSpeleology: Beta cluster database updates are broken due to CentralNotice. Fix up is 154231
- 20:57 hashSpeleology: deployment-rsync01 : deleting /usr/local/apache/common-local content. Then ln -s /srv/common-local /usr/local/apache/common-local as set by beta::common which is not applied on that host for some reason. bug 69590
- 20:55 hashSpeleology: puppet administratively disabled on mediawiki02 . Assuming some work in progress on that host. Leaving it untouched
- 20:54 hashSpeleology: puppet is proceeding on mediawiki01
- 20:52 hashSpeleology: attempting to unbreak mediawiki code update bug 69590 by cherry picking 154329
- 20:39 hashSpeleology: in case it is not in SAL. MediaWiki is no more synced to app server bug 69590
- 20:20 hashSpeleology: rebooting mediawiki01 , /var refuses to clear out and stick at 100% usage
- 20:16 hashSpeleology: cleaning up /var/log on deployment-mediawiki02
- 20:14 hashSpeleology: on deployment-mediawiki01 deleting /var/log/apache2/access.log.1
- 20:13 hashSpeleology: on deployment-mediawiki01 deleting /var/log/apache2/debug.log.1
- 20:13 hashSpeleology: bunch of instances have a full /var/log :-/
- 11:37 ori: deployment-cache-bits01 unresponsive; console shows OOMs: https://dpaste.de/LDRi/raw . rebooting
- 03:20 jeremyb: 02:46:37 UTC <ebernhardson> !log beta /dev/vda1 full. moved /srv-old to /mnt/srv-old and freed up 2.1G
August 14
- 12:23 hashar: manually rebased operations/puppet.git on puppetmaster
August 13
- 08:02 hashar: beta-code-update-eqiad is running again
- 07:57 hashar: fixing ownerships under /srv/scap-stage-dir/php-master/skins some files belong to root
- 07:55 hashar: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/ is broken :-/
August 8
- 16:05 bd808: Fixed merge conflict that was preventing updates on puppet master
August 6
- 13:13 hashar: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/ is running again
- 13:13 hashar: removed a bunch of local hack on deployment-bastion:/srv/scap-stage-dir/php-master . That causes the git repo to be dirty and prevents scap from achieving git pull there
- 12:08 hashar: Manually pruning whole text cache on deployment-cache-text02
- 12:07 hashar: Apache virtual hosts were not properly loaded on mediawiki02. I have hacked /etc/apache2/apache2.conf to make it Include /usr/local/apache/conf/all.conf (instead of main.conf which does not include everything)
- 08:43 hashar: pruning cache on deployment-cache-text02 / restarting varnish
August 2
- 08:53 swtaarrs: rebuilt and restarted hhvm on deployment-mediawiki02 with potential fix
- 05:17 swtaarrs: restarted hhvm on deployment-mediawiki0{1,2} to unwedge them
August 1
- 15:03 bd808: Updated cherry-pick of Iceb8f43
- 15:02 bd808: Cleaned up puppet repo on deployment-salt; merge conflicts with local Ia463120 hack; reapplied depool of deployment-mediawiki01
- 14:50 bd808: Restarted stuck hhvm on deployment-mediawiki02; apache had 89 children waiting for a response
- 13:27 godog: changed inplace bt-hhvm on deployment-mediawiki01/02 to also copy the binary
- 05:32 ori: depooled deployment-mediawiki02 to investigate HHVM lock-up by cherry-picking I7df8c5310 on beta.
- 00:40 ori: disabled puppet on deployment-mediawiki{01,02} and enabled verbose apache logging
July 31
- 22:41 bd808: Restarted hhvm on -mediawiki{01,02}. Brett looked at 01 before I did and said "it's the same as before"
- 20:09 cscott: updated OCG to version d2919c59eb09e09fc87777696411a070620aef45
- 19:59 hashar: Granted sudo right to cscott (under NDA). Will let him reboot OCG service
- 18:58 ori: re-enabled puppet on deployment-mediawiki{01,02}
- 10:41 hashar: Taking gdb traces of hhvm on mediawiki01 and mediawiki02. Restarting hhvm
- 05:08 bd808: HHVM hung on both boxes. Grabbed core and backtrace before restarting
July 30
- 19:59 bd808: Created local commit 7d56b79 in puppet to work around bugs in Ia463120718dceab087ad3f8e3f35917fa879f387
- 19:46 bd808: Restored prior /etc/hhvm/php.ini from puppet filebucket archive on deployment-mediawiki0[12]
- 19:32 bd808: Disabled puppet on deployment-mediawiki02 for the same reason
- 19:31 bd808: Disabled puppet on deployment-mediawiki01; Ori will look into hhvm config changes that were being applied
- 16:52 bd808: Fixed beta-scap-eqiad Jenkins job by correcting ssh problems in beta project
- 16:43 bd808: Fixed ssh to jobrunner01 and videoscaler01 by correcting unrelated puppet manifest problem and forcing run via salt.
- 16:00 bd808: Puppet runs on videoscaler01 and jobrunner01 failing for "Could not find dependency Ferm::Rule[bastion-ssh] for Ferm::Rule[deployment-bastion-scap-ssh]"
- 16:00 bd808: Puppet seems manually disabled on apache0[12].
- 15:59 bd808: Can't ssh to apache0[12], videoscaler01 and jobrunner01. Puppet not running on any of them. libnss-ldapd unattended update has broken /etc/nslcd.conf
- 15:23 bd808: Removed cherry-pick for Iac547efa83cf059a1276b6e279c3ebd4c7224b2c and updated cherry-pick for I5afba2c6b0fbf90ff8495cc4a82f5c7851893b52 to latest patch set.
- 15:05 bd808: Two cherry-picks in puppet conflicting with merged production changes: I5afba2c6b0fbf90ff8495cc4a82f5c7851893b52 and Iac547efa83cf059a1276b6e279c3ebd4c7224b2c (ori, twentyafterfour)
- 14:49 bd808: Started apache2 service on deployment-mediawiki01
- 14:16 hashar: rebooting hhvm
- 09:42 hashar: bastion had broken puppet because deployment_server and zuul both declare the same python packages 150501
- 09:40 hashar: restoring on puppetmaster modules/mediawiki/templates/apache/apache2.conf.erb which got deleted somehow
- 09:29 hashar: Rebooting apache01/02 to see whether it fixes the ssh connection issue
- 09:27 hashar: manually started hhvm on mediawiki01
- 09:25 hashar: rebooting deployment-mediawiki01 hhvm process went zombie
- 09:23 hashar: restarting hhvm on mediawiki 01/02
- 09:05 hashar_: Beta scap script broken since 6:30am UTC https://integration.wikimedia.org/ci/job/beta-scap-eqiad/
July 29
- 22:56 cscott: updated OCG to version aeb8623d6ebe41ae7c7e36c57844bd9ea8e6d595
- 21:02 bd808: Converted deployment-sentry2.eqiad.wmflabs to use beta salt/puppet master
- 19:14 hashar: Removed all jobs from queue, restarted slave agent. Update Jobs coming back
- 19:09 hashar: deployment-bastion jenkins slave is stuck. Beta cluster is no more updating code :-//
- 15:58 godog: restarted hhvm on deployment-mediawiki01
- 15:52 godog: restarted hhvm on deployment-mediawiki02
- 15:50 godog: installed libevent-dbg on deployment-mediawiki02 to capture an hhvm backtrace
- 15:17 bd808: _joe_ restarting hhvm on deployment-mediawiki01
- 15:00 bd808: Apache stuck with 65 children on both deployment-mediawiki servers
- 10:37 hashar: Restarted hhvm on mediawiki{01,02}
July 28
- 17:41 bd808: Updated hhvm to latest 3.3-dev+20140728 build on deployment-mediawiki0[12]
- 15:37 manybubbles: rebuilding elasticsearch indexes to build a weighted all field we'll try to use to improve performance
- 15:32 bd808: Restarted hhvm on deployment-mediawiki0[12]. All apache children were stuck waiting for hhvm to respond.
- 15:20 bd808: Restarted apache on deployment-mediawiki02. 65 children and non-responsive to requests. (same as mediawiki01)
- 15:18 bd808: Restarted apache on deployment-mediawiki01. 65 children and non-responsive to requests.
- 14:23 manybubbles: or not - looks like I can't!
- 14:22 manybubbles: rebuilding cirrus search indexes to pick up a speed up all field
- 08:30 hashar: restarted varnish on deployment-cache-bits01 . Hoping to clear bits cache
July 25
- 18:29 bd808: Added twentyafterfour and several other WMF staff to under_NDA sudo group
- 17:15 bd808: Morebots is back!
- 16:38 bd808: pstree showed "hhvm─┬─271*[sh]" on deployment-mediawiki02
- 16:38 bd808: Killed apache2+hhvm and restarted on deployment-mediawiki0[12]
- 16:06 bd808: `tcpdump -n udp dst port 8324` shows packets leaving deployment-bastion for deployment-logstash1
- 16:00 bd808: Stopped udp2log and started udp2log-mw with no apparent effect
- 16:00 bd808: udp2log events not being sent from deployment-bastion to deployment-logstash1
- 15:49 bd808: Restarted logstash on deployment-logstash1
- 09:45 mwalker: rebasing puppet repo to get a ocg patch
July 24
- 16:09 bd808: Reverted MW config to re-enable luasandbox mode; back to luastandalone for now
- 15:44 bd808: Updated MW config to re-enable luasandbox mode
- 15:43 bd808: Updated hhvm-luasandbox to 2.0-3 and restarted hhvm instances
- 14:21 hashar: killed hhvm process on deployment-mediawiki01 and 02. init script does not work.
- 02:59 ori: promoted legoktm to project-admin
July 23
- 23:30 bd808: Running `find . -type d -exec chmod 777 {} +` in /data/project/upload7 to fix shared image dir permissions
- 20:49 bd808: Changed config to run lua via external executable to avoid hhvm crashing bug
- 16:20 bd808: hhvm upgraded to 3.1+20140723-1+wmf1 on deployment-mediawiki0[12]
- 15:34 bd808: Reverted hhvm to 3.1+20140630-1+wm1 on deployment-mediawiki02
- 15:21 bd808: Upgraded hhvm to 3.1+20140630; seeing problems with luasandbox extension
July 22
- 14:26 hashar: upgrading varnish on deployment-cache-mobile03
- 14:22 hashar: upgrading varnish on deployment-cache-text02
- 14:02 hashar: rebooting deployment-cache-upload02 varnish not happy with memory mapping
- 13:51 hashar: rebooting bits varnish cache
- 13:43 hashar: rebased puppetmaster repo. Rebase got broken after 0317463 - beta: New script to restart apaches got merged in.
- 13:35 hashar: apt-get upgrade on deployment-cache-bits01 + varnish upgrade
- 09:28 hashar: Removing role::beta::natfix that is now handled by labs DNS and the class is removed with 146091
July 21
- 23:37 ori: Switched over beta cluster app servers to HHVM
- 21:27 bd808: Killed update.php jobs; Antoine will give jobs a longer timeout
- 21:23 bd808: Running update.php for simplewiki in screen
- 21:22 bd808: Running update.php for hewiki in screen
- 21:21 bd808: Running update.php for eswiki in screen
- 21:21 bd808: Running update.php for cawiki in screen
- 21:21 bd808: Running update.php for commonswiki in screen
- 21:18 hashar: Restarting udp2log-mw on deployment-bastion. There is a bunch of [python] <defunct> processes
- 17:32 bd808: Updated scap to 4871208 (+ cherry pick of I6a56b5e)
- 17:12 bd808: Hotfix for scap ssh host key checking to fix jenkins scap job
- 17:03 bd808: Testing scap change I40a891b via cherry-pick
- 10:25 hashar: on bastion, fixed some puppet dependency to have nutcracker to start with the proper configuration 148043
- 10:20 hashar: upgrading packages on deployment-bastion
- 10:19 hashar: deleted /var/lib/apt/lists/lock on bastion. It was preventing apt-get update from running
- 10:18 hashar: setting up nutcracker on deployment-bastion. It was installed but the puppet class to configure it was not being applied. Related Gerrit patches: 148041 and 148042
- 09:25 hashar: rebooting deployment-apache02
- 09:22 hashar: rebooting deployment-apache01.
- 00:27 ori: deployment-mediawiki01 & deployment-mediawiki02: configured for project-local puppet & salt masters
July 18
- 00:30 bd808: removed local l10nupdate user from deployment-jobrunner01 and deployment-videoscaler01
- 00:22 bd808: Killed stuck beta-update-databases-eqiad job ( stuck for over 60m waiting for executor; deadlock?)
- 00:21 ori: beta broke due to I433826423. app servers load prod apache confs from /etc/apache2/wikimedia. temp fix: locally hack apache2.conf to load /usr/local/apache2/conf/all.conf; disable puppet.
July 17
- 23:18 bd808: Puppet broken for deployment-bastion by labs specific logic in misc::deployment::vars.
- 19:01 mwalker: possibly breaking labs by cherry picking an apparmor patch that affects mysql https://gerrit.wikimedia.org/r/#/c/147027/
July 16
- 19:15 mwalker: updated puppet about 20 minutes ago for new ocg variables (now officially in production puppet instead of just cherry picked)
July 15
- 18:26 bd808: Removed local mwdeploy user from /etc/passwd on deployment-videoscaler01 and deployment-jobrunner01
- 16:59 bd808: scap failing to deployment-videoscaler01 and deployment-jobrunner01 due to other random failures now. Lots of strange permissions errors during rsync
- 16:37 bd808: scap failing to deployment-videoscaler01 and deployment-jobrunner01 due to ssh auth failures; likely a puppet config problem
July 10
- 22:37 bd808: Added Gergő Tisza and Yuvipanda as project admins
July 8
- 23:37 bd808: Updated Kibana to 0afda49 (latest upstream head)
- 17:03 greg-g: Added John F. Lewis to the project after his NDA was signed by Mark (RT 7722)
July 7
- 20:55 bd808: Killed stuck `apt-get update` job on deployment-jobrunner01 started on Jun17
- 20:20 bd808: Fixed puppet on deployment-analytics01 with manual apt-get commands.
- 20:08 bd808: Ran `apt-get dist-upgrade` on deployment-analytics01 to upgrade hadoop, hive, pig, etc which were failing to update via puppet.
July 4
- 02:28 RoanKattouw: Unbroke replication on deployment-db2, it's catching up now
July 3
- 18:59 legoktm: manually created centralauth.renameuser_status table
- 16:04 bd808: Updated scap to ff04431
- 09:24 hashar: Reindexed ElasticSearch index for cawiki/eswiki with: mwscript extensions/CirrusSearch/maintenance/forceSearchIndex.php --wiki {cawiki,eswiki} --batch-size=50
- 09:22 hashar: Blow up ElasticSearch indices for cawiki and eswiki with: mwscript extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php --wiki cawiki --startOver --indexType content && mwscript extensions/CirrusSearch/maintenance/updateOneSearchIndexConfig.php --wiki cawiki --startOver --indexType general
- 09:10 hashar: used addwiki.php to create the wiki. manually triggered the Jenkins job that updates the databases https://integration.wikimedia.org/ci/job/beta-update-databases-eqiad/2319/
- 09:06 hashar: Adding cawiki and eswiki for cxserver testing Ibbcbd4
July 2
- 07:49 hashar: cxserver being configured! 140723 by Kartik and Niklas \O/
July 1
- 15:46 bd808: Fixed git rebase conflict in operations/puppet on deployment-salt
- 13:29 manybubbles: rebuilding Cirrus search index in beta to pick up new configuration and cache warmers
- 11:20 hashar: Added Filippo Giunchedi to the project as an admin (WMF ops)
June 30
- 20:47 bd808: The state of puppet for beta is badly broken. I have hacked things to get puppet to apply on deployment-apache0[12] but puppet won't apply on deployment-bastion in part due to the same hacks.
- 18:48 bd808: Created symlink /apache -> /usr/local/apache on deployment-apache0[12] to fix docroot symlinks
- 18:09 bd808: Beta apaches are broken with latest puppet config applied. Working to correct.
- 18:08 bd808: Manually added symlink for /etc/apache/wmf on deployment-apache0[12]
June 26
- 12:48 YuviPanda: cherry picked https://gerrit.wikimedia.org/r/#/c/142228/ to puppetmaster, sending events to charcoal.wmflabs.org now with projectname \o/
- 09:46 YuviPanda: cherry-picked https://gerrit.wikimedia.org/r/#/c/142210/ on to puppetmaster
- 09:38 hashar: Granting sudo to YuviPanda
June 25
- 20:58 bd808: Fixed rebase conflict in operations/puppet.git on deployment-salt caused by cherry-picked vcl patch left over from varnish submodule usage
June 24
- 19:29 bd808: Manually updated operations/puppet checkout on deployment-salt to deal with varnish submodule change
June 19
- 22:47 bd808: Updated scap to 792a572
- 22:46 bd808: Trebuchet runs on deployment-videoscaler01 are succeeding but not showing up in the `git deploy report` output
- 22:40 bd808: Deleted /var/log/diamond/diamond.log on deployment-jobrunner01 because /var was full
June 18
- 16:55 bd808: Setup hourly cron as user bd808 on deployment-salt to test automatic update of puppet repo using ~bd808/git-sync-upstream script
June 17
- 20:36 bd808: Upgraded elasticsearch to version 1.2.1 on deployment-logstash1
June 16
- 21:16 bd808: Jenkins beta-scap-eqiad job broken because of missing puppet config on deployment-jobrunner01; needs role::beta::scap_target
- 20:36 bd808: Enabled puppet on deployment-jobrunner01 and forced a run
- 20:34 bd808: Puppet disabled on deployment-jobrunner01 since 2014-06-03; No SAL logs explaining why
- 20:19 bd808: Updated scap to 5adce72; trebuchet reported i-00000237 (deployment-videoscaler01) as not updating, but manual check shows it did sync properly
- 20:00 bd808: Deleted /var/lib/puppet/state/agent_catalog_run.lock on deployment-bastion after verifying that no puppet processes were running
- 19:55 bd808: Truncated /var/log/diamond/diamond.log and restarted diamond on deployment-bastion
- 19:36 bd808: /var/log/diamond is 787M of 1.2G total logs
- 19:29 bd808: /var 0% free on deployment-bastion; looking for things to clean-up
June 9
- 15:19 andrewbogott: doing a 'rebase origin' on deployment-salt, because it needs it.
- 15:10 andrewbogott: updating all instances to puppet 3 via a cherry-pick of https://gerrit.wikimedia.org/r/#/c/137898/ on deployment-salt
June 7
- 02:44 bd808: Restarted logstash on deployment-logstash1; last event logged at 2014-06-06T22:11:04
June 6
- 19:26 bblack: - synced labs/private on deployment-salt again
- 16:30 bd808: Rebooted deployment-salt
- 16:27 bd808: Made /var/log a symlink to /srv/var-log on deployment-salt
- 16:26 bblack: Updated labs/private.git on puppetmaster. brings in updated zero+netmapper password for beta
- 16:18 bd808: Changed from role::labs::lvm::biglogs to role::labs::lvm::srv on deployment-salt and made /var/lib a symlink to /srv/var-lib
- 15:45 bd808: /var on deployment-salt still at 97% full after moving logs; /var/lib is our problem
- 15:43 bd808: Archived deployment-salt:/var/log to /data/project/deployment-salt
- 15:40 bd808: Disabled puppet on deployment-salt to work on disk space issues
- 12:44 hashar: Updated labs/private.git on puppetmaster. Brings Brandon Black change "add labs copy of zerofetcher auth file" 137918
- 02:48 mwalker: added role::labs::lvm::biglogs to deployment-salt because it is out of room on /var and I don't know what I can delete
- 01:25 bd808: Live hacked /etc/apache2/wmf/hhvm.conf on apaches to allow them to start
- 00:30 bd808: `git stash`ed dirty dblist files found in /a/common on deployment-bastion
June 5
- 14:16 manybubbles: rebuild beta's jawiki's search index without kuromoji - it didn't help much anyway
- 14:14 manybubbles: recovered from busted elasticsearch - two problems: 1. I had an index that used the kuromoji plugin but I'd uninstalled it and 2. I had plugins for 1.2.1 but was trying to start 1.1.0. Solution: 1. delete the index and recreate it without kuromoji. 2. upgrade to 1.2.1 like I had planned on doing any way.
- 14:01 manybubbles: elasticsearch cluster got really angry in beta when I restarted some node - it's like they aren't talking to each other properly - trying to recover. once that is done I'll upgrade to 1.2.1 and that might fix it
- 13:59 hashar: deployment-elastic01 puppet was broken due to bug 63322 i.e. having some HTML garbage as ec2id which would be used as puppet certname
- 13:47 manybubbles: rolling restart of elasticsearch nodes in beta to pick up new kernel
June 4
- 20:46 bd808: Fixed file ownership on /data/project/apache/uncommon for beta-recompile-math-texvc-eqiad job
- 19:27 manybubbles: sorry, can't do that yet,
- 19:27 manybubbles: plugins deployed to beta - time to restart Elasticsearch in beta - should cause no interruption of service
- 19:01 manybubbles: deploying Elasticsearch 1.2.1 and some updated plugins to beta
- 17:11 bd808: Unwedged the jenkins jobs updating beta by stopping the stuck db update job
- 16:27 bd808: Changed uid/gid for files owned by l10nupdate user
- 09:50 mwalker: Reset salt caches by running `salt '*' state.clear_cache` from deployment-salt -- deployment-pdf01 now no longer reports errors when returning status for deployment
June 3
- 22:30 bd808: Deleted unused /data/project/apache/common-local on NFS share.
June 2
- 19:42 bd808: Updated scap to a7da355
- 05:14 bd808: Restarted logstash on deployment-logstash1; Last event logged at 2014-06-01T07:22:56
May 30
- 21:45 bd808: Restarted uwsgi on deployment-graphite
- 18:43 bd808: Updated scap to c4204dd
May 29
- 21:07 bd808: mwalker cleaned up log spam from upstart on deployment-pdf01
- 20:59 bd808: /var full on deployment-pdf01
- 20:55 bd808: Restarted salt minion on deployment-pdf01 with `sudo salt 'i-00000396.eqiad.wmflabs' service.restart salt-minion`
May 28
- 17:53 bd808: Restarted logstash on deployment-logstash1; last event logged at 2014-05-28T12:11:37
- 16:56 bd808: Updated scap to fd7e538
May 27
- 19:08 bd808: Updated scap to 48c7e28
- 14:56 bd808: Updated scap to 9609e8d
May 23
- 16:32 bd808: Upgraded elasticsearch to 1.1.0 on deployment-logstash1
- 13:36 manybubbles: restarting elasticsearch on deployment-elastic01 to pick up some gc setting recommended by elasticsearch team
May 22
- 23:00 bd808: Added 20after4 as a project admin
- 22:59 bd808: Added matanya as a project member
- 21:38 bd808|LUNCH: Deployed scap 096cb3f
May 21
- 17:33 mwalker: converted deployment-pdf01 (i-00000396.eqiad.wmflabs) to use local puppet & salt master
- 14:50 bd808: restarted logstash on deployment-logstash1; getting really tired of these soft crashes
- 00:33 bd808: Puppet failing on deployment-videoscaler01 with duplicate definition of Class[Mediawiki::Jobrunner]
- 00:07 bd808: Fixed puppet for deployment-jobrunner01 using https://gerrit.wikimedia.org/r/#/c/134519/2
May 20
- 23:49 bd808: Fixed puppet for deployment-apache[12] using https://gerrit.wikimedia.org/r/#/c/134519/2
- 23:11 bd808: deployment-apache01 needs more work: "Could not set shell on user[mwdeploy]"
- 23:06 bd808: Fixing puppet config for upstream rename of role::applicationserver -> role::mediawiki
- 21:14 ori: Converted deployment-stream to use local puppet & salt masters
- 21:08 RoanKattouw: chown'ed /data/project/parsoid/parsoid.log from mwalker (?!?) to parsoid so Parsoid runs again
- 15:53 bd808: Deployed scap 7b6fc47 via trebuchet
May 19
- 14:34 bd808: Restarted logstash service on deployment-logstash1; it stopped logging new events at 10:37:13Z
May 16
- 21:20 manybubbles: restarting elasticsearch in beta to update some plugins
- 00:34 bd808: Updated EventLogging to I89819bd
May 15
- 22:14 bd808: Restarted logstash on deployment-logstash1 yet again; memory leak from invalid encoding bug
- 00:14 bd808: Disabled puppet on deployment-logstash1 to test a local logstash config change
May 14
- 23:33 bd808: Added irc input to logstash via I409fec9
May 13
- 09:28 bd808: Restarted logstash service on deployment-logstash1
- 09:28 bd808: Logstash events stop at 2014-05-11T18:36:35Z; Log file shows many "Failed parsing date from field" errors which probably triggered the known upstream memory leak bug
May 10
- 18:02 bd808: Restarted logstash on deployment-logstash1
May 9
May 6
- 17:54 bd808: Restarted logstash on deployment-logstash1
- 17:53 bd808: Logstash in beta hasn't recorded any events since 2014-05-04T04:32:36.
- 15:33 manybubbles: rolling restart of Elasticsearch servers in beta to pick up new highlighter plugin to fix bugs found when we fixed hebrew analysis. and to implement phrase highlighting.
May 5
- 21:29 mwalker: ran puppetstoredconfigclean and revoked puppet and salt keys for i-00000339.eqiad.wmflabs (was pdf01)
- 21:24 mwalker: removing pdf01 instance -- labs just uses production mwlib which works just fine. I'll recreate this when I make the OCG test instance
- 20:57 manybubbles: deploying new plugin to Elasticsearch (swift)
May 3
- 18:10 mwalker: Updated kernel on deployment-pdf01 (manually set console=ttyS0 to match older installed kernels)
- 17:58 mwalker: Converted i-00000339.eqiad.wmflabs (deployment-pdf01) to use local puppet & salt masters
- 17:54 mwalker: signed salt key for i-00000339.eqiad.wmflabs (deployment-pdf01)
- 17:43 bd808: Added mwalker to under_NDA sudoers group
May 2
- 17:01 bd808: Switched scap to use scripts delivered by trebuchet
May 1
- 15:46 manybubbles: upgrading Elasticsearch highlighter via a rolling restart
- 00:56 bd808: Fixed empty PrivateSettings.php configuration file (which I also broke earlier)
April 28
- 16:12 manybubbles: upgrading highlighter plugin in Elasticsearch
- 15:43 bd808: Created empty /srv/scap-stage-dir/wmf-config/mwblocker.log file to stop missing file warnings in beta.
April 25
- 11:31 hashar: commonswiki-75388f96: 0.6183 19.5M SQL ERROR (ignored): Table 'commonswiki.revtag_type' doesn't exist (10.68.16.193)
- 11:30 hashar: Authentication is broken on the beta cluster. Well at least from commons.wikimedia.beta.wmflabs.org
April 23
- 19:34 ^demon|lunch: created zhwiki, ukwiki, ruwiki, kowiki, hiwiki, jawiki for testing
- 10:19 hashar: stopping udp2log and starting udp2log-mw instead (known old bug that prevents logging)
April 22
- 18:42 bd808: Rebooting deployment-bastion in a wild attempt to get the jenkins slave there working again
April 18
- 19:24 manybubbles: rebuilding Cirrus indexes to pick up auxiliary fields and smarter accent matching
April 16
- 18:56 hashar: Migrating memc04 and memc05 to self master/salt bug 64010
- 13:13 manybubbles: done
- 13:10 manybubbles: rolling restart of Elasticsearch nodes in beta to make super sure it picked up new plugins
- 09:33 hashar: rebased puppetmaster
April 15
- 20:02 manybubbles: restarting elasticsearch in beta to pick up a plugin update - no downtime should occur
- 14:24 hashar: rebased puppetmaster
April 11
- 17:41 bd808: Tried to enable role::protoproxy::ssl::beta on deployment-cache-text02 but it failed to apply because /etc/ssl/certs/star.wmflabs.org.pem and /etc/ssl/private/star.wmflabs.org.key don't match.
- 03:59 bd808: sudo apt-get install mysql-client on deployment-bastion
- 03:54 bd808: Added legoktm as a project member
- 00:02 bd808: Enabled https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/
April 10
- 21:35 bd808: Running scap on deployment-bastion for the first time in eqiad
- 21:13 bd808: Disabled https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/ to work on scap setup
- 14:52 hashar: Adding Tobias Gritschacher to the project so he can look at udp2log / apache logs whenever needed :-]
April 9
- 23:04 bd808: Re-enabled puppet on deployment-apache02 and forced a puppet run
- 21:39 bd808: Cherry-picked I8f77e0c into puppet and forced puppet run on deployment-bastion
April 8
- 17:53 manybubbles: rebuilding simplewiki's search index optimized for the new highlighter to check the size difference
- 05:34 Ryan_Lane: upgraded libssl on all nodes, restarted affected ssl servers
- 05:03 Ryan_Lane: upgraded libssl on all salt accessible nodes
April 5
- 11:19 hashar: Attempting to reenable SSL support with 124057
April 4
- 21:39 bd808: Restarted logstash; it stopped processing events again at 2014-04-04T19:56:46Z
- 17:31 bd808: Forced puppet run on deployment-cache-text02
- 17:29 bd808: Manually fixed puppet config on deployment-cache-text02 (the cert html error problem)
- 17:22 bd808: Rebooting deployment-cache-bits01
- 17:21 bd808: Forced puppet run on deployment-cache-bits01
- 16:15 manybubbles: Performing a rolling restart of Elasticsearch nodes to pick up a new plugin
April 3
- 17:32 bd808: Fixed certname in /etc/puppet/puppet.conf manually on deployment-bastion so puppet would run again.
- 15:33 bd808: Restarted logstash on deployment-logstash1; stuck in a bad state due to jvm oom logged at 2014-04-03T12:03:43Z
April 2
- 17:54 manybubbles: done installing plugins on Elasticsearch in beta
- 14:10 hashar: Fixed database updating job https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/ . It was not running on the proper node.
- 12:50 hashar: restarted parsoid daemon on deployment-parsoid04.eqiad.wmflabs. It now also logs to /data/project/parsoid/parsoid.log
- 12:36 hashar: Manually deleting parsoid user/group on deployment-parsoid04. Will use the LDAP uid/gid instead.
April 1
- 21:38 hashar: Removed the Zuul triggers that updated beta cluster in PMTPA 123100.
- 19:49 bd808: Converted deployment-graphite.eqiad.wmflabs to use local puppet & salt masters
- 19:20 bd808: Deleting and re-creating deployment-graphite because I forgot to add the web security group
- 15:57 andrewbogott: shutting down all pmtpa instances
- 14:32 manybubbles: completed upgrade to Elasticsearch 1.1.0 and fixed deployment-elastic04.
- 13:32 hashar: Thumbs access more or less fixed
- 13:31 hashar: deployment-upload is rejecting connection on port 80. Applying role::beta::uploadservice from 122786
- 13:30 manybubbles: upgrading labs Elasticsearch to 1.1.0
- 13:06 hashar: Applying role::beta::natfix on deployment-upload.eqiad.wmflabs . Might let it access images from commons.wikimedia.beta.wmflabs.org ( ex: http://upload.beta.wmflabs.org/wikipedia/commons/thumb/4/43/Feed-icon.svg/16px-Feed-icon.svg.png yields: Error retrieving thumbnail from scaling server: couldn't connect to host commons.wikimedia.beta.wmflabs.org )
- 08:31 hashar: MediaWiki config paths tweaks for Math [[bugzilla:63331|bug 63331]] and Captchas [[bugzilla:63342|bug 63342]]
- 00:32 bd808: Converted deployment-graphite to use local puppet & salt masters
March 31
- 21:02 hashar: Making the Parsoid daemon write its logs to /data/project/parsoid/parsoid.log 122561
- 20:47 hashar: Puppet master is fixed. The certificates got badly messed up, had to regenerate them following the documentation "Regenerate Certificates for Puppet Master"
- 20:17 hashar: restarted parsoid daemon
- 20:00 hashar: stopped parsoid . It is killing the application servers
- 19:53 hashar: restarting both apaches
- 19:21 hashar: restarting job service on jobrunner01 to apply 122436
- 19:20 hashar: Unbreak puppetmaster on deployment-salt.eqiad.wmflabs
- 19:01 hashar: puppet master is broken :(
- 17:39 hashar: lowering # of jobs spawned by the jobrunner 122436
- 16:00 bd808: Restarted logstash service on deployment-logstash1; no new log events seen since 2014-03-28T10:57
- 15:58 bd808: Updated kibana on deployment-logstash1 to e317bc6
- 15:56 hashar_: Cluster slow because some CirrusSearch job is spamming simplewiki . Gotta find a way to throttle the number of jobs being run on jobrunner01 or add more apache boxes . It is transient anyway, might look at limiting the runs tonight
- 15:10 hashar_: Rebased puppet repository. Only one hack left: https://gerrit.wikimedia.org/r/#/c/119534/
- 14:20 hashar: deleting deployment-parsoidcache01 cache the hard way: stopping varnish, deleting files in /srv/vdb/ , starting varnish
- 14:05 hashar: shutting down database and apache boxes for now.
- 14:03 hashar: shutting down varnish instances in pmtpa
- 13:56 hashar: Deleted deployment-cache-upload01 , replaced by deployment-cache-upload02
- 13:52 hashar: upload varnish cache working :-]
- 13:47 hashar: applying role::cache::upload to role-cache-upload02
- 13:37 hashar: migrating deployment-cache-upload02.eqiad.wmflabs to self puppet/salt master
- 13:22 hashar: Creating deployment-cache-upload02 to replace deployment-cache-upload01 which was missing the security group "web"
- 11:30 hashar: Update DNS entries to point to EQIAD instances (aka switching beta cluster to eqiad)
March 28
- 16:18 hashar: rebased puppet on deployment-salt
- 15:39 hashar: Last log made to wrong project
- 15:39 hashar: deleting instance integration-selenium-driver, no longer needed. browsertests jobs should now be runnable on integration-slave1001 and integration-slave1002 (in eqiad)
- 10:54 hashar: deleting instance integration-debian-builder . That is breaking all debian-glue jobs. Will revisit later next week to get pbuilder/cowbuilder set up on the other eqiad slaves
- 08:48 hashar: deleting integration-slave-pbuilder. Unneeded (i need a coffee)
- 08:43 hashar: Created integration-slave-pbuilder on eqiad to replace pmtpa instance integration-debian-builder
- 00:23 bd808: `sudo chmod -R a+rwx /data/project/upload7`; We need to get this file permissions thing figured out
March 27
- 15:23 hashar: role::beta::natfix can't run on deployment-bastion.eqiad because the ferm rules conflict with the Augeas rules coming from udp2log :-(
- 15:21 hashar: applying role::beta::natfix on deployment-bastion.eqiad
- 14:58 hashar: fixed up role::beta::natfix . Ferm is now being applied again on various application server instances 121378
- 13:58 hashar: rebased puppetmaster git repository, reapplied ottomata live hacks.
- 12:55 hashar: mediawiki l10n cache being rebuilt!!!
- 12:54 hashar: Fixed permissions on eqiad bastion for /srv/scap . Others (such as mwdeploy) could not read / execute scap scripts
- 11:29 hashar: MediaWiki code and configuration are now self updating on EQIAD cluster via Jenkins jobs. First run: https://integration.wikimedia.org/ci/job/beta-code-update-eqiad/4/console
- 11:11 hashar: deleting job beta-code-update , replaced by datacenter variants beta-code-update-pmtpa and beta-code-update-eqiad
- 10:54 hashar: Deleting job beta-update-databases , replaced by datacenter variants beta-update-databases-pmtpa and beta-update-databases-eqiad
March 26
- 19:05 bd808: Added ottomata as a project member and admin
- 15:46 springle: deployment-db1 data loaded
- 14:45 bd808: created proxy https://logstash-beta.wmflabs.org for logstash instance
- 14:17 hashar: fixed up redis configuration in eqiad. Jobrunner is happy now: aawiki-504cd7d2: 0.9649 21.5M Creating a new RedisConnectionPool instance with id 627014d. 121060
- 14:05 hashar: udp2log functional on eqiad beta cluster \O/
- 13:55 hashar: stopping udp2log on eqiad bastion, starting udp2log-mw (really should fix that issue one day)
- 13:52 hashar: dropped some live hack on eqiad in /data/project/apache/common-local and ran git pull
- 13:14 hashar: Dropping enwikivoyage and dewikivoyage databases from sql02. Related changes are updating the Jenkins config: https://gerrit.wikimedia.org/r/#/c/121045/ and cleaning up the mw-config : https://gerrit.wikimedia.org/r/#/c/121047/
- 07:53 springle: installed mariadb via puppet on deployment-db1. no data yet
March 25
- 19:43 hashar: created jenkins slave deployment-bastion.eqiad
- 17:17 hashar: Created and validated job that updates Parsoid on the EQIAD beta cluster \O/
March 24
- 23:16 marktraceur: Touching all the MMV scripts because they're not getting invalidated or something
- 23:10 hashar: l10n cache got broken due to a PHP fatal error I introduced. It is back up now. Found out via https://integration.wikimedia.org/dashboard/
- 23:09 hashar: upgraded all pmtpa varnishes, ran puppet on all of them. all set!
- 22:57 hashar: restarting deployment-cache-upload04 , apparently stalled
- 22:48 hashar: upgrading varnish on all pmtpa caches.
- 22:47 hashar: apt-get upgrade varnish on deployment-cache-bits03
- 22:45 marktraceur: attempted restart of varnish on betalabs; seems to have failed, trying again
- 22:42 hashar: made marktraceur a project admin and granted sudo rights
- 22:39 marktraceur: Restarting betalabs varnish to workaround https://bugzilla.wikimedia.org/show_bug.cgi?id=63034
- 17:25 bd808: Converted deployment-db1.eqiad.wmflabs to use local puppet & salt masters
- 17:06 bd808: Changed rules in sql security group to use CIDR 10.0.0.0/8.
- 17:05 bd808: Changed rules in search security group to use CIDR 10.0.0.0/8.
- 17:05 bd808: Built deployment-elastic04.eqiad.wmflabs with local salt/puppet master, secondary disk on /var/lib/elasticsearch and role::elasticsearch::server
- 16:19 bd808: Built deployment-elastic03.eqiad.wmflabs with local salt/puppet master, secondary disk on /var/lib/elasticsearch and role::elasticsearch::server
- 16:08 bd808: Built deployment-elastic02.eqiad.wmflabs with local salt/puppet master, secondary disk on /var/lib/elasticsearch and role::elasticsearch::server
- 15:54 bd808: Built deployment-elastic01.eqiad.wmflabs with local salt/puppet master, secondary disk on /var/lib/elasticsearch and role::elasticsearch::server
- 10:31 hashar: migrated deployment-solr to self puppet/salt masters
March 21
- 09:29 hashar: l10ncache is now rebuilt properly : https://integration.wikimedia.org/ci/job/beta-code-update/53508/console
- 09:23 hashar: fixing l10ncache on deployment-bastion : chown -R l10nupdate:l10nupdate /data/project/apache/common-local/php-master/cache/l10n . The l10nupdate UID/GID have been changed and are now in LDAP
March 20
- 23:46 bd808: Mounted secondary disk as /var/lib/elasticsearch on deployment-logstash1
- 23:46 bd808: Converted deployment-tin to use local puppet & salt masters
- 22:09 hashar: Migrated videoscaler01 to use self salt/puppet masters.
- 21:30 hashar: manually installing timidity-daemon on jobrunner01.eqiad so puppet can stop it and stop whining
- 21:00 hashar: migrate jobrunner01.eqiad.wmflabs to self puppet/salt masters
- 20:55 hashar: deleting deployment-jobrunner02 , let's start with a single instance for now
- 20:51 hashar: Creating deployment-jobrunner01 and 02 in eqiad.
- 15:47 hashar: fixed salt-minion service on deployment-cache-upload01 and deployment-cache-mobile03 by deleting /etc/salt/pki/minion/minion_master.pub
- 15:30 hashar: migrated deployment-cache-upload01.eqiad.wmflabs and deployment-cache-mobile03.eqiad.wmflabs to use the salt/puppetmaster deployment-salt.eqiad.wmflabs.
- 15:30 hashar: deployment-cache-upload01.eqiad.wmflabs and deployment-cache-mobile03.eqiad.wmflabs recovered!! /dev/vdb does not exist on eqiad which caused the instance to be stalled.
- 10:48 hashar: Stopped the simplewiki script. Would need to recreate the db from scratch instead
- 10:37 hashar: Cleaning up simplewiki by deleting most pages in the main namespace. Would free up some disk space. deleteBatch.php is running in a screen on deployment-bastion.pmtpa.wmflabs
- 10:08 hashar: applying role::labs::lvm::mnt on deployment-db1 to provide additional disk space on /mnt
- 09:39 hashar: convert all remaining hosts but db1 to use the local puppet and salt masters
- 04:40 springle: created deployment-db1 for mariadb master in eqiad
March 19
- 21:23 bd808: Converted deployment-cache-text02 to use local puppet & salt masters
- 20:21 hashar: migrating eqiad varnish caches to use xfs
- 17:58 bd808: Converted deployment-parsoid04 to use local puppet & salt masters
- 17:51 bd808: Converted deployment-eventlogging02 to use local puppet & salt masters
- 17:22 bd808: Converted deployment-cache-bits01 to use local puppet & salt masters; puppet:///volatile/GeoIP not found on deployment-salt puppetmaster
- 17:00 bd808: Converted deployment-apache02 to use local puppet & salt masters
- 16:49 bd808: Converted deployment-apache01 to use local puppet & salt masters
- 16:30 hashar: Varnish caches in eqiad are failing puppet because there is no /dev/vdb. Will figure it out tomorrow :-]
- 16:15 hashar: Applying role::logging::mediawiki::errors on deployment-fluoride.eqiad.wmflabs . It is not receiving anything yet though.
- 15:50 hashar: fixed udp2log-mw daemon not starting on eqiad bastion ( /var/log/udp2log belonged to wrong UID/GID)
- 15:49 hashar: deleted local user l10nupdate on deployment-bastion. It is in ldap now.
March 18
- 03:31 bd808: deployment-bastion now using deployment-salt as puppet master
March 17
- 15:02 hashar: Starting to copy /data/project from pmtpa to eqiad
- 14:46 hashar: manually purging all commonswiki archived files (on beta of course)
March 14
- 14:47 hashar: changing uid/gid of mwdeploy which is now provisioned via LDAP (aka deleting local user and group on all instance + file permissions tweaks)
March 11
- 10:46 hashar: dropping some unused databases from deployment-sql instance.
March 10
- 11:09 hashar: Deleting http://simple.wikipedia.beta.wmflabs.org/wiki/MediaWiki:Robots.txt
- 09:54 hashar: Reducing memcached instances to 3GB ( 115617 ). Seems to fix writing to the EQIAD memcaches which only have 3GB
- 09:08 hashar: Restarted bits cache (CPU / mem overload)
March 6
- 09:07 hashar: restarted varnish and varnish-frontend on deployment-cache-text1
March 5
- 17:26 hashar: hacked in mwversioninuse to return "master=aawiki". Relaunched l10n job using mwdeploy user and then running mw-update-l10n
- 17:07 hashar: mwversioninuse gives a wmf branch instead of master. That breaks l10n messages update and the job https://integration.wikimedia.org/ci/job/beta-code-update/ . Root cause is the python based scap.
March 3
- 17:28 manybubbles: doing an Elasticsearch reindex on beta before I try another one in production
February 28
- 10:17 hashar: Puppet running on varnish upload cache after several months. Might break random things in the process :(
February 27
- 14:11 manybubbles: upgrading beta to Elasticsearch 1.0
February 26
- 20:44 hashar: Cleaning up commonswiki archived files with mwscript deleteArchivedFiles.php --wiki=commonswiki --delete
- 20:44 hashar: deleted all files from http://commons.wikimedia.beta.wmflabs.org/wiki/Category:GWToolset_Batch_Upload (gwtoolset import test). Deleted File:Title_0* (Selenium tests).
- 15:06 hashar: deleted all thumbs from shared directory: /data/project/upload7/*/*/thumb/*
- 14:54 hashar: cleaning out 2013 archived logs.
February 25
- 08:42 hashar: Upgrading all varnishes.
February 24
- 23:36 MaxSem: Rolled back
- 23:25 hoo: recursively chowned extensions/MobileFrontend to mwdeploy:mwdeploy
- 23:21 hoo: chowned /data/project/apache/common-local/php-master/extensions/.git/modules/MobileFrontend/* to mwdeploy:mwdeploy
- 17:47 MaxSem: Investigating a mobile bug, might cause intermittent problems
- 17:36 MaxSem: Rebooted deployment-cache-mobile01 - was impossible to log into it though Varnish still worked
February 21
- 19:42 MaxSem: Adjusted read privs on /home/wikipedia/syslog/apache.log to allow fatalmonitor to work
February 19
- 16:24 hashar: -bastion : /etc/init.d/udp2log stop && /etc/init.d/udp2log-mw start (known bug)
- 16:23 hashar: rebooting -bastion
- 16:22 hashar: rebooting apache32 and apache33 breaking beta :-]
February 17
- 15:26 hashar: rebooting bits cache
February 11
- 21:55 manybubbles: update elasticsearch schema after recent changes. will run a links update as well
February 6
- 22:20 Krinkle: Manually ran changePassword.php to help someone (password reminder emails don't get sent)
- 14:43 hashar: restarting udp2log-mw on deployment-bastion. Logstash.wmflabs.org has not been receiving fatal logs since Jan 31st
February 4
- 17:22 hashar: fixed up beta-parsoid-update job so Parsoid should be up to date again. The issue is that the multigit job pointed to a wrong host (ZUUL_URL should be zuul.eqiad.wmnet)
- 13:33 hashar: removing role::memcached from both apache servers
- 09:58 hashar: rebooting all varnish caches
- 09:57 hashar: Upgrading all varnish
February 3
- 16:59 hashar: upgrading varnish on deployment-parsoidcache3
January 30
- 19:35 hashar: deployment-cache-bits03 restarted gmond, leaked memory. Upgrading varnish
- 19:32 hashar: Canceled varnish package upgrade on deployment-cache-mobile01 , it runs a specific version ( 3.0.5plus~wmftest-wm1 ) instead of 3.0.3plus~rc1-wm29
- 19:30 hashar: upgrading varnish on deployment-cache-mobile01
- 19:29 hashar: upgrading varnish on deployment-cache-bits03
- 19:29 hashar: upgrading varnish on deployment-staging-cache-mobile02
- 19:28 hashar: upgrading varnish on deployment-cache-upload04
- 19:27 hashar: reenabling puppet on deployment-cache-mobile01
- 17:10 manybubbles: done reindexing beta. everything looks good
- 16:54 manybubbles: reindexing beta like we're going to do in production when the release train departs later today
January 28
- 17:10 hashar: added addshore and jhall to project so they can grep logs
January 27
- 15:17 hashar: applying role::beta::fatalmonitor puppet class on deployment-bastion bug 60046
January 23
- 19:38 hashar: VisualEditor was not being updated properly because some files belonged to root instead of mwdeploy. Ran chown -R mwdeploy:mwdeploy /data/project/apache/common-local/php-master/extensions/VisualEditor
January 16
- 20:54 manybubbles: turning elasticsearch's disk space aware allocator
January 15
- 21:14 manybubbles: finished updating to elasticsearch 0.90.10
- 08:48 andrewbogott: rebooted deployment-cache-text1
January 2
- 15:32 hashar: Migrated parsoid on deployment-parsoid2 to use mediawiki/services/parsoid out of checkouts made in /srv/deployment/parsoid/{parsoid,deploy}. No job is self-updating it yet
- 15:00 manybubbles: finished upgrading Elasticsearch in beta. We're on 0.90.9 now.
- 14:07 hashar: running mw-update-l10n , it was broken because of https://gerrit.wikimedia.org/r/#/c/104741/ fixed up by https://gerrit.wikimedia.org/r/#/c/104953/
- 13:54 manybubbles: upgrading Elasticsearch servers in beta
December 26
- 18:54 manybubbles: performing in place index rebuild for wikis in beta after recent cirrus update
December 23
- 20:40 anomie: Restarting mw-job-runner service on deployment-jobrunner08, since jobs don't seem to be being run
- 20:03 anomie: Restarting apache on deployment-apache33 to see if that clears the odd errors going on
December 18
- 10:56 hashar: reenabling puppet on parsoid2 and deploying the new Parsoid upstart configuration 99656
Archives
- Archive 1 (2012-2013)