Toolserver:Admin:ToDo

ToDoList 22.6.2013

 * who can copy the DBs off the WMF server if not DaB?

Data Center Tasks

 * SAN: are all interconnections working?
 * network switches cannot really fall back if a core router goes offline - Mark knows, and a maintenance window is due
 * get 6x 8-10m OM3 fibres to make all SAN connections to oe10 redundant

Databases

 * rosemary: defective CPU; move NFS to hemlock
 * z-dat-s2-b: swap too small - maintenance
 * Linux virtual MySQL instances suffer from a bug in the slave process: if the connection to the master fails for too long, the slave won't restart, and a stop/start slave only leaves the MySQL process inoperable. Killing the daemon is the only option, but recovery may take hours.
 * reinstall wikidata instances: s2-user-wd, s4-user-wd - maintenance
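
A minimal sketch of how a stalled slave (as in the bug noted above) could be spotted before anyone tries stop/start slave. It assumes the vertical output of `SHOW SLAVE STATUS` is available as text; the function names and the 600-second lag threshold are hypothetical and not part of the current setup.

```python
def parse_slave_status(text):
    # Parse the vertical output of: mysql -e 'SHOW SLAVE STATUS\G'
    # Row separator lines ("**** 1. row ****") contain no colon and are skipped.
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def slave_is_healthy(status, max_lag_seconds=600):
    # Both replication threads must report "Yes"; a thread stuck in
    # "Connecting" or "No" counts as unhealthy.
    io_ok = status.get("Slave_IO_Running") == "Yes"
    sql_ok = status.get("Slave_SQL_Running") == "Yes"
    # Seconds_Behind_Master is NULL when the slave is not replicating.
    lag = status.get("Seconds_Behind_Master", "NULL")
    lag_ok = lag.isdigit() and int(lag) <= max_lag_seconds
    return io_ok and sql_ok and lag_ok


# Hypothetical sample: the IO thread lost the master and keeps reconnecting.
STALLED = """\
*************************** 1. row ***************************
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: NULL
"""

if __name__ == "__main__":
    print(slave_is_healthy(parse_slave_status(STALLED)))
```

This only reports the state; it deliberately does not try to restart the slave, since that is what makes the process inoperable on the affected instances.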

OSM

 * set up strabon as the database successor of ptolemy


 * PHP stinks here - things don't work as expected - a switch to Linux + Apache would be nice to have

Load Balancer Resetup
not done:
 * resetup of turnera:

   * LDAP
   * NFS - final sync takes 3 hours, during which NFS needs to be read-only; mount the homes from the SAN volume and resync to the local array after the new fs
   * DNS recursor
   * TS mysql
   * Squid

 * set up damiana like turnera and run failover tests
 * automatic restart of services on turnera

 * firewalling for the load balancers
 * puppet for the load balancers

hemlock

 * NFS user-store & backup VM back
 * puppet broken after update
 * ssh keys update
 * fingerprints.toolserver.org update


 * Solaris updates

Nagios Alerts
 * amaranth:
   * SMF: can only be repaired via a shutdown/restart of the ILOM, which requires a maintenance window
 * ha-sql.esi/damiana
   * MySQL user does not work; root password unknown too
 * hemlock
   * throw away the local array
 * SGE plugin does not parse since the last SGE update
 * mayapple still not reachable after hard reset - still using the old IP range?
 * thyme & turnera: is nrpe configured correctly? - please don't forget to update puppet