Toolserver:Admin:ToDo

From mediawiki.org

This page was moved from the Toolserver wiki.
Toolserver has been replaced by Toolforge. As such, the instructions here may no longer work, but may still be of historical interest.
Please help by updating examples, links, template links, etc. If a page is still relevant, move it to a normal title and leave a redirect.

ToDoList 22.6.2013

  • who can copy the DBs off WMF server if not DaB?

Data Center Tasks

  • SAN: are all the interconnections working?
  • network switches cannot really fail over if a core router goes offline; Mark knows and a maintenance window is due
  • get 6x 8-10m OM3 fibres to make all SAN connections to oe10 redundant

Databases

  • Linux virtual MySQL instances suffer from a bug in the slave process: if the connection to the master fails for too long, the slave won't restart. A stop/start slave only leaves the MySQL process inoperable, so killing the daemon is the only option, but recovery may take hours
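Until the bug is fixed, a stalled slave could at least be detected early. The following is a minimal sketch, not the check actually used on the Toolserver: it parses the vertical output of `SHOW SLAVE STATUS\G` and maps it to Nagios-style return codes. The function names, the sample output and the 600-second lag threshold are assumptions made for illustration; only the field names (`Slave_IO_Running`, `Slave_SQL_Running`, `Seconds_Behind_Master`) come from MySQL itself.

```python
import re

# Nagios plugin return codes.
OK, WARNING, CRITICAL = 0, 1, 2

# Hypothetical sample of `SHOW SLAVE STATUS\G` output for a slave that
# has lost its master: the IO thread keeps trying to reconnect.
SAMPLE = """\
*************************** 1. row ***************************
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
        Seconds_Behind_Master: NULL
"""


def parse_slave_status(text):
    """Parse the vertical (\\G) output of SHOW SLAVE STATUS into a dict."""
    status = {}
    for line in text.splitlines():
        # Field lines look like "   Slave_IO_Running: Yes";
        # the "*** 1. row ***" separator does not match and is skipped.
        match = re.match(r"\s*(\w+):\s?(.*)$", line)
        if match:
            status[match.group(1)] = match.group(2).strip()
    return status


def check_replication(status, max_lag_seconds=600):
    """Return (code, message) for a parsed slave status.

    max_lag_seconds is an assumed threshold, not a Toolserver setting.
    """
    io_state = status.get("Slave_IO_Running")
    sql_state = status.get("Slave_SQL_Running")
    if io_state != "Yes" or sql_state != "Yes":
        return CRITICAL, "replication not running (IO=%s, SQL=%s)" % (
            io_state, sql_state)

    lag = status.get("Seconds_Behind_Master")
    if lag in (None, "", "NULL"):
        return CRITICAL, "replication lag unknown"
    if int(lag) > max_lag_seconds:
        return WARNING, "slave is %s seconds behind master" % lag
    return OK, "slave is %s seconds behind master" % lag
```

Feeding `parse_slave_status(SAMPLE)` into `check_replication` reports a critical state because the IO thread is `Connecting` rather than `Yes`. In practice the text would come from the `mysql` client on the affected host.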

OSM

  • set up strabon as the database successor of ptolemy

  • PHP stinks here: things don't work as expected. A switch to Linux + Apache would be nice to have

Load Balancer Resetup

  • resetup of turnera:

not done:

  • LDAP
  • NFS - final sync takes 3 hours, during which NFS needs to be read-only; mount the homes from the SAN volume and resync to the local array after new fs
  • DNS recursor
  • TS mysql
  • Squid

  • set up damiana like turnera and run failover tests
  • automatic restart of services on turnera

  • firewalling for the load balancers
  • puppet for the load balancers


hemlock

  • needs SAN card switched to another PCI-X slot
  • puppet broken after update
  • ssh keys update
  • fingerprints.toolserver.org update

  • Solaris updates

Nagios Alerts

  • amaranth:

SMF: can only be repaired via a shutdown/restart of the ILOM, which requires a maintenance window

  • ha-sql.esi/damiana

MySQL user does not work; root password is unknown too

  • hemlock

throw away the local array

  • SGE plugin does not parse since the last SGE update
  • mayapple still not reachable after hard reset - still uses old IP range?
  • thyme & turnera: is nrpe configured correctly? - please don't forget to update puppet