Wikimedia Cloud Services team/Onboarding Chico/Sessions

https://www.mediawiki.org/wiki/Wikimedia_Cloud_Services_team/Onboarding_Chico/Sessions

= Chico Questions =

How is monitoring configured?

 * Icinga is being phased out in favor of Prometheus on production servers
 * Shinken on Labs instances
 * My goal is to add alerts for tools-bastions; it seems this should be done in Icinga/Prometheus (task T186552, https://phabricator.wikimedia.org/T186552 )
 * We already collect CPU and IO data for tools-bastions (https://tools.wmflabs.org/nagf/?project=tools#h_tools-bastion-03_cpu )
 * It looks like we can use check_graphite_series_threshold to alert on loadavg, as we already do with iowait (data from https://graphite-labs.wikimedia.org/ )
 * There is no total_cpu metric; we need the number of cores to know what to set for the warning and critical loadavg thresholds
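
A minimal sketch of how warning and critical loadavg thresholds could be derived from the core count. The per-core factors (1.0 and 2.0) and the function names are assumptions for illustration, not values agreed in the session or used by any existing check.

```python
import os


def loadavg_thresholds(cores, warn_per_core=1.0, crit_per_core=2.0):
    """Return (warning, critical) load-average thresholds for a host.

    The per-core factors are illustrative assumptions, not team policy.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if crit_per_core < warn_per_core:
        raise ValueError("critical factor must not be below warning factor")
    return cores * warn_per_core, cores * crit_per_core


def classify_load(loadavg, cores, warn_per_core=1.0, crit_per_core=2.0):
    """Map a load average to OK / WARNING / CRITICAL for a given core count."""
    warning, critical = loadavg_thresholds(cores, warn_per_core, crit_per_core)
    if loadavg >= critical:
        return "CRITICAL"
    if loadavg >= warning:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to a single core.
    cores = os.cpu_count() or 1
    print(cores, loadavg_thresholds(cores))
```

Scaling by cores keeps one rule usable across bastions of different sizes; the factors would still need tuning against real load history.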

https://etherpad.wikimedia.org/p/chicoandchase

https://graphite-labs.wikimedia.org/render/?width=674&height=377&_salt=1518534146.333&target=tools.tools-bastion-03.cpu.total.idle&from=-30d

 * tc, ifb
 * tc can only manipulate send queues
 * iotop, iotop -ao

https://graphite-labs.wikimedia.org/render/?width=674&height=377&_salt=1518535047.584&target=tools.tools-bastion-03.nfsiostat.labstore.ops&target=tools.tools-bastion-03.nfsiostat.labstore.ops_per_sec&target=tools.tools-bastion-03.nfsiostat.labstore1003.ops&target=tools.tools-bastion-03.nfsiostat.labstore1003.ops_per_sec

https://graphite-labs.wikimedia.org/render/?width=674&height=377&_salt=1518535115.171&target=tools.tools-bastion-03.nfsiostat.mounts.data_project.write.kilobytes&target=tools.tools-bastion-03.nfsiostat.mounts.data_project.read.kilobytes&from=-30d

https://graphite-labs.wikimedia.org/render/?width=674&height=377&_salt=1518535177.867&target=tools.tools-bastion-03.nfsiostat.mounts.mnt_nfs_labstore-secondary-tools-project.read.kilobytes&target=tools.tools-bastion-03.nfsiostat.mounts.mnt_nfs_labstore-secondary-tools-project.write.kilobytes&from=-30d
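
The render URLs above return graphs; the same Graphite endpoint can return raw data when format=json is added. A rough sketch of building such a URL and averaging one series from the JSON response follows. The helper names render_url and series_mean are hypothetical, and the sketch does not perform the HTTP request itself.

```python
import json
from urllib.parse import urlencode

GRAPHITE_RENDER = "https://graphite-labs.wikimedia.org/render/"


def render_url(targets, since="-30d", fmt="json"):
    """Build a Graphite render URL for one or more metric targets."""
    params = [("target", target) for target in targets]
    params += [("from", since), ("format", fmt)]
    return GRAPHITE_RENDER + "?" + urlencode(params)


def series_mean(payload, target):
    """Average the non-null datapoints of one series in a JSON response.

    Graphite's JSON output is a list of objects with "target" and
    "datapoints", where each datapoint is a [value, timestamp] pair.
    """
    for series in json.loads(payload):
        if series["target"] == target:
            values = [v for v, _ts in series["datapoints"] if v is not None]
            return sum(values) / len(values) if values else None
    return None


if __name__ == "__main__":
    print(render_url(["tools.tools-bastion-03.cpu.total.idle"]))
```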

WMCS Phabricator etiquette

 * Do we have documentation about how to triage tasks and move them around projects and workboards?
 * TBD

Cloud VPS / Horizon stuff

 * I am still unfamiliar with the interfaces and common questions; maybe I should create a temp project and go through the docs.
 * Make a task for a chicotestproject
 * Where are things configured?
 * Wikitech
 * operations-puppet repo
 * Horizon

https://wikitech.wikimedia.org/wiki/Hiera:tools

~/git/wmf/puppet cpettet@cair>ls hieradata/labs/tools

toolsadmin.wikimedia.org

Wikitech docs

 * Portal namespace for user-facing docs
 * /admin subpage for WMCS team

Other tasks

 * Is there something else I should be looking into?
 * Let's start slow and I'll try to integrate you into my sort of normal workflow
 * Flapping alerts in Shinken
 * host* as a way to get the % of hosts in failure
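
To illustrate the host* idea, here is a small sketch that computes the percentage of failing hosts among those matching a wildcard. It is not how Shinken evaluates such checks; the function name, the dict of host states, and the hostnames other than tools-bastion-03 are assumptions made up for the example.

```python
from fnmatch import fnmatch


def percent_failing(host_states, pattern):
    """Return the % of hosts matching `pattern` whose state is failing.

    `host_states` maps hostname -> True (healthy) or False (failing).
    Returns None when no host matches the pattern.
    """
    matched = [ok for host, ok in host_states.items() if fnmatch(host, pattern)]
    if not matched:
        return None
    failing = sum(1 for ok in matched if not ok)
    return 100.0 * failing / len(matched)


if __name__ == "__main__":
    # Hypothetical states; only tools-bastion-03 appears in the notes above.
    states = {
        "tools-bastion-02": True,
        "tools-bastion-03": False,
        "tools-exec-1401": False,
    }
    print(percent_failing(states, "tools-bastion-*"))
```

Alerting on a percentage across a host group, rather than on each host, is one way to dampen flapping from a single noisy instance.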