Wikimedia Cloud Services team/Onboarding Arturo/Sessions

= Arturo and Chase Onboarding Sessions =

Nov 28, 2017

 * Travel!
 * going through pending tickets and patches assigned

Package upgrade workflow (https://phabricator.wikimedia.org/T181647):

Unattended things:
 * All cloud instances get all unattended upgrades from WMF and distro by default
 - Security updates <-- add a patch (Arturo)
 - distro package upgrades https://gerrit.wikimedia.org/r/#/c/390431/2/modules/apt/manifests/unattendedupgrades.pp
 - wmf package upgrades https://gerrit.wikimedia.org/r/#/c/389480/
 :* Add a patch to put this behind a hiera setting (Chase)
 :* kernel updates still sleeping in toolforge (task?) https://phabricator.wikimedia.org/T180809
 :* packages handling configuration files correctly (which means preserving settings) https://gerrit.wikimedia.org/r/#/c/392421/
 - backports is an open question

Choosing to handle updates manually:
 * A project can choose to set a hiera key that will stop these upgrades from happening (one key per type of upgrade candidate)
 * A script exists to run on an instance to generate a report for available package upgrades (https://phabricator.wikimedia.org/P6365)
 :* Broken down by wmf vs distro?
 * The script that is used to generate the report, or another script, can be used to do the upgrades. This is a replacement for unattended upgrades: an ...attended upgrade solution.
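
A rough sketch of what a "wmf vs distro" breakdown of the report could look like. This is not the script from P6365. It assumes the report input is the text printed by `apt list --upgradable`, and it assumes that WMF packages come from a suite whose name ends in `-wikimedia`. The function name `group_upgrades` and the sample package names are made up for illustration.

```python
import re
from collections import defaultdict

# One line of `apt list --upgradable` output looks roughly like:
#   name/suite[,suite...] new-version arch [upgradable from: old-version]
LINE_RE = re.compile(
    r"^(?P<name>[^/\s]+)/(?P<suites>\S+)\s+(?P<new>\S+)\s+(?P<arch>\S+)"
    r"\s+\[upgradable from: (?P<old>[^\]]+)\]$"
)


def group_upgrades(apt_output):
    """Group upgradable packages into 'wmf' and 'distro' buckets.

    Assumption (not from the notes): a package counts as 'wmf' when any
    of its suites ends in '-wikimedia'; everything else is 'distro'.
    """
    groups = defaultdict(list)
    for line in apt_output.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skips the "Listing..." header and blank lines
        suites = match.group("suites").split(",")
        is_wmf = any(suite.endswith("-wikimedia") for suite in suites)
        origin = "wmf" if is_wmf else "distro"
        groups[origin].append(
            (match.group("name"), match.group("old"), match.group("new"))
        )
    return dict(groups)
```

Feeding it the captured output of `apt list --upgradable` returns a dict keyed by `"wmf"` and `"distro"`, each holding `(package, old version, new version)` tuples.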

Nov 21, 2017

 * Unattended-upgrades done w/ Chase: https://phabricator.wikimedia.org/T177920 notes: https://etherpad.wikimedia.org/p/389480
 * pending https://phabricator.wikimedia.org/T180811
 * Unattended upgrades pending: https://phabricator.wikimedia.org/T180254
 * wiki replicas automation. Step 1: documentation https://phabricator.wikimedia.org/T180513
 * role::puppetmaster::standalone has no firewall rule for port 8140 https://phabricator.wikimedia.org/T154150

 root@tools-bastion-03:~# host enwiki.web.db.svc.eqiad.wmflabs
 enwiki.web.db.svc.eqiad.wmflabs is an alias for s1.web.db.svc.eqiad.wmflabs.
 s1.web.db.svc.eqiad.wmflabs has address 10.64.37.15

 user_properties:
   source: user_properties
   view: select up_user, up_property, up_value
   where: >
     up_property in ( 'disablemail', 'fancysig', 'gender', 'nickname' )

 user_properties_anon:
   limit: 2
   source: ["user_properties", "user", "meta_p.properties_anon_whitelist" ]
   view: select cast(extract(year_month from user_touched)*100+1 as date) upa_touched, up_property, up_value
   where: user_id=up_user and up_property like pw_property
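
To make the view definitions above more concrete, here is one plausible reading of how such an entry could be turned into SQL. This is a guess for study purposes, not the actual wiki replicas tooling. It assumes unqualified sources live in the wiki database (for example `enwiki`), that the view is created in `<wiki>_p`, and it ignores the `limit` key because the notes do not say what it means. The function name `build_view_sql` is hypothetical.

```python
def build_view_sql(name, definition, db="enwiki"):
    """Build a CREATE VIEW statement from one view definition.

    Assumptions (not confirmed by the notes): sources without a dot are
    tables in `db`, the view lives in `<db>_p`, and `limit` is ignored.
    """
    source = definition.get("source", name)
    sources = [source] if isinstance(source, str) else list(source)
    # Qualify bare table names with the wiki database name.
    tables = [s if "." in s else f"{db}.{s}" for s in sources]
    sql = (
        f"CREATE OR REPLACE VIEW {db}_p.{name} AS "
        f"{definition['view'].strip()} FROM {', '.join(tables)}"
    )
    where = definition.get("where")
    if where:
        sql += f" WHERE {where.strip()}"
    return sql + ";"


user_properties = {
    "source": "user_properties",
    "view": "select up_user, up_property, up_value",
    "where": "up_property in ( 'disablemail', 'fancysig', 'gender', 'nickname' )",
}
statement = build_view_sql("user_properties", user_properties)
```

The `where` clause is what restricts the public view to the four whitelisted properties; everything else in `user_properties` stays hidden.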

Nov 2, 2017
Recurrent problem.
 * tools-bastion-03


 * arturo's onboarding page


 * Make a network diagram


 * OpenStack: everything is Liberty except Horizon, which is Mitaka.


 * Wiki replicas <-- look at them.


 * Next week: shadow clinic duty person. Madhu?

Oct 31, 2017

 * 2017-11-01 is a public holiday for Arturo
 * We should get some/all of these for the next few months on the team calendar
 * Arturo trying to understand which servers are physical, which are virtual, and how they link together
 * Wants a map of how things fit together
 * Nick poked Arturo about setting up his User page on metawiki

https://wikitech.wikimedia.org/wiki/Ganeti <--- KVM + DRBD (NOTE: 2017-10-31: already read the docs)
 * Chase to find the newly formed ongoing topographical docs
 * Everything is physical *except* Cloud VPS tenants and a few things on Ganeti in "production"


 * https://tools.wmflabs.org/openstack-browser/project/
 * https://tools.wmflabs.org/openstack-browser/project/tools <-- all of the VMs in Toolforge
 * names of the vms give a hint to what they do:
 * tools-k8s-* -- kubernetes core services
 * tools-docker-* -- kubernetes related Docker hosts (Docker registry, Docker image builder host)
 * tools-worker-* -- kubernetes exec nodes
 * tools-paws-* -- a second kubernetes cluster that powers PAWS, run by Yuvi
 * tools-exec-* -- Grid Engine execution nodes for "normal" tasks
 * tools-webgrid-* -- Grid Engine execution nodes for "web" tasks
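
The naming hints above can be written down as a small lookup, which may help when scanning the openstack-browser list. This is only a memory aid built from the prefixes in these notes; the function name `describe_instance` and the example hostname are hypothetical.

```python
# Prefix -> role, copied from the list in these notes.
ROLE_BY_PREFIX = {
    "tools-k8s-": "kubernetes core services",
    "tools-docker-": "kubernetes related Docker hosts",
    "tools-worker-": "kubernetes exec nodes",
    "tools-paws-": "second kubernetes cluster that powers PAWS",
    "tools-exec-": 'Grid Engine execution nodes for "normal" tasks',
    "tools-webgrid-": 'Grid Engine execution nodes for "web" tasks',
}


def describe_instance(hostname):
    """Return the role hinted at by a Toolforge VM name, or None."""
    for prefix, role in ROLE_BY_PREFIX.items():
        if hostname.startswith(prefix):
            return role
    return None  # prefix not covered by the notes


role = describe_instance("tools-worker-1001")  # hypothetical hostname
```

Names that do not match any listed prefix, such as `tools-bastion-03`, return `None` rather than a guess.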


 * https://quarry.wmflabs.org/
 * http://paws.wmflabs.org/

Oct 26, 2017
https://phabricator.wikimedia.org/T179024
 * topics?
 * I've been working on this task today: nfsiostat diamond collector

To test a patch, depool a node and test on that node: https://phabricator.wikimedia.org/P6194

Puppet (how does it work)

 * LDAP is the "same sign-on" solution for all things that are not MediaWiki
 * Unix user accounts outside of Cloud VPS are not connected directly to LDAP
 * Data is managed by Puppet based on modules/admin/data/data.yaml


 * puppetmaster1001.eqiad.wmnet
 * puppet-merge

y/n?

new installs
https://wikitech.wikimedia.org/wiki/Server_Lifecycle#Installation


 * New installs
 * Getting the MAC address for a new server
 * https://wikitech.wikimedia.org/wiki/Platform-specific_documentation
 * https://wikitech.wikimedia.org/wiki/Server_Lifecycle

New server: foo.eqiad.wmnet
management network: foo.mgmt.eqiad.wmnet
management network: .eqiad.wmnet == mgmt
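
The host-to-management-name convention above is mechanical enough to sketch in a few lines. This assumes the management name is always the host label followed by `.mgmt.` and the rest of the domain, as in the `foo` example; the helper names `mgmt_fqdn` and `is_mgmt_host` are made up for illustration. The pattern `*.mgmt.*.wmnet` is the one used for SSH to management hosts in these notes.

```python
from fnmatch import fnmatch

# Same wildcard pattern as the SSH "Host" entry for management hosts.
MGMT_PATTERN = "*.mgmt.*.wmnet"


def mgmt_fqdn(fqdn):
    """Derive the management interface name from a server FQDN."""
    host, _, domain = fqdn.partition(".")
    if not host or not domain:
        raise ValueError(f"not a fully qualified name: {fqdn!r}")
    return f"{host}.mgmt.{domain}"


def is_mgmt_host(fqdn):
    """Check whether a name matches the management host pattern."""
    return fnmatch(fqdn, MGMT_PATTERN)


name = mgmt_fqdn("foo.eqiad.wmnet")
```

Here `name` is `foo.mgmt.eqiad.wmnet`, which `is_mgmt_host` accepts, while the plain `foo.eqiad.wmnet` does not match.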

https://wikitech.wikimedia.org/wiki/Platform-specific_documentation/HP_DL3N0

 show system1/network1/Integrated_NICs

 files/dhcpd/linux-host-entries.ttyS1-115200:host labcontrol1001 {

 # onboard management
 Host *.mgmt.*.wmnet
     StrictHostKeyChecking ask
     UserKnownHostsFile /Users/cpettet/.ssh/wmf_mgmt_hosts

https://gerrit.wikimedia.org/r/#/admin/projects/operations/dns

https://phabricator.wikimedia.org/diffusion/

baham.eqiad.wmnet authdns-update

From puppetmaster1001: new-install

Bastions

 * Bastions (protected bastion)
 * https://wikitech.wikimedia.org/wiki/Production_shell_access

restricted.bastion.wmflabs.org

toolforge <-- own bastion

---

Cloud VPS project request instructions -- https://phabricator.wikimedia.org/project/view/2875/

OpenStack vs Horizon vs Toolsadmin

 * OpenStackManager -- https://www.mediawiki.org/wiki/Extension:OpenStackManager
 * Horizon -- https://docs.openstack.org/horizon/latest/
 * Toolsadmin (codename: Striker) -- https://wikitech.wikimedia.org/wiki/Toolsadmin.wikimedia.org