QA and testing/Labs plan

Roles (who will be responsible for what)

 * Chris - lead customer / product manager
 * Antoine - lead dev
 * Sam - backup dev
 * Faidon - primary ops
 * Ryan - escalation on ops issues
 * RobLa - peanut gallery

Short term goals

 * Identify Chris's requirements for QA (i.e. what he expects us to do)
 * Short term focus:
   * Commons (for TMH)
   * Wikisource (for the ProofreadPage extension)
   * Enwiki
 * Short term Commons pain point examples:
   * Timed Media Handler:
     * Thumbnailing doesn't work
     * TMH tries to write to a directory that does not exist
     * Slowness


 * How to guarantee we have a specific MediaWiki version AND a specific extension version
 * Work out a speedy review process for Puppet changes:
   * Scaling reviewers (basically just Ryan right now)
   * Per-labs-project branches to let users push/merge directly without waiting for ops

Short term actions

 * Find out what diffs in core are required for TMH. j@thing.net ? => Antoine
 * Make upload of images and videos work on labs => Ryan, Faidon & Antoine
 * Make an image scaler virtual cluster in Beta Labs => Faidon & Ryan
 * Fix up syslog ? => Ryan & Antoine
 * Get the private config files out so the configuration can go into public Git => Sam
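
The last action above (configuration in public Git, private bits kept out) could be bootstrapped with an ignore list so the private files can never be committed by accident. A minimal sketch; the repository path and file names here are assumptions, not the real layout:

```shell
# Hypothetical sketch: bootstrap a public config repo that refuses the private
# files, which stay in a fenari-local repository.  Path and names are assumptions.
REPO="${REPO:-$(mktemp -d)/mediawiki-config}"   # stand-in path for this sketch
mkdir -p "$REPO" && cd "$REPO"
git init -q
# anything matching these patterns can never be staged by accident
printf '%s\n' 'PrivateSettings.php' '*.secret' > .gitignore
git add .gitignore
```

The private files themselves would then live only in the fenari-local repository, as discussed in the May 7th notes below.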

Points to talk about with ops
 * /home/wikipedia
 * how to dsh in labs without typing a password :-D  passwordless ssh key? sudo?
 * Open bugs / prioritize them
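
For the passwordless-dsh question above, one possible answer is a dedicated key with an empty passphrase, scoped to labs hosts via ssh_config. This is only a sketch: the key path, host pattern, and dsh group name are assumptions, not existing labs configuration.

```shell
# Hypothetical sketch: a dedicated passwordless key so dsh can fan out to labs
# instances without prompting.  Key path, host pattern, and group name are
# assumptions, not existing labs configuration.
KEY="${KEY:-$HOME/.ssh/id_labs_dsh}"
mkdir -p "$(dirname "$KEY")"
[ -f "$KEY" ] || ssh-keygen -t rsa -N '' -q -f "$KEY"   # empty passphrase

# ~/.ssh/config fragment scoping the key to labs hosts (pattern assumed):
#   Host *.wmflabs
#       IdentityFile ~/.ssh/id_labs_dsh
#
# After pushing the public key to each instance (e.g. via ssh-copy-id or
# puppet), dsh runs unprompted:
#   dsh -g beta -- uptime
```

The sudo alternative would avoid per-user keys but needs a careful sudoers policy, so a scoped key is probably the smaller change.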

Things to work on starting May 8:

 * Frequent, automated updates of code from Git
 * Puppetize / add to git the existing production configuration:
   * /home/wikipedia/syslog as a trivial example; it is not in puppet!
   * Apache configuration in git ( https://bugzilla.wikimedia.org/show_bug.cgi?id=36413 )
   * CommonSettings.php and friends in git
 * Replicate the HTTPS architecture (nginx proxy?)
 * Have a running install of MediaWiki for each commit?
 * Improve stability/performance; identify the main performance bottlenecks in deployment-prep today
 * Run CI tests against labs?
 * Work on a mini "MediaWiki in a box" instance set up for testing/dev before a feature/extension is moved to deployment-prep
 * Socialize :-]
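
The "frequent, automated updates of code from Git" item could start as small as a cron'd fast-forward pull. A sketch of the shape it might take; the checkout path and cron schedule are assumptions, not the real deployment layout:

```shell
# Hypothetical sketch of automated code updates from Git on the beta cluster.
# The checkout path and cron schedule are assumptions, not existing setup.
beta_autoupdate() (
    cd "$1" || exit 1
    old=$(git rev-parse HEAD)
    git pull --ff-only -q || exit 1          # never auto-merge; fail loudly instead
    new=$(git rev-parse HEAD)
    [ "$old" != "$new" ] && echo "beta-autoupdate: $1 updated $old -> $new"
    exit 0
)
# cron entry (illustrative): */10 * * * * beta_autoupdate /usr/local/apache/common
```

DB migrations, scap traps, and config changes (called out in the 10am meeting notes) are exactly the cases a plain pull cannot handle, so a real job would need hooks around this.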

10am meeting
Rob presented labs and the overall technical view.

AGREED: updating the cluster to master is to be done automatically. Things to look out for:
 * DB migrations
 * scap traps!
 * config changes

How do we handle testing extensions? We will not have time to set up a second cluster dedicated to them, so we need to create wiki clones. Ryan talked about having a puppet class that installs all the prerequisites (db, memcache, squid, etc.) and lets people tweak / configure it. "Beta extras" will be wikis with specific settings / extensions; they will be migrated to a new virtual cluster later.

Roadmap:

TODO: what is Beta Labs meant to be? Talk with RobLa. It should be as close to production as possible: think of it as the very last step before hitting production, a kind of pre-production cluster. Not for testing! Staging for people who want to test will be done on test / test2.

Some tech doc at http://etherpad.wikimedia.org/BetaLabsDoc

Browser testing should be able to:
 1) have WMF run tests continuously
 2) let random people run tests locally
 3) make the suite public for others to improve it
 * get Beta Labs working
 * implement the "extras" system
 * dev instance, to share work between different people; will contain everything
 * make Beta Labs pre-production only
 * test == staging
 * test2 == release candidate

Monday May 7th
People meeting, getting to know each other: Faidon, Sam, Ryan, Antoine. Chris arrives tomorrow. We talked about the overall architecture of the current labs.

Lengthy talk about LVS. The real hardware does not have enough memory, so we are constrained:
 * low priority
 * replicating the production setup is not feasible for the short term
 * maybe use LVS-Tun or nginx as a load-balancer
 * keep doing load-balancing on squid
 * check if memcache is evicting content (i.e. if the current allocation, 4GB, is not enough)
 * give more instances to memcache (blocked by the labs hardware upgrade)

Network communication between Labs and Prod is something we want to prevent one day.

CommonSettings.php and friends: Apache DocumentRoot / htdoc => https://bugzilla.wikimedia.org/36646 (get rid of the NFS share).

Apache configurations: when deploying the puppet class Apache::service, restarting httpd complained about missing conf files. The workaround was to create an empty placeholder.conf file and symlink the missing names to it.

AGREED: the Apache configuration does not change that much (68 revs in the fenari local repo, the last one 2 months ago or so). Maybe move it to git?
 * find out if it's a priority

nginx proxy deployed using puppet => not for now. OPENED https://bugzilla.wikimedia.org/36648 (replicate HTTPS architecture).

Performance:
 * php-APC (opcode cache) seems to have solved a lot of the slowness
 * is squid really caching stuff? Seems so. AGREED to investigate later if that turns out to be a real issue; seems good right now.
 * LVS
 * memcache
 * ACTION: list the files containing private data
 * ACTION: create a git repo to hold the php files
 * AGREED: we can skip the history of the files, since the Domas autocommits screw up the history
 * Easier to keep the private files in a repository local to fenari
 * ACTION: write out the process for sending conf changes / merging / deploying
 * Deploying will just be running `git pull`, run manually on labs just like on production.
 * Keep the same path as production ( /usr/local/apache/common ???? )
 * On labs the configuration is shared with Apache through NFS, for now.
 * Production files are not needed on the labs cluster; they are mostly about specific wikis we are not going to replicate.
 * For later: we will want to be able to easily set up new wikis running a specific extension on top of the existing ones. We might need to set up a specific HTDOC for each of the "feature" wikis. Could just be a symlink.
 * en2.conf
 * foundation.conf
 * postrewrites.conf
 * redirects.conf
 * wikimedia.conf
 * www.wikipedia.conf
 * HTTPS
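
The "specific HTDOC per feature wiki, could just be a symlink" idea above might look like the following; every path and wiki name here is illustrative, not the real layout.

```shell
# Hypothetical sketch: one document root per "feature" wiki, each a symlink
# back to the shared checkout.  Paths and wiki names are assumptions.
mk_feature_docroot() {
    common=$1 root=$2 wiki=$3
    mkdir -p "$root"
    ln -sfn "$common" "$root/$wiki"   # -n replaces an existing link in place
}
# e.g. mk_feature_docroot /usr/local/apache/common /usr/local/apache/htdocs tmh-wiki
```

Because each docroot is just a link, a feature wiki costs nothing extra on disk and tracks the shared checkout automatically; only its settings / extension list would differ.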