Beta Cluster/status

Last update on: 2012-11-30

2012-05-10
Chris, Sam, Antoine, Faidon, and Ryan met in San Francisco the week of May 7 to bootstrap work on this project. Current focus is getting media handling working smoothly.

2012-05-15
As of May 15th: 
 * Apache instances have been built entirely from Puppet classes; the old ones will be removed. All queries (thumbnails, regular text, bits) hit the application Apaches, and upload.beta.wmflabs.org points to the IP address shared by all wikis.
 * MediaWiki logging is fine.
 * Blocker: /home/wikipedia needs a location with plenty of disk space to host MediaWiki checkouts, MediaWiki logs and syslogs.
 * Blocker: no syslog server yet, since it conflicts with a base class that is always installed.
 * MediaWiki configuration files are in the process of being merged from production to labs.

2012-05-20
The project is now closer to parity with production.
 * A job runner has been set up and is currently catching up with all the pending jobs. Apparently, that includes some video resizing for TimedMediaHandler.
 * All code has been updated to a recent version and all databases have been upgraded.
 * Uploading files should work again (as of May 17th).

2012-05-monthly
Chris McMahon, Sam Reed, Antoine Musso, Faidon Liambotis, and Ryan Lane met in San Francisco the week of May 7 to bootstrap work on this project, kickstarting a process of aligning the configuration with our production cluster. Apache web server instances are now configured entirely automatically using Puppet classes. A few key Wikimedia configuration files that were previously managed in a private Subversion repository are now managed in a public Git repository. Much work remains to make this a stable testing environment, and it will continue in June.

2012-06-25
TimedMediaHandler has been set up, though transcoding is not operational yet because it requires a fully functional job queue. We discovered that the version of Ubuntu currently used in production (Lucid) won't work with TimedMediaHandler. As a result, Antoine and Faidon updated the Puppet configurations for the Apache web servers to run on the next generation of Ubuntu (Precise).

Administrative tools have been set up to closely follow the way things are done in production. For example, beta uses the exact same workflow as production to update the l10n cache. We will work on fetching l10n updates from translatewiki.

2012-06-monthly
The primary focus of Beta cluster work in June was supporting TimedMediaHandler (TMH). TMH has been set up, though transcoding is not operational yet because it requires a fully functional job queue. The team discovered that the version of Ubuntu currently used in production (Lucid) won't work with TimedMediaHandler. As a result, Antoine and Faidon updated the Puppet configurations for the Apache web servers to run on the next generation of Ubuntu (Precise).

Administrative tools have been set up to closely follow the way things are done in production. For example, the Beta Cluster now uses the exact same workflow to update the l10n cache as we do in production. The team plans to further improve this by fetching l10n updates from translatewiki.

2012-07-16
At the beginning of July, the labs instances were migrated to powerful new hardware, improving performance by an order of magnitude. Unfortunately, some instances were corrupted in the process, but thanks to our extensive use of Puppet, replacing them was fast.

Antoine has written an overview of the beta cluster. It still needs sections on how to update code and how to debug issues.

2012-07-23
The MediaWiki code and extensions are now being updated on a regular basis. Petr Benan is starting to implement the IRC feed system for consumption by bots. We have attracted spammers' attention, and several countermeasures have been applied, such as the Captcha system enabled by Platonides and automatic blocking of known open proxies. Jan Gerber is improving the job queue system so it can fit in beta. This is a prerequisite for the TimedMediaHandler extension, which will let us test video transcoding. Thumbnails are still not working correctly, and a workaround is being developed.

2012-07-30
<section begin="2012-07-30"/>All beta instances are now running out of the shared /data/project directory provided by the labs infrastructure instead of an NFS instance. Platonides has set up Captcha for user creation to help prevent spam, and some well-known IP addresses have been banned. Jan Gerber is successfully using the infrastructure to work on TimedMediaHandler, especially the job system that will process video transcoding. Finally, Ryan Kaldari is using the beta cluster to set up E2 extensions.<section end="2012-07-30"/>

2012-07-monthly
<section begin="2012-07-monthly"/>The beta cluster infrastructure is now mostly managed by our configuration management system (Puppet) and is starting to be used by third parties. The Features team and Jan Gerber are now taking advantage of the beta cluster to stage changes for production. We have set up Captcha and IP blocking to reduce the amount of spam being generated on the beta wikis. An overview document has been started to help introduce new people to the beta cluster.<section end="2012-07-monthly"/>

2012-08-03
<section begin="2012-08-03"/>This past week has focused on cleaning up the cluster and working with ops to finish the housekeeping. All instances are now running on new hardware thanks to Andrew Boggot, and all of them use the project storage path (/data/project), which Ryan Lane upgraded to the latest GlusterFS release.

Most obsolete and experimental instances have been removed.

The overall documentation has been expanded.

<section end="2012-08-03"/>

2012-08-31
<section begin="2012-08-31"/>The MediaWiki core and extensions are now updated automatically. From now on, the beta cluster always uses the very latest version published on the master branch of each repository.<section end="2012-08-31"/>

2012-08-monthly
<section begin="2012-08-monthly"/>The MediaWiki core and its extensions are now updated automatically, and the beta cluster always uses the very latest version published on the master branch of each Git repository.<section end="2012-08-monthly"/>

2012-09-24
<section begin="2012-09-24"/>bits.beta.wmflabs.org is now fully managed by Puppet. It serves assets for MediaWiki and its extensions, as well as geographical lookup of IP addresses at http://bits.beta.wmflabs.org/geoiplookup.<section end="2012-09-24"/>

2012-09-monthly
<section begin="2012-09-monthly"/>In September, QA Lead Chris McMahon announced that the Beta cluster is a fit test environment: code is routinely deployed there ahead of production, the test environment emulates the production environment closely, and we can easily and reliably manipulate aspects of the test environment (configuration, permissions, etc.) for testing purposes. Also, bits.beta.wmflabs.org is now fully managed by Puppet. It serves assets for MediaWiki and its extensions, as well as geographical lookup of IP addresses. Some work remains to be done (performance tuning, configuration), but the infrastructure is in place for software testing and browser test automation.<section end="2012-09-monthly"/>

2012-10-monthly
<section begin="2012-10-monthly"/>The MediaWiki configuration on the beta cluster still has a few live hacks that prevent it from being upgraded smoothly. The last of them have been tracked down, and removing them will require a final sprint.<section end="2012-10-monthly"/>

2012-11-06
<section begin="2012-11-06"/>We are working on getting Zuul in place so Jenkins can talk to Gerrit; that is Antoine's goal. The goal for the NL hackathon is to work on CI in general, and we hope the hackathon will help speed up Beta cluster work. Filipin is working on getting CloudBees into a slave for WMF's Jenkins installation, which is one of the NL hackathon goals.<section end="2012-11-06"/>

2012-11-13
<section begin="2012-11-13"/>Deployed AFTv5 to the beta cluster, and New Page Patrol is being maintained there as well. Still working on issues of ongoing maintenance.<section end="2012-11-13"/>

2012-11-30
<section begin="2012-11-30"/>Beta played a role in handling a recent defect that escaped to production. Beta remains the primary host for AFTv5 testing, including browser test automation.<section end="2012-11-30"/>