Beta Cluster/status

Last update on: 2012-08-31

2012-05-10
Chris, Sam, Antoine, Faidon, and Ryan met in San Francisco the week of May 7 to bootstrap work on this project. Current focus is getting media handling working smoothly.

2012-05-15
As of May 15th: 
 * Apache instances have been built entirely using Puppet classes; the old ones will be removed. All queries (thumbs/regular text/bits) hit the application Apaches, with upload.beta.wmflabs.org pointing to the IP address shared by all wikis.
 * MediaWiki logging is fine.
 * Blocker: /home/wikipedia needs a location with plenty of disk space to host MediaWiki checkouts, MediaWiki logs, and syslogs.
 * Blocker: no syslog-server yet, since it conflicts with a base class which is always installed.
 * MediaWiki configuration files are in the process of being merged from prod to labs.

2012-05-20
The project is now closer to parity with production.
 * A job runner has been set up and is currently catching up with all the pending jobs. Apparently, that includes some video resizing for TimedMediaHandler.
 * All code has been updated to a recent version and all databases have been upgraded.
 * Uploading files should work again (as of May 17th).

2012-05-monthly
Chris McMahon, Sam Reed, Antoine Musso, Faidon Liambotis, and Ryan Lane met in San Francisco the week of May 7 to bootstrap work on this project, kickstarting a process of aligning the configuration with our production cluster. Apache web server instances are now configured entirely automatically using Puppet classes. A few key Wikimedia configuration files that were previously managed via a private Subversion repository are now managed in a public Git repository. Much work remains to make this a stable testing environment; that work will continue in June.

2012-06-25
TimedMediaHandler has been set up, though transcoding is not operational yet, since that would require a fully functional job queue. We discovered that the version of Ubuntu currently used in production (Lucid) won’t work with TimedMediaHandler. As a result, Antoine and Faidon updated the Puppet configurations for the Apache web servers to run on the next-generation Ubuntu (Precise).

Administrative tools have been set up, closely following the way it is done in production. As an example, beta uses the exact same workflow to update the l10n cache. We will work on fetching l10n updates from translatewiki.

2012-06-monthly
The primary focus of Beta cluster work in June was in service to TimedMediaHandler (TMH). TMH has been set up, though transcoding is not operational yet, since that would require a fully functional job queue. The team discovered that the version of Ubuntu currently used in production (Lucid) won’t work with TimedMediaHandler. As a result, Antoine and Faidon updated the Puppet configurations for the Apache web servers to run on the next-generation Ubuntu (Precise).

Administrative tools have been set up, closely following the way it is done in production. For example, the Beta Cluster now uses the exact same workflow to update the l10n cache as we do in production. The team plans to further improve this by fetching l10n updates from translatewiki.

2012-07-16
At the beginning of July, the labs instances were migrated to new, more powerful hardware, improving performance by an order of magnitude. Some instances were unfortunately corrupted in the process, but thanks to our extensive use of Puppet, replacing them was fairly quick.

Antoine has written an overview of the beta cluster; it still needs to be amended with sections on how to update code and debug issues.

2012-07-23
The MediaWiki code and extensions are now being updated on a regular basis. Petr Bena is starting to implement the IRC feed system for bot consumption. We have attracted spammers' attention; several countermeasures have been applied, such as the Captcha system enabled by Platonides and automatic blocking of known open proxies. Jan Gerber is improving the job queue system so that it fits in beta; this is a prerequisite for the TimedMediaHandler extension, which would let us test video transcoding. Thumbnails are still not working correctly, and a workaround is in progress.

2012-07-30
<section begin="2012-07-30"/>All beta instances are now running out of the shared /data/project directory provided by the labs infrastructure instead of an NFS instance. Platonides has set up Captcha for user creation to help prevent spam, and some well-known IPs have been banned. Jan Gerber is successfully using the infrastructure to work on TimedMediaHandler, especially the job system that will process the video transcoding. Finally, Ryan Kaldari is using beta to set up E2 extensions.<section end="2012-07-30"/>

2012-07-monthly
<section begin="2012-07-monthly"/>The beta cluster infrastructure is now mostly managed by our configuration management engine (Puppet) and is starting to be used by third parties. The Features team and Jan Gerber are now taking advantage of the beta cluster to stage changes for production. We have set up Captcha and IP blocking to reduce the amount of spam being generated on the beta wikis. An overview document has been started to help introduce new people to the beta cluster.<section end="2012-07-monthly"/>

2012-08-03
<section begin="2012-08-03"/>This past week focused on cleaning up the cluster and working with ops to finish the housekeeping. All instances are now running on new hardware thanks to Andrew Bogott, and all make use of the project storage path (/data/project), which Ryan Lane upgraded to use the latest GlusterFS release.

Most obsolete and experimental instances have been removed.

The overall documentation has been expanded.

<section end="2012-08-03"/>

2012-08-31
<section begin="2012-08-31"/>MediaWiki core and extensions are now updated automatically. From now on, the beta cluster always uses the latest version published on the master branch of each repository.<section end="2012-08-31"/>