Wikimedia Labs

Wikimedia Labs is a two-part project aimed at helping volunteers get involved in Wikimedia operations and software development. The first part of this project is Test/Dev Labs, and the second part is Tool Labs. For the roadmap, please see Wikimedia Labs's goals and planned milestones.

It's in a closed beta -- if you want an account to do MediaWiki development, tools, or analytics, request developer access (the labs infrastructure uses the same LDAP login) or ask us in IRC. We'll get you an account within 1 business day.

Wikimedia Labs is a sandbox server environment, based on a virtualization cluster, for supporting development and operations engineering by both staff and volunteers. This will allow access to necessary system resources, software and configuration, without compromising the security of the production server clusters, and in a time- and cost-effective way.



Feature justification
There is a constant demand on new server resources for temporary usage in testing, development and operations engineering projects. Right now this often requires the purchase and deployment of new servers and custom setup, configuration and access control to maintain security of the network. The procurement of new equipment costs money, time and staff resources, which is not always justified by the duration or importance of the projects. The availability of a virtualization cluster that allows quick and automatic setup of new virtual servers after a simple moderation process would solve these problems.

The Wikimedia Foundation also wants to enable and encourage volunteers to contribute to Web Site Operations more actively, in order to reflect the core values of the broader movement and to make maximal use of the technical talents within the community. The continued growth of the movement and its ambitions is a constant challenge for the organization's capacity, including the Operations team. Active involvement of volunteers can support the Operations staff and allows identification of technical talent and skills for hiring purposes.

Wikimedia Operations is already unusually open, enabling the general public to both learn and brainstorm about it. For example, nearly all documentation, statistics and monitoring overview information concerning Wikimedia Operations is publicly available. However, giving the public and/or specific interested volunteers access to necessary software, tools and configurations without disclosing strictly private information (such as passwords) remains a challenge. We think that the availability of a sandbox environment consisting of virtual servers will be an important step in the right direction.

Goals
There are three major goals of Test/Dev Labs:


 * 1) To improve collaboration between staff and volunteer developers.
 * 2) Have a process for providing higher levels of access for people who are not on the paid operations team staff. This includes staff developers, and all volunteers. We'd like to have an environment where anyone can eventually become root, even on our production cluster.
 * 3) Have an environment where we can test major changes before we deploy them to the live site.

The last point requires some background: we had no testing environment. Most architecture changes happen live. To bring higher uptime to the sites, it's important to have a controlled non-production environment to test changes. We've started creating this: the Beta cluster. But even beyond the Beta cluster, anyone can get root on a Labs VM or set of VMs to install or test arbitrary software/services, and get them ready for production deployment.

Achieving these goals
We can achieve the goals by providing liberal access to an environment that is a clone of our production environment. In this environment it should be possible to add new architecture without affecting production, or the production clone. Users will be able to make root level changes without having root, and they will be able to have these changes implemented on the production cluster after favorable review. Here's how we'll go about it:


 * 1) Create a default OpenStack Compute (nova) project (testlabs)
 * 2) Build a clone of the production cluster in this project
 * 3) Move our puppet configuration into a public git repo
 * 4) * Have two branches: production, and test; the test branch will control the clone, and production will control the production cluster
 * 5) Give liberal access to this clone. Everyone that has commit access will have shell access on the clone. Anyone wanting to volunteer as an operations engineer will also have shell access. Higher level permissions will be assigned via groups.
 * 6) Create new OpenStack projects for community or foundation initiatives that change the architecture
 * 7) Project owners will be able to create instances inside of the project
 * 8) Project owners will have full root access inside of the instances, and will be able to add others into the projects, with varying levels of access.
 * 9) Users will be able to build out infrastructure inside of the project, that can then be formalized, via puppet manifests, templates, and files
 * 10) Users will be able to check out the puppet configuration from the git repo, make modifications to add their new infrastructure, and make a merge request to push them back into the test/dev infrastructure
 * 11) Once the changes have been merged into test/dev, they'll be tested
 * 12) After the changes have been tested, a merge request can be done to the production branch.
 * 13) The changes will be reviewed for production, and if approved, can then be pushed directly to production systems

This is treating operations work like a software developer project. This lets us open access to ops like we currently do with MediaWiki.

Furthermore, we'll also be providing easy access to create new wikis, using either deployment or trunk branches, with preseeded content for testing. Access to these wikis will be controlled via groups, which are also OpenStack projects.

Tool Labs
The design for Tool Labs is still in the very early phases; it may change drastically as the project ramp up begins in January 2013. The first target of Tool Labs is to be a replacement for Toolserver; see Wikimedia Labs/Toolserver features wanted in Tool Labs.

Goals

 * Provide an environment for rapid prototyping
 * Close to production, but simplified
 * Easy deployment of MediaWiki
 * Provide a ramp for new developers
 * Tools and extensions can easily move to Test/Dev Labs
 * Provide a location for bot authors to maintain and run their bots (in progress at bots.wmflabs.org)
 * Provide a location for analytics work

Achieving these goals

 * 1) Create a production-like environment with the difficulties removed
 * 2) * For instance, MediaWiki instances will not have Squid/Varnish in front. It won't be broken up into multiple clusters. It will have popular development tools available upon creation. It should handle web server configuration, and MediaWiki instance creation
 * 3) * Like Toolserver, this environment will have replicated databases, and will have database servers available for users to create their own databases
 * 4) * Anonymized log data should also be provided in this environment to assist with analytics and stats work
 * 5) Accounts for both Labs environments will be the same. To move to Test/Dev Labs, one will simply need to be added to another OpenStack project

A key goal of Tool Labs is to provide an easy development environment meant to be used as a ramp to the Test/Dev Labs environment. The goal is to facilitate the rapid development of software that we can then identify as something target-able for production support. Software can be moved from Tool Labs to Test/Dev Labs, formalized, and then deployed to production.

Implementation
The architecture is described on Wikitech. The software for controlling this environment is implemented as a MediaWiki extension, and is described on mediawiki.org.

TODO
For the roadmap, please see Wikimedia Labs's goals and planned milestones (schedule).

We'd love help with all of the below!


 * Enable database replication - Ryan hopes to get this done by the end of November or December 2012
 * Tungsten and MariaDB?
 * Configure clone of production cluster (mostly done in deployment-prep - needs to be puppetized!)
 * Configure LVS
 * Temporarily, at least, configure LVS to use a single IP for all NAT'd IP addresses
 * Configure Squid
 * Configure Varnish
 * Configure Apache
 * Configure MySQL
 * Configure HTTPS
 * Configure Search
 * Configure PDF
 * Enable network-node per compute-node (important for performance!)
 * Write wiki creation scripts for use with multiple projects and different MediaWiki versions
 * Likely just MySQL with filters
 * Create an all-in-one MediaWiki instance based on deployment branch
 * Integrate RT with Labs LDAP
 * Make a public RT queue for Labs
 * Switch web authentication method to SAML, using OpenAM or SimpleSAMLphp
 * an API in Labsconsole that can do OpenStreetMap stuff
 * Add MediaWiki support to Nova done
 * Puppetize deployment-prep done
 * Create a second zone in eqiad (to be finished by December 2013)
 * User databases
 * if we're going to hit user databases by end of March 2013, we likely need to put some dev work into salt; salt-api was pushed to GitHub mid-September, so some of the work has begun on their side, but we need to add keystone auth to that API
 * Push OpenStackManager changes to show SSH fingerprints for instances (not a high-priority item, will probably be December 2012 or later)
 * Add DNS support to Nova (in essex, but we must migrate to using it)
 * Add Gluster support to Nova (Gluster plugin is mostly written but we don't really have plugin support until Folsom, so will happen in Q3 when we are ready to upgrade to Folsom)

Proposals

 * Development Process
 * Shared home directories per project (done)
 * Puppet learning mode
 * Toolserver features wanted (contains even some features which are not on toolserver)
 * Deployment privilege separation what is this?
 * Install Pentaho
 * Find/test OTRS replacement, or upgrade/puppetize OTRS (with security patches)
 * Create a bot running infrastructure (partially done - test / "production" not done)
 * Puppetize PDF server
 * Package mwlib dependencies
 * Create an init script (preferably upstart) for mwlib daemon
 * Mentions of init script here (wikitech) but last I heard not in use. Peachey88 05:38, 21 December 2011 (UTC)
 * Create a log bot for the #wikimedia-labs channel (done)
 * Package adminbot, with an init script (done)
 * Puppet repository branch per project or instance (done)
 * Nagios management without exported puppet resources (done)
 * Kerberos auth using OpenDJ as the principal store
 * Reverse proxy for web services
 * Package gerrit (done)
 * Add puppet syntax highlighting to vim
 * Create shared sql service for all projects
 * Package and puppetize lilurl for use as a url shortening service
 * Write Documentation for console
 * Fix puppet repo so that it runs a complete first run of the puppet catalog on instance creation for all services
 * Java App stack
 * Package JDK 1.6 (confused about this, can we not use openjdk or sun's packaged jdk?)
 * Apache Ant
 * Maven
 * Tomcat and Jetty App Servers
 * Apache Solr
 * Hadoop
 * Per-project saltstack remote execution
 * Integrate a global queuing system (to replace tools' use of SGE when they migrate from the toolserver)

Documents

 * Status updates


 * /Terms of use/
 * /Agreement to disclosure of personally identifiable information/
 * /Account creation text/
 * /Things to fix in beta/

Communications

 * IRC channel on Freenode
 * labs-l mailing list
 * inventory of Labs's total capacity as of late September 2012 (compute, database and storage nodes)