Wikimedia Labs
| Group: | Operations |
| Start: | |
| End: | |
| Management: | Ryan Lane |
| Team: |
Wikimedia Labs is a two-part project aimed at improving volunteer involvement in operations and software development. The first part of this project is Test/Dev Labs, and the second part is Tool Labs.
As of 2 November 2011, Labs is in a closed beta. If you want an account, leave a note at this page or ask us in IRC (#wikimedia-labs). If you want to use it for MediaWiki development, tools, or analytics, we'll probably say yes and set you up.
Wikimedia is building a sandbox server environment based on a virtualization cluster for supporting development and operations engineering by both staff and selected volunteers. This will allow access to necessary system resources, software and configuration without compromising the security of the production server clusters, in a time- and cost-effective way.
Status
- 2012-03-09: Added a Gluster storage cluster for project storage. Roughly 70TB of storage is available. Projects have a default quota of 300GB, which can be increased by request.
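As a rough sanity check on the figures above, the following sketch estimates how many projects could fill their default quota at the same time. It assumes decimal units (1TB = 1000GB) and ignores replication and filesystem overhead, neither of which is stated on this page; the function name is made up for illustration.

```python
# Figures from the status entry; the unit conversion is an assumption.
TOTAL_STORAGE_GB = 70 * 1000  # "roughly 70TB", assuming 1TB = 1000GB
DEFAULT_QUOTA_GB = 300        # default per-project quota


def max_default_projects(total_gb: int = TOTAL_STORAGE_GB,
                         quota_gb: int = DEFAULT_QUOTA_GB) -> int:
    """Number of projects that could each use a full default quota."""
    return total_gb // quota_gb


print(max_default_projects())
```

Under those assumptions the cluster has room for a bit over two hundred fully used default quotas, so quota increases by request come out of the same pool.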
Feature justification
There is a constant demand for new server resources for temporary usage in testing, development and operations engineering projects. Right now this often requires the purchase and deployment of new servers, plus custom setup, configuration and access control to maintain the security of the network. The procurement of new equipment costs money, time and staff resources, which is not always justified by the duration or importance of the projects. A virtualization cluster that allows quick and automatic setup of new virtual servers after a simple moderation process would solve these problems.
The Wikimedia Foundation also wants to enable and encourage volunteers to contribute to Web Site Operations more actively, in order to reflect the core values of the broader movement and to make maximal use of the technical talents within the community. The continued growth of the movement and its ambitions is a constant challenge for the organization's capacity, including the Operations team. Active involvement of volunteers can support the Operations staff and allows identification of technical talent and skills for hiring purposes.
Wikimedia Operations is already unusually open, enabling the general public to both learn and brainstorm about it. For example, nearly all documentation, statistics and monitoring overview information concerning Wikimedia Operations is publicly available. However, giving the public and/or specific interested volunteers access to necessary software, tools and configurations without disclosing strictly private information (such as passwords) remains a challenge. We think that the availability of a sandbox environment consisting of virtual servers will be an important step in the right direction.
Test/Dev Labs
Goals
There are three major goals of Test/Dev Labs:
- To improve collaboration between staff and volunteer developers.
- To provide a process for granting higher levels of access to people who are not on the paid operations staff. This includes staff developers and all volunteers. We'd like to have an environment where anyone can eventually become root, even on our production cluster.
- To have an environment where we can test major changes before we deploy them to the live site.
The last point requires some background: we currently have no testing environment. Most architecture changes happen live. To bring higher uptime to the sites, it's important to have a controlled non-production environment to test changes.
Achieving these goals
We can achieve the goals by providing liberal access to an environment that is a clone of our production environment. In this environment it should be possible to add new architecture without affecting production, or the production clone. Users will be able to make root level changes without having root, and they will be able to have these changes implemented on the production cluster after favorable review. Here's how we'll go about it:
- Create a default OpenStack Compute (nova) project (testlabs)
- Build a clone of the production cluster in this project
- Move our puppet configuration into a public git repo
- Have two branches: production, and test; the test branch will control the clone, and production will control the production cluster
- Give liberal access to this clone. Everyone that has commit access will have shell access on the clone. Anyone wanting to volunteer as an operations engineer will also have shell access. Higher level permissions will be assigned via groups.
- Create new OpenStack projects for community or foundation initiatives that change the architecture
- Project owners will be able to create instances inside of the project
- Project owners will have full root access inside of the instances, and will be able to add others into the projects, with varying levels of access.
- Users will be able to build out infrastructure inside of the project, that can then be formalized, via puppet manifests, templates, and files
- Users will be able to check out the puppet configuration from the git repo, make modifications to add their new infrastructure, and make a merge request to push them back into the test/dev infrastructure
- Once the changes have been merged into test/dev, they'll be tested
- After the changes have been tested, a merge request can be done to the production branch.
- The changes will be reviewed for production, and if approved, can then be pushed directly to production systems
This treats operations work like a software development project, which lets us open access to ops the way we currently do with MediaWiki.
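The promotion path described above can be sketched as a small state machine. This is only an illustration: the `Stage` names and the `Change` class are hypothetical and not part of any Labs software. Only the order of the stages is taken from the steps listed above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Hypothetical stage names; the order follows the listed steps.
    IN_PROJECT = 1                  # built inside an OpenStack project
    TEST_MERGE_REQUESTED = 2        # merge request against the test branch
    IN_TEST = 3                     # merged into test, applied to the clone
    TESTED = 4                      # verified on the clone
    PRODUCTION_MERGE_REQUESTED = 5  # merge request against production
    IN_PRODUCTION = 6               # reviewed, approved, and pushed


@dataclass
class Change:
    description: str
    stage: Stage = Stage.IN_PROJECT
    history: list = field(default_factory=list)

    def advance(self, approved: bool = True) -> Stage:
        """Move to the next stage; a rejection leaves the change in place."""
        if self.stage is Stage.IN_PRODUCTION:
            raise ValueError("change is already in production")
        if not approved:
            return self.stage
        self.history.append(self.stage)
        self.stage = Stage(self.stage.value + 1)
        return self.stage


change = Change("add a memcached role")
while change.stage is not Stage.IN_PRODUCTION:
    change.advance()
print([stage.name for stage in change.history])
```

The point of the sketch is that no stage can be skipped: a change reaches production only after passing through the test branch and a production review.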
Furthermore, we'll also be providing easy access to create new wikis, using either deployment or trunk branches, with preseeded content for testing. Access to these wikis will be controlled via groups, which are also OpenStack projects.
Tool Labs
The design for Tool Labs is still in the very early phases; it may change drastically as the project ramps up. The first target of Tool Labs is to be a replacement for Toolserver.
Goals
- Provide an environment for rapid prototyping
- Close to production, but simplified
- Easy deployment of MediaWiki
- Provide a ramp for new developers
- Tools and extensions can easily move to Test/Dev Labs
- Provide a location for bot authors to maintain and run their bots
- Provide a location for analytics work
Achieving these goals
- Create a production-like environment with the difficulties removed
- For instance, MediaWiki instances will not have Squid/Varnish in front. It won't be broken up into multiple clusters. It will have popular development tools available upon creation. It should handle web server configuration, and MediaWiki instance creation
- Like Toolserver, this environment will have replicated databases, and will have database servers available for users to create their own databases
- Anonymized log data should also be provided in this environment to assist with analytics and stats work
- Accounts for both Labs environments will be the same. To move to Test/Dev Labs, one will simply need to be added to another OpenStack project
A key goal of Tool Labs is to provide an easy development environment meant to be used as a ramp to the Test/Dev Labs environment. The goal is to facilitate the rapid development of software that we can then identify as something target-able for production support. Software can be moved from Tool Labs to Test/Dev Labs, formalized, and then deployed to production.
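The shared-account model can be illustrated with a minimal sketch in which moving between environments is just a change of project membership. Everything here is hypothetical: the `Project` class, `add_member`, `projects_for`, the access level strings, and the `toollabs` project name are invented for illustration. Only `testlabs` appears on this page, and this is not the OpenStack API.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    name: str
    # Maps account name -> access level; the level names are hypothetical.
    members: dict = field(default_factory=dict)

    def add_member(self, account: str, level: str = "member") -> None:
        self.members[account] = level


def projects_for(account: str, projects: list) -> list:
    """Names of the projects an account belongs to, sorted alphabetically."""
    return sorted(p.name for p in projects if account in p.members)


tool_labs = Project("toollabs")  # hypothetical project name
test_dev = Project("testlabs")   # default project named on this page

tool_labs.add_member("alice")
# "Moving" to Test/Dev Labs is just another membership for the same account.
test_dev.add_member("alice", level="sysadmin")

print(projects_for("alice", [tool_labs, test_dev]))
```

The account itself is never copied or recreated; only the set of projects it belongs to changes.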
Implementation
The architecture is described on Wikitech. The software for controlling this environment is implemented as a MediaWiki extension, and is described on mediawiki.org.
Proposals
- Development Process
- Shared home directories per project (done)
- Puppet learning mode
- Toolserver features wanted (contains even some features which are not on toolserver)
- Deployment privilege separation (what is this?)
- Install wikitech:Pentaho
- Find/test OTRS replacement, or upgrade/puppetize OTRS (with security patches)
- Create a bot running infrastructure (partially done - test / "production" not done)
- Puppetize PDF server
- Create a log bot for the #wikimedia-labs channel (done)
- Package adminbot, with an init script
- Puppet repository branch per project or instance
- Nagios management without exported puppet resources (done)
- Kerberos auth using OpenDJ as the principal store
- Reverse proxy for web services
- Package gerrit
- Add puppet syntax highlighting to vim
- Create shared sql service for all projects
- Package and puppetize lilurl for use as a url shortening service
- Write Documentation for console
- Fix puppet repo so that it runs a complete first run of the puppet catalog on instance creation for all services
- Java App stack
- Package JDK 1.6
- Apache Ant
- Maven
- Tomcat and Jetty App Servers
- Apache Solr
- Hadoop
Roadmap
- Move instance storage to Gluster on the compute nodes (done)
- Provide a volume storage solution
- Blocker for any project requiring more than 20-40GB storage
- It's possible to allocate more storage temporarily, but it's dangerous. s1 instance types pull storage from instance storage, which is limited.
- Gluster? iSCSI via NetApp? iSCSI to an instance that shares via Gluster in instances?
- Gluster, ideally
- Configure quotas for NFS home directories
- Fix sudo policy for non-testlabs projects (done)
- Provide separate puppet branches per project, where that project's instances are managed by that puppet branch
- Migrate bastion to its own project, or an ops only project (done)
- Migrate home directories to their own project, or an ops-only project
- Move virt1 services to a misc server, and move virt1 into compute cluster as a compute node, then increase gluster instance storage size (done)
- Configure clone of production cluster (mostly done in deployment-prep - needs to be puppetized!)
- Configure LVS
- Temporarily, at least, configure LVS to use a single IP for all NAT'd IP addresses
- Configure Squid
- Configure Varnish
- Configure Apache
- Configure MySQL
- Configure Memcached (done)
- Configure HTTPS
- Configure Search
- Configure PDF
- Enable network-node per compute-node (important for performance!)
- Write wiki creation scripts for use with multiple projects and different mediawiki versions
- Enable database replication
- Tungsten and MariaDB with LDAP authentication?
- Create an all-in-one mediawiki instance based on deployment branch
- Integrate RT with Labs LDAP
- Make a public RT queue for Labs
- Switch web authentication method to SAML, using OpenAM or SimpleSAMLphp
Documents
- Terms of use
- Agreement to disclosure of personally identifiable information
- Account creation text
- Things to fix in beta
Communications
- #wikimedia-labs IRC channel on Freenode
- labs-l mailing list