Wikimedia Labs/Toolserver features wanted in Tool Labs

Below is a list of features of toolserver which would be cool to have on labs too:


 * Public webserver for logs (done)
 * Access to production db (read-only / replicated) (high priority)
 * Encrypted home folders
 * Some packages to install
 * Support for mono (c#), python, svn, git (done)
 * Text editors: vim (probably some others like nano etc.) (done)
 * Shells - bash, ksh, csh, tclsh (done)
 * Libraries - libtcl (done, needs puppetize)
 * Php, java (done, needs puppetize)
 * Home directories
 * What is meant by this? Labs is meant to be a collaborative, community maintained environment. I specifically want to avoid the Toolserver way of individualizing everything. If a user leaves, their bot, or tool, should very easily be transferable to another user. We tend to opt for using project storage (/data/project) rather than home directories. It's also preferable to run things as service users rather than individualized users, though that isn't a requirement.--Ryan lane (talk) 22:17, 25 September 2012 (UTC)
 * I should also mention that we already have per-project home directories, where the directories are accessible to every instance in a project. It's possible to do things the toolserver way, but it's a very limiting way of doing things.--Ryan lane (talk) 22:17, 25 September 2012 (UTC)
 * Lets cross this, it isn't really relevant in labs. Sure every ssh user has a home directory, that's standard Linux. But for storing applications and databases, this must be stored elsewhere (on the mounted project storage, not in a home directory, not on the instance itself, he instance itself needs to be recyclable from puppet). Krinkle (talk) 02:42, 26 September 2012 (UTC)
 * Central per-user directory mounted on all instances within a project
 * Again. Let's try to avoid per-user things. We have shared storage at /data/project that is accessible by everyone in a project. It's possible to lock this down by file permissions, even to per-user, but it's way better to make things owned by a service user (and control access to that user), or to have things owned by the project group.--Ryan lane (talk) 22:18, 25 September 2012 (UTC)
 * Can someone explain the difference between this and home directories?--Ryan lane (talk) 22:18, 25 September 2012 (UTC)
 * Per-project optional custom MySQL databases
 * This is on the roadmap. Will likely come some time after replicated databases.
 * Basically similar to the mounted project-wide storage. Not on any individual instance, accessible from within each project instance.
 * Toolserver also has the principle of public databases that can be read from other projects. This is probably something we'd want too, so that projects can build on top of each other.
 * Mysql query killer (especially for queries to the replicated wmf wiki databases)
 * Per-project optional svn and/or git repo
 * For "Tool Labs" that is, since in "WikiDev Labs" this is mandatory workflow
 * Should versioning really be optional? Even for tools, is it ever a good idea not to have a repo? I'd rather improve Git/Gerrit usability and integration. Eloquence (talk) 21:07, 25 September 2012 (UTC)
 * I think it would be hard to enforce the use of source control, unless someone was policing things, or if we required it to deploy tools (maybe via git-deploy?). Using the deployment system for this actually may be the easiest way to enforce this.--Ryan lane (talk) 22:25, 25 September 2012 (UTC)
 * Do we want to have a a bare git server for wmflabs (like there svn.toolserver.org), or do it in Gerrit? Or maybe allow any git url so that users can store it where they like (be it gerrit.wikimedia.org, github.com etc.). Krinkle (talk) 02:42, 26 September 2012 (UTC)
 * Backup of home directories and user databases
 * Create generic project for web tools (Bug 40510) (done)
 * Create generic project for periodic/long-running bots (done)
 * It is not clear yet whether people should share instances or create their own. As they are unlikely to interfere with each other, the overhead of N linuxes may not be worth it. Instead it may be more useful to have 1 big instance, or a grid of instances but control distribution with SGE instead of manually.
 * Local and auto-updated copies of:
 * Wikimedia XML Dumps
 * visits per page (pagecounts)
 * visits per project (projectcounts)
 * This is already accessible on every instance at /public/datasets
 * Simple setup to allow HTTP access to projects/instances (reverse proxy, port forwarding, public ip)
 * Misc. Toolserver features:
 * Support SGE to automatically defer starting of expensive processes based on current capacity and usage (qcronsub, qsub) https://wiki.toolserver.org/view/Job_scheduling#arguments_to_qsub/qcronsub