Wikimedia Labs/Tool Labs/Needed Toolserver features

From MediaWiki.org

Please do not add new requirements to this page. Open a bug in Bugzilla instead.



This page contains a list of Toolserver features needed in Tool Labs.

Database replication/access

  • MySQL-access to the WMF-project-databases (minimum same table/row/field-scope like on the toolserver). Doing...
  • meta-database with updated information for all projects (like toolserver.wiki, toolserver.namespacename, toolserver.language)
    This is very important for the migration from TS of web tools which display a project selector. --Pietrodn (talk) 17:23, 30 May 2013 (UTC)
    Tools Labs has an equivalent meta_p.wiki (see bug 48626), but nothing yet for the other ones that I know of. —Pathoschild 22:14:16, 31 August 2013 (UTC)
  • Separate MySQL-server(-cluster) for user- and project-databases.
  • 1 MySQL-Server per cluster (s1-s7) where user- and project-databases can be created.
  • Minimum 1 MySQL-Server per cluster (s1-s7) where a live-copy of commons is running in parallel.
  • Minimum 1 MySQL-Server per cluster (s1-s7) where a live-copy of wikidata is running in parallel.
  • Support for short- and long-running queries (maximal run-time depending on the load up to few days).
    • CatScan and CatScan 2.0 are popular GUIs to the current databases. I used them extensively on Commons. Unfortunately both suffer from frequent timeouts and occasional "too many users using this tool" errors. Some standard maintenance queries (saved for easy access) that worked several years ago stopped working as the number of images on Commons grew. --Jarekt (talk) 18:56, 7 March 2013 (UTC)
  • Mail-message-system about killed queries.
    • If the labs system is not as underpowered as toolserver, in terms of the database servers, a query killer may not be necessary. CBM (talk) 21:54, 20 December 2012 (UTC)
  • DNS-aliases for clusters (sql-s1.toolserver.org) and wikis (enwiki-p.rrdb.toolserver.org). YesY Done (xxwiki.labsdb)
  • replication-replag-reporting-system.
  • user_properties table https://bugzilla.wikimedia.org/show_bug.cgi?id=58196
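
For web tools that display a project selector, the meta-database and the per-wiki DNS aliases listed above could be combined roughly as follows. This is only a minimal sketch, not the actual Tool Labs schema: an in-memory SQLite table stands in for meta_p.wiki, the column names (dbname, lang, family) are assumptions, and the helper names labsdb_host, project_choices and demo_connection are hypothetical. The only detail taken from this page is the xxwiki.labsdb alias pattern.

```python
import sqlite3


def labsdb_host(dbname):
    """Return the per-wiki alias following the xxwiki.labsdb pattern."""
    return f"{dbname}.labsdb"


def project_choices(conn):
    """Return (dbname, label, host) tuples suitable for a project selector."""
    rows = conn.execute(
        "SELECT dbname, lang, family FROM wiki ORDER BY dbname"
    ).fetchall()
    return [(db, f"{lang}.{family}", labsdb_host(db)) for db, lang, family in rows]


def demo_connection():
    """Build an in-memory stand-in for the meta table (columns are assumed)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE wiki (dbname TEXT, lang TEXT, family TEXT)")
    conn.executemany(
        "INSERT INTO wiki VALUES (?, ?, ?)",
        [("enwiki", "en", "wikipedia"), ("dewiki", "de", "wikipedia")],
    )
    return conn


if __name__ == "__main__":
    for choice in project_choices(demo_connection()):
        print(choice)
```

A real tool would replace demo_connection with a connection to the replica server and adjust the query to whatever columns the meta table actually exposes.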

Filesystem / shared storage

  • project-directories for working together. YesY Done Please see wikitech:Help:Shared_storage#Project_storage for documentation
  • home-directories. YesY Done Please see wikitech:Help:Shared_storage#Home_directory_storage for documentation
  • common-directory for wikimedia-dumps with an automatic update of them. YesY Done
  • common-directory for page-stats with an automatic update of them. -> status? planned? in place?
  • Quota limits on filesystem usage / method for users to check their quota usage YesY Done

Note: April 7th, 2013: The above items are marked as done, but WMF is currently working on a replacement for glusterfs. The features are available now and will remain so.

  • Which is now also YesY Done for the tools project, with the new NFS server. — MPelletier (WMF) (talk) 13:31, 26 April 2013 (UTC)

Question (copied here from the bottom of the page): Is there a way on Labs to easily share data across all projects? --Nemo 05:15, 26 September 2012 (UTC)

  • There is no technical hurdle to doing so (it would be fairly simple to share an NFS volume across more than one project, for instance), but there are important considerations to keep in mind, the most salient of which is that access to root on some projects is not particularly limited, so any files on a shared volume might as well be considered publicly writable.

    That said, we might have a terminology confusion; "project" in the context of the Labs is a set of instances (virtual machines) sharing users and settings, of which the Tool Labs proper is one. If you meant between bots, or other tools, then that is provided by default and manageable through normal Unix file permissions. — MPelletier (WMF) (talk) 13:31, 26 April 2013 (UTC)

Languages

  • i18n for all (available) languages of this world. YesY Done (tsintuition is available on Tool Labs -- see https://github.com/Krinkle/TsIntuition/wiki/Documentation )
    • Can you clarify this? i18n for what specifically?--Ryan lane (talk) 21:06, 20 December 2012 (UTC)
    • Maybe this is a reference to the Toolserver Intuition framework? – Minh Nguyễn (talk, contribs) 06:38, 21 December 2012 (UTC)
    • asking again: What exactly does this mean and is it done with the Toolserver Intuition tool moving to Labs? Silke WMDE (talk) 16:09, 5 April 2013 (UTC)
  • Support for perl, c, python, c++, php, mono, java, tcl, bash-, ksh- and zsh-script (for (fast-)cgi and cli). Tons of libraries for each programming-language. YesY Done
    • Anything that comes with Ubuntu, or someone is willing to package (that is open source) is available.--Ryan lane (talk) 21:06, 20 December 2012 (UTC)

Web

  • Server/project for web tools and pages. YesY Done

Bug tracker

This is now an issue on bugzilla: bugzilla:58794


  • Bugreporting-site (issue and bug tracker like JIRA or bugzilla) with the possibility to create (sub-)projects that can be administered by a user/project.
    • preferably with conversion/migration from TS JIRA to the new one on labs

Open questions:

  • Do we know if an import from jira is possible or if jira's content is going to be lost?
  • Can this bugzilla instance also host user projects or "products"? E.g. like "DrTrigon's tools"? What about migration from JIRA? --DrTrigon (talk) 19:36, 24 January 2013 (UTC)
    1. Yes, every tool will be given a component for their own support on request.
    2. It is not clear that migration of data from Jira is feasible. There is no technical hurdle, both Jira and Bugzilla have APIs that would allow it, but someone would have to write a tool to fetch, filter, convert and recreate issues – this is a major undertaking. — MPelletier (WMF) (talk) 19:25, 8 April 2013 (UTC)
    • ACK. There are some reports on successful migrations, and there is even a Python client for JIRA somewhere, but this should be tested on some obsolete JIRA project first, and even then will probably require a lot of manual intervention. While most of the JIRA issues relating to Toolserver specifics aren't that interesting to archive, I think there's some historical value in them so if there is a reasonable way to migrate them, we should do that. --Tim Landscheidt 19:41, 8 April 2013 (UTC)
    • A quick search on Google suggests the following scheme: JIRA→XML→(adapt data)→XML→bugzilla. For this we need:
      • An xml export of jira data - how to get this? Who has the permission to do this? (TS admins?)
      • Somebody has to write a script that does the (adapt data) part - volunteers?
      • The new xml data have to be imported to bugzilla - how to do this? Who has the permission? (labs admins?) --DrTrigon (talk) 10:57, 22 September 2013 (UTC)
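
The data conversion step in the middle of the scheme above could look roughly like the following. This is a sketch under loud assumptions, not a tested migration tool: the element names on both sides (item, key, summary, status for the export; bug, short_desc, bug_status, alias for the import) and the status mapping are hypothetical and would have to be checked against a real JIRA export and against what the Bugzilla import actually accepts.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from exported status names to target status names.
STATUS_MAP = {"Open": "NEW", "In Progress": "ASSIGNED", "Resolved": "RESOLVED"}


def adapt_issue(item, product, component):
    """Convert one exported <item> element into a <bug> element."""
    bug = ET.Element("bug")
    ET.SubElement(bug, "product").text = product
    ET.SubElement(bug, "component").text = component
    ET.SubElement(bug, "short_desc").text = item.findtext("summary", default="")
    status = item.findtext("status", default="Open")
    # Unknown statuses fall back to NEW so nothing is silently dropped.
    ET.SubElement(bug, "bug_status").text = STATUS_MAP.get(status, "NEW")
    # Keep the original issue key so the history stays traceable.
    ET.SubElement(bug, "alias").text = item.findtext("key", default="")
    return bug


def adapt_export(xml_text, product, component):
    """Turn an export document into an import document (both formats assumed)."""
    source = ET.fromstring(xml_text)
    target = ET.Element("bugzilla")
    for item in source.iter("item"):
        target.append(adapt_issue(item, product, component))
    return ET.tostring(target, encoding="unicode")


# Made-up sample input, used only to exercise the functions above.
SAMPLE = """<rss><channel><item>
  <key>EXAMPLE-1</key><summary>Sample issue</summary><status>Resolved</status>
</item></channel></rss>"""

if __name__ == "__main__":
    print(adapt_export(SAMPLE, "Tool Labs tools", "example-tool"))
```

Comments, attachments and user accounts are left out entirely here; those are likely where most of the manual intervention mentioned above would be needed.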

Version control

  • Git repositories YesY Done (people can already request Git repositories in Gerrit for labs stuff)
    • To clarify, I'm fine with people hosting their code in SVN. What I'm saying is that WMF won't provide a SVN hosting solution. If volunteers want to create an svn project in Labs, and run svn.wmflabs.org, that's fine.--Ryan lane (talk) 21:52, 20 December 2012 (UTC)

Logs/Stats

Which of the following items are crucial? Which are rather nice-to-have?

  • Access to (anonymized) web- and web-error-logs.
    • YesY Done for access logs, and partial for error logs. Complete error logs will require an update to Apache 2.4 which hasn't quite made its way to Ubuntu yet. It's in the plans for the Tool Labs, however. — MPelletier (WMF) (talk) 19:26, 8 April 2013 (UTC)
    • Doing... WMF legal says: discussion needed on anonymization procedures, what information is taken out
      • YesY Done After discussion with legal and consultation with the EFF, a set of suitable anonymization has been worked out. — MPelletier (WMF) (talk) 13:33, 26 April 2013 (UTC)
  • Server statistics, workload, status and else

Job-system

  • Job-system for execution of non-web-tasks, e.g. stable cron. YesY Done
    • crontab or else without "(CRON) CAN'T FORK (child_process): Not enough space." issue, please confer Cron#Timing and work load
    • maybe add a queue system like SGE
  • Various resources like run-time, needed databases, free memory and user-slots. YesY Done
  • Automatic switch-over and moving in case of a server-failure. YesY Done
  • Possibility to let a non-root maintain the job-system (adding resources, killing jobs, etc. pp.). YesY Done
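
As an illustration of requesting run-time and memory resources from a grid scheduler, the sketch below assembles a qsub command line without running it. The options used (-N, -b y, -cwd, -l) and the resource names h_rt and h_vmem are standard Grid Engine ones, but whether a given Tool Labs queue honours them in this form is an assumption, and build_qsub_command is a hypothetical helper name.

```python
import shlex


def build_qsub_command(job_name, command, runtime=None, memory=None):
    """Assemble a qsub argument list for a job; nothing is executed here."""
    # -b y submits the command as a binary/script path, -cwd runs it in the
    # current working directory.
    argv = ["qsub", "-N", job_name, "-b", "y", "-cwd"]
    resources = []
    if runtime:
        resources.append(f"h_rt={runtime}")    # hard run-time limit, hh:mm:ss
    if memory:
        resources.append(f"h_vmem={memory}")   # hard virtual memory limit
    if resources:
        argv += ["-l", ",".join(resources)]
    return argv + shlex.split(command)


if __name__ == "__main__":
    print(build_qsub_command("mybot", "python bot.py --once",
                             runtime="01:00:00", memory="512M"))
```

The returned list could be passed to subprocess.run on a host where qsub is installed; keeping the construction separate makes it easy to inspect the command before submitting.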

Backup

FYI: Any backup done by ops is for disaster recovery. So automated backups are not restorable by users without ops intervention. Users should make their own backups to the project storage. Silke WMDE (talk) 17:41, 5 April 2013 (UTC)

  • Automated daily backups for user- and project-directories for minimum 1 week.
    • What's the current status of automated backups? Planned? Already existing? Silke WMDE (talk) 10:33, 7 April 2013 (UTC)
  • including crontab and all other necessary configuration files YesY Done
    • I understand this is the users' task. Set it to done. Silke WMDE (talk) 10:33, 7 April 2013 (UTC)
  • backups of user databases
    • Backups of databases should likely be handled by users, and saved in project storage--Ryan lane (talk) 06:11, 26 September 2012 (UTC) It's possible that we'll do automatic backups of the user databases as well, assuming they are provided by a database service.--Ryan lane (talk) 01:29, 2 October 2012 (UTC)
  • A bit of news on this point: the current experimental new filesystem is slated to provide "time travel" snapshots at fixed intervals in the past (several hours, several days, possibly a few weeks). This provides a measure of "oops-correction" backups.

    Best practice, however, is to make certain that important code and configuration be stored in git to allow keeping the full history. — MPelletier (WMF) (talk) 19:29, 8 April 2013 (UTC)
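
Since users are expected to make their own backups to project storage, a user-side backup with roughly one week of retention could be sketched as follows. The directory layout, the archive naming and the helper name backup_directory are all hypothetical; the default of seven archives only mirrors the "minimum 1 week" wish above on the assumption of one run per day.

```python
import tarfile
import time
from pathlib import Path


def backup_directory(source, backup_root, keep=7, stamp=None):
    """Archive `source` under `backup_root`, keeping the newest `keep` archives.

    `keep` is expected to be at least 1.
    """
    source = Path(source)
    backup_root = Path(backup_root)
    backup_root.mkdir(parents=True, exist_ok=True)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    archive = backup_root / f"{source.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=source.name)
    # Timestamped names sort chronologically, so the oldest archives come first.
    archives = sorted(backup_root.glob(f"{source.name}-*.tar.gz"))
    for old in archives[:-keep]:
        old.unlink()
    return archive
```

Run daily from cron or the job system, this keeps a rolling week of archives. It complements keeping code and configuration in git and does not replace it.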

Various

Mail: to do

See bugzilla:58796


Uncertain how this is going to happen, but planned to find a way. For WMF, this is a legal question rather than a technical one. Main question: Which domain? Silke WMDE (talk) 10:27, 7 April 2013 (UTC)

  • discussion at WMF legal Doing...
  • Mail-address for users.
  • Mail-address for projects.
  • Configurable Mail-forwarding for both.

What's the status here? Any progress? --DrTrigon (talk) 08:23, 26 October 2013 (UTC)

Monitoring: done

Licenses: Info

  • Free license-choice of the user for his/her tools. YesY Done
    • This is of course the case, as long as the license is an OSI approved license.--Ryan lane (talk) 21:09, 20 December 2012 (UTC)

access to instances: http/sftp

  • SFTP access to instances YesY Done (ssh access needed)
  • Simple setup to allow HTTP access to projects/instances (reverse proxy, port forwarding, public ip) YesY Done
    • There is an easy-to-use instance proxy for Labs instances. Documentation: wikitech:Proxy
    • YesY Done In the case of Tool Labs, this is actually easier: there is a directly accessible proxy for the tools that does not need setup on a per-tool basis. — MPelletier (WMF) (talk) 19:30, 8 April 2013 (UTC)

Updates

  • transparent updating of packages with security problems/bug YesY Done
    • All instances by default are configured for unattended-upgrades. A general overview/management system has been roughly discussed and would be nice to implement though. DamianZaremba (talk) 11:13, 27 September 2012 (UTC)

More

  • Local and auto-updated copies of:
    • Wikimedia XML Dumps YesY Done
      • This is accessible on every instance at /public/datasets
    • visits per page (pagecounts) and project (projectcounts)
    • same visits stats, in database format to allow direct querying
  • Misc. Toolserver features:
    • Support SGE to automatically defer starting of expensive processes based on current capacity and usage (qcronsub, qsub) https://wiki.toolserver.org/view/Job_scheduling#arguments_to_qsub/qcronsub
      • If the presented alternative has the same user interface, it shouldn't be a problem. For instance, people don't have an opinion about which of the SGE forks would be preferable.
      • YesY Done Technically, that's a default feature of the grid engine. — MPelletier (WMF) (talk) 13:17, 9 April 2013 (UTC)

End-user-support

  • 1 or more root with an understanding for user-problems.
    • We have 3 on-staff and many volunteers.--Ryan lane (talk) 21:10, 20 December 2012 (UTC)
    • Having a root who is *understanding* is particularly important. Many users are not experts in what they are doing, and need friendly help in solving problems or cleaning up messes they have created
  • Documentation for beginners and non-technicals.
  • take into account that former toolserver users might not be familiar with Ubuntu Linux servers.

Projects

  • Create generic project for web tools (Bug 40510) YesY Done
  • Create generic project for periodic/long-running bots YesY Done
    • It is not clear yet whether people should share instances or create their own. As they are unlikely to interfere with each other, the overhead of N linuxes may not be worth it. Instead it may be more useful to have 1 big instance, or a grid of instances but control distribution with SGE instead of manually.
      • 'production' tool labs should be locked down access wise and run with some form of scheduler, development should be shared instances for collaboration. DamianZaremba (talk) 11:13, 27 September 2012 (UTC)
        • Agreed. We could use SGE for this, but it really seems like overkill...--Ryan lane (talk) 01:29, 2 October 2012 (UTC)
          • Is there a "standard" way of job scheduling in Ubuntu shops? --Tim Landscheidt 12:27, 8 October 2012 (UTC)
  • The "production" Tool Labs has a more restrictive environment suited for stable tools that need high reliability, and uses gridengine for scheduling, this is the 'normal' destination for working tools. There is another project, 'bots', which is community maintained and more flexible in its architecture for more experimental tools and development. — MPelletier (WMF) (talk) 19:33, 8 April 2013 (UTC)

OSM

Moved to tracking bug bugzilla:58797


  • A Postgresql-server for OSM.
  • access to OSM database -> db server with 600 GB of solid state disk space
  • a few terabytes of diskspace to save the rendered map tiles (more or less the same space needed now on the toolserver -> how much is this?)
  • user databases
  • Render-server with enough space to host several layouts.
OSM is being added to WMF production. We'll have a test/dev version of OSM in Labs, but we have no plans on having a quasi-production like version of OSM in Labs.--Ryan lane (talk) 21:08, 20 December 2012 (UTC)
Without a live OSM-planet-db the labs fails the criteria to replace toolserver. --Kolossos 15:20, 23 December 2012 (UTC)
The plan is to clean up the styles and tools that are no longer maintained/needed, to move about 10 styles and 5 tools to Labs, to move one style to production.Silke WMDE (talk) 10:39, 6 April 2013 (UTC)
  • We have special projects on toolserver like WIWOSM that combine Wikipedia- and OSM-data in one script. This is only possible if a script can connect to both databases. Without such a feature we will lose this service in Wikipedia. --Kolossos 18:02, 1 February 2013 (UTC)
  • Other users created tools for translating places or to maintain street-lists. --Kolossos 19:12, 14 February 2013 (UTC)

To be discussed

  • will the database server be fast enough / What does it need to be fast enough?
  • Will rendering work in a VM? (Is this the plan to do it in a VM?)
  • Definitions and use of terms: avoid misunderstandings about what is "production" and what the Labs part is supposed to do. Please be as explicit as possible.

Render

bugzilla:58798

Things available to Render on the Toolserver which are needed for a transition to WikiLabs/ToolLabs

  • Database access like currently on the Toolserver
  • Possibility to build and run C and C++ apps
  • Possibility to build external C and C++ libraries (app dependencies) from source and install in user directory, without packaging
  • 64-Bit OS for sufficient address space
  • ~28GB physical RAM (comparable to ortelius)
  • CPU with at least 4 cores with at least 3 GHz, which can actually be used in parallel (comparable to ortelius)
  • We need to keep large amounts of data available in RAM. Regenerating it is expensive so we need a server with good uptimes. A VM or server which reboots or needs to be re-setup every month or so is not acceptable.
  • Ideally, we need to use a physical machine like we currently can. It is not yet clear whether a VM will do.
    • Render is a special case that will need direct attention – Silke will coordinate the necessary information exchange. — MPelletier (WMF) (talk) 15:59, 28 March 2013 (UTC)

WikiMiniAtlas

depends on:

    • the OSM database mirror being available
      • Once we have a copy in production, we could arrange replication from there. Max Semenik (talk) 12:53, 4 February 2013 (UTC)
    • the WIWOSM project (although dschwen could proxy that from the TS)
      • Proxying is not allowed per our terms of use. DamianZaremba (talk) 11:13, 27 September 2012 (UTC)
      • If we're looking to replace TS, then we need to make this available.--Ryan lane (talk) 01:29, 2 October 2012 (UTC)
    • Dispenser's coordinate extraction database (GHEL)
      • This should be obsoleted with GeoData. Max Semenik (talk) 12:53, 4 February 2013 (UTC)

Other project-related stuff

  • Replicate or transfer MMP (multimaintainer projects) from Toolserver
    • Ryan says: "They can be created in LDAP by making a labsconsole account. Additionally, unless the account needs to directly log in via ssh, there's no request process needed for the user to be used. Alternatively, the user could be created as a system account via puppet."
    • What's that supposed to mean? The migration of (multi-maintainer) projects is the goal of this list, not one of its means. --Tim Landscheidt 12:27, 8 October 2012 (UTC)
    • YesY Done (implicitly). On Tool Labs, every tool is multi-maintainer even if there is only one current maintainer; there is no difference between tools with one maintainer or more. — MPelletier (WMF) (talk) 15:59, 8 April 2013 (UTC)
  • servlets YesY Done -- the Tool Labs webgrid can also run Tomcat servers.
    • Need more information here; which application server(s) is/are in use, and which tool needs what? — MPelletier (WMF) (talk) 16:18, 8 April 2013 (UTC)
  • Localisation for tools on translatewiki.net
  • Per-project optional svn and/or git repo
    • For "Tool Labs" that is, since in "WikiDev Labs" this is mandatory workflow
    • Should versioning really be optional? Even for tools, is it ever a good idea not to have a repo? I'd rather improve Git/Gerrit usability and integration. Eloquence (talk) 21:07, 25 September 2012 (UTC)
      • I think it would be hard to enforce the use of source control, unless someone was policing things, or if we required it to deploy tools (maybe via git-deploy?). Using the deployment system for this actually may be the easiest way to enforce this.--Ryan lane (talk) 22:25, 25 September 2012 (UTC)
    • Do we want to have a bare git server for wmflabs (like svn.toolserver.org there), or do it in Gerrit? Or maybe allow any git url so that users can store it where they like (be it gerrit.wikimedia.org, github.com etc.). Krinkle (talk) 02:42, 26 September 2012 (UTC)
      • No, not another server that someone has to maintain. Gerrit or GitHub are fine. --Tim Landscheidt 12:27, 8 October 2012 (UTC)
    • YesY Done Gerrit is available on simple request for any tool maintainer; we could provide a Tool Labs-specific subversion server if specifically needed (e.g. for tools that do automatic version control). — MPelletier (WMF) (talk) 16:19, 8 April 2013 (UTC)
  • Mysql query killer (especially for queries to the replicated wmf wiki databases)
    • On the roadmap - relatively easy change.--Ryan lane (talk) 01:29, 2 October 2012 (UTC)
  • Need a clear story for access to bot credentials - should ownership of bot accounts be shared between all users of the server, or should each bot account be owned by an individual user? If the latter, how will projects get access to the credentials? Note that there are some bad practices going on that might affect this: some tools contain hardcoded bot credentials, and some people share passwords between their user accounts and bot accounts.
    • YesY Done (insofar as the support is there); there is no need to share credential to user accounts given that the Tool Labs natively supports multi-maintainer projects. Permission for multiple maintainers to operate bots on a single set of credentials (for the bot itself) is a requirement of the target project(s) and will need to be discussed with their communities. — MPelletier (WMF) (talk) 16:52, 5 April 2013 (UTC)
  • Server statistics, workload, status and else

Permanent blockers for migration of some projects

  • license problems ("I wrote code at work for my company and reuse parts for my bot framework. I do not have the right to declare this code as open source, which is needed by labs policy") N Not done
      • This isn't going to change, we should be open and collaborative. Restricting access to code goes against that. DamianZaremba (talk) 11:13, 27 September 2012 (UTC)
        • +1. We will only ever allow open source and open content. I honestly wonder how you are legally using proprietary software on TS.--Ryan lane (talk) 01:29, 2 October 2012 (UTC)
          • You cannot simplify that much. E.g. I am operating a bot using a lot of CV (computer vision) algorithms. For some of them, WE (wikipedia/wikimedia community) are ALLOWED to use the code only because I kindly asked the owners. But you cannot assume they will agree to change their license just because of the TS to labs migration! So this could indeed become a blocker, and the "irony" is that it mainly affects the newer algorithms (the interesting ones, since they are state of the art - which is a pity). To summarize: it is a good thing to force open source, but we should not be too strict here - as a few people mentioned, this was a strength of the TS as I see it. Greetings --DrTrigon (talk) 11:21, 1 January 2013 (UTC)
          • You're allowed to use the software for now. Is there a license agreement? What happens if you no longer want to support the bot and someone else needs to maintain it? Do they have permission? What if someone copied the software and publishes it? Who's liable? We do have a guideline for exceptions when using closed source software, but it would require proper paperwork (like license agreements from the owner, etc). There's a very good reason we have this rule, and it's insane that TS doesn't.--Ryan lane (talk) 22:49, 4 January 2013 (UTC)

Bots and webservices project

  • Public webserver for logs YesY Done
    • CGI (perl, python, ...) for helper tools YesY Done
    • Text editors: vim (probably some others like nano etc.) YesY Done
    • Shells - bash, ksh, csh, tclsh YesY Done
    • Libraries - libtcl YesY Done (needs puppetize)
    • imagemagick YesY Done
    • Php, java YesY Done (needs puppetize)
      • Update jdk and jre to Java 7. YesY Done We're currently using the old Java 6. -FASTILY (TALK) 00:04, 18 March 2013 (UTC)
        • Who is "we"? Toolserver? Afaik, Labs/Tool Labs is running on Ubuntu precise currently, so it leaves the choice between both versions. I set this to done. Silke WMDE (talk) 10:10, 7 April 2013 (UTC)
  • Per-project optional custom MySQL databases YesY Done (bots-sql1, -sql2 ...) Petrb (talk) 07:36, 18 October 2012 (UTC)
    • This is on the roadmap. Will likely come some time after replicated databases.
    • Basically similar to the mounted project-wide storage. Not on any individual instance, accessible from within each project instance.
    • Toolserver also has the principle of public databases that can be read from other projects. This is probably something we'd want too, so that projects can build on top of each other.
      • The current concept behind this in Labs is that all databases will be accessible from all instances. Creation/modification/grants/etc. will be handled by sysadmins in the project that owns the database.--Ryan lane (talk) 06:09, 26 September 2012 (UTC)
  • Some packages to install
    • Support for mono (c#), python, svn, git YesY Done
    • Perl (needs full cpan upgrade and MediaWiki:: packages) YesY Done (needs puppetize)
      • Why does it need cpan at all? I understand that on Toolserver with its limited admin capacity it's hard to wait for proper installation, but I assume that in Labs there will probably always be a root around who can build a proper package on the fly. --Tim Landscheidt 12:27, 8 October 2012 (UTC)
      • Is this a to do or can it be deleted from this list??? Silke WMDE (talk) 10:14, 7 April 2013 (UTC)
        • It's actually a N Not done as strictly stated; no external repositories are supported from labs, but anything available through CPAN can be made into .deb that can be added to the WMF repo. — MPelletier (WMF) (talk) 17:13, 8 April 2013 (UTC)
          • Is it a good idea to get a CPAN upgrade to current packages or do you want to go with the set which happened to be in whatever state they were during some kernel or UI refresh release? 70.59.16.167 06:46, 26 April 2013 (UTC)
            • I think this has to be examined on a package-by-package basis; As a rule, I would generally tend toward keeping the distribution's versions since those are the better tested ones, but there may well be good reasons to bump up to a more recent version. — MPelletier (WMF) (talk) 13:17, 26 April 2013 (UTC)
              • What exactly do you mean by "better tested"? I think you mean "greater exposure" but each CPAN package has its own unit tests, so strictly speaking you are factually incorrect. 209.235.2.8 02:03, 30 May 2013 (UTC)
    • Flup[1] (Python WSGI)
    • All the packages requested by TS users and now taken as a given; see e.g. tswiki:User:Dab/Debian-Packages and install requests on JIRA – someone make a list
      • Question: What about packages from Debian testing in Labs? Are there repos for newer Ubuntu releases than the latest LTS release? is it the users' responsibility to install them without the package management? How is this planned? (On TS, OSM development requires some packages from testing.) Silke WMDE (talk) 10:22, 7 April 2013 (UTC)
        • As a rule, we have access to all of Precise and can pull and backport Raring packages when appropriate. When all else fails, a sysadmin can make a package and add it to our own repo (that would be the preferred solution in the scenarios you describe).

          There is no plan to allow the tool maintainers to directly install packages, but they can always install any software locally if they need it. — MPelletier (WMF) (talk) 15:37, 8 April 2013 (UTC)

Links