Wikimedia Labs/Tool Labs/TODO

This is the current list of known requirements for the Tools project, with notes and status. It is in no particular order and is probably incomplete at this early stage.

To clarify, a tool here means a single, self-contained bot, webtool, or other system and all its dependencies. Tools are the unit of management, each having its own maintainer or set of maintainers. The project is the Labs project in which the tools live (at the moment there are two, "tools" and "bots").

Feature / Requirement Implementation Dependencies Status / Comments
Scheduling and job management Open Grid Scheduler[1] is a current candidate Done. The grid is working and fully functional, with easier-to-use tools for the typical cases as well.
Unification of labs project for tools The final implementation is to deprecate "webtools" and use "bots" as the "experimental/flexible" development project, and "tools" as the official Toolserver-replacement home for stable tools. Done [2]
Tool management Unix groups, shared storage Done. Tool management is accomplished through the new service group interface of Wikitech. Tool Labs users can create and manage tool accounts without sysadmin intervention (including managing the list of maintainers).
Database replication General Labs replication Done, and there was much rejoicing. Yay.
Tool/User databases Done. However, they currently live on a project-local instance until database replication is available. They will be moved transparently to the central database once that becomes available.
Tool status / monitoring (Probably via the general management web interface) Done. The Tool Labs landing site links to tools' web interfaces, where they can post status information.
Software dependencies Distributed on all execution nodes via puppet Done. There is now a well-defined procedure to add software packages to the execution and/or dev environment, including packaging from non-Ubuntu sources.
Bug/Issue tracker Bugzilla. Done. There is now a "Tool Labs tools" Bugzilla product in which maintainers can request a component for any of their tools they want to support that way.
Anonymized weblogs Done. The current webproxy setup anonymises by default. Access and PHP error logs are split per tool; however, a limitation in the current Apache prevents proper splitting of error logs. A mechanism for tool maintainers to inspect the logs should be looked into.
Local email Forwarded email addresses (what FQDN?) Check on legal / policy / requirements In progress. Per-user and per-tool email addresses (the latter exploding to the tool maintainer or to a mailing list); Legal prefers no local mail storage and is looking into the appropriateness of giving out addresses. (E.T.A. October 31st, 2013)
Render Requires more detail on specific requirements Done. May need its own project; discussion with the Labs infrastructure team will be needed.
Access to revision text Look into feasibility Not done. The resources needed to replicate the revision text are prohibitive. That said, using the API to get revision text from Labs is likely to be faster than direct access to the database, given caching.
Shared storage Now uses NFS Provided by the infrastructure Done. Gluster has been replaced with an NFS server with redundancy at the hardware level.
Cross-tool authentication OAuth most likely.

E.T.A. late 2013 (improvement/new feature)

  • Best case: SUL as OAuth server
  • Actually plausible: an OAuth server with 'SUL account name' and 'email' (for instance) as protected resources that tools can request access to.
  • OpenID is soon to be deployed to allow for identification of end-users to the tools themselves.
Database creation from tools Done. Service groups have the privilege to create databases.
Convert pip packages to Debian packages using an automatic tool Probably create a tool (C++ / shell?) which automatically converts the pip package to a Debian package Done

ssh tools-dev

pypi-install <name> --keep

Check /tmp for the output.
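
The steps above could be wrapped in a small shell helper. This is only a sketch: the function names, the DRY_RUN switch, the package-name check, and the example package name used below are assumptions made for illustration. Only the tools-dev host, the pypi-install <name> --keep invocation, and the /tmp output location come from the notes above.

```shell
#!/bin/sh
# Sketch of a wrapper around the manual steps: log in to the dev host,
# run pypi-install with --keep, then list /tmp to find the output.
# The helper names and DRY_RUN are hypothetical, not part of Tool Labs.

HOST="tools-dev"     # dev host named in the steps above
OUTPUT_DIR="/tmp"    # where the steps say to look for output

# Build the remote command line for one package name.
build_remote_command() {
    printf 'pypi-install %s --keep' "$1"
}

# Convert one pip package; with DRY_RUN=1 only print the ssh command.
convert_pip_package() {
    package="$1"
    # Reject empty names and anything that is not a plain package name,
    # since the name is interpolated into a remote shell command.
    case "$package" in
        ''|*[!A-Za-z0-9._-]*)
            echo "usage: convert_pip_package <name>" >&2
            return 2
            ;;
    esac
    remote_cmd=$(build_remote_command "$package")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "ssh $HOST '$remote_cmd && ls -lt $OUTPUT_DIR'"
    else
        ssh "$HOST" "$remote_cmd && ls -lt $OUTPUT_DIR"
    fi
}
```

For instance, after sourcing the file, `DRY_RUN=1 convert_pip_package requests` prints the ssh command without running it (here "requests" is just a placeholder package name), which makes it easy to check the command before touching the dev host.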