Wikimedia Platform Engineering/REST proposal/Kickoff meeting notes

Wikia/Wikimedia API discussion, 17 Oct 2012

Max Semenik, Rob Lanphier, Federico Lucignano, Owen Davis, Mark Davis, Sumana Harihareswara, Sam Reed, Roan Kattouw, Jon Robson (for 2nd half of meeting: Patrick Reilly)

(MediaWiki API change planning https://www.mediawiki.org/wiki/API )

This meeting: share thoughts, plans

Basic intent
Wikia's Federico (based in Poznan, visiting SF this week) will be working on the API for the next 6 months. (Mobile is a big part of that, but the work is not limited to mobile.) Wikia wants to attract motivated developers: apps and companies using Wikia's products. It also wants to standardize its APIs and bring them up to current standards, though that is a high-level goal. The scope might start with mobile but covers the whole platform, including the enterprise. Mobile comes first and is driving the direction.


 * In 1.19, API internals changed to use RequestContext [as did everything else in MediaWiki -Roan]

Goal/idea: put a key in the API to support quotas, throttling, and notification. Goal: a RESTful interface using HTTP verbs. (Everything here is public; no business secrets etc.) Wikimedia also wants to avoid boxing itself into special-purpose APIs for specific apps. Max: longtime contributor, has a big-picture view of MediaWiki


 * Currently working on geo search API


 * "Mobile API" = MobileFrontend helper, delivers content in mobile-friendly format and per-section


 * Mobile-friendly HTML = mostly tweaks like removing magnifying glass icon from thumbnail images, removing [edit] links, etc.


 * Migration into core has been deferred but is still a goal

Many APIs are RESTful: they use HTTP verbs, the 5 main ones covering [CRUD]


 * Roan: problem: in JS you often can't use verbs other than GET and POST


 * Federico: XHR1 and XHR2 support them; we can wrap them in a JS API.


 * Maybe pull in WebDAV to get 20 or so more verbs, but that's a somewhat exotic solution
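To make "RESTful, using HTTP verbs" concrete, here is a minimal sketch of one resource where each verb maps to one CRUD operation. The `/pages/<title>` URL scheme, the in-memory storage and the `PageResource` class are hypothetical illustrations, not part of the MediaWiki API. This sketch maps POST to create; some designs use PUT for create-or-replace instead.

```python
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, str]  # (HTTP status code, body)


class PageResource:
    """Hypothetical /pages/<title> resource: each HTTP verb maps to one CRUD operation."""

    PREFIX = "/pages/"

    def __init__(self) -> None:
        self.pages: Dict[str, str] = {}  # stand-in for the wiki's page storage

    def handle(self, method: str, path: str, body: Optional[str] = None) -> Response:
        if not path.startswith(self.PREFIX) or path == self.PREFIX:
            return 404, "no such resource"
        title = path[len(self.PREFIX):]
        handlers: Dict[str, Callable[[str, Optional[str]], Response]] = {
            "POST": self._create,    # Create
            "GET": self._read,       # Read
            "HEAD": self._head,      # Read, status only, no body
            "PUT": self._update,     # Update (replace the page text)
            "DELETE": self._delete,  # Delete
        }
        handler = handlers.get(method.upper())
        if handler is None:
            return 405, "method not allowed"
        return handler(title, body)

    def _create(self, title: str, body: Optional[str]) -> Response:
        if title in self.pages:
            return 409, "already exists"
        self.pages[title] = body or ""
        return 201, "created"

    def _read(self, title: str, body: Optional[str]) -> Response:
        if title not in self.pages:
            return 404, "not found"
        return 200, self.pages[title]

    def _head(self, title: str, body: Optional[str]) -> Response:
        status, _ = self._read(title, body)
        return status, ""

    def _update(self, title: str, body: Optional[str]) -> Response:
        if title not in self.pages:
            return 404, "not found"
        self.pages[title] = body or ""
        return 200, "updated"

    def _delete(self, title: str, body: Optional[str]) -> Response:
        if title not in self.pages:
            return 404, "not found"
        del self.pages[title]
        return 204, ""


api = PageResource()
print(api.handle("POST", "/pages/Main_Page", "Hello"))  # (201, 'created')
print(api.handle("GET", "/pages/Main_Page"))            # (200, 'Hello')
print(api.handle("PATCH", "/pages/Main_Page"))          # (405, 'method not allowed')
```

Unknown verbs get 405 rather than being silently treated as GET, which keeps the verb meaningful to clients and caches.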

RobLa: We have thought about and toyed with the possibility of rewriting the API. Maybe WMF could gently nudge Wikia's early work, treating it like a pilot, and then collaborate on the next-generation solution.
 * Owen: it's 2012, maybe we should redesign API with current thinking/affordances
 * Mark Davis: What are our goals? Are they aligned?

Who are the internal customers for API at Wikia?
 * Mobile team, external contractor building a reader


 * Mobile apps for verticals. Less about editability, more about having good experiences reading on devices

Federico: parts of MW are not exposed by the current API. There are plenty of existing generic client libraries for RESTful APIs covering a wide range of languages/platforms (a bigger set than the one covered by the current "approved" client libraries for the MW API): Guzzle, restxl, Restlet, Apache Wink, Spring 3.x; even the Zend Framework has a ready-to-use REST client built in. These won't work fully, or at all, with the current API. Making them work would be a good opportunity to get more third-party developers on board faster, and those libraries would require no maintenance from us.

REST
RESTful is subjective... what do we really want?


 * "RESTful as a *standard* has evolved spontaneously over time"


 * http://martinfowler.com/articles/richardsonMaturityModel.html


 * http://ruben.verborgh.org/blog/2012/09/27/the-object-resource-impedance-mismatch/


 * Verbs and different URI structures *can* work. Roan has been skeptical because they are hard to handle in client libraries and could not be accessed from JS. What is in the spec has not always been implemented in reality.


 * Federico: JS in the browser *can* use HTTP verbs other than GET and POST (e.g. DELETE, PUT, HEAD and also custom ones!) as of XMLHttpRequest v1 in all the major browsers including IE http://www.mnot.net/javascript/xmlhttprequest/ (XMLHttpRequest v2 has more thorough support though http://www.w3.org/TR/XMLHttpRequest/#the-open-method)


 * RobLa notes: if you only look at the existing state of browsers, you underestimate how much HTTP munging goes on in the network. Proxies (Squid, Varnish) interfere, e.g. with byte ranges. Many corporate firewalls have antiquated notions of "security". Also PHP.


 * If people can't type it into their browser, it's hard to play with and test.


 * Federico: the main difference between the current API and a RESTful one would be the use of HTTP verbs (GET, POST, DELETE, PUT, HEAD), I'm not sure those could be blocked via strict rules on a proxy/firewall
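A common workaround for both concerns above (JS clients limited to GET and POST, and proxies or firewalls that drop unfamiliar verbs) is to tunnel the intended verb through POST in a header. This was not agreed in the meeting; the sketch below assumes the `X-HTTP-Method-Override` header convention used by several REST frameworks, and `effective_method` is a hypothetical helper name.

```python
from typing import Mapping

# Verbs the hypothetical API accepts on the wire.
ALLOWED_VERBS = {"GET", "HEAD", "POST", "PUT", "DELETE"}
# Verbs that may be tunnelled through POST; GET/HEAD stay as-is so they remain safe and cacheable.
OVERRIDABLE_VERBS = {"PUT", "DELETE"}


def effective_method(method: str, headers: Mapping[str, str]) -> str:
    """Return the verb the request should be treated as.

    Clients or networks that only allow GET and POST send POST plus an
    X-HTTP-Method-Override header naming the intended verb.
    """
    method = method.upper()
    if method not in ALLOWED_VERBS:
        raise ValueError(f"unsupported method: {method}")
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {k.lower(): v for k, v in headers.items()}
    override = lowered.get("x-http-method-override")
    if override is None:
        return method
    if method != "POST":
        # Only POST may carry an override; a GET must never turn into a DELETE.
        raise ValueError("method override is only honoured on POST")
    override = override.strip().upper()
    if override not in OVERRIDABLE_VERBS:
        raise ValueError(f"cannot override POST with {override}")
    return override


print(effective_method("POST", {"X-HTTP-Method-Override": "delete"}))  # DELETE
```

Because the tunnelled request is an ordinary POST, proxies that only understand GET and POST pass it through; the trade-off is that caches and firewalls can no longer see the real verb.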

Design thoughts
Opportunistic developers want to play with the sandbox. Systematic developers want to look at the whole API on one page, api.php. Pragmatic developers want tools that automate some bits and give them some granular control (like the libraries). Right now our docs, client libraries, Sandbox are all mediocre and should be improved.


 * Sumana will help communicate on docs/tools improvement, like pywikipediabot, client libraries, APISandbox, documentation https://www.mediawiki.org/wiki/User:Sumanah/ApiDocsImprovement, communication with API users community via https://lists.wikimedia.org/mailman/listinfo/mediawiki-api.

Federico: api.php is too long and unstructured to go through. Some time ago we developed a JS/HTML frontend for it ( http://api.wikia.com/wiki/Special:ApiExplorer ), but an SDL (service description language) tool would help. WSDL was way too complex (WSDL 2.0 is better: http://www.ibm.com/developerworks/webservices/library/ws-restwsdl/ ). Swagger ( http://swagger.wordnik.com/ ) is very interesting, not only as an SDL but also as a discovery tool. WADL ( http://wadl.java.net/ ) is also a promising alternative, and JSON Schema ( http://json-schema.org ) is another approach.
 * Federico: people now use IDEs to develop apps against APIs
 * Roan: in MW, API is mostly for AJAX (currently)
 * Owen: we developed our own AJAX entry point. For some features we just register an AJAX hook; for others we use the API


 * How can we merge these or make them similar or consistent?


 * Roan: sajax is mostly unused now and may be on its way out. App-specific APIs exist that output HTML, but they still show up in the autodocs


 * Federico: to solve this issue, Wikia has developed a system in our Nirvana stack that separates public APIs from the internal ones used for AJAX widgets


 * Maintainability problem - hundreds of little functions you don't care about


 * Roan: App-specific HTML APIs should not be listed alongside public-facing ones
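One way to keep app-specific HTML endpoints out of the public docs, in the spirit of Wikia's public/internal split and the SDL discussion, is to describe every endpoint in a machine-readable registry and generate reference docs only from the public entries. A minimal sketch; the registry format, endpoint paths and `render_docs` helper are hypothetical and far simpler than Swagger, WADL or JSON Schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Param:
    name: str
    type: str
    required: bool = False


@dataclass
class Endpoint:
    verb: str
    path: str
    summary: str
    public: bool  # False for AJAX/widget helpers that output HTML
    params: List[Param] = field(default_factory=list)


# Hypothetical registry: one public data endpoint, one internal HTML helper.
REGISTRY = [
    Endpoint("GET", "/pages/{title}", "Fetch a page as JSON", public=True,
             params=[Param("title", "string", required=True)]),
    Endpoint("GET", "/widgets/sidebar", "Render sidebar HTML for an AJAX widget",
             public=False),
]


def render_docs(registry: List[Endpoint]) -> str:
    """Produce plain-text reference docs for public endpoints only."""
    lines = []
    for ep in registry:
        if not ep.public:
            continue  # internal endpoints stay callable but are not documented
        lines.append(f"{ep.verb} {ep.path}: {ep.summary}")
        for p in ep.params:
            flag = "required" if p.required else "optional"
            lines.append(f"    {p.name} ({p.type}, {flag})")
    return "\n".join(lines)


print(render_docs(REGISTRY))
```

The same registry could also drive a sandbox or explorer UI, so docs and discovery tools never drift from what is actually exposed.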

Wikia wants to make something standard, not custom, and WMF is not wedded to how the API is right now. Fresh eyes are good.

Logistics
Who is API lead right now?


 * Right now, WMF Platform Engineering. The idea was that Sam Reed would lead it; he has been assigned to it but has not been able to focus on it (the upcoming Release Manager will help with that). Roan has strong ideas, as do Max and Timo.

Does the Wikia merge onto MW trunk have a dependency here? https://www.mediawiki.org/wiki/Wikia_code


 * AFAIK, Wikia hasn't made any changes to core to support our API; we have added new extensions that use the existing framework, plus a new entry point which is completely standalone


 * http://trac.wikia-code.com/browser/wikia/trunk/wikia.php


 * This entry point can be used to call any Controller/Method in the Wikia App framework
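The dispatch idea behind such an entry point can be sketched as a front controller that maps query parameters to a controller class and method. The real Wikia code is PHP and is not reproduced here; this Python sketch assumes `controller` and `method` query parameters (only `controller` appears in the URLs in these notes) and a hypothetical `PageController`. The registry of controllers and the ban on underscore-prefixed names keep arbitrary methods from becoming reachable.

```python
from typing import Dict, Mapping
from urllib.parse import parse_qs, urlparse


class PageController:
    """Hypothetical controller; each public method is an API action."""

    def get_details(self, params: Mapping[str, str]) -> dict:
        return {"title": params.get("title", ""), "exists": False}


# Only registered controllers are reachable through the entry point.
CONTROLLERS: Dict[str, type] = {"Page": PageController}


def dispatch(url: str) -> dict:
    """Route ...?controller=X&method=Y to X().Y(params)."""
    query = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
    controller_cls = CONTROLLERS.get(query.get("controller", ""))
    if controller_cls is None:
        return {"error": "unknown controller"}
    method_name = query.get("method", "")
    if method_name.startswith("_"):
        return {"error": "unknown method"}  # never expose private/dunder methods
    method = getattr(controller_cls(), method_name, None)
    if not callable(method):
        return {"error": "unknown method"}
    return method(query)


print(dispatch("http://example.org/wikia.php?controller=Page&method=get_details&title=Foo"))
```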

Patrick: Does the API even need to be PHP?


 * Roan, RobLa believe so.


 * Patrick: Make it HipHop-friendly from the beginning. The perf team can go into the intermediate HipHop code and improve it further. Approx 10% of Wikia's back-end requests are API calls.

Legacy support for a while... maybe keep supporting and updating the old API while building the new one.


 * New endpoint, not backwards-compatible.


 * Kind of has to be PHP to call into MW core? But from Wikia's POV, a standalone API is external: the internal implementation is up to whoever builds a compatible API.
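Running the legacy API and a new, non-backwards-compatible endpoint side by side usually comes down to routing by path prefix, so existing clients keep hitting the old code. A minimal sketch, assuming the legacy API stays at `/api.php` and the new one lives under a hypothetical `/v2/` prefix; neither the prefix nor the handler names were decided in the meeting.

```python
from typing import Callable, Dict, Tuple

Response = Tuple[int, str]
Handler = Callable[[str], Response]


def legacy_api(rest: str) -> Response:
    # Existing api.php behaviour, kept (and still updated) during the transition.
    return 200, f"legacy:{rest}"


def v2_api(rest: str) -> Response:
    # New RESTful API; free to break compatibility with api.php.
    return 200, f"v2:{rest}"


ROUTES: Dict[str, Handler] = {
    "/api.php": legacy_api,
    "/v2/": v2_api,
}


def route(path: str) -> Response:
    # Try longer prefixes first so more specific routes win.
    for prefix in sorted(ROUTES, key=len, reverse=True):
        if path.startswith(prefix):
            return ROUTES[prefix](path[len(prefix):])
    return 404, "no such endpoint"


print(route("/v2/pages/Foo"))  # (200, 'v2:pages/Foo')
```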

Should we take a consortium approach? Have *An API Designer*? 3rd-party consultant?


 * Gregg Kellogg could work on it on the Wikia side; working on Wikia's structured data?


 * WMF has learned some hard lessons over the years, and WMF will help prevent relearning those :-)


 * Some Wikia people maintain some of the bot frameworks & bots.


 * Develop RFC?


 * Involve Chris Steipp to get OAuth in from the start, & tokenization too

Wikia has an optional key/token service. It should also support banning/capping, which could be done in PHP or implemented in Varnish (VCL) or Squid.


 * Patrick: Simplified version of OAuth is fine, but it's a standard, let's use it


 * Federico: Depending on URL scheme, blocking specific methods on specific objects may not be possible, should keep that in mind. OAuth doesn't understand HTTP verbs


 * SDL could describe which keys cover which methods, Varnish could interpret that and perform checks


 * RobLa: OAuth is on the roadmap for January or something, to do before end of June 2013


 * Federico: Zend has an upcoming OAuth built-in implementation for PHP in SVN https://secure.php.net/manual/en/intro.oauth.php
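The idea that an SDL could describe which keys cover which methods works only if verb and object are both visible in the request, which is Federico's point about URL schemes. A minimal Python sketch of the check an edge layer might perform before a request reaches PHP (in production it might live in Varnish VCL instead); the key names, grant format and glob-style path patterns are all hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Tuple

Grant = Tuple[str, str]  # (HTTP verb, URL path glob)

# Hypothetical key -> grants table, as an SDL-derived config might express it.
KEY_GRANTS: Dict[str, List[Grant]] = {
    "reader-key": [("GET", "/pages/*"), ("HEAD", "/pages/*")],
    "bot-key": [("GET", "/pages/*"), ("PUT", "/pages/*"), ("DELETE", "/pages/*")],
}
BANNED_KEYS = {"spammer-key"}


def is_allowed(key: str, verb: str, path: str) -> bool:
    """Return True if `key` may perform `verb` on `path`."""
    if key in BANNED_KEYS:
        return False
    for grant_verb, pattern in KEY_GRANTS.get(key, []):
        if grant_verb == verb.upper() and fnmatchcase(path, pattern):
            return True
    return False  # unknown keys and ungranted (verb, path) pairs are denied


print(is_allowed("reader-key", "DELETE", "/pages/Foo"))  # False
```

With a single api.php URL and the action hidden in POST parameters, an edge cache could not make the same decision without parsing the request body.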

Which bots and frameworks are around in the Wikia & WMF ecosystem?


 * VSTF (spam taskforce) uses bots to revert spam, probably pywikipediabot with tweaks for Wikia-specific APIs (e.g. mass deletes)


 * VSTF built a javascript based anti-spam tool called WHAM that uses the api: http://vstf.wikia.com/wiki/User:Joeyaa/wham.js


 * LyricWiki is heavily bot-maintained

Wikimedia API use & communication

 * Essential housekeeping done by bots on enwiki etc.


 * Talk to Guillaume about communicating out to bot people. Bot community has some functional communication channels but not great.


 * Patrick: Do we have a central bot registration?
 * Sumana: No, but there is Bot Approval Group on en.wp - https://en.wikipedia.org/wiki/Wikipedia:BAG
 * Bot rights: higher limits on edit rate, some limits lifted entirely.
 * We can look at bot rights, and reach out
 * Other API consumers exist as well. Could reach out via the mediawiki-announce mailing list - https://lists.wikimedia.org/mailman/listinfo/mediawiki-announce
 * Federico: If pywikipediabot is ported well, little would change for library users
 * Max: Gadgets use API too, keys are useless for them because code is on public wiki pages. Ways around it: tie keys to domains, make them anonymous, etc.
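Bot rights amount to higher rate limits for some clients, and the same mechanism would back the per-key quotas and throttling mentioned at the start of the meeting. A minimal token-bucket sketch; the group names and rates are hypothetical (not MediaWiki's actual rate-limit configuration), token bucket is just one common algorithm choice, and the client identifier could be an API key, an account, or (for gadgets, whose keys are public) an origin domain.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical per-group limits: (bucket capacity, refill rate in requests/second).
GROUP_LIMITS: Dict[str, Tuple[float, float]] = {
    "anon": (5.0, 0.5),
    "user": (20.0, 2.0),
    "bot": (100.0, 10.0),
}


class RateLimiter:
    """Token bucket per client identifier."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock
        # client -> (tokens left, time of last update)
        self.buckets: Dict[str, Tuple[float, float]] = {}

    def allow(self, client: str, group: str) -> bool:
        capacity, rate = GROUP_LIMITS[group]
        now = self.clock()
        tokens, last = self.buckets.get(client, (capacity, now))
        # Refill in proportion to elapsed time, never beyond capacity.
        tokens = min(capacity, tokens + (now - last) * rate)
        if tokens < 1:
            self.buckets[client] = (tokens, now)
            return False
        self.buckets[client] = (tokens - 1, now)
        return True
```

Injecting the clock keeps the limiter deterministic in tests; in production the bucket state would need to live somewhere shared (e.g. a cache) rather than in one process.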

Existing API communication channels:


 * Users spread among userscript / gadgets, applications we don't see, bots (some visibility), tools, glorified screenscraping, ...


 * No vibrant API community currently; the API is a means to an end


 * A couple of different audience-specific mailing lists


 * API mailing list https://lists.wikimedia.org/mailman/listinfo/mediawiki-api


 * Toolserver list https://lists.wikimedia.org/mailman/listinfo/toolserver-l


 * Bots list https://lists.wikimedia.org/mailman/listinfo/wikibots-l


 * Some knowledge of who writes Gadgets / user scripts


 * IRC channels (not great for announcements) https://meta.wikimedia.org/wiki/IRC/Channels


 * Sumana: Should use existing channels as much as possible


 * Wikia has an api wiki for documentation which is specific to our own extensions: http://api.wikia.com/wiki/Wikia_API_Wiki

Next steps
How to talk initially about the kickoff:


 * Patrick: Blog post, then reference that?


 * Could work with call to action and appropriately-set expectations


 * RobLa: Gotta avoid death by consensus, don't make everyone feel like they have a veto. Making too big of a deal of this could lead there


 * Owen: Publish prototype and get support because it's measurably better.


 * RobLa: OK, but do collaboration on the mailing list augmented with occasional get-togethers like this

Example of a collaborative Wikia/WMF workflow: VisualEditor. Talk to James Forrester.


 * Right now, WMF-run, mostly.

Timeline for new prototype?


 * We're really early. Let's get wishlists first.


 * Mark: Put out RFC?


 * Patrick: Use Zend as a model?

Next steps:


 * Wikia to link to existing API wiki, put RFC out on the mediawiki-api mailing list


 * RobLa to bring Chris Steipp up to speed


 * Another meeting, maybe a month from now? After the RFC? Realistic frequency: the every-few-months Wikia/WMF eng meetup

Possible goals and research points

 * get 3rd party developers attention


 * standardization (RESTful, SDL, data schema)


 * service discoverability via an SDL language and tool (e.g. http://swagger.wordnik.com/ ) - MW has APISandbox https://www.mediawiki.org/wiki/Extension:ApiSandbox & Wikia has a similar prototype tool (http://www.wikia.com/wikia.php?controller=DiscoverApi )


 * easier to document


 * versioning (don't break clients)


 * CORS support (which is now in MW 1.20 as of a few months ago)


 * Cacheability of read operations (Varnish/Squid) for added performance


 * consistent data schema per method (don't change data fields via parameters) and format (JSON) [Roan and MaxSem are in favor of making any API 2.0 JSON-only]


 * simplify URL schema


 * simplify internal usage (faux request)


 * improve testability (mock data sources)


 * usage stats, quotas/thresholds, block access read/write, authentication (keys/OAUTH)
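Three of these goals (a consistent JSON schema per method, simpler internal usage via faux requests, and testability via mock data sources) fit together if handlers depend on an injected data source and always return the same fields. The same handler can then be called over HTTP, in-process, or from a unit test. A minimal sketch; MediaWiki's own faux requests are PHP, and the `PageSource`, `FakeSource`, `page_info` and `faux_request` names here are hypothetical illustrations, not its API.

```python
import json
from typing import Callable, Dict, Mapping, Optional, Protocol


class PageSource(Protocol):
    """What a handler needs from storage; a real source would query the wiki DB."""

    def get_text(self, title: str) -> Optional[str]: ...


class FakeSource:
    """Mock data source for tests: pages live in a plain dict."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self.pages = pages

    def get_text(self, title: str) -> Optional[str]:
        return self.pages.get(title)


Handler = Callable[[Mapping[str, str], PageSource], str]


def page_info(params: Mapping[str, str], source: PageSource) -> str:
    """Fixed JSON schema: the same fields every time, whatever the parameters."""
    title = params.get("title", "")
    text = source.get_text(title)
    return json.dumps({
        "title": title,
        "exists": text is not None,
        "length": len(text) if text is not None else 0,
    }, sort_keys=True)


def faux_request(handler: Handler, params: Mapping[str, str], source: PageSource) -> dict:
    """Call a handler in-process, as internal code would, and decode its JSON."""
    return json.loads(handler(params, source))


print(faux_request(page_info, {"title": "Foo"}, FakeSource({"Foo": "hello"})))
```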