Wikimedia Developer Summit/2016/ApiUsability

Copied from https://etherpad.wikimedia.org/p/WikiDev16-ApiUsability
Topics for discussion:
 * Session name: MediaWiki Action API design discussion: the amazing/good/bad/ugly
 * Meeting goal: Anomie has been working on the MediaWiki API; let's gather ideas
 * Meeting style: Problem-solving (problem discovery?): surveying many possible solutions
 * Phabricator task link: https://phabricator.wikimedia.org/T122818
 * Use cases
 * Bots/tools/gadgets
 * historical primary use-case
 * need to query content & perform actions
 * action API geared towards information about lots of pages
 * Google: wants clean Wikipedia data. They've written a wikitext parser (wikitext to structured data) and access templates through the API, but template contents still differ from what is visible on the HTML page: what the user sees is different from the template. They are trying to clean up templates to unify implementations. Similar to Wikidata's goal: human- and machine-readable data.
 * If you access an infobox by template vs. HTML, even the number of infoboxes on the page is different.
 * Broader issue: language agnosticism. The Action API is tied to a specific installation; RESTBase is a "Cassandra-backed persistent cache layer", with modules.
 * Pain points
 * What is the best way to query infobox information? ...can there be better ways?
 * One problem with infoboxes is that they are written by different people, with different inputs and outputs; Wikidata is one answer to standardise that
 * See also content format discussions: https://phabricator.wikimedia.org/T119022
 * Discoverability of existing features
 * for example it is hard to understand what each API module will give back
 * Cirrus is another example; many people might not be interested in it
 * automatically generated documentation: https://en.wikipedia.org/w/api.php
 * human-(un)maintained documentation https://www.mediawiki.org/wiki/API:Main_page
 * API sandbox https://en.wikipedia.org/wiki/Special:ApiSandbox
 * currently undergoing a rewrite by anomie
 * modules are hard to categorise and relate to each other (e.g. "if you are doing x on page see also module y")
 * Ctrl-F stopped working with the API redesign
 * all help in a single page  https://en.wikipedia.org/w/api.php?action=help&recursivesubmodules=1  (!!!!)
 * The way the XML dumps, the database and the API represent deleted fields is different and poorly documented.
 * Related  https://phabricator.wikimedia.org/T114019
 * Inconsistencies between API access and dumps (e.g. bitfields)
 * A lot of the "actions" aren't actually actions. action=query and action=edit make sense; action=flow doesn't help me flow something. "action" has become a top-level categorization.
 * YES.
 * Following on from the point about best practices when writing API modules, this is an important part of the code review process (as well as clear documentation)
 * "action" really just selects which module to ask
 * Too many ways of doing similar but not identical tasks (e.g. fetching current page text)
 * part of the problem is fragmentation, often the solution is to ask somebody who has come across the same problem
 * Versioning: let's talk about it. Versioning modules. Brad: where possible, add a new parameter instead of versioning. Issues: complexity creep, how to balance?
 * Versioning could help substantially with addressing the inconsistencies between data (API/XML/Database/etc).  Without versioning, we can't refactor without breaking things.
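To make the "too many ways of doing similar but not identical tasks" point concrete, here is a minimal sketch of two Action API requests that both fetch the current text of a page. It only builds the URLs and makes no network call. The helper name `action_api_url` and the page title "Main Page" are illustrative assumptions, not anything from the session; the parameters (`prop=revisions` with `rvprop=content`, and `action=parse` with `prop=wikitext`) should be checked against the generated api.php help before relying on them.

```python
from urllib.parse import urlencode

# Endpoint quoted in the notes above.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def action_api_url(action, **params):
    """Build an Action API URL (hypothetical helper).

    `action` is not really a verb here: it selects which top-level
    module handles the request, as noted in the discussion.
    """
    query = {"action": action, "format": "json", **params}
    return API_ENDPOINT + "?" + urlencode(query)


# Way 1: the query module, asking for the content of the latest revision.
via_query = action_api_url(
    "query", prop="revisions", rvprop="content", titles="Main Page"
)

# Way 2: the parse module, asking for the wikitext of the same page.
via_parse = action_api_url("parse", page="Main Page", prop="wikitext")

print(via_query)
print(via_parse)
```

The two responses are shaped differently even though they carry the same text, which is one reason newcomers end up asking someone who has hit the same problem.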

General notes
Action items with owners:
Conversations to have:
Attendees:
DON’T FORGET: When the meeting is over, copy any relevant notes (especially areas of agreement or disagreement, useful proposals, and action items) into the Phabricator task.
 * Design features
 * Querying revisions independent of page/user (SELECT * FROM revision WHERE rev_timestamp BETWEEN "2014" and "2015")
 * check out the allrevisions module (https://www.mediawiki.org/wiki/API:Allrevisions)
 * example of discoverability issues
 * Useful: provide a link to the example queries in API Sandbox (in api.php module docs)
 * More caching:
 * Can caching work for sub-modules of the action API?
 * possible, but needs someone willing to work on it. anomie happy to review.
 * RESTBase, being single-page-oriented, is easier to cache/purge; the Action API not so much, since it operates on many pages
 * Mobile views API module should work on more than one article at a time. (depends on the MobileFrontend extension)
 * Can we query the API via PHP in MediaWiki? Most queries/actions internally access the databases directly.
 * Not at the moment; going back and changing that is a huge amount of work to properly separate things
 * Would the team be interested in someone working on this with them?  Yes!  "I'd like to review that code." --anomie
 * Can standardize how we access data because there are some nuances in normalization/etc.
 * Standardization on this can provide common language
 * Unified way of accessing page properties
 * [discoverability] Grouping of actions--what goes together? E.g. Cirrus-related could go together so only people who care about it notice it
 * possible GCI/hackathon project; make a place for information to go, maybe on mw.org
 * Grouping of actions would deal with the action=flow issue (mentioned above), where that action is essentially a group of everything Flow
 * Is there a long-term plan for the action API?  (Currently work is done ad-hoc)
 * https://www.mediawiki.org/wiki/Requests_for_comment/API_roadmap
 * https://www.mediawiki.org/wiki/API/Architecture_work/Planning
 * bd808's notion of pioneer/settler/town planner roles for code ( http://blog.gardeviance.org/2015/03/on-pioneers-settlers-town-planners-and.html among others)
 * Is the purpose to avoid dealing with wikitext? No, not really--you can get HTML out of it, but also handle wikitext.
 * API in layers--wikitext, template, other information to allow user parsing?
 * Quarry (web interface for DB queries) records queries, which can be a useful learning tool for newcomers. Replicate the same for API Sandbox?
 * on the same theme, see also JupyterHub on Labs to control pywikibot
 * Fhocutt: suggest API use-case categorization for hackathon
 * !Brad: ask Brad/anomie to review code for API modules, and set aside time to deal with resulting comments. Add anomie as a reviewer on an API-related patch, and if he's not looking at it ping him via email/IRC.
 * vague, no one is assigned to it: fix up API documentation. Make a list of pages that need fixing?
 * Aaron Halfaker
 * Filippo Giunchedi
 * Darian Fitzpatrick
 * Niklas Laxström
 * Jordan Adler (Google)
 * Bryan Davis
 * Zhicheng Zheng (Google)
 * Yanan Qian (Google)
 * Stas Malyshev
 * Frances Hocutt
 * Sam Smith
 * Joaquin Hernandez

See https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit_2016/Session_checklist for more details.
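
For the "querying revisions independent of page/user" item under Design features, the allrevisions module can be sketched as follows. This only builds the request URL and makes no network call. The function name `allrevisions_url` is hypothetical, the `arv*` parameter names are written from memory of the module documentation linked above and should be verified there, and reading "2014" and "2015" as the start of each year is an assumption about what the SQL example meant.

```python
from urllib.parse import urlencode


def allrevisions_url(start, end, limit=50,
                     endpoint="https://en.wikipedia.org/w/api.php"):
    """Build a list=allrevisions request for revisions between two timestamps.

    Hypothetical helper; parameter names are assumed from the
    API:Allrevisions documentation and should be double-checked.
    """
    params = {
        "action": "query",
        "list": "allrevisions",
        "arvstart": start,       # with arvdir=newer, start is the earlier bound
        "arvend": end,
        "arvdir": "newer",       # enumerate oldest first
        "arvlimit": limit,
        "arvprop": "ids|timestamp|user",
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


# Rough analogue of: rev_timestamp BETWEEN "2014" AND "2015"
url = allrevisions_url("2014-01-01T00:00:00Z", "2015-01-01T00:00:00Z")
print(url)
```

Results come back in batches, so a real client would also need to follow the continuation values returned by the API; that is left out of this sketch.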