Wikimedia Developer Summit/2016/ApiUsability

From mediawiki.org

Copied from https://etherpad.wikimedia.org/p/WikiDev16-ApiUsability

  • Session name: MediaWiki Action API design discussion: the amazing/good/bad/ugly
  • Meeting goal: Anomie has been working on the MediaWiki API; let's gather ideas
  • Meeting style: Problem-solving (problem discovery?): surveying many possible solutions
  • Phabricator task link: https://phabricator.wikimedia.org/T122818

Topics for discussion:

  • Use cases
    • Bots/tools/gadgets
      • historical primary use-case
      • need to query content & perform actions
    • action API geared towards information about lots of pages
    • Google: wants clean Wikipedia data. They've written a wikitext parser (parses to structured data) and access templates through the API, but template contents still differ from what's visible on the HTML page: what the user sees is different from the template. Trying to clean up templates to unify implementations. Similar to Wikidata's goal: human- and machine-readable data.
      • If you access infoboxes by template vs. HTML, even the number of infoboxes on the page is different.
    • Broader issue: language agnosticism. Action API for specific installation; RESTBase is a "Cassandra-backed persistent cache layer", with modules.
  • Pain points
    • What is the best way to query infobox information? ...can there be better ways?
      • one problem with infoboxes is that they are written by different people, with different inputs and outputs; Wikidata is one answer to standardise that
      • See also content format discussions https://phabricator.wikimedia.org/T119022
    • Discoverability of existing features
      • for example it is hard to understand what each API module will give back
      • Cirrus is another example; people might not be interested in that
      • automatically generated documentation: https://en.wikipedia.org/w/api.php
      • human-(un)maintained documentation https://www.mediawiki.org/wiki/API:Main_page
      • API sandbox https://en.wikipedia.org/wiki/Special:ApiSandbox
        • currently undergoing a rewrite by anomie
      • modules are hard to categorise and relate to each other (e.g. "if you are doing x on page see also module y")
    • Ctrl-F stopped working with the API redesign
      • all help in a single page https://en.wikipedia.org/w/api.php?action=help&recursivesubmodules=1 (!!!!)
    • The way the XML dumps, the database and the API represent deleted fields is different and poorly documented.
      • Related https://phabricator.wikimedia.org/T114019
      • Inconsistencies between API access and dumps (e.g. bitfields)
    • A lot of the "actions" aren't actually actions. action=query and action=edit make sense; action=flow doesn't help me flow something. "action" has become a top-level categorization.
      • YES.
      • Following on from the point about best practices when writing API modules, this is an important part of the code review process (as well as clear documentation)
      • "action" really means which module to ask
    • Too many ways of doing similar but not identical tasks (e.g. fetching current page text)
      • part of the problem is fragmentation, often the solution is to ask somebody who has come across the same problem
    • Versioning: let's talk about it. Versioning modules. Brad: where possible, add a new parameter instead of versioning. Issues: complexity creep, how to balance?
      • Versioning could help substantially with addressing the inconsistencies between data (API/XML/Database/etc).  Without versioning, we can't refactor without breaking things.
  • Design features
    • Querying revisions independent of page/user (SELECT * FROM revision WHERE rev_timestamp BETWEEN "2014" and "2015")
    • More caching:
      • Can caching work for sub-modules of the action API?
        • possible, but needs someone willing to work on it. anomie happy to review.
      • RESTBase, being single-page-oriented, is easier to cache/purge; the action API not so much, since it operates on many pages
    • Mobile views API module should work on more than one article at a time. (depends on the MobileFrontend extension)
    • Can we query the API via PHP in MediaWiki?  Most queries/actions internally access the databases directly.
      • Not at the moment; going back and changing that is a huge amount of work to properly separate things
      • Would the team be interested in someone working on this with them?  Yes!  "I'd like to review that code." --anomie
      • Can standardize how we access data because there are some nuances in normalization/etc.
      • Standardization on this can provide common language
    • Unified way of accessing page properties
    • [discoverability] Grouping of actions--what goes together? E.g. Cirrus-related could go together so only people who care about it notice it
      • possible GCI/hackathon project; make a place for information to go, maybe on mw.org
      • Grouping of actions would deal with the action=flow issue (mentioned above), where that action is essentially a group of everything Flow
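
Several of the points above (more than one way to fetch current page text, querying revisions independent of page/user) can be made concrete as request URLs. A minimal sketch, assuming the English Wikipedia endpoint already linked in these notes. The helper name `build_api_url` is hypothetical, and the parameters shown (`rvslots`, `list=allrevisions` with its `arv*` options) follow the action API as currently documented, which is not necessarily what existed at the time of this session. It only builds URLs and makes no network requests.

```python
from urllib.parse import urlencode

# Endpoint taken from the links in these notes.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_api_url(params, endpoint=API_ENDPOINT):
    """Return a GET URL for the action API (hypothetical helper)."""
    query = {"format": "json", "formatversion": "2"}
    query.update(params)
    return endpoint + "?" + urlencode(query)


# Two similar-but-not-identical ways to fetch current page text.
via_query = build_api_url({
    "action": "query",
    "prop": "revisions",
    "rvprop": "content",
    "rvslots": "main",
    "titles": "Main Page",
})
via_parse = build_api_url({
    "action": "parse",
    "prop": "wikitext",
    "page": "Main Page",
})

# Revisions in a time window, independent of page or user
# (assumes list=allrevisions is available on the target wiki).
revisions_2014 = build_api_url({
    "action": "query",
    "list": "allrevisions",
    "arvstart": "2014-01-01T00:00:00Z",
    "arvend": "2015-01-01T00:00:00Z",
    "arvdir": "newer",
    "arvlimit": "50",
})

for url in (via_query, via_parse, revisions_2014):
    print(url)
```

The two page-text requests return differently shaped responses for the same wikitext, which is the fragmentation complaint in a nutshell.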

General notes

  • Is there a long-term plan for the action API?  (Currently work is done ad-hoc)
  • bd808's notion of pioneer/settler/town planner roles for code (http://blog.gardeviance.org/2015/03/on-pioneers-settlers-town-planners-and.html among others)
  • Is the purpose to avoid dealing with wikitext? No, not really--you can get HTML out of it, but also handle wikitext.
  • API in layers--wikitext, template, other information to allow user parsing?
  • Quarry (web interface for db queries) records queries, which can be a useful learning tool for newcomers. Replicate the same for the API sandbox?
    • on the same theme, see also JupyterHub on Labs to control Pywikibot
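
One lightweight way to share queries with newcomers, in the spirit of Quarry's recorded queries, is a link that reopens a request in the API sandbox. A minimal sketch, assuming Special:ApiSandbox reads its request parameters from the URL fragment. The helper name `sandbox_link` is hypothetical, and nothing is recorded or sent anywhere.

```python
from urllib.parse import urlencode

# Sandbox page linked earlier in these notes.
SANDBOX_URL = "https://en.wikipedia.org/wiki/Special:ApiSandbox"


def sandbox_link(params, sandbox_url=SANDBOX_URL):
    """Build a link that pre-fills the API sandbox (hypothetical helper).

    Assumption: the sandbox takes its parameters from the URL fragment.
    """
    return sandbox_url + "#" + urlencode(params)


link = sandbox_link({
    "action": "query",
    "meta": "siteinfo",
    "siprop": "general",
})
print(link)
```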

Action items with owners:

  • Fhocutt: suggest API use-case categorization for hackathon
  • !Brad: ask Brad/anomie to review code for API modules, and set aside time to deal with resulting comments. Add anomie as a reviewer on an API-related patch, and if he's not looking at it ping him via email/IRC.
  • vague, no one is assigned to it: fix up API documentation. Make a list of pages that need fixing?

Conversations to have:

Attendees:

  • Aaron Halfaker
  • Filippo Giunchedi
  • Darian Fitzpatrick
  • Niklas Laxström
  • Jordan Adler (Google)
  • Bryan Davis
  • Zhicheng Zheng (Google)
  • Yanan Qian (Google)
  • Stas Malyshev
  • Frances Hocutt
  • Sam Smith
  • Joaquin Hernandez
