Parsoid/Interfacing with MW API

From mediawiki.org

Ideas for the support of parser functions, magic variables, tag extensions. Imported from etherpad:ParserNotesExtensions.

Some old notes: https://www.mediawiki.org/wiki/Wikitext_parser/Environment

Fallback for parser functions and tag hooks not implemented on the JS side:

  • send it through to the MediaWiki API for parsing (e.g. http://muppet.wikia.com/api.php?action=parse&text=123)
    • tag hooks: return HTML, can probably use existing parse API for that for now
    • parser functions
      • ideally we want to get wikitext back
      • falling back to returned HTML would work in some, but not all, cases
        • can flatten HTML, so this probably is workable enough
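
The fallback above can be sketched roughly as follows. This is a minimal illustration, not the Parsoid implementation: `build_parse_url` and `flatten_html` are hypothetical helper names, `http://example.org/api.php` is a placeholder endpoint, and `format=json` is added on the assumption that a machine-readable response is wanted. Only `action=parse` and `text` come from the example URL in the notes.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


def build_parse_url(api_url, wikitext):
    """Build an action=parse request URL for a wikitext snippet.

    api_url is a placeholder for the wiki's api.php endpoint.
    """
    params = {"action": "parse", "text": wikitext, "format": "json"}
    return api_url + "?" + urlencode(params)


class _TextOnly(HTMLParser):
    """Collects text nodes and drops all markup."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def flatten_html(html):
    """Flatten returned HTML to plain text (lossy: markup is discarded)."""
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    print(build_parse_url("http://example.org/api.php", "123"))
    print(flatten_html("<p>3 <b>apples</b> &amp; pears</p>"))
```

Flattening this way only suits parser functions whose output is effectively text; anything that relies on the returned markup would need the HTML kept as-is.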

Things that would need to be added to the API:

  • need to get a list of registered tag hooks and function hooks!
  • get character mappings for language variants?
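
As a rough sketch of how such a hook list might be consumed: the code below assumes the list arrives in the shape that a siteinfo query with `siprop=extensiontags|functionhooks` returns in the MediaWiki Action API (tag names wrapped in angle brackets, function names bare). The sample data, the `JS_NATIVE` set, and both function names are hypothetical, chosen only for illustration.

```python
# Hypothetical set of hooks the JS side can already render on its own.
JS_NATIVE = {"if", "switch", "nowiki"}


def registered_hooks(siteinfo):
    """Return (tag_hooks, function_hooks) as lowercase name sets.

    Assumes a response shaped like {"query": {"extensiontags": [...],
    "functionhooks": [...]}}; missing keys yield empty sets.
    """
    query = siteinfo.get("query", {})
    tags = {t.strip("<>").lower() for t in query.get("extensiontags", [])}
    functions = {f.lower() for f in query.get("functionhooks", [])}
    return tags, functions


def needs_php_fallback(name, registered, js_native=JS_NATIVE):
    """True if the hook is registered on the wiki but not renderable in JS."""
    name = name.lower()
    return name in registered and name not in js_native


if __name__ == "__main__":
    # Illustrative data only, not output captured from a real wiki.
    sample = {"query": {"extensiontags": ["<ref>", "<nowiki>"],
                        "functionhooks": ["if", "expr"]}}
    tags, functions = registered_hooks(sample)
    print(sorted(tags), sorted(functions))
    print(needs_php_fallback("ref", tags))
```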

Things likely to be problematic:

  • link cache manipulation
    • we _may_ be able to get this from the parse API already; need to check
  • adding styles/JS modules
    • may be able to get this out of the parser output info (ParserOutput?), or from OutputPage/$wgOut
  • things like ref that store data and retrieve it -- would have to parse them all together or something?
    • could try: place all the items together in the document with <div> wrappers, then extract them back out after parse
      • also saves time to make a single request

Tag hook and parser function rendering

So a potential approach:

  1. anything that can be rendered in pure JS, do it
  2. use a list of available parser & tag hooks to see what can be rendered as fallback through PHP
  3. if we have one or more items to render:
    1. bundle each one up in a uniquely-identified <div id="...">
    2. render the lot through API...
    3. extract the divs back from the returned HTML
    4. extract ResourceLoader modules, added links from the metadata
    5. if divs turn out to be problematic, could also use hash text as separators: [1]
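
Steps 3.1 and 3.3 above can be sketched as follows, leaving out the API request itself. This is an illustration under stated assumptions rather than the planned implementation: the `parsoid-frag-` id prefix and the names `bundle` and `extract_fragments` are hypothetical, and the extractor assumes the returned HTML keeps the wrapper divs intact and well nested.

```python
from html.parser import HTMLParser

PREFIX = "parsoid-frag-"  # hypothetical id prefix for wrapper divs


def bundle(fragments):
    """Wrap each wikitext fragment in a uniquely identified <div>."""
    return "\n".join(
        '<div id="{}{}">\n{}\n</div>'.format(PREFIX, i, text)
        for i, text in enumerate(fragments)
    )


class _Extractor(HTMLParser):
    """Captures the inner HTML of each wrapper div, keyed by its index."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.results = {}
        self.current = None  # index of the wrapper being captured
        self.depth = 0       # nested <div> depth inside that wrapper
        self.buf = []

    def handle_starttag(self, tag, attrs):
        if self.current is None:
            suffix = (dict(attrs).get("id") or "")[len(PREFIX):]
            div_id = dict(attrs).get("id") or ""
            if tag == "div" and div_id.startswith(PREFIX) and suffix.isdigit():
                self.current, self.depth, self.buf = int(suffix), 0, []
            return
        if tag == "div":
            self.depth += 1
        self.buf.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if self.current is not None:
            self.buf.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.current is None:
            return
        if tag == "div":
            if self.depth == 0:  # closing tag of the wrapper itself
                self.results[self.current] = "".join(self.buf).strip()
                self.current = None
                return
            self.depth -= 1
        self.buf.append("</{}>".format(tag))

    def handle_data(self, data):
        if self.current is not None:
            self.buf.append(data)

    def handle_entityref(self, name):
        if self.current is not None:
            self.buf.append("&{};".format(name))

    def handle_charref(self, name):
        if self.current is not None:
            self.buf.append("&#{};".format(name))


def extract_fragments(html):
    """Return {index: inner_html} for every wrapper div found in html."""
    parser = _Extractor()
    parser.feed(html)
    parser.close()
    return parser.results
```

The sketch drops HTML comments and does not cover the case where the PHP parser rewrites or splits the wrappers, which is the situation step 3.5 hedges against.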

Saving

Note: planning to skip over unchanged bits and resplice when saving; this will let us ensure that source code doesn't get normalized unexpectedly (dirty diffs done dirt cheap). This should work pretty well since we have source data in the tokens.
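
A minimal sketch of the resplice idea, assuming each changed piece is known by its character offsets in the original source (the `(start, end, replacement)` tuple format and the `resplice` name are hypothetical; the real tokens would carry this source data in their own form):

```python
def resplice(source, edits):
    """Apply edits to source, copying everything else through verbatim.

    edits is an iterable of (start, end, replacement) tuples with
    non-overlapping character offsets into source.
    """
    out, pos = [], 0
    for start, end, replacement in sorted(edits):
        if start < pos:
            raise ValueError("overlapping edits")
        out.append(source[pos:start])  # unchanged source, copied as-is
        out.append(replacement)        # newly serialized text
        pos = end
    out.append(source[pos:])           # unchanged tail
    return "".join(out)


if __name__ == "__main__":
    src = "'''Bold'''  text and [[Link|label]]\n"
    i = src.index("text")
    print(resplice(src, [(i, i + 4, "words")]), end="")
```

Because untouched spans are sliced straight out of the original string, quirks such as the double space in the example survive the round trip instead of being normalized.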

Template splitting

Token transformation mostly working for templates... unsure so far what might not work (needs more testing, especially on dumps)

Get stats on function usage!

  1. First pass: pull counts from XML data dump
  2. Second pass: compare with template usage, multiply

  • usage counts are in database
    • ??: view counts for pages
  • multiply by view counts?
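
The two passes could look roughly like this. It is a sketch under assumptions: it only recognizes `#`-prefixed parser functions (`{{#if:...}}` and similar), it takes template sources and usage counts as plain dictionaries rather than reading a dump or the database, and the function names are hypothetical.

```python
import re
from collections import Counter

# Matches the opening of a '#'-prefixed parser function call, e.g. "{{#if:".
FUNC_RE = re.compile(r"\{\{\s*#(\w+)\s*:")


def count_functions(wikitext):
    """First pass: count parser function calls in one page's wikitext."""
    return Counter(name.lower() for name in FUNC_RE.findall(wikitext))


def weighted_usage(template_sources, usage_counts):
    """Second pass: multiply per-template counts by how often each is used.

    template_sources maps template title -> wikitext source;
    usage_counts maps template title -> number of uses.
    """
    total = Counter()
    for title, source in template_sources.items():
        uses = usage_counts.get(title, 0)
        for name, n in count_functions(source).items():
            total[name] += n * uses
    return total


if __name__ == "__main__":
    # Made-up sample data for illustration.
    templates = {"Infobox": "{{#if:{{{1|}}}|{{#expr:1+1}}}} {{#if:x|y}}",
                 "Stub": "{{#switch:{{{1}}}|a=b}}"}
    print(weighted_usage(templates, {"Infobox": 100, "Stub": 3}))
```

Multiplying by page view counts would be one more weighting step of the same kind, applied per page instead of per template.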

Note: lua or JS templates

Rendering them server-side should be fairly straightforward, similar to the extension rendering above.