Parsoid/Todo:PHP parser integration

From mediawiki.org

Extension expansion

Most extensions depend neither on call order nor on frame state, so they can be expanded in parallel and out of order. The extensions listed below are the exceptions among the 455 extensions in the Wikimedia extensions.git repository.
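The scheduling consequence can be sketched as follows. This is a minimal illustration, not Parsoid code: the ExtensionCall model, the expand callback and the ORDER_DEPENDENT set are hypothetical, and the set is only an illustrative subset of the order-dependent extensions named on this page. Order-independent calls are expanded concurrently; order-dependent ones are expanded one at a time in document order. Frame-state dependence is a separate problem that this sketch does not address.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExtensionCall:
    """One extension tag or parser function call on a page (hypothetical model)."""
    position: int  # index of the call in document order
    name: str      # extension name, e.g. "Poem"
    source: str    # raw wikitext of the call


# Illustrative subset of the order-dependent extensions named on this page.
ORDER_DEPENDENT = {"Arrays", "HashTables", "Loops", "UserFunctions"}


def expand_all(calls: list[ExtensionCall],
               expand: Callable[[ExtensionCall], str],
               max_workers: int = 8) -> dict[int, str]:
    """Return expanded output keyed by document position."""
    independent = [c for c in calls if c.name not in ORDER_DEPENDENT]
    dependent = sorted((c for c in calls if c.name in ORDER_DEPENDENT),
                       key=lambda c: c.position)
    results: dict[int, str] = {}
    # Order-independent calls: any order, run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for call, output in zip(independent, pool.map(expand, independent)):
            results[call.position] = output
    # Order-dependent calls: one at a time in page order, so state kept inside
    # these extensions sees the same sequence as in a sequential PHP parse.
    for call in dependent:
        results[call.position] = expand(call)
    return results
```

Keying results by position lets the caller splice outputs back into the page regardless of the order in which they finished.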

Extension tags depending on frame state

The following extensions define extension tags (which are not run by the PHP preprocessor) that depend on the frame state (found with grep -r 'frame->expand' extensions and grep -r 'frame->getArguments' extensions):

  • Arrays (frame->expand, shared state so order-dependent)
  • Carp (debugging extension, low-level frame access)
  • ExtTab / ET_ParserFunction (frame->expand)
  • FacebookOpenGraph (parser->replaceVariables, parser->recursiveTagParse)
  • HTMLTags (parser->replaceVariables)
  • HashTables (frame->expand, frame->getArguments, order-dependent)
  • LabeledSectionTransclusion (frame->expand)
  • Loops (frame->expand, frame->getArgument, order/nesting-dependent)
  • Poem (parser->recursiveTagParse)
  • RSS (parser->recursiveTagParse on an optional per-RSS-item wikitext-based template)
  • SelectTag (parser->recursiveTagParse)
  • SoundManager2Button (parser->recursiveTagParse)
  • Spark (parser->replaceVariables)
  • Validator (parser->recursiveTagParse)
  • WikitextLoggedInOut (parser->recursiveTagParse)
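The grep audit used for this list can be repeated and grouped per extension with a short script. This is a sketch under two assumptions: each extension lives in its own top-level subdirectory of the checkout, and only .php files are scanned (the grep commands above search every file). It reports which of the two frame patterns each extension's code contains. Like the literal grep, it will not catch calls such as parser->recursiveTagParse, which were identified separately.

```python
import os
from collections import defaultdict

# Literal patterns from the grep commands at the top of this section.
PATTERNS = ("frame->expand", "frame->getArguments")


def find_frame_users(extensions_dir: str, patterns=PATTERNS) -> dict[str, set[str]]:
    """Map extension name -> set of patterns found in its .php files."""
    hits: dict[str, set[str]] = defaultdict(set)
    for root, _dirs, files in os.walk(extensions_dir):
        rel = os.path.relpath(root, extensions_dir)
        if rel == ".":
            continue  # skip loose files directly in the top-level directory
        extension = rel.split(os.sep)[0]  # assumed: one subdirectory per extension
        for name in files:
            if not name.endswith(".php"):
                continue
            with open(os.path.join(root, name), encoding="utf-8", errors="replace") as fh:
                text = fh.read()
            for pattern in patterns:
                if pattern in text:
                    hits[extension].add(pattern)
    return dict(hits)
```

Grouping by extension rather than by file makes it easier to compare the result against a hand-maintained list like the one above.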

Parser functions depending on frame state

These extensions define only parser functions (which are run by the preprocessor) that depend on the frame state:

  • CreatePage (frame->expand)
  • GeoData (frame->expand)
  • PageInCat (frame->expand)
  • ParserFun (frame->expand, frame stack access, ...)
  • ParserFunctions (frame->expand, etc.)
  • RegexFun (low-level frame access)
  • ReplaceSet (frame->expand)
  • Scribunto (frame->getArguments)
  • SemanticForms (frame->expand)
  • SemanticMediaWiki (frame->expand, etc.; not 100% sure whether it also registers tags)
  • SubpageFun (frame->expand)
  • WikiLovesMonuments (frame->expand, frame->getArgument)

Order-dependent parser functions:

  • UserFunctions (dynamic user-defined parser functions)

Parser functions adding global state:

  • Description2 (frame->expand); also adds an output hook which adds a global meta tag to the parser output

Order-dependent extensions

These typically maintain internal state between calls and expect all hooks in a page to be called sequentially.

Enabled on WMF wikis

  • Cite: We plan to handle this by re-rendering the references sections and renumbering the references as a post-processing step on the full DOM.
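A renumbering pass of this kind can be sketched with the standard library's ElementTree. The DOM shape is an assumption for illustration only (a plain ref element with an optional name attribute and its note text), not Parsoid's actual representation. The sketch also ignores Cite features such as reference groups and named references whose content is supplied at a later use. Element.iter walks the tree in document order, so numbers are assigned by first use, and repeated named references reuse their number.

```python
import xml.etree.ElementTree as ET


def renumber_refs(root: ET.Element) -> list[tuple[int, str]]:
    """Renumber hypothetical <ref> elements in document order.

    Replaces each ref's text with "[n]" and returns (n, note text) pairs in
    first-use order, from which a references section could be regenerated.
    """
    numbers: dict[str, int] = {}
    notes: list[tuple[int, str]] = []
    anonymous = 0
    for el in root.iter("ref"):  # depth-first, i.e. document order
        key = el.get("name")
        if key is None:
            anonymous += 1
            key = f"#anon{anonymous}"  # unnamed refs are never shared
        if key not in numbers:
            numbers[key] = len(numbers) + 1
            notes.append((numbers[key], el.text or ""))
        el.text = f"[{numbers[key]}]"
    return notes
```

Because the pass runs on the assembled page, fragments can be expanded independently and in any order without affecting reference numbering.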

Third party

Preprocessor (Function hooks):

  • Arrays: explicitly defines mutable state in arrays. WONTFIX.

Possible solutions

  • Add an expandTemplatesAndMostTagExtensions API method to the MW API that expands all templates and most extensions (possibly all except Cite).
    • Top-level template expansion is probably an acceptable granularity for incremental updates; highly dynamic extensions can still be inserted without a template wrapper to avoid template re-expansions.
    • Avoids the need to serialize and send back frame information for most extensions
    • Should add encapsulation tags around extension output so that we can treat it differently for sanitization purposes
  • Parse all templates in a single action=parse call, separated with unique strings so that the results can be split per template transclusion
    • Problem: single-threaded, and hides a lot of information we would like to have.
  • Instrument the PHP preprocessor to provide a serialized frame parameter for unexpanded extension tags
    • Lets us perform the expansion independently
  • Add API method for direct extension calls rather than action=parse
    • Can support wikitext-returning tag extensions (TODO: find those!)
    • Extension calls still needed for top-level extensions even with an expandTemplatesAndMostTagExtensions API method
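The unique-separator approach to batching transclusions into a single call can be sketched as follows. The marker format is a hypothetical choice made for this sketch. The sketch assumes the backend passes the marker line through verbatim, which holds for preprocessor-only expansion. A full action=parse may wrap the marker in paragraph markup, and this sketch does not handle that; it rejects such output instead of mis-splitting it.

```python
import uuid


def build_batch(transclusions: list[str]) -> tuple[str, str]:
    """Join transclusions into one wikitext string, separated by a unique marker line."""
    marker = f"PARSOID-SPLIT-{uuid.uuid4().hex}"  # hypothetical marker format
    return f"\n{marker}\n".join(transclusions), marker


def split_batch(expanded: str, marker: str, expected: int) -> list[str]:
    """Split a batched expansion back into per-transclusion results."""
    parts = expanded.split(f"\n{marker}\n")
    if len(parts) != expected:
        # The marker line was altered or swallowed by the expansion.
        raise ValueError(f"expected {expected} parts, got {len(parts)}")
    return parts
```

A random marker makes accidental collisions with page content practically impossible. Checking the part count turns a silently wrong split into an explicit error.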

Information we would like to get from action=expandtemplates and extension expansions

  • List of templates and parser functions used in the expansion
    • Lets us track dependencies and cacheability for selective re-rendering (examples: #time-dependent output as used on en:Main Page, which templates should trigger fragment re-rendering, etc.).
  • A TTL, if the output is time-sensitive (for example, with #time): the minimum of the TTLs of all parser functions used (if any).
  • Re-render events this content block depends on. Can be empty or any combination of the events listed in the fragment index section.
  • (Maybe:) Serialized frame for tag extensions in template expansion output
    • Lets us expand those extension tags with the proper frame
    • However, common extensions generally do not provide parent frame access (see User:GWicke/Test).
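The metadata listed above could be modelled as a small record, with the TTL rule written out explicitly. Everything here is a hypothetical sketch: the field names, the per-function TTL table and the event names are assumptions, and no such API response format exists. fragment_ttl takes the minimum over the parser functions used and returns None when none of them is time-sensitive. merge combines the records of nested blocks by taking unions of the sets and keeping the smallest TTL.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ExpansionMetadata:
    """Hypothetical metadata returned alongside one expanded content block."""
    templates: set[str] = field(default_factory=set)
    parser_functions: set[str] = field(default_factory=set)
    ttl: Optional[int] = None  # seconds; None means not time-sensitive
    rerender_events: set[str] = field(default_factory=set)


def fragment_ttl(used: set[str], ttl_table: dict[str, int]) -> Optional[int]:
    """Minimum TTL over the parser functions used, or None if none is time-sensitive."""
    ttls = [ttl_table[name] for name in used if name in ttl_table]
    return min(ttls) if ttls else None


def merge(blocks: list[ExpansionMetadata]) -> ExpansionMetadata:
    """Combine metadata of nested blocks: union the sets, keep the smallest TTL."""
    out = ExpansionMetadata()
    for b in blocks:
        out.templates |= b.templates
        out.parser_functions |= b.parser_functions
        out.rerender_events |= b.rerender_events
        if b.ttl is not None:
            out.ttl = b.ttl if out.ttl is None else min(out.ttl, b.ttl)
    return out
```

For example, with an illustrative table {"#time": 60}, a block using #time and #if gets a TTL of 60 seconds, and a block using only #if gets none.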

See also Parsoid/Page metadata.