Wikimedia Release Engineering Team/SSD Sync Up/2019-03-26

From mediawiki.org

2019-03-26

Last Time: 2019-03-19

Discussion

  • Dan: Some progress on .pipeline/config.yaml; needs feedback on the execution graph.
  • Mukunda: Look at the DOT format (digraph). It would be useful if DOT could be generated from the YAML format.
  • Dan: Looks similar to Form 3 in the examples. Might be suitable.
  • Lars: Needs to be understandable first and foremost. Efficiency doesn't matter much if it can't be understood.
  • Jeena: Progress on developer tooling: a basic automated MediaWiki install should merge today. RESTBase and Parsoid can be enabled automatically. The changes also let users point to an external service instead of running everything in minikube.
  • Brennen: Moving on to MediaWiki Docker images. To start, probably a base image, with extensions relying on developers' local source trees?
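Mukunda's suggestion could be prototyped roughly as follows. This is a minimal Python sketch that assumes the YAML has already been parsed into a dict mapping each step to the steps that follow it (the shape of Form 2 in the examples below). The helper names `dot_id` and `to_dot` and the graph name `pipeline` are hypothetical.

```python
def dot_id(step: str) -> str:
    """Quote a step name so names containing hyphens etc. are valid DOT IDs."""
    return '"' + step.replace('"', '\\"') + '"'


def to_dot(successors: dict, name: str = "pipeline") -> str:
    """Render a step -> following-steps mapping as a Graphviz DOT digraph."""
    lines = [f"digraph {name} {{"]
    for step, nexts in successors.items():
        if not nexts:
            # Declare steps with no successors so they still appear in the graph.
            lines.append(f"  {dot_id(step)};")
        for nxt in nexts:
            lines.append(f"  {dot_id(step)} -> {dot_id(nxt)};")
    lines.append("}")
    return "\n".join(lines)


# The example graph as it might look after parsing Form 2 (hypothetical input).
execution = {
    "a": ["b"],
    "f": ["b"],
    "b": ["c", "g"],
    "c": ["d"],
    "g": ["e"],
    "d": ["e"],
}

print(to_dot(execution))
```

The output can be piped to Graphviz (e.g. `dot -Tpng`) to check visually that the YAML describes the intended graph.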

Execution graph examples

Relevant to the discussion above and to T210267.

Three different representations of the following graph:

    a   f
     \ /
      b
     / \
    c   g
    |   |
    d   |
     \ /
      e

Form 1 – execution graph expressed vertically as a series of parallelized sets

execution:  # execution order is expressed top-to-bottom
  - [a, f]  # members of set can run concurrently
  - b       # each set is run in serial
  - [c, g]
  - d
  - e

This does not fully represent a directed graph, as there are no dependency chains, and it is inefficient where diverging arcs have incongruent workloads. (In the given example, d would have to wait for g to finish before executing.)
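A rough Python sketch of that cost, under assumed conditions: unlimited parallel workers and made-up step durations in which g takes about as long as c and d combined. It compares Form 1's stage-by-stage model (each set waits for the whole previous set) with a schedule that starts each step as soon as its own prerequisites finish. The function names are hypothetical.

```python
# Hypothetical durations (arbitrary units); g is deliberately slow.
durations = {"a": 1, "f": 1, "b": 1, "c": 1, "d": 4, "g": 5, "e": 1}

# Form 1: sets run in series, members of a set run concurrently.
stages = [["a", "f"], ["b"], ["c", "g"], ["d"], ["e"]]

# True dependencies of the example graph: step -> steps it must wait for.
needs = {"a": [], "f": [], "b": ["a", "f"], "c": ["b"], "g": ["b"],
         "d": ["c"], "e": ["d", "g"]}


def stage_makespan(stages, durations):
    """Each stage lasts as long as its slowest member; stages run back to back."""
    return sum(max(durations[s] for s in stage) for stage in stages)


def dag_makespan(needs, durations):
    """Each step starts once all of its own prerequisites have finished."""
    finish = {}

    def finish_time(step):
        if step not in finish:
            start = max((finish_time(p) for p in needs[step]), default=0)
            finish[step] = start + durations[step]
        return finish[step]

    return max(finish_time(s) for s in needs)


print(stage_makespan(stages, durations))  # 12
print(dag_makespan(needs, durations))     # 8
```

With these numbers the stage model finishes at 12 and the dependency-driven schedule at 8: d starts right after c instead of waiting for g.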

Form 2 – a true execution DAG expressed as nodes and their successors

execution:   # execution order is expressed as each node and its successors
  a: [b]     # b follows a
  f: [b]     # b also follows f
  b: [c, g]  # c and g follow b as separate arcs, etc.
  c: [d]
  g: [e]
  d: [e]

While this can fully express a DAG, it is hard to reason about.
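One way a runner might consume Form 2 is Kahn's algorithm: count each step's incoming arcs, start with the steps that have none, and release a step once everything before it is done. This minimal Python sketch groups steps into level-by-level batches; the name `ready_batches` and the batching policy are assumptions for illustration, not part of the proposal.

```python
def ready_batches(successors):
    """Group steps into batches whose members have all prerequisites met.

    Uses Kahn's algorithm; raises ValueError if the graph contains a cycle.
    """
    # Collect every step, including ones that only appear as successors (e.g. e).
    steps = set(successors)
    for nexts in successors.values():
        steps.update(nexts)

    # Number of not-yet-finished predecessors for each step.
    pending = {step: 0 for step in steps}
    for nexts in successors.values():
        for nxt in nexts:
            pending[nxt] += 1

    batch = sorted(s for s in steps if pending[s] == 0)
    batches, done = [], 0
    while batch:
        batches.append(batch)
        done += len(batch)
        released = []
        for step in batch:
            for nxt in successors.get(step, []):
                pending[nxt] -= 1
                if pending[nxt] == 0:
                    released.append(nxt)
        batch = sorted(released)

    if done != len(steps):
        raise ValueError("execution graph contains a cycle")
    return batches


execution = {"a": ["b"], "f": ["b"], "b": ["c", "g"],
             "c": ["d"], "g": ["e"], "d": ["e"]}
print(ready_batches(execution))
# [['a', 'f'], ['b'], ['c', 'g'], ['d'], ['e']]
```

Batching level by level like this recovers exactly Form 1's sets, so it also inherits Form 1's waiting; a runner that wants the DAG's efficiency would instead release each step as soon as its own predecessors finish.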

Form 3 – a true execution DAG expressed as horizontal arcs that intersect (join) on common members

execution:           # execution order is expressed horizontally as separate arcs
  - [a, b, c, d, e]  # first arc
  - [f, b, g, e]     # second arc (intersecting the previous at b and e, where execution would join/wait)

This can fully express a DAG (condensed to Form 2 internally by reducing consecutive pairs to a hash/map), and parallel execution is more efficient than in Form 1, but it is perhaps still harder to reason about than Form 1.
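The condensation described above (turning each consecutive pair in an arc into a node → successor entry) might look like this minimal Python sketch. The function name `arcs_to_successors` is hypothetical; duplicate successors are dropped and first-seen order is kept.

```python
def arcs_to_successors(arcs):
    """Condense Form 3 arcs into a Form 2 style node -> successors map."""
    successors = {}
    for arc in arcs:
        # Each consecutive pair (before, after) in an arc means after follows before.
        for before, after in zip(arc, arc[1:]):
            nexts = successors.setdefault(before, [])
            if after not in nexts:
                nexts.append(after)
    return successors


arcs = [["a", "b", "c", "d", "e"],  # first arc
        ["f", "b", "g", "e"]]       # second arc, joining the first at b and e
print(arcs_to_successors(arcs))
# {'a': ['b'], 'b': ['c', 'g'], 'c': ['d'], 'd': ['e'], 'f': ['b'], 'g': ['e']}
```

Apart from key order, the result has the same seven arcs as Form 2. An arc containing a single step produces no pairs, so such steps would need to be recorded separately.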