Parsing/Media structure

As we take steps towards converging with, and eventually replacing, the current PHP parser, one hop along the path is unifying the structure of media output. This is proposed in T118517, which implements T51097.

Three patches, currently awaiting review, make up the bulk of the work:

Parsoid claims to render identically while adding more semantic elements to the markup (i.e. the use of figure and figcaption instead of generic divs). To verify correctness, it has undergone several rounds of visual diff testing, and it is the basis of VisualEditor, which susses out many rendering differences.

Nevertheless, new bugs are still being discovered.

There also remain some known open questions about the output:

  • T171761: Figcaption overflows image width on unbroken words
    • Set break-word on figcaption
    • Maybe this is an indication that we should switch back to styling the figcaption as a table-caption, and always emit it so that the bottom border is present
      • Always adding the figcaption could be useful regardless of switching back to the old CSS
  • T169975: Missing images render as broken img tags, not redlinks -- this is only an issue with Parsoid output, not with the changes to core.
  • Extension:ImageMap appears to do regexp post-processing of image media HTML, probably needs an update. Are there other similar extensions?

Finally, there is the need to proselytize this change in the community:

  • Come up with a story for how user gadgets and other downstream tools will be migrated
  • T113258: Draft email announcement about proposed change to output <figure> from PHP parser for images
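
One possible shape for the gadget migration story is a transitional helper that accepts both markups while wikis switch over. The following is a minimal sketch, not an agreed approach. It is written in Python only for illustration (real gadgets are JavaScript), and it assumes the legacy output wraps captions in a div with class thumbcaption while the new output uses figcaption. The sample HTML strings are simplified, and the names CaptionCollector and extract_captions are hypothetical.

```python
from html.parser import HTMLParser


class CaptionCollector(HTMLParser):
    """Collect thumbnail captions from legacy and <figure>-based markup."""

    def __init__(self):
        super().__init__()
        self.captions = []
        self._tag = None   # tag name that opened the caption being read
        self._depth = 0    # nesting depth of that tag name inside the caption
        self._text = []

    def handle_starttag(self, tag, attrs):
        if self._tag is not None:
            # Already inside a caption: only track nesting of the same tag.
            if tag == self._tag:
                self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        # Assumption: legacy captions are div.thumbcaption, new ones figcaption.
        if tag == "figcaption" or (tag == "div" and "thumbcaption" in classes):
            self._tag, self._depth, self._text = tag, 1, []

    def handle_endtag(self, tag):
        if self._tag is None or tag != self._tag:
            return
        self._depth -= 1
        if self._depth == 0:
            # Normalize whitespace and store the finished caption.
            self.captions.append(" ".join("".join(self._text).split()))
            self._tag = None

    def handle_data(self, data):
        if self._tag is not None:
            self._text.append(data)


def extract_captions(html):
    """Return the caption text of every thumbnail found in the HTML string."""
    parser = CaptionCollector()
    parser.feed(html)
    parser.close()
    return parser.captions


# Simplified samples of the two structures (illustrative, not exact output).
LEGACY = (
    '<div class="thumb tright"><div class="thumbinner">'
    '<a href="/wiki/File:Foo.jpg" class="image">'
    '<img src="foo.jpg" class="thumbimage"></a>'
    '<div class="thumbcaption"><div class="magnify">'
    '<a href="/wiki/File:Foo.jpg"></a></div>A caption</div>'
    '</div></div>'
)
FIGURE = (
    '<figure typeof="mw:Image/Thumb">'
    '<a href="./File:Foo.jpg"><img src="foo.jpg"></a>'
    '<figcaption>A caption</figcaption></figure>'
)

print(extract_captions(LEGACY))
print(extract_captions(FIGURE))
```

A helper like this lets a tool keep working on wikis that have not switched yet, and the legacy branch can be removed once the old structure is gone.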


This is probably redundant with Parsing/Notes/Figure_for_Media

gadgets and user scripts on enwp

  • most gadgets don't necessarily inspect HTML; maybe 30% use the actual HTML, and most of those might look for ids ...
  • taking an inventory of gadgets
  • no reliable way of knowing how user scripts
  • a few gadgets everyone uses: popup, hotcat, (5 or so) ... ppl post in village pump in < 30 mins if they break
  • some 20 or 30 that a few more ppl use and will take a while to notice
  • last category: used for specialized processes .. about 100
  • page lists a lot of them
  • commons has quite a lot; wikidata a few
  • hardest part is fixing things on other wikis, where gadgets have been copied over
  • maintaining documentation about what we fixed could help
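
The inventory step could be partly automated by scanning gadget and user script sources for selectors that depend on the legacy thumbnail markup. The sketch below is only a heuristic under stated assumptions: it looks for the class names thumb, thumbinner, thumbcaption, thumbimage and magnify used as CSS selectors, the script names and sources are made up for the example, and how the sources are fetched from a wiki is left out. It can report false positives (for example a JavaScript property named thumb) and will miss scripts that build selectors dynamically.

```python
import re

# Class names used by the legacy thumbnail markup (assumed list, not exhaustive).
LEGACY_CLASSES = ("thumb", "thumbinner", "thumbcaption", "thumbimage", "magnify")

# Match a legacy class written as a selector, e.g. ".thumbinner",
# but not a longer name that merely starts the same way, e.g. ".thumbnail".
_LEGACY_SELECTOR = re.compile(r"\.(%s)(?![\w-])" % "|".join(LEGACY_CLASSES))


def find_legacy_selectors(scripts):
    """Map each script name to the sorted legacy classes its source mentions.

    `scripts` is a dict of script name -> source text. Scripts with no hits
    are left out of the result.
    """
    report = {}
    for name, source in scripts.items():
        hits = sorted(set(_LEGACY_SELECTOR.findall(source)))
        if hits:
            report[name] = hits
    return report


# Hypothetical gadget sources, for illustration only.
SCRIPTS = {
    "captionTool.js": "$('.thumbcaption').hide(); $('div.thumb .thumbinner').css('border', 0);",
    "idOnly.js": "document.getElementById('firstHeading');",
    "gallery.js": "$('.thumbnail-grid').show();",
}

for name, classes in find_legacy_selectors(SCRIPTS).items():
    print(name, classes)
```

The resulting report could double as the start of the documentation of what was fixed, which would help when the same gadgets have to be fixed on wikis that copied them.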