Content Transform Team/Weekly Updates

Week of Feb 21, 2022
Parsoid integration with core


 * First draft of setFunctionHook support ready for review https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/761494
 * ParserOutput compatible support in Parsoid is close to landing
 * In prep for 1.38 and to move the Parsoid and core integration along, we're migrating all of Parsoid's extension/* code to the MediaWiki core repo

Maps


 * Collaborating with WMDE folks on refactoring kartotherian
 * By the end of the week we will mirror requests to eqiad

mobile-html services


 * Mobile Preview on hold until next week

Week of Feb 14, 2022
Parsoid integration with core


 * First draft of setFunctionHook support in https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/761494
 * Some additions needed to the ContentMetadataCollector in Parsoid (https://gerrit.wikimedia.org/r/c/mediawiki/services/parsoid/+/761996, https://gerrit.wikimedia.org/r/c/mediawiki/core/+/762008)

Extension Updates


 * Linter
 * fixes submitted for review
 * Adding the namespace database column is nearly done; adding the Tag and Template columns should begin next

Performance


 * Patch for benchmarking ready for review (T272331)

Maps


 * Maps tile pregeneration is throwing errors (https://phabricator.wikimedia.org/T301664)
 * Mitigation in place

Week of Feb 7, 2022
Parsoid integration with core


 * Started work on supporting setFunctionHook and bridging Parsoid's Frame object with Parser.php's PPFrame_* object

Extension Updates


 * Linter: further work on adding Template and Tag column

Performance


 * Further work on roundtrip testing (T272331)

Maps


 * Got the green light to switch Maps traffic between clusters

Week of Jan 10, 2022
Media Output changes in core


 * inline-media-captions lint turned out to have been a bad idea - disabling it for now

Parsoid integration with core


 * Fixes to the signatures of addModules/addModuleStyles landed in core

Performance


 * Tim's patches from November rolled out on this week's train

Maps


 * A bit of firefighting with T299216

Week of Jan 3, 2022
Media Output changes in core


 * Linter category for inline captions deployed
 * https://www.mediawiki.org/wiki/Special:LintErrors/inline-media-caption
 * Still waiting on Parsoid to start populating the category though


 * Added "resource" attribute to img tags

Parsoid integration with core


 * Strip State Handling issues to resolve: T299103
 * ContentMetadataCollector interface being implemented in core

Extension Updates


 * CI fixes needed to run Parsoid wt2wt and other test modes in extension repositories
 * Hiero
 * Patch in gerrit and in review

Maps


 * Supporting WMDE on beta cluster issues
 * Bugs reported with borders; fixes are being worked on

Week of Dec 13, 2021
Media Output changes in core


 * Linter category for inline captions created and merged. Help page for the category created.
 * "resource" attribute being added to img tags to fix T292657; matches Parsoid
 * Concern about bloating HTML payload. See T297984

Extension Updates


 * Translate
 * Parsoid changes rolled out to production as part of wmf.13 train.
 * InputBox
 * Proof of Concept patch. Blocked on missing support in ParsoidExtensionAPI.

Performance


 * All roundtrip regressions from Tim's patch have been fixed and tested. Will roll out to production in Jan

Maps


 * Maps 2.0 stack has been rolled out to all wikis
 * Last minute issue with overzoom fixed
 * Swift backpressure issue
 * When tile pregeneration parallelism is >= 5 workers, cache latency increases not just for pregeneration but for all cache-related ops
 * To be investigated.

Week of Dec 6, 2021
Media Output changes in core


 * T287965: Print styles are fixed


 * Inline images & alt text handling: See T297443

Parsoid integration with core


 * Exploring adding setFunctionHook support to Parsoid - related Parsoid SiteConfig fix along the way

Extension Updates


 * Linter
 * Patch to display all lints for a single page is in gerrit
 * Translate
 * Split deployment into two pieces. With wmf.12, only html->wt support was introduced to add forward compatibility for Parsoid HTML version 2.4.0. Translate support will be rolled out on the next train.
 * wmf.12 train got rolled back
 * InputBox
 * First proof of concept patch in gerrit; progress now requires discussion about ParsoidExtensionAPI
 * SyntaxHighlight
 * Exploration of why SyntaxHighlight cares about strip state is in phabricator. Parsoid's behaviour is more reasonable overall, but a temporary workaround might be needed to deal with Scribunto's use of this mechanism.

Performance


 * Tim's last patch was merged and sent to rt testing. Regressions were found and need to be investigated and fixed before it can be deployed (in the new year)

Visual Diffing


 * Regressions in wmf.12 in image layout (bottom border) between core & Parsoid. Almost 5% drop in test pages without rendering diffs
 * This shows up readily in visual diff testing, but the impact on wikis is subtle and will mostly go unnoticed by readers or editors.
 * Regression has been fixed in core and merged - will ride the next train.

Maps


 * Maps 2.0 stack has been rolled out to frwiki - no complaints and everything stable

Everything else


 * Filed T297259 for ServiceOps to run some perf benchmarking for us with newer hardware to estimate what hardware changes might be beneficial when Parsoid is used for read views on all wikis
 * C.Scott (with Subbu's input) presented updates from the Parsoid / wikitext parsing world at SMWCon 2021
 * WIP to look at better CI and parsertests support for extensions that are updated to work natively with Parsoid APIs

Week of Nov 29, 2021
Extension Updates


 * Translate
 * Annotations support rolling out to production in next week's train


 * Linter
 * All lints for a single page patch nearing completion
 * SyntaxHighlight
 * Initial explorations to have it work with Parsoid's Extension API directly

Maps


 * To deploy follow-up patch regarding label cut on Tegola

mobile-html Services


 * Issue with graphs came up on phabricator T285093

Week of Nov 22, 2021
Parsoid integration with core


 * ContentMetadataCollector interface: Basic patch merged in Parsoid

Performance


 * Tim's patch for autoInserted* flag detection via Remex cannot be merged until the new train rolls out to production and updates the Remex version on scandium

Week of Nov 15, 2021
Parsoid integration with core


 * First phase of ContentMetadataCollector should land this week (just a few methods left to audit) - might be underwhelming since most of the 'exciting' methods got punted to phab tickets

Extension Updates


 * Translate
 * RT testing showed a few issues, most of them corner-case-y; all the ones we found are either filed in phab or need to be fixed on the pages themselves

Performance


 * Tim's autoInserted* flag detection via Remex - patch in gerrit for review. CPU and memory benefits expected with rollout

Other


 * Subbu met with SRE Data Persistence to discuss ParserCache capacity needs for Parsoid Read Views. TLDR is that after recent server upgrades, ParserCache has ~30% utilization and should be able to support Parsoid's HTML as well, as long as we roll out to wikis in stages.

Week of Nov 7, 2021
Media output changes in core


 * FAQ edited and approved

Extension Updates


 * Translate
 * Ran RT-testing, examined regressions and filed patches to fix them. Followups needed.
 * Dirty diffs related to newline changes could impact Translate behavior and need investigation.

Performance


 * No new updates. Tim busy working on PHP-VM bug

Maps

Maps v2: T263854


 * Most of the tickets are resolved - the ones not resolved are either low priority or docs related
 * testwiki is now connected to the tegola-backed kartotherian source
 * Resolved some event related issues
 * cronjob to trigger invalidation on OSM syncs
 * kafka concurrency
 * when we scaled workers kafka didn't allow concurrent consuming
 * envoy + tegola k8s reliability issues


 * Re-introduced batching in tegola pregeneration scripts
 * Next steps
 * Test pregeneration with production load
 * Roll out to more wikis

mobile-html Services


 * Mobile Preview problem statement submitted for review - T295348

Other


 * Filed TOC Incident report
 * Discussed ParserCache implications of ParserOutput work with Amir (database arch)

Week of Nov 1, 2021
Media output changes in core


 * Started working on the FAQ for the rollout, please add questions you want to see there

Extension Updates


 * Translate
 * Annotations patch merged. Three bugs identified via rt-testing. Investigation done; patches coming soon.

Performance


 * Tim is working to get rid of the start/end meta addition used to detect tree builder fixups, and to instead register handlers (via subclassing) with Remex to listen to treebuilder events. This has the potential to cut processing time and memory use if it works out.

Visual Diffing


 * Something seems to have improved arwiki results a bit in the latest run

Maps


 * FYI: WMDE is submitting some patches to Kartographer as part of their tech wishlist
 * Still working on tile pregeneration

mobile-html Services


 * Phab task to track Dark Themes Preview - T295299

Other

Production incidents


 * Regression in ToC output caused a fire drill on Friday
 * Should figure out how to pass __NOCONTENTCONVERT__ and some other properties to ParserOutput
 * Should document proper mechanism for ParserCache updates
 * Maybe zhwiki needs to be group 1 instead of group 2
 * Proper versioning for ParserCache would be helpful. (Also RESTBase.)
 * Sanitizer interactions with tags need followup this week (toc and translate)
 * Follow up to rt testing interaction with mediawiki-vendor as well