Content Transform Team/Chores

From mediawiki.org

Content Transform Team/Chores/Phabricator template

Parsoid

The Parsoid Common Tasks page may be useful for instructions on some of these tasks.

Parsoid RT-testing

Documentation: Parsoid/Round-trip testing

Parser tests sync

Documentation: https://wikitech.wikimedia.org/wiki/Parsoid/Common_Tasks#Sync_parser_tests

Parsoid vendor patch

Current patch and page to update: Parsoid/Deployments

Documentation: https://wikitech.wikimedia.org/wiki/Parsoid#Deploying_changes

Parsoid vendor patch review

Extracted from https://wikitech.wikimedia.org/wiki/Parsoid#Deploying_changes

Review the generated patch (either via git show or on Gerrit), looking specifically for unexpected changes. The code in wikimedia/parsoid should change in roughly the ways you expect from the deploy summary. There should be a change to the version number in composer.json, and changes to some hashes, timestamps, and versions in composer.lock and composer/installed.json, but there should be no other changes. See this patch set for an example where an old version of composer was used, resulting in spurious changes to other files in composer/.
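
The check above can be partially automated. The helper below is a hypothetical sketch, not part of the documented workflow: it reads a list of changed files and prints any that fall outside the set a vendor bump is expected to touch. The allowlist is an assumption based on the description above, not an official list.

```shell
#!/bin/sh
# check_vendor_patch: flag files that a Parsoid vendor bump should not touch.
# Reads one filename per line on stdin; prints the unexpected ones.
check_vendor_patch() {
  # Allowlist (an assumption): composer.json, composer.lock,
  # composer/installed.json, and anything under wikimedia/parsoid/.
  grep -v -E '^(composer\.(json|lock)|composer/installed\.json)$|^wikimedia/parsoid/' || true
}

# Typical use, from the mediawiki/vendor checkout:
#   git show --name-only --pretty=format: HEAD | check_vendor_patch
```

An empty result means only the expected files changed; any output deserves a closer look on Gerrit before merging.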

Monitor Parsoid Community-reported Issues

Talk:Parsoid/Parser Unification/Known Issues

Parsoid logs

Logstash Parsoid (restricted access)

Parsoid#Monitoring on wikitech

Parsoid Grafana charts

Parsoid Read Views: Kick off weekly visual diff run

We want to gather data about Parsoid rendering issues on the rest of the wikis where Parsoid Read Views is not enabled yet. After the train has rolled out to group 2 wikis, we kick off a weekly visual diff run on ctt-prv-04.wikitextexp.eqiad1.wikimedia.cloud.

High-level steps

  1. Stop testreduce services
  2. Prepare the test database with a list of titles from the next set of wikis to test (5 mins or less). This step also backs up the previous db.
  3. Edit the testreduce config file to update the visualdiff test run id
  4. Restart testreduce services
  5. Wait for the test run to complete (10-15 hours)
  6. Purge all 404-ing titles and associated results
  7. Retry significant failures a couple of times to shake out false positives (30 mins or so total)
  8. Generate the diffs CSV file and upload it as a Google spreadsheet to the CTT Google Drive
  9. Generate stats & confidence report files (for later use)

Detailed instructions

Stopping and starting testreduce services:

  • Stop: sudo service parsoid-vs-core-vd stop; sudo service parsoid-vs-core-vd-client stop
  • Start: sudo service parsoid-vs-core-vd start; sudo service parsoid-vs-core-vd-client start

Prepare the test db

  • cd /srv/testreduce/server/scripts
  • Read the README there and follow instructions.
  • If you are preparing the list of wikis from the list of Wikipedias via the get_wp_list.sh script there, you will need to know where the previous run stopped. For now, this is a manual coordination step between chore wheel members; we can use the weekly chore Phabricator task to record where the previous run stopped. As of Sep 5, we've gone through the last 150 wikis (so, a start arg of -150 for the script).
  • If you want to check your work, you can connect to mysql and have a look at the pages table; in particular, you can check that the number of pages in there is in the right order of magnitude (select count(*) from pages;).
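
The order-of-magnitude check can be wrapped in a tiny helper. This is a sketch with illustrative bounds, not documented limits; the db/user names in the usage comment are assumptions, so use whatever credentials the README gives you.

```shell
#!/bin/sh
# pages_count_plausible: is the pages count in a plausible range for a batch?
# The 1,000..1,000,000 bounds are illustrative assumptions -- adjust them
# to the size of the batch you actually loaded.
pages_count_plausible() {
  [ "$1" -ge 1000 ] && [ "$1" -le 1000000 ]
}

# Typical use (credentials and db name are assumptions):
#   n=$(mysql -u testreduce -p prv_tests -N -e 'select count(*) from pages;')
#   pages_count_plausible "$n" && echo "looks right" || echo "recheck the prep step"
```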

Monitoring test results

The test results are viewable on http://prv-tests.wmcloud.org. If you've run the rt-testing chore, you know how to interpret this screen since it is the same testreduce software but backed by visualdiff runs.

Purging 404-ing titles & retrying failures

  • cd /srv/visualdiff/tools and look at the scripts there. They are documented on this page.
  • But, TLDR, you will run the scripts in the following order:
    • bash purge_404s.sh prv_tests DB_PASSWORD
    • bash retry_significant_failures.sh prv_tests DB_PASSWORD - you will have to wait for the retried tests to complete by monitoring the web interface or journalctl -u parsoid-vs-core-vd -f
    • It is worth retrying a second time after the first run completes to shake out a few more false positives.
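
The sequence above can be sketched as one function. This is a convenience wrapper, not part of the documented tooling; the waits between retries remain manual (watch the web UI or journalctl -u parsoid-vs-core-vd -f before the second retry).

```shell
#!/bin/sh
# run_purge_and_retry: one purge pass followed by two retry passes.
# Arguments: tools directory, db name, db password.
run_purge_and_retry() {
  tools_dir="$1"; db="$2"; pass="$3"
  cd "$tools_dir" || return 1
  bash purge_404s.sh "$db" "$pass"
  bash retry_significant_failures.sh "$db" "$pass"
  # ...in practice, wait here for the retried tests to complete...
  bash retry_significant_failures.sh "$db" "$pass"
}

# Typical use: run_purge_and_retry /srv/visualdiff/tools prv_tests "$DB_PASSWORD"
```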

Generating diffs & stats

  • cd /srv/visualdiff/tools and look at the scripts there. They are documented on this page.
  • But, TLDR, you will run the following scripts:
    • bash diffs.sh prv_tests DB_PASSWORD 7 csv > reports/diffs_MM_DD.csv
    • scp that file to your local computer and upload it to Google Drive here. Naming convention for the sheet: "Diffs: YYYY/MM/DD" - name bikeshedding welcome, but please rename all existing files if we tweak this.
    • NOTE: The above command line sets a threshold of 7 for diffs. Any wiki with more than 7 significant diffs is listed as not-deployable.
    • bash confidence_report.sh prv_tests DB_PASSWORD > reports/confidence_report.MM_DD - this generates a wikitext table which can be uploaded (in full or only some rows) to Parsoid/Parser Unification/Confidence Framework/Reports/ if we decide to deploy Parsoid read views to any of the tested wikis.
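
The two report commands can also be wrapped with an automatic MM_DD stamp. This is a sketch, not part of the documented tooling; the threshold of 7 and the file names follow the bullets above.

```shell
#!/bin/sh
# generate_reports: run diffs.sh and confidence_report.sh, writing output
# files stamped with today's MM_DD date into the reports/ subdirectory.
# Arguments: tools directory, db name, db password.
generate_reports() {
  tools_dir="$1"; db="$2"; pass="$3"
  stamp=$(date +%m_%d)
  cd "$tools_dir" || return 1
  mkdir -p reports
  bash diffs.sh "$db" "$pass" 7 csv > "reports/diffs_${stamp}.csv"
  bash confidence_report.sh "$db" "$pass" > "reports/confidence_report.${stamp}"
}

# Typical use: generate_reports /srv/visualdiff/tools prv_tests "$DB_PASSWORD"
```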

Optional step: Process the diffs file by analyzing diffs for low-hanging-fruit wikis

Look at diffs for wikis with just one entry, then two entries, and so on. You will need someone to walk you through this the first time you do it, but you can also skip this step altogether.