Content translation/Publishing

From mediawiki.org



The translated article exists as HTML in Special:CX. On publishing, this HTML is passed to the CX publishing API, which contacts Parsoid to obtain the wikitext for the given HTML. Finally, the API creates the article in the User namespace and marks it with the contenttranslation tag.
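
The two server-side steps described above can be sketched roughly as follows. This is only an illustration, not the actual CX code: the Parsoid base URL, the `/v3/transform/html/to/wikitext` path (taken from later versions of Parsoid's REST interface, which may differ from the one this page was written against), the JSON request body, and the `User:<name>/<title>` target title are all assumptions. The helper names `build_parsoid_request` and `build_edit_params` are hypothetical. Nothing is sent over the network; the functions only build the request and the edit parameters.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumption: a Parsoid service reachable at this address.
PARSOID_BASE = "http://localhost:8000"


def build_parsoid_request(domain, html):
    """Build (but do not send) an HTML-to-wikitext request for Parsoid.

    The endpoint path and JSON body shape are assumptions.
    """
    url = f"{PARSOID_BASE}/{quote(domain)}/v3/transform/html/to/wikitext"
    body = json.dumps({"html": html}).encode("utf-8")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_edit_params(user, title, wikitext, token):
    """Build parameters for a MediaWiki action=edit call.

    The page is created in the User namespace and tagged with
    'contenttranslation'. The 'User:<name>/<title>' layout is a
    hypothetical choice for this sketch.
    """
    return {
        "action": "edit",
        "format": "json",
        "title": f"User:{user}/{title}",
        "text": wikitext,
        "tags": "contenttranslation",
        "token": token,
    }
```

A caller would send the first request, read the wikitext from the response, and then submit the edit parameters to the target wiki's `api.php` together with a valid CSRF token.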

The HTML in the translation column carries a lot of annotations in the form of segment spans. All such wrapping segment spans are removed before the HTML is passed to Parsoid.
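
A minimal sketch of the span removal, using only Python's standard `html.parser`. It assumes that segment spans are `<span>` elements carrying the class `cx-segment`; that class name, and the names `SegmentUnwrapper` and `unwrap_segments`, are assumptions made for this example. The wrapper tags are dropped while their contents, and all other markup, are kept as they are.

```python
from html.parser import HTMLParser

# Assumption: segment spans are marked with this class.
SEGMENT_CLASS = "cx-segment"


class SegmentUnwrapper(HTMLParser):
    """Re-emit HTML while dropping the wrapping segment <span> tags."""

    def __init__(self):
        # Keep character references untouched so text round-trips.
        super().__init__(convert_charrefs=False)
        self.out = []
        # One entry per open <span>: True if it is a segment span.
        self.span_stack = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            is_segment = SEGMENT_CLASS in classes
            self.span_stack.append(is_segment)
            if is_segment:
                return  # drop the wrapper's opening tag
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags are copied through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "span" and self.span_stack and self.span_stack.pop():
            return  # drop the wrapper's closing tag
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def unwrap_segments(html):
    """Return html with all segment span wrappers removed."""
    parser = SegmentUnwrapper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Spans without the segment class are left in place, so ordinary inline markup inside a segment survives the cleanup.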

Story: https://wikimedia.mingle.thoughtworks.com/projects/language_engineering/cards/4135


  1. Parsoid returns a complete HTML document for the article, but we are interested only in the body part. This cleanup needs to be done on the server, before segmentation.
  2. Every page should contain <references/> so that the references are applied properly.
  3. How should templates that are missing in the target wiki be handled?
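
Items 1 and 2 above could be approached along the following lines. This is a hedged sketch rather than the CX implementation: `extract_body` and `ensure_references` are hypothetical helper names, and the second function only looks for a literal `<references>` tag, so it would not notice references emitted by a template.

```python
import re
from html.parser import HTMLParser


class BodyExtractor(HTMLParser):
    """Collect only the markup found inside <body>...</body>."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.in_body = False
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif self.in_body:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if self.in_body:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif self.in_body:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.in_body:
            self.out.append(data)

    def handle_entityref(self, name):
        if self.in_body:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if self.in_body:
            self.out.append(f"&#{name};")


def extract_body(document_html):
    """Return the inner HTML of <body> from a complete HTML document."""
    parser = BodyExtractor()
    parser.feed(document_html)
    parser.close()
    return "".join(parser.out)


# Matches <references>, <references/> and <references ... />.
REFERENCES_RE = re.compile(r"<references\b[^>]*>", re.IGNORECASE)


def ensure_references(wikitext):
    """Append <references /> when the wikitext has no such tag yet."""
    if REFERENCES_RE.search(wikitext):
        return wikitext
    return wikitext.rstrip("\n") + "\n\n<references />\n"
```

The body extraction would run on the HTML received from Parsoid before segmentation, and the references check on the wikitext just before the page is saved.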