Extension talk:Translate/Mass migration tools/Alignment

From mediawiki.org

Non-linguistic text comparison

It seems we agree on the following.

  1. Those markup examples are just examples; they won't cover everything that's needed.
  2. It's OK to implement them as a first coding step, aka "getting hands dirty experimenting with section alignment", to see what comes out of it.
  3. Later we'll need something more robust that doesn't require tedious work to hardcode all kinds of checks. It's not important to have a perfect solution, just a reasonable initial alignment humans can work on. Ideas:
    1. A MediaWiki markup extractor or DOM tree, whatever, to compare paragraphs with (probably hard).
    2. Use the MediaWiki parser: make it produce the parsed text, subtract the visible text from the wikitext, and calculate e.g. the edit distance over what's left.
    3. «CL-CNG, despite its simple approach, is the best choice to rank and compare texts across languages if they are syntactically related».
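
To make idea 2 a bit more concrete, here is a rough Python sketch. It is only an illustration under assumptions: it does not call the MediaWiki parser at all, but approximates "what's left after removing visible text" with a crude regular expression that keeps a few common wikitext tokens, and then computes a plain Levenshtein distance over the token sequences. The names `MARKUP_TOKEN_RE`, `markup_skeleton` and `edit_distance` are made up for this example.

```python
import re

# Assumption: a crude stand-in for the parser. It only recognises a handful
# of wikitext tokens (links, templates, bold/italic, headings, list markers
# at line start, HTML-like tags) and ignores everything else.
MARKUP_TOKEN_RE = re.compile(
    r"\[\[|\]\]|\{\{|\}\}|'{2,}|={2,}|^[*#:;]+|</?[a-zA-Z][^>]*>",
    re.MULTILINE,
)


def markup_skeleton(wikitext):
    """Return the sequence of markup tokens, dropping the visible text."""
    return MARKUP_TOKEN_RE.findall(wikitext)


def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


# Two paragraphs in different languages with the same markup structure,
# and one unrelated paragraph.
source = "== Intro ==\n* See [[Help:Foo|the help]] for '''details'''."
translation = "== Johdanto ==\n* Katso [[Help:Foo/fi|ohje]], jossa on '''lisätietoja'''."
other = "Plain paragraph with ''emphasis''."

d_same = edit_distance(markup_skeleton(source), markup_skeleton(translation))
d_other = edit_distance(markup_skeleton(source), markup_skeleton(other))
```

With these inputs the translated paragraph gets a smaller distance to the source than the unrelated one, so a low distance could serve as a hint that two sections belong together. A real version would presumably take the markup from the parser rather than a regex, and normalise the distance by section length.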

--Nemo 16:32, 29 April 2014 (UTC)