Extension:Translate/Mass migration tools/Alignment

Alignment Algorithm
Assumptions:
 * 1) The translation has paragraphs in the same order as the original

Step 2 of this project requires importing the translations that were already present on the page before FuzzyBot edited it. These translations need to be aligned with the translation unit identifiers created by the Translate extension.
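The same-order assumption is what makes a purely positional pairing possible. As a minimal sketch, assuming paragraphs are separated by blank lines and that the unit identifiers are already known in page order, old translations could be paired with identifiers roughly as follows. The function names `split_paragraphs` and `align_by_order` are hypothetical and are not part of the Translate extension.

```python
import re


def split_paragraphs(wikitext):
    """Split wikitext into paragraphs on one or more blank lines."""
    parts = re.split(r"\n\s*\n", wikitext.strip())
    return [part.strip() for part in parts if part.strip()]


def align_by_order(unit_ids, translated_wikitext):
    """Pair each unit identifier with the paragraph at the same position.

    Assumption: the translation keeps its paragraphs in the same order
    as the source. Paragraphs without a matching identifier are returned
    separately so that they can be reviewed by hand.
    """
    paragraphs = split_paragraphs(translated_wikitext)
    pairs = dict(zip(unit_ids, paragraphs))
    leftover = paragraphs[len(unit_ids):]
    return pairs, leftover
```

If the translation has more paragraphs than the source has units, the extra ones end up in `leftover` rather than being silently dropped, which is one possible way to surface misalignments.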



As per the mock-up design, the main task is to fill the left-hand and right-hand blocks with the appropriate text. The left-hand blocks hold the source text (English), and the right-hand blocks hold the corresponding imported translations.


 * 1) Left-hand-side blocks: These are already shown at Special:Translate, so the same algorithm can be used to parse the wikitext and find the translation unit identifiers.
 * 2) Right-hand-side blocks: These could be filled in one of two ways:
    * a) Handle each kind of markup differently: section headers, image captions, lists, etc. would each be handled by their own rules/logic. This would involve multiple passes over the wikitext.
    * b) Use a common solution, irrespective of the markup. This would rely solely on machine translation (MT), which on its own should be enough to ensure accurate alignment.


 * However, relying completely on an MT tool would make that tool a single point of failure for the entire algorithm: the algorithm would not work whenever such a tool is unavailable. Hence, it has been decided to go ahead with the first approach, in which each kind of markup is handled by its own rules.


 * This approach would require handling each kind of content separately, in multiple passes.
 * In the first pass, section headers would be covered. The flow would be to check for section headers that are present as translation units and get the corresponding section from the translation, assuming that the sections appear in the same order in both texts.
 * In the next pass, translation units that are image captions would be checked (text enclosed within 'File' markup, like ... ). The corresponding caption for the image in the translated version would then be retrieved and paired.
 * In the next pass, tables would be covered. Each translation unit inside a table in the source text would be matched with the text at the corresponding row and column in the translated text.
 * In the final pass, the remaining translation units would be covered by paragraph-level alignment. Once paired, they would be compared for markup such as links and bold or italic text.
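
The header pass and the table pass described above can be sketched as follows. This is only an illustration under simplifying assumptions: headers use the standard `== Title ==` wikitext syntax, only the first table on a page is considered, and every table cell sits on its own line. All function names here are hypothetical and are not taken from the Translate extension. The image caption pass is left out of the sketch.

```python
import re

# A MediaWiki section header such as "== Title ==": the same run of "="
# on both sides of the title.
HEADER_RE = re.compile(r"^(=+)\s*(.+?)\s*\1\s*$")


def extract_headers(wikitext):
    """Return section header titles in document order."""
    titles = []
    for line in wikitext.splitlines():
        match = HEADER_RE.match(line.strip())
        if match:
            titles.append(match.group(2))
    return titles


def extract_table_cells(wikitext):
    """Map (row, column) to cell text for the first table in the wikitext.

    Simplification: every cell is assumed to sit on its own line
    starting with "|" or "!"; inline "||" cells are not handled.
    """
    cells = {}
    row, col, in_table = 0, 0, False
    for line in wikitext.splitlines():
        text = line.strip()
        if text.startswith("{|"):
            in_table = True
        elif not in_table:
            continue
        elif text.startswith("|}"):
            break  # end of the first table
        elif text.startswith("|-"):
            if col > 0:  # start a new row only if the current one has cells
                row, col = row + 1, 0
        elif text.startswith("|+"):
            continue  # table caption, not a cell
        elif text[:1] in ("|", "!"):
            cells[(row, col)] = text[1:].strip()
            col += 1
    return cells


def pair_headers(source, translation):
    """Header pass: pair headers by position, assuming identical section order."""
    return list(zip(extract_headers(source), extract_headers(translation)))


def pair_table_cells(source, translation):
    """Table pass: pair cells that share the same row and column."""
    translated = extract_table_cells(translation)
    return {
        position: (text, translated.get(position))
        for position, text in extract_table_cells(source).items()
    }
```

A source cell with no counterpart in the translation is paired with `None`, so a later step could flag it for manual review instead of guessing.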
 * Issues: