Extension:Translate/Mass migration tools/Alignment

Alignment Algorithm
Step 2 of this project requires importing the translations that were already present before FuzzyBot's edit to the page. These need to be aligned with the translation unit identifiers created by the Translate extension.



As per the mockup design, the only thing we need to do is fill the left-hand and right-hand blocks with the appropriate text. The left-hand blocks hold the source text (English), and the right-hand blocks hold the corresponding imported translations.


 * 1) Left-hand-side blocks: These are already shown at Special:Translate, so the same code can be reused to parse the wikitext and find the translation unit identifiers.
 * 2) Right-hand-side blocks: Assuming we have all the translation unit identifiers, form a list [T1, T2, T3, ...] and loop over it. Keep a marker M at the beginning of the translated text for the selected language. For each translation unit identifier T, perform steps 3 to 7:
 * 3) Count the number of sentences in T.
 * 4) Extract that many sentences from the translated text, starting at the marker M.
 * 5) Compare the links in the source text with those in the extracted translation: the number of links should match and they should point to the same pages. If no links are present (section headers, for example), we have to rely on the next step.
 * 6) Further, use a machine translation interface (provided by Content translation or some third-party library) and check whether T matches the extracted translated segment up to a certain percentage.
 * 7) Move the marker M to the end of the extracted text.
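
The loop in steps 2 to 7 can be sketched roughly as follows. This is only an illustration in Python, not the extension's actual code: the function names (`split_sentences`, `link_targets`, `similarity`, `align`), the naive punctuation-based sentence splitter, the exact comparison of link targets, the `back_translate` callback standing in for the machine translation interface, and the 0.5 threshold are all assumptions made for the example.

```python
import re
from difflib import SequenceMatcher

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Captures the target of a wikilink such as [[Target|label]] or [[Target#Section]].
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def split_sentences(text):
    """Naive sentence splitter (steps 3 and 4)."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def link_targets(text):
    """Return the wikilink targets found in text, in order (step 5)."""
    return [target.strip() for target in WIKILINK.findall(text)]


def similarity(a, b):
    """Rough string similarity between 0.0 and 1.0 (used in step 6)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def align(source_units, translated_text, back_translate=None, threshold=0.5):
    """Pair each source unit T with a segment of the translated text.

    back_translate is a hypothetical callable that translates a segment
    back into the source language; if it is None, step 6 is skipped.
    Returns a list of (unit, segment, confident) tuples.
    """
    sentences = split_sentences(translated_text)
    marker = 0  # marker M, counted in sentences
    aligned = []
    for unit in source_units:
        count = len(split_sentences(unit))                        # step 3
        segment = " ".join(sentences[marker:marker + count])      # step 4

        checks = []
        source_links = link_targets(unit)
        if source_links:                                          # step 5
            checks.append(source_links == link_targets(segment))
        if back_translate is not None:                            # step 6
            score = similarity(unit, back_translate(segment))
            checks.append(score >= threshold)

        # Confident only if at least one check ran and all of them passed.
        confident = bool(checks) and all(checks)
        aligned.append((unit, segment, confident))
        marker += count                                           # step 7
    return aligned


if __name__ == "__main__":
    units = ["Hello world.", "See [[Help:Foo|the help]]. It is good."]
    translated = "Hei maailma. Katso [[Help:Foo|ohje]]. Se on hieno."
    for unit, segment, confident in align(units, translated):
        print(confident, "|", unit, "=>", segment)
```

In this sketch a unit without links and without a machine translation callback is never marked as confident, which mirrors the note in step 5 that such units depend on step 6. A real implementation would also need to treat localized link targets (for example a language subpage of the same page) as pointing to the same page, which the exact comparison here does not do.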