Thread:Extension talk:CodeEditor/Box width and syntax highlight locations/reply (2)

I might be able to offer some background and pointers on this... but I've been wrong before, so take it for what it's worth.

The ProofreadPage extension is exclusively used on the Wikisource projects - specifically for side-by-side transcription of works that were originally published in print only and later scanned into document or image files (mostly the .PDF or .DjVu file formats).

I believe the width issue raised has to do with the customized $wgTextarea-to-thumbnail (the side-by-side part) "layout" in the Page: (ns-104) namespace. See an example HERE. Personally, I think that entire approach is probably riddled with problems and would do better if overhauled from scratch, but that's just my opinion based on all the time I've spent at Wikisource.

The XML issue is far more complicated. In a nutshell, both the PDF and DjVu document formats contain a "hidden" text layer underneath what amounts to a scanned image of a printed page of mostly (rich) text. Currently, this hidden text layer, when present, is automatically "dumped" into the editbox upon article creation in the Page: namespace. Nearly all of the formatting and detail is lost in the process, and what we are left with for transcribing is little more than plain text - usually generated from an OCR of the scanned pages.

Putting PDFs to the side for the moment & focusing on just DjVu files, it is believed there are ways to take that text layer and convert & parse it as XML, then modify an associated .DTD file, all to produce "dumps" that retain a lot, if not all, of that useful detail/formatting info (potentially saving huge amounts of effort wasted in [re]transcribing content).
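To give a rough idea of what "retaining the detail" could look like, here is a minimal Python sketch. It is not what ProofreadPage or Core does today. The sample XML is made up by me, loosely modeled on the kind of nested PARAGRAPH / LINE / WORD structure that DjVuLibre's XML export produces, and the function names (paragraphs_from_djvu_xml, to_wikitext) are hypothetical. The point is only that paragraph and line breaks survive if you walk the XML instead of flattening everything to plain text.

```python
import xml.etree.ElementTree as ET

# Made-up sample for illustration only. The element names and coords values
# are assumptions loosely modeled on a DjVu hidden-text XML export.
SAMPLE_XML = """\
<DjVuXML>
  <BODY>
    <OBJECT height="3000" width="2000">
      <HIDDENTEXT>
        <PAGECOLUMN>
          <REGION>
            <PARAGRAPH>
              <LINE>
                <WORD coords="100,200,300,150">CHAPTER</WORD>
                <WORD coords="320,200,380,150">I</WORD>
              </LINE>
            </PARAGRAPH>
            <PARAGRAPH>
              <LINE>
                <WORD coords="100,300,180,260">It</WORD>
                <WORD coords="200,300,280,260">was</WORD>
              </LINE>
              <LINE>
                <WORD coords="100,360,220,320">late.</WORD>
              </LINE>
            </PARAGRAPH>
          </REGION>
        </PAGECOLUMN>
      </HIDDENTEXT>
    </OBJECT>
  </BODY>
</DjVuXML>
"""


def paragraphs_from_djvu_xml(xml_text):
    """Return a list of paragraphs, each a list of line strings."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for para in root.iter("PARAGRAPH"):
        lines = []
        for line in para.iter("LINE"):
            # Join the words of one scanned line with single spaces.
            words = [(w.text or "").strip() for w in line.iter("WORD")]
            lines.append(" ".join(w for w in words if w))
        paragraphs.append(lines)
    return paragraphs


def to_wikitext(paragraphs):
    """Keep paragraph breaks as blank lines; merge lines within a paragraph."""
    return "\n\n".join(" ".join(lines) for lines in paragraphs)


if __name__ == "__main__":
    print(to_wikitext(paragraphs_from_djvu_xml(SAMPLE_XML)))
```

A plain text dump of the same page would give you the words but not the paragraph boundaries; here they come out as blank lines, which is what the editbox in the Page: namespace would want anyway. The coords attributes are ignored in this sketch, though in principle they could be used to guess at things like centered headings or indentation.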

I can't speak for Aubrey & co., but the problem starts with ancient code that skips the creation of the XML variant and goes straight to doing a simple text dump instead. I believe the files in question can be found in the git Core under .../includes/media/DjVu.php & /DjVuImage.php (both calling executables from sourceforge.net's DjVuLibre project & probably woefully out of date to boot).

Drop me a line at my talk page if the above wasn't enough to discourage you so far.