Help:Extension:ProofreadPage/2013 draft

= Proofread =

Proofreading is the reading of a galley proof or an electronic copy of a publication to detect and correct production errors of text or art. Proofreaders are expected to be consistently accurate by default because they occupy the last stage of typographic production before publication.

Proofreading produces the works on Wikisource from page scans. Page scans are normally DjVu or PDF files uploaded to Wikimedia Commons. Proofreading takes place in the Index and Page namespaces, and the proofread text is then transcluded into the main namespace. The proofreading process is split into phases, which are indicated by each page's page status. Wikisource has a style guide and formatting conventions that should be followed during proofreading so that our texts look correct and function properly. The proofreading function is provided by the ProofreadPage extension.

Users new to proofreading can experiment with the concept and test their abilities with the simple introductory tests on the Distributed Proofreaders website.

The Proofread of the Month (PotM) is a good place to start for people who want to learn how proofreading works on Wikisource. This project runs a new work each month and invites all users to take part.

Help

 * Page scans
 * Index pages
 * Page numbers
 * Page status
 * Formatting conventions
 * Transclusion

= Proofread Page extension =

The Proofread Page extension can render a book either as a column of OCR text beside a column of scanned images, or broken into its logical organization (such as chapters or poems) using transclusion.

The extension is intended to allow easy comparison of text to the original and allow rendering of a text in several ways without duplicating data. Since the pages are not in the main namespace, they are not included in the statistical count of text units.

The extension is installed on all Wikisource wikis. However, for this to work the editor's browser (and extensions such as NoScript) must allow script processing. Your Special:Preferences page (section "Gadgets") allows you to control certain features, such as whether the OCR button is enabled and whether the text by default appears side by side or one above another.

Anybody can proofread and correct most pages at Wikisource. However, editors must log into an account in order to change the proofread status; IP addresses cannot change it. When corrections and formatting are complete, the page is marked as proofread and is ready for the main namespace. Leave the page as 'not proofread' until it is done, and mark it as problematic if appropriate.

Creating your first page

 * Before following these steps, ensure you have followed the instructions in Using DjVu with MediaWiki.
 * Create a page in the "Page" namespace (or the internationalized name if you use a non-English wiki). For example, if your namespace is 'Page', create 'Page:Alice in Wonderland.djvu'
 * Create the corresponding file for this page: File:Alice in Wonderland.djvu
 * Create the index page 'Index:Alice in Wonderland.djvu'
 * To edit page 5 of the book navigate to 'Page:Alice_in_Wonderland.djvu/5' and click edit
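
The naming scheme used in the steps above can be sketched in Python. The helper names `index_title` and `page_title` are hypothetical and are not part of the extension; the sketch only assumes, as in the example above, that spaces in a page title may be written as underscores.

```python
def index_title(filename: str, namespace: str = "Index") -> str:
    """Title of the index page for a scan file (hypothetical helper)."""
    return f"{namespace}:{filename}"


def page_title(filename: str, page: int, namespace: str = "Page") -> str:
    """Title of one scanned page, with spaces written as underscores."""
    return f"{namespace}:{filename}/{page}".replace(" ", "_")


print(index_title("Alice in Wonderland.djvu"))
print(page_title("Alice in Wonderland.djvu", 5))
```

On a non-English wiki the `namespace` argument would be replaced by the local namespace name.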

OAI-PMH
Since 28904, the extension has an OAI-PMH API for index pages. This API is implemented in a new special page Special:ProofreadIndexOai using a basic OAI-PMH protocol with Simple Dublin Core (oai_dc) and Qualified Dublin Core (prp_qdc). This repository provides the data stored in index pages. [//wikisource.org/wiki/Special:ProofreadIndexOai?verb=ListRecords&metadataPrefix=prp_qdc Example in oldwikisource].
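
As a rough illustration, the request in the example link can be assembled with the Python standard library. The function name `oai_request_url` is made up for this sketch, and the `https` scheme is an assumption, since the link above is protocol-relative; `verb` and `metadataPrefix` are the query parameters that appear in that link.

```python
from urllib.parse import urlencode


def oai_request_url(base: str, verb: str, metadata_prefix: str) -> str:
    """Build an OAI-PMH request URL for the given special page (sketch)."""
    query = urlencode({"verb": verb, "metadataPrefix": metadata_prefix})
    return f"{base}?{query}"


# Assumed scheme: https. The host and page name come from the example link.
url = oai_request_url(
    "https://wikisource.org/wiki/Special:ProofreadIndexOai",
    "ListRecords",
    "prp_qdc",
)
print(url)
```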

Sets based on MediaWiki categories can be configured in Mediawiki:Proofreadpage_index_oai_sets, which contains a JSON array.

Text layer extraction from djvu file
DjVu files may contain a text layer, typically for the OCR text. This text is extracted when a page is edited for the first time, and added to the edit window.


 * Examples:


 * s:en:Page:Light waves and their uses.djvu/104 (the page was deleted for the purpose of the demonstration).
 * s:fr:Livre:Hugo - La Légende des siècles, 1e série, édition Hetzel, 1859, tome 2.djvu (click on pages)


 * Configuration:

The file description page might need to be purged if the djvu file was uploaded before the feature was added.

Configurable Headers and Footers
The default content of page headers and footers can be configured in Mediawiki:Proofreadpage_default_header and Mediawiki:Proofreadpage_default_footer.

In addition, this default value can be adapted to each book. For this, admins need to add 'header' and 'footer' fields to the index pages.

Proofreading path
ProofreadPage has five quality levels: without text, not proofread, problematic, proofread and validated.

The pagelist command
This command is used on index pages to display links to pages. The name of the index page must match the name of the DjVu file.

 where X, Y, Z, A, B are page numbers
 * Syntax:

The "from...to" parameters define an interval of pages.

The "AtoB" parameter applies a style to an interval of pages. Style parameters may also be applied to a single page. Available styles are "roman", "highroman" and "empty". Other strings are passed to the link. For instance, '1to10' denotes an interval and 11 a single page.

It is possible to define overlapping intervals, or to modify a single page within an interval.

Counters: if a numeric parameter is applied to a page number, it resets the page counter.
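
The rules above can be modelled with a short Python sketch. It is not the extension's code: the function names, the dictionary used to carry the parameters, and the rule that later parameters override earlier ones are all assumptions made for illustration.

```python
def to_roman(number: int) -> str:
    """Lower-case Roman numeral for a positive integer."""
    numerals = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
                (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
                (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i")]
    out = []
    for value, symbol in numerals:
        while number >= value:
            out.append(symbol)
            number -= value
    return "".join(out)


def parse_key(key: str):
    """'1to10' -> (1, 10); '11' -> (11, 11)."""
    if "to" in key:
        low, high = key.split("to")
        return int(low), int(high)
    return int(key), int(key)


def page_labels(first: int, last: int, params: dict) -> dict:
    """Label each scan page from first to last (assumed model)."""
    labels = {}
    offset = 0  # displayed number = scan position - offset
    for page in range(first, last + 1):
        style = None
        for key, value in params.items():
            low, high = parse_key(key)
            if not low <= page <= high:
                continue
            if value.isdigit() and low == high:
                offset = page - int(value)  # numeric value resets the counter
            else:
                style = value  # assumed: later parameters win
        number = page - offset
        if style is None:
            labels[page] = str(number)
        elif style == "roman":
            labels[page] = to_roman(number)
        elif style == "highroman":
            labels[page] = to_roman(number).upper()
        elif style == "empty":
            labels[page] = ""
        else:
            labels[page] = style  # other strings are used as the link text
    return labels


print(page_labels(1, 6, {"1to3": "roman", "4": "1"}))
```

In this model, the first three scans are numbered in Roman numerals and the fourth scan restarts the count at 1.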


 * Examples:
 * see here for an example
 * [//es.wikisource.org/w/index.php?title=%C3%8Dndice:Dar%C3%ADo_-_Eleven_Poems.djvu&diff=next&oldid=554168 see here for another example with Roman numerals first]

The pages command
This command transcludes a series of pages from an index. It also inserts links between pages, with the page numbers taken from the index page.

 * Syntax:
With DjVu indexes, parameters should be integers. With other indexes, parameters should be page names. Section transclusion is possible for the first and last page. Section transclusion can also be applied to all pages, but this cannot be combined with fromsection and tosection.


 * Options for transcluding multi-page books (with a .djvu or .pdf file):

 * step
 * Transclude only one page in every n. For example, a step of 2 starting at page 1 shows the 1st, 3rd, 5th, 7th and 9th pages.

 * exclude
 * Do not include the listed pages. For example, excluding pages 2 to 5 and page 9 from a ten-page range shows the 1st, 6th, 7th, 8th and 10th pages.

 * include
 * Include the listed pages. For example, including pages 2 to 5 and page 9 shows the 2nd, 3rd, 4th, 5th and 9th pages.

All of these attributes can be used on the same tag. For example, combining a step of 2 with an excluded page 3 and an included page 31 can show the 1st, 5th, 7th, 9th and 31st pages.
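
The page selections quoted above can be reproduced with a small Python model. The function name `select_pages` and the order of operations (step over the range first, then exclusions, then inclusions) are assumptions inferred from those examples, not a description of the extension's internals.

```python
def select_pages(start=None, end=None, step=1, exclude=(), include=()):
    """Return the sorted page numbers selected by the given options (sketch)."""
    pages = set()
    if start is not None and end is not None:
        # Keep one page in every `step`, counting from the first page.
        pages.update(range(start, end + 1, step))
    pages.difference_update(exclude)  # drop excluded pages
    pages.update(include)             # add explicitly included pages
    return sorted(pages)


print(select_pages(1, 10, step=2))
print(select_pages(1, 10, exclude=[2, 3, 4, 5, 9]))
print(select_pages(include=[2, 3, 4, 5, 9]))
print(select_pages(1, 10, step=2, exclude=[3], include=[31]))
```

The four calls correspond to the step, exclude, include and combined cases listed above.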

Note: Filename components need to be wrapped in "quotation marks" if they contain spaces, or else the spaces in the filenames need to be replaced with underscores (_). Quotation marks also must be used if the filename contains a non-ASCII character.
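
The quoting rule in this note can be expressed as a small Python helper. The name `index_attribute` and its `use_underscores` switch are hypothetical; the sketch simply encodes the two alternatives described above.

```python
def index_attribute(filename: str, use_underscores: bool = False) -> str:
    """Format a filename for use as an attribute value (sketch)."""
    if use_underscores and filename.isascii():
        # Alternative to quoting: replace spaces with underscores.
        return filename.replace(" ", "_")
    if " " in filename or not filename.isascii():
        # Spaces or non-ASCII characters require quotation marks.
        return f'"{filename}"'
    return filename


print(index_attribute("Alice in Wonderland.djvu"))
print(index_attribute("Alice in Wonderland.djvu", use_underscores=True))
```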

 * Configuration

The template Mediawiki:Proofreadpage_pagenum_template is inserted before each transcluded page. It is used to display page numbers, in the text or in the margin. It accepts two parameters: 'page' for the page and 'num' for the page number.

Note: This transclusion method inserts a space between all pages. Thus, it is not possible to divide a word across two pages and have it displayed correctly. The recommendation is not to divide words.

User options
The following option can be made available in the user's preferences, as a gadget:
 * Default layout of the edit window can be horizontal instead of vertical: en:MediaWiki:Gadget-pr layout.js

The following option is available in the user's preferences:
 * Show the headers/footers in the edit window (in Preferences/Editing). Name in software: proofreadpage-showheaders

Configuring index pages
Index pages can be configured by modifying two templates:
 * MediaWiki:Proofreadpage index template: this template defines how the index page is rendered.
 * MediaWiki:Proofreadpage index attributes: this template defines the list of fields in the edit form.

In addition, some fields of the index page can be passed to the headers/footers. They must be indicated in:
 * MediaWiki:Proofreadpage js attributes

For language interwiki
 * MediaWiki:Proofreadpage specialpage text

About journal issues and partial publication
It is not a good idea to create an index page for a few pages of a book, or for a few pages of a journal issue. Another person might create another index with other pages from the same journal issue, and might not know that another index already exists for the same book.

If you want to publish pages from a journal issue, please name the index after the journal, not after the author of the article you are publishing.

If you create a DjVu file, try to create a DjVu of the whole book/issue, even if you are planning to publish only a few pages from that issue. You should not worry that the index pages will look unfinished. Centralizing all the pages of a given book/journal issue will help users who publish excerpts from the same book/issue.

Headers and Navigation
The 'pages' command can generate headers automatically. For this the command must include a "header" parameter.

 * Example: fr:s:La Petite Dorrit/Tome 2/Chapitre 5

The header is defined in MediaWiki:Proofreadpage_header_template. It is a template that reads parameters extracted from the index page. In addition, it can provide navigation links through dedicated parameters.

In order to find the previous and next chapters, the index page is used as a Table of Contents. All links from the index page to the main namespace are interpreted as 'chapters', except the first one, which is expected to belong to the "title" field. (Note: if your wiki does not have an author namespace, this will not work, because the links to author/translator pages will be wrongly interpreted as chapters.)
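
The chapter lookup described above can be sketched in Python. The function name `neighbours` and the sample titles are invented for illustration; the only rule taken from the text is that the first main-namespace link belongs to the title and the remaining links are chapters in order.

```python
from typing import List, Optional, Tuple


def neighbours(index_links: List[str], current: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (previous, next) chapter for the current page, or None at the ends."""
    chapters = index_links[1:]  # the first link is the "title" field
    if current not in chapters:
        return None, None
    position = chapters.index(current)
    previous = chapters[position - 1] if position > 0 else None
    following = chapters[position + 1] if position + 1 < len(chapters) else None
    return previous, following


# Hypothetical main-namespace links, in the order they appear on an index page.
links = ["Book", "Book/Chapter 1", "Book/Chapter 2", "Book/Chapter 3"]
print(neighbours(links, "Book/Chapter 2"))
```

If author links sat in the main namespace, they would appear in `links` and shift the chapters, which is the failure described in the note above.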

All parameters defined in MediaWiki:Proofreadpage js attributes are passed to the header template. Additionally, any named parameter can be passed to this template through the pages command; such parameters need to be handled by the template. The same mechanism can be used to override a parameter value, so that the default value the extension gets from the Index page is not used.

Page numbers are also available to the template.

Finally, the value assigned to the "header" parameter is available to the template. This can be combined with parser functions in order to define several styles of headers.

The extension makes a special case for a call to the pages command without from and to parameters: in this case the header is assigned to toc and the TOC is transcluded from the Index: page.

Proofreading status indicator
A coloured proofreading status indicator is displayed in the main namespace, under the title of pages that use transclusion. It shows the proofreading status of the transcluded pages from the "Page" namespace. Here is how it looks:



In this example the text is 40% validated, 30% proofread, 25% raw, and 5% of the transcluded pages are problematic. It also contains a (hidden) backlink to the index page, that can be captured by local javascript.
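
The percentages in this example can be derived from per-page statuses as in the following Python sketch. The function name `status_percentages` and the sample of twenty pages are hypothetical; the status names are the ones used in the sentence above.

```python
from collections import Counter


def status_percentages(statuses):
    """Share of each proofreading status among the transcluded pages, in percent."""
    counts = Counter(statuses)
    total = len(statuses)
    return {status: 100 * count / total for status, count in counts.items()}


# Hypothetical set of 20 transcluded pages.
pages = ["validated"] * 8 + ["proofread"] * 6 + ["raw"] * 5 + ["problematic"] * 1
print(status_percentages(pages))
```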

This indicator is defined by a system message, which can be configured by admins: Mediawiki:Proofreadpage_quality_template

In the Swedish Wikisource, similar bar graphs are also generated by the template Statusstapel, for example on s:sv:Wikisource:Statistik.

Special:IndexPages
This special page lists index pages and their proofreading status. Index pages that were created before the introduction of this feature need to be purged in order to be displayed in the list.

Pages are ordered using the following criterion: 2*(#validated) + (#proofread). This is intended to reflect the number of proofreading actions. In the future more options will be available.
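
The ordering criterion can be written out as a short Python sketch. The index names and page counts are invented, and sorting from the highest score down is an assumption; only the formula itself comes from the text above.

```python
def proofreading_score(validated: int, proofread: int) -> int:
    """Criterion from the text: 2*(#validated) + (#proofread)."""
    return 2 * validated + proofread


# Hypothetical index pages: (title, validated pages, proofread pages).
indexes = [
    ("Index:A.djvu", 10, 5),
    ("Index:B.djvu", 3, 30),
    ("Index:C.djvu", 0, 12),
]

# Assumed direction: most proofreading actions first.
ranked = sorted(indexes, key=lambda row: proofreading_score(row[1], row[2]), reverse=True)
for title, validated, proofread in ranked:
    print(title, proofreading_score(validated, proofread))
```

A validated page counts twice because it has been through two proofreading actions: one to mark it proofread and one to validate it.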