Help:Extension:ProofreadPage/2013 draft

From mediawiki.org

Proofread

Proofreading is the reading of a galley proof or an electronic copy of a publication to detect and correct production errors of text or art. Proofreaders are expected to be consistently accurate by default because they occupy the last stage of typographic production before publication.

Proofreading produces the works on Wikisource from page scans. Page scans are normally DjVu or PDF files, which are uploaded to Wikimedia Commons. Proofreading takes place in the Index and Page namespaces before the text is transcluded into the main namespace. The proofreading process is split into different phases, which are indicated by each page's page status. Wikisource has a style guide and certain formatting conventions that should be used during proofreading to make sure that our texts look correct and function properly. This proofreading function is provided by the ProofreadPage extension.

Users new to proofreading can experiment with the concept and test their abilities with the simple introductory tests on the Distributed Proofreaders website.

The Proofread of the Month (PotM) is a good place to start for people who want to learn how proofreading works on Wikisource. This project runs a new work each month and invites all users to take part.

Help

Proofread Page extension

The Proofread Page extension can render a book either as a column of OCR text beside a column of scanned images, or broken into its logical organization (such as chapters or poems) using transclusion.

The extension is intended to allow easy comparison of text to the original and allow rendering of a text in several ways without duplicating data. Since the pages are not in the main namespace, they are not included in the statistical count of text units.

The extension is installed on all Wikisource wikis. However, for this to work the editor's browser (and extensions such as NoScript) must allow script processing. Your Special:Preferences page (section "Gadgets") allows you to control certain features, such as whether the OCR button is enabled and whether the text by default appears side by side or one above another.

Anybody can proofread and correct most pages at Wikisource. However, editors must log in to an account in order to change the proofread status; edits from IP addresses cannot change this status. When corrections and formatting are complete, mark the page as proofread; it is then ready for the main namespace. Leave the page as 'not proofread' until it is done, and mark it as problematic if appropriate.

How to use it

Creating your first page

  • Before following these steps, ensure you have followed the instructions in Using DjVu with MediaWiki.
  • Create a page in the "Page" namespace (or the internationalized name if you use a non-English wiki). For example, if your namespace is 'Page', create 'Page:Alice in Wonderland.djvu'.
  • Create the corresponding file for this page: 'File:Alice in Wonderland.djvu'.
  • Create the index page 'Index:Alice in Wonderland.djvu'.
  • To edit page 5 of the book, navigate to 'Page:Alice_in_Wonderland.djvu/5' and click edit.
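The naming pattern used in the steps above can be summarised with a small Python sketch. The function name `proofread_titles` and its parameters are hypothetical and exist only for illustration; the namespace names default to the English ones and should be replaced on a localized wiki.

```python
def proofread_titles(file_name, page_number, page_ns="Page", index_ns="Index"):
    """Return the wiki titles involved in proofreading one page of a scan."""
    return {
        "file": f"File:{file_name}",
        "index": f"{index_ns}:{file_name}",
        # Individual scan pages are subpages of the file name in the Page namespace.
        "page": f"{page_ns}:{file_name}/{page_number}",
    }


titles = proofread_titles("Alice in Wonderland.djvu", 5)
print(titles["page"])
```

The file, index and page titles all share the same base name, which is what lets the extension associate them.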


Configuration

Configuration of index namespace

For more details, see Extension:Proofread Page/Index data configuration

The configuration is a JSON array of properties. The structure of one property is shown below. All parameters are optional; the values shown are the defaults:

{
  "ID": { // ID of the metadata field (first parameter of proofreadpage_index_attributes)
    "type": "string", // the property type. Possible values: string, number, page. For compatibility reasons, existing values do not have to be of this type, but if the type is set, newly entered values should be valid for it (e.g. a valid number for a number, an existing wiki page for a page)
    "size": 1, // only for the type string: number of lines of the input field (third parameter of proofreadpage_index_attributes)
    "values": {"a":"A", "b":"B", "c":"C", "d":"D"}, // a map of value: label pairs that lists the possible values (for compatibility reasons, stored values do not have to be one of these)
    "default": "", // the default value
    "header": false, // add the property to the MediaWiki:Proofreadpage_header_template template (true is equivalent to being listed in proofreadpage_js_attributes)
    "label": "ID", // the label in the form (second parameter of proofreadpage_index_attributes)
    "help": "", // a short help text
    "delimiter": [], // list of delimiters between the parts of a value, for example ["; ", " and "] for strings like "J. M. Dent; E. P. Dutton and A. D. Robert"
    "data": "" // the ProofreadPage metadata type that the property is equivalent to
  }
}

The data parameter accepts one of the following values: "type", "language", "title", "author", "translator", "illustrator", "editor", "school", "year", "publisher", "place", "progress"
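To illustrate the "delimiter" parameter described above, here is a minimal Python sketch of how a stored value might be split into its parts. The function `split_values` is hypothetical and is not the extension's own code; it only reproduces the example given in the configuration comments.

```python
import re


def split_values(value, delimiters):
    """Split a stored value on any of the configured delimiters."""
    if not delimiters:
        return [value]
    # Try longer delimiters first so that a long delimiter is not cut short
    # by a shorter one that happens to be its prefix.
    ordered = sorted(delimiters, key=len, reverse=True)
    pattern = "|".join(re.escape(d) for d in ordered)
    return [part for part in re.split(pattern, value) if part]


print(split_values("J. M. Dent; E. P. Dutton and A. D. Robert", ["; ", " and "]))
```

With the delimiters from the example, the publisher string is split into three separate names. An empty delimiter list leaves the value untouched.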

Page separator

The extension puts a separator between each transcluded page and the next, which is defined by wgProofreadPagePageSeparator. The default value is a single whitespace character. Set wgProofreadPagePageSeparator = "" to suppress the separator.

Join hyphenated words across pages

When a word is hyphenated between a page and the next, the extension joins the two halves of the word together. For example, a page ending in "his-" followed by a page starting with "tory" is rendered as "history". The "joiner" character is defined by wgProofreadPagePageJoiner and defaults to '-' (the ASCII hyphen character).
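The interaction between the page separator and the joiner can be sketched in Python as follows. This is only a simplified model under stated assumptions: the function `join_pages` is hypothetical, it works on plain strings rather than rendered pages, and it assumes that a page ending with the joiner character is glued to the next page without a separator.

```python
def join_pages(pages, separator=" ", joiner="-"):
    """Concatenate page texts, joining words hyphenated across a page break."""
    result = ""
    for position, text in enumerate(pages):
        if position == 0:
            result = text
        elif joiner and result.endswith(joiner):
            # Drop the joiner and glue the two halves of the word together.
            result = result[: -len(joiner)] + text
        else:
            result += separator + text
    return result


print(join_pages(["a short his-", "tory of the world"]))
print(join_pages(["first page", "second page"], separator=""))
```

The first call joins "his-" and "tory" into "history"; the second shows the effect of an empty separator.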

Configure change tagging (optional)

See Change tagging to set up change tags.

Since gerrit:28904, the extension has an OAI-PMH API for index pages. This API is implemented in a new special page Special:ProofreadIndexOai using a basic OAI-PMH protocol with Simple Dublin Core (oai_dc) and Qualified Dublin Core (prp_qdc). This repository provides the data stored in index pages. Example in oldwikisource.

Sets based on MediaWiki categories can be configured in MediaWiki:Proofreadpage_index_oai_sets, which contains a JSON array like:

{
  "test": { // spec of the set, i.e. its ID
    "name": "Test", // the set name
    "category": "tests_list", // the category to use, without the "Category:" prefix
    "description": "A test set." // description of the set, optional
  }
}
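As an illustration of how such a set could be queried, the following Python sketch builds request URLs using the standard OAI-PMH verbs ListSets and ListRecords. The base URL is a hypothetical placeholder and must be replaced with the address of Special:ProofreadIndexOai on your wiki; the helper `oai_request` is also hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical base URL: replace with the Special:ProofreadIndexOai URL of your wiki.
BASE_URL = "https://wiki.example.org/wiki/Special:ProofreadIndexOai"


def oai_request(verb, **arguments):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    query = {"verb": verb}
    query.update(arguments)
    return BASE_URL + "?" + urlencode(query)


# List the configured sets, then the records of the "test" set as Simple Dublin Core.
print(oai_request("ListSets"))
print(oai_request("ListRecords", metadataPrefix="oai_dc", set="test"))
```

The set argument is the spec of the set, that is, the key used in the JSON configuration above.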


Text layer extraction from DjVu file

DjVu files may contain a text layer, typically for the OCR text. This text is extracted when a page is edited for the first time, and added to the edit window.

The file description page might need to be purged if the djvu file was uploaded before the feature was added.

Configurable Headers and Footers

The default content of page headers and footers can be configured in MediaWiki:Proofreadpage_default_header and MediaWiki:Proofreadpage_default_footer.

In addition, this default value can be adapted to each book. For this, admins need to add 'header' and 'footer' fields to the index pages.

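Proofreading path

ProofreadPage has five quality levels:

  • Without text
  • Not proofread
  • Problematic
  • Proofread
  • Validated

A page that has not yet been created normally progresses from Not proofread to Proofread to Validated. Without text and Problematic are alternative statuses.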

The <pagelist/> command

This command is used on index pages to display links to pages. The name of the index page must match the name of the DjVu file.

Syntax
<pagelist from=X to=Y Z=foo AtoB=bar />

where X, Y, Z, A, B are page numbers

The "from...to" parameters define an interval of pages. Example:

<pagelist from=10 to=100 />

The "AtoB" parameter applies a style to an interval of pages. Style parameters may also be applied to a single page. Available styles are "roman", "highroman" and "empty". Any other string is passed to the link as its label. Example:

<pagelist 1to10="roman" 11="Foreword"/>

In this example, '1to10' is an interval, and 11 is a single page.

It is possible to define overlapping intervals, or to modify a single page within an interval. Example:

<pagelist 1to5="empty" 3to10="roman"  />

Counters: if a numeric parameter is applied to a page number, it resets the page counter at that page. Example:

<pagelist 1to10="roman" 11=1 />
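The numbering rules of the pagelist command can be modelled with the Python sketch below. It is not the extension's implementation: the functions `to_roman` and `page_labels` are hypothetical, and where several style parameters cover the same page the sketch assumes that the last one listed wins, which the text above does not specify.

```python
def to_roman(number):
    """Convert a positive integer to an upper-case Roman numeral."""
    symbols = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
               (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
               (5, "V"), (4, "IV"), (1, "I")]
    result = ""
    for value, symbol in symbols:
        while number >= value:
            result += symbol
            number -= value
    return result


def page_labels(page_count, params):
    """Return the displayed label of each physical page, from 1 to page_count.

    `params` maps "N" or "AtoB" keys to a style name, a counter value or a
    label, mirroring the attributes of a pagelist tag.
    """
    labels = []
    offset = 0  # physical page number minus displayed number
    for page in range(1, page_count + 1):
        style = None
        for key, value in params.items():
            first, _, last = key.partition("to")
            if int(first) <= page <= int(last or first):
                if str(value).isdigit():
                    # A numeric value resets the counter at the first page it covers.
                    if page == int(first):
                        offset = page - int(value)
                else:
                    style = value  # assumption: the last matching parameter wins
        number = page - offset
        if style is None:
            labels.append(str(number))
        elif style == "roman":
            labels.append(to_roman(number).lower())
        elif style == "highroman":
            labels.append(to_roman(number))
        elif style == "empty":
            labels.append("")
        else:
            labels.append(style)  # any other string is used as the label itself
    return labels


# Equivalent of: pagelist 1to10="roman" 11=1
print(page_labels(12, {"1to10": "roman", "11": "1"}))
```

In this model the first ten pages are numbered i to x, and the counter restarts at 1 on page 11.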

The <pages/> command

This command transcludes a series of pages from an index. It also inserts links between pages, with the page numbers taken from the index page.

Syntax

With DjVu indexes, parameters should be integers:

<pages index="foo.djvu" from=100 to=200 />

With other indexes, parameters should be page names:

<pages index=foo from=foo_page1.jpg to=foo_page15.jpg />

Section transclusion is possible for the first and last page:

<pages index="foo.djvu" from=100 to=200 fromsection=section2 tosection=section1 />

Section transclusion can be applied to all pages too (this cannot be combined with fromsection and tosection):

<pages index="foo.djvu" from=100 to=200 onlysection=english />

Options to improve transclusion of multi-page books (with a .djvu or .pdf file):

step
Transclude only one page in every n. For example, <pages from=1 to=10 step=2 /> shows the 1st, 3rd, 5th, 7th and 9th pages.
exclude
Do not include the listed pages. For example, <pages from=1 to=10 exclude="2-5,9" /> shows the 1st, 6th, 7th, 8th and 10th pages.
include
Include the listed pages. For example, <pages include="2-5,9" /> shows the 2nd, 3rd, 4th, 5th and 9th pages.

All of these attributes can be combined on the same tag. For example, <pages from=1 to=10 include="31" exclude="2-4" step="2" /> shows the 1st, 5th, 7th, 9th and 31st pages.
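The way these attributes combine can be sketched in Python. The functions `parse_ranges` and `select_pages` are hypothetical, and the order of operations (step over the from/to range, then exclude, then include) is an assumption chosen because it reproduces the four examples above; the extension itself may proceed differently. The parameters are called `start` and `end` because `from` is a reserved word in Python.

```python
def parse_ranges(spec):
    """Expand a list such as "2-5,9" into [2, 3, 4, 5, 9]."""
    pages = []
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        pages.extend(range(int(first), int(last or first) + 1))
    return pages


def select_pages(start=None, end=None, step=1, include="", exclude=""):
    """Return the sorted page numbers selected by the given attributes."""
    selected = set()
    if start is not None and end is not None:
        # Keep one page in every `step` pages of the from/to interval.
        selected.update(range(start, end + 1, step))
    if exclude:
        selected.difference_update(parse_ranges(exclude))
    if include:
        selected.update(parse_ranges(include))
    return sorted(selected)


print(select_pages(1, 10, step=2, include="31", exclude="2-4"))
```

Calling the function with the attribute values of each example above yields the page lists stated in the text.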

Note: Filename components need to be wrapped in "quotation marks" if they contain spaces, or else the spaces in the filenames need to be replaced with underscores (_). Quotation marks also must be used if the filename contains a non-ASCII character.

Configuration

The template MediaWiki:Proofreadpage_pagenum_template is inserted before each transcluded page. It is used to display page numbers, in the text or in the margin. It accepts two parameters: 'page' for the page and 'num' for the page number.

Note: This transclusion method inserts a space between all pages. Thus, it is not possible to divide a word across two pages and have it displayed correctly. The recommendation is not to divide words.

User options

The following options can be made available in the user's preferences, as gadgets:

The following option is available in the user's preferences:

  • Show the headers/footers in the edit window (in Preferences/Editing). Name in software : proofreadpage-showheaders

Configuring index pages

Index pages can be configured by modifying two templates:

In addition, some fields of the index page can be passed to the headers/footers. They must be indicated in

For language interwiki

About journal issues and partial publication

It is not a good idea to create an index page for a few pages of a book, or for a few pages of a journal issue. Another person might create another index with other pages from the same journal issue, and might not know that another index already exists for the same book.

If you want to publish pages from a journal issue, please name the index after the journal, not after the author of the article you are publishing.

If you create a DjVu file, try to create a DjVu of the whole book or issue, even if you are planning to publish only a few pages from it. You should not worry that the index pages will look unfinished. Centralizing all the pages of a given book or journal issue will help users who publish excerpts from the same book/issue.

Headers and Navigation

The 'pages' command can generate headers automatically. For this the command must include a "header" parameter.

Example

fr:s:La Petite Dorrit/Tome 2/Chapitre 5

The header is defined in MediaWiki:Proofreadpage_header_template. It is a template that reads parameters extracted from the index page. In addition, it can provide navigation links with the following parameters:

{{{prev}}}, {{{current}}}, {{{next}}}

In order to find the previous and next chapters, the index page is used as a table of contents. All links from the index page to the main namespace are interpreted as 'chapters', except the first one, which is expected to belong to the "title" field. (Note: if your wiki does not have an author namespace, this will not work, because the links to author/translator pages will be wrongly interpreted as chapters.)

All parameters defined in MediaWiki:Proofreadpage_js_attributes are passed to the header template. Additionally, you can pass any named parameter to this template with <pages index="..." my_parameter=value />; such parameters need to be handled by the template. The same mechanism can be used to override a parameter value: for example, <pages index="..." Author=value /> replaces the default value that the extension takes from the Index page.

Page numbers are also available:

{{{from}}}, {{{to}}}

Finally, the value assigned to the "header" parameter is available as:

{{{value}}}

This can be combined with parser functions, in order to define several styles of headers.

The extension treats a call to <pages index="" /> without from and to parameters as a special case: {{{value}}} is set to toc and the table of contents is transcluded from the Index: page.

Proofreading status indicator

A coloured proofreading status indicator is displayed in the main namespace, under the title of pages that use transclusion. It shows the proofreading status of the transcluded pages from the "Page" namespace. Here is what it looks like:

[Image: example proofreading status indicator bar]

In this example the text is 40% validated, 30% proofread, 25% raw, and 5% of the transcluded pages are problematic. It also contains a (hidden) backlink to the index page, that can be captured by local javascript.
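The percentages in such an indicator follow directly from the number of transcluded pages at each status. The Python sketch below shows the arithmetic; the function `quality_percentages` and the page counts are hypothetical and chosen only to reproduce the figures quoted above.

```python
def quality_percentages(counts):
    """Convert per-status page counts into whole-number percentages."""
    total = sum(counts.values())
    if total == 0:
        return {status: 0 for status in counts}
    # Rounding each share independently means the results may not sum to exactly 100.
    return {status: round(100 * n / total) for status, n in counts.items()}


# Hypothetical chapter made of 20 transcluded pages.
counts = {"validated": 8, "proofread": 6, "raw": 5, "problematic": 1}
print(quality_percentages(counts))
```

With 20 pages split 8, 6, 5 and 1, the shares are 40%, 30%, 25% and 5%.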

This indicator is defined by the system message MediaWiki:Proofreadpage_quality_template, which admins can configure.

In the Swedish Wikisource, similar bar graphs are also generated by the template Statusstapel, for example on s:sv:Wikisource:Statistik.

Special:IndexPages

This special page lists index pages and their proofreading status. Index pages that were created before the introduction of this feature need to be purged in order to be displayed in the list.

Pages are ordered using the following criterion: 2*(#validated) + (#proofread). This is intended to reflect the number of proofreading actions. In the future more options will be available.
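The ordering criterion can be illustrated with a short Python sketch. The index titles and page counts are invented for the example, and sorting in descending order (most proofreading actions first) is an assumption, since the text above does not state the direction.

```python
def proofreading_score(index):
    """Score an index page: a validated page counts twice, a proofread page once."""
    return 2 * index["validated"] + index["proofread"]


# Hypothetical index pages with their numbers of validated and proofread pages.
indexes = [
    {"title": "Index:A.djvu", "validated": 10, "proofread": 5},  # score 25
    {"title": "Index:B.djvu", "validated": 0, "proofread": 30},  # score 30
    {"title": "Index:C.djvu", "validated": 12, "proofread": 0},  # score 24
]

ranked = sorted(indexes, key=proofreading_score, reverse=True)
print([index["title"] for index in ranked])
```

A validated page has been through two proofreading actions, which is why it is weighted twice as much as a page that is only proofread.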


See also