Project:PD help/export

This page discusses the process for creating, structuring and maintaining the Help: namespace on MediaWiki.org, and in particular for exporting it in a form that other wikis can use. This is a proposed solution, and feedback is encouraged. There are no doubt other methods we could use: please use the talk page to discuss alternatives, or create new pages if you want to lay out proposals in detail (rather than placing alternative solutions on the same page).

As it will still be a while before a robust export process is available, you may want to mirror the current Help: namespace in the meantime. This is less flexible, but it is a good interim way of getting a Help: namespace up and running.

Goals

 * To provide public domain help content for the MediaWiki software
 * The help should be available in as many languages as possible
 * Our mechanism should be able to scale to hundreds of languages.
 * To have a simple set of guidelines for people creating the help content
 * As few rules as possible.
 * To automate the process of converting the on-wiki help content into downloadable files, ready for import.
 * A secondary aim might be to have an automated script to import the files too.
 * To make the help content available in the following forms:
   * Single language, in the main (localised) Help: namespace.
   * All languages combined (a mirror of Help: on MW.org).
   * Multiple arbitrary languages, one in the main Help: namespace, the rest as sub-pages (as per MW.org).
   * With or without images (though this could be tricky).

Rules for the Help: namespace
It has been proposed that the following small set of rules is used when writing help documentation within the Help: namespace. These are separate from, and in addition to, any editorial or stylistic rules that may also be adopted.
 * All main pages contain English content only.
 * All non-English pages are sub-pages of the English equivalent, e.g. Help:Name (English) and Help:Name/fr (French).
 * Help pages may not contain sub-pages that are not language sub-pages.
 * Language sub-pages will be named using the Wikipedia prefix for that language.
 * If a Wikipedia has not been started in a language then we should not host help pages in that language.
 * All links within the Help: namespace should be to pages in the same language even if that page doesn't exist yet.
 * Links to other pages on the wiki (non-PD Help) are allowed (but are discouraged).
 * Interwiki links are allowed (but are discouraged).
 * Links to external sites are allowed (but are discouraged).
 * A template will be created to perform the following functions; it should be placed at the top of each page (the template will not be created until this whole process is finalised):
   * Display the PD help notice.
   * Create links to all other language versions of the page.
   * Add the page to the appropriate help category.
   * Display the 'correct' page title (what it would be called in that language). This will be passed as an argument to the template.
 * Any templates that will be required in the exported help pages must be defined in the Help: namespace (and should be in the format.
 * Any templates that are only used on MW.org and which should not be exported should be placed in the standard Template namespace.
 * Help: pages may be placed in categories. Category names that begin 'Help:' will be exported, any others will not.
 * The rules here also apply to all categories that begin 'Help:'.
 * No extension-specific markup (including from the ParserFunctions extension) should be included in the Help: namespace.
 * All wiki markup and other MW features used in the Help: namespace must work on the three most recent major versions (e.g. if the last release was 1.7.1, all pages should work on 1.7.x, 1.6.x and 1.5.x).
 * Wiki text or features that work on these versions but not on older ones are discouraged, but allowed.
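
The naming rules above are mechanical enough to check automatically. The Python sketch below assumes that a title is either Help:Name (English) or Help:Name/xx, where xx is a language prefix. KNOWN_LANGS and split_help_title are hypothetical names, and the language set is a small placeholder, not the real list of Wikipedia prefixes.

```python
# Hypothetical placeholder for the list of Wikipedia language prefixes.
KNOWN_LANGS = {"de", "es", "fr", "ja", "nl", "pl", "pt", "ru", "zh"}


def split_help_title(title: str) -> tuple[str, str]:
    """Return (base name, language code) for a Help: page title.

    Main pages are English; a trailing '/xx' marks a translation.
    Raises ValueError for titles that break the naming rules.
    """
    if not title.startswith("Help:"):
        raise ValueError(f"not in the Help: namespace: {title}")
    name = title[len("Help:"):]
    base, sep, suffix = name.rpartition("/")
    if not sep:
        return name, "en"  # no sub-page: English main page
    if suffix == "en":
        raise ValueError(f"English content belongs on the main page: {title}")
    if suffix not in KNOWN_LANGS:
        raise ValueError(f"sub-page is not a language sub-page: {title}")
    if "/" in base:
        raise ValueError(f"non-language sub-pages are not allowed: {title}")
    return base, suffix
```

A check like this could run over the whole namespace before each export, so that rule violations are reported instead of silently producing odd titles in the dumps.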

Dumps
The dumps will be in the standard MW export format. The following dumps will be available:
 * A single dump containing all languages. This will mirror the current Help: namespace.
 * Individual dumps for each language, ready to be imported to the main Help: namespace.
 * Individual dumps for each language, ready to be imported into appropriate sub-pages within the Help: namespace.

We also need to consider how images are handled.

Exporting the data
Exporting the data will be an automated process that creates the above dumps from the pages in the Help: namespace. The format of the exported code is already defined (it is the standard export format generated by Special:Export). The program checks every page in the namespace and adds it to the appropriate language file, applying the following modifications to the wiki text:


 * All template inclusions that do not start with Help: are removed.
 * All interwiki links are expanded to full URLs, using the data in the interwiki table.
 * All internal links that do not point to the Help: namespace are rewritten as full URLs pointing to MW.org
 * All internal links within the Help: namespace are left as they are, with the following exceptions:
   * If exporting the English pages as sub-pages, links are rewritten from Help:Name to Help:Name/en.
   * If exporting non-English pages as main pages, links are rewritten from Help:Name/lang to Help:Name.
   * The same translation is performed on template inclusions within the Help: namespace.
   * Help pages that link to other-language pages are reported as warnings in the log, but these links are not modified.
 * Category links that do not begin 'Category:Help:' are removed.
 * Inclusions of our special template (which will have a pre-defined name) will be changed to use an alternative template that simply displays the 'correct' name of the page.
   * These names have not been finalised, and there will be multiple versions of the alternative template (one for each language).
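
The link, template and category modifications listed above could be applied with a few regular expressions. This Python sketch assumes simple wikitext (no nested templates), uses a hypothetical one-entry INTERWIKI dict in place of the real interwiki table (whose URLs contain a $1 placeholder for the page name), and leaves out the special-template substitution because its name has not been chosen. The mode parameter is also hypothetical: "main" for a single language imported into the main Help: namespace, "subpage" for a single language imported as sub-pages, and "mirror" for the unchanged MW.org layout.

```python
import re

# Base URL for links that leave the Help: namespace.
MW_ORG = "https://www.mediawiki.org/wiki/"
# Hypothetical stand-in for the interwiki table: prefix -> URL with $1.
INTERWIKI = {"wikipedia": "https://en.wikipedia.org/wiki/$1"}

LINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)(\|[^{}]*)?\}\}")


def rewrite_wikitext(text, page_lang, mode, warnings):
    """Apply the export modifications to one page's wikitext.

    Cross-language Help: links are logged in `warnings` and left unchanged.
    """

    def help_target(target):
        base, sep, lang = target[len("Help:"):].partition("/")
        lang = lang if sep else "en"
        if lang != page_lang:
            warnings.append(f"link to other-language page: {target}")
            return target
        if mode == "main":
            return f"Help:{base}"
        if mode == "subpage":
            return f"Help:{base}/{lang}"
        return target  # "mirror": keep the MW.org layout

    def template(m):
        name, args = m.group(1).strip(), m.group(2) or ""
        if not name.startswith("Help:"):
            return ""  # only Help: templates are exported
        return "{{" + help_target(name) + args + "}}"

    def link(m):
        target, label = m.group(1).strip(), m.group(2)
        if target.startswith("Category:"):
            # Keep Category:Help:... links, drop every other category.
            return m.group(0) if target.startswith("Category:Help:") else ""
        if target.startswith("Help:"):
            pipe = "" if label is None else "|" + label
            return "[[" + help_target(target) + pipe + "]]"
        prefix, _, rest = target.partition(":")
        if prefix.lower() in INTERWIKI:
            url = INTERWIKI[prefix.lower()].replace("$1", rest.replace(" ", "_"))
        else:
            # Any other internal link becomes a full URL on MW.org.
            url = MW_ORG + target.lstrip(":").replace(" ", "_")
        return f"[{url} {label or target}]"

    text = TEMPLATE_RE.sub(template, text)
    return LINK_RE.sub(link, text)
```

Templates are processed before links, so a link inside a removed template's arguments disappears with the template rather than being rewritten.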

A dump is made for each language as a 'main' page and as a 'sub-page'. In addition, the English 'main' pages and all the other-language sub-pages are combined into the single complete dump.
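
To make the relationship between the dumps concrete, here is a Python sketch that works out, for each source page on MW.org, the title it would take in each dump. It assumes the naming rules hold (English on the main page, other languages as /xx sub-pages); plan_dumps is a hypothetical name, and dump labels such as "fr-main" and "all" are illustrative, not agreed file names.

```python
from collections import defaultdict


def plan_dumps(titles):
    """Map each dump label to a list of (source title, title in dump) pairs."""
    dumps = defaultdict(list)
    for title in titles:
        base, sep, lang = title[len("Help:"):].partition("/")
        lang = lang if sep else "en"
        main_title = f"Help:{base}"
        sub_title = f"Help:{base}/{lang}"
        # Per-language dumps: one for the main namespace, one as sub-pages.
        dumps[f"{lang}-main"].append((title, main_title))
        dumps[f"{lang}-subpage"].append((title, sub_title))
        # Complete dump: English main pages plus all other-language sub-pages,
        # i.e. the same titles as on MW.org.
        dumps["all"].append((title, main_title if lang == "en" else sub_title))
    return dict(dumps)
```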

The following points should also be noted:


 * is not exported.
 * Only the most recent version of a page is exported - the history is not exported.
 * Blank pages are not exported.
 * Redirects to pages outside the Help: namespace are not exported (these are generally left over from page moves).
 * Useful redirects within the Help: namespace will be kept so that they can be exported too. Redirects that result from a page move and are not useful should be deleted.
 * We should standardise soft redirects so that the scripts can recognise them as well. (Please expand on this...)
 * Page author is 'MediaWiki Default' (as per default template messages).
 * The edit summary is 'Imported from MWURL', where MWURL is a clickable link to the original page source.
   * Note that this assumes the required entry is in the interwiki table, since we need the syntax to make the link.
 * The edit date is the date of export (though ideally it will be the date of import to the target wiki).
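
The skip rules and revision metadata above could be produced with Python's standard xml.etree.ElementTree module. This sketch writes only the page/revision elements used by Special:Export; it is an assumption that this subset is enough for import, and the root element's schema namespace and version attributes are deliberately left out (they should be copied from a real Special:Export file). The clickable source link is passed in by the caller rather than guessed, and should_export and add_page are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# "#REDIRECT [[Target]]" at the start of the page, any capitalisation.
REDIRECT_RE = re.compile(r"\s*#REDIRECT\s*\[\[([^\]|]+)", re.IGNORECASE)


def should_export(text):
    """Skip blank pages and redirects that point out of Help:."""
    if not text.strip():
        return False
    m = REDIRECT_RE.match(text)
    if m and not m.group(1).strip().startswith("Help:"):
        return False
    return True


def add_page(root, title, text, source_link, now=None):
    """Append one <page> holding only its latest revision."""
    if now is None:
        now = datetime.now(timezone.utc)  # edit date = date of export
    page = ET.SubElement(root, "page")
    ET.SubElement(page, "title").text = title
    rev = ET.SubElement(page, "revision")
    ET.SubElement(rev, "timestamp").text = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    contributor = ET.SubElement(rev, "contributor")
    ET.SubElement(contributor, "username").text = "MediaWiki Default"
    ET.SubElement(rev, "comment").text = f"Imported from {source_link}"
    ET.SubElement(rev, "text", {"xml:space": "preserve"}).text = text
    return page
```

Because history is not exported, each page gets exactly one revision element, so a re-export simply replaces the previous file rather than accumulating revisions.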

All categories that begin 'Help:' are also exported, using the same rules as above.

Native language for page title
It would be good to have the main pages named in the appropriate language when imported into the new wiki. For this to be feasible, we will also need an import script that performs whatever conversions are required. Without such a script we should stick to the page-naming method above; otherwise it will not be possible to 'add' a language to a wiki. However, once the import script is implemented, it should be possible to achieve a fully flexible naming system on target wikis without altering the layout of MW.org.

Vandalism
If we are automating the dump process, we probably need some way of flagging 'safe' (non-vandalised) copies of the help content. We should not be hosting dumps that contain vandalism: providing downloadable help files that say "FSDF YOUR GAY" would damage our reputation. This could be done using the patrolled edits mechanism (with only 'trusted' users able to approve edits), by installing a review/validation extension, or with a custom method. For example, the export script could take the most recent version of each page saved by someone on a list of 'trusted' users; every change by a 'non-trusted' user would then need a subsequent (possibly 'null') edit by a 'trusted' user before the latest version is accepted. There may be other solutions to this, so suggestions are welcome.
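
The custom option, taking the newest revision by a trusted user, is simple to express. This Python sketch assumes the page history is already available as a list of revisions ordered oldest first; Revision, TRUSTED and latest_trusted are hypothetical names, and how the history and the trusted list are obtained is left out.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    rev_id: int
    user: str
    text: str


# Hypothetical trusted-user list; where it would live is not decided.
TRUSTED = {"Alice", "Bob"}


def latest_trusted(revisions, trusted=TRUSTED):
    """Return the newest revision saved by a trusted user, or None.

    Edits by untrusted users are only picked up once a trusted user
    saves a later revision on top of them.
    """
    for rev in reversed(revisions):
        if rev.user in trusted:
            return rev
    return None
```

In this sketch a page with no trusted revision yields None and would simply be left out of the dump.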