Project talk:PD help/export

From mediawiki.org

Nice idea!

Even without the dumps, this can still be mirrored via a bot. It is somewhat difficult, however, since this requires grabbing the referenced templates, which sometimes require parser functions and images. So these rules would be nice.

I'm not sure that allowing (and rewriting) links is a good idea. It would be nice to have a self-contained system.

It would also be useful to allow the admin to override certain pages/templates; i.e. based on protection level, or by removing a special category that has to be present to replace the page.

-Steve Sanbeg 23:55, 10 October 2006 (UTC)

The point of the rewriting is so that a new wiki can use whichever language they like as their main 'help' language. E.g. if I run a wiki that uses Spanish as the main language, but also has a lot of Portuguese content, I might want to have all the help that is currently in Help:xxx/es in the main namespace (as Help:xxx) and also want to include Portuguese at Help:xxx/pt. We can't offer this kind of flexibility without rewriting the pages. The main thing I am trying to achieve here is for the dumps to be self-contained.
I'm not sure what you meant in your last sentence - can you elaborate?
--HappyDog 12:55, 11 October 2006 (UTC)
I agree that it should be self-contained; that's why I think we should avoid rewriting external links. If it's self-contained, we shouldn't need external links. Even interwiki links would potentially break that, although at least then a local admin could redirect them by modifying the interwiki table, as long as we don't rewrite them.
Currently, if you mirror the help and all of the templates, the pages will generally begin with a notice that contributions are public domain, and end with broken code (unless parser functions are installed).
Ideally, there should be a simple system where someone could edit some of the articles or templates, and those won't be replaced when the next update is imported. I think the best way to do this would be to have a category for articles that can be automatically mirrored, so the import software would skip any that exist but aren't in that category.
Also, it would be nice if the help templates only included other help pages, plus one header and footer template from template space, which could have the public domain notes & parser functions here, but would be locally defined there (i.e. with a notice that the page is automatically updated). This would avoid having to chase links to other templates, and then select which ones to override. -Steve Sanbeg 16:45, 11 October 2006 (UTC)
I should also add that to use these now, language structure is not a problem, since it's simple to filter entries with a /. Currently, the template issue is the main roadblock; if we could resolve that, then we'd have something. But this still looks like the closest thing to a usable, importable help system, actually quite close. -Steve Sanbeg 20:46, 11 October 2006 (UTC)
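As a rough illustration of the "/" filtering mentioned above, here is a minimal Python sketch. It assumes language versions live at subpage-style titles such as Help:xxx/es (as described earlier in this thread); the function names and sample titles are made up for the example.

```python
def base_language_pages(titles):
    """Keep only titles without a '/', i.e. without a language suffix."""
    return [t for t in titles if "/" not in t]


def pages_for_language(titles, lang):
    """Keep only titles ending in '/<lang>', e.g. 'Help:Editing/es'."""
    suffix = "/" + lang
    return [t for t in titles if t.endswith(suffix)]


titles = ["Help:Editing", "Help:Editing/es", "Help:Links/pt"]
print(base_language_pages(titles))       # titles with no language suffix
print(pages_for_language(titles, "es"))  # Spanish subpages only
```

A mirroring bot could run a filter like this over the page list before fetching anything, so unwanted languages are never downloaded.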
Hi Sanbeg - here are some specific responses:
  • In general external links will be discouraged, but in some cases they will be necessary. There is no need for the dump script to touch these links.
  • Interwiki links may well become broken if the external wiki is not defined in the interwiki table. That is why links such as [[w:Wiki|wiki]] will be automatically rewritten to become [http://en.wikipedia.org/w/Wiki wiki], using the URLs defined in the interwiki table. Again these will probably be discouraged but may be useful in some situations.
  • The current templates will not be used in the final dumps. All templates that do not reside in the help namespace will be removed from the page when it is dumped. Therefore the pd help notice and language links (as they currently stand) will be removed automatically. Any templates that should not be removed (such as Template:Admin tip) will need to be in the help namespace.
  • If an update is performed by doing a standard 'import', then pages will be overwritten if they already exist (I think - either way the standard behaviour will apply here too). I am considering an import script that is a bit more sophisticated. Actually, ideally there will be a special page to manage help files, and that will have an option about how to handle cases when the content has been changed locally. However, this would be further down the line and is not really discussed on this page. In short, it should be simple for an import script to tell if a page has been edited and skip the update, but it will be impossible if using the built-in 'import' function.
  • The generic header/footer template is a good idea. To work, these would be located in the help namespace, and should default to being empty. These would _not_ contain our local header/footer - those need to be held separately in the template namespace, otherwise they will show up in the dumps.
In general, I don't see a problem with the templates - hopefully the above comments resolve some of the issues that you are currently seeing. Also, regarding the rewriting of the pages, the important thing is that we remove as many dependencies as possible so that we can minimise the possibility of things breaking when the dumps are made. It is an editorial decision whether to allow interwiki links (for example), but from a technical point of view we need to assume that they will exist and that they won't break if the interwiki table is empty, so that editors can get on and edit without worrying about the technical details. --HappyDog 09:54, 12 October 2006 (UTC)
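To make the two dump-time rewrites from the list above concrete, here is a hedged Python sketch. It is not the actual dump script: the INTERWIKI dictionary is a hypothetical stand-in for the interwiki table (using the `$1` placeholder convention and the URL from the example above), it assumes help-namespace templates are transcluded as {{Help:Name}}, and the regular expressions deliberately ignore nested templates.

```python
import re

# Hypothetical stand-in for the interwiki table: prefix -> URL pattern.
INTERWIKI = {"w": "http://en.wikipedia.org/w/$1"}

LINK_RE = re.compile(r"\[\[([A-Za-z]+):([^\]|]+)(?:\|([^\]]*))?\]\]")
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)(?:\|[^{}]*)?\}\}")


def rewrite_interwiki(text, table=INTERWIKI):
    """Turn [[prefix:Page|label]] into [url label] for known prefixes."""
    def repl(match):
        prefix, target, label = match.group(1), match.group(2), match.group(3)
        pattern = table.get(prefix.lower())
        if pattern is None:
            return match.group(0)  # not an interwiki prefix, leave untouched
        url = pattern.replace("$1", target.replace(" ", "_"))
        return "[%s %s]" % (url, label or target)
    return LINK_RE.sub(repl, text)


def strip_non_help_templates(text):
    """Drop simple (non-nested) template calls that are not in Help:."""
    def repl(match):
        name = match.group(1).strip()
        return match.group(0) if name.startswith("Help:") else ""
    return TEMPLATE_RE.sub(repl, text)


page = "{{PD Help Page}}See the [[w:Wiki|wiki]] article. {{Help:Admin tip|x}}"
print(strip_non_help_templates(rewrite_interwiki(page)))
```

Ordinary links such as [[Help:Editing]] pass through unchanged because "help" is not a key in the table. A real implementation would need proper handling of nested templates and parser functions, which a single regular expression cannot provide.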
Yeah, I think the rules are ahead of what is currently done, so renaming some of the current templates would help a lot. The trickiest part with templates will probably be documenting how to use them, since it would make the documentation seem contrived if they can't call into template space, and template stripping could gut those docs if it's not careful. I was even thinking a header/footer could also work as an alternative to stripping; by putting the local content into just a help_header & help_footer that aren't exported, the import would only need to create two templates.
That is a good point, that we should make it simple for people to edit these docs. I also think we should try to make this workable as soon as possible. The import/export sounds very useful, but mirroring via pywikipedia is useful, too. -Steve Sanbeg 15:04, 12 October 2006 (UTC)

Categorising

Sanbeg - your recent edit summary reads:

Rules for the Help: namespace - may->must; we shouldn't expect 
people to import uncategorized pages. This can also help catch name 
conflicts, in case the importer has a page of that name.

Can you please expand on this - I'm not sure how it will help with conflicts. In terms of "we shouldn't expect people to import uncategorized pages" - this is an editorial decision, not a technical one, and therefore should not be included in this document. I will wait to see whether there is a technical basis to this requirement (i.e. an answer to my first question) before reverting.

--HappyDog 17:59, 16 October 2006 (UTC)

I wasn't aware of a separation between technical and editorial issues; it doesn't seem useful to me to have uncategorized pages. The editorial reason is that if someone is careful about category use, it wouldn't be good for this to clutter the uncategorized pages list.
The technical issue is the one I was talking about before: if the user either has their own help page, or has modified one of these, could the import detect that, and what would it do? I think blindly overwriting user content isn't the right thing to do, and trying to use protection or such to prevent overwriting is less than ideal. My sense is that it's best if they implicitly give permission to overwrite the file. So the import could create it if it doesn't exist, or replace it if it's in the right category. But if there is some other local page with the same name, or they removed the category from the page because they're maintaining it locally, the import could do something else.
Admittedly, this is less of an issue than the current template issues, but it would add an extra level of safety, and reduce the chance of surprising someone when things get overwritten. -Steve Sanbeg 21:11, 16 October 2006 (UTC)
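The create/replace/skip rule proposed above could look roughly like the following Python sketch. The category name "Category:PD Help" and the function name are assumptions made for illustration, and checking the raw wikitext for a category link is a simplification (it would miss categories added through templates).

```python
import re


def import_action(local_text, category="Category:PD Help"):
    """Decide what an import should do with one page.

    local_text is the current wikitext on the target wiki, or None if
    the page does not exist there yet.
    """
    if local_text is None:
        return "create"
    # Accept both [[Category:X]] and [[Category:X|sort key]].
    marker = re.compile(r"\[\[" + re.escape(category) + r"\s*(\||\]\])")
    if marker.search(local_text):
        return "replace"
    return "skip"  # local page, or the admin removed the category


print(import_action(None))
print(import_action("Old help text\n[[Category:PD Help]]"))
print(import_action("Our own locally maintained help page"))
```

The "skip" branch is where "the import could do something else" would go, for example logging the conflict or prompting the admin.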
There are two ways that a remote wiki-admin could import our help pages. Either we just provide the dumps 'as is', and they use special:import, or we also provide an import script that has some way (categories/edit history/user prompt/MD5/etc.) of deciding whether to overwrite an existing page of the same name. In the first case we have no control at all, in the second we can use whatever method we choose. Categories are not required for this, although they are one of the options.
In practical terms, I imagine that the help export script will be written long before a dedicated import script, in which case the _initial_ method will be using special:import, and that later it will become more sophisticated. However it will still be a while before these pages contain content worthy of importing into a remote wiki, so perhaps by that stage we will be ready.
Regarding the editorial/technical distinction, I strongly feel that we should place the minimum restrictions on content at this stage - those restrictions being those which are required in order to make the system work at a technical level. All other issues will evolve naturally through community interaction as they have on Wikipedia and other projects. For example, you believe that all help pages should be categorised so they don't clutter the 'uncategorised pages' page. I, on the other hand, never use that on my wiki, and would rather not have a load of unwanted categories created on my site. There are arguments on both sides, and here and now is not the place to have them. My point is that, when that decision is made it should be an editorial decision based on what the community (both writers and users) want.
With all that in mind, I am reverting the 'must' to 'may'. :)
--HappyDog 23:58, 16 October 2006 (UTC)
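For the MD5 option listed among the ways a dedicated import script might decide whether to overwrite, one possible shape is sketched below in Python. It assumes the import script records a digest of each page it writes (where that record is stored is left open); the function names are hypothetical.

```python
import hashlib


def digest(text):
    """MD5 hex digest of a page's wikitext."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def should_overwrite(local_text, last_imported_digest):
    """Overwrite only if the page is missing or unchanged since last import.

    local_text: current wikitext on the target wiki, or None if absent.
    last_imported_digest: digest stored at the previous import, or None
    if this script never imported the page.
    """
    if local_text is None:
        return True
    if last_imported_digest is None:
        return False  # a page we did not create: leave it alone
    return digest(local_text) == last_imported_digest


stored = digest("Imported help text")
print(should_overwrite("Imported help text", stored))
print(should_overwrite("Imported help text, edited locally", stored))
```

MD5 is used here only as a change detector, not for security, so any stable hash would serve the same purpose.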
I hadn't realised that would be a contentious issue. I think the initial way would probably be to mirror using pywikipedia, since that way we can see how they look on the other side, prior to releasing dumps. Although it's important that every page has a category, the script could add its own category while it's copying, which would be safer than trusting them to be present on the source. At some point, this should be done somewhere in whatever export/import system we use.
This already seems to give a reasonable, basic overview, and is simpler to mirror than meta:help. So I don't see any reason not to work on resolving minor issues and at least see how useful it is. -Steve Sanbeg 16:08, 17 October 2006 (UTC)
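The idea of the copying script adding its own category, rather than trusting the source pages to carry one, might be sketched like this in Python. The category name "Category:Mirrored help" is invented for the example.

```python
def add_mirror_category(text, category="Category:Mirrored help"):
    """Append a category link to the wikitext unless it is already there."""
    tag = "[[%s]]" % category
    if tag in text:
        return text  # already tagged, so repeated runs change nothing
    return text.rstrip("\n") + "\n\n" + tag + "\n"


print(add_mirror_category("Some help text.\n"))
```

Because the function leaves already-tagged text unchanged, it can be applied on every mirroring run without piling up duplicate category links.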

clickable link

I tried using external links instead of interwiki links in templates & edit summaries, but then I realised that edit summaries can't have external links. So we should either specify the interwiki prefixes we assume (i.e. mw: to link back, manual: to resolve a bunch of redlinks), or drop the clickable link from the edit summary. -Steve Sanbeg 23:38, 20 October 2006 (UTC)

Yeah - you're probably right. I think we should avoid interwiki links, as this requires that the person who wishes to import the help pages has access to the DB table. If external links don't work in the history, then we should avoid them. We could link to a local page in the help namespace (e.g. Help:Credits) which explains the copyright status (PD unless modified locally) and provides a link back to MW. Just something to mull over - I don't think this issue is terribly urgent. --HappyDog 19:22, 21 October 2006 (UTC)

Which languages should qualify?

Current rule:

  • If a Wikipedia has not been started in a language then we should not host help pages in that language.

I suggest weakening it in one of three possible ways, each of which would allow the inclusion of slightly more languages. With the new language policy of the Wikimedia Foundation (see metawiki), this now appears possible. Ideas:

  1. If a Wikimedia Foundation project (e.g. Wikipedia, Wiktionary, etc.) has been started in a language then we should host help pages in that language, but not otherwise.
  2. If a Wikimedia Foundation project (e.g. Wikipedia, Wiktionary, etc.) has been started in a language, or a test project exists in the incubator then we should host help pages in that language, but not otherwise.
  3. If a language exists in Names.php then we should host help pages in that language, but not otherwise.

Note that, since Wikipedias usually come first, and currently there is no exception to that rule, suggestion number 1 is at present effectively identical to the existing rule as regards the set of languages that we should support. --Purodha Blissenbach 09:33, 24 May 2008 (UTC)

As this is help for MediaWiki, then I think we should support the languages that MediaWiki supports, therefore the third option sounds best. I have amended the page. --HappyDog 22:17, 5 October 2010 (UTC)

Making imports easier

It is fairly straightforward to find ways of exporting or downloading the help pages, and importing or uploading them into another wiki. I see only one drawback: if you want to rename the non-English pages of "your language" in the target wiki to titles in "your language", the localized names are found only in the page content, which often calls for a two-pass parse through a dump or intermediate file and a slightly complicated method of extraction. I therefore suggest finding a way to include the localized name of each page in the page itself. This would also aid copying single pages when, for instance, only a small set has been changed. Several ideas come to mind for placing the names into pages having language code zxx:

  1. {{DISPLAYTITLE:local page name}}
  2. {{PD Help Page/zxx|local page name}}
  3. {{some-new-template|zxx|local page name}}
  4. <!--local page name--> (in the 1st line of wikicode, e.g.)

Comments:

  • The DISPLAYTITLE method has an advantage of elegance, that is, pages possibly need not be renamed in the target wiki at all.
  • Reusing the template PD Help Page has the disadvantage of putting unrelated things together in one template transclusion call.

--Purodha Blissenbach 09:33, 24 May 2008 (UTC)
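If the DISPLAYTITLE option from the list above were adopted, an importer could recover the localized name in a single pass over each page. The Python sketch below is only an illustration under that assumption; the function names and the sample pages are made up.

```python
import re

DISPLAYTITLE_RE = re.compile(r"\{\{\s*DISPLAYTITLE\s*:\s*([^{}|]+?)\s*\}\}")


def local_title(wikitext):
    """Return the name given in {{DISPLAYTITLE:...}}, or None if absent."""
    match = DISPLAYTITLE_RE.search(wikitext)
    return match.group(1) if match else None


def rename_map(pages):
    """Map source titles to localized titles for pages that declare one.

    pages: dict of source title -> wikitext.
    """
    result = {}
    for title, text in pages.items():
        name = local_title(text)
        if name is not None:
            result[title] = name
    return result


pages = {
    "Help:Editing/de": "{{DISPLAYTITLE:Hilfe:Bearbeiten}}\nText",
    "Help:Editing": "Plain English page",
}
print(rename_map(pages))
```

Since each page carries its own name, copying a single changed page needs no lookup in any other page or dump file.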