Northern Sami Wikipedia notes 2011

These notes about small projects cover problems and possibilities identified at Northern Sami Wikipedia. The project is very small and has run into a number of problems with the current software. The reasoning in the following is to make a few editors more productive by removing time-consuming, unnecessary steps and by cannibalizing material from larger projects. Hopefully this can be sufficient aid for a small group to produce a basic lexicon in a small language from larger lexicons in other languages.

Years are often written at this project with a case suffix, in the form 1997:as, 1997:is, etc. When such a suffix is typed directly after a link to the year it does not become part of the link, so each form must currently be written as an explicit piped link, even though it should be possible to simply append the suffix to the link. If this is to be solved, some of the regex patterns must be updated on a per-language basis. For examples see Ole Henrik Magga.
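A minimal sketch of what a per-language pattern for link suffixes could look like. This is not the actual MediaWiki code; the names LINK_TRAILS and render_links, and the exact Northern Sami pattern, are assumptions made for illustration only.

```python
import re

# Hypothetical per-language patterns for text that follows "]]" and should be
# absorbed into the link label. The "se" pattern is an assumption: an optional
# colon followed by letters, so that ":as" and ":is" become part of the link.
LINK_TRAILS = {
    "en": re.compile(r"[a-z]+"),
    "se": re.compile(r":?[a-záčđŋšŧž]+"),
}

# Matches simple links such as [[1997]] (no piped label).
LINK = re.compile(r"\[\[([^\]|]+)\]\]")


def render_links(text, lang):
    """Return (target, label) pairs for the simple links found in text."""
    trail_re = LINK_TRAILS[lang]
    links = []
    for match in LINK.finditer(text):
        target = match.group(1)
        # Try to match a trail starting right after the closing brackets.
        trail = trail_re.match(text, match.end())
        label = target + (trail.group(0) if trail else "")
        links.append((target, label))
    return links
```

With the assumed "se" pattern, render_links("[[1997]]:as", "se") yields the target 1997 with the label 1997:as, while the "en" pattern leaves the suffix outside the link.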

Words are often inflected according to more or less well-defined prefix, infix and suffix rules. A link should be checked with a successive back-off model in which more and more aggressive rules are applied. If a run of a particular regex pattern, or a combination of such patterns, produces a match on a particular page, then that match is used. All applied rules add up, and if the total seems too radical the rewrite is dropped altogether. Such rules will sometimes fail, but they will save a lot of typing. For an example see Oppland, which becomes Opplándda. Note that the question is whether it is possible to find a legal string of transformations from one known form into another known form; it is not a free-form translation. As such it is far easier.
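The back-off idea can be sketched roughly as follows. The rules, their costs and the threshold are invented for illustration and are not real Northern Sami morphology; resolve_link and RULES are hypothetical names.

```python
import re

# Hypothetical rewrite rules, ordered from least to most aggressive:
# (regex pattern, replacement, cost). The costs are arbitrary assumptions.
RULES = [
    (r"da$", "", 2),   # strip an assumed case ending
    (r"á", "a", 1),    # undo an assumed vowel change
    (r"s$", "", 1),    # strip another assumed ending
]


def resolve_link(written, known_pages, rules=RULES, max_cost=4):
    """Find a known page title for a written form, or return None.

    Rules are applied cumulatively. After each rule that changes the
    candidate, its cost is added to the total; the search gives up as
    soon as the total exceeds max_cost (the rewrite is "too radical").
    """
    candidate, cost = written, 0
    if candidate in known_pages:
        return candidate
    for pattern, replacement, rule_cost in rules:
        rewritten = re.sub(pattern, replacement, candidate)
        if rewritten == candidate:
            continue  # the rule did not apply, so it costs nothing
        candidate = rewritten
        cost += rule_cost
        if cost > max_cost:
            return None
        if candidate in known_pages:
            return candidate
    return None
```

Under these assumed rules, resolve_link("Opplándda", {"Oppland"}) finds Oppland after two rewrites, and the same call with max_cost=2 drops the rewrite instead.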

Small projects need as much help as possible to create a backbone structure. Examples of such structures are categories and templates. Constructing them is somewhat difficult for beginners, and it should be possible to reuse them.

If it were possible to declare common categories and rules for initiating them, it would greatly help such small projects. Perhaps this could be a page where category names can be declared and connected to other, more general names in larger projects, with fallbacks if the given granularity isn't useful for the specific project. If there are too few articles about some geographical feature at the municipality level, then bump them up to the county. If the county is still too granular, then bump them up to the country level. The editor could categorize Máze as a small place in Guovdageaidnu, but any entry in the category Wikipedia:se:Category:Guovdageainnu báikkit would be bumped to Wikipedia:se:Category:Finnmárkku báikkit.
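A rough sketch of such a fallback rule, assuming a declared parent for each category and a minimum article count per project. The function name, the threshold and the country-level category name Norgga báikkit are assumptions for illustration.

```python
# Declared fallbacks from a fine-grained category to a coarser one.
# "Norgga báikkit" is a hypothetical country-level category name.
PARENTS = {
    "Guovdageainnu báikkit": "Finnmárkku báikkit",
    "Finnmárkku báikkit": "Norgga báikkit",
}


def effective_category(category, article_counts, parents=PARENTS, min_articles=5):
    """Bump a category upwards until it holds enough articles.

    article_counts maps a category name to the number of articles that
    would end up in it on this project. The walk stops at the first
    category that is large enough, or at the top of the declared chain.
    """
    current = category
    seen = set()  # guards against accidental cycles in the declarations
    while (
        article_counts.get(current, 0) < min_articles
        and current in parents
        and current not in seen
    ):
        seen.add(current)
        current = parents[current]
    return current
```

With only two articles at the municipality level and forty at the county level, an entry in Guovdageainnu báikkit would be placed in Finnmárkku báikkit; once the municipality category grows past the threshold, entries stay where the editor put them.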

If it is possible to bump placement in categories in this way, it will also be possible to reuse categorization from another project. If Ole Henrik Magga at Norwegian (bokmål) Wikipedia is categorized at too fine a granularity, then the article is bumped upwards at Northern Sami Wikipedia until the categories work out. If subcategories are defined later, then articles move down to the lower ones as appropriate.

Templates are very difficult for beginners to make. It would be very helpful if a set of carefully crafted and localizable templates were available from the very beginning. This includes navigational aids, infoboxes and maintenance templates. Some of these are pretty easy to make reusable, while others are very difficult. If possible, such templates could be stored at Commons, Meta or MediaWiki. For an example of reusable maintenance templates see Wikipedia:no:Template:$maintenance.

Construction of articles from infoboxes on other projects could be very interesting, but right now it is very difficult even to reuse information across projects through cut and paste. The whole discussion about a data commons is a little too involved for this; what is necessary to get this to work is a simple syntax that makes importing data a straightforward task. Instead of a there should be something like , possibly even acting only like a subst statement while the page is being stored. This will make it possible to create special templates that construct articles from infoboxes at other projects.

Because infoboxes may include lists of entries, there must be some method to transform such lists into readable sentences. In particular, a list of items should be transformed into something that has a head, a middle part and an end. Before, after, and in between all of those there should be joiners. In Northern Sami the joiner between the middle and the end part is ja, in English it is and, and in Norwegian it is og. All other joiners are set to a comma. It should be possible to override this on a per-instance basis, for example like.
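A simplified sketch of such a list transformation. It collapses the head and middle part into one comma-joined run and exposes the remaining joiners as parameters that can be overridden per call. The function and parameter names are assumptions, not an existing parser function.

```python
# Default joiner between the middle and the end part, per language code.
LAST_JOINER = {"se": " ja ", "en": " and ", "no": " og "}


def join_items(items, lang, joiner=", ", last_joiner=None, before="", after=""):
    """Turn a list of items into a readable phrase.

    joiner      goes between all items except the last two
    last_joiner goes between the last two items (language default if None)
    before      is placed in front of the whole phrase
    after       is placed behind the whole phrase
    """
    items = list(items)
    if not items:
        return ""
    if last_joiner is None:
        last_joiner = LAST_JOINER.get(lang, joiner)
    if len(items) == 1:
        body = items[0]
    else:
        body = joiner.join(items[:-1]) + last_joiner + items[-1]
    return before + body + after
```

For example, join_items(["A", "B", "C"], "se") gives "A, B ja C", and passing last_joiner=" eller " overrides the default for a single call.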

Sometimes a parameter will have a specific form, but this form does not fit in a new role. In Northern Sami this will, for example, be the situation where a place name is imported and used in a compound to name a church. In Norwegian we write Aurdal kirke, while in English this becomes Aurdal church. The name of the place does not change. In Northern Sami this becomes Aurdalas girku. This kind of transformation is defined as part of the tools created by Sámi giellatekno, a language project at the University of Tromsø. The syntax is pretty simple and straightforward, and we could do something like it in a parser function. In this specific example the call Aurdal+N+Prop+Plc+Sg+Loc will generate Aurdalas, and we may write this as where the word is Aurdal and the parameters are +N+Prop+Plc+Sg+Loc. We can even slightly redefine this as to make some sort of guided translation of fixed strings. If it fails, the translation can simply be left to the editor. Note that this can also be generalized to translate by example, and also that it should be possible to change parameter sets generated from an example.
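A toy sketch of how such a call could behave, including the fallback to the editor. The lookup table is only a stand-in for a real morphological generator such as the giellatekno tools, and the function names are hypothetical.

```python
# Stand-in for a real morphological generator: maps "word+tags" to a form.
# Only the example from the text is included; everything else is unknown.
TOY_GENERATOR = {"Aurdal+N+Prop+Plc+Sg+Loc": "Aurdalas"}


def inflect(word, tags, generator=TOY_GENERATOR):
    """Return (form, ok). If generation fails, the word is returned
    unchanged and ok is False, so the editor can fix it by hand."""
    form = generator.get(word + tags)
    if form is None:
        return word, False
    return form, True


def church_name(place):
    """Build a church name from a place name, following the example above."""
    form, ok = inflect(place, "+N+Prop+Plc+Sg+Loc")
    return f"{form} girku", ok
```

Here church_name("Aurdal") returns "Aurdalas girku" together with a success flag, while an unknown place name comes back uninflected and flagged for the editor.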

It is important to note that this kind of article production is not about translating existing articles; it is about creating articles from well-defined infoboxes in other languages. It seems that statistical translation will not work very well, but rule-based translation like that produced by Apertium can work.