Writing systems/LanguageConverter

From mediawiki.org
This is mostly based on a translated version from the H:AC help page on the Chinese edition of Wikipedia. Don't even think about translating it back to Chinese.

LanguageConverter (LC) is a system that converts between language variants by means of character and word replacement.

Presentation

LC was originally deployed for Chinese Wikipedia, whose users around the world write different variants of Chinese (or, more precisely, Mandarin). The variants differ in several ways: simplified versus traditional characters within the same ideographic script (and, in other cases, possibly different scripts altogether, such as Latin or Cyrillic); different terminology in different regions (should "blog" be written as 網誌, meaning "web log", or as a phonetic transcription?); differences in the written language caused by differences between dialects; and so on. MediaWiki grouped a specific set of these properties together and called it a "character mode" (later, a "language variant"). A character mode can thus be described as a collection of certain properties of written Chinese.

To bring together readers and editors from these different backgrounds, and to promote exchange between them, a local wiki does not have to prescribe which language variant readers or editors use. Instead, it adapts to the differences through automatic conversion by software, so that editors can contribute in their own usual words and orthographic conventions, and readers can choose the words and orthographic system they want to read.

To avoid forcing users to write and read a single variant, MediaWiki sites use LanguageConverter (LC) to "convert" an article's source code, which may be a mix of different variants, to a selected target variant. This mixture, however, requires some care in handling. LC is built into MediaWiki as an additional pass when parsing the wikitext source code. Running on a form of proto-HTML, it can convert regular content as well as category links.

The automatic "word mode" conversion follows from how the wiki system itself works. Editors enter the content of an article, including the text and the wiki markup, which is referred to as "source code" here. The wiki system usually stores the complete source code without converting it. Readers do not read the source code directly; instead, the system converts it on the fly into a suitable form, for example by adding pictures and hyperlinks. Chinese Wikipedia's "word mode" conversion is one of many such automatic conversions. It applies not only to encyclopedia articles but also to page categories and other pages.

In most cases, LC converts words and characters according to a predefined conversion table. Sometimes it converts according to a method the editor specifies in the source code, which includes no conversion at all and the so-called "manual conversion". The conversion table lists the mappings between different word modes, morpheme to morpheme or word to word. Currently only administrators can edit the conversion table. "Manual conversion" is still performed automatically by the wiki system when readers view a page, but it follows the method specified by the editor in the source code. After editing and saving, editors can switch to other word modes to check the result.
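As a rough illustration of table-driven conversion, the following Python sketch replaces words and characters using a tiny mapping. The table contents, the names `TO_SIMPLIFIED` and `convert`, and the longest-match-first strategy are assumptions made for this example; it is not MediaWiki's actual implementation, and real conversion tables are far larger.

```python
# Hypothetical, tiny conversion table for one target variant.
# Keys are source strings; values are their replacements.
TO_SIMPLIFIED = {
    "網誌": "博客",  # word-level rule (regional terminology, chosen for illustration)
    "網": "网",      # character-level rules
    "誌": "志",
    "頁": "页",
}


def convert(text, table):
    """Replace table entries in `text`, trying the longest key first at each position."""
    max_len = max(map(len, table), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so word rules win over character rules.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No rule matched here: copy one character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(convert("網誌", TO_SIMPLIFIED))
    print(convert("網頁", TO_SIMPLIFIED))
```

Trying longer keys first is what lets a word-to-word rule take precedence over the character-to-character rules it overlaps with; text with no matching rule passes through unchanged.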

By including certain instructions in the source code, users can customize the conversion rules, include one-off "manual" conversion tags, or even disable the conversion completely. Editors can verify the output of the converter by switching variants after saving or by using the PreviewWithVariant gadget.
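The effect of one-off manual tags can be sketched in Python as follows. This is a deliberately simplified model, assuming only two forms of the -{}- markup: `-{text}-`, which leaves `text` unconverted, and `-{code:text; code:text}-`, which picks the text for the reader's variant. The function name `render_manual`, the fallback to the first listed alternative, and the naive splitting on `;` and `:` are assumptions of this sketch, not MediaWiki's parser; automatic conversion of the surrounding text is left out.

```python
import re

# Matches one manual conversion tag, e.g. -{zh-hans:博客; zh-hant:部落格}-
MANUAL_TAG = re.compile(r"-\{(.*?)\}-", re.S)


def render_manual(source, variant):
    """Resolve -{...}- tags in `source` for the reader's `variant` (simplified model)."""

    def pick(match):
        body = match.group(1)
        rules = {}
        for part in body.split(";"):
            code, sep, text = part.partition(":")
            if sep:
                rules[code.strip()] = text.strip()
        if not rules:
            # Plain -{text}-: emit the text as written, with no conversion.
            return body
        if variant in rules:
            return rules[variant]
        # Assumed fallback for this sketch: the first alternative listed.
        return next(iter(rules.values()))

    return MANUAL_TAG.sub(pick, source)


if __name__ == "__main__":
    src = "我的-{zh-hans:博客; zh-hant:部落格}-"
    print(render_manual(src, "zh-hans"))
    print(render_manual(src, "zh-hant"))
```

Because any colon inside a tag is read as a variant separator here, this sketch would mishandle protected text that itself contains colons or semicolons, such as a URL.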

Special considerations therefore apply when editing and reading Chinese Wikipedia and other wikis where this technology has been developed and deployed for their locally supported languages.

This page is incomplete as of now. For "instantly useful" information, see Writing systems/Syntax and Writing systems#LanguageConverter.

Variant selection

Variants in user interface

...

Conversion technology

...

Default conversion tables of the system

...

Custom conversion tables

...

Using the -{}- tags

...

Source code generally immutable and not modified

...

Word hyphenation problems: common problems with automatic conversion programs

...

Code to control the automatic conversion

...

Common conversion tool syntax

...

Item title

...

Full text prohibits automatic conversion

...

Range of automatic conversion

Page classification

...

Software problem

...

Internal links, URLs, redirects and searches

...

How to create a traditional and simplified redirect page

...

Notes when editing general articles

...

See also