Writing systems

From MediaWiki.org

This page gives basic information on support for various aspects of writing systems: languages written in multiple scripts; writing direction; font rendering and input.

Multiple scripts, multiple dialects

Many languages are written in more than one script. Supporting this in the software is often possible but not yet implemented, and sometimes it is difficult or even impossible to implement. Some languages do have a LanguageConverter that adds support for multiple writing systems.

Some languages have closely related dialects that are written in the same script(s). On a technical level, these can be handled in the same way as different scripts.

LanguageConverter

LanguageConverter is a system based on language variants that automatically converts the content of a page into a different variant. A variant is usually the same language written in a different script. To use the LanguageConverter, go to your Internationalisation preferences. If you are on a wiki that supports conversion, you will see an extra option for choosing the script.
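To make the idea of script conversion concrete, here is a minimal sketch of a one-way Serbian Cyrillic to Latin conversion. It is not the LanguageConverter code; the table name `CYRILLIC_TO_LATIN` and the function `to_latin` are hypothetical, and the sketch ignores everything a real converter has to handle (markup, exceptions, manual conversion rules).

```python
# Illustrative only: a per-character Serbian Cyrillic -> Latin table.
# This is NOT MediaWiki's LanguageConverter implementation.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}


def to_latin(text: str) -> str:
    """Convert Serbian Cyrillic letters to Latin, leaving other characters alone."""
    out = []
    for ch in text:
        lower = ch.lower()
        latin = CYRILLIC_TO_LATIN.get(lower)
        if latin is None:
            # Not a Serbian Cyrillic letter: keep it unchanged.
            out.append(ch)
        elif ch != lower:
            # Uppercase input: capitalise only the first Latin letter (Љ -> Lj).
            out.append(latin.capitalize())
        else:
            out.append(latin)
    return "".join(out)


print(to_latin("Београд"))
```

The reverse direction is harder to sketch this way, because Latin digraphs such as "lj" would have to be matched before single letters.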

This feature needs more documentation; see bugzilla:19044.

It is implemented for the following languages (as of July 2012; see languagesWithVariants for the latest list):

  • Chinese (zh): Simplified (zh-hans), Traditional (zh-hant), zh-cn, zh-tw, zh-hk, zh-mo, zh-sg, zh-my
  • Gan (gan): Simplified (gan-hans), Traditional (gan-hant)
  • Inuktitut (iu): Latin (ike-latn), Syllabics (ike-cans) [since 1.18]
  • Kazakh (kk): Cyrillic (kk-cyrl), Latin (kk-latn), Arabic (kk-arab)
  • Kurdish (ku): Latin (ku-latn), Arabic (ku-arab) [since 1.11]
  • Serbian (sr): Cyrillic (sr-ec), Latin (sr-el)
  • Tachelhit (shi): Tifinagh (shi-tfng), Latin (shi-latn) [since 1.19]
  • Tajik (tg): Cyrillic (tg-cyrl), Latin (tg-latn)
  • Uzbek (uz): Cyrillic (uz-cyrl), Latin (uz-latn) [since 1.20]

And it is needed for many more languages!

Language code tags for scripts should follow the ISO 15924 standard.
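As a rough illustration of what following ISO 15924 means for a variant code, the sketch below looks for a four-letter script subtag in a code such as kk-cyrl. The set `KNOWN_SCRIPTS` is a small hand-picked subset of ISO 15924 codes, and `script_of` is a hypothetical helper, not a MediaWiki function.

```python
# A small, hand-picked subset of ISO 15924 script codes (not the full registry).
KNOWN_SCRIPTS = {"Arab", "Cans", "Cyrl", "Hans", "Hant", "Latn", "Tfng"}


def script_of(variant_code: str):
    """Return the ISO 15924 script code found in a variant code, or None."""
    # Skip the first subtag (the language) and inspect the rest.
    for subtag in variant_code.split("-")[1:]:
        candidate = subtag.title()  # "cyrl" -> "Cyrl"
        if len(subtag) == 4 and candidate in KNOWN_SCRIPTS:
            return candidate
    return None


for code in ("kk-cyrl", "zh-hant", "ike-cans", "sr-ec", "zh-tw"):
    print(code, "->", script_of(code))
```

Codes such as sr-ec or zh-tw yield None here, because their second subtag is not a four-letter script code.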

A current limitation of LanguageConverter is that it may handle multiple writing systems based on the same underlying script particularly poorly.

Directionality

In most writing systems, characters are written left to right (LTR) and lines are stacked top to bottom (TtB).

A few common scripts (Arabic and Hebrew in particular) write characters right to left (RTL). See directionality support for more details on how we handle right-to-left and mixed bidirectional text with HTML output and CSS styles.

Note that an individual language can be used with scripts that have different directionalities, such as Kazakh and Kurdish, which have both Latin and Arabic variants.

Note also that the World Wide Web Consortium is working on more directionalities for use in web pages, such as the top-to-bottom layouts used in North-East Asia, with lines stacked either from left to right or from right to left. [1]
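Since one language can appear in scripts with different directions, the direction cannot be derived from the language code alone. One common heuristic, sketched below, is to look at the first strongly directional character of the text. This is only an illustration using Python's standard `unicodedata` module; `base_direction` is a hypothetical helper and not how MediaWiki itself decides directionality.

```python
import unicodedata


def base_direction(text: str) -> str:
    """Guess 'ltr' or 'rtl' from the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew-type and Arabic-type letters
            return "rtl"
        if bidi == "L":          # Latin, Cyrillic and other LTR letters
            return "ltr"
    # No strong character found (e.g. digits only): fall back to LTR.
    return "ltr"


print(base_direction("Kurdî"))
print(base_direction("كوردی"))
```

The returned value could be used, for example, as the value of an HTML dir attribute.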

Font rendering and input

Many scripts do not have proper fonts easily available to users. This may be because operating systems do not ship these fonts, or because users do not know how to install them or lack the permissions to do so. The WebFonts extension tries to solve this by embedding the fonts in the wiki itself: fonts are served from the server, so the user's system does not need to have them installed.

Similarly, the Narayam extension adds input methods for typing in a given script, so users do not have to rely on external tools or on support in their own systems.
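The standard CSS mechanism for fonts delivered by a server is the @font-face rule. The sketch below builds such a rule as a string. Whether the WebFonts extension emits exactly this form is an assumption, and the function `font_face_rule`, the font family name and the URL are all hypothetical.

```python
def font_face_rule(family: str, url: str, fmt: str = "woff") -> str:
    """Build a CSS @font-face rule pointing at a font file on the server."""
    return (
        "@font-face {\n"
        f"  font-family: '{family}';\n"
        f"  src: url('{url}') format('{fmt}');\n"
        "}"
    )


# Hypothetical font name and path, for illustration only.
print(font_face_rule("ExampleFont", "/fonts/example.woff"))
```

Once such a rule is in the page's stylesheet, text styled with that font-family can render even if the font is not installed locally.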