Writing systems

From MediaWiki.org

This page gives basic information on support for various aspects of writing systems: languages written in multiple scripts; writing direction; font rendering and input.

Multiple scripts, multiple dialects

Many languages are written with multiple scripts. Supporting this is often possible in principle but not yet implemented in the software, and in some cases it is difficult if not impossible to implement. Some languages do have a LanguageConverter that adds support for multiple writing systems.

Some languages have very similar dialects that are written in the same script(s) and can—on a technical level—be treated in the same way as different scripts.

LanguageConverter

For documentation on how to use LanguageConverter, see Writing systems/Syntax

LanguageConverter (LC) is a system based on language variants that automatically converts the content of a page into a different variant. A variant is mostly the same language in a different script. To use the LanguageConverter, go to your Internationalisation preferences. If you are on a wiki that supports conversion, you'll see an extra option for choosing the script.

Phab:T21044 -- this needs more documentation!

It is implemented for the following languages (as of March 2017; see languagesWithVariants for the latest list):

  • Crimean Tatar (crh): Latin (crh-latn), Cyrillic (crh-cyrl)
  • English (en): Normal (en), Pig Latin (en-x-piglatin) (for testing, only when $wgUsePigLatinVariant is enabled)
  • Gan (gan): Simplified (gan-hans), Traditional (gan-hant)
  • Inuktitut (iu): Latin (ike-latn), Syllabics (ike-cans) [1.18+]
  • Kazakh (kk): Cyrillic (kk-cyrl), Latin (kk-latn), Arabic (kk-arab)
  • Kurdish (ku): Latin (ku-latn), Arabic (ku-arab) [1.11+, see phabricator:rSVN23067]
  • Tachelhit (shi): Tifinagh (shi-tfng), Latin (shi-latn) [1.19+]
  • Serbian (sr): Cyrillic (sr-ec), Latin (sr-el)
  • Tajik (tg): Cyrillic (tg-cyrl), Latin (tg-latn)
  • Uzbek (uz): Cyrillic (uz-cyrl), Latin (uz-latn) [1.20+]
  • Chinese (zh):
    • Simplified Script (zh-hans): China (zh-cn), Singapore (zh-sg), Malaysia (zh-my)
    • Traditional Script (zh-hant): Taiwan (zh-tw), Hong Kong (zh-hk),[1] Macau (zh-mo)

And it is needed for many more languages!

Language code tags for scripts should follow the ISO 15924 standard.
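
As a rough illustration, the variant codes in the list above can be split into a language part and a suffix, and the suffix checked against ISO 15924 script codes. This is only a sketch and not MediaWiki code: `KNOWN_SCRIPTS` is a small hand-picked subset of ISO 15924 (lowercased to match the codes above), and `split_variant` and `uses_script_subtag` are hypothetical helper names.

```python
# Hand-picked subset of ISO 15924 script codes, lowercased to match the
# variant codes listed above. The full standard defines many more codes.
KNOWN_SCRIPTS = {"latn", "cyrl", "arab", "hans", "hant", "cans", "tfng"}


def split_variant(code):
    """Split a variant code into (language, suffix); suffix is None if absent."""
    language, _, suffix = code.partition("-")
    return language, (suffix or None)


def uses_script_subtag(code):
    """Return True if the suffix of the variant code is a known script code."""
    _, suffix = split_variant(code)
    return suffix in KNOWN_SCRIPTS


if __name__ == "__main__":
    for code in ["kk-cyrl", "shi-tfng", "zh-hant", "sr-ec", "zh-tw"]:
        print(code, uses_script_subtag(code))
```

Under these assumptions, codes such as sr-ec and zh-tw come out as False: sr-ec does not use an ISO 15924 script code, and zh-tw names a region rather than a script.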

A current limitation of this system is that it can be particularly bad at dealing with multiple writing systems based on the same underlying script. Chinese Wikipedians occasionally use => (unidirectional conversion) for failing cases. Because LC always tries to consume the longest matching chunk of text, using strtr in PHP, inserting -{}- to break up words is often useful too.
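
The longest-match behaviour can be sketched in a few lines. The code below is a simplified model, not LanguageConverter itself: `longest_match_convert` imitates what PHP's strtr does when given an array of replacement pairs, `convert_with_breaks` treats -{}- purely as a boundary that matches cannot cross (an assumption made for illustration), and the table is a made-up toy rather than real conversion data.

```python
def longest_match_convert(text, table):
    """Convert text left to right, always taking the longest matching key.

    Replaced output is never rescanned, similar to PHP's strtr() with an array.
    """
    if not table:
        return text
    max_len = max(len(key) for key in table)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in table:
                out.append(table[chunk])
                i += length
                break
        else:
            # No key starts at this position: copy one character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


def convert_with_breaks(text, table, marker="-{}-"):
    """Convert each piece between markers separately, then drop the markers."""
    return "".join(longest_match_convert(part, table) for part in text.split(marker))


if __name__ == "__main__":
    toy_table = {"ab": "X", "abc": "Y", "c": "Z"}  # hypothetical rules
    print(longest_match_convert("abcd", toy_table))
    print(convert_with_breaks("ab-{}-cd", toy_table))
```

With this toy table, "abcd" is consumed as the longer key "abc" plus a leftover "d", whereas "ab-{}-cd" is split first, so the shorter rules for "ab" and "c" apply instead.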

Supporting configuration

The WPULS/WPUVS functions in zhwp's sitelib allow for easy variant selection in userscript UIs. This can help script writers produce a variant-aware interface for users. For other places unreachable by LC, {{int:Conversionname}} can be used to fetch the current UI language/variant.

The PreviewWithVariant gadget allows Wikipedians to check conversion results in the editor preview. You can configure it for your own wiki.

"Foreign language marker" templates like {{lang}} should add "disable conversion" markers -{ text }- around the quoted foreign text to avoid mis-conversion. On Hans/Hant wikipedias this becomes a concern for Japanese Kanji and Vietnamese Han Nom, while on wikipedias with Latin text marked for conversion this concern should be immediate.

The WikitextLC module allows for easily inserting LC commands to Lua output. Module:地区用词 allows for an adaptive output of the form "foo, known in PLACE and PLACE as bar, and PLACE as baz".
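
To make the "disable conversion" idea concrete, here is a minimal sketch of what a {{lang}}-style helper might emit. It is written in Python for illustration only and is not the API of WikitextLC or of any existing template; `no_convert` and `lang_span` are hypothetical names, and the exact HTML a real template produces will differ.

```python
from html import escape


def no_convert(text):
    """Wrap text in -{ ... }- so that LanguageConverter leaves it alone."""
    if "}-" in text:
        # The closing marker inside the text would end the protected span early.
        raise ValueError("text must not contain the closing marker '}-'")
    return "-{" + text + "}-"


def lang_span(text, lang_code):
    """Return a lang-tagged span whose content is protected from conversion."""
    return '<span lang="' + escape(lang_code, quote=True) + '">' + no_convert(text) + "</span>"


if __name__ == "__main__":
    print(lang_span("日本語", "ja"))
```

Putting the marker inside the template means editors get the protection automatically every time they mark up foreign text, instead of having to remember it at each use.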

Automated title redirection on URLs may cause apparent inconvenience for interfaces without this feature. See T49725 for the Lua task and T160952 for the section-anchor task.

See also

Writing direction

Most writing systems operate as characters written left-to-right (LTR), with lines stacked from top-to-bottom (TtB).

A few common scripts (Arabic and Hebrew in particular) write characters right-to-left (RTL) -- see directionality support for more details on how we handle right-to-left and mixed bidirectional text with HTML output and CSS styles.

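Note that a single language can use different writing directions depending on the script: Kazakh and Kurdish, for example, are written in both Latin (LTR) and Arabic (RTL) script.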
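
Because direction follows the script and not the language, it can be derived from the script part of a variant code. The sketch below assumes variant codes of the form language-script, as in kk-arab or ku-latn; `RTL_SCRIPTS`, `direction_for` and `wrap_with_direction` are hypothetical names, and this is not how MediaWiki itself determines direction.

```python
from html import escape

# ISO 15924 codes (lowercased) for the two right-to-left scripts named above.
RTL_SCRIPTS = {"arab", "hebr"}


def direction_for(variant_code):
    """Return 'rtl' or 'ltr' based on the script suffix of a variant code."""
    _, _, suffix = variant_code.partition("-")
    return "rtl" if suffix.lower() in RTL_SCRIPTS else "ltr"


def wrap_with_direction(text, variant_code):
    """Wrap text in a div carrying matching lang and dir attributes."""
    return '<div lang="{}" dir="{}">{}</div>'.format(
        escape(variant_code, quote=True),
        direction_for(variant_code),
        escape(text),
    )


if __name__ == "__main__":
    for code in ["kk-cyrl", "kk-latn", "kk-arab", "ku-latn", "ku-arab"]:
        print(code, direction_for(code))
```

Codes without a script suffix fall back to "ltr" here, which is a simplification: a real implementation would need a per-language default.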

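Note also that the W3C is working on support for several writing directions in web pages, for example vertical writing as used in north-east Asia, with columns running either left-to-right or right-to-left. See the W3C's documentation for details.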

Font rendering and input

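Many scripts do not have usable fonts readily available to users. This can be because the operating system does not ship such fonts, or because the user does not know how to install fonts or lacks the rights to do so. The WebFonts extension tries to solve this by embedding fonts in the wiki itself. Because the fonts are served from the server, they do not need to be installed on the user's system.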

The Narayam extension similarly adds support for typing in a native writing system, so that users do not have to rely on external tools or on support from their own system.

References

  1. Taiwan and Hong Kong are two major variants written in the same Traditional script with significant differences in phrase usage due to market separation and influence from local zho languages, so you likely want to at least keep CN, TW, and HK in your list of variants. If you insist on flattening the scope of Chinese variants to a script-based Simp/Trad separation, follow what the reporter did in phab:T149278.