Manual:Adding and removing languages

MediaWiki is heavily multilingual and localized. Adding even more languages to it is a frequent activity. Sometimes, languages have to be removed, too. This is done in various contexts, and the procedures can be quite different, both from the technical and the community policies perspectives. This page documents the various language adding procedures.

Some conventions for this document:


 * "qqf" is a generic example language code.

Adding to core
MediaWiki core is usually not the first place to which a language is added; the first places are usually translatewiki.net and language-data.

A new language is usually added to MediaWiki core when the translation of the messages in the "MediaWiki core" group in translatewiki.net reaches the export threshold (as of February 2023, it's 13%; see translatewiki:Translating:MediaWiki and the "mediawiki" section in ). When this happens, the translations are automatically exported to, a bot adds the new language to Translating:MediaWiki/New languages, and one of the translatewiki administrators creates a Phabricator task to add the new language (example: ).

Preparation steps:


 * 1) Determine the ISO 639 code. No ISO 639 code, no adding! (At this stage, it should have already been checked when adding to language-data and translatewiki, but you still need to double-check it, because Names.php is one of the most notable and stable locations for language configuration in MediaWiki.)
 * 2) Determine the autonym. By this point, this should have already been done when the language was added to translatewiki or language-data, but it doesn't hurt to double-check. Check the patches that added the languages to those other places, and verify the sources for the autonym.
 * 3) Do basic quality control on the translations. No one can know all the languages, but anyone with MediaWiki experience can check simple things:
    * Messages must be actually translated and not just copied from English.
    * Wikitext syntax, such as magic words, links, etc., must be used correctly.
 * 4) Decide whether the language needs a fallback language. English is the default final fallback language and doesn't need an explicit definition. Don't guess this, but ask native speakers whether it's better for most people who speak this language to see untranslated things in English or in some other language. Remember that one language may be spoken in several countries.
 * 5) Optional, but highly recommended: Determine all the characters that are necessary for writing this language's words. This is necessary for defining the linktrail. The Wikipedia articles about the language are often a good source for the alphabet, but double-check them with external reliable sources. If the language is written in the plain 26-letter ASCII Latin alphabet without any diacritics or special characters, then an explicit linktrail is not necessary.
    * Note: Languages that are written in scripts of East Asia (Chinese, Japanese, Korean) or Southeast Asia (such as Thai, Burmese, Javanese, etc.) probably don't need a linktrail. If in doubt, ask a speaker.
 * 6) Optional, but highly recommended: Ask people who know this language well (for example, trustworthy translatewiki translators or Incubator contributors) to give you translations for the namespace names. Useful documentation about this for translators can be found on the page Translating:MediaWiki#Translating namespace names. The list is short, but give the translators some time to think about it: it's a bit difficult to change it later, so it's important to get it right.
    * Note: Check that none of the namespace names are the same as language codes! This creates ambiguity with interlanguage links in wikitext.
 * 7) Optional: Get date formats for the language.
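
Some of the preparation checks above are mechanical enough to script. The following is a minimal Python sketch, not part of MediaWiki or translatewiki tooling; the function names and the sample data are hypothetical. It flags messages that are identical to the English source, namespace names that collide with language codes, and letters that fall outside the plain a-z range (which suggests that an explicit linktrail may be needed).

```python
ASCII_LETTERS = set("abcdefghijklmnopqrstuvwxyz")


def copied_from_english(english, translated):
    """Return the keys whose translation is identical to the English source."""
    return sorted(
        key
        for key, text in translated.items()
        if key in english and text.strip() == english[key].strip()
    )


def namespace_conflicts(namespace_names, language_codes):
    """Return namespace names equal to a language code (case-insensitive)."""
    codes = {code.lower() for code in language_codes}
    return sorted(name for name in namespace_names if name.lower() in codes)


def extra_letters(alphabet):
    """Return the letters of an alphabet that are not plain ASCII a-z."""
    letters = {ch.lower() for ch in alphabet if ch.isalpha()}
    return sorted(letters - ASCII_LETTERS)


if __name__ == "__main__":
    # Hypothetical sample data for the example language code "qqf".
    english = {"search": "Search", "login": "Log in"}
    translated = {"search": "Search", "login": "Entra"}
    print(copied_from_english(english, translated))
    print(namespace_conflicts(["Talk", "Qqf"], ["en", "qqf"]))
    print(extra_letters("abcçdeë"))
```

A flagged message is only a candidate for review: some messages, such as brand names, are legitimately identical to English, so a person still has to look at the results.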

Make a Gerrit patch in core:


 * 1) Add an entry about adding this language to the "Languages updated" section in the newest version RELEASE-NOTES.
 * 2) Edit  . Add an entry for the language. Copy the autonym from translatewiki or language-data (they should be the same; if they aren't, something may be wrong somewhere). The name doesn't have to begin with a capital letter; English requires it, but most languages don't. Put the language's English name in a comment.
 * 3) If the language is written in a writing system that requires it, define a wider line-height. This is mostly needed for South Asian and Southeast Asian writing systems, such as Devanagari, Bengali, Thai, etc. This is done in the file.
 * 4) If you have any information to add there, create the file  . If you don't have any information to put there, skip this step. You can copy the boilerplate from a file for a similar language, but make sure to change all the necessary parts. Details:
    * Add fallback in the beginning, if necessary.
    * If the language is written from right to left, add.
    * Add namespace names in the variable, if you have them. Make sure to use underscores instead of spaces in the strings.
    * If necessary, add gender aliases for them in the variable.
    * If the new language uses a fallback that has gender aliases, such as French, Russian, Spanish, or Portuguese, but the new language itself doesn't need them, reset them by adding.
    * Add date formats in the variables,  , and  , if you have them.
    * If the language needs different numerals, add them in the variable . Examples can be found in Persian (fa), N'Ko (nqo), Burmese (my). Don't guess this: consult native speakers and ask whether they actually use them. Some languages have traditional native numerals defined in the Unicode block for their writing system, but in practice they use Arabic numerals or some other system.
    * Add the  in the end, if you have it.
    * Add magic words and special page aliases if you have them.
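
Two of the details above, underscores in namespace names and native numerals, can be illustrated with a short Python sketch. This is only an illustration of the idea and an assumption about how you might prepare the data; it is not how MediaWiki itself processes the language file, and the helper names are hypothetical. The Persian digits are the Unicode characters U+06F0 to U+06F9.

```python
ASCII_DIGITS = "0123456789"

# Extended Arabic-Indic digits used for Persian (U+06F0..U+06F9).
PERSIAN_DIGITS = "".join(chr(0x06F0 + i) for i in range(10))


def to_namespace_form(name):
    """Replace runs of whitespace with single underscores."""
    return "_".join(name.split())


def make_digit_table(native_digits):
    """Build a translation table from ASCII digits to native digits."""
    if len(native_digits) != 10:
        raise ValueError("exactly ten native digits are required")
    return str.maketrans(ASCII_DIGITS, native_digits)


def localize_digits(text, table):
    """Replace every ASCII digit in text with its native counterpart."""
    return text.translate(table)


if __name__ == "__main__":
    print(to_namespace_form("User talk"))
    print(localize_digits("2023", make_digit_table(PERSIAN_DIGITS)))
```

Whether a language should use native numerals at all remains a question for native speakers; the sketch only shows what the mapping does once that decision is made.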

After this is done and deployed:


 * 1) Remove the language from translatewiki.net configuration.
 * 2) Test whether you can change your interface to this language using the "Internationalisation" section in Preferences, and using Universal Language Selector.
 * 3) Check whether there are any special definitions for this language in Wikibase and Wikidata, and remove them. When the language is in Names.php, it's fully supported in Wikibase (and Wikidata), too.

Special scenarios for core
TODO

Removing from core
TODO

CLDR
The CLDR extension of MediaWiki holds a copy of the official Unicode CLDR Project. It is used to display language names and other information about languages, in a lot of languages. Examples:

 * The language name of "fr" (French) in "de" (German)
 * The language name of "ko" (Korean) in "en" (English)
 * The language name of "en" (English) in "ar" (Arabic)

It is used by Babel and Wikidata too (test page).

Therefore, if the new language code is not part of CLDR, only the autonym will be displayed instead of the name of the language in the requested language or the user language.

To override this locally:


 * 1) Check whether the language code is in  and is correct:
    * Yes: Go to the next section.
    * No: Check if the language code is in . This is a MediaWiki-specific file for languages that are not part of Unicode CLDR or are wrongly translated.
       * Yes: Go to the next section.
       * No: Write a patch which adds the new language code to LocalNamesEn.php. Translations into other languages are welcome. Example commit: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/cldr/+/846592
 * 2) If the official Unicode CLDR translation is wrong, the process is a bit more complex: File a ticket at Unicode CLDR and write a patch that changes the language code in LocalNamesEn.php (and, if needed, in other languages). Sadly, the process of fixing a translation in the official Unicode CLDR is really, really slow.
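
The lookup behaviour described in this section can be sketched in Python as follows. This is a simplified model and not the CLDR extension's actual code: the function name and the dictionaries are hypothetical, and the assumption that local names take precedence over CLDR names is inferred from the statement that the local file exists for missing or wrongly translated names.

```python
def language_name(code, in_language, cldr_names, local_names, autonyms):
    """Return the name of `code` as displayed in `in_language`.

    Assumed order: local overrides first, then CLDR data, then the autonym.
    """
    for source in (local_names, cldr_names):
        name = source.get(in_language, {}).get(code)
        if name:
            return name
    # No translated name is known: fall back to the autonym (or the bare code).
    return autonyms.get(code, code)


if __name__ == "__main__":
    # Hypothetical sample data; "qqf" is the generic example language code.
    cldr_names = {"de": {"fr": "Französisch"}, "en": {"ko": "Korean"}}
    local_names = {"en": {"qqf": "Example language"}}
    autonyms = {"fr": "français", "qqf": "qqf autonym"}

    print(language_name("fr", "de", cldr_names, local_names, autonyms))
    print(language_name("qqf", "en", cldr_names, local_names, autonyms))
    print(language_name("qqf", "de", cldr_names, local_names, autonyms))
```

The last call shows the situation that the patch to LocalNamesEn.php is meant to fix for English: without a local or CLDR entry, only the autonym is available.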

jquery.ime
jquery.ime is a library for typing in various languages and alphabets. It is embedded into websites, and the users don't have to install anything or change their settings (the only condition is using a modern browser with JavaScript enabled).

In MediaWiki, jquery.ime is integrated into the UniversalLanguageSelector extension, but because jquery.ime is designed to be independently usable without MediaWiki, it has its own list of languages. Therefore, when adding a new keyboard layout for a language that is not supported yet, the language must be added to that list:


 * 1) Determine the ISO 639 code and the autonym.
 * 2) If it's not there already, add the language to Universal Language Selector (language-data, jquery.uls, ULS extension).
 * 3) In the jquery.ime GitHub repository, add the input method's code (see the README in the repository). After that, edit  . In the   section, add a new block, ordered by the ISO 639 code. There, write the autonym in the   and list the input method identifiers in the   field.

To test, run a local webserver, open http://localhost/examples, and try selecting the language and using your new input method.

jquery.uls
jquery.uls is the generic MediaWiki-independent library that provides the core functionality of the Universal Language Selector extension. To add a language to jquery.uls:


 * 1) Add the language to language-data.
 * 2) After that, make a GitHub patch in the wikimedia/jquery.uls repository. In the repo's root folder, run the script.
 * 3) Check the diff. In the commit message, describe the changes (using the English name of the language), and give a link to the latest language-data commit on GitHub.

Example commit: https://github.com/wikimedia/jquery.uls/pull/422

language-data
language-data is a library with a list of language codes. Its primary purpose is to make the language selectable in Universal Language Selector, although it may also be used in other contexts. As of late 2022, it's stored in the wikimedia/language-data repository.

It's one of the most "liberal" places for adding a language. Since the generic Universal Language Selector is used for various purposes in MediaWiki, and even outside the MediaWiki and Wikimedia world, it's not as strictly limited to new languages as, for example, the Wikimedia Language proposal policy for new wikis, and it may include ancient and constructed languages or variants that can be generally useful.

Despite being relatively liberal, it still strictly requires that only languages with a valid ISO 639 code are added. No ISO 639 code—no adding!

To add a language, determine the autonym and make a GitHub patch in the wikimedia/language-data repository. Before doing it for the first time, make sure that you are familiar with that repository's README and with the other documentation to which it links.

 * 1) Update your copy of the language-data repository.
 * 2) Run the script   and view the diff. If it creates any changes, make a patch. In the commit message, say that the change is automatic, but describe the changes.
    * The  script automatically downloads a file with data about languages from the CLDR server. Usually, these automatic changes are adding or removing languages used in countries.
 * 3) When the   script doesn't create any changes, edit the file  . Add a line for that language in alphabetical order of language codes. List the ISO 15924 code of the language's script, the continents where the language is spoken, and the autonym.
    * The ISO 15924 code must appear in one of the  sections towards the end of the file. If the language is written in a script that doesn't yet appear in any of these groups, determine its writing system, make sure that it has a valid four-letter ISO 15924 code, and add it to one of the groups. Use your judgment to put it in an appropriate group.
 * 4) Run the script.
 * 5) Using , check that the file   was updated.
 * 6) Run  . All the tests must succeed.
 * 7) In the commit message, mention the reason for adding the language (using the language's English name), and the source for the autonym. (This is similar to adding a language to translatewiki.)
 * 8) Submit the changes as a pull request.

After the pull request is merged, update jquery.uls.
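
Before submitting, it can help to sanity-check the new entry. The following Python sketch is an assumption about how such a check could look and is not part of the language-data repository: the function names, the region codes, and the shape of the script groups are hypothetical. It checks that the script code has the four-letter ISO 15924 shape and appears in a script group, and it shows insertion in alphabetical order of language codes.

```python
import re

# ISO 15924 codes are four letters, written with an initial capital (e.g. "Latn").
SCRIPT_CODE = re.compile(r"^[A-Z][a-z]{3}$")


def validate_entry(script, regions, autonym, script_groups):
    """Return a list of problems found in a new language entry."""
    problems = []
    if not SCRIPT_CODE.match(script):
        problems.append(f"{script!r} does not look like an ISO 15924 code")
    elif not any(script in group for group in script_groups.values()):
        problems.append(f"{script} is not listed in any script group")
    if not regions:
        problems.append("no regions listed")
    if not autonym.strip():
        problems.append("autonym is empty")
    return problems


def insert_sorted(entries, code, entry):
    """Return a copy of entries with the new code added, sorted by code."""
    merged = dict(entries)
    merged[code] = entry
    return dict(sorted(merged.items()))


if __name__ == "__main__":
    groups = {"Latin": ["Latn"], "Cyrillic": ["Cyrl"]}  # hypothetical grouping
    print(validate_entry("Latn", ["EU"], "example autonym", groups))
    print(validate_entry("latn", [], "", groups))
    print(list(insert_sorted({"aa": "...", "zu": "..."}, "qqf", "...")))
```

The repository's own tests remain the authoritative check; this sketch only catches the most obvious slips before you run them.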

Consider also adding a keyboard layout for the language using jquery.ime.

Names.php
See Core MediaWiki.

Adding to translatewiki
Adding new languages to translatewiki usually begins as a request on the page Support. The request must include the language's ISO 639-3 language code.

Before adding, read the policy at Translatewiki.net languages and make sure that the language fits it. Ask the requester for clarifications if needed. If the language doesn't fit the policy, politely decline the request.

Determine the autonym. The autonym should be mentioned in the request, but you still have to verify it according to the instructions on this page.

Make a Gerrit patch in the translatewiki repository:

 * 1) Edit the file  . Add a line for this language in alphabetical order of language codes. Write its autonym in the quotes. In a comment on the same line, add the English name of the language, and write your name and date.
 * 2) Optional: Add default assistant languages. Edit the file , add a line for that language, and list the languages in the array. The language that is the most likely to be helpful to translators should be at the top. Even though the file where it's done is called   and the variable in question is called  , this is not actually a fallback language, but an assistant language that is shown to translators as an aid, when a translation is available. This will make translation easier for people who didn't define assistant language preferences. Only languages that are already available on translatewiki should be added there. English doesn't have to be added there, because it's always shown. Add no more than six languages. Consider also adding the new language as a default assistant language to other languages. The languages to add there are:
    * A common foreign or official language in the country where this language is spoken. E.g., if a language is spoken in Indonesia, add Indonesian; if a language is spoken in a Francophone African country, add French; etc.
    * Languages from the same family. Don't just add everything from the same linguistic family; only add languages that the translators are likely to find helpful.
    * Other languages spoken in the same country or region.
 * 3) If a language is a variant of another language, and needs only a partial translation, edit the file   and add the language code to the relevant always-export-languages sections. (Example: "pap-aw".)

In the commit message:

 * 1) Give a reason for adding the language (using the language's English name). If it's a request on the Support page, give a direct permanent link to the relevant thread. If the reason is different, explain it clearly.
 * 2) Give the source for the autonym.
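
The rules for the default assistant languages (only languages already available on translatewiki, no English, at most six) can be checked with a few lines of Python before you write the patch. This is a hypothetical helper, not part of the translatewiki repository; the set of available languages is something you would have to supply yourself.

```python
MAX_ASSISTANT_LANGUAGES = 6


def check_assistant_languages(assistants, available):
    """Return a list of problems with a proposed assistant language list."""
    problems = []
    if len(assistants) > MAX_ASSISTANT_LANGUAGES:
        problems.append(
            f"too many languages: {len(assistants)} "
            f"(maximum {MAX_ASSISTANT_LANGUAGES})"
        )
    if "en" in assistants:
        problems.append("English is always shown and does not need to be listed")
    if len(set(assistants)) != len(assistants):
        problems.append("duplicate language codes")
    for code in assistants:
        if code != "en" and code not in available:
            problems.append(f"{code} is not available on translatewiki yet")
    return problems


if __name__ == "__main__":
    # Hypothetical example: a language spoken in Indonesia.
    available = {"id", "jv", "su"}
    print(check_assistant_languages(["id", "jv"], available))
    print(check_assistant_languages(["en", "id", "zz"], available))
```

The order of the list still needs human judgment, because the language most likely to help translators should come first.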

See the git log for  for example patches.

After the patch is reviewed and merged, deploy the updated configuration to the production translatewiki server. After the deployment, test that the language was added correctly: go to Special:Translate, click the target language selector, and type the new language's code. The new language should appear in the results.

If the language is written from right to left, add its code to the following local translatewiki.net pages:
 * translatewiki:Template:Dir
 * translatewiki:MediaWiki:Common.css - to the section that defines all  elements as.

Add the language to language-data and then to Universal Language Selector. The translatewiki.net change can be deployed before the Universal Language Selector change is done, and the language will be usable as a target language for localization, but selecting it can be broken in some cases, so don't forget it and do it as quickly as possible.

Make sure that there is a language portal for the language on translatewiki. The portals have names in the form "Portal:qqf". Some portals for languages that aren't yet configured already exist, but the language may be marked as "disabled". See the instructions at Template:Portal.

Add the language's native and transliterated autonyms to the Wikimedia Portals localization (not to be confused with translatewiki's own language portals).

Consider also adding a keyboard layout for the language using jquery.ime.

Removing from translatewiki
Languages are removed from translatewiki in two cases:


 * 1) If the language has crossed the export threshold (see Translating:MediaWiki) and was added to MediaWiki core's , it should be removed from translatewiki.net's own configuration. In this case:
    * Make sure that the language is actually in Names.php, and that the MediaWiki revision that includes it has been deployed to translatewiki!
    * Remove the language's entry from the file.
    * Check the language's entry in the file  . If the language has no entry, you're all set. If it has an entry, check whether any of the assistant languages has been added to core MediaWiki as the fallback language, and if so, remove that language from  . Leave the rest of the languages, as they may still be useful as assistant languages.
    * Remove special definitions for this language code from the translatewiki:MediaWiki:Common.css page.
 * 2) Sometimes, the translatewiki administrators decide that MediaWiki and other projects hosted on translatewiki shouldn't be localized into that language. This may happen, for example, if the language had been added not according to policy, if grave mistakes were made while adding the language, or if the localizations are very low quality and it's better to remove them and start the work in that language from scratch.
    * Remove the language's entry from the file   and from.
    * If the language is being removed because it was added by mistake, check whether it appears in any special configurations in , and remove them if needed (to be extra-sure, grep the repository for more appearances).
    * Consider also deleting all the translations from the site.
    * Delete the language's portal page or mark the language as disabled.
    * TODO: anything more to document here?

Universal Language Selector
Universal Language Selector is the MediaWiki extension that provides language selection functionality in various contexts.

To add a language to Universal Language Selector, first add it to language-data and then to jquery.uls. After that is merged, make a Gerrit patch for Universal Language Selector:


 * 1) From the repo's root folder, run the script.
 * 2) Check the diff. In the commit message, describe the changes (using the language's English name), and give a link to the latest jquery.uls commit on GitHub.

Example commit: https://gerrit.wikimedia.org/r/c/mediawiki/extensions/UniversalLanguageSelector/+/793082

Furthermore, if the language's autonym has commonly used aliases or alternate spellings, add them to the $specialLanguages array in the file data/LanguageNameIndexer.php. This keeps the ULS search box useful: many people may search for the language they need using a name that is significantly different from the autonym. You can see the existing examples of cases when this is needed in the file itself:
 * A language that has a common alias in the language itself. For example, in Spanish, the Spanish language itself is often called both "español" and "castellano". The name "español" is used as the autonym, but since "castellano" is a common alias, it is also added here.
 * Some languages that aren't written in the Latin alphabet are frequently searched using the Roman alphabet, such as Armenian or Japanese.
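
To see why aliases matter for search, consider this small Python sketch. It is a simplified model and not the actual logic of LanguageNameIndexer.php or the ULS search box; the function names and the sample data are chosen for illustration only.

```python
def build_search_index(autonyms, aliases):
    """Map every searchable name (case-folded) to a set of language codes."""
    index = {}
    for code, name in autonyms.items():
        index.setdefault(name.casefold(), set()).add(code)
    for code, names in aliases.items():
        for name in names:
            index.setdefault(name.casefold(), set()).add(code)
    return index


def search(index, query):
    """Return the codes of languages with a name starting with the query."""
    prefix = query.casefold()
    return sorted(
        {code for name, codes in index.items() if name.startswith(prefix) for code in codes}
    )


if __name__ == "__main__":
    autonyms = {"es": "español", "hy": "հայերեն", "ja": "日本語"}
    aliases = {"es": ["castellano"], "hy": ["hayeren"], "ja": ["nihongo"]}
    index = build_search_index(autonyms, aliases)
    print(search(index, "cast"))
    print(search(index, "nihon"))
```

Without the alias entries, the queries "cast" and "nihon" would find nothing, even though many users would type exactly that.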

Wikibase
TODO

Lexemes
TODO

Remove wikibase-lexeme-language-name-qqf

Wikimedia Portals
Wikimedia Portals are the main language-neutral pages of Wikipedia, Wiktionary, and some other Wikimedia projects. They have www in the beginning of their URL, rather than a language code.

Autonym configuration
To make sure the language's autonym is properly supported in portals, do the following steps:


 * 1) Make sure that the language is configured and usable in translatewiki.net.
 * 2) If the autonym is written in an alphabet that uses letter casing, such as Latin or Cyrillic, you should begin the autonym with a capital letter, unless the language specifically requires that it be a small letter. If in doubt, verify with someone who knows the language well.
 * 3) Go to Special:Translate on translatewiki.net.
 * 4) Select "Wikimedia Portals" in the message group selector.
 * 5) Select the target language.
 * 6) Click "..." in the toolbar (next to "Translated") and check the "Optional messages" box.
 * 7) In the key portals-language-name, add the autonym.
 * 8) If the autonym is not in the Latin alphabet: In the key portals-language-name-romanized, add a transliteration of the autonym. It may include diacritics and various special characters.
 * 9) If the key portals-language-name or portals-language-name-romanized includes any characters outside the basic 26-letter Latin alphabet: In the key portals-language-name-romanized-sorted, add a transliteration of the autonym without any special characters.
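
For names that are already in the Latin alphabet, a first draft of the value for portals-language-name-romanized-sorted can be produced by stripping diacritics. The Python sketch below is only a starting point under that assumption; the function names are hypothetical, and letters that have no Unicode decomposition (for example "ø" or "ł") are left unchanged and must be handled by hand.

```python
import string
import unicodedata

BASIC_LATIN = set(string.ascii_letters)


def needs_sorted_key(name):
    """True if the name has any non-space character outside basic A-Z, a-z."""
    return any(ch not in BASIC_LATIN and not ch.isspace() for ch in name)


def strip_diacritics(name):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    for name in ("Deutsch", "Čeština", "Français"):
        print(name, needs_sorted_key(name), strip_diacritics(name))
```

Always check the result against the message documentation (qqq) and, where possible, with someone who knows the language.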

Make sure to read the documentation (qqq) for each of the messages above!

The rest of the "Wikimedia Portals" message group should also be translated, but this, of course, should be done by people who know the language.

Configuring the project on the portal in production
If there is a project in that language, but its name appears incorrectly, doesn't appear on the portal at all, or appears in the wrong section, please create a Phabricator ticket with the tag "#wikimedia-portals".

TODO:


 * Describe the actual fixing for cases when the name doesn't appear or appears in the wrong section.
 * Describe rtl language handling (they probably have to be added while creating a new wiki, but verify this).
 * Describe various caveats, exceptions, overrides.

Wikistats
Wikistats 2 has a manual step for adding a new language. See Data Engineering/Systems/Wikistats 2#Adding languages. It would be nice to make it more automatic, but for now it's manual.

Determining the autonym
When people ask to add a language or to change the autonym of an already-configured language, please do your best to verify the autonym. This is sometimes challenging: MediaWiki already supports well over 400 languages, which are, naturally, the world's better-documented languages. This means that many of the new languages that are being added are less well documented, so it's generally harder to find information about them. In particular, their autonyms may be hard to find, or different sources may cite different autonyms, and when adding the language you'll have to make a decision without actually knowing the language. Try to use your best judgment and to reach a reasonable compromise between the information in available third-party sources and the information given to you by the requesters.

Autonyms usually don't have to be written with a capital letter. English spelling conventions require that names of languages be written with a capital letter in English, but most languages don't have such a requirement. Use a capital letter only if this is specifically required by the language's orthography or if you are doing it in an environment where a capital letter is needed for another specific reason.

Autonyms usually appear in lists of languages, so they have to be unique. The users must be able to choose the precise language that they need, and not something with a similar name.

Creole, pidgin, and patois languages need particular attention. They often have the word "creole", "pidgin", or "patois" in their names (adapted to their spelling). Their speakers often call them just by that word. However, there are many languages of this kind, and their names have to be unique. Therefore, try to find a name that at least includes another word, such as the name of the country where it's spoken, or a completely unique name. Other than that, all the other suggestions about autonyms apply to these languages.
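
The uniqueness requirement can be checked mechanically once you have a list of configured autonyms. The following Python sketch uses hypothetical language codes and names; it treats names that differ only in letter case or spacing as duplicates, which is an assumption made for this example.

```python
from collections import defaultdict


def duplicate_autonyms(autonyms):
    """Return autonyms shared by more than one language code.

    Names are compared after case-folding and whitespace normalization.
    """
    by_name = defaultdict(list)
    for code, name in autonyms.items():
        key = " ".join(name.split()).casefold()
        by_name[key].append(code)
    return {name: sorted(codes) for name, codes in by_name.items() if len(codes) > 1}


if __name__ == "__main__":
    # Hypothetical codes and names for three creole languages.
    autonyms = {"qqa": "Kreyol", "qqb": "kreyol", "qqc": "Kreyol Ayisyen"}
    print(duplicate_autonyms(autonyms))
```

In this example the first two entries collide, while the third one is distinguishable because it includes another word.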

Some sources where you may find autonyms:

 * A good autonym may appear in the Wikipedia article about the language in major languages, such as English, French, German, Portuguese, Spanish, or Russian. You can also look for it on the Wikidata item page for the language. As with every statement in Wikipedia and Wikidata, this is often a good starting point, but you have to verify it with an external reliable source.
 * The best source for autonyms is a professionally written and published online or printed book about the language: a dictionary, a grammar reference, a standard orthography guide, a history of the language, etc. Academic articles about the language are a good source, too. Glottolog often has titles of such sources, although it doesn't always have a link to an online copy. A good place to find actual books, or at least parts of them, is the Internet Archive (many such books are available for free time-limited lending in the e-book library). Some providers in The Wikipedia Library, such as L'Harmattan, Cambridge, and others have free access to relevant e-books. Google Books and Amazon.com may have enough in the free preview to find the autonym.
 * Other websites and apps in this language can be a very good source, especially if they have a language selector. One particular kind of app to check is Android keyboard apps, such as Gboard: install the app, enable the language, and check how its name appears when you choose to type in it. Check also the language settings on Windows; recent versions have a very large selection of languages.
 * The UN's Universal Declaration of Human Rights is one of the most translated documents in history, and translations are available online. Unfortunately, the lookup system for languages is not so convenient, and the presentation is non-standardized and quite inconsistent. Nevertheless, it is sometimes a useful source.
 * Ethnologue: the autonym is usually available as part of the freely-shown pages, although it's not always the best option. Try to check in other sources, too.
 * JW.org: this site is available in more than a thousand languages and has an autonym for each of them, but they are sometimes written differently from other websites or books. Do your best not to rely only on this source.

Note that when writing in English, for example in English-language discussions, code comments, Git commit messages, etc., you should use the English name of the language and not the autonym.