Language Onboarding and Development

From mediawiki.org

The Knowledge Gaps Taxonomy by the research team categorizes content gaps into "Representation" and "Interaction". Representation gaps are further categorized by geography, language, socio-economic status, important topics, etc. (see the full paper, page 22 onward). Page 23 highlights the language gap and defines it as the difference in content coverage across languages. Additionally, the Movement Strategy recommendation "Improve User Experience" calls for “Clear pathways for advancing new wiki proposals (including new language versions) and for reusing community-developed software features on them.”

State of Language Wikis

Number of test, hosted, and closed language editions per wiki project type (April 2024)

As of January 2025, 342 languages have at least one content project hosted by Wikimedia (source). According to Ethnologue, there are 7,000+ languages in use today. (NB: many of these languages might be non-written, or the language community might not want a Wikimedia content project.) Hundreds of these languages are currently being worked on in the Incubator with the goal of graduating. WMF’s vision asks us to imagine a world in which humans can “freely share in the sum of all knowledge.” As we consider language as knowledge and as a means of sharing knowledge, understanding the gaps we have in terms of language coverage and representation is vital.

As of February 2025, according to the List of Wikipedias, 100+ languages hosted on the Wikimedia platform have 20 or fewer editors, even though some of them are spoken by many millions of people. The reasons why these languages have a low Wikimedia presence, and generally a low presence in online and offline written publishing, are systemic, complicated, and diverse. There are some things we can do to make developing Wikimedia content in these languages more accessible. Some lie in product design and infrastructure, and others in human-to-human community support.

Onboarding & Development Journey for New Wikis

New language versions go through several stages before becoming wikis (reference). At each of these stages, they face various challenges, both social and technical. Existing research and materials reveal technical challenges in every phase of language onboarding: adding new languages to the Incubator, complexities in developing and reviewing content, and a slow process for creating the wiki site once a language graduates from the Incubator. When a wiki is fully created, the real life of the community begins: writing articles, communicating with each other, inviting more writers, growing content, etc.

Disclaimer: Onboarding a language entails different stages, as shown in the diagram above. Some areas are outside the scope of this initiative, including evaluating the feasibility of new languages (owned by the Language committee), wiki creation, Incubator maintenance, etc.

Opportunity Areas

Figure: The Starter Kit dashboard, highlighting the Essential Tasks, Community & Collaboration, and Activity & Growth sections.
  • Towards sustainable onboarding guidance: the Wikipedia Starter Kit exploration. There is currently no formal process to provide new and smaller language communities with onboarding guidance. Communities discover relevant resources on their own and occasionally receive ad hoc guidance from experienced Wikimedians. We will begin exploring how new and small language communities can be empowered to get started and make progress through easier navigation and self-guidance. Please read more here
  • Progressive steps towards developing and growing languages: as part of the broader initiative, we are keen to explore open questions about the operational ability of smaller communities, as part of the multigenerational growth opportunities for the Wikimedia projects. Key question: How can language projects be supported to determine incremental growth goals that are appropriate for them? Please read more here
  • New language usability as an initial step: these efforts aim to make newly introduced or smaller languages usable in Wikimedia environments. This has entailed supporting the internationalization and localization needs of these language communities, including onboarding new languages, UI localisation, plural support, keyboard support, language fallback, and Unicode support, among other areas. Please read more here

Key Learnings

Figure: Average weekly edits per editor, Languages onboarding experiment 2024
  • 2025-26: We are currently experimenting with the Wikipedia Starter Kit, an experience that will provide essential tasks, tools, and resources for growing a language community. Earlier in the year, we experimented with engaging native speakers to improve vital content in small-language Wikipedias.
  • 2024-25: Through the Language onboarding experiment, we tested whether moving incubating languages directly to production wikis with modern tools would increase editing activity. We learnt three things: technical improvements alone are insufficient, expanding the editor base is key, and support for new wikis needs to improve.
  • 2023-24: Gathered recommendations for establishing streamlined technical infrastructure and social pathways for creating language wikis, and for improving the complex processes involved in each distinct phase of language onboarding: before, during, and after incubation.