Wikimedia Language engineering/Incubator conversations

From mediawiki.org

High level objectives and key results

Objective 1 - There is a clear picture of the state of languages and the process of supporting existing and new languages to the Wikimedia movement. This objective is part of the Wikimedia Foundation’s Annual Plan 2023-24.

KR 2 - By Q4, identify 3 key recommendations for improving the social and technical infrastructure to support existing and new languages.

Deliverable: Proposal on language onboarding process recommendations.

Overview

The Incubator was launched in 2006 with the assumption that its users would have prior wiki editing knowledge. While editing on Wikimedia wikis has significantly improved since then, the Incubator hasn't received these updates due to technical limitations. Currently, it takes several weeks for a wiki to graduate from the Incubator, and only around 12 wikis are created each year [1], which points to a significant bottleneck. Various stakeholders have highlighted issues such as the missing integration with Wikidata and Content Translation and the complex editing processes, all of which affect the user experience of both content creators and reviewers. The Wayúu community even created their own version of the Incubator [2] to improve the editing experience for their members.

The aim of these discussions is to propose technical recommendations based on existing research and resources. These suggestions will be prioritized for implementation in the Wikimedia Foundation's upcoming 2024-25 annual plan. Participants in these conversations will include language committee members, software engineers, linguists, wiki community leaders, researchers, analysts, and Incubator advocates.

Current technical challenges

Existing research and materials on the topic (see sections below) reveal technical challenges in every phase of language onboarding: adding new languages to the Incubator, developing and reviewing content, and creating a wiki site when a language graduates from Incubator. Each phase is slow, manual, and complex, indicating the need for improvement.

1. Manual complexity in adding new languages to Incubator

  • Involves editing multiple wiki pages with advanced templates and creating Phabricator tasks.
  • Language Committee members track, approve, and reject requests by hand, which is also labor-intensive.

2. Complexities in developing and reviewing content in Incubator

  • New contributors face challenges with complex and repetitive wikitext and bureaucratic procedures.
  • Essential features like integration with ContentTranslation and Wikidata are missing, making editing cumbersome.
  • Numerous content restrictions, such as the inability to search for content, add citations, and upload files, hinder the process.
  • Experienced volunteers reviewing content quality have to manually track disparate wiki pages and talk pages.
  • Editing from a mobile device within the Incubator is extremely difficult.
  • Many small and underserved languages lack support for essential tools such as appropriate keyboards, online dictionaries, spell checkers, and grammar tools.
  • Finding the right translations for technical terms is a challenge in several communities.
  • Machine translation is not available for smaller languages.

3. Slow wiki site creation process

  • After Language Committee approval, creating wikis takes several days or even weeks.
  • Manual steps for engineers involve configuring domains, setting up databases, and installing components like MediaWiki, Parsoid, and Wikidata integration.
  • Maintenance and closure processes for wikis are also manual and time-consuming.

Guiding questions

  • What specific technical areas should the Wikimedia Foundation prioritize for the Incubator? Examples include:
    • Simplifying content creation
    • Expediting the language approval process
    • Enhancing the efficiency of the site setup process
    • Integrating content translation and Wikidata
  • What potential solutions can address challenges in the technical areas that need to be prioritized?
  • What aspects of language onboarding can the affiliates/communities concentrate their efforts on? Examples include:
    • Establishing partnerships with external organizations to provide assistance with language tools, font, and keyboard support, among others.
    • Undertaking capacity-building and training initiatives, among other activities.
  • [Researchers/Data Analysts] Which technical enhancements are expected to have the most significant and positive impact on the language communities?
  • [For Engineers] Which solutions appear feasible, and what level of resources would be required to implement them effectively?
  • [Optional] What other research has been conducted concerning the Incubator that should be addressed during these discussions?

Proposed recommendations

Prioritizing technical areas

1. Simplifying content creation: These conversations highlighted several challenges faced by individuals and their communities contributing to the Wikimedia Incubator, particularly those involving non-tech-savvy users. The main issues revolve around the technical complexity of the platform, which makes the editing experience worse than on normal wikis.

Editing on Incubator should feel similar to editing on normal wikis, but we are far from achieving this goal.

Some of the key challenges related to editing were discussed:

  • Prefixes and URL Access: The organization of content using prefixes poses a significant challenge. While there are gadgets to hide them during editing, few technical contributors have the expertise required to both develop and maintain these tools. Furthermore, accessing pages on Incubator for a language without ISO codes is difficult.
  • Wikidata Support and Content Translation: Incubator is disconnected from Wikidata, which hinders interwiki linking. The fastest way for contributors to create new articles seems to be translating existing articles from other languages that they understand. Because the Content Translation tool available on other language Wikipedias is missing from the Incubator, contributors often resort to translating articles manually across tabs, which makes importing content cumbersome.
  • Templates and CSS Stylesheets: Users need to copy-paste infoboxes and templates from other Wikipedias, and these are overly complicated to both use and maintain because they rely on other templates and stylesheets. The lack of a central repository means templates must be copied and maintained separately for each project.
  • Search Features: Two search improvements are needed: better search functionality for identifying code errors in templates, and language search boxes similar to those on Wikipedia.org.
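To make the prefix problem concrete, the following is a minimal sketch of what a prefix-hiding gadget has to do with every page title. It assumes test-wiki pages are named `<project prefix>/<language code>/<page title>` (for example `Wp/guc/Main Page`). The prefix-to-project table, the function names, and the example titles are illustrative assumptions, not an existing Incubator API.

```python
import re
from typing import NamedTuple, Optional

# Assumed prefix-to-project mapping, for illustration only.
PROJECTS = {
    "Wp": "Wikipedia",
    "Wt": "Wiktionary",
    "Wq": "Wikiquote",
    "Wb": "Wikibooks",
    "Wn": "Wikinews",
    "Wy": "Wikivoyage",
}


class TestWikiPage(NamedTuple):
    project: str
    language: str
    title: str


# Project prefix, language code, and an optional page title.
_PREFIX = re.compile(r"^(W[a-z])/([A-Za-z-]+)(?:/(.*))?$")


def split_prefixed_title(full_title: str) -> Optional[TestWikiPage]:
    """Split a prefixed title into its parts, or return None if it has no known prefix."""
    match = _PREFIX.match(full_title)
    if match is None or match.group(1) not in PROJECTS:
        return None
    prefix, language, title = match.groups()
    return TestWikiPage(PROJECTS[prefix], language.lower(), title or "")


def display_title(full_title: str) -> str:
    """Return the title without its prefix; leave unprefixed titles untouched."""
    page = split_prefixed_title(full_title)
    return page.title if page and page.title else full_title
```

Every link, search result, and page heading needs this kind of translation in both directions, which is part of why maintaining such gadgets requires technical contributors.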

2. Language approval & site creation process: Though the conversations largely centered on difficulties with the editing experience, participants also raised concerns about inconsistency in the language approval and site setup processes, where some projects are approved quickly while others face delays.

Potential solutions

These conversations highlighted the need to streamline the technical aspects of the Incubator to make editing more intuitive and accessible for all contributors, particularly those from non-technical backgrounds.

1. Streamlining technical infrastructure:  Many language communities face difficulties contributing to Incubator and obtaining approval due to limited resources. Simplifying the technical infrastructure is crucial to address these challenges, including issues with prefixes, template importing, and content formatting.

We should forget about Incubator completely. And, find another way of starting wiki. Because of the complexities around it, it might take time to improve the technical side of it.

Currently, creating a new wiki on Incubator requires starting from scratch, which is time-consuming. Establishing basic building blocks for new wikis could streamline this process. Instead of a shared infrastructure like Incubator, a placeholder production wiki for each language wiki could simplify matters, especially if it's not indexed in search results until it reaches a threshold in content. This would require solving complex infrastructure issues, but it's feasible and necessary. Speeding up the creation of new wikis is essential, and evaluating this idea is recommended. Overall, these challenges are solvable, and addressing them would benefit many contributors. A related task about this idea has already been filed on Phabricator.
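The "not indexed until it reaches a threshold" idea could be expressed as a simple rule. The sketch below is only an illustration: the function name, the use of a content-page count as the measure, and the default threshold of 100 are assumptions, not values proposed in these conversations.

```python
def robots_policy(content_pages: int, threshold: int = 100) -> str:
    """Return a robots policy string for a placeholder wiki.

    The wiki stays out of search results until it has at least
    `threshold` content pages. The threshold value is hypothetical.
    """
    if content_pages < 0 or threshold < 0:
        raise ValueError("page counts must be non-negative")
    if content_pages >= threshold:
        return "index,follow"
    return "noindex,nofollow"
```

A real policy would likely weigh more than page count, for example activity or review status, but the switch itself can stay this simple.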

2. Progress tracking: Providing a progress bar for each test project with specific goals can motivate contributors and improve the approval process by giving them insight into their project's status.
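As a rough sketch of how such a progress bar might be computed, the example below tracks a few goals per test project and renders a text bar. The `Goal` class, the goal names, and the targets are hypothetical; the actual approval criteria are set by the Language Committee and are not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Goal:
    name: str
    current: int
    target: int

    @property
    def fraction(self) -> float:
        """Share of the goal reached, capped at 1.0."""
        if self.target <= 0:
            return 1.0
        return min(self.current / self.target, 1.0)


def render_bar(fraction: float, width: int = 20) -> str:
    """Render a fraction between 0 and 1 as a fixed-width text bar."""
    filled = round(fraction * width)
    return "[" + "#" * filled + "-" * (width - filled) + f"] {fraction:.0%}"


def overall_progress(goals: List[Goal]) -> float:
    """Average the per-goal fractions; an empty goal list counts as 0."""
    if not goals:
        return 0.0
    return sum(goal.fraction for goal in goals) / len(goals)


if __name__ == "__main__":
    # Hypothetical goals for one test project.
    goals = [
        Goal("Articles written", 50, 100),
        Goal("Interface messages translated", 400, 500),
    ]
    for goal in goals:
        print(f"{goal.name}: {render_bar(goal.fraction)}")
    print(f"Overall: {render_bar(overall_progress(goals))}")
```

Capping each goal at 100% keeps one over-fulfilled goal from hiding another that still needs work.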

3. Improving templates experience:  Providing basic templates can simplify the editing process on wikis. Copying infoboxes from one wiki to another involves copying modules and templates, which can be complex. Starting a test wiki on Incubator is like starting from scratch. Having foundational building blocks for new wikis, like global templates, would be helpful. Including commonly used templates such as citation, references, infoboxes, and main pages at the beginning of a project can encourage community involvement and make editing more accessible. Link to Global templates.

4. Improving translation processes: It would be good to explore whether we can translate directly from Incubator instead of going through Translatewiki.net, which is currently a lengthy process. Many of us use Translatewiki.net for translations, but its interface differs from Incubator's, and translating there requires some expertise. Providing resources like a glossary of technical terms within Incubator could help contributors translate content more effectively, especially those unfamiliar with technical language.

5. Exploring social pathways for language onboarding: In addition to technical improvements, social measures can make the process more accessible and inclusive. Examples include enhancing the discoverability of Incubator, creating welcoming pages, orienting communities to Wikimedia projects (such as Wiktionary or Wikisource) that might be more relevant for them than a Wikipedia, and guiding them on how to contribute effectively.

Exploring social pathways

1. Empowering resource sharing: Facilitate a forum for affiliates to share resources regarding language onboarding and creation, without necessarily coordinating their efforts. Encourage the sharing of templates for translations and maintain a compact list of common pages needed for projects. Set weekly or monthly goals for page creation/translation with specific themes.

2. Enhancing social infrastructure for article development: Develop social infrastructure to assist in understanding which articles need to be written and provide guidance for creating articles, possibly through technical solutions like automated article suggestions. Encourage diverse topic selection and improve documentation accessibility. Enhance transparency in the approval process for articles.

3. Outreach and partnerships for linguistic diversity: Conduct outreach to language communities and facilitate connections with organizations experienced in supporting the development of language tools, fonts, keyboards, and apps, such as Giellatekno (Center for Language Technology) at the University of Tromsø, Norway, and the Language Diversity Hub.

4. Streamlining wiki incubation:  Streamline the process of adding wikis to Incubator and graduating them from Incubator by collaborating with local communities, affiliates, and other regional Wikimedia organizations. Encourage feedback and collaboration to expedite the process. Ensure that articles are not lost during the migration of wikis from Incubator to independent sites.

5. Guiding principles for new wikis: Provide a basic set of guidelines for new wikis, including principles like neutrality, verifiability, and basic project policies, to facilitate their establishment and operation.

6. Enhancing community interaction: Improve on-wiki communication channels, such as Village Pump pages, to assist newcomers in understanding how wikis function and to foster community interaction.

7. Increasing language committee members: Expanding the number of language committee members can expedite the project approval process, reducing delays caused by limited manpower.

Final recommendations

Note: Recommendations are currently being discussed with WMF engineers and relevant stakeholders. Next steps will be published here soon.

Timeline

December 2023–March 2024

  • Recruit potential stakeholders for Incubator conversations (Dec 15th)
  • Host a kick-off meeting for introductions & agenda setting (Jan 12th)
  • Conduct one-on-one discussions with stakeholders on key questions (Feb 9th)
  • Gather stakeholder feedback asynchronously on recommendations (Feb 9th)
  • Host a final meeting to finalize the recommendations (Feb 16th)
  • Collect asynchronous feedback from various platforms on the proposed recommendations (e.g., Incubator talk pages, translatewiki.net, Telegram channels) (Mar 16th)
  • Document final recommendations for future discussions with WMF engineers and other stakeholders (Mar 28th)

Potential stakeholders

Resources

See also

References