Wikimedia Language engineering/Newsletter/2024/January

From mediawiki.org

Welcome to the January 2024 edition of the Language and internationalization newsletter by the Wikimedia Foundation Language team!

This newsletter provides you with quarterly updates on new feature developments, improvements in various language-related technical projects and support work, community meetings, and ideas to get involved in contributing to the projects.

Key highlights

Mahuton, a Beninese Wikimedian, on building a keyboard for easy article editing in Fon at the Wikimedia Hackathon 2018 in Barcelona.

Fon Wikipedia officially launched after five years of development in the Wikimedia Incubator

Fon Wikipedia, born at Wikimedia Hackathon 2018 in Barcelona, has officially launched after graduating from the Incubator! Fon is spoken by millions of people in Benin and Togo and is the mother tongue of many of them. It is also widely used in Benin as a national language. Creating the new Fon Wikipedia took five years. Because many people could not write in Fon, and because native African languages receive less attention than others, building a community to support the project was a tough challenge for the members who started it.[1] You can also discover more about the four new Wikimedia language projects that were approved recently (Wikipedia Dagaare, Wikipedia Moroccan Amazigh, Wikipedia Toba Batak, and Wikiquote Banjar).

Introducing Sentencex, a tool for enhanced Natural Language Processing (NLP) and multilingual sentence extraction

The Language team has just launched a new tool called Sentencex, now available in both Python and JavaScript. Sentence segmentation, an essential part of natural language processing, breaks a text down into individual sentences. It has various uses and helps improve the functionality and speed of language tools, especially Wikimedia's new machine translation system (MinT) and the Section Translation project.[2]

You can find the tool on GitHub and see it in action.
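
To make the idea of sentence segmentation concrete, here is a minimal Python sketch that uses only the standard library. It is not Sentencex's algorithm or API: the function name `naive_segment` and the short abbreviation list are assumptions made for illustration, and a real multilingual segmenter needs per-language rules and data.

```python
import re

# Illustrative abbreviation list (an assumption for this sketch);
# a real segmenter needs language-specific data.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "e.g", "i.e", "etc"}

# Candidate boundary: one or more terminators followed by whitespace.
_BOUNDARY = re.compile(r"[.!?]+\s+")


def naive_segment(text: str) -> list[str]:
    """Split text into sentences, skipping periods that follow known abbreviations."""
    sentences = []
    start = 0
    for match in _BOUNDARY.finditer(text):
        # Look at the word immediately before the terminator.
        words = text[start:match.start()].split()
        last_word = words[-1].lower() if words else ""
        if text[match.start()] == "." and last_word in ABBREVIATIONS:
            continue  # "Dr. Smith" is not a sentence boundary
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    # Whatever remains after the last boundary is the final sentence.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(naive_segment("Dr. Smith edits Wikipedia. She also translates articles! Does it work?"))
```

The abbreviation check shows why splitting on punctuation alone is not enough, and why a dedicated tool is useful once many languages with different conventions are involved.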

MinT translation service available to 55 new Wikipedias, doubles content, ranks second in usage

Graphical representation of languages supported by MinT for the first time.

MinT, the new machine translation service, now offers machine translation to 55 Wikipedias for the first time and has had a positive impact on Wikimedia language communities. This extensive language support has nearly doubled published translations, and articles created using MinT have a low deletion rate (1.72%). In just a few months, MinT has come to be used in 8% of the translations published with Content Translation, making it the second most used translation service on Wikipedia after Google Translate.[3]

Open language identification service now available for 200+ languages

To simplify users' interaction with Wikimedia platforms, the Language team created an open language identification service that automatically detects the language in which a given text is written. The service can detect 201 languages, and anyone can access its API. Final checks of the service and an evaluation of its ability to withstand high traffic are currently underway.[4]
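
As a rough sketch of what calling such a service could look like, the Python snippet below builds (but does not send) an HTTP request. The endpoint URL is a placeholder, and the JSON field name `text` and the helper name `build_langid_request` are assumptions for illustration; the service's own documentation is the authority on the real URL and payload.

```python
import json
import urllib.request


def build_langid_request(endpoint: str, text: str) -> urllib.request.Request:
    """Build a JSON POST request carrying the text whose language should be detected."""
    # The payload shape {"text": ...} is an assumption, not the documented schema.
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder endpoint: replace it with the URL given in the service documentation.
request = build_langid_request("https://example.org/langid", "Bonjour tout le monde")
print(request.get_method(), request.full_url)
```

Sending the request would then be a matter of passing it to `urllib.request.urlopen` and decoding the JSON response, whose structure is not described here.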

Ancient manuscript.

Wikisource now recognizes handwritten texts with Transkribus

Handwritten text recognition is now active on Wikisource through the Transkribus OCR engine. Transkribus, an AI-powered platform, simplifies the handling of handwritten or printed manuscripts by offering various models tailored to different writing scripts, historical periods, and other factors. The Transkribus engine is now available as an option alongside Google and Tesseract, and it is currently operational on the Wikisources listed on this page.[5]

Unified section translation dashboard for desktop and mobile users

The Language team is actively working towards the adoption of a unified section translation dashboard for both desktop and mobile users. Originally designed for mobile in Content Translation, it is now being refined to serve as a unified dashboard across platforms, providing an improved translation environment. The dashboard is currently in beta, and you can test it on Test Wikipedia or any Section Translation-enabled wiki by adding the URL parameter "unified-dashboard=true" (e.g., ig.wikipedia.org/wiki/Special:ContentTranslation?unified-dashboard=true).

This unified dashboard offers a seamless cross-platform translation experience. Users can start translating on their desktop and continue on a mobile device, or vice versa. It also supports section translations on the desktop, giving users flexibility across devices.
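
The test URL described above can be assembled for any wiki with a few lines of Python. This is a minimal sketch: the helper name `unified_dashboard_url` is hypothetical, and the `https` scheme is an assumption, since the example in the text omits it.

```python
from urllib.parse import urlencode, urlunsplit


def unified_dashboard_url(wiki_host: str) -> str:
    """Return the Special:ContentTranslation URL with the unified dashboard enabled."""
    # The parameter name and value are taken from the newsletter text.
    query = urlencode({"unified-dashboard": "true"})
    return urlunsplit(
        ("https", wiki_host, "/wiki/Special:ContentTranslation", query, "")
    )


print(unified_dashboard_url("ig.wikipedia.org"))
```

Swapping in the host of another Section Translation-enabled wiki gives the equivalent link for that wiki.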

Community meetings and events

Get involved

Stay tuned for the next release! You can subscribe to this newsletter.

References