Wikimedia Language engineering/Newsletter/2024/January

From mediawiki.org
This page is a translated version of the page Wikimedia Language engineering/Newsletter/2024/January and the translation is 58% complete.

Welcome to the January 2024 issue of the Language and Internationalization newsletter from the Language team!

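This newsletter brings you quarterly updates on new feature development, improvements and support work across language-related technical projects, community meetings, and ideas for contributing to the projects.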

Highlights

Beninese Wikimedian Mahuton speaking at the 2018 Wikimedia Hackathon in Barcelona about building a keyboard that makes it easy to edit articles in Fon

Fon Wikipedia goes live after five years in the Wikimedia Incubator

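Fon Wikipedia, which began at the 2018 Wikimedia Hackathon in Barcelona, has graduated from the Incubator and is now live! Millions of people in Benin and Togo speak Fon, and many of them speak it as their first language. In Benin, Fon is also widely used as a national language. Creating the new Fon Wikipedia took five years. Because many people cannot write Fon, and indigenous African languages receive less attention than other languages, building a community to support the project was a daunting challenge for the community members who started it.[1] You can also learn more about the four new Wikimedia language projects that were recently approved (Dagaare Wikipedia, Standard Moroccan Tamazight Wikipedia, Batak Toba Wikipedia, and Banjar Wikiquote).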

Introducing Sentencex, a tool for enhanced natural language processing (NLP) and multilingual sentence extraction

The Language team has just launched a new tool called Sentencex, available in both Python and JavaScript. Sentence segmentation, an essential part of natural language processing, is the task of breaking a text down into individual sentences. It has many uses and helps improve both the quality and the speed of language features, especially in Wikimedia's new machine translation service (MinT) and the Section Translation project.[2]

You can find the tool on GitHub and see it in action.
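To illustrate what sentence segmentation involves, here is a minimal rule-based sketch: it splits after `.`, `!` or `?` when followed by whitespace or the end of the text, unless the preceding token is a known abbreviation. This is not how Sentencex is implemented and not its API; the `split_sentences` function and the English-only `ABBREVIATIONS` set are hypothetical, and a multilingual tool needs per-language rules.

```python
import re

# Hypothetical, English-only abbreviation list for illustration.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}

# Sentence-ending punctuation followed by whitespace or the end of the text.
BOUNDARY = re.compile(r"[.!?]+(?=\s|$)")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences with a naive punctuation-based rule."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        end = match.end()
        # The token that ends at this punctuation mark, e.g. "Dr." or "arrived."
        token = text[start:end].split()[-1].lower()
        if token in ABBREVIATIONS:
            continue  # an abbreviation, not a sentence boundary
        sentence = text[start:end].strip()
        if sentence:
            sentences.append(sentence)
        start = end
    # Keep any trailing text that has no final punctuation.
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    print(split_sentences("Dr. Smith arrived. He sat down! Did he stay?"))
```

The abbreviation check is why even simple segmenters need language-specific data: the period in "Dr." does not end a sentence.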

MinT translation service available to 55 new Wikipedias, doubles content, ranks second in usage

Graphical representation of languages supported by MinT for the first time.

MinT, the new machine translation service, now offers machine translation to 55 Wikipedias that previously had none, and it has had a positive impact on Wikimedia language communities. This expanded language support has nearly doubled the number of published translations, and articles created with MinT have a low deletion rate (1.72%). MinT is now used in 8% of the translations published with Content Translation, which makes it the second most used translation service in Wikipedia after Google Translate, within just a few months.[3]

Open language identification service now available for 200+ languages

The Language team has created an open language identification service that automatically detects the language a given text is written in, making it simpler for users to interact with Wikimedia platforms. The service can detect 201 languages, and anyone can use it through its API. Final checks of the service and an evaluation of how well it withstands high traffic are currently underway.[4]
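To make the idea of language identification concrete, here is a toy sketch that compares character trigram profiles. It is only an illustration of the general technique, not the Language team's model or API: the three sample sentences, the `identify` function and the overlap scoring are all hypothetical, and a service covering 201 languages is trained on far more data.

```python
from collections import Counter


def trigrams(text: str) -> Counter:
    """Count overlapping character trigrams, padding with spaces at the edges."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


# Hypothetical, tiny "training" samples; real systems use large corpora.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then the dog sleeps",
    "fr": "le renard brun rapide saute par dessus le chien paresseux et le chien dort",
    "de": "der schnelle braune fuchs springt über den faulen hund und der hund schläft",
}
PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}


def score(profile: Counter, query: Counter) -> int:
    """Count trigrams shared by the query and a language profile."""
    return sum(min(count, profile[gram]) for gram, count in query.items())


def identify(text: str) -> str:
    """Return the language code whose profile overlaps most with the text."""
    query = trigrams(text)
    return max(PROFILES, key=lambda lang: score(PROFILES[lang], query))


if __name__ == "__main__":
    print(identify("the dog and the fox"))
    print(identify("le chien dort"))
```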

Ancient manuscripts

Wikisource now recognizes handwritten texts with Transkribus

Handwritten text recognition is now active on Wikisource through the Transkribus OCR engine. Transkribus, an AI-powered platform, simplifies working with handwritten or printed manuscripts by offering models tailored to different scripts, historical periods, and other factors. The Transkribus engine is now available as an option alongside Google and Tesseract, and it is currently operational on the Wikisources listed on this page.[5]

Unified section translation dashboard for desktop and mobile users

The Language team is actively working toward a unified Section Translation dashboard for both desktop and mobile users. The dashboard was originally designed for mobile in Content Translation and is now being refined into a single dashboard across platforms, providing an improved translation environment. It is currently in beta: you can test it on Test Wikipedia, or on any wiki where Section Translation is enabled, by adding the URL parameter "unified-dashboard=true" (e.g., ig.wikipedia.org/wiki/Special:ContentTranslation?unified-dashboard=true).
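If you want to open the beta dashboard on several wikis, a small helper can build the test link from the `unified-dashboard=true` parameter named above. This is a minimal sketch using Python's standard `urllib.parse`; the `unified_dashboard_url` helper is hypothetical, and it assumes HTTPS and the standard `/wiki/` article path.

```python
from urllib.parse import urlencode, urlunsplit


def unified_dashboard_url(host: str) -> str:
    """Build a Special:ContentTranslation link with the beta dashboard enabled."""
    # Query string carrying the beta flag described in the newsletter.
    query = urlencode({"unified-dashboard": "true"})
    # Assumes HTTPS and the /wiki/ path used by Wikimedia wikis.
    return urlunsplit(("https", host, "/wiki/Special:ContentTranslation", query, ""))


if __name__ == "__main__":
    # Test Wikipedia and the Igbo Wikipedia example from the text above.
    for host in ("test.wikipedia.org", "ig.wikipedia.org"):
        print(unified_dashboard_url(host))
```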

This unified dashboard offers a seamless cross-platform translation experience. Users can start translating on their desktop and continue on a mobile device, or vice versa. It also supports section translations on the desktop, giving users flexibility across devices.

Community meetings and events

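  • The next Language Community Meeting will take place on Wednesday, February 21, from 12:00 to 13:00 UTC. If you would like to attend, please sign up at this link. Want to share a technical update about your project? Feel free to add it to the Technical updates section of the agenda document.
  • If you missed the first Language Community Meeting in November 2023, you can catch up by watching the video recording and reading the notes.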

Get involved

Stay tuned for the next issue! You can subscribe to this newsletter.

References