Outreach programs/Possible projects

We are using this list of projects as a master branch for mentorship programs such as Google Summer of Code and Outreach Program for Women. The projects listed are good for students and first-time contributors, but they require a good amount of work. They might also be good candidates for Individual Engagement Grants.


 * Featured project ideas usually have mentors ready for you to jump in.
 * Raw projects are interesting ideas that have been proposed but might lack definition, consensus or mentors, and therefore we can't feature them. If you're interested in one of those, wonderful! You'll need to work a bit more to improve their fundamentals.

If you are looking for smaller tasks check the Annoying little bugs. For a more generic introduction check How to contribute.



Be part of something big
We believe that knowledge should be free for every human being. We prioritize efforts that empower disadvantaged and underrepresented communities, and that help overcome barriers to participation. We believe in mass collaboration, diversity and consensus building to achieve our goals.

Wikipedia has become the fifth most-visited site in the world, used by more than 400 million people every month in more than 270 languages. Wikimedia Commons, Wikidata and Wiktionary are some of the other free content projects hosted by Wikimedia thanks to MediaWiki. There is also a wide collection of open source software projects around them.

Much more can be done: stabilize infrastructure, increase participation, improve quality, increase reach, encourage innovation.

You can help reach these goals in many ways. Below you have some selected ideas.

Where to start
Maybe at this point your proposal is just a vague idea and you want to get some feedback before investing much more time planning it? We know this feeling very well! Just send an email to wikitech-l (or qgil@wikimedia.org if you prefer) sharing what you have in mind. One short paragraph can be enough for us to get back to you and help you work in the right direction.

Learn and discuss
Obligatory reading:
 * Any potential contributor new to our community is encouraged to follow the Landing instructions.
 * How to become a MediaWiki hacker is a good place to start learning your skills and becoming a better candidate.
 * Lessons learned for mentorship programs is particularly useful when you start writing your application.

To set up your MediaWiki developer environment, we recommend you start by installing a local instance using mediawiki-vagrant. You can also have a fresh MediaWiki to test on a remote server. Just register and request your own instance at Wikitech.

If you have general questions you can start by asking at the Discussion page. The IRC channel is also a good place to find people and answers. We do our best to connect project proposals with Phabricator reports and/or wiki pages. Other contributors may watch/subscribe to those pages and contribute ideas to them. If you can't find answers to your questions, ask first in those pages. If this doesn't work then go ahead and post your question to the wikitech-l mailing list.

Add your proposal

 * Use your user page to introduce yourself.
 * Draft your project in a separate page in main namespace, or as subpage of an existing project or extension your idea will integrate with. Try to pick a short, memorable and catchy title which communicates your core idea on how to tackle the issue/project you chose.
 * Use the template. For GSoC proposals, remember to add them to the proposals category and the table so that it's clear it's a proposal (not yet approved) and you're working on it.
 * The GSoC student guide is a good resource for anybody willing to write a good project proposal. And then there is a list of DOs and DON'Ts full of practical wisdom.

Featured project ideas
Below you can find a list of ideas that have already gone through a reality check and have mentors confirmed. You can find more suggestions in our list of Raw projects.

But first, let us talk about...

Your project
That's right! If you have a project in mind we want to hear about it. We can help you assess its feasibility and we will do our best to find a mentor for it.

Here you have some guidelines for project ideas:


 * Opportunity: YES to projects responding to generic or specific needs. YES to provocative ideas. NO to trivial variations of existing features.
 * Community: YES to projects encouraging community involvement and maintenance. NO to projects done in a closet that won't survive without you.
 * Deployment: YES to projects that you can deploy. YES to projects where you are in sync with the maintainers. NO to projects depending on unconvinced maintainers.
 * MediaWiki != Wikipedia: YES to generic MediaWiki projects. YES to projects already backed by a Wikimedia community. NO to projects requiring Wikipedia to be convinced.
 * Free content: YES to use, remix and contribute Wikimedia content. YES to any content with free license. NO to proprietary content.
 * Free API: YES to the MediaWiki API. YES to any APIs powered with free software. NO to proprietary APIs.

Internationalization and localization
Internationalization (i18n) and localization (L10n) are part of our DNA. The Language team develops features and tools for a huge and diverse community, including 287 Wikipedia projects and 349 MediaWiki localization teams. This is not only about translating texts. Volunteer translators require very specialized tools to support different scripts, input methods, right-to-left languages, grammar...

Below you can find some ideas to help multilingualism and sharing of all the knowledge literally for everybody in their own language.

Extensive and robust localisation file format coverage
The Translate extension supports multiple file formats. The formats have been developed on an "as needed" basis, and many formats are not yet supported or the support is incomplete. In this project the aim would be to make existing file formats (for example Android xml) more robust to meet the following properties:
 * the code does not crash on unexpected input,
 * there is a validator for the file format,
 * the code can handle the full file format specification,
 * the code is secure (does not execute any code in the files nor have known exploits).

Example known bugs are 31331, 36584, 38479, 40712, 31300, 57964, 49412.
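
As an illustration of the validator property, here is a minimal sketch of a check for a simplified Android-style strings file. It is written in Python for brevity, although the Translate extension itself is PHP. The function name, the accepted subset (only `<string name="...">` children of `<resources>`) and the blanket refusal of DTDs are assumptions made for this example, not the extension's actual behaviour.

```python
import xml.etree.ElementTree as ET


def validate_android_strings(text):
    """Return a list of problems; an empty list means the file passed."""
    # Assumption: refuse DTDs outright so entity declarations never reach the parser.
    if "<!DOCTYPE" in text or "<!ENTITY" in text:
        return ["DTD or entity declarations are not allowed"]
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        # Malformed input is reported as a problem instead of crashing the caller.
        return [f"not well-formed XML: {err}"]

    problems = []
    if root.tag != "resources":
        problems.append(f"unexpected root element <{root.tag}>")
    seen = set()
    for child in root:
        if child.tag != "string":
            problems.append(f"unsupported element <{child.tag}>")
            continue
        name = child.get("name")
        if not name:
            problems.append("<string> without a name attribute")
        elif name in seen:
            problems.append(f"duplicate key {name!r}")
        else:
            seen.add(name)
    return problems
```

Returning a list of problems rather than raising keeps the first property (no crash on unexpected input) and lets a caller show every issue at once.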

In addition new file formats can be implemented: in particular Apache Cocoon and AndroidXml string arrays have interest and patches to work on, but we'd also like TMX, for example. Adding new formats is a good chance to learn how to write parsers and generators with simple data but complicated file formats. For some formats, it might be possible to take advantage of existing PHP libraries for parsing and file generation. (More example formats other platforms support: OpenOffice.org SDF/GSI, Desktop, Joomla INI, Magento CSV, Maker Interchange Format (MIF), .plist, Qt Linguist (TS), Subtitle formats, Windows .rc, Windows resource (.resx), HTML/XHTML, Mac OS X strings, WordFast TXT, ical.)

This project paves the way for future improvements, like automatic file format detection, support for more software projects and extension of the ability to add files for translation by normal users via a web interface.


 * Skills: PHP, XML, awareness of how to write robust and secure PHP code
 * Suggested micro-task: submit a patch for one of the bugs linked above; then get it merged, or get an i18n bug fixed (see the lists of open tickets for interface messages that need rewording under "Blocked by" and open Language Engineering bugs); then feel free to contact the mentors about this project.
 * At some point in time you'll need to set up a Translate wiki, familiarise yourself with message group configuration and play with some l10n files borrowing from translatewiki.net system (all available in its repository).
 * Mentors: Niklas Laxström, Federico Leva, Siebrand Mazeland

One stop translation search


A Special:SearchTranslations page has been created for the Translate extension to allow searching for translations. However it has not been finished and it lacks important features: in particular, being able to search in source language, but show and edit messages in your translation language. The interface has some bugs with facet selection and direct editing of search results is not working properly. It is not possible to search by message key unless you know the special syntax, nor to reach a message in one click. Interface designs are available for this page.


 * Skills: Backend coding with PHP, frontend coding with jQuery, Solr/ElasticSearch/Lucene
 * Suggested micro-task: submit a patch for one of the bugs linked above; then get it merged, or get an i18n bug fixed (see the lists of open tickets for interface messages that need rewording under "Blocked by" and open Language Engineering bugs); then feel free to contact the mentors about this project.
 * Mentors: Niklas Laxström, Federico Leva, Nik Everett (for the Elasticsearch part if needed)

Wikipedia article translation metrics
It is known that a lot of articles in Wikipedias in many languages are translated from the corresponding Wikipedia articles in other languages. What is not known is the exact number of translated articles, because metadata about translation is not recorded by the software in any way. Some researchers attempted to estimate this number; for an example of such a work see the paper Multilinguals and Wikipedia Editing by Scott Hale. Much more work could be done in this area, however: the estimation methodology could be improved; the editing patterns of users who translate articles could be researched more deeply; the findings could be more thoroughly cross-referenced with information about the different Wikipedia language editions and with real-life information about the languages in question, such as number of speakers, penetration of broadband internet connection in the area where the language is spoken, level of bilingualism, and so on. These findings will contribute to better understanding of content development in Wikipedias in different languages and to the development of the ContentTranslation project.

Skills: Data mining, data analytics, R, SQL, understanding of social and demographic data that is relevant to languages and Internet connectivity

Suggested micro-task: In the top 20 Wikipedias, find the different ways in which users mark articles as translated (comments on talk pages, edit summaries, templates, etc.); analyze how frequently these methods are used and how they map to the current known estimation of the number of translated articles.

Mentors: Amir E. Aharoni
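
A first pass at the micro-task above could simply count how often each kind of translation marker shows up in a sample of edit summaries and wikitext. The sketch below is only an assumption about how one might start: the two patterns are illustrative examples for English-language text, and real wikis use many more markers in many languages.

```python
import re
from collections import Counter

# Hypothetical example markers; a real study would need a list per language.
MARKERS = {
    "edit summary": re.compile(r"\btranslat(ed|ion)\s+(from|of)\b", re.IGNORECASE),
    "template": re.compile(r"\{\{\s*translated( page)?\s*\|", re.IGNORECASE),
}


def count_markers(texts):
    """Count how many texts contain each kind of translation marker."""
    counts = Counter()
    for text in texts:
        for label, pattern in MARKERS.items():
            if pattern.search(text):
                counts[label] += 1
    return counts
```

The resulting counts per marker type can then be compared with existing estimates of the number of translated articles.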

Unified language proofing tools integration framework
Wikipedia communities in some languages developed automatic or semi-automatic tools to improve the quality of language or typography. Some examples are:
 * The Wikificator tool in the Russian Wikipedia (similar tools exist in Ukrainian, Belarusian and possibly other Wikipedias)
 * The Checkty gadget in the Hebrew Wikipedia, a semi-automatic script for fixing common grammar mistakes, as well as another list of automatic replacements usually performed by a bot.
 * The orthography converter in the Belarusian-Taraškievica Wikipedia.
 * Extra edit buttons in the Persian Wikipedia.
 * ... and many other tools in other languages.

These tools are written as bots, gadgets or user scripts, and each project implements them in a different internal framework and with a different UI. It would be useful to unify at least some of these tools into a single internal framework - for example (but not necessarily) to store the replacement rules as a uniform JSON data structure rather than disparate JavaScript variables. Using external open source software, such as LanguageTool, is acceptable as well, as long as the functionality that the different language communities are using is preserved. Finally, this framework should have a single interface that would be usable with both the wiki syntax source editor and the VisualEditor.
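
To make the idea of storing replacement rules as a uniform JSON data structure more concrete, here is a minimal sketch. The field names (`description`, `pattern`, `replacement`) and the two sample rules are assumptions for illustration only; they are not taken from any of the tools listed above. The sketch is in Python, but the same structure could be consumed from JavaScript.

```python
import json
import re

# Hypothetical rule set; each rule is a regular expression plus its replacement.
RULES_JSON = """
[
  {"description": "collapse repeated spaces", "pattern": " {2,}", "replacement": " "},
  {"description": "no space before a comma", "pattern": " ,", "replacement": ","}
]
"""


def load_rules(raw):
    """Parse the JSON rule list into (compiled pattern, replacement) pairs."""
    return [(re.compile(rule["pattern"]), rule["replacement"]) for rule in json.loads(raw)]


def apply_rules(text, rules):
    """Apply every rule in order and return the corrected text."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text
```

Because the rules are plain data, the same file could in principle be shared by a bot, a gadget and an editor plugin, with each language community maintaining its own list.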

Skills: JavaScript, regular expressions, data abstraction. Knowledge of the (human) languages in question is not required, but can be helpful.

Suggested micro-task: Fix a bug related to a VisualEditor toolbar button.

Mentors: Amir E. Aharoni

Collaborative spelling dictionary building tool
There are extensive spelling dictionaries for the major languages of the world, such as English, Italian and French, at various degrees of coverage: Mozilla has over a hundred, LibreOffice dozens. They help make Wikipedia articles in these languages more readable and professional and provide an opportunity for participation in improving spelling. Many other languages, however, don’t have spelling dictionaries. One possible way to build good spelling dictionaries would be to employ crowdsourcing, and Wikipedia editors can be a good source for this. This approach will also require a robust system in which language experts can manage the submissions: accept, reject, filter and build new versions of the spelling dictionary upon them. This can be done as a MediaWiki extension integrated with VisualEditor, possibly using Wikidata as a backend.


 * Skills: PHP, Web frontend. Bonus: Familiarity with VisualEditor and Wikidata; experience in an existing dictionary-building community.
 * Mentors: Amir Aharoni, Kartik Mistry

Wikimedia Identities Editor
MediaWiki Community Metrics is a Wikimedia project whose goal is to describe how the MediaWiki / Wikimedia tech community is doing.

Once the website with the metrics reaches a first complete version, a web application to manage community identities is needed. A community member will access the web application and authenticate using OAuth or by creating a new account. All the information about the member in MediaWiki Community Metrics will be presented, so the user can update her information, add new identities, the localization and so on.


 * Skills: Django or similar web framework to develop the application. OAuth and other authentication techs.
 * Mentors: Alvaro del Castillo, Daniel Izquierdo.

New media types supported in Commons
Wikimedia Commons is a database of millions of freely usable media files to which anyone can contribute. The pictures, audio and video files you find in Wikipedia articles are hosted in Commons. Several free media types are already supported but there are more requested by the community, such as X3D for representing 3D computer graphics or KML/KMZ for geographic annotation and visualization. Considerations need to be taken for each format, like security risks or fallback procedures for browsers not supporting these file types.


 * Skills: PHP at least. Good knowledge of the file type chosen will be more than helpful.
 * Mentors: Bryan Davis, ?.

Import transcription into DjVu file
DjVu files include a text layer. Typically a DjVu file begins with a text layer that consists of OCR text, which Wikisource uses as the initial version of the transcription. Wikisource contributors then 'fix' the OCR errors and save the corrections onto the Wikisource project as wikitext, and eventually the transcription is accurate and complete. A tool is needed to create a new DjVu file with the accurate and complete Wikisource transcription.

There are existing tools being worked on that extract the accurate and complete Wikisource transcription, typically exporting it as EPUB. However they likely discard a lot of useful information that is needed to recreate a DjVu file, most importantly the (x,y) positions of each piece of text. They may also discard the page numbers.

Tools exist which work with the hOCR data, for instance hOCR.js by Alex Brollo (the gadget author who worked most with the DjVu layers), and djvutext.py.


 * Skills: Good knowledge of the DjVu file type desirable, and EPUB.
 * Mentors: John Vandenberg, ?.

Semantic MediaWiki
Semantic MediaWiki is a lot more than a MediaWiki extension: it is also a full-fledged framework, in conjunction with many spinoff extensions, and it has its own user and developer community. Semantic MediaWiki can turn a wiki into a powerful and flexible collaborative database. All data created within SMW can easily be published via the Semantic Web, allowing other systems to use this data seamlessly.

There are more than 500 SMW-based sites, including wiki.creativecommons.org, docs.webplatform.org, wiki.mozilla.org, wiki.laptop.org and wikitech.wikimedia.org.

Multilingual Semantic MediaWiki
Semantic MediaWiki would benefit from being multilingual-capable out of the box. We could integrate it with the Translate extension. This can be done in some isolated steps, but there is a need to list all the things in need of translation and define the approach and priority for each of them. Some of the steps could be:


 * Fix the issues that prevent full localisation of Semantic Forms.
 * Enhance Special:CreateForm and friends (all the Special:Create* special pages by Semantic Forms) to create forms that are already i18ned with placeholders and message group for Translate extension.
 * Make it possible to define translation for properties and create a message group for Translate extension, similar to what CentralNotice does (sending strings for translation to Translate message groups).
 * There are a lot of places where properties are displayed: many special pages, queries, property pages. Some thinking is required to find out a sensible way to handle translations in all these places.
 * Currently, in most wikis, property names are supposed to be hidden from the user, e.g. query results are usually shown in infobox-like templates (whose labels could in theory be localised like all templates).

Translate would be fed with the strings in need of translation. Localised strings/messages would be displayed based on the interface language, which in core every user can set on Special:Preferences and which ULS makes much easier to pick for everyone, including unregistered users.
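
The lookup described above, where a localised string is chosen from the interface language, can be sketched as follows. This is only an illustration in Python under stated assumptions: the function name, the dictionary of labels and the single-step fallback to English are hypothetical and do not describe how Semantic MediaWiki or Translate store or resolve messages.

```python
def localised_label(labels, language, fallbacks=("en",)):
    """Pick the label for `language`, trying each fallback in order, else None."""
    for code in (language, *fallbacks):
        if code in labels:
            return labels[code]
    return None


# Hypothetical translations of one property name, keyed by language code.
population_labels = {"en": "Population", "fi": "Väkiluku"}
```

With these sample labels, a Finnish interface would show "Väkiluku", while a language without a translation would fall back to the English label.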

For real field testing, WikiApiary could be used, or at worst translatewiki.net (quick deployments, little SMW content).


 * Skills: PHP and web frontend; experience with Semantic MediaWiki and Semantic Forms is a plus.
 * Suggested micro-task: submit a patch for one of the bugs linked above; then get it merged, or get an i18n bug fixed (see the lists of open tickets for interface messages that need rewording under "Blocked by" and open Language Engineering bugs); then feel free to contact the mentors about this project.
 * Mentors: Niklas Laxström, Federico Leva, Yaron Koren.

Visual translation: Integration of page translation with VisualEditor
The wiki page translation feature of the Translate extension does not currently work with VisualEditor due to the special tags it uses. More specifically, this is about editing the source pages that are used as the source for translations, not the translation process itself. The work can be divided into three steps:
 * 1) Migrate the special tag handling to a more standard way to handle tags in the parser. This needs some changes to the PHP parser for it to be able to produce the wanted output.
 * 2) Add support to Parsoid and VisualEditor so that editing page contents preserves the structures that page translation adds to keep track of the content.
 * 3) Add to VisualEditor some visual aid for marking the parts of the page that can be translated.

This is likely to be a difficult project due to complexities of wikitext parsing and intersecting multiple different products: Translate, MediaWiki core parser, Parsoid, VisualEditor.


 * Skills: PHP, JavaScript, wikitext parsing
 * Mentors: general mentors + Niklas Laxström

Pywikibot
Pywikibot (PWB) is one of the most widely used tools for editing Wikipedia. It is based on Python, which is easy to learn and program. The main issues are:

Token handling [needs refining or removal]
Core has issues in handling tokens, but they can be fixed easily since the TokenWallet class has been introduced.

Project goals:
 * Making core more stable, since a high proportion of crashes are related to tokens.
 * Provide a useful system of token-related tests.
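
One way to approach token-related tests is to separate token caching from the network, so that the caching logic can be exercised with a fake fetcher. The sketch below only illustrates that idea: `SimpleTokenCache` and `make_counting_fetcher` are hypothetical names invented for this example and are not Pywikibot's TokenWallet API.

```python
class SimpleTokenCache:
    """Hypothetical token cache; NOT Pywikibot's TokenWallet API."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable taking a token type, returning a token
        self._tokens = {}

    def get(self, token_type):
        """Return the cached token, fetching it only on first use."""
        if token_type not in self._tokens:
            self._tokens[token_type] = self._fetch(token_type)
        return self._tokens[token_type]

    def invalidate(self, token_type):
        """Forget a token, e.g. after the server reports it as bad."""
        self._tokens.pop(token_type, None)


def make_counting_fetcher():
    """Build a fake fetcher that records each request instead of using the network."""
    calls = []

    def fetch(token_type):
        calls.append(token_type)
        return f"{token_type}-{len(calls)}"

    return fetch, calls
```

A test can then assert that a second `get` for the same token type does not trigger another fetch, and that `invalidate` forces a fresh one.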


 * Skills: Python, MediaWiki API
 * Mentors: Amir Sarabadani

Major wiki engine support
Add basic support for a major wiki engine which is conceptually similar to MediaWiki.

Project goals:
 * 1) Add support for a wiki engine, with at least one of the scripts working.
 * 2) Add support for alternative wiki text syntax.
 * 3) Allow easy transfer of content between wiki engines using Pywikibot.


 * Skills: Python
 * Suggested micro-task: Analyse the wiki engines used by 'anyone can edit' free content projects, and start a wiki page listing at least 10 wiki engines that Pywikibot should support, including benefits and difficulties.
 * Mentors: John Vandenberg, Nemo bis

Experimental wiki engine support
Add basic support for a wiki engine which is conceptually very different to MediaWiki.

Project goals:
 * 1) Add support for a wiki engine.
 * 2) Investigate where Pywikibot is too tightly designed around MediaWiki concepts.
 * 3) Prepare Pywikibot for future wiki concepts.


 * Skills: Python
 * Suggested micro-task: Develop a list of wiki engines which use concepts that are very different to MediaWiki (e.g. Fed Wiki, git-based wikis, TiddlyWiki/giewiki), indicating which concepts each wiki engine has that are not present in MediaWiki.
 * Mentors: John Vandenberg, Nemo bis