Outreach programs/Possible projects

We are using this list of projects as a master branch for mentorship programs such as Google Summer of Code and Outreach Program for Women. The projects listed are good for students and first-time contributors, but they require a good amount of work. They might also be good candidates for Individual Engagement Grants.


 * Featured project ideas usually have mentors ready for you to jump in.
 * Raw projects are interesting ideas that have been proposed but might lack definition, consensus or mentors, and therefore we can't feature them.

If you are looking for smaller tasks, check the Annoying little bugs. For a more generic introduction, check How to contribute.



Be part of something big
We believe that knowledge should be free for every human being. We prioritize efforts that empower disadvantaged and underrepresented communities, and that help overcome barriers to participation. We believe in mass collaboration, diversity and consensus building to achieve our goals.

Wikipedia has become the fifth most-visited site in the world, used by more than 400 million people every month in more than 270 languages. We have other content projects including Wikimedia Commons, Wikidata and the most recent one, Wikivoyage. We also maintain the MediaWiki engine and a wide collection of open source software projects around it.

But there is much more we can do: stabilize infrastructure, increase participation, improve quality, increase reach, encourage innovation.

You can help with these goals in many ways. Below are some selected ideas.

Where to start
Maybe at this point your proposal is just a vague idea and you want to get some feedback before investing much more time planning it? We know this feeling very well! Just send an email to wikitech-l (or qgil@wikimedia.org if you prefer) sharing what you have in mind. One short paragraph can be enough for us to get back to you and help you work in the right direction.

Any potential contributor new to our community is encouraged to follow the Landing instructions. Use your user page to introduce yourself and draft your project (use the template). The GSOC student guide is a good resource for anybody willing to write a good project proposal. And then there is a list of DOs and DON'Ts full of practical wisdom.

To set up your MediaWiki developer environment, we recommend that you start by installing a local instance using mediawiki-vagrant. You can also have a fresh MediaWiki to test on a remote server. Just get developer access and request your own instance at Wikitech.

If you have general questions you can start by asking at the Discussion page. The IRC channel is also a good place to find people and answers. We do our best to connect project proposals with Bugzilla reports and/or wiki pages. Other contributors may watch/subscribe to those pages and contribute ideas to them. If you can't find answers to your questions, ask first in those pages. If this doesn't work then go ahead and post your question to the wikitech-l mailing list.

Featured project ideas
Below you can find a list of ideas that have already gone through a reality check and have mentors confirmed. You can find more suggestions in our list of Raw projects.

But first, let us talk about...

Your project
That's right! If you have a project in mind we want to hear about it. We can help you assess its feasibility and we will do our best to find a mentor for it.

Here are some guidelines for project ideas:


 * Opportunity: YES to projects responding to generic or specific needs. YES to provocative ideas. NO to trivial variations of existing features.
 * Community: YES to projects encouraging community involvement and maintenance. NO to projects done in a closet that won't survive without you.
 * Deployment: YES to projects that you can deploy. YES to projects where you are in sync with the maintainers. NO to projects depending on unconvinced maintainers.
 * MediaWiki != Wikipedia: YES to generic MediaWiki projects. YES to projects already backed by a Wikimedia community. NO to projects requiring Wikipedia to be convinced.
 * Free content: YES to using, remixing and contributing Wikimedia content. YES to any content with a free license. NO to proprietary content.
 * Free API: YES to the MediaWiki API. YES to any APIs powered by free software. NO to proprietary APIs.



Parsoid
The Parsoid project is developing a wiki runtime which can translate back and forth between MediaWiki's wikitext syntax and an equivalent HTML / RDFa document model with better support for automated processing and visual editing. It powers the VisualEditor project, Flow and semantic HTML exports.

Parser migration tool
Periodically, we come across some bit of wikitext markup we'd like to deprecate. See Parsoid/limitations, Parsoid/Broken wikitext tar pit, and (historically) MNPP for examples. We'd like to have a real slick tool to enhance communication with WP editors about these issues:
 * It would display a list of wiki titles (filtered by Wikipedia project) which contain deprecated wikitext. Each title would link to a page which would briefly describe the problem(s), general advice on how the wikitext should be rewritten, and (perhaps) some previously-corrected pages for editors to look at.
 * Ideally this would be integrated with a wiki workflow and/or contain "revision tested" information so that editors can 'claim' pages from the list to fix and don't step on each other's work. Fixed/revised pages would be removed from the list until their new contents could be rechecked.
 * It should be as easy as possible for Parsoid developers to add new "bad" pattern tests to the tool. These would get added to the testing, with appropriate documentation of the problem, so that editors don't have to learn about a new tool/site for every broken pattern.
 * Some of these broken bits of wikitext might be able to be corrected by bot. The tool could still create a tasklist for the bot and collect and display the bots' fixes for editors to review.
 * The backend which looks for broken wikitext could be based on the existing round-trip test server. Instead of repeatedly collecting statistics on a subset of pages, however, it would work its way through the entire Wikipedia project looking for broken wikitext (and preventing regressions).
 * Some cleverness might be helpful to properly attribute bad wikitext to a template rather than the page containing the template. This is probably optional; editors can figure out what's going on if they need to.


 * Skills: node.js, and probably MediaWiki bots and/or extensions as well. A candidate will ideally have some node.js experience and some notions of web and UX design. This task could be broken into parts, if a candidate wants to work only on the front-end or back-end portions of the tool.
 * Mentors: C. Scott Ananian, Subramanya Sastry
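
As an illustration of how cheap it could be to add new "bad" pattern tests, here is a minimal sketch of a pattern registry and a scanner. It is written in Python for brevity (the actual tool would more likely be node.js, per the skills above). The pattern names, regular expressions and advice texts are hypothetical examples, not real Parsoid checks.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class BadPattern:
    name: str          # short identifier shown in the task list
    regex: re.Pattern  # what to look for in the raw wikitext
    advice: str        # guidance shown to editors


# Hypothetical registry: adding a new check means appending one entry.
PATTERNS = [
    BadPattern(
        name="obsolete-font-tag",
        regex=re.compile(r"<font\b", re.IGNORECASE),
        advice="Replace <font> with a styled <span>.",
    ),
    BadPattern(
        name="self-closed-tag",
        regex=re.compile(r"<(div|span)\s*/>", re.IGNORECASE),
        advice="Write an explicit closing tag instead of <div/>.",
    ),
]


def scan(pages):
    """Map each page title to the names of the patterns found in its wikitext."""
    report = {}
    for title, wikitext in pages.items():
        hits = [p.name for p in PATTERNS if p.regex.search(wikitext)]
        if hits:
            report[title] = hits
    return report
```

Pages that no longer match any pattern simply drop out of the report, which fits the idea of removing fixed pages from the list until they are rechecked.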

Import the very early Usemod edit history of Wikipedia with Parsoid
The very early history of Wikipedia used a Usemod wiki that had a different syntax than MediaWiki. This history was never imported into MediaWiki. Parsoid now has very good support for parsing and serializing MediaWiki syntax, and also provides a future-proof HTML format. The idea for this project is to develop a specialized tokenizer for the old Usemod syntax. By leveraging the Parsoid infrastructure, this can then be used to import the very early Wikipedia history into the database as both HTML and MediaWiki wikitext. This would finally make the very early history of Wikipedia widely available to historians and Wikipedians. It is also very likely to provide an interesting perspective on how far Wikipedia has come since those early days.


 * Skills: JavaScript and node.js. Experience with parser generators is a plus.
 * Mentors: Gabriel Wicke, Subramanya Sastry
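
To give a feeling for what such a tokenizer does, here is a toy sketch in Python (the real work would be done in JavaScript inside Parsoid). It only handles CamelCase words, which Usemod treated as links, and it uses a simplified regular expression that is an assumption rather than Usemod's exact linking rule.

```python
import re

# Simplified CamelCase rule (an assumption): two or more capitalized parts.
CAMEL_CASE = re.compile(r"\b[A-Z][a-z]+(?:[A-Z][a-z]+)+\b")


def tokenize(text):
    """Split text into ("text", ...) and ("link", ...) tokens."""
    tokens = []
    position = 0
    for match in CAMEL_CASE.finditer(text):
        if match.start() > position:
            tokens.append(("text", text[position:match.start()]))
        tokens.append(("link", match.group()))
        position = match.end()
    if position < len(text):
        tokens.append(("text", text[position:]))
    return tokens


def to_wikitext(tokens):
    """Serialize tokens, turning link tokens into MediaWiki-style [[links]]."""
    return "".join(f"[[{value}]]" if kind == "link" else value
                   for kind, value in tokens)
```

For example, `to_wikitext(tokenize("See WikiPedia and UseMod today."))` wraps the two CamelCase words in double brackets and leaves the rest untouched.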

Internationalization and localization
Internationalization (i18n) and localization (L10n) are part of our DNA. The Language team develops features and tools for a huge and diverse community, including 258 Wikipedia projects and 349 MediaWiki localization teams. This is not only about translating texts. Volunteer translators require very specialized tools to support different scripts, input methods, right-to-left languages, grammar...

Below you can find some ideas to help multilingualism and sharing of all the knowledge literally for everybody in their own language.

Multilingual, usable and effective captchas
This project is very ambitious and challenging. Current CAPTCHAs are mostly broken, and still they are important to guard web sites like Wikipedia from a lot of spam. The risk of failure is high, but if the project succeeds, the rewards may be huge.

This project has a large research, design and user test component. The student will research and assess ways to use different CAPTCHA options, designed for multilingualism, to identify a more effective CAPTCHA than the current implementation used by Wikimedia. The student will create an implementation for use in MediaWiki of the identified CAPTCHA method. See related bug 32695. Some prototypes were designed a while ago.


 * Skills: Design, JavaScript and PHP.
 * Mentors: Siebrand Mazeland and Pau Giner; User:Emufarmers

Tools for mass migration of legacy translated wiki content
The MediaWiki Translate extension has a page translation feature to make the life of translators easier. It allows structured translation of wiki pages, separating text strings from formatting or images, and also tracks changes in the source pages (usually in English). You can see it in action (click the Edit view). Often, wikis have a lot of legacy content that requires tedious manual conversion to make it translatable. It would be useful to have a tool to facilitate the conversion. You would show the proof of concept in Meta-Wiki, a Wikimedia community looking forward to a project like this.


 * Skills: PHP, interest in usability and conducting user research.
 * Mentors: Niklas Laxström, Federico Leva.

New media types supported in Commons
Wikimedia Commons is a database of millions of freely usable media files to which anyone can contribute. The pictures, audio and video files you find in Wikipedia articles are hosted in Commons. Several free media types are already supported but there are more requested by the community, such as X3D for representing 3D computer graphics or KML/KMZ for geographic annotation and visualization. Considerations need to be taken for each format, like security risks or fallback procedures for browsers not supporting these file types.


 * Skills: PHP at least. Good knowledge of the file type chosen will be more than helpful.
 * Mentors: Bryan Davis and ?.

Allowing 3rd party wiki editors to run more CSS features
The 3rd party CSS extension allows editors to style wiki pages just by editing them with CSS properties. It could be more powerful if we find a good balance between features and security. Currently this extension relies on basic blacklisting functionality in MediaWiki core to prevent cross-site scripting. It would be great if a proper CSS parser was integrated and a set of whitelists implemented.

Additionally, the current implementation uses data URIs and falls back to JavaScript when the browser doesn't support them. It would be a great improvement if the MediaWikiPerformAction (or similar) hook was used to serve the CSS content instead. This would allow the CSS to be more cleanly cached and reduce or eliminate the need for JavaScript and special CSS escaping.


 * Skills: PHP, CSS, JavaScript, web application security.
 * Mentors: Rusty Burchfield and ?.
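
To make the whitelist idea concrete, here is a minimal Python sketch that filters a flat list of CSS declarations. It is not a proper CSS parser (it splits naively on `;` and `:`), and both the allowed property set and the forbidden value fragments are hypothetical choices for illustration only.

```python
# Hypothetical whitelist; a real one would be agreed on with security reviewers.
ALLOWED_PROPERTIES = {
    "color", "background-color", "font-weight",
    "text-align", "margin", "padding",
}

# Value fragments rejected outright in this sketch (external resources,
# script-like constructs and backslash escapes).
FORBIDDEN_VALUE_PARTS = ("url(", "expression(", "javascript:", "\\")


def filter_declarations(style):
    """Keep only whitelisted declarations from a flat CSS declaration list."""
    kept = []
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue
        prop, value = declaration.split(":", 1)
        prop = prop.strip().lower()
        value = value.strip()
        if prop not in ALLOWED_PROPERTIES:
            continue
        lowered = value.lower()
        if any(part in lowered for part in FORBIDDEN_VALUE_PARTS):
            continue
        kept.append(f"{prop}: {value}")
    return "; ".join(kept)
```

The point of the design is that anything not explicitly allowed is dropped, which is the opposite of the current blacklisting approach. A real implementation would apply the same rule to the output of a proper CSS parser.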

MassMessage page input list improvements
The MassMessage extension currently allows users to create lists of pages to send messages to using a parser function. This is not very user friendly and should be replaced with a structured ContentHandler, probably in JSON. This page has some more technical details.

This will involve implementing a ContentHandler-based backend and a frontend for adding/editing/removing entries in JavaScript (with a non-JS fallback).


 * Skills: PHP, JS, probably minor design/CSS
 * Mentors: Legoktm and MZMcBride
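
As a rough idea of what a structured JSON list could look like, here is a Python sketch that validates and de-duplicates a delivery list. The field names (`targets`, `title`, `site`) are hypothetical and are not the extension's actual format. The real backend would be a PHP ContentHandler.

```python
import json


def parse_delivery_list(text):
    """Validate a JSON delivery list and return its unique targets in order."""
    data = json.loads(text)
    if not isinstance(data, dict) or not isinstance(data.get("targets"), list):
        raise ValueError("expected an object with a 'targets' list")
    seen = set()
    targets = []
    for entry in data["targets"]:
        if not isinstance(entry, dict):
            raise ValueError("each target must be an object")
        title = entry.get("title")
        if not isinstance(title, str) or not title.strip():
            raise ValueError("each target needs a non-empty 'title'")
        site = entry.get("site")  # None is taken to mean the local wiki
        key = (title.strip(), site)
        if key not in seen:
            seen.add(key)
            targets.append({"title": title.strip(), "site": site})
    return targets
```

A list such as `{"targets": [{"title": "User talk:Example"}, {"title": "Project:Village pump", "site": "en.wikipedia.org"}]}` would pass this check, while malformed entries would be rejected before any message is sent.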

Automatic cross-language screenshots for user documentation
MediaWiki is a large and complex piece of software, and the user guide for core and those for extensions (like VisualEditor or Translate) each have a large number of images illustrating functions and stages of operation. However, these often date quickly as the software changes, and are generally only available in English or at best a few languages, which means that non-English users are not as well served. It would be fantastic to give documentation maintainers a way to capture the current look of the software across the hundreds of languages that MediaWiki supports. This means being able to make screenshots, or even screencasts, of the entire browser window or sections of it while performing some scripted actions. It would probably be most sensible to do this by extending the existing browser testing framework, which is written in Ruby and based on Selenium.


 * Skills: Ruby; browser testing.
 * Mentors: James Forrester and the QA team

Semantic MediaWiki
Semantic MediaWiki is a lot more than a MediaWiki extension: it is also a full-fledged framework, in conjunction with many spinoff extensions, and it has its own user and developer community. Semantic MediaWiki can turn a wiki into a powerful and flexible collaborative database. All data created within SMW can easily be published via the Semantic Web, allowing other systems to use this data seamlessly.

There are more than 500 SMW-based sites, including wiki.creativecommons.org, docs.webplatform.org, wiki.mozilla.org, wiki.laptop.org and wikitech.wikimedia.org.

Switching Semantic Forms autocompletion to Select2
The Semantic Forms extension makes use of the jQuery UI Autocomplete JavaScript library for autocompletion - including for multiple-value autocompletion and combo boxes. This solution has worked fine, but better tools now exist. This project would switch Semantic Forms to use the Select2 JS library for autocompletion, which will enable a number of important improvements:


 * "Tokenization" of values (putting squares around each term), which has become increasingly common in user interfaces
 * Flexible autocompletion for characters with accents
 * Display of an image associated with each term
 * Displaying values in a tree-type structure
 * Much better support for autocompletion on remote data sets

Some of these are possible to do just by taking advantage of more recent additions to jQuery UI Autocomplete, but overall Select2 is the much more complete system.

The last of these items, especially, could involve some interesting design work in enabling admins/users to define the layout of remote data sets.


 * Skills: JavaScript and CSS. Experience with PHP or MediaWiki is helpful, but not necessary.
 * Mentors: Yaron Koren
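
The "flexible autocompletion for characters with accents" item boils down to comparing accent-folded strings. Here is a small Python sketch of that matching idea. It only illustrates the technique and says nothing about Select2's own API, which is JavaScript.

```python
import unicodedata


def fold(text):
    """Strip combining accents and normalize case for comparison."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def autocomplete(query, candidates):
    """Return the candidates that contain the query, ignoring accents and case."""
    needle = fold(query)
    return [candidate for candidate in candidates if needle in fold(candidate)]
```

With this, typing "sao" would match "São Paulo", and typing "zur" would match "Zürich", while the original accented values are what gets displayed and stored.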

Catalogue for MediaWiki extensions
MediaWiki can be enhanced with additional functionality by adding extensions. There are currently about 2000 extensions available on MediaWiki.org. However, it is hard to identify and assess which extension fits a particular need. Moreover, it is not clear which version of the extension to take for a particular MediaWiki version. And if you want to find the most popular or most frequently downloaded extensions, you'd have to go to a third party site like WikiApiary.

This situation leaves a lot of room for improvement and creative ideas. It would be fantastic if you came up with an extension that runs on MediaWiki.org. Here are some features that will bring great benefit to all the 3rd party users of MediaWiki:
 * A set of structured information about each extension (this might be download numbers, release state, ratings or version compatibility). Some of this information can already be found on the extension pages.
 * A catalogue function where you can search for extensions, find similar extensions, sort them by popularity, authors, rating, etc.
 * The possibility to add external data such as WikiApiary's usage statistics
 * A redesign of the presentation layer, so that it is actually good fun to browse the extension catalogue

Technically, it might be an option to integrate with Wikidata and be on the bleeding edge of the wiki way of data representation.

There's already a more detailed proposal page which can be found here.


 * Skills: PHP, JavaScript and MySQL
 * Mentors: Markus Glaser and Mark Hershberger

Adding proper email bounce handling to MediaWiki (with VERP)
(description of Bug#46640 by Luke Weilling)

It's likely that many Wikipedia accounts have a validated email address that once worked but is out of date. MediaWiki does not currently unsubscribe users who trigger multiple non-transient failures, and some addresses might be 10+ years old. MediaWiki should not keep sending email that is just going to bounce. It's a waste of resources and might trigger spam heuristics.

Two API calls need to be added:
 * One to generate a VERP address to use when sending mail from MediaWiki.
 * One that records a non-transient failure. That API call would record the current incident and, if some threshold level had been met (e.g. at least 3 bounces with the oldest at least 7 days ago), it would un-confirm the user's address so mail will stop going to it.

For the second call, authentication will be needed so that fake bounces are not a DoS vector or a mechanism for hiding password reset requests. The reason for the threshold is that some failure scenarios will resolve themselves (e.g. a mailbox over quota), so we don't want to react to a single bounce. A history of consecutive bouncing mails needs to be maintained.

There would be a MediaWiki development component to this task, to build the API and to add VERP request calls wherever email is sent, and an Ops component to route VERP bounces to a script (taking the mail as stdin, and optionally e.g. the e-mail address as arguments), which can then call the (authenticated) MediaWiki API method to remove the mail address.
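
To make the two calls more tangible, here is a minimal Python sketch of the logic behind them. It assumes a signed VERP address of the form `bounce-<user id>-<signature>@<domain>`, where the signature is one possible way to reject fake bounces. The address layout, the secret, the domain and all function names are assumptions for illustration, not an existing MediaWiki API.

```python
import hashlib
import hmac
from datetime import datetime, timedelta

SECRET = b"change-me"                  # hypothetical shared secret
BOUNCE_DOMAIN = "bounces.example.org"  # hypothetical bounce-handling domain


def _signature(user_id):
    digest = hmac.new(SECRET, str(user_id).encode("ascii"), hashlib.sha256)
    return digest.hexdigest()[:16]


def make_verp_address(user_id):
    """Build the envelope sender to use for mail sent to this user."""
    return f"bounce-{user_id}-{_signature(user_id)}@{BOUNCE_DOMAIN}"


def parse_verp_address(address):
    """Return the user id encoded in a bounce address, or None if it is not valid."""
    local, _, domain = address.partition("@")
    parts = local.split("-")
    if domain != BOUNCE_DOMAIN or len(parts) != 3 or parts[0] != "bounce":
        return None
    if not (parts[1].isascii() and parts[1].isdigit()):
        return None
    user_id = int(parts[1])
    expected = _signature(user_id).encode("ascii")
    if not hmac.compare_digest(parts[2].encode("utf-8"), expected):
        return None
    return user_id


def should_unconfirm(bounce_times, now, min_bounces=3, min_age=timedelta(days=7)):
    """Apply the threshold: enough consecutive bounces, the oldest one old enough."""
    if len(bounce_times) < min_bounces:
        return False
    return now - min(bounce_times) >= min_age
```

Here `bounce_times` stands for the timestamps of the consecutive non-transient bounces recorded for one address. The defaults mirror the example threshold above (3 bounces, oldest at least 7 days ago) and would need to be configurable.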



Welcoming new contributors to Wikimedia Labs and Wikimedia Tool Labs
Wikimedia Labs is the hosting platform for volunteer and experimental development. Currently we have difficulties matching new contributors with existing projects. Our documentation for newcomers needs improvements as well. To make it more complex, we are currently integrating the Tool Labs project, with people coming from an existing community with its own documentation and processes. As a result, many potential Labs contributors are either lost or they end up creating new projects unaware of existing projects with very similar goals. We welcome project proposals to address this problem, improving our current documentation and finding a way to organize and promote the active initiatives among the more than 170 projects hosted in Labs. Proposals should also include ideas for ways to track and document short-lived projects and distinguish between those that are active and those that are abandoned.


 * Skills: technical writing, organization.
 * Mentors: Andrew Bogott.



= Raw projects =

Make Wiktionary definitions available via the dict protocol
The dict protocol (RFC 2229) is a widely used protocol for looking up definitions over the Internet. We'd like to make Wiktionary definitions available for users. Doing that using the dict protocol would help drive the use and usefulness of Wiktionary, as well.

Possible users:
 * Tablet readers often have dictionary lookup included.
 * Students writing papers would have access to a large corpus of words.
 * Mobile applications for Wiktionary would be less tied to MediaWiki itself.


 * Skills: ?
 * Mentors: Yurik Astrakhan + ?
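
For orientation, this is roughly what a server would send back for a DEFINE command under RFC 2229: a 150 status line, then a 151 line plus text terminated by a single dot for each definition, then 250, or 552 when there is no match. The Python sketch below only formats such a response. The database name and title are hypothetical, and fetching the definitions from Wiktionary is left out entirely.

```python
def define_response(word, definitions, db="wiktionary", db_title="Wiktionary"):
    """Format a DICT-style response to a DEFINE command for one word.

    `definitions` is a list of plain-text definitions. The database name and
    title are placeholders. Quotes inside `word` are not escaped in this sketch.
    """
    if not definitions:
        return "552 no match\r\n"
    lines = [f"150 {len(definitions)} definitions retrieved"]
    for text in definitions:
        lines.append(f'151 "{word}" {db} "{db_title}"')
        for line in text.splitlines():
            # A leading dot is doubled so it is not mistaken for the terminator.
            lines.append("." + line if line.startswith(".") else line)
        lines.append(".")
    lines.append("250 ok")
    return "\r\n".join(lines) + "\r\n"
```

The hard part of the project is not this framing but deciding how to turn Wiktionary's wikitext entries into clean plain-text definitions.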

MediaWiki development
If you're a programmer, we have lots of things for you to do. (To do: copy some relevant ideas from http://socialcoding4good.org/organizations/wikimedia )

Effective anti-spam measures
Use something like a minimal version of Extension:ConfirmAccount to require human approval of each account creation. That is, the applicant fills in forms for user name, email and a brief note about who they are and why they want to edit the wiki. Also set the wiki so that the initial few edits also need approval. Then have it that any bureaucrat can approve the account creation and the initial edits, and remove the user id from moderation. Rob Kam (talk) 09:50, 1 December 2013 (UTC)


 * Requirements have to be clarified here: the proposed approach is much more complex than ConfirmAccount, not "minimal". Perhaps what you want is a sandbox feature? --Nemo 10:32, 1 December 2013 (UTC)


 * Sandbox feature looks good, but for all new accounts not just translators. Rob Kam (talk) 10:49, 1 December 2013 (UTC)

VisualEditor plugins
VisualEditor is a rich visual editor for all users of MediaWiki so they don't have to know wikitext or HTML to contribute well formatted content. It is our top priority and you can already test it on the English Wikipedia. While we focus on the core functionality, you could write a plugin to extend it, such as to insert or modify Wikidata content. There are also many possibilities to increase the types of content supported, including sheet music, poems, timelines…


 * Skills: HTML / JavaScript / jQuery development is required. A good grasp of UX / Web design will make a difference.
 * Mentors: James Forrester, Roan Kattouw, Trevor Parscal.

Flow
Flow brings a modern discussion and collaboration system to MediaWiki. Flow will eventually replace the current Wikipedia talk page system and will provide features that are present on most modern websites, but which are not possible to implement in a page of wikitext. For example, Flow will enable automatic signing of posts, automatic threading, and per-thread notifications.

Templates and bits and pieces for non-WMF MediaWiki installs
Consider a non-WMF MediaWiki wiki admin who is less than guru level. After the initial install of a MediaWiki wiki, they find out that templates are also an essential part of setting up a wiki. They have to figure out which wiki to use as a source and then which templates. When the XML gets imported into the destination wiki, they also get red links to missing sub-templates and details that apply specifically to the source wiki (e.g. the name and logo of that source wiki), and other features like common.css and common.js may need tweaking. Afterwards, how do they keep these features up to date with bug fixes and other changes?

Some of the problems encountered with setting up templates are discussed at Help talk:Templates

There ought to be a simpler and more efficient way to do this. Installing and maintaining extensions is mostly a neat and simple process. A central repository for templates and other bits and pieces would be a much neater and more manageable way of implementing the rest of the essential features used on MediaWiki wikis than everyone figuring it out for themselves each time anew.

See also Requests for comment/Global scripts, Requests for comment/Global bits and pieces and Global-Wiki

-- Rob Kam (talk) 16:23, 10 February 2014 (UTC)

Improving the skinning experience
Research how to make the development of skins for MediaWiki easier. Many users complain about the lack of modern skins for MediaWiki and about having a hard time with skin development and maintenance. Often sys admins keep old versions of MediaWiki due to incompatibility of their skins, which introduces security issues and prevents them from using new features. However, little effort has been made to research the exact problem points. The project could include improving skinning documentation, organizing training sprints/sessions, talking to users to identify problems, researching skinning practices in other open source platforms and suggesting an action plan to improve the skinning experience.

Maria Miteva proposed this project.

Extensions
Check Manual:Extensions and extension requests in Bugzilla.

An easy way to share wiki content on social media services
Wikipedia, as well as other wikis based on MediaWiki, provides an easy way to accumulate and document knowledge, but it is difficult to share it on social media. According to https://strategy.wikimedia.org/wiki/Product_Whitepaper, 84% of Wikimedia users were also Facebook users in 2010, with the portion increasing from previous years. The situation is probably similar with other social media sites. It only makes sense to have an effective "bridge" between MediaWiki and popular social media sites. More details here: Product Whitepaper.

Some previous work you can use as a base, improve, or learn from:


 * Extension:Widgets
 * Extension:WidgetsFramework - experimental extension
 * Extension:AddThis
 * Extension:Facebook - just Facebook
 * Extension:WikiShare - unstable version, seems like it's not worked on any more

Extension:OEmbedProvider
Finish Extension:OEmbedProvider, as proposed here. See also Bug 43436 - Implement Twitter Cards

Leap Motion integration with MediaWiki
MediaWiki has a wide user base and a lot of users today prefer touch-based interfaces. Gesture-based interfaces are friendly and the latest trend. Leap Motion provides controllers that can recognize gestures. It can be integrated with MediaWiki products like Wikisource. As an example, this would make it more friendly for users to flip through pages in a book. Another advantage of using gesture recognition would be turning through multiple chapters or pages at a time by identifying the depth of the motion of the user's finger.

It would also be helpful for flipping through images in Wikimedia Commons.

(Project idea suggested by Aarti Dwivedi).

Work on RefToolbar
The en:Wikipedia:RefToolbar/2.0 extension is incredibly useful, especially for new editors but also for experienced editors (I use it every day, and I've got a few miles under my belt!). But it suffers from bugs and problems, and there are a lot of improvements that could be made. For instance: adding additional reference types, adding fields for multiple authors, tool-tip help guidance, etc. I also suspect it will need an upgrade to match Lua conversions of common cite templates. Also, I don't think this is in wide deployment on other wikis, so translation/deployment could be a project. Looking at the talk page, there are a couple people starting to work on this but serious development isn't happening (so I'm not sure who would mentor this) but the code was recently made accessible. At any rate, it is an extension that really needs some work and where improvements would have immediate benefit for many editors.

Project idea contributed by Phoebe (talk) 23:23, 22 March 2013 (UTC) [n.b.: I can't mentor on the tech side, but can give guidance on the ins and outs of various citation formats in the real world & how cite templates are used on WP].

Global, better URL to citation conversion functionality
Suppose, in Wikipedia, all that needed to be done to generate a perfect citation was to provide a URL? That would be a tremendous step toward getting a much higher percentage of text in Wikipedia articles to be supported by inline citations.

There are already expanders (for the English Wikipedia, at least) that will convert an ISBN, DOI, or PMID, supplied by an editor, into a full, correct citation (footnote). These are in the process of being incorporated into the reference dialog of the VisualEditor extension, making it almost trivial (two clicks, paste, two clicks) to insert a reference.

For web pages, however, the existing functionality seems to be limited to a Firefox add-on. Its limits, besides the obvious requirement to use that browser (and to install the add-on), include an inability to extract the author and date from even the most standard pages (e.g., New York Times), and the lack of integration with MediaWiki.

For a similar approach, using a different plug-in/program, see this Wikipedia page about Zotero.

A full URL-to-citation engine would use the existing Cite4Wiki (Firefox add-on) code, perhaps, plus (unless these exist elsewhere) source-specific parameter specifications. For example, the NYT uses "" for its author information; that format would be known by the engine (via a specifications database). Each Wikipedia community would be responsible for coding these (except for a small starter set, as examples), in the way that communities are responsible for TemplateData for the new VisualEditor extension.
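
Here is a minimal Python sketch of the specifications-database idea: each site gets an entry saying which `<meta>` tag holds which citation field, and a generic engine does the rest. The host name and meta tag names below are made-up placeholders (they are not the format of any real site), and fetching the page is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical specification database: host -> {citation field: meta tag name}.
SPECS = {
    "news.example.org": {"author": "byline", "date": "pubdate", "title": "headline"},
}


class MetaCollector(HTMLParser):
    """Collect the name/content pairs of all <meta> tags in a page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attributes = dict(attrs)
            name, content = attributes.get("name"), attributes.get("content")
            if name and content:
                self.meta[name.lower()] = content


def cite_from_html(url, html):
    """Build a {{cite web}} call from a page, using the site's specification."""
    spec = SPECS.get(urlparse(url).hostname, {})
    collector = MetaCollector()
    collector.feed(html)
    fields = {"url": url}
    for field, meta_name in spec.items():
        if meta_name in collector.meta:
            fields[field] = collector.meta[meta_name]
    # Values containing "|" or "}}" would need escaping; this sketch skips that.
    body = " |".join(f"{key}={value}" for key, value in fields.items())
    return "{{cite web |" + body + "}}"
```

Pages from a site without a specification still yield a bare citation with just the URL, so communities could add specifications one site at a time.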

(Project idea suggested by John Broughton.)

Education Program, outreach and projects
The Wikipedia Education Program helps professors and students contribute to Wikipedia as part of coursework. The current Education Program extension provides features for keeping track of the institutions, courses, professors, students and volunteers involved in this. However, the extension has several limitations and will be largely rewritten. Help is needed to design and build new software to support both the Education Program and other related activities, including topic-centric projects and edit-a-thons.

This project offers tons of opportunities to learn about different facets of software development. There's work to be done right away on UX, fleshing out details of requirements, and architecture design. On this last point, a fun challenge we'll face is creating elegant code that interfaces with a not-so-elegant legacy system. Another challenge will be to create small deliverables that are immediately useful, that can replace parts of the current software incrementally, and that can become components of the larger system we're planning.

Student developers eager to dive into coding tasks can also take bugs on the current version of the software, much of which will remain in production for a while yet. In doing so, they'll practice their code-reading skills, and will get to deploy code to production quickly. :)


 * Skills: PHP, JavaScript, CSS, HTML, UI design, usability testing, and object-oriented design
 * Mentors: Andrew Green, Sage Ross.

Wikimedia Commons / multimedia
Sébastien Santoro (Dereckson) can mentor these project ideas.

Support for text/syntax/markup driven or WYSIWYG editable charts, diagrams, graphs, flowcharts etc.
Resuscitate Extension:WikiTeX and fold Extension:WikiTex into it.

Provide a way to create interactive 2D/3D timelines and infographics à la Java applets, AJAX, Flash
We almost surely don't want to invent our own markup, but SVG probably doesn't suffice and we surely won't use any proprietary format. Ideally we would adopt some syntax/format/technology already adopted and supported by a lively community, preferably offering a certain amount of existing timelines/infographics and other resources which we would then be able to directly use on Wikimedia projects, save copyright incompatibilities. Perhaps http://timeline.knightlab.com/, used by Reasonator?

VisualEditor support for EasyTimeline
Also mentioned at.

Accessibility for the colour-blind
Commons has a lot of graphs and charts used on Wikipedia and elsewhere, but few consider how they look with colour blindness, mostly because the creator/uploader has no idea. Accessibility lists some tools that can be used to automatically transform images into how they are seen by colour-blind people. We could run such automated tools on all Commons graphs and charts and report the results, ideally after assessing automatically in some way that the resulting images are not discernible enough (i.e. they score lower than some threshold). The warnings can be relayed with some template on the file description or directly to the authors, and can have a huge impact on the usefulness of Commons media.

Depending on skills and time constraint, the project taker would do 1, 1-2 or 1-3 of these three steps: 1) develop the code for such an automatic analysis based on free software, 2) identify what are the images to check on the whole Commons dataset and run the analysis on it producing raw results, 3) publish such results on Commons via bot in a way that authors/users can notice and act upon.
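
As a very rough illustration of step 1, the Python sketch below scores a chart's colour palette by the smallest distance between any two colours after a colour-blindness simulation. The simulation used here (merging the red and green channels) is a deliberately crude stand-in and not a physiologically accurate model. The threshold value and the function names are also assumptions. Extracting the palette from an actual image is left out.

```python
import math
from itertools import combinations


def crude_red_green_merge(rgb):
    """Toy stand-in for a red-green deficiency simulation (not accurate)."""
    red, green, blue = rgb
    mean = (red + green) / 2
    return (mean, mean, blue)


def min_pairwise_distance(palette, simulate):
    """Smallest Euclidean RGB distance between any two simulated colours.

    The palette must contain at least two colours.
    """
    simulated = [simulate(colour) for colour in palette]
    return min(math.dist(a, b) for a, b in combinations(simulated, 2))


def needs_warning(palette, simulate=crude_red_green_merge, threshold=40.0):
    """Flag a palette whose colours become too similar after simulation."""
    return min_pairwise_distance(palette, simulate) < threshold
```

A real implementation would plug one of the free tools mentioned above into `simulate` and would probably measure distances in a perceptual colour space rather than in raw RGB.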

Wikidata
Wikidata is a free knowledge base that can be read and edited by humans and machines alike. If you understand the difference between plain text and data you will understand that this project is Wikipedia's Game-changer. The conversion from text to Wikidata content fields has started in Wikipedia and sister projects and continues diving deeper, but there is still a lot to do!

The Wikidata team welcomes your suggestions.
 * Mentors: Wikidata team available. Lydia Pintscher is provisionally acting as proxy.

Merge proofread text back into DjVu files
Wikisource, the free library, has an enormous collection of DjVu files and proofread texts based on those scans. However, while the DjVu files contain a text layer, this text is the original computer-generated (OCR) text and not the volunteer-proofread text. There is some previous work about merging the proofread text as a blob into pages, and also about finding similar words to be used as anchors for text re-mapping. The idea is to create an export tool that will get word positions and confidence levels using Tesseract and then re-map the text layer back into the DjVu file. If possible, word coordinates should be kept.
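The anchor-based re-mapping could look roughly like the following Python sketch. It assumes the OCR output is already available as a list of (word, bounding box) pairs and uses the standard `difflib` module to align it with the proofread words. The function name `remap` and the data layout are hypothetical, and the sketch does not read or write DjVu files.

```python
from difflib import SequenceMatcher


def remap(ocr_words, proofread):
    """Attach OCR bounding boxes to proofread words.

    ocr_words: list of (text, box) pairs, e.g. from an OCR engine.
    proofread: list of corrected words.
    Returns a list of (word, box) pairs; box is None where no anchor exists.
    """
    ocr_text = [text for text, _ in ocr_words]
    matcher = SequenceMatcher(a=ocr_text, b=proofread, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            continue  # OCR noise with no proofread counterpart
        same_length = (i2 - i1) == (j2 - j1)
        for offset, j in enumerate(range(j1, j2)):
            # Identical words, or corrected words replaced one-for-one,
            # inherit the box of the OCR word at the same position.
            box = ocr_words[i1 + offset][1] if same_length else None
            result.append((proofread[j], box))
    return result
```

Words that come back with `None` would need a fallback, for example interpolating a position from the neighbouring anchored words.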

Mentors and skills:
 * Project proposed by Micru. I have found an external mentor who could give a hand with Tesseract; now I'm looking for a mentor who would provide assistance with MediaWiki.
 * Aubrey can be a mentor providing assistance regarding Wikisource, and some past history of this issue. Not much, but glad to help if needed.
 * Rtdwivedi is willing to be a mentor.

Mobile reporting platform
Wikinews encourages its journalists to engage in original reporting. Timeliness of contributions is very important, and being able to report from the field can be essential for larger events such as the Paralympic Games, World Championships, Olympic Games, and Commonwealth Games. Creating a reporting toolkit that would allow reporting from a tablet device would be extremely useful. Some of these tasks are already built into other projects, such as Commons. Tasks that a mobile reporting tool could support (not all are required, but they would be nice to have):
 * Recording audio files in ogg or converting audio files to an ogg format. Being able to choose whether to upload these files to a local Wikinews project or to Commons under a compatible license.
 * Transcription of audio files into text format.
 * Recording video files in ogv or converting video files to an ogv format. Being able to choose whether to upload these files to a local Wikinews project or to Commons under a compatible license.
 * Being able to upload pictures to a local Wikinews project or to Commons under a compatible license. Automatically create gallery code with the Wikinews photographer credited in compliance with local Wikinews policies for photo essays.
 * Being able to type notes that can be automatically shared on the talk page of the associated Wikinews article being developed.
 * Being able to import e-mails that are used for reporting verification into the program in PDF format, so that they can be uploaded locally to Wikinews or sent to scoop@wikinewsie.org. Have an automatic message posted to the talk page of the associated Wikinews article being developed, explaining where the verifying information can be found.
 * Allowing reporters to sort audio, video, image files, typed reporter notes, e-mailed information by reporting event for ease in use.
 * Automate part of the process for creating an article on Wikinews by pulling in the default draft template, autogenerating category for the reporter, putting in the original reporting template, pulling in the location from the device to say where the reporting is being submitted from, then querying reporter about topic for appropriate infobox and categories for the article.

Laura Hale and pi zero, both on the provisional board of The Wikinewsie Group would be keen to act as mentors for how to implement this from a community perspective. Some limited technical assistance may be available. The tool may also be useful for educational outreach programs and some Commons uploading activities.

Mobile viewing application
A mobile application for Wikinews that could be downloaded from Apple's App Store or Google Play would be nice to have.

Laura Hale and pi zero, both on the provisional board of The Wikinewsie Group would be keen to act as mentors for how to implement this from a community perspective. Some limited technical assistance may be available. The tool may also be useful for educational outreach programs and some Commons uploading activities.

Automated news translation
There have been several discussions about implementing machine-assisted translation tools on Wikipedia. Wikinews would strongly benefit from this too, because it would make reporting more cost-effective by increasing the total number of stories generated. Most projects have their own review process before publishing and their own local style guides. Some of the localization information is available on English Wikinews. There is no preference for the technical backend of how automated news translation is done, nor any specific requirements for the amount of human assistance needed to ensure article publication. The community is willing to discuss any potential experiments in implementation.

Laura Hale and pi zero, both on the provisional board of The Wikinewsie Group would be keen to act as mentors for how to implement this from a community perspective. Some limited technical assistance may be available. The tool may also be useful for educational outreach programs and some Commons uploading activities.

Generic, efficient Localisation Update service
We do not know how much the current Localisation Update extension is used outside the Wikimedia Foundation, but we believe that number to be quite low. One of the reasons for low adoption might be the cron jobs and other manual configuration needed to be able to use the extension. We can avoid much of that complexity by moving part of it to a new, separate service (server), which could be hosted by translatewiki.net for example.

The service would keep track of translation updates in a way that allows clients to request only a delta of changes since their last update. If we also take into account that not all wikis need all 300 or so languages, the updates would become a lot faster than with the current way of working, in which the client downloads all the latest translations for all extensions and languages to a local cache, and then compares which translations can be updated.
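To make the delta idea concrete, here is a small in-memory Python sketch. The class names, the revision counter and the method names are all hypothetical, and a real service would expose this over a network protocol rather than direct method calls. It only shows a client asking for changes since its last known revision, restricted to the languages it needs.

```python
class UpdateServer:
    """Keeps an append-only log of translation changes."""

    def __init__(self):
        self._log = []  # entries: (revision, language, key, text)

    def publish(self, language, key, text):
        revision = len(self._log) + 1
        self._log.append((revision, language, key, text))
        return revision

    def changes_since(self, revision, languages):
        """Return (latest_revision, {(language, key): text}) for the delta."""
        delta = {}
        for rev, language, key, text in self._log:
            if rev > revision and language in languages:
                delta[(language, key)] = text  # later entries win
        return len(self._log), delta


class Client:
    """A wiki that only cares about some languages."""

    def __init__(self, languages):
        self.languages = set(languages)
        self.revision = 0
        self.cache = {}

    def update(self, server):
        """Fetch and apply the delta; return the number of changed messages."""
        latest, delta = server.changes_since(self.revision, self.languages)
        self.cache.update(delta)
        self.revision = latest
        return len(delta)
```

A client that is already up to date receives an empty delta, so a frequent polling schedule stays cheap.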

The server does not need to be MediaWiki specific so that other software projects could also use it to implement low-delay localisation updates.


 * Skills: PHP for the extension (client), language of choice for the backend (server), ability to design a protocol between the server and the client.
 * Mentors: Niklas Laxström, Kartik Mistry

Extensive and robust localisation file format coverage
The Translate extension supports multiple file formats. The formats have been developed on an "as needed" basis, and many formats are not yet supported or the support is incomplete. In this project the aim would be to make existing file formats (for example Android xml) more robust to meet the following properties:
 * the code does not crash on unexpected input,
 * there is a validator for the file format,
 * the code can handle the full file format specification,
 * the code is secure (does not execute any code in the files nor have known exploits).
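As a rough illustration of the first, second and fourth properties, here is a Python sketch of a reader for Android-style string resources (the project itself would be written in PHP). The function name and the error messages are hypothetical, and the sketch only covers plain `<string name="...">` elements, so it does not meet the "full specification" property.

```python
import xml.etree.ElementTree as ET


def read_android_strings(data):
    """Parse Android-style <string> resources without raising on bad input.

    data is expected to be a str. Returns (messages, errors). Only plain
    <string name="..."> elements are handled here; plurals, arrays and
    escapes are left out of this sketch.
    """
    messages, errors = {}, []
    if "<!DOCTYPE" in data or "<!ENTITY" in data:
        # Refuse document type declarations so entity tricks are never parsed.
        return messages, ["document type declarations are not accepted"]
    try:
        root = ET.fromstring(data)
    except ET.ParseError as exc:
        return messages, [f"not well-formed XML: {exc}"]
    if root.tag != "resources":
        return messages, [f"unexpected root element <{root.tag}>"]
    for element in root.iter("string"):
        name = element.get("name")
        if not name:
            errors.append("<string> element without a name attribute")
        elif name in messages:
            errors.append(f"duplicate message key: {name}")
        else:
            messages[name] = "".join(element.itertext())
    return messages, errors
```

The same function doubles as a validator: a file is acceptable when the returned error list is empty.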

In addition new file formats (for example TMX) can be implemented. This is a good chance to learn how to write parsers and generators with simple data but complicated file formats. For some formats, it might be possible to take advantage of existing PHP libraries for parsing and file generation. (More example formats other platforms support: OpenOffice.org SDF/GSI, Desktop, Joomla INI, Magento CSV, Maker Interchange Format (MIF), .plist, Qt Linguist (TS), Subtitle formats, Windows .rc, Windows resource (.resx), HTML/XHTML, Mac OS X strings, WordFast TXT, ical.)

This project paves the way for future improvements, like automatic file format detection, support for more software projects and extension of the ability to add files for translation by normal users via a web interface.


 * Skills: PHP, XML, awareness of how to write robust and secure PHP code
 * Mentors: Niklas Laxström, Siebrand Mazeland

One stop translation search


A Special:SearchTranslations page has been created for the Translate extension to allow searching for translations. However it has not been finished and it lacks important features: in particular, being able to search in source language, but show and edit messages in your translation language. The interface has some bugs with facet selection and direct editing of search results is not working properly. It is not possible to search by message key unless you know the special syntax, nor to reach a message in one click. Interface designs are available for this page.

A possible extension for the project scope: the backend is currently Solr, but an alternative backend based on ElasticSearch via Elastica is wanted to ensure good integration with WMF's ElasticSearch cluster.


 * Skills: Backend coding with PHP, frontend coding with jQuery, Solr/ElasticSearch/Lucene
 * Mentors: Niklas Laxström

Visual translation: Integration of page translation with Visual Editor
The wiki page translation feature of the Translate extension does not currently work with Visual Editor due to the special tags it uses. More specifically, this is about editing the source pages that are used as the source for translations, not the translation process itself. The work can be divided into three steps:
 * 1) Migrate the special tag handling to a more standard way to handle tags in the parser. This needs some changes to the PHP parser for it to be able to produce the wanted output.
 * 2) Add support to Parsoid and Visual Editor so that editing page contents preserves the structures that page translation adds to keep track of the content.
 * 3) Add to Visual Editor some visual aid for marking the parts of the page that can be translated.

This can be a difficult project due to complexities of wikitext parsing and intersecting multiple different products: Translate, MediaWiki core parser, Parsoid, Visual Editor.


 * Skills: PHP, JavaScript, wikitext parsing
 * Mentors: Niklas Laxström + one person from parsoid team? (C. Scott Ananian? Gabriel Wicke?)

Wiki page translation revisited
The wiki page translation feature of the Translate extension has become successful. As the usage has grown to several dozen wikis, new issues have come up which, if fixed, would ensure a smooth user experience and further expansion. Each issue alone is not a major thing, but together they make page translation less nice than it could be.
 * (bug 35489) The user is unable to set the page source language: this prevents translation to the wiki content language from other languages. For instance, Wikimedia chapters would often like to translate their reports from their language to English on Meta-Wiki.
 * (bug 34098) Currently the page title is always up for translation. For some pages the title is not relevant and unused because the content is consumed in other ways. If the translation admin could choose not to translate the page title, translators' time would be saved.
 * (bug 51533) The interface on the pages themselves has been nominated for redesign. The main issues are the language selection and the calls to action. The former takes too much space and is hard to use if there are many languages; the latter are hard to notice but sit in a place where they can break the page flow.
 * (bug 37297, 39415) Updating issues when moving or deleting translation units (for example to remove spam).
 * (bug 36298) The page Special:AggregateGroups is clunky, lacks features and does not scale well to thousands of pages. It needs some re-architecting to stay usable.


 * Skills: mostly PHP, some JavaScript
 * Mentors: Niklas Laxström + ?

Multilingual SemanticMediaWiki
Semantic MediaWiki would benefit from being multilingual-capable out of the box. We could integrate it with the Translate extension. This can be done in some isolated steps. Some of the steps could be:


 * Fix the issues that prevent full localisation of semantic forms.
 * Enhance Special:CreateForm and friends to create forms that are already i18ned with placeholders and message group for Translate extension.
 * Make it possible to define translation for properties and create a message group for Translate extension
 * There are a lot of places where properties are displayed: many special pages, queries, property pages. Some thinking is required to find out a sensible way to handle translations in all these places.


 * Skills: PHP and web frontend; experience with SemanticMediaWiki and SemanticForms is a plus.
 * Mentors: Niklas Laxström (with yet unknown co-mentor from SMW).

Improve Extension:WebFonts or Extension:UniversalLanguageSelector for Chinese (or CJK) wikis
Chinese uses a very large number of characters, and many are so rarely used that fonts covering them are often not installed on readers' systems. However, including all of them in a font file makes it huge, so we may want to tailor the font file for every page based on the characters used on that page.
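The per-page tailoring starts with working out which characters a page actually needs. A minimal Python sketch of that step follows. The function name and the `preinstalled` parameter are hypothetical, and only two Unicode blocks are checked to keep the example short.

```python
def glyphs_needed(page_text, preinstalled=frozenset()):
    """Return the sorted code points a per-page font subset would need."""
    needed = set()
    for char in page_text:
        code = ord(char)
        # CJK Unified Ideographs and Extension A only; other CJK blocks are
        # deliberately left out of this sketch.
        is_cjk = 0x4E00 <= code <= 0x9FFF or 0x3400 <= code <= 0x4DBF
        if is_cjk and char not in preinstalled:
            needed.add(code)
    return sorted(needed)
```

The resulting list could then be handed to a font subsetting tool, and any code point the source font lacks would be a candidate for the glyph creation workflow.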

As of writing, there isn't any "good enough" free font which includes all Chinese characters in Unicode, and the "wiki" concept itself encourages collaborative content creation, so it would be nice to invite users to create a glyph for a character when the system sees one without existing data (remember we need free content). en:WenQuanYi and glyphwiki.org already have some online glyph creators which can be useful for us.

[Update: Hanazono (with Japanese glyph) is (almost?) complete, but the size issue mentioned in the first paragraph still applies.]

Maybe we can donate glyphs created by wiki users to other projects, but we have to make sure our data meet their quality standards...


 * Skills: PHP, Web frontend, Font creation and management. Some knowledge of CJK characters can be a plus.
 * Mentors: User:Liangent + ?

Distributed cron replacement
A common requirement in infrastructure maintenance is the ability to execute tasks at scheduled times and intervals. On Unix systems (and, by extension, Linux) this is traditionally handled by a cron daemon. Traditional crons, however, run on a single server and are therefore unscalable and create single points of failure. While there are a few open source alternatives to cron that provide for distributed scheduling, they either depend on a specific "cloud" management system or on other complex external dependencies; or are not generally compatible with cron.

The Wikimedia Labs has a need for a scheduler that:
 * Is configurable by traditional crontabs;
 * Can run on more than one server, distributing execution between them; and
 * Guarantees that scheduled events execute as long as at least one server is operational.

The ideal distributed cron replacement would have as few external dependencies as possible.
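One way to meet the second and third requirements without a coordinator is rendezvous hashing, sketched below in Python. This is an illustration and not an existing tool: the function names are hypothetical, and the sketch assumes that all servers agree on which servers are currently live, which in practice is the hard part.

```python
import hashlib


def owner(job, minute, live_servers):
    """Pick the server that should run `job` at `minute` (rendezvous hashing).

    Every server evaluates the same pure function, so no coordinator is needed
    as long as they agree on `live_servers` (an assumption of this sketch).
    """
    if not live_servers:
        raise ValueError("no live servers")

    def score(server):
        key = f"{server}|{job}|{minute}".encode("utf-8")
        return hashlib.sha256(key).hexdigest()

    return max(sorted(live_servers), key=score)


def should_run(me, job, minute, live_servers):
    """True if this server is responsible for the given run."""
    return owner(job, minute, live_servers) == me
```

Each server would parse the shared crontab, and at every scheduled minute run only the entries for which `should_run` returns true. Because the minute is part of the hash key, successive runs of the same job are spread over the servers, and losing a server only reassigns the runs that server owned.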

-- Coren (talk)/(enwp) 19:29, 23 November 2013 (UTC)

Implementing volunteer testing tracking framework
Wikimedia frequently deploys changes to software. It is always useful to test features as early and as thoroughly as possible before deployment. Currently Wikimedia doesn't have a proper process to communicate with volunteer testers and invite them to test features. Sometimes the wikitech-ambassadors list is used, sometimes new features run in beta and volunteers are invited to write up their experiences on a talk page somewhere, but very frequently features are not announced at all. The situation is complicated by the fact that the different Wikimedia sites work in almost 300 languages with different fonts, different string lengths, different templates, different extensions, different CSS etc.

One way to solve this is to develop some tools and procedures to communicate with prospective volunteer testers and to collect feedback from them, both positive and negative. It can be a simple form that says: feature x, languages XX, OK/FAIL. See an example from Fedora here: QA-L10N:nautilus test day. In Fedora, the technical side of things is actually just a MediaWiki table. We could just use that, or we could do something even better: maybe a MediaWiki extension, or maybe even some non-MediaWiki-based technology.

In any case, an easy-to-understand workflow would be very important, even if the technical tools are good, and writing these tools and procedures would be a very useful contribution to the MediaWiki developers' and users' community. --Amir E. Aharoni (talk) 19:36, 14 November 2012 (UTC)

System documentation integrated in source code
It would be really nice if inline comments, README files, and special documentation files could exist in the source code but be exported into a formatted, navigable system (maybe wiki pages or maybe something else). It could be something like doxygen, except better and orientated to admins and not developers. Of course it should integrate with mediawiki.org and https://doc.wikimedia.org.

The idea would be that one could:
 * Keep documentation close to the code and thus far more up to date
 * Even enforce documentation updates to it with new commits sometimes
 * Reduce the tedium of making documentation by using minimal markup to specify tables, lists, hierarchy, and so on, and let a tool deal with generating the html (or wikitext). This could allow for a more consistent appearance to documentation.
 * When things are removed from the code (along with the docs in the repo), if mw.org pages are used, they can be tagged with a warning box and be placed in a maintenance category.
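The export step could be sketched as follows in Python. The comment convention (`/** ... */` blocks with "- item" list lines), the function names and the output layout are all assumptions made for illustration, not an existing tool or agreed format.

```python
import re

DOC_BLOCK = re.compile(r"/\*\*(.*?)\*/", re.DOTALL)


def extract_docs(source):
    """Return the text of every /** ... */ block, without comment markers."""
    blocks = []
    for match in DOC_BLOCK.finditer(source):
        lines = []
        for line in match.group(1).splitlines():
            # Drop the leading " * " decoration that doc comments usually carry.
            lines.append(re.sub(r"^\s*\*? ?", "", line).rstrip())
        blocks.append("\n".join(lines).strip())
    return blocks


def to_wikitext(filename, source):
    """Render one file's doc blocks as a wiki section."""
    parts = [f"== {filename} =="]
    for block in extract_docs(source):
        # Minimal markup: "- item" lines become wiki list items.
        parts.append(re.sub(r"^- ", "* ", block, flags=re.MULTILINE))
    return "\n\n".join(parts)
```

Running such a converter on every commit would keep the generated pages in step with the code, and a file that disappears from the repository would tell the tool which pages to tag.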

Proposed by Aaron Schulz.

Ranking articles by pageviews for wikiprojects and task forces in languages other than English
Currently we have an amazing tool which every month determines which pages are most viewed for a Wikiproject and then provides a sum of the pageviews for all articles within that project. An example of the output for WikiProject Medicine in English.

The problem is that this tool only exists in English and is running on toolserver rather than Wikimedia Labs. So while we know what people are looking at in English, and this helps editors determine what articles to work on, other languages do not have this ability.

Additionally, we do not know if the topics people look up in English are the same as those they look up in other languages. In the subject area of medicine this could be the basis of a great academic paper and I would be happy to share authorship with those who help to build these tools.

A couple of steps are needed to solve this problem:
 * 1) For each article within a Wikiproject in English, take the interlanguage links stored at Wikidata, and tag the corresponding article in the target language
 * 2) Figure out how to get Mr. Z's tool to work in other languages. He supposedly is working on it and I am not entirely clear if he is willing to have help. Another tool that could potentially be adapted to generate the data is already on Labs.
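The two steps can be sketched in Python on in-memory data. The function name and the shape of the three inputs are assumptions. In practice the interlanguage links would come from Wikidata and the view counts from the pageview statistics, and this sketch does not reflect how Mr. Z's tool works.

```python
def rank_by_pageviews(project_items, sitelinks, pageviews, language):
    """Rank a WikiProject's articles by views in another language.

    project_items: item IDs of the articles tagged by the English WikiProject.
    sitelinks: {item_id: {language: title}}, as stored on Wikidata.
    pageviews: {(language, title): monthly views}.
    Returns ([(title, views), ...] sorted by views, total views).
    """
    rows = []
    for item in project_items:
        title = sitelinks.get(item, {}).get(language)
        if title is None:
            continue  # no article in the target language yet
        rows.append((title, pageviews.get((language, title), 0)))
    rows.sort(key=lambda row: (-row[1], row[0]))
    return rows, sum(views for _, views in rows)
```

Comparing the ranked lists for two languages would also give a first answer to the question of whether readers look up the same topics in both.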

James Heilman (talk) 21:13, 14 September 2013 (UTC)

Beyond development
Featured projects that focus on technical activities other than software development.

Research & propose a catalog of extensions
Extensions on mediawiki.org are not very well organized and finding the right extension is often difficult. Listening to community members, you will hear about better management of extension pages with categorization, ratings on code quality, security, usefulness, ease of use, good visibility for good extensions, “Featured extensions”, better exposure and testing of version compatibility... This project is about doing actual research within our community and out there to come up with a proposal that is both agreed upon and feasible: a plan that a development team can just take to start the implementation.


 * Skills: research, negotiation, fluent English writing. Technical background and knowledge of MediaWiki features and web development features will get you sooner to the actual work.
 * Mentors: Yury Katkov + ?

Annotation tool that extracts statements from books and feed them on Wikidata
Wikidata is a free knowledge base that can be read and edited by humans and machines alike. If you understand the difference between plain text and data you will understand that this project is Wikipedia's Game-changer. The conversion from text to Wikidata content fields has started in Wikipedia and sister projects and continues diving deeper, but there is still a lot to do!

Now think about this: you are at home, reading and studying for pleasure, or an assignment, or for your PhD thesis. When you study, you engage with the text, and you often annotate and take notes. What about a tool that would let you share important quotes and statements to Wikidata?

A statement in Wikidata is often a simple subject - predicate - object, plus a source. Many, many facts in the books you read can be represented in this structure. We can think of a way to share them.

A client-side browser plugin or script or app that would take some highlighted text, offering you a GUI to fix up the statement and source, and then feed it into Wikidata.
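The subject - predicate - object structure described above maps naturally onto a small record. The Python sketch below shows what such a tool might hold before the user confirms a statement. The class, the field names and the helper are hypothetical, and nothing here talks to Wikidata.

```python
from dataclasses import dataclass


@dataclass
class Statement:
    subject: str     # item the statement is about
    predicate: str   # property being asserted
    obj: str         # value of the property
    source: str      # book the statement was found in
    quote: str = ""  # highlighted passage backing the statement


def from_highlight(highlight, subject, predicate, obj, source):
    """Build a reviewable statement from text highlighted by the reader."""
    quote = " ".join(highlight.split())  # normalise whitespace
    if not quote:
        raise ValueError("nothing was highlighted")
    return Statement(subject, predicate, obj, source, quote)
```

The GUI would let the reader correct the three parts and the source before anything is submitted, so the record acts as a draft rather than an edit.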

We could unveil a brand-new world of sharing and collaborating, directly from your reading.

Possible projects:
 * Pundit. http://www.thepund.it/ (the team is aware of Wikidata and willing to collaborate).
 * Annotator. https://github.com/okfn/annotator
 * Mentors: Aubrey is available for mentorship, paired with a technical expert.

Google Books > Internet Archive > Commons upload cycle
Wikisources all around the world make heavy use of Google Books (GB) digitizations for transcription and proofreading. As GB provides just the PDF, the usual cycle is:
 * 1) go to Google Books and look for a book
 * 2) check if the book is already in IA
 * 3) if it's not, upload it there (library)
 * 4) get the djvu from IA
 * 5) upload it on Commons
 * 6) use it on Wikisource

For point 4, we have this awesome tool: IA-Upload. What we miss right now is a tool for point 2.1, which would serve many other users outside the Wikimedia movement too. Eventually, we could think of a bot/script which would do all the work together, notifying the user when their help is needed (e.g. metadata polishing, Commons categories, etc.). Relevant Wikisource help pages include s:Help:Internet Archive and s:Help:Internet Archive/Requested uploads.
 * Mentors: Aubrey is available for "design" mentorship, paired with a technical expert. We can maybe ask for help from an IA expert.
 * Mentors: Yann (talk) is available for help regarding uploads of Google Books and files from IA
 * Mentors: Tpt is available as technical mentor. He has no special knowledge of Internet Archive and Google Books. He is also the creator of IA-Upload.

OpenHistoricalMap & Wikimaps
Wikimaps is an initiative to gather old maps in Wikimedia Commons and place them in world coordinates with the help of Wikimedia volunteers. By connecting with OpenHistoricalMap, the historical maps can be used as a reference for extracting historical geographic information. Additionally, the resulting historical geodata can be connected back to the data repository of Wikimedia through Wikidata, creating a community-maintained spatiotemporal gazetteer.

We hope to foster better visualisation of the raw data held by OpenHistoricalMap by allowing the rendering to work temporally as well as spatially. The tile rendering software will be modified to support a date range, while the OSM Rails port will be extended with a time/date slider so that the current OSM tools stack can work within a specific time and place.

Tasks
 * Enhance iD and The_Rails_Port so that a JavaScript time/date slider can be added to control the time period that is of interest.
 * Enhance iD and The_Rails_Port so that metadata hooks are added to the code that allow for custom deployments of both pieces of software. The intent is to support their use as dedicated user interfaces to certain applications (such as medieval walking path editing) while still using a generic data source.
 * Modify the Mapnik tile renderer to handle Key:start_date and Key:end_date.

Mentors
 * Rob Warren, Muninn project
 * Shekhar Krishnan, Topomancy LLC
 * Jeff Meyer, GWHAT
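The date-range filter behind the slider and the renderer change can be illustrated with a short Python sketch. It is not Mapnik code. The function name is hypothetical, only the leading year of each tag is read, and a missing tag is treated as "unbounded", which is an assumption about how untagged features should behave.

```python
def visible_in(tags, year_from, year_to):
    """True if a feature's start_date/end_date tags overlap the given range.

    Only the leading year of each tag is read; a missing tag means the feature
    is unbounded on that side. Full date parsing is out of scope here.
    """
    def year(value, default):
        try:
            return int(str(value)[:4])
        except ValueError:
            return default

    start = year(tags.get("start_date", ""), float("-inf"))
    end = year(tags.get("end_date", ""), float("inf"))
    return start <= year_to and end >= year_from
```

A slider position would translate into a `year_from`/`year_to` pair, and only features for which the function returns true would be drawn or offered for editing.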

= Very raw projects =
 * Taken from the former "Annoying large bugs" page.


 * A system to make, review and action requests for migrating or creating repositories in Git/Gerrit
 * This is currently performed on-wiki, which has hit its limits in terms of scaling.
 * This theoretically could be a Flow workflow, as and when those are available.


 * Making our puppet servers HA and load balanced without having to change all of the security certificates


 * A proper extension (rather than gadget) to manage and run the Commons Picture of the Year contest
 * Runs in May of each year
 * see mailing list discussion for specifications


 * Making a ReCAPTCHA-like solution that helps with proofreading for Wikisource
 * Help the developers on GitHub


 * Make frequently updated special pages like Recent Changes update themselves automatically on the client


 * "An extension that manages all templates... It could be like a library of templates, only now they can be easily imported to other wikis."


 * Global user preferences
 * architecturally very important (if not critical) to a number of projects
 * As far as I remember, the backend for this is largely completed, it just needs a sensible UI
 * Andrew Garrett writes:
 * I tried to implement this when I completely refactored the preferences system in 2009. It was eventually reverted in r49932. The main blocker was basically considering a way to decide which preferences would have their values synchronised. A UI would need to be developed for that and you'd need some extensive consultation on that fact.
 * If you were to implement this, you could potentially use my original implementation as a guide, though it is reasonably "in the guts" of MediaWiki so you'd have to be reasonably confident "code diving" into unfamiliar software packages.


 * Using onscreen keymaps from Narayam's code base to build a mobile-focused app from which one could choose and load a keymap. This would be a great app to have on the mobile app stores for Boot2Gecko and Android.


 * Moving categories (and updating their members)
 * Discussed endlessly; needs to be properly implemented; move support, etc. should already be there; just needs schema support and some software tweaks?


 * HTML e-mail support
 * Requires some design expertise, but it'd be nice to have MediaWiki e-mails stop looking as though they're from 1995, especially as they're much more visible nowadays with ENotif (email notifications) enabled on Wikimedia wikis
 * Some of this was done as part of Notifications.


 * Fix user renames to be less fragile and horrible
 * Lots of breakages from renames of users with a lot of edits; old accounts need to be fixed in a sensible way and new borkages need to be properly prevented


 * Let users rename themselves
 * Restrict to those with zero edits?
 * Or not?
 * Major community policy issues.


 * Group similar pages in watchlist
 * Possibly a gadget? Make it easier for editors to sort through their watchlists.
 * See en:User:Js/watchlist.js


 * Database layer should automagically add GROUP BY columns on backends that need them (PostgreSQL)
 * Will improve the MediaWiki installation and maintenance experience for all MediaWiki installations on PostgreSQL, SQL Server, Oracle, SQLite, or any other non-MySQL database.
 * Needs some database experience


 * Add user preference to deactivate/delete user account
 * might be possible to piggyback on the HideUser functionality that already exists in MediaWiki core


 * Add a read-only API for CentralNotice