Outreach programs/Possible projects

We are using this list of projects as a master branch for mentorship programs such as Google Summer of Code and Outreach Program for Women. The projects listed are good for students and first-time contributors, but they require a good amount of work. They might also be good candidates for Individual Engagement Grants.


 * Featured project ideas usually have mentors ready for you to jump in.
 * Ongoing tasks are neverending (but fun) activities with mentors available, although they might not be suitable for e.g. GSOC.
 * Raw projects are interesting ideas that have been proposed but might lack definition, consensus or mentors, and therefore we can't feature them.

If you are looking for smaller tasks check the Annoying little bugs. For a more generic introduction check How to contribute.



Be part of something big
We believe that knowledge should be free for every human being. We prioritize efforts that empower disadvantaged and underrepresented communities, and that help overcome barriers to participation. We believe in mass collaboration, diversity and consensus building to achieve our goals.

Wikipedia has become the fifth most-visited site in the world, used by more than 400 million people every month in more than 270 languages. We have other content projects including Wikimedia Commons, Wikidata and the most recent one, Wikivoyage. We also maintain the MediaWiki engine and a wide collection of open source software projects around it.

But there is much more we can do: stabilize infrastructure, increase participation, improve quality, increase reach, encourage innovation.

You can help reach these goals in many ways. Below are some selected ideas.

Where to start
Maybe at this point your proposal is just a vague idea and you want to get some feedback before investing much more time planning it? We know this feeling very well! Just send an email to wikitech-l (or qgil@wikimedia.org if you prefer) sharing what you have in mind. One short paragraph can be enough for us to get back to you and help you work in the right direction.

Any potential contributor new to our community is encouraged to follow the Landing instructions. Use your user page to introduce yourself and draft your project (use the template). The GSOC student guide is a good resource for anybody willing to write a good project proposal. And then there is a list of DOs and DON'Ts full of practical wisdom.

To set up your MediaWiki developer environment, we recommend that you start by installing a local instance using mediawiki-vagrant. You can also have a fresh MediaWiki to test on a remote server. Just get developer access and request your own instance at Wikitech.

If you have general questions, you can start by asking on the Discussion page. The IRC channel is also a good place to find people and answers. We do our best to connect project proposals with Bugzilla reports and/or wiki pages. Other contributors may watch/subscribe to those pages and contribute ideas to them. If you can't find answers to your questions, ask first on those pages. If this doesn't work, then go ahead and post your question to the wikitech-l mailing list.

Featured project ideas
Below you can find a list of ideas that have already gone through a reality check and have mentors confirmed. You can find more suggestions in our list of Raw projects.

But first, let us talk about...

Your project
That's right! If you have a project in mind, we want to hear about it. We can help you assess its feasibility and we will do our best to find a mentor for it.

Here are some guidelines for project ideas:


 * Opportunity: YES to projects responding to generic or specific needs. YES to provocative ideas. NO to trivial variations of existing features.
 * Community: YES to projects encouraging community involvement and maintenance. NO to projects done in a closet that won't survive without you.
 * Deployment: YES to projects that you can deploy. YES to projects where you are in sync with the maintainers. NO to projects depending on unconvinced maintainers.
 * MediaWiki != Wikipedia: YES to generic MediaWiki projects. YES to projects already backed by a Wikimedia community. NO to projects requiring Wikipedia to be convinced.
 * Free content: YES to using, remixing and contributing to Wikimedia content. YES to any content with a free license. NO to proprietary content.
 * Free API: YES to the MediaWiki API. YES to any APIs powered by free software. NO to proprietary APIs.



Reuse / Remix / Contribute to Wikimedia content
The Wikimedia community maintains millions of articles, media files and data that anybody (including your software) can download, share, modify and remix. We offer a MediaWiki API to interact with this content on the Wikimedia servers; the same API is also available on most MediaWiki-based sites.
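As a rough illustration of what talking to the API looks like, here is a minimal sketch that only builds a request URL for the latest wikitext of a page; it does not send the request. The endpoint (English Wikipedia's `api.php`) and the helper name `build_content_query` are choices made for this example, not something prescribed by this page.

```python
from urllib.parse import urlencode

# Assumed endpoint for this example: English Wikipedia's public api.php.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_content_query(title, endpoint=API_ENDPOINT):
    """Return a URL asking the MediaWiki API for a page's latest wikitext.

    Hypothetical helper: it only assembles the query string.
    """
    params = {
        "action": "query",     # read-only query module
        "prop": "revisions",   # ask for revision data
        "rvprop": "content",   # include the revision text
        "titles": title,
        "format": "json",
    }
    return endpoint + "?" + urlencode(params)


print(build_content_query("Wikimedia Commons"))
```

Any other MediaWiki site can be targeted by passing its own `api.php` address as `endpoint`.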

We welcome projects aiming to get this content to the people that need it most. Projects converting regular users into contributors to the Wikimedia pool of free knowledge. Projects categorizing, connecting or remixing this content and obtaining better or simply unexpected results. Surprise us!

We also welcome improvements to the API, enabling the enablers. Localized errors and warnings, RESTful style Content API, API versioning and Wikidata API are some of the features waiting for a developer in our API roadmap.

Skills: depends on your project, but understanding PHP will be good in any case.

Mentors: Yurik Astrakhan as default. Others might be available depending on your project.

VisualEditor plugins
VisualEditor is a rich visual editor for all users of MediaWiki, so they don't have to know wikitext or HTML to contribute well-formatted content. It is our top priority and you can already test it at the English Wikipedia. While we focus on the core functionality, you could write a plugin to extend it, for instance with syntax highlighting, or insertion of video or Wikidata content. There are also many possibilities to increase the types of content supported, including sheet music, poems, timelines…

Skills: HTML / JavaScript / jQuery development is required. A good grasp of UX / Web design will make a difference.

Mentors: James Forrester, Roan Kattouw, Trevor Parscal.

Championing i18n
Internationalization (i18n) and localization (L10n) are part of our DNA. The Language team develops features and tools for a huge and diverse community, including 258 Wikipedia projects and 349 MediaWiki localization teams. This is not only about translating texts. Volunteer translators require very specialized tools to support different scripts, input methods, right-to-left languages, grammar...

Below you can find some ideas to help multilingualism and the sharing of all knowledge, literally for everybody, in their own language.

Tools for mass migration of legacy translated wiki content
The MediaWiki Translate extension has a page translation feature to make the life of translators easier. It allows structured translation of wiki pages, separating text strings from formatting or images, and also tracks changes in the source pages (usually in English). You can see it in action on this page (click the Edit view). Often, wikis have a lot of legacy content that requires tedious manual conversion to make it translatable. It would be useful to have a tool to facilitate the conversion. You would show the proof of concept in Meta-Wiki, a Wikimedia community looking forward to a project like this.

Skills: PHP, interest in usability and conducting user research.

Mentors: Niklas Laxström, Federico Leva.

jQuery.IME next big release improvements
The jQuery.IME input method editor library is part of the UniversalLanguageSelector extension. It works, but after the feedback on the first release we have a good idea of what would need to be done in the next release. The main improvements would be to add an on-screen keyboard feature, support for contenteditable divs, and better browser compatibility. If you want more, we have more open issues.

Skills: JavaScript, jQuery, CSS, QUnit, browsers.

Mentors: Santhosh Thottingal.

Wikidata features
Wikidata is a free knowledge base that can be read and edited by humans and machines alike. If you understand the difference between plain text and data, you will understand that this project is Wikipedia's game-changer. The conversion from text to Wikidata content fields has started in Wikipedia and its sister projects and continues diving deeper, but there is still a lot to do!

The Wikidata team welcomes your suggestions and provides you with some ideas.

Mentors: Wikidata team available. Lydia Pintscher is provisionally acting as proxy.

3rd party client
Currently the Wikidata client is only set up to directly serve data to the Wikimedia projects. The goal of this project is to also allow 3rd party clients to consume Wikidata data in the same way. For example, what is missing is the propagation of changes to clients outside the Wikimedia cluster, so that they show up in the watchlist and recent changes of 3rd party MediaWiki sites.

New media types supported in Commons
Wikimedia Commons is a database of millions of freely usable media files to which anyone can contribute. The pictures, audio and video files you find in Wikipedia articles are hosted in Commons. Several free media types are already supported, but there are more requested by the community, e.g. X3D for representing 3D computer graphics or KML/KMZ for geographic annotation and visualization. Each format has its own considerations, such as security risks or fallback procedures for browsers not supporting these file types.

Skills: PHP at least. Good knowledge of the file type chosen will be more than helpful.

Mentors: Thomas PT and Brian Wolff.

SASS/LESS support for ResourceLoader
ResourceLoader is the delivery system in MediaWiki for the optimized loading and managing of modules consisting of JavaScript, CSS and interface messages. The task is to implement a basic framework (46546) in ResourceLoader for compilation of higher-level CSS languages like SASS and LESS to actual CSS (this may include an equivalent framework for JavaScript). Then, implement at least one actual such conversion, either SASS (46545) or LESS (40964).

Skills: Primarily PHP. Some understanding of CSS may help with debugging.

Mentors: Matt Flaschen.

Automatic category redirects
This is one of the oldest and most voted-for MediaWiki feature requests. MediaWiki has a feature called redirects, where one page can redirect to another. However, they do not work for categories. In the ideal system, if Category A redirects to Category B, and someone puts page foo in Category A, then the page should show up in Category B. If someone later changes Category A to redirect to Category C, all the pages put in Category A have to have their links moved from Category B to Category C.
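The desired behaviour can be sketched outside MediaWiki with plain dictionaries. This is only an illustration of the redirect-following logic, assuming redirects are stored as a mapping from category to target; the function names are hypothetical and nothing here reflects the actual database schema.

```python
def resolve_category(category, redirects):
    """Follow redirects until reaching a category that does not redirect."""
    seen = set()
    while category in redirects:
        if category in seen:
            # A -> B -> A would otherwise loop forever.
            raise ValueError(f"redirect loop at {category!r}")
        seen.add(category)
        category = redirects[category]
    return category


def effective_members(memberships, redirects):
    """Map each final category to the set of pages that should be listed in it.

    memberships: page title -> list of categories written on that page.
    """
    result = {}
    for page, categories in memberships.items():
        for category in categories:
            target = resolve_category(category, redirects)
            result.setdefault(target, set()).add(page)
    return result


memberships = {"foo": ["A"], "bar": ["B"]}
redirects = {"A": "B"}
print(effective_members(memberships, redirects))   # foo and bar both listed under B

redirects["A"] = "C"                                # retarget the redirect
print(effective_members(memberships, redirects))   # foo moves to C, bar stays in B
```

Recomputing everything on each lookup is the simplest possible design; a real implementation would more likely update stored links when a redirect changes.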

This project would involve several of the "core" components of core MediaWiki including the, the database schema, and class. However it is quite self contained. This project would also be quite beneficial to several wiki projects, especially multilingual projects like Wikimedia Commons.

Skills: PHP. SQL would be helpful.

Mentors: Brian Wolff.

MediaWiki-Bugzilla extension
Mozilla has developed a MediaWiki-Bugzilla extension which allows for read-only inclusion of Bugzilla lists and charts in MediaWiki pages. For example, it is used to provide "Mentored Bugs" sections. It could also be used for basic prioritization, ToDo lists for projects, metrics... This project consists of enhancing this extension by making API calls more robust (e.g. when receiving 1000s of results), allowing more types of charts, and providing good test coverage and user documentation.

Skills: PHP, JavaScript, JSON, Bugzilla.

Mentors: Brandon Savage (Mozilla), Sam Reed.

Semantic MediaWiki features
Semantic MediaWiki is a lot more than a MediaWiki extension: it is also a full-fledged framework, in conjunction with many spinoff extensions and it has its own user and developer community. Semantic MediaWiki can turn a wiki into a powerful and flexible collaborative database. All data created within SMW can easily be published via the Semantic Web, allowing other systems to use this data seamlessly.

There are more than 500 SMW-based sites, including wiki.creativecommons.org, docs.webplatform.org, wiki.mozilla.org, wiki.laptop.org and wikitech.wikimedia.org.

Allowing 3rd party wiki editors to run more CSS features
The 3rd party CSS extension allows editors to style wiki pages just by editing them with CSS properties. It could be more powerful if we find a good balance between features and security. Currently this extension relies on basic blacklisting functionality in MediaWiki core to prevent cross-site scripting. It would be great if a proper CSS parser was integrated and a set of whitelists implemented.

Additionally, the current implementation uses data URIs and falls back to JavaScript when the browser doesn't support them. It would be a great improvement if the MediaWikiPerformAction (or similar) hook was used to serve the CSS content instead. This would allow the CSS to be more cleanly cached and reduce or eliminate the need for JavaScript and special CSS escaping.

Skills: PHP, CSS, JavaScript, web application security.

Mentors: Rusty Burchfield.

UploadWizard: OSM map embedding
Experience from Wiki Loves Monuments (WLM) tells us that having a map within the UploadWizard would greatly simplify the user's flow. Beyond the contest, the map could be used for:
 * making lists of "requested pictures" to prompt for after an upload in the area: "Hey, you're nearby, would you like to go and take a photo of that?" (final step)
 * helping to categorize images: "Your picture seems to have no metadata. Tell us where you took it by clicking on this map". The map could also show items that already have a picture: "Is it one of these?" (optional step)

Skills: PHP, JavaScript

Mentors: first mentor TBD, co-mentor available from the WLM-i team


 * I'll help out with this, but possibly as a co-mentor or in an unofficial capacity :) --MarkTraceur (talk) 22:41, 30 April 2013 (UTC)





Beyond development
Featured projects that focus on technical activities other than software development.

Research & propose a catalog of extensions
Extensions on mediawiki.org are not very well organized and finding the right extension is often difficult. Listening to community members, you will hear about better management of extension pages with categorization, ratings on code quality, security, usefulness, ease of use, good visibility for good extensions, "Featured extensions", better exposure and testing of version compatibility... This project is about doing actual research within our community and beyond to come up with a proposal that is both agreed upon and feasible: a plan that a development team can simply pick up to start the implementation.

Skills: research, negotiation, fluent English writing. Technical background and knowledge of MediaWiki features and web development features will get you to the actual work sooner.

Mentors: Yury Katkov.



Work on outstanding Parsoid bugs and/or add features
Parsoid is an attempt at bringing some sanity to the world of Wikitext. Whenever you edit a wiki page, on this site for example, a PHP script runs through the page multiple times to come up with the new HTML that it generates. The Parsoid project is a single-pass design, which hopefully makes for a much speedier and more reliable parser.

But we definitely need your help. Learn more about the complexities of Wikitext parsing, and help push forward the new VisualEditor project, by adding things to the new C++ parser, and making things generally work better.

Things you might need to know, but aren't required to: C++, HTML5, node.js, wiki syntax, parser design.

Contact MarkTraceur (talk) 00:15, 15 November 2012 (UTC) for more information, or just check out the project page.

Triaging bug reports
Wikimedia receives many reports about code mistakes (so-called "bugs") and enhancement/feature requests in the public database located at Bugzilla. Some need to be put into the right baskets so developers can find them, and some lack information or are not in good enough shape to be useful. This process is called triaging. Triaging helps users/reporters, developers, maintainers, and release management to save time and keep an overview of which problems are important. You would work with Wikimedia's bugwrangler.

Some helpful characteristics (these are no strict requirements though) probably include: Common sense and structured approach to problems (finding and asking the right questions), likes to test/reproduce weird things in dark dusty corners of software applications, loves details without being pedantic, is well-organized when it comes to sorting and prioritizing large amounts of (bug)mail, is interested in the organization of large projects (many stakeholders, many subprojects), basic understanding of code concepts in general.

For more information, please read the Bug management documentation on our wiki (especially the Triage Guide), "Why everyone needs a bugmaster" and give triaging reports in Wikimedia a try! --Malyacko (talk) 13:34, 20 November 2012 (UTC)

Raw projects

Make Wiktionary definitions available via the dict protocol
The dict protocol (RFC 2229) is a widely used protocol for looking up definitions over the Internet. We'd like to make Wiktionary definitions available for users. Doing that using the dict protocol would help drive the use and usefulness of Wiktionary, as well.
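To give a feel for what a server would send back, here is a minimal sketch of formatting the reply to a DICT `DEFINE` command, using the 150/151/250/552 status codes and the period-terminated text blocks described in RFC 2229. The database identifier `wikt`, its description and the function name are made up for the example, and the human-readable text after each status code is kept to a bare minimum.

```python
CRLF = "\r\n"


def format_define_response(word, definitions, db="wikt", db_name="Wiktionary"):
    """Format the reply to a DICT DEFINE command as a single string.

    `db` and `db_name` are hypothetical identifiers chosen for this sketch.
    """
    if not definitions:
        return "552 no match" + CRLF
    lines = [f"150 {len(definitions)} definitions retrieved"]
    for text in definitions:
        lines.append(f'151 "{word}" {db} "{db_name}"')
        for line in text.splitlines():
            # Dot-stuffing: double a leading period so the line cannot be
            # mistaken for the end-of-text marker.
            lines.append("." + line if line.startswith(".") else line)
        lines.append(".")  # a lone period ends each definition
    lines.append("250 ok")
    return CRLF.join(lines) + CRLF


print(format_define_response("wiki", ["A collaborative website."]))
```

The hard part of the project is not this framing but turning Wiktionary's wikitext entries into clean definition text.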

Possible users:
 * Tablet readers often have dictionary lookup included.
 * Students writing papers would have access to a large corpus of words.
 * Mobile applications for Wiktionary would be less tied to MediaWiki itself.

MediaWiki Development
If you're a programmer, we have lots of things for you to do. (To do: copy some relevant ideas from http://socialcoding4good.org/organizations/wikimedia )

Article evolution playback tool idea
Minimal idea: a gadget or script that automates hitting the "newer" link when viewing the old revisions of a MediaWiki article. It needs to be possible to pause the playback. This would be educational and would be great to use at workshops or presentations to show how Wikipedia actually works.

Purposes:
 * curiosity
 * looking up an edit war
 * finding deleted content

Additional features:
 * Marking of rollbacks and reverts so they can easily be spotted.
 * The vandalism would usually only flash by while the nice versions of the page would last longer.

Difficulties:
 * How to deal with time? Equal amount of time between each diff, or proportional to edit times? The time could be handled in several ways and it would be nice if the user could select which one to use: a radio button for equal or proportional time, and an input field for the total length of the show, defaulting to 15 seconds (or perhaps even 1 second per edit made to the article).
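The two timing modes can be sketched as follows. This is one possible interpretation, assuming each revision is shown for as long as it was the current version; the function name and the choice to give the last revision the average gap are assumptions of this example.

```python
from datetime import datetime


def frame_durations(timestamps, total_seconds=15.0, proportional=False):
    """Return how long (in seconds) each revision stays on screen.

    timestamps: revision save times in ascending order.
    """
    count = len(timestamps)
    if count == 0:
        return []
    if not proportional or count == 1:
        return [total_seconds / count] * count
    # A revision is "live" until the next one is saved. The last revision has
    # no successor, so it gets the average gap (an arbitrary choice).
    gaps = [(b - a).total_seconds() for a, b in zip(timestamps, timestamps[1:])]
    gaps.append(sum(gaps) / len(gaps))
    span = sum(gaps)
    if span == 0:
        return [total_seconds / count] * count
    return [total_seconds * gap / span for gap in gaps]


revisions = [datetime(2013, 1, 1), datetime(2013, 1, 2), datetime(2013, 1, 4)]
print(frame_durations(revisions))                      # equal time per revision
print(frame_durations(revisions, proportional=True))   # longer-lived revisions stay longer
```

In proportional mode a quickly reverted edit gets only a sliver of the total time, which matches the idea that vandalism would merely flash by.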

Additional features that would be nice:
 * seeing authors' names
 * slider, like Etherpad

(Project idea suggested by Jan Ainali. No mentors currently available.)


 * The approved project m:Grants:IEG/Replay_Edits seems to cover this idea?--Qgil (talk) 22:01, 29 March 2013 (UTC)
 * Both m:Grants:IEG/Replay_Edits and the above idea have a lot of similarities; let's collaborate and make it happen --Jeph paul (talk) 08:35, 31 March 2013 (UTC)
 * Another related concept is IBM's History Flow and its descendant WikiHistoryFlow (which produces an SVG). - Amgine (talk) 03:59, 19 April 2013 (UTC)

[generic] Write useful Lua modules
We're in the process of moving towards a future where complex programming tasks usually dealt with by complex templates are handled in Lua, a friendly scripting language. To that end, it would be great to have someone who spent a lot of time writing useful scripts in Lua and testing them, either on local wikis or on MediaWiki.org.

Things you might need to know, but aren't required to: Lua, advanced wiki syntax (for translating old templates)

This project idea contributed by MarkTraceur (talk) 20:55, 14 November 2012 (UTC) (a mentor)


 * Some useful modules (from what I can think of):
 * Module for listing and searching a page's history and logs, e.g., retrieve revision information, search for all revisions by an author, get all protection logs, etc.
 * Function/module for performing an internal API call and retrieving the results.
 * All I got for now. Parent5446 (talk) 18:23, 18 March 2013 (UTC)

Improving the skinning experience
Research how to make the development of skins for MediaWiki easier. Many users complain about the lack of modern skins for MediaWiki and about having a hard time with skin development and maintenance. Sysadmins often keep old versions of MediaWiki because their skins are incompatible with newer ones, which introduces security issues and prevents them from using new features. However, little effort has been made to research the exact problem points. The project could include improving skinning documentation, organizing training sprints/sessions, talking to users to identify problems, researching skinning practices in other open source platforms, and suggesting an action plan to improve the skinning experience.

Phase out the Vector extension; merge the good parts into core
Vector has outlived its usefulness as an experiment. The good parts should be merged into core MediaWiki; either into the Vector skin, or as core features.

Add low-resolution styles for Vector
Vector is nice for large screens with a lot of space; however, it quickly degrades at smaller widths (such as approx. 800 px, which is common on tablets and smartphones and can sometimes be seen on desktops too, for example when the user has multiple browser windows open side by side) and becomes completely unusable at widths around 320 px (common on "dumb" mobile phones, a.k.a. feature phones, which are extremely popular in second- and third-world countries). While the MobileFrontend extension was created to alleviate this issue for Wikimedia wikis, it lacks many crucial features (such as page editing) and may not be appropriate for third parties.

Implementing separate (or additional) stylesheets for such resolutions using the CSS Media Queries feature, and potentially some cleanup for the existing CSS, seems like a nice project for a few weeks' work.

Extensions
Check Manual:Extensions and extension requests in Bugzilla.

[generic] Create an extension
Creating extensions to MediaWiki is a great way to make it better. It contributes something new and cool to the community, and the Wikimedia sites (including Wikipedia!) might even decide to deploy your software, if it's really neat.

If you have some great idea for a feature that MediaWiki doesn't have, an extension is almost surely the way to work on it. This is a very open-ended project idea. First get an opinion of MediaWiki developers to make sure that the idea makes sense.

If you need ideas, extension requests can be found here and here.

Things you might need to know, depending on the extension you want to write: PHP, JavaScript, jQuery, wiki syntax.

This project idea contributed by MarkTraceur (talk) 22:29, 13 November 2012 (UTC) (a mentor)

Work on backlogged bugs in Extension:UploadWizard
The UploadWizard project is an extension to MediaWiki that focuses on enabling users to more easily upload between 3 and 50 photos at a time. The project is primarily deployed on Commons, and is written mostly in JavaScript.

Things we could work on: Making the interface (even) more friendly, fixing bugs, adding integration with other media-sharing platforms (Flickr was just added, but MediaGoblin or raw URL might be useful), and much much more.

Things you might need to know (but of course aren't required to): JavaScript, jQuery

This project idea contributed by MarkTraceur (talk) 22:29, 13 November 2012 (UTC) (a mentor)

An easy way to share wiki content on social media services
Wikipedia, as well as other wikis based on MediaWiki, provides an easy way to accumulate and document knowledge, but it is difficult to share that knowledge on social media. According to https://strategy.wikimedia.org/wiki/Product_Whitepaper 84% of Wikimedia users were Facebook users as well in 2010, with the portion increasing from previous years. The situation is probably similar with other social media sites. It only makes sense to have an effective "bridge" between MediaWiki and popular social media sites. More details here.

Some previous work you can use as a base, improve, or learn from:

Extension:Widgets

Extension:WidgetsFramework - experimental extension

Extension:AddThis

Extension:Facebook - just Facebook

Extension:WikiShare - unstable version, seems like it's not worked on any more

Write an extension to support XML Sitemaps without using command line
Sitemaps are files that make it more efficient for search engine robots (like googlebot) to crawl a website (so long as the bot supports the sitemap protocol). Manual:GenerateSitemap.php describes the common way of generating XML Sitemaps. Write an extension that allows users to generate Sitemaps on a given schedule without using the command line.
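For readers unfamiliar with the format, here is a minimal sketch of what such a file contains: a `urlset` element in the sitemaps.org namespace with one `url` entry per page. It is written in Python purely for illustration (the extension itself would be PHP), and the function name and example URLs are made up.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a sitemap document as a string.

    pages: iterable of (url, last_modified) pairs, last_modified as YYYY-MM-DD.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        ET.SubElement(entry, "lastmod").text = last_modified
    return ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    ("https://wiki.example.org/wiki/Main_Page", "2013-04-30"),
    ("https://wiki.example.org/wiki/Help:Contents", "2013-03-18"),
]))
```

The scheduling part of the project, regenerating this file without cron or shell access, is where the real design work lies.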

Extension:OEmbedProvider
Finish Extension:OEmbedProvider, as proposed here. See also Bug 43436 - Implement Twitter Cards

Leap Motion integration with MediaWiki
MediaWiki has a wide user base and a lot of users today prefer touch-based interfaces. Gesture-based interfaces are friendly and the latest trend. Leap Motion provides controllers that can recognize gestures. They can be integrated with MediaWiki products like Wikisource. As an example, this would make it friendlier for users to flip through the pages of a book. Another advantage of gesture recognition would be turning multiple chapters or pages at a time, by identifying the depth of the motion of the user's finger.

It would also be helpful for flipping through images in Wikimedia Commons.

(Project idea suggested by Aarti Dwivedi).

Work on RefToolbar
The en:Wikipedia:RefToolbar/2.0 extension is incredibly useful, especially for new editors but also for experienced editors (I use it every day, and I've got a few miles under my belt!). But it suffers from bugs and problems, and there are a lot of improvements that could be made. For instance: adding additional reference types, adding fields for multiple authors, tool-tip help guidance, etc. I also suspect it will need an upgrade to match Lua conversions of common cite templates. Also, I don't think this is in wide deployment on other wikis, so translation/deployment could be a project. Looking at the talk page, there are a couple people starting to work on this but serious development isn't happening (so I'm not sure who would mentor this) but the code was recently made accessible. At any rate, it is an extension that really needs some work and where improvements would have immediate benefit for many editors.

Project idea contributed by Phoebe (talk) 23:23, 22 March 2013 (UTC) [n.b.: I can't mentor on the tech side, but can give guidance on the ins and outs of various citation formats in the real world & how cite templates are used on WP].


Wikimedia Commons / multimedia
Sébastien Santoro (Dereckson) can mentor these project ideas.

Multilingual, usable and effective captchas
This project is very ambitious and challenging. Current CAPTCHAs are mostly broken, and still they are important to guard web sites like Wikipedia from a lot of spam. Risk of failure is high, but when it succeeds, the rewards may be huge.

This project has a large research, design and user testing component. The student will research and assess ways to use different CAPTCHA options, designed for multilingualism, to identify a more effective CAPTCHA than the current implementation used by Wikimedia. The student will then implement the identified CAPTCHA method for use in MediaWiki. See related bug 32695. Some prototypes were designed a while ago.

Mentors: Siebrand Mazeland and Pau Giner

Skills: Design, JavaScript and PHP.

MediaWiki LocalisationUpdate for all
There is the LocalisationUpdate extension, but only a few people use it. It is slow and needs a special configuration with cron. If we could integrate it into core and make it fast enough that cron would not be needed, a lot of third parties could enjoy the blazingly fast localisation updates (under 36 hours) that Wikimedia projects currently have. To make it fast enough, it is likely that a separate service needs to be implemented. It could be standalone or part of some MediaWiki instance. It should be secure and allow querying only the needed data.

Mentors: Niklas Laxström

Skills: PHP, web protocols and security

Multilingual SemanticMediaWiki
Semantic MediaWiki would benefit from being multilingual-capable out of the box. We could integrate it with the Translate extension. This can be done in isolated steps. Some of the steps could be:


 * Fix the issues that prevent full localisation of semantic forms.
 * Enhance Special:CreateForm and friends to create forms that are already i18ned with placeholders and message group for Translate extension.
 * Make it possible to define translations for properties and create a message group for the Translate extension.
 * There are a lot of places where properties are displayed: many special pages, queries, property pages. Some thinking is required to find a sensible way to handle translations in all these places.

Skills: PHP and web frontend. Experience with SemanticMediaWiki and SemanticForms is a plus.

Mentors: Niklas Laxström (with yet unknown co-mentor from SMW).

Improve Extension:WebFonts or Extension:UniversalLanguageSelector for Chinese (or CJK) wikis
Chinese uses a very large number of characters, and many of them are rarely used, so fonts covering them are often not installed on readers' systems. However, including all of them in one font file makes it huge, so we may want to tailor the font file for every page based on the characters used on that page.
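The first step of per-page tailoring, finding out which ideographs a page actually needs, can be sketched like this. The ranges checked are the main CJK Unified Ideographs block plus Extensions A and B; other blocks are left out for brevity, and the function names are hypothetical.

```python
def is_cjk_ideograph(char):
    """True for characters in the main CJK block or Extensions A and B."""
    code = ord(char)
    return (
        0x4E00 <= code <= 0x9FFF        # CJK Unified Ideographs
        or 0x3400 <= code <= 0x4DBF     # Extension A
        or 0x20000 <= code <= 0x2A6DF   # Extension B
    )


def glyphs_needed(page_text):
    """Return the sorted, de-duplicated ideographs that a page uses."""
    return sorted({ch for ch in page_text if is_cjk_ideograph(ch)})


needed = glyphs_needed("维基百科 is a wiki. 维基!")
print(len(needed), "glyphs needed:", "".join(needed))
```

The resulting character list would then be handed to whatever tool builds the reduced font file, which is outside the scope of this sketch.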

As of writing, there isn't any "good enough" free font that includes all the Chinese characters in Unicode. Since the "wiki" concept itself encourages collaborative content creation, it would be nice to invite users to create a glyph when the system sees a character without existing data (remember that we need free content). en:WenQuanYi and glyphwiki.org already have some online glyph creators which can be useful for us.

Maybe we can donate glyphs created by wiki users to other projects, but we have to make sure our data meet their quality standards...

Skills: PHP, Web frontend, Font creation and management. Some knowledge of CJK characters can be a plus.

Contributed by: User:Liangent

Merge proofread text back into DjVu files
Wikisource, the free library, has an enormous collection of DjVu files and proofread texts based on those scans. However, while the DjVu files contain a text layer, this text is the original computer-generated (OCR) text and not the volunteer-proofread text. There is some previous work on merging the proofread text as a blob into pages, and also on finding similar words to be used as anchors for text re-mapping. The idea is to create an export tool that will get word positions and confidence levels using Tesseract and then re-map the text layer back into the DjVu file. If possible, word coordinates should be kept.
 * Project proposed by Micru. I have found an external mentor who could give a hand with Tesseract; now I'm looking for a mentor who would provide assistance with MediaWiki.

Sysadmin
You're amazing if you want to help run our huge infrastructure. We have some ideas.

Debianize, puppetize, and deploy Etherpad Lite
Etherpad Lite is a complete overhaul of the old Etherpad system of yore. While great, and free software, Etherpad "Classic" is about 10 times as heavy as Etherpad Lite. We would really love to use the new version as our primary way of collaborating in real-time, but there are a bunch of things that need to be done first. We need to make sure a Debian package is available, so we can run it on our servers. We also need to make sure that we can do proper load balancing on it, which can be complicated with Etherpad Lite. Then, we need to write a Puppet manifest and actually do some deploys of it, to make sure everything goes all right.

Things you might need to know: Puppet, Debian packaging, command line.

This project was suggested by MarkTraceur (talk) 01:42, 14 November 2012 (UTC) (a mentor)

Implementing volunteer testing tracking framework
Wikimedia frequently deploys changes to software. It is always useful to test features as early and as thoroughly as possible before deployment. Currently Wikimedia doesn't have a proper process to communicate with volunteer testers and invite them to test features. Sometimes the wikitech-ambassadors list is used, sometimes new features run in beta and volunteers are invited to write up their experiences on a talk page somewhere, but very frequently features are not announced at all. The situation is complicated by the fact that the different Wikimedia sites work in almost 300 languages with different fonts, different string lengths, different templates, different extensions, different CSS etc.

One way to solve this is to develop some tools and procedures to communicate with prospective volunteer testers and to collect feedback from them, both positive and negative. It can be a simple form that says: feature x, languages XX, OK/FAIL. See an example from Fedora here: QA-L10N:nautilus test day. In Fedora, the technical side of things is actually just a MediaWiki table. We could just use that, or we could do something even better: maybe a MediaWiki extension, or maybe even some non-MediaWiki-based technology.
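As a rough idea of how little is needed for the "just a MediaWiki table" variant, the sketch below turns a set of per-language results into wikitext table markup. The column names, the function name and the sample feature are invented for the example.

```python
def results_table(feature, results):
    """Render test results as a wikitext table.

    results: mapping of language code to True (OK) or False (FAIL).
    """
    rows = ['{| class="wikitable"', "! Feature !! Language !! Result"]
    for language in sorted(results):
        status = "OK" if results[language] else "FAIL"
        rows.append("|-")
        rows.append(f"| {feature} || {language} || {status}")
    rows.append("|}")
    return "\n".join(rows)


print(results_table("Example feature", {"he": True, "ml": False, "fi": True}))
```

A dedicated extension could collect the same three fields through a form and generate or update such a table automatically.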

In any case, an easy-to-understand workflow would be very important, even if the technical tools are good, and writing these tools and procedures would be a very useful contribution to the MediaWiki developers' and users' community. --Amir E. Aharoni (talk) 19:36, 14 November 2012 (UTC)

System documentation integrated in source code
It would be really nice if inline comments, README files, and special documentation files could exist in the source code but be exported into a formatted, navigable system (maybe wiki pages or maybe something else). It could be something like doxygen, except better and oriented to admins rather than developers. Of course it should integrate with mediawiki.org and http://svn.wikimedia.org/doc.

The idea would be that one could:
 * Keep documentation close to the code and thus far more up to date
 * Even enforce documentation updates with new commits in some cases
 * Reduce the tedium of making documentation by using minimal markup to specify tables, lists, hierarchy, and so on, and let a tool deal with generating the HTML (or wikitext). This could allow for a more consistent appearance of the documentation.
 * When things are removed from the code (along with the docs in the repo), if mw.org pages are used, they can be tagged with a warning box and placed in a maintenance category.

Proposed by Aaron Schulz.