Outreach programs/Possible projects

We use this list of projects as the master list for mentorship programs such as Google Summer of Code and Outreach Program for Women. The projects listed are suitable for students and first-time contributors, but they require a good amount of work. They might also be good candidates for Individual Engagement Grants.


 * Featured project ideas usually have mentors ready for you to jump in.
 * Raw projects are interesting ideas that have been proposed but might lack definition, consensus or mentors, and therefore we can't feature them.

If you are looking for smaller tasks, check the Annoying little bugs. For a more general introduction, check How to contribute.



Be part of something big
We believe that knowledge should be free for every human being. We prioritize efforts that empower disadvantaged and underrepresented communities, and that help overcome barriers to participation. We believe in mass collaboration, diversity and consensus building to achieve our goals.

Wikipedia has become the fifth most-visited site in the world, used by more than 400 million people every month in more than 270 languages. We have other content projects including Wikimedia Commons, Wikidata and the most recent one, Wikivoyage. We also maintain the MediaWiki engine and a wide collection of open source software projects around it.

But there is much more we can do: stabilize infrastructure, increase participation, improve quality, increase reach, encourage innovation.

You can help achieve these goals in many ways. Below are some selected ideas.

Where to start
Maybe at this point your proposal is just a vague idea and you want to get some feedback before investing much more time planning it? We know this feeling very well! Just send an email to wikitech-l (or qgil@wikimedia.org if you prefer) sharing what you have in mind. One short paragraph can be enough for us to get back to you and help you work in the right direction.

Any potential contributor new to our community is encouraged to follow the Landing instructions. Use your user page to introduce yourself and draft your project (use the template). The GSOC student guide is a good resource for anybody willing to write a good project proposal. And then there is a list of DOs and DON'Ts full of practical wisdom.

To set up your MediaWiki developer environment, we recommend that you start by installing a local instance using mediawiki-vagrant. You can also have a fresh MediaWiki to test on a remote server: just get developer access and request your own instance at Wikitech.

If you have general questions, you can start by asking at the Discussion page. The IRC channel is also a good place to find people and answers. We do our best to connect project proposals with Bugzilla reports and/or wiki pages. Other contributors may watch/subscribe to those pages and contribute ideas to them. If you can't find answers to your questions, ask first on those pages. If this doesn't work, then go ahead and post your question to the wikitech-l mailing list.

Featured project ideas
Below you can find a list of ideas that already have gone through a reality check and have mentors confirmed. You can find more suggestions in our list of Raw projects.

But first, let us talk about...

Your project
That's right! If you have a project in mind, we want to hear about it. We can help you assess its feasibility, and we will do our best to find a mentor for it.

Here are some guidelines for project ideas:


 * Opportunity: YES to projects responding to generic or specific needs. YES to provocative ideas. NO to trivial variations of existing features.
 * Community: YES to projects encouraging community involvement and maintenance. NO to projects done in a closet that won't survive without you.
 * Deployment: YES to projects that you can deploy. YES to projects where you are in sync with the maintainers. NO to projects depending on unconvinced maintainers.
 * MediaWiki != Wikipedia: YES to generic MediaWiki projects. YES to projects already backed by a Wikimedia community. NO to projects requiring Wikipedia to be convinced.
 * Free content: YES to using, remixing and contributing to Wikimedia content. YES to any content with a free license. NO to proprietary content.
 * Free API: YES to the MediaWiki API. YES to any APIs powered by free software. NO to proprietary APIs.



Parsoid
The Parsoid project is developing a wiki runtime which can translate back and forth between MediaWiki's wikitext syntax and an equivalent HTML / RDFa document model with better support for automated processing and visual editing. It powers the VisualEditor project, Flow and semantic HTML exports.

Cassandra backend for distributed round-trip test server
Our distributed round-trip test setup thoroughly tests Parsoid by converting 160000 Wikipedia articles from wikitext to HTML and back. Currently all result data is stored in MySQL, which does not deal very well with the large amount of data we throw at it. This project will address this by building a Cassandra backend for the round-trip test server. This will involve working with node.js and Cassandra. A candidate will ideally have at least some node.js experience, and is interested in distributed systems and storage.

Mentors: Gabriel Wicke, Marc Ordinas i Llopis

Improve round-trip test server web UI
Our distributed round-trip test setup has a very simple UI, the product of a sedimentary accumulation of features over time. Moreover, the HTML is produced directly in the code, using simple string concatenation. This project would improve that on two fronts:
 * Separate the code from the UI using either a templating system or some other form of structured HTML generation.
 * Improve the UI, making it more visually appealing and easier to understand at a glance.

The project involves working on node.js. A candidate will ideally have some notions of web design, data visualization and JavaScript programming.
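
The first goal above, separating markup from code, can be sketched as follows. This is only an illustration written in Python with the standard library's string.Template; the actual test server runs on node.js, and the row fields (title, errors) and the function name are hypothetical.

```python
from html import escape
from string import Template

# Markup lives in templates, not in the application logic.
ROW_TEMPLATE = Template("<tr><td>$title</td><td>$errors</td></tr>")
PAGE_TEMPLATE = Template("<table>\n$rows\n</table>")


def render_results(results):
    """Render a list of {'title': str, 'errors': int} dicts as an HTML table."""
    rows = "\n".join(
        # Escape user-controlled text so page titles cannot inject markup.
        ROW_TEMPLATE.substitute(title=escape(r["title"]), errors=int(r["errors"]))
        for r in results
    )
    return PAGE_TEMPLATE.substitute(rows=rows)
```

With this split, a designer can change the table layout by editing the two templates without touching the code that collects the results.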

Mentors: Marc Ordinas i Llopis, Subramanya Sastry

Parser migration tool
Periodically, we come across some bit of wikitext markup we'd like to deprecate. See Parsoid/limitations, Parsoid/Broken wikitext tar pit, and (historically) MNPP for examples. We'd like to have a real slick tool to enhance communication with WP editors about these issues:
 * It would display a list of wiki titles (filtered by wikipedia project) which contain deprecated wikitext. Each title would link to a page which would briefly describe the problem(s), general advice on how the wikitext should be rewritten, and (perhaps) some previously-corrected pages for editors to look at.
 * Ideally this would be integrated with a wiki workflow and/or contain "revision tested" information so that editors can 'claim' pages from the list to fix and don't step on each other's work. Fixed/revised pages would be removed from the list until their new contents could be rechecked.
 * It should be as easy as possible for Parsoid developers to add new "bad" pattern tests to the tool. These would get added to the testing, with appropriate documentation of the problem, so that editors don't have to learn about a new tool/site for every broken pattern.
 * Some of these broken bits of wikitext might be able to be corrected by bot. The tool could still create a tasklist for the bot and collect and display the bots' fixes for editors to review.
 * The backend which looks for broken wikitext could be based on the existing round-trip test server. Instead of repeatedly collecting statistics on a subset of pages, however, it would work its way through the entire wikipedia project looking for broken wikitext (and preventing regressions).
 * Some cleverness might be helpful to properly attribute bad wikitext to a template rather than the page containing the template. This is probably optional; editors can figure out what's going on if they need to.
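
One possible shape for the "easy to add new bad patterns" requirement is a small registry in which each entry carries its own documentation. The sketch below is in Python for brevity (the real backend would likely be node.js), and both example patterns are hypothetical placeholders rather than actual Parsoid deprecations.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DeprecatedPattern:
    name: str           # short identifier shown in the task list
    regex: re.Pattern   # what to look for in the wikitext
    advice: str         # how editors should rewrite it


# Hypothetical example entries; real rules would come from Parsoid developers.
PATTERNS = [
    DeprecatedPattern(
        "unclosed-center",
        re.compile(r"<center>(?:(?!</center>).)*\Z", re.DOTALL),
        "Close the <center> tag or replace it with CSS.",
    ),
    DeprecatedPattern(
        "font-tag",
        re.compile(r"<font\b", re.IGNORECASE),
        "Replace <font> with a styled <span>.",
    ),
]


def scan_page(title, wikitext):
    """Return (title, pattern name, advice) for each pattern found in a page."""
    return [(title, p.name, p.advice) for p in PATTERNS if p.regex.search(wikitext)]
```

Adding a new check then means appending one entry, and the advice text travels with the pattern to whatever page the editors see.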

This project involves working on node.js, and probably MediaWiki bots and/or extensions as well. A candidate will ideally have some node.js experience and some notions of web and UX design. This task could be broken into parts if a candidate wants to work only on the front-end or back-end portions of the tool.

Mentors: C. Scott Ananian, Subramanya Sastry

Clean up tracing/debugging/logging inside Parsoid
Over time, Parsoid has accumulated a whole bunch of different tracing/debugging abilities and features, based on what different developers wanted during debugging or found worth their time to add. The debugging tips page provides some information about what we currently have in Parsoid. We are proposing a project to do the following:
 * Clean up outdated / useless tracing functionality and clean out code clutter
 * Improve usability of existing tracing features (by ensuring there is sufficient information in the output, making it possible to disambiguate between trace output from different instances of the same class, etc.)
 * Add trace output for transformations that are currently missing it (Ex: the transformer that converts wikitext-quote tokens to HTML <i> and <b> tags)
 * Maybe clean up the setup/configuration of tracing / debugging actions across the codebase
 * Make trace output more readable (Ex: --trace wts output is a little hard to read)

This task is not quite glamorous, but it is essential: it will make it much easier for new developers to approach our code and work on it with better confidence that they can debug problems on their own. There are several open source event monitoring tools which might be integrated for a bit of extra sexiness/reward, leveraging the trace infrastructure to monitor performance of the production code.
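
To illustrate the point about telling apart trace output from different instances of the same class, here is a minimal sketch using Python's standard logging module. Parsoid itself is node.js code, so this only shows the idea; the QuoteTransformer class, the logger name and the message format are hypothetical.

```python
import itertools
import logging


class TraceAdapter(logging.LoggerAdapter):
    """Prefix every message with the class name and a per-instance id."""

    def process(self, msg, kwargs):
        return f"[{self.extra['component']}#{self.extra['instance']}] {msg}", kwargs


class QuoteTransformer:
    """Hypothetical stand-in for a transformer that should emit trace output."""

    _ids = itertools.count(1)

    def __init__(self):
        self.trace = TraceAdapter(
            logging.getLogger("trace.quotes"),
            {"component": type(self).__name__, "instance": next(self._ids)},
        )

    def transform(self, token):
        # Wikitext '' maps to <i> and ''' maps to <b>; other tokens pass through.
        result = {"''": "<i>", "'''": "<b>"}.get(token, token)
        self.trace.debug("%r -> %r", token, result)
        return result
```

Because tracing goes through one named logger per feature, enabling or silencing a trace becomes a configuration change instead of a code edit.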

Mentors: Subramanya Sastry, Arlo Breault

New media types supported in Commons
Wikimedia Commons is a database of millions of freely usable media files to which anyone can contribute. The pictures, audio and video files you find in Wikipedia articles are hosted in Commons. Several free media types are already supported, but there are more requested by the community, such as X3D for representing 3D computer graphics or KML/KMZ for geographic annotation and visualization. Considerations need to be taken for each format, like security risks or fallback procedures for browsers not supporting these file types.

Skills: PHP at least. Good knowledge of the file type chosen will be more than helpful.

Mentors: Bryan Davis and ?.

Allowing 3rd party wiki editors to run more CSS features
The 3rd party CSS extension allows editors to style wiki pages just by editing them with CSS properties. It could be more powerful if we find a good balance between features and security. Currently this extension relies on basic blacklisting functionality in MediaWiki core to prevent cross-site scripting. It would be great if a proper CSS parser was integrated and a set of whitelists implemented.

Additionally, the current implementation uses data URIs and falls back to JavaScript when the browser doesn't support them. It would be a great improvement if the MediaWikiPerformAction (or similar) hook was used to serve the CSS content instead. This would allow the CSS to be more cleanly cached and reduce or eliminate the need for JavaScript and special CSS escaping.
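
As a rough illustration of the whitelist idea (in Python for brevity; the extension itself is PHP), the sketch below keeps only declarations whose property is on an allowed list and whose value consists of a conservative set of characters. The allowed properties and the value pattern are assumptions chosen for the example, and splitting on ";" is not a substitute for the proper CSS parser the project calls for.

```python
import re

# Hypothetical whitelist; a real one would be much longer and carefully reviewed.
ALLOWED_PROPERTIES = {"color", "background-color", "font-weight", "text-align"}
# Values may contain only word characters, spaces, '#', '%', '.', ',' and '-'.
# This rejects parentheses, colons and backslashes, and so url(...) and escapes.
SAFE_VALUE = re.compile(r"[#\w\s%.,-]+")


def filter_declarations(css_body):
    """Keep only whitelisted 'property: value' pairs from a declaration block."""
    kept = []
    for declaration in css_body.split(";"):
        prop, sep, value = declaration.partition(":")
        prop, value = prop.strip().lower(), value.strip()
        if not sep or prop not in ALLOWED_PROPERTIES:
            continue
        if not SAFE_VALUE.fullmatch(value):
            continue
        kept.append(f"{prop}: {value}")
    return "; ".join(kept)
```

The design choice is to accept only what is known to be safe, instead of trying to enumerate everything dangerous as a blacklist does.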

Skills: PHP, CSS, JavaScript, web application security.

Mentors: Rusty Burchfield and ?.

UploadWizard: OSM map embedding
Wiki Loves Monuments experience tells us that having a map within the Upload Wizard would simplify the flow of the user greatly. Beyond the contest the map could be used for:
 * making lists of "requested pictures" in the area to prompt the user with after upload: "Hey, you're nearby, would you like to go and take a photo of that?" (final step)
 * helping to categorize images: "Your picture appears to have no metadata. Tell us where you took it by clicking on this map". The map could also show things which already have a picture: "Is it one of these?" (optional step)
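
The "you're nearby" prompt boils down to a distance check between the upload location and a list of requested subjects. A minimal sketch in Python, assuming coordinates in decimal degrees and a hypothetical list of (name, latitude, longitude) requests:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points using the haversine formula."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    h = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def nearby_requests(upload_lat, upload_lon, requests, radius_km=2.0):
    """Return names of requested subjects within radius_km of the upload location."""
    return [
        name
        for name, lat, lon in requests
        if distance_km(upload_lat, upload_lon, lat, lon) <= radius_km
    ]
```

The 2 km default radius is an arbitrary value for the example; a real deployment would probably tune it per contest or per region.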

Skills: PHP, JavaScript

Mentors: Gergő Tisza and ? (proposed by CristianCantoro).

Flow conversation animations
Flow brings a modern discussion and collaboration system to all Wikimedia projects. Subtle animations can greatly improve the user experience of navigating, collapsing, and participating in large conversations. This project involves working with the Design team and Flow engineers to add animations to the extension code while ensuring browser compatibility and graceful fallback.

Examples of where animations will be applied: button states, other button transitions, call-out entrance, page transitions, search bar expansion, etc.

Skills: CSS and JS/jQuery animations, browser testing

Mentors: and

Flow Right-To-Left language support
Currently, Flow is designed for left-to-right layouts only. However, MediaWiki includes substantial generic support for layouts which can be displayed in either left-to-right or right-to-left orientations. This project involves working with the Design team and Flow engineers to ensure that readers and contributors who use languages written with a right-to-left orthography can successfully use Flow in their own language and in a layout that they are comfortable with.

Skills: General front-end CSS, experience with RTL languages a big plus.

Mentors: S Page and Werdna

Complete the MediaWiki development course at Codecademy
We started writing a MediaWiki development course at Codecademy (http://www.codecademy.com/courses/web-intermediate-en-BLea4/0/4) as a pet project. We need a tech writer willing to complete it and, if time permits, expand it. The code examples would be synced with the ones used in mediawiki.org, either copying existing examples or writing new ones. In general, the content of the course would need to be available in Codecademy and mediawiki.org.

Skills: PHP, JavaScript, and technical English.

Mentors: Yurik Astrakhan and ?.  

mediawiki.org homepage redesign
The mediawiki.org Main page needs a redesign to better reflect our project and the activities of our community. There have been some proposals to improve it, but not much progress has been made. This project requires analyzing the current problems, proposing a solution and implementing it, with the community participating in all of these steps.

Skills: A good basis of UX and web design is required. HTML/CSS/JS, plus an ability to learn about Wikitext and templates in MediaWiki. You will also need good online communication and collaboration skills.

Mentors: (design process) and  (community process). 

Raw projects

VisualEditor plugins
VisualEditor is a rich visual editor for all users of MediaWiki so they don't have to know wikitext or HTML to contribute well formatted content. It is our top priority and you can already test it on the English Wikipedia. While we focus on the core functionality, you could write a plugin to extend it, such as to insert or modify Wikidata content. There are also many possibilities to increase the types of content supported, including sheet music, poems, timelines…

Skills: HTML / JavaScript / jQuery development is required. A good grasp of UX / Web design will make a difference.

Mentors: James Forrester, Roan Kattouw, Trevor Parscal.

Wikidata features
Wikidata is a free knowledge base that can be read and edited by humans and machines alike. If you understand the difference between plain text and data, you will understand that this project is Wikipedia's game-changer. The conversion from text to Wikidata content fields has started in Wikipedia and sister projects and continues diving deeper, but there is still a lot to do!

The Wikidata team welcomes your suggestions and provides you with some ideas.

Mentors: Wikidata team available. Lydia Pintscher is provisionally acting as proxy.

3rd party client
Currently the Wikidata client is only set up to directly serve data to the Wikimedia projects. The goal of this project is to also allow 3rd party clients to consume Wikidata data in the same way. For example, it is missing propagation of changes to clients out of the Wikimedia cluster, so they would show up in the watchlist and recent changes of the 3rd party MediaWiki sites.

Semantic MediaWiki features
Semantic MediaWiki is a lot more than a MediaWiki extension: it is also a full-fledged framework, in conjunction with many spinoff extensions and it has its own user and developer community. Semantic MediaWiki can turn a wiki into a powerful and flexible collaborative database. All data created within SMW can easily be published via the Semantic Web, allowing other systems to use this data seamlessly.

There are more than 500 SMW-based sites, including wiki.creativecommons.org, docs.webplatform.org, wiki.mozilla.org, wiki.laptop.org and wikitech.wikimedia.org.

Flow Edit Filter integration
Wikimedia projects currently use a system that allows editors to define filters which will be applied to all actions taking place on the site, in order to mitigate spam and vandalism. Flow, our discussion system in development, currently does not allow such filters to process actions that affect Flow's content. This will be a prerequisite for wider deployments on Wikimedia sites. This project involves determining the types of Flow actions that will need to be targeted by edit filters, and implementing the required integration.

Skills: PHP

Mentors: S Page and Werdna

Make Wiktionary definitions available via the dict protocol
The dict protocol (RFC 2229) is a widely used protocol for looking up definitions over the Internet. We'd like to make Wiktionary definitions available for users. Doing that using the dict protocol would help drive the use and usefulness of Wiktionary, as well.

Possible users:
 * Tablet readers often have dictionary lookup included.
 * Students writing papers would have access to a large corpus of words.
 * Mobile applications for Wiktionary would be less tied to MediaWiki itself.
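
For orientation, the sketch below formats a server reply to a DEFINE command using the status codes defined in RFC 2229 (150, 151, 250, 552). It is written in Python and covers only response formatting, not networking or the lookup in Wiktionary itself; the database name "wikt-en" and its description are hypothetical.

```python
def format_define_response(word, database, description, definitions):
    """Build the reply to `DEFINE database word` as CRLF-terminated lines."""
    if not definitions:
        return "552 no match\r\n"
    lines = [f"150 {len(definitions)} definitions retrieved"]
    for text in definitions:
        lines.append(f'151 "{word}" {database} "{description}"')
        for line in text.splitlines():
            # A line starting with "." would look like the terminator, so double it.
            lines.append("." + line if line.startswith(".") else line)
        # A lone "." ends the text of each definition.
        lines.append(".")
    lines.append("250 ok")
    return "".join(line + "\r\n" for line in lines)
```

A call such as format_define_response("wiki", "wikt-en", "English Wiktionary", [...]) would produce the text that a DICT client expects after sending DEFINE wikt-en wiki.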

MediaWiki development
If you're a programmer, we have lots of things for you to do. (To do: copy some relevant ideas from http://socialcoding4good.org/organizations/wikimedia )

Templates for new MediaWiki installs
Anyone setting up a MediaWiki wiki, wanting to use templates, has to export/import them from another wiki (e.g. Wikipedia) or re-write and maintain some similar ones. This can create a confusing mess. It would be very useful to have a simpler way to install an up-to-date package of standard templates.

Some of the problems encountered with setting up templates are discussed at Help talk:Templates

Rob Kam (talk) 06:30, 7 October 2013 (UTC)

Improving the skinning experience
Research how to make the development of skins for MediaWiki easier. Many users complain about the lack of modern skins for MediaWiki and about having a hard time with skin development and maintenance. Sysadmins often keep old versions of MediaWiki because their skins are incompatible with newer ones, which introduces security issues and prevents them from using new features. However, little effort has been made to research the exact problem points. The project could include improving skinning documentation, organizing training sprints/sessions, talking to users to identify problems, researching skinning practices in other open source platforms and suggesting an action plan to improve the skinning experience.

Add low-resolution styles for Vector
Vector is nice for large screens with a lot of space; however, it quickly degrades at smaller resolutions (such as approx. 800 px width, which is common on tablets and smartphones, and can sometimes be seen on desktops too, for example if the user has multiple browser windows open side by side) and becomes completely unusable at resolutions around 320 px (common in "dumb" mobile phones, a.k.a. feature phones, which are extremely popular in second- and third-world countries). While the MobileFrontend extension has been created to alleviate this issue for Wikimedia wikis, it lacks many crucial features (such as page editing) and may not be appropriate for third parties.

Implementing separate (or additional) stylesheets for such resolutions using the CSS Media Queries feature, and potentially some cleanup for the existing CSS, seems like a nice project for a few weeks' work.

Extensions
Check Manual:Extensions and extension requests in Bugzilla.

An easy way to share wiki content on social media services
Wikipedia, as well as other wikis based on MediaWiki, provides an easy way to accumulate and document knowledge, but it is difficult to share it on social media. According to https://strategy.wikimedia.org/wiki/Product_Whitepaper, 84% of Wikimedia users were Facebook users as well in 2010, with the portion increasing from previous years. The situation is probably similar with other social media sites. It only makes sense to have an effective "bridge" between MediaWiki and popular social media sites. More details here.

Some previous work you can use as a base, improve, or learn from:

Extension:Widgets

Extension:WidgetsFramework - experimental extension

Extension:AddThis

Extension:Facebook - just Facebook

Extension:WikiShare - unstable version, seems like it's not worked on any more

Extension:OEmbedProvider
Finish Extension:OEmbedProvider, as proposed here. See also Bug 43436 - Implement Twitter Cards

Leap Motion integration with MediaWiki
MediaWiki has a wide user base, and a lot of users today prefer touch-based interfaces. Gesture-based interfaces are friendly and the latest trend. Leap Motion provides controllers that can recognize gestures. It can be integrated with MediaWiki products like Wikisource. As an example, this would make it friendlier for users to flip through pages in a book. Another advantage of using gesture recognition would be the ability to turn multiple chapters or pages at a time by identifying the depth of the motion of the user's finger.

It would also be helpful for flipping through images in Wikimedia Commons.

(Project idea suggested by Aarti Dwivedi).

Work on RefToolbar
The en:Wikipedia:RefToolbar/2.0 extension is incredibly useful, especially for new editors but also for experienced editors (I use it every day, and I've got a few miles under my belt!). But it suffers from bugs and problems, and there are a lot of improvements that could be made. For instance: adding additional reference types, adding fields for multiple authors, tool-tip help guidance, etc. I also suspect it will need an upgrade to match Lua conversions of common cite templates. Also, I don't think this is in wide deployment on other wikis, so translation/deployment could be a project. Looking at the talk page, there are a couple people starting to work on this but serious development isn't happening (so I'm not sure who would mentor this) but the code was recently made accessible. At any rate, it is an extension that really needs some work and where improvements would have immediate benefit for many editors.

Project idea contributed by Phoebe (talk) 23:23, 22 March 2013 (UTC) [n.b.: I can't mentor on the tech side, but can give guidance on the ins and outs of various citation formats in the real world & how cite templates are used on WP].


Squash bugs and add features to Extension:Education_Program
The Education Program extension lets users organize classroom editing assignments, and can also be used to organize and track participation in outreach events such as edit-a-thons. It's in use on English Wikipedia and a growing number of other wikis. There are lots of bugs both small and large, as well as some interesting features that could be added.
 * bugs
 * features wishlist and ideas

Sage Ross (not a developer, but familiar with extension code and the use of it, as well as the basics of MediaWiki development process) can provide some mentorship.

Global, better URL to citation conversion functionality
Suppose, in Wikipedia, all that needed to be done to generate a perfect citation was to provide a URL? That would be a tremendous step toward getting a much higher percentage of text in Wikipedia articles to be supported by inline citations.

There are already expanders (for the English Wikipedia, at least) that will convert an ISBN, DOI, or PMID, supplied by an editor, into a full, correct citation (footnote). These are in the process of being incorporated into the reference dialog of the VisualEditor extension, making it almost trivial (two clicks, paste, two clicks) to insert a reference.

For web pages, however, the existing functionality seems to be limited to a Firefox add-on. Its limits, besides the obvious requirement to use that browser (and to install the add-on), include an inability to extract the author and date from even the most standard pages (e.g., New York Times), and the lack of integration with MediaWiki.

For a similar approach, using a different plug-in/program, see this Wikipedia page about Zotero.

A full URL-to-citation engine would use the existing Cite4Wiki (Firefox add-on) code, perhaps, plus (unless these exist elsewhere) source-specific parameter specifications. For example, the NYT uses "" for its author information; that format would be known by the engine (via a specifications database). Each Wikipedia community would be responsible for coding these (except for a small starter set, as examples), in the way that communities are responsible for TemplateData for the new VisualEditor extension.
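
The specification-database idea can be sketched as a per-site mapping from citation fields to the place in the page where each value lives. The Python example below is a minimal illustration that assumes the values sit in meta tags; the host name, the meta names ("byline", "pubdate") and the default mapping are all hypothetical, and it does not reuse any Cite4Wiki code.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical specification database: which <meta name="..."> holds each field.
SITE_SPECS = {
    "news.example.org": {"author": "byline", "date": "pubdate"},
}
DEFAULT_SPEC = {"author": "author", "date": "date"}


class MetaCollector(HTMLParser):
    """Collect <meta name=... content=...> pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta, self.title, self._in_title = {}, "", False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]
        self._in_title = tag == "title"

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data.strip()


def build_citation(url, html):
    """Return a {{cite web}} call filled from the page, using the site's spec."""
    spec = SITE_SPECS.get(urlparse(url).hostname, DEFAULT_SPEC)
    page = MetaCollector()
    page.feed(html)
    fields = {"url": url, "title": page.title}
    for field, meta_name in spec.items():
        if meta_name in page.meta:
            fields[field] = page.meta[meta_name]
    return "{{cite web |" + " |".join(f"{k}={v}" for k, v in fields.items()) + "}}"
```

Under this design, supporting a new source means adding one entry to the specification table, which is the part each Wikipedia community would maintain.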

(Project idea suggested by John Broughton.)

Wikimedia Commons / multimedia
Sébastien Santoro (Dereckson) can mentor these project ideas.

Support for text/syntax/markup driven or WYSIWYG editable charts, diagrams, graphs, flowcharts etc.
Resuscitate Extension:WikiTeX and fold Extension:WikiTex into it.

VisualEditor support for EasyTimeline
Also mentioned at.

Internationalization and localization
Internationalization (i18n) and localization (L10n) are part of our DNA. The Language team develops features and tools for a huge and diverse community, including 258 Wikipedia projects and 349 MediaWiki localization teams. This is not only about translating texts. Volunteer translators require very specialized tools to support different scripts, input methods, right-to-left languages, grammar...

Below you can find some ideas to help multilingualism and the sharing of all knowledge with literally everybody in their own language.

Tools for mass migration of legacy translated wiki content
The MediaWiki Translate extension has a page translation feature to make the life of translators easier. It allows structured translation of wiki pages, separating text strings from formatting or images, and also tracks changes in the source pages (usually in English). You can see it in action in this page (click the Edit view). Often, wikis have a lot of legacy content that requires tedious manual conversion to make it translatable. It would be useful to have a tool to facilitate the conversion. You would show the proof of concept in Meta-Wiki, a Wikimedia community looking forward to a project like this.

Skills: PHP, interest in usability and conducting user research.

Mentors: Niklas Laxström, Federico Leva.

Multilingual, usable and effective captchas
This project is very ambitious and challenging. Current CAPTCHAs are mostly broken, and still they are important to guard web sites like Wikipedia from a lot of spam. Risk of failure is high, but when it succeeds, the rewards may be huge.

This project has a large research, design and user testing component. The student will research and assess ways to use different CAPTCHA options, designed for multilingualism, to identify a more effective CAPTCHA than the current implementation used by Wikimedia. The student will then implement the identified CAPTCHA method for use in MediaWiki. See related bug 32695. Some prototypes were designed a while ago.

Mentors: Siebrand Mazeland and Pau Giner

Skills: Design, JavaScript and PHP.

MediaWiki LocalisationUpdate for all
The LocalisationUpdate extension exists, but only a few people use it: it is slow and needs a special configuration with cron. If we could integrate it into core and make it fast enough that cron would not be needed, a lot of third parties could enjoy the blazingly fast localisation updates (under 36 hours) that Wikimedia projects currently have. To make it fast enough, it is likely that a separate service needs to be implemented. It could be standalone or part of some MediaWiki instance. It should be secure and allow querying only the data that is needed.

Mentors: Niklas Laxström

Skills: PHP, web protocols and security

Multilingual SemanticMediaWiki
Semantic MediaWiki would benefit from being multilingual-capable out of the box. We could integrate it with the Translate extension. This can be done in isolated steps. Some of the steps could be:


 * Fix the issues that prevent full localisation of semantic forms.
 * Enhance Special:CreateForm and friends to create forms that are already i18ned, with placeholders and a message group for the Translate extension.
 * Make it possible to define translations for properties and create a message group for the Translate extension.
 * There are a lot of places where properties are displayed: many special pages, queries, property pages. Some thinking is required to find a sensible way to handle translations in all these places.

Skills: PHP and web frontend; experience with SemanticMediaWiki and SemanticForms is a plus.

Mentors: Niklas Laxström (with yet unknown co-mentor from SMW).

Improve Extension:WebFonts or Extension:UniversalLanguageSelector for Chinese (or CJK) wikis
Chinese uses a very large number of characters, and many are so rarely used that fonts covering them are often not installed on readers' systems. However, including all of them in one font file makes it huge, so we may want to tailor the font file for every page based on the characters used on that page.
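
The first step of such per-page tailoring is finding out which characters a page actually uses. A minimal Python sketch follows, restricted for brevity to three Unicode blocks (CJK Unified Ideographs and Extensions A and B); a real implementation would need to cover more ranges and then pass the result to a font subsetting tool, which is not shown here.

```python
def cjk_subset(page_text):
    """Return the sorted, de-duplicated CJK ideographs used in a page."""

    def is_cjk(ch):
        cp = ord(ch)
        return (
            0x4E00 <= cp <= 0x9FFF          # CJK Unified Ideographs
            or 0x3400 <= cp <= 0x4DBF       # CJK Unified Ideographs Extension A
            or 0x20000 <= cp <= 0x2A6DF     # CJK Unified Ideographs Extension B
        )

    return sorted({ch for ch in page_text if is_cjk(ch)})
```

Since most pages use only a small part of the full repertoire, the resulting character list is what keeps the served font small.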

As of writing, there isn't any "good" enough free font which includes all Chinese characters in Unicode. Since the "wiki" concept itself encourages collaborative content creation, it would be nice to invite users to create a glyph when the system sees a character without existing data (remember we need free content). en:WenQuanYi and glyphwiki.org already have some online glyph creators which can be useful for us.

[Update: Hanazono (with Japanese glyph) is (almost?) complete, but the size issue mentioned in the first paragraph still applies.]

Maybe we can donate glyphs created by wiki users to other projects, but we have to make sure our data meet their quality standards...

Skills: PHP, Web frontend, Font creation and management. Some knowledge of CJK characters can be a plus.

Contributed by: User:Liangent

Compact interlanguage links as a beta feature
With the current list of interlanguage links, users have to process a long list of languages looking for their languages of interest time after time. We can make the language list shorter by including only the languages which are relevant to the user. The feature provides a short list of languages to anticipate user selection. Based on previous selections, user location and browser preferences, we can surface the languages of interest for the user without the need of additional configuration. In addition, it allows users to easily select the rest of languages for which content is available with a searchable list of languages from the Universal Language Selector.
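
The selection logic described above can be sketched as merging several signals in priority order. In the Python example below, the order (previous selections first, then browser preferences, then location) and the default limit of five languages are assumptions made for illustration, not the prototype's actual behavior.

```python
def shortlist_languages(available, previous, browser, by_location, limit=5):
    """Pick up to `limit` language codes, strongest signal first.

    available:   set of codes in which the current page exists
    previous:    codes the user selected before, most recent first
    browser:     codes from the browser's language preferences
    by_location: codes commonly used at the user's location
    """
    shortlist = []
    for source in (previous, browser, by_location):
        for code in source:
            # Skip languages without content and avoid duplicates across signals.
            if code in available and code not in shortlist:
                shortlist.append(code)
    return shortlist[:limit]
```

Languages that do not make the short list would remain reachable through the searchable list from the Universal Language Selector.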


 * Links:
 * Info | Discussion
 * Prototype showing the idea.
 * Usability testing recordings. To see how users will switch languages.
 * Universal Language Selector. Extension that provides already most of the needed pieces (language selection list, cross-language search, and identification of likely languages).


 * Team: Pau Giner (design)

Merge proofread text back into DjVu files
Wikisource, the free library, has an enormous collection of DjVu files and proofread texts based on those scans. However, while the DjVu files contain a text layer, this text is the original computer-generated (OCR) text and not the volunteer-proofread text. There is some previous work about merging the proofread text as a blob into pages, and also about finding similar words to be used as anchors for text re-mapping. The idea is to create an export tool that will get word positions and confidence levels using Tesseract and then re-map the text layer back into the DjVu file. If possible, word coordinates should be kept.
 * Project proposed by Micru. I have found an external mentor who could give a hand with Tesseract; now I'm looking for a mentor who would provide assistance with MediaWiki.
 * Aubrey can be a mentor providing assistance regarding Wikisource, and some past history of this issue. Not much, but glad to help if needed.
 * Rtdwivedi is willing to be a mentor.
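The anchor-based re-mapping mentioned in the project description can be sketched as a word alignment problem. This is only an illustration under stated assumptions: the OCR words and their bounding boxes are plain tuples here (in a real tool they would come from Tesseract), the function name `remap_words` is hypothetical, and the alignment uses Python's standard `difflib` rather than any existing Wikisource code.

```python
from difflib import SequenceMatcher


def remap_words(ocr_words, proofread_words):
    """Attach OCR bounding boxes to proofread words.

    ocr_words: list of (text, bbox) pairs from the OCR layer.
    proofread_words: list of corrected words in reading order.
    Returns a list of (proofread_word, bbox_or_None).
    """
    ocr_texts = [text for text, _ in ocr_words]
    matcher = SequenceMatcher(a=ocr_texts, b=proofread_words, autojunk=False)
    result = []
    for tag, a0, a1, b0, b1 in matcher.get_opcodes():
        if tag == "equal" or (tag == "replace" and a1 - a0 == b1 - b0):
            # Identical words act as anchors; equally long replaced runs
            # are assumed to be one-to-one corrections of OCR errors.
            for i, j in zip(range(a0, a1), range(b0, b1)):
                result.append((proofread_words[j], ocr_words[i][1]))
        elif tag in ("replace", "insert"):
            # No reliable position: keep the word, leave the box unknown.
            for j in range(b0, b1):
                result.append((proofread_words[j], None))
        # "delete": OCR words the proofreaders removed are dropped.
    return result
```

Words that end up without a box could later be positioned by interpolating between neighbouring anchors, which is one way to keep word coordinates where possible.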

Mobile reporting platform
Wikinews encourages its journalists to engage in original reporting. Timely contribution is very important, and being able to report from the field can be essential for large events such as the Paralympic Games, World Championships, Olympic Games, and Commonwealth Games. A reporting toolkit that allows reporting from a tablet device would be extremely useful. Some of these tasks are already built into other projects, such as Commons. Tasks that a mobile reporting tool could support (not all are required, but they would be nice to have):
 * Recording audio files in ogg or converting audio files to an ogg format. Being able to choose whether to upload these files to a local Wikinews project or to Commons under a compatible license.
 * Transcription of audio files into text format.
 * Recording video files in ogv or converting video files to an ogv format. Being able to choose whether to upload these files to a local Wikinews project or to Commons under a compatible license.
 * Being able to upload pictures to a local Wikinews project or to Commons under a compatible license. Automatically create gallery code with the Wikinews photographer credited in compliance with local Wikinews policies for photo essays.
 * Being able to type notes that can be automatically shared on the talk page of the associated Wikinews article being developed.
 * Being able to import e-mails used for reporting verification, in PDF format, so that they can be uploaded locally to Wikinews or sent to scoop@wikinewsie.org. Have an automatic message posted to the talk page of the associated Wikinews article being developed explaining where the verifying information can be found.
 * Allowing reporters to sort audio, video and image files, typed reporter notes, and e-mailed information by reporting event for ease of use.
 * Automate part of the process for creating an article on Wikinews by pulling in the default draft template, autogenerating a category for the reporter, putting in the original reporting template, pulling in the location from the device to say where the reporting is being submitted from, then asking the reporter about the topic to pick the appropriate infobox and categories for the article.

Laura Hale and pi zero, both on the provisional board of The Wikinewsie Group, would be keen to act as mentors for how to implement this from a community perspective. Some limited technical assistance may be available. The tool may also be useful for educational outreach programs and some Commons uploading activities.

Mobile viewing application
It would be nice to have a mobile web application for Wikinews that could be downloaded from Apple's App Store or from Google Play for Android.

Laura Hale and pi zero, both on the provisional board of The Wikinewsie Group, would be keen to act as mentors for how to implement this from a community perspective. Some limited technical assistance may be available. The tool may also be useful for educational outreach programs and some Commons uploading activities.

Automated news translation
There have been several discussions about implementing machine-assisted translation tools on Wikipedia. Wikinews would benefit strongly from this too, because it would make reporting more cost-effective by increasing the total number of stories generated. Most projects have their own review process before publishing and their own local style guides. Some of the localization information is available on English Wikinews. There is no preference for the technical backend of how automated news translation is done, nor any specific requirement for the amount of human assistance needed to ensure article publication. The community is willing to discuss any potential experiments in implementation.

Laura Hale and pi zero, both on the provisional board of The Wikinewsie Group, would be keen to act as mentors for how to implement this from a community perspective. Some limited technical assistance may be available. The tool may also be useful for educational outreach programs and some Commons uploading activities.

Implementing volunteer testing tracking framework
Wikimedia frequently deploys changes to software. It is always useful to test features as early and as thoroughly as possible before deployment. Currently Wikimedia doesn't have a proper process to communicate with volunteer testers and invite them to test features. Sometimes the wikitech-ambassadors list is used, sometimes new features run in beta and volunteers are invited to write up their experiences on a talk page somewhere, but very frequently features are not announced at all. The situation is complicated by the fact that the different Wikimedia sites work in almost 300 languages with different fonts, different string lengths, different templates, different extensions, different CSS, etc.

One way to solve this is to develop some tools and procedures to communicate with prospective volunteer testers and to collect feedback from them, both positive and negative. It can be a simple form that says: feature x, languages XX, OK/FAIL. See an example from Fedora here: QA-L10N:nautilus test day. In Fedora, the technical side of things is actually just a MediaWiki table. We could just use that, or we could do something even better: maybe a MediaWiki extension, or maybe even some non-MediaWiki-based technology.
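As a rough illustration of the "feature x, languages XX, OK/FAIL" form, the sketch below tallies reports per feature and language and renders them as a MediaWiki table, similar in spirit to the Fedora approach. The class and function names are hypothetical, and the feature name in the usage note is only an example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackReport:
    """One volunteer's verdict on one feature in one language."""
    feature: str
    language: str
    passed: bool


def summarize(reports):
    """Count OK and FAIL verdicts per (feature, language) pair."""
    summary = defaultdict(lambda: {"OK": 0, "FAIL": 0})
    for report in reports:
        summary[(report.feature, report.language)]["OK" if report.passed else "FAIL"] += 1
    return dict(summary)


def to_wikitable(summary):
    """Render the summary as wikitext table markup."""
    lines = ['{| class="wikitable"', "! Feature !! Language !! OK !! FAIL"]
    for (feature, language), counts in sorted(summary.items()):
        lines.append("|-")
        lines.append(f"| {feature} || {language} || {counts['OK']} || {counts['FAIL']}")
    lines.append("|}")
    return "\n".join(lines)
```

For example, `to_wikitable(summarize([FeedbackReport("ULS", "he", True)]))` yields a one-row table that could be pasted into a test-day page.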

In any case, an easy-to-understand workflow would be very important, even if the technical tools are good, and writing these tools and procedures would be a very useful contribution to the MediaWiki developers' and users' community. --Amir E. Aharoni (talk) 19:36, 14 November 2012 (UTC)

System documentation integrated in source code
It would be really nice if inline comments, README files, and special documentation files could exist in the source code but be exported into a formatted, navigable system (maybe wiki pages or maybe something else). It could be something like doxygen, except better and oriented toward admins rather than developers. Of course it should integrate with mediawiki.org and https://doc.wikimedia.org.

The idea would be that one could:
 * Keep documentation close to the code and thus far more up to date
 * Sometimes even require documentation updates as part of new commits
 * Reduce the tedium of making documentation by using minimal markup to specify tables, lists, hierarchy, and so on, and let a tool deal with generating the HTML (or wikitext). This could allow for a more consistent appearance to documentation.
 * When things are removed from the code (along with the docs in the repo), if mw.org pages are used, they can be tagged with a warning box and placed in a maintenance category.
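A minimal sketch of the export step, assuming documentation lives in `/** ... */` comment blocks and that a leading `- ` marks a list item; both conventions, and the function names, are assumptions made here for illustration rather than part of the proposal.

```python
import re

# Matches /** ... */ blocks, including multi-line ones.
DOCBLOCK = re.compile(r"/\*\*(.*?)\*/", re.DOTALL)


def extract_docblocks(source):
    """Return the text of each doc comment with the comment decoration removed."""
    blocks = []
    for match in DOCBLOCK.finditer(source):
        lines = []
        for raw in match.group(1).splitlines():
            # Drop indentation, the decorative asterisk and one following space.
            lines.append(re.sub(r"^\s*\*?\s?", "", raw).rstrip())
        text = "\n".join(lines).strip()
        if text:
            blocks.append(text)
    return blocks


def to_wikitext(title, blocks):
    """Turn the minimal markup into wikitext: '- item' becomes '* item'."""
    parts = [f"== {title} =="]
    for block in blocks:
        for line in block.splitlines():
            parts.append("* " + line[2:] if line.startswith("- ") else line)
        parts.append("")
    return "\n".join(parts).rstrip() + "\n"
```

Because the generated page is derived entirely from the repository, a block that disappears from the code also disappears from the next export, which is the point at which a wiki page could be flagged for maintenance.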

Proposed by Aaron Schulz.

Ranking articles by Pageviews for Wikiprojects and Task Forces in Languages other than English
Currently we have an amazing tool which every month determines which pages are most viewed for a Wikiproject and then provides a sum of the pageviews for all articles within that project. An example of the output for WikiProject Medicine in English.

The problem is that this tool only exists in English and is running on toolserver rather than Wikimedia Labs. So while we know what people are looking at in English, and this helps editors determine what articles to work on, other languages do not have this ability.

Additionally, we do not know if the topics people look up in English are the same as those they look up in other languages. In the subject area of medicine this could be the basis of a great academic paper and I would be happy to share authorship with those who help to build these tools.

A couple of steps are needed to solve this problem:
 * 1) For each article within a Wikiproject in English, take the interlanguage links stored at Wikidata, and tag the corresponding article in the target language
 * 2) Figure out how to get Mr. Z's tool to work in other languages. He supposedly is working on it and I am not entirely clear if he is willing to have help. Another tool that could potentially be adapted to generate the data is already on Labs
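The two steps above can be sketched with plain dictionaries standing in for the real data sources. The function name and the input shapes are assumptions: in practice the interlanguage links would come from Wikidata and the view counts from whatever pageview data the tool uses, neither of which is queried here.

```python
def rank_by_pageviews(project_articles, sitelinks, pageviews, target_lang):
    """Rank a WikiProject's articles by views in another language.

    project_articles: English titles tagged for the WikiProject.
    sitelinks: {english_title: {language_code: local_title}}.
    pageviews: {(language_code, local_title): monthly_views}.
    Returns (ranked list of (local_title, views), total views).
    """
    ranked = []
    for en_title in project_articles:
        local_title = sitelinks.get(en_title, {}).get(target_lang)
        if local_title is None:
            continue  # no article in the target language yet
        views = pageviews.get((target_lang, local_title), 0)
        ranked.append((local_title, views))
    # Most viewed first; ties are broken alphabetically.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked, sum(views for _, views in ranked)
```

Running the same function for several target languages would also give the raw material for comparing which topics readers look up in each language.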

James Heilman (talk) 21:13, 14 September 2013 (UTC)

Beyond development
Featured projects that focus on technical activities other than software development.

Research & propose a catalog of extensions
Extensions on mediawiki.org are not very well organized and finding the right extension is often difficult. Listening to community members, you will hear requests for better management of extension pages with categorization, ratings on code quality, security, usefulness and ease of use, good visibility for good extensions, “Featured extensions”, better exposure and testing of version compatibility... This project is about doing actual research within our community and beyond to come up with a proposal that is both agreed upon and feasible: a plan that a development team can simply pick up to start the implementation.

Skills: research, negotiation, fluent English writing. Technical background and knowledge of MediaWiki features and web development will get you to the actual work sooner.

Mentors: Yury Katkov.

Annotation tool that extracts statements from books and feeds them into Wikidata
Wikidata is a free knowledge base that can be read and edited by humans and machines alike. If you understand the difference between plain text and data you will understand that this project is Wikipedia's game-changer. The conversion from text to Wikidata content fields has started in Wikipedia and sister projects and keeps going deeper, but there is still a lot to do!

Now think about this: you are at home, reading and studying for pleasure, or an assignment, or for your PhD thesis. When you study, you engage with the text, and you often annotate and take notes. What about a tool that would let you share important quotes and statements to Wikidata?

A statement in Wikidata is often a simple subject - predicate - object, plus a source. Many, many facts in the books you read can be represented in this structure. We can think of a way to share them.

The tool could be a client-side browser plugin, script or app that takes some highlighted text, offers you a GUI to fix up the statement and source, and then feeds it into Wikidata.
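To make the subject - predicate - object plus source structure concrete, here is a small sketch of what the tool might hold after the user has fixed up a highlighted passage. The class names, field names and JSON shape are hypothetical and are not the Wikidata data model or API format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Source:
    """The book the statement was taken from."""
    title: str
    author: str
    page: str = ""


@dataclass(frozen=True)
class Statement:
    """A simple subject - predicate - object claim with its source."""
    subject: str
    predicate: str
    obj: str
    source: Source
    quote: str = ""  # the highlighted text the claim was derived from


def to_payload(statement):
    """Serialize a reviewed statement, ready to hand to an upload step."""
    return json.dumps(asdict(statement), sort_keys=True)
```

Keeping the original quote next to the structured claim would let reviewers check that the statement really follows from the text.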

We could unveil a brand-new world of sharing and collaborating, directly from your reading.

Possible projects:
 * Pundit. http://www.thepund.it/ (the team is aware of Wikidata and willing to collaborate).
 * Annotator. https://github.com/okfn/annotator

Mentors: Aubrey is available for mentorship, paired with a technical expert.

Google Books > Internet Archive > Commons upload cycle
Wikisources all around the world make heavy use of Google Books (GB) digitizations for transcription and proofreading. As GB provides just the PDF, and DjVu files are obtained through the Internet Archive (IA), the usual cycle is:
 * 1) go to Google Books and look for a book
 * 2) check if the book is already in IA
 * 3) if it's not, upload it there
 * 4) get the djvu from IA
 * 5) upload it on Commons
 * 6) use it on Wikisource
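The cycle above can be sketched as a small driver in which every step is a pluggable function. All four callables are hypothetical placeholders for real integrations with Google Books, the Internet Archive and Commons; none of them corresponds to an existing API.

```python
def run_cycle(book_id, *, in_archive, upload_to_archive, fetch_djvu, upload_to_commons):
    """Walk one book through the IA and Commons steps of the cycle.

    Each keyword argument is a callable supplied by the caller:
    in_archive(book_id) -> bool, upload_to_archive(book_id),
    fetch_djvu(book_id) -> file, upload_to_commons(file) -> file title.
    Returns the Commons file title and a log of the steps taken.
    """
    log = []
    if in_archive(book_id):
        log.append("already in IA")
    else:
        upload_to_archive(book_id)
        log.append("uploaded to IA")
    djvu = fetch_djvu(book_id)
    log.append("fetched DjVu")
    file_title = upload_to_commons(djvu)
    log.append("uploaded to Commons")
    return file_title, log
```

A bot built this way could pause between steps and notify the user whenever human input is needed, for instance to polish metadata before the Commons upload.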

For point 4, we have this awesome tool: https://toolserver.org/~tpt/iaUploadBot/step1.php What we miss right now is a tool for point 3 (uploading the book to IA when it is not already there), which would serve many other users outside the Wikimedia movement too. Eventually, we could think of a bot/script that would do all the work in one go, notifying the user when their help is needed (e.g. metadata polishing, Commons categories, etc.).

Mentors: Aubrey is available for "design" mentorship, paired with a technical expert. We can maybe ask for help from an IA expert.