Outreach programs/Possible projects


We use this list of projects as a master list for mentorship programs such as Google Summer of Code and the Outreach Program for Women. The projects listed are good for students and first-time contributors, but they require a good amount of work. They might also be good candidates for Individual Engagement Grants.

  • Featured project ideas usually have mentors ready for you to jump in.
  • Raw projects are interesting ideas that have been proposed but might lack definition, consensus or mentors, and therefore we can't feature them. If you're interested in one of those, wonderful! You'll need to work a bit more to improve their fundamentals.

If you are looking for smaller tasks, check the Annoying little bugs. For a more general introduction, check How to contribute.


Be part of something big

These are the people we develop for.

We believe that knowledge should be free for every human being. We prioritize efforts that empower disadvantaged and underrepresented communities, and that help overcome barriers to participation. We believe in mass collaboration, diversity and consensus building to achieve our goals.

Wikipedia has become the fifth most-visited site in the world, used by more than 400 million people every month in more than 270 languages. We have other content projects including Wikimedia Commons, Wikidata and the most recent one, Wikivoyage. We also maintain the MediaWiki engine and a wide collection of open source software projects around it.

But there is much more we can do: stabilize infrastructure, increase participation, improve quality, increase reach, encourage innovation.

You can help advance these goals in many ways. Below are some selected ideas.

Where to start

Maybe at this point your proposal is just a vague idea and you want to get some feedback before investing much more time in planning it? We know this feeling very well! Just send an email to wikitech-l (or qgil@wikimedia.org if you prefer) sharing what you have in mind. One short paragraph can be enough for us to get back to you and help you work in the right direction.

Learn and discuss

Obligatory reading:

To set up your MediaWiki developer environment, we recommend starting with a local instance installed using mediawiki-vagrant. You can also get a fresh MediaWiki to test on a remote server: just get developer access and request your own instance at Wikitech.

If you have general questions you can start asking at the Discussion page. The #mediawiki connect IRC channel is also a good place to find people and answers. We do our best to connect project proposals with Bugzilla reports and/or wiki pages. Other contributors may watch or subscribe to those pages and contribute ideas to them. If you can't find answers to your questions, ask first in those pages. If this doesn't work, then go ahead and post your question to the wikitech-l mailing list.

Add your proposal

  • Use your user page to introduce yourself.
  • Draft your project on a separate page in the main namespace, or as a subpage of an existing project or extension your idea will integrate with. Try to pick a short, memorable and catchy title that communicates your core idea for tackling the issue/project you chose.
  • Use the template. For GSoC proposals, remember to add them to the proposals category and the table so that it's clear it's a proposal (not yet approved) and you're working on it.
  • The GSoC student guide is a good resource for anybody who wants to write a good project proposal. There is also a list of DOs and DON'Ts full of practical wisdom.

Featured project ideas

Below is a list of ideas that have already gone through a reality check and have confirmed mentors. You can find more suggestions in our list of Raw projects.

But first, let us talk about...

Your project

That's right! If you have a project in mind we want to hear about it. We can help you assess its feasibility and we will do our best to find a mentor for it.

Here are some guidelines for project ideas:

  • Opportunity: YES to projects responding to generic or specific needs. YES to provocative ideas. NO to trivial variations of existing features.
  • Community: YES to projects encouraging community involvement and maintenance. NO to projects done in a closet that won't survive without you.
  • Deployment: YES to projects that you can deploy. YES to projects where you are in sync with the maintainers. NO to projects depending on unconvinced maintainers.
  • MediaWiki != Wikipedia: YES to generic MediaWiki projects. YES to projects already backed by a Wikimedia community. NO to projects requiring Wikipedia to be convinced.
  • Free content: YES to use, remix and contribute Wikimedia content. YES to any content with free license. NO to proprietary content.
  • Free API: YES to the MediaWiki API. YES to any APIs powered with free software. NO to proprietary APIs.


Internationalization and localization

w:Internationalization (i18n) and w:localization (L10n) are part of our DNA. The Language team develops features and tools for a huge and diverse community, including 287 Wikipedia projects and 349 MediaWiki localization teams. This is not only about translating texts. Volunteer translators require very specialized tools to support different scripts, input methods, right-to-left languages, grammar...

Below are some ideas to support multilingualism and the sharing of all knowledge with everybody, in their own language.

Extensive and robust localisation file format coverage

The Translate extension supports multiple file formats. The formats have been developed on an "as needed" basis, and many formats are not yet supported or the support is incomplete. The aim of this project would be to make support for existing file formats (for example Android XML) more robust, so that it meets the following properties:

  • the code does not crash on unexpected input,
  • there is a validator for the file format,
  • the code can handle the full file format specification,
  • the code is secure (does not execute any code in the files nor have known exploits).

Example known bugs are bugzilla:31331, bugzilla:36584, bugzilla:38479, bugzilla:40712, bugzilla:31300, bugzilla:57964, bugzilla:49412.

In addition new file formats can be implemented: in particular Apache Cocoon (bug 56276) and AndroidXml string arrays have interest and patches to work on, but we'd also like TMX, for example. Adding new formats is a good chance to learn how to write parsers and generators with simple data but complicated file formats. For some formats, it might be possible to take advantage of existing PHP libraries for parsing and file generation. (More example formats other platforms support: OpenOffice.org SDF/GSI, Desktop, Joomla INI, Magento CSV, Maker Interchange Format (MIF), .plist, Qt Linguist (TS), Subtitle formats, Windows .rc, Windows resource (.resx), HTML/XHTML, Mac OS X strings, WordFast TXT, ical.)

This project paves the way for future improvements, like automatic file format detection, support for more software projects, and letting normal users add files for translation via a web interface.
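
To make the robustness requirements above concrete, here is a minimal, illustrative sketch in Python of the kind of checks a defensive parser/validator for Android strings.xml could perform. The Translate extension itself is written in PHP, so this is only an illustration of the idea, and the specific rules checked are assumptions.

# Illustrative only: Translate's real format handlers are written in PHP.
# A hardened parser would also guard against XML entity-expansion attacks
# (e.g. by using the defusedxml package instead of xml.etree).
import sys
import xml.etree.ElementTree as ET

def validate_android_strings(path):
    """Return a list of problems found in an Android strings.xml file."""
    problems = []
    try:
        tree = ET.parse(path)          # must not crash on unexpected input
    except ET.ParseError as err:
        return ["not well-formed XML: %s" % err]

    root = tree.getroot()
    if root.tag != "resources":
        problems.append("root element is <%s>, expected <resources>" % root.tag)

    seen = set()
    for elem in root:
        if elem.tag not in ("string", "plurals", "string-array"):
            problems.append("unexpected element <%s>" % elem.tag)
            continue
        name = elem.get("name")
        if not name:
            problems.append("<%s> without a name attribute" % elem.tag)
        elif name in seen:
            problems.append("duplicate key %r" % name)
        else:
            seen.add(name)
    return problems

if __name__ == "__main__":
    for issue in validate_android_strings(sys.argv[1]):
        print("WARNING:", issue)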

One stop translation search

Search overview in the design

A Special:SearchTranslations page has been created for the Translate extension to allow searching for translations. However, it has not been finished and it lacks important features: in particular, being able to search in the source language but show and edit messages in your translation language. The interface has some bugs with facet selection, and direct editing of search results is not working properly. It is not possible to search by message key unless you know the special syntax, nor to reach a message in one click. Interface designs are available for this page.

  • Skills: Backend coding with PHP, frontend coding with jQuery, Solr/ElasticSearch/Lucene
  • Mentors: Niklas Laxström, Nik Everett (for the Elasticsearch part if needed)

Collaborative spelling dictionary building tool

There are extensive spelling dictionaries for the major languages of the world: English, Italian, French and some others. They help make Wikipedia articles in these languages more readable and professional and provide an opportunity for participation in improving spelling. Many other languages, however, don’t have spelling dictionaries. One possible way to build good spelling dictionaries would be to employ crowdsourcing, and Wikipedia editors can be a good source for this, but this approach will also require a robust system in which language experts will be able to manage the submissions: accept, reject, filter and build new versions of the spelling dictionary upon them. This can be done as a MediaWiki extension integrated with VisualEditor, and possibly use Wikidata as a backend.

Interwiki/cross-wiki and management

A system for reviewing funding requests

The goal of this project is to either create or adapt a free and open source review system for funding requests, applicable to Individual Engagement Grants, conference scholarships, and similar programs. The system will allow an administrator to specify the criteria by which the grants will be judged, and the importance of those criteria relative to each other. A defined set of reviewers will then score and comment on each application. Once this scoring is complete, the system will average the scores of all of the reviewers and apply the relative importance values specified by the administrator to produce a report showing how each proposal has been assessed. The system will therefore greatly aid the grantmakers in the selection process for funding requests.

The project will commence with a research phase to find if there are any free and open source software systems already released that could either be used or adapted. Upon finishing that research, the student will either modify an existing system that was found or build one from scratch, with support from the project mentors. More background on the overall project and some initial details can be found in the IdeaLab on meta-wiki.
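
The scoring model described above is essentially a weighted average of reviewer scores per criterion. A minimal sketch of that calculation follows; the criteria names and weights are made up for illustration.

# Hypothetical data: criteria weights set by the administrator,
# and per-criterion scores given by each reviewer for one proposal.
weights = {"impact": 0.5, "feasibility": 0.3, "budget": 0.2}

reviewer_scores = [
    {"impact": 8, "feasibility": 6, "budget": 7},
    {"impact": 9, "feasibility": 5, "budget": 6},
    {"impact": 7, "feasibility": 7, "budget": 8},
]

def weighted_score(weights, reviews):
    """Average each criterion across reviewers, then apply the weights."""
    total = 0.0
    for criterion, weight in weights.items():
        avg = sum(r[criterion] for r in reviews) / len(reviews)
        total += weight * avg
    return total

print("overall score: %.2f" % weighted_score(weights, reviewer_scores))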

MediaWiki core

Allowing 3rd party wiki editors to run more CSS features

The 3rd party CSS extension allows editors to style wiki pages just by editing them with CSS properties. It could be more powerful if we find a good balance between features and security. Currently this extension relies on basic blacklisting functionality in MediaWiki core to prevent cross-site scripting. It would be great if a proper CSS parser was integrated and a set of whitelists implemented.

Additionally, the current implementation uses data URIs and falls back to JavaScript when the browser doesn't support them. It would be a great improvement if the MediaWikiPerformAction (or similar) hook was used to serve the CSS content instead. This would allow the CSS to be more cleanly cached and reduce or eliminate the need for JavaScript and special CSS escaping.

  • Skills: PHP, CSS, JavaScript, web application security.
  • Mentors: Rusty Burchfield, ?.

MediaWiki.org and codebase management

Wikimedia Identities Editor

MediaWiki Community Metrics is a Wikimedia project whose goal is to describe how the MediaWiki / Wikimedia tech community is doing.

Once the metrics website reaches a first complete version, a web application to manage community identities is needed. A community member will access the web application and authenticate using OAuth or by creating a new account. All the information about the member in MediaWiki Community Metrics will be presented, so the user can update their information, add new identities, set their location and so on.

  • Skills: Django or a similar web framework to develop the application. OAuth and other authentication technologies.
  • Mentors: Alvaro del Castillo, Daniel Izquierdo.

Multimedia, Wikisource

Book management in Wikibooks/Wikisource

Simple navigation bar

We need a stable interface for reading and editing books. We have made great progress, but the extension needs some features (especially bug 53286 and bug 52435) to be ready for installation on our wikis.

  • Skills: PHP, JS (JSON/JQuery etc), HTML, CSS and caching. Experience in UX design and MySQL is a plus.
  • Mentor: Raylton

Google Books > Internet Archive > Commons upload cycle

Wikisources all around the world make heavy use of Google Books (GB) digitizations for transcription and proofreading. As GB provides just the PDF, the usual cycle is:

  1. go to Google Books and look for a book,
  2. check if the book is already in IA,
  3. if it's not, upload it there with appropriate metadata (library),
  4. get the djvu from IA,
  5. upload it on Commons,
  6. use it on Wikisource.

What we are missing right now is a tool for steps 2–3, which would also serve many other users outside the Wikimedia movement. It could be

  • a Python script for mass uploads for experienced users (building on Alex.brollo's tests) and/or
  • some web tool for more limited uploads (preferably as a Wikisource gadget, but also bookmarklet, tool hosted on Tool Labs or elsewhere, or even browser extension).

Subsequently, more work is possible. For steps 4–5 we have the awesome IA-Upload tool, which could be further improved. Eventually, the script/tool above would be further developed to combine and automate all the steps, asking for user intervention only when needed (e.g. metadata polishing and Commons categories).

Relevant Wikisource help pages include s:Help:Internet Archive and s:Help:Internet Archive/Requested uploads.
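
For steps 2–3 (check whether the book is already on IA; if not, upload it with metadata), a rough sketch using the third-party internetarchive Python package could look like the following. The identifier, file name and metadata fields are placeholders, and a real tool would need a configured IA account and better error handling.

# Rough sketch, assuming `pip install internetarchive` and IA credentials.
from internetarchive import get_item, upload

def ensure_on_ia(identifier, pdf_path, metadata):
    """Upload a Google Books PDF to the Internet Archive unless it already exists."""
    item = get_item(identifier)
    if item.exists:
        print("%s is already on IA, skipping upload" % identifier)
        return item
    # mediatype "texts" asks IA to derive a DjVu from the uploaded PDF.
    upload(identifier, files=[pdf_path],
           metadata=dict(metadata, mediatype="texts"))
    return get_item(identifier)

ensure_on_ia(
    "examplebook1884",                      # hypothetical IA identifier
    "examplebook1884.pdf",                  # PDF downloaded from Google Books
    {"title": "Example Book", "creator": "A. Author",
     "date": "1884", "language": "eng"},
)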

  • Skills: Python, screen scraping, some JavaScript for the frontend.
  • Mentors:
    • Aubrey is available for "design" mentorship, paired with a technical expert. We can maybe ask for help from an IA expert;
    • Yann (talk) is available for help regarding uploads of Google Books and files from IA;
    • Tpt is available as technical mentor. He has no special knowledge of the Internet Archive or Google Books, but he is the creator of IA-Upload.

New media types supported in Commons

Wikimedia Commons is a database of millions of freely usable media files to which anyone can contribute. The pictures, audio and video files you find in Wikipedia articles are hosted on Commons. Several free media types are already supported, but the community has requested more, e.g. X3D for representing 3D computer graphics or KML/KMZ for geographic annotation and visualization. Considerations need to be taken for each format, like security risks or fallback procedures for browsers not supporting these file types.

  • Skills: PHP at least. Good knowledge of the file type chosen will be more than helpful.
  • Mentors: Bryan Davis, ?.

OpenHistoricalMap & Wikimaps

Wikimaps is an initiative to gather old maps in Wikimedia Commons and place them in world coordinates with the help of Wikimedia volunteers. By connecting with OpenHistoricalMap, the historical maps can be used as a reference for extracting historical geographic information. Additionally, the resulting historical geodata can be connected back to the data repository of Wikimedia through Wikidata, creating a community-maintained spatiotemporal gazetteer.

We hope to foster better visualisation of the raw data held by OpenHistoricalMap by allowing the rendering to work temporally as well as spatially. The tile rendering software will be modified to support a date range, while the OSM Rails port will be extended with a time/date slider so that the current OSM tool stack can work within a specific time and place.

  • Enhance iD and The_Rails_Port so that a JavaScript time/date slider can be added to control the time period that is of interest.
  • Enhance iD and The_Rails_Port so that meta-data hooks are added to the code that allow for custom deployments of both pieces of software. The intent is to support their use as dedicated user interfaces to certain applications (such as medieval walking path editing) while still using a generic data source.
  • Modify the Mapnik tile renderer to handle Key:start_date and Key:end_date.

Semantic MediaWiki

Semantic MediaWiki is a lot more than a MediaWiki extension: it is also a full-fledged framework, in conjunction with many spinoff extensions, and it has its own user and developer community. Semantic MediaWiki can turn a wiki into a powerful and flexible collaborative database. All data created within SMW can easily be published via the Semantic Web, allowing other systems to use this data seamlessly.

There are more than 500 SMW-based sites, including wiki.creativecommons.org, docs.webplatform.org, wiki.mozilla.org, wiki.laptop.org and wikitech.wikimedia.org.

Multilingual Semantic MediaWiki

Semantic MediaWiki would benefit from being multilingual-capable out of the box. We could integrate it with the Translate extension. This can be done in some isolated steps, but there is a need to list all the things in need of translation and define an approach and priority for each of them. Some of the steps could be:

  • Fix the issues that prevent full localisation of Semantic Forms.
  • Enhance Special:CreateForm and friends (all the Special:Create* special pages provided by Semantic Forms) to create forms that are already internationalized, with placeholders and a message group for the Translate extension.
  • Make it possible to define translations for properties and create a message group for the Translate extension, similar to what CentralNotice does (sending strings for translation to Translate message groups).
    • There are a lot of places where properties are displayed: many special pages, queries, property pages. Some thinking is required to find a sensible way to handle translations in all these places.
    • Currently, in most wikis, property names are supposed to be hidden from the user; e.g. query results are usually shown in infobox-like templates (whose labels could in theory be localised like all templates).

Translate would be fed the strings in need of translation. Localised strings/messages would be displayed based on the interface language, which in core every user can set on Special:Preferences and which ULS makes much easier to pick for everyone, including unregistered users.

For real field testing, WikiApiary could be used, or at worst translatewiki.net (quick deployments, little SMW content).

Simultaneous Modification of Multiple Pages with Semantic Forms

Right now, editing multiple pages with Semantic Forms is rather cumbersome: users have to edit every page separately, send it off, wait for the server reply and then click their way to the edit form for the next page. The aim of this project is to facilitate the simultaneous editing of the data of multiple pages displayed in a table, ideally giving a spreadsheet-like experience.

As an additional goal there should be an autoedit-like functionality for multiple pages. Using the #autoedit parser function it is currently possible to create links that, when clicked on, create or edit one page automatically in the background, with a preloaded set of values. With the new function it would be possible to modify several pages at once.

Project goals:

  • display data of multiple pages in a tabular form with each line containing the data of one page and each cell containing an input for one data item
  • provide an optimized user interface for this form that allows for rapid navigation and editing with a special focus on keyboard navigation
  • optional: for the data items use the input widgets as specified in an applicable form definition
  • when submitted store the modified data using the job queue
  • provide a parser function that allows the automatic modification of multiple pages

This project involves challenges regarding working with the MediaWiki API and user rights management to protect the wiki from unauthorized mass-modification of pages.

Updating RDFIO to use templates

RDFIO extends the RDF import and export functionality in Semantic MediaWiki by providing import of arbitrary RDF triples, and a SPARQL endpoint that allows write operations. Since most SMW data is stored via templates, RDFIO should be able to create and modify template calls based on the triples it is importing (1). At first, assumptions will need to be made about the schema of the imported data to make this work effectively (see, e.g., 2). Then we can see which of these assumptions can be relaxed, thereby increasing interoperability between SMW and the rest of the semantic web.
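
A minimal sketch of the triples-to-template-calls idea, using the rdflib Python library (RDFIO itself is a PHP extension, so this only illustrates the mapping). The predicate-to-parameter mapping and the template name are assumptions.

# Illustrative only: shows grouping triples by subject and emitting template calls.
from collections import defaultdict
from rdflib import Graph

TURTLE = """
@prefix ex: <http://example.org/> .
ex:Berlin ex:population "3500000" ;
          ex:country    "Germany" .
"""

# Hypothetical mapping from RDF predicates to template parameters.
PARAM_FOR = {"population": "population", "country": "country"}

g = Graph()
g.parse(data=TURTLE, format="turtle")

pages = defaultdict(dict)
for subj, pred, obj in g:
    param = PARAM_FOR.get(pred.split("/")[-1])
    if param:
        pages[subj.split("/")[-1]][param] = str(obj)

for title, params in pages.items():
    body = "{{City\n" + "".join("| %s = %s\n" % kv for kv in params.items()) + "}}"
    print("== %s ==\n%s\n" % (title, body))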

  • Skills: PHP, understanding of semantic web technology, primarily RDF and SPARQL.
  • Mentors: Joel Sachs, Samuel Lampa

Visual translation: Integration of page translation with VisualEditor

The wiki page translation feature of the Translate extension does not currently work with VisualEditor due to the special tags it uses. More specifically, this is about editing the source pages that are used as the source for translations, not the translation process itself. The work can be divided into three steps:

  1. Migrate the special tag handling to a more standard way of handling tags in the parser. This needs some changes to the PHP parser so that it can produce the wanted output.
  2. Add support to Parsoid and VisualEditor so that editing page contents preserves the structures that page translation adds to keep track of the content.
  3. Add to VisualEditor some visual aid for marking the parts of the page that can be translated.

This is likely to be a difficult project due to complexities of wikitext parsing and intersecting multiple different products: Translate, MediaWiki core parser, Parsoid, VisualEditor.

Promotion

Annotation tool that extracts statements from books and feeds them into Wikidata

Wikidata is a free knowledge base that can be read and edited by humans and machines alike. If you understand the difference between plain text and data you will understand that this project is Wikipedia's Game-changer. The conversion from text to Wikidata content fields has started in Wikipedia and sister projects and continues diving deeper, but there is still a lot to do!

Now think about this: you are at home, reading and studying for pleasure, or an assignment, or for your PhD thesis. When you study, you engage with the text, and you often annotate and take notes. What about a tool that would let you share important quotes and statements to Wikidata?

A statement in Wikidata is often a simple subject - predicate - object, plus a source. Many, many facts in the books you read can be represented in this structure. We can think of a way to share them.

Imagine a client-side browser plugin, script or app that would take some highlighted text, offer you a GUI to fix up the statement and source, and then feed it into Wikidata.
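
The "feed it into Wikidata" step could go through the Wikibase API's wbcreateclaim module. A hedged sketch follows: the item and property IDs are placeholders, and authentication and CSRF token handling (which a real tool needs) are omitted.

# Sketch only: real requests must be authenticated (OAuth or login) and
# carry a valid CSRF token; the Q/P identifiers below are placeholders.
import json
import requests

API = "https://www.wikidata.org/w/api.php"

def add_string_claim(session, csrf_token, item_id, property_id, value, summary):
    """Create a string-valued statement on a Wikidata item via wbcreateclaim."""
    resp = session.post(API, data={
        "action": "wbcreateclaim",
        "entity": item_id,            # e.g. "Q42"
        "property": property_id,      # a string-datatype property
        "snaktype": "value",
        "value": json.dumps(value),   # string datavalues are JSON-encoded strings
        "summary": summary,
        "token": csrf_token,
        "format": "json",
    })
    return resp.json()

# Usage (pseudo): session, token = login_somehow()
# add_string_claim(session, token, "Q42", "P1234",
#                  "quoted statement from a book", "Annotation tool prototype")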

We could unveil a brand-new world of sharing and collaborating, directly from your reading.

Possible projects:


Documentation

Wikidata Outreach

Wikidata is a free knowledge base that can be read and edited by humans and machines alike. If you understand the difference between plain text and data you will understand that this project is Wikipedia's Game-changer.

Wikidata is one of the newest Wikimedia projects. It is not known and understood well enough yet. Your task is to help change this by helping with the outreach efforts around Wikidata. How are we going to do this? Let's figure it out. Let's plan a social media campaign? Let's make documentation easier to find and understand? Let's showcase amazing things around Wikidata? Let's do some storytelling? Or something completely different? Based on your skillset and interests we will figure out the right steps together.

  • Skills: at least a basic understanding of Wikidata, ideas and writing skills
  • Mentor: Lydia Pintscher

Evaluate MediaWiki web API client libraries

API:Client code, our list of available web API client libraries, is currently a mishmash of libraries of varying quality, some completely unmaintained. At some point in the future, it would be ideal to have a list of officially approved libraries in several popular programming languages. This would help the developers of applications that consume our API ("third-party" developers) easily know which library to use.

This intern would help prepare for that glorious future by:

  • Updating the API:Client code list to make sure it includes all the extant MediaWiki-specific client libraries.
  • Evaluating the existing MediaWiki API client libraries in several major languages (both on a technical basis and judging how responsive the developers are) to pick the best in each language. The number of languages is not quite fixed and depends on how many libraries exist in each language, how clear-cut the differences are, and so on, but probably we want to evaluate libraries in at least four languages during this internship.
  • Writing up some detailed specifications of what we ought to do next to raise each of them to Official "Use This" Library status.

Skills needed:

  • Some programming skill in at least one or two programming languages. (Languages we're interested in include Java, PHP, Python, Ruby, C#, JavaScript and Node.js, Go, .NET, Perl, Clojure, OCaml, and Scala, but other languages might also be welcome.)
  • Willingness and ability to learn the basics of several programming languages over the course of the one-month Community Bonding Period and the three-month internship.
  • Writing English prose.
We are considering modifying this project to reduce the time spent on evaluation and add time for the intern to actually contribute to one of the client libraries, through bug-filing, documentation, and code improvements.
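
For context, this is the kind of boilerplate a good client library hides from its users: a raw MediaWiki web API request made with Python's requests package. The endpoint and User-Agent string are only examples.

# A raw API request; client libraries wrap continuation, errors, tokens, etc.
import requests

API = "https://www.mediawiki.org/w/api.php"
HEADERS = {"User-Agent": "client-library-evaluation-demo/0.1 (example contact)"}

resp = requests.get(API, headers=HEADERS, params={
    "action": "query",
    "meta": "siteinfo",
    "siprop": "general",
    "format": "json",
})
resp.raise_for_status()
print(resp.json()["query"]["general"]["sitename"])   # "MediaWiki"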

Proposed by Sumana Harihareswara, Co-mentor Tollef Fog Heen. Advisors: Brad Jorsch and Merlijn van Deen.

Wikimedia Performance portal

MediaWiki and the ecosystem of services that support it in Wikimedia's production environment emit lots of performance timing data, such as the time it took to process some request or the speed of a network link. Much of this data is aggregated in two log aggregation systems with graphing capabilities. But the data is not well curated, mixing important metrics with unimportant ones. The data needs a curator! We have http://performance.wikimedia.org/ provisioned, and we'd like that space to feature some key performance metrics about the Wikimedia cluster, perhaps accompanied by some glosses that help readers interpret the data. (See gdash.wikimedia.org for an approximate system.)

Ori Livneh, Senior Performance Engineer at the Wikimedia Foundation, will act as mentor. He will be happy to provide an overview of the data that is available, the means of accessing it, and the tooling available for plotting it. This task is suitable for anyone with an interest in data analysis and performance analysis. Some facility with a language with good data analysis libraries, like Python or R, is desirable but not required.


Raw projects

MediaWiki API / Wikimedia data

Make Wiktionary definitions available via the dict protocol

The dict protocol (RFC 2229) is a widely used protocol for looking up definitions over the Internet. We'd like to make Wiktionary definitions available to its users. Doing that via the dict protocol would help drive the use and usefulness of Wiktionary as well. A minimal server sketch follows the list of possible users below.

Possible users:

  • Tablet readers often have dictionary lookup included.
  • Students writing papers would have access to a large corpus of words.
  • Mobile applications for Wiktionary would be less tied to MediaWiki itself.
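
As referenced above, here is a minimal sketch of a DICT (RFC 2229) server that answers DEFINE requests with plain-text extracts from the English Wiktionary API. The response codes follow the RFC, but error handling, MATCH support and most of the command set are omitted, and relying on the TextExtracts API (prop=extracts) is an assumption.

# Toy DICT server: handles DEFINE and QUIT only; not production-ready.
import socketserver
import requests

API = "https://en.wiktionary.org/w/api.php"

def lookup(word):
    """Fetch a plain-text extract for `word` from English Wiktionary (or None)."""
    data = requests.get(API, params={
        "action": "query", "prop": "extracts", "explaintext": 1,
        "redirects": 1, "titles": word, "format": "json",
    }).json()
    page = next(iter(data["query"]["pages"].values()))
    return page.get("extract")

class DictHandler(socketserver.StreamRequestHandler):
    def handle(self):
        self.reply("220 wiktionary-dict-demo <auth.mime> <demo@example>")
        for raw in self.rfile:
            parts = raw.decode("utf-8", "replace").split()
            if not parts:
                continue
            cmd = parts[0].upper()
            if cmd == "QUIT":
                self.reply("221 bye")
                return
            if cmd == "DEFINE" and len(parts) >= 3:
                word = parts[2].strip('"')
                text = lookup(word)
                if not text:
                    self.reply("552 no match")
                    continue
                self.reply("150 1 definitions retrieved")
                self.reply('151 "%s" wiktionary "English Wiktionary"' % word)
                for line in text.splitlines():
                    # dot-stuff lines that start with "." per the RFC
                    self.reply("." + line if line.startswith(".") else line)
                self.reply(".")
                self.reply("250 ok")
            else:
                self.reply("500 unknown command")

    def reply(self, line):
        self.wfile.write((line + "\r\n").encode("utf-8"))

if __name__ == "__main__":
    socketserver.TCPServer(("", 2628), DictHandler).serve_forever()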

MediaWiki development

If you're a programmer, we have lots of things for you to do. (To do: copy some relevant ideas from http://socialcoding4good.org/organizations/wikimedia )

Effective anti-spam measures

Use something like a minimal version of Extension:ConfirmAccount to require human approval of each account creation. That is, the applicant fills in forms for user name, email and a brief note about who they are and why they want to edit the wiki. Also set the wiki so that the initial few edits need approval. Then have it so that any bureaucrat can approve the account creation and initial edits, and remove the user from moderation. Rob Kam (talk) 09:50, 1 December 2013 (UTC)[reply]

Requirements have to be clarified here: the proposed approach is much more complex than ConfirmAccount, not "minimal". Perhaps what you want is a sandbox feature? --Nemo 10:32, 1 December 2013 (UTC)[reply]
Sandbox feature looks good, but for all new accounts not just translators. Rob Kam (talk) 10:49, 1 December 2013 (UTC)[reply]

Parsoid

The Parsoid project is developing a wiki runtime which can translate back and forth between MediaWiki's wikitext syntax and an equivalent HTML / RDFa document model with better support for automated processing and visual editing. It powers the VisualEditor project, Flow and semantic HTML exports.

Parser migration tool

Periodically, we come across some bit of wikitext markup we'd like to deprecate. See Parsoid/limitations, Parsoid/Broken wikitext tar pit, and (historically) meta:MNPP for examples. We'd like to have a really slick tool to enhance communication with WP editors about these issues:

  • It would display a list of wiki titles (filtered by Wikipedia project) which contain deprecated wikitext (a minimal sketch of this backend step appears after this list). Each title would link to a page which would briefly describe the problem(s), give general advice on how the wikitext should be rewritten, and (perhaps) point to some previously corrected pages for editors to look at.
  • Ideally this would be integrated with a wiki workflow and/or contain "revision tested" information so that editors can 'claim' pages from the list to fix and don't step on each other's work. Fixed/revised pages would be removed from the list until their new contents could be rechecked.
  • It should be as easy as possible for Parsoid developers to add new "bad" pattern tests to the tool. These would get added to the testing, with appropriate documentation of the problem, so that editors don't have to learn about a new tool/site for every broken pattern.
  • Some of these broken bits of wikitext might be able to be corrected by bot. The tool could still create a tasklist for the bot and collect and display the bots' fixes for editors to review.
  • The backend which looks for broken wikitext could be based on the existing round-trip test server. Instead of repeatedly collecting statistics on a subset of pages, however, it would work its way through the entire wikipedia project looking for broken wikitext (and preventing regressions).
  • Some cleverness might be helpful to properly attribute bad wikitext to a template rather than the page containing the template. This is probably optional; editors can figure out what's going on if they need to.
  • Skills: node.js, and probably MediaWiki bots and/or extensions as well. A candidate will ideally have some node.js experiences and some notions of web and UX design. This task could be broken into parts, if a candidate wants to work only on the front-end or back-end portions of the tool.
  • Mentors: C. Scott Ananian, Subramanya Sastry
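
A very small sketch of the "list pages containing a deprecated pattern" step mentioned above, assuming CirrusSearch's insource: regex syntax is available on the target wiki; the pattern and the wiki are examples only.

# Sketch of the title-listing backend; a real tool would paginate with
# `continue` and feed results into the workflow/claiming UI described above.
import requests

API = "https://en.wikipedia.org/w/api.php"
PATTERN = r"insource:/<center>/"     # example "deprecated wikitext" pattern

resp = requests.get(API, params={
    "action": "query",
    "list": "search",
    "srsearch": PATTERN,
    "srlimit": 25,
    "format": "json",
}, headers={"User-Agent": "parser-migration-demo/0.1"})

for hit in resp.json()["query"]["search"]:
    print(hit["title"])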

VisualEditor plugins

VisualEditor is a rich visual editor for all users of MediaWiki, so they don't have to know wikitext or HTML to contribute well-formatted content. It is our top priority and you can already test it on the English Wikipedia. While we focus on the core functionality, you could write a plugin to extend it, such as to insert or modify Wikidata content. There are also many possibilities to increase the types of content supported, including sheet music, poems and timelines.


VisualEditor support for EasyTimeline

Also mentioned at #VisualEditor plugins.

Flow

Flow brings a modern discussion and collaboration system to MediaWiki. Flow will eventually replace the current Wikipedia talk page system and will provide features that are present on most modern websites, but which are not possible to implement in a page of wikitext. For example, Flow will enable automatic signing of posts, automatic threading, and per-thread notifications.

Gadgets

Templates

Skins

Improving the skinning experience

Research how to make the development of skins for MediaWiki easier. Many users complain about the lack of modern skins for MediaWiki and about having a hard time with skin development and maintenance. Often sysadmins keep old versions of MediaWiki due to incompatibility with their skins, which introduces security issues and prevents them from using new features. However, little effort has gone into researching the exact problem points. The project could include improving skinning documentation, organizing training sprints/sessions, talking to users to identify problems, researching skinning practices in other open source platforms and suggesting an action plan to improve the skinning experience.

Maria Miteva proposed this project.

Extensions

Check Manual:Extensions and extension requests in Bugzilla.

An easy way to share wiki content on social media services

Wikipedia, as well as other wikis based on MediaWiki, provides an easy way to accumulate and document knowledge, but it is difficult to share it on social media. According to https://strategy.wikimedia.org/wiki/Product_Whitepaper 84% of Wikimedia users were Facebook users as well in 2010, with the portion increasing from previous years. The situation is probably similar with other social media sites. It only makes sense to have an effective "bridge" between MediaWiki and popular social media sites. More details here: strategy:Product Whitepaper#Red link: Post to social media feeds.

Some previous work you can use as a base, improve, or learn from:

Extension:OEmbedProvider

Finish Extension:OEmbedProvider, as proposed here. See also Bug 43436 - Implement Twitter Cards

Leap Motion integration with MediaWiki

MediaWiki has a wide user base, and a lot of users today prefer touch-based interfaces. Gesture-based interfaces are friendly and the latest trend. Leap Motion provides controllers that can recognize gestures. It can be integrated with MediaWiki products like Wikisource; as an example, this would make it more friendly for users to flip through pages in a book. Another advantage of using gesture recognition would be turning through multiple chapters or pages at a time by identifying the depth of the user's finger motion.

It would also be helpful for flipping through images in Wikimedia Commons.

(Project idea suggested by Aarti Dwivedi).

Work on RefToolbar

The en:Wikipedia:RefToolbar/2.0 extension is incredibly useful, especially for new editors but also for experienced editors (I use it every day, and I've got a few miles under my belt!). But it suffers from bugs and problems, and there are a lot of improvements that could be made. For instance: adding additional reference types, adding fields for multiple authors, tool-tip help guidance, etc. I also suspect it will need an upgrade to match Lua conversions of common cite templates. Also, I don't think this is in wide deployment on other wikis, so translation/deployment could be a project. Looking at the talk page, there are a couple people starting to work on this but serious development isn't happening (so I'm not sure who would mentor this) but the code was recently made accessible. At any rate, it is an extension that really needs some work and where improvements would have immediate benefit for many editors.

Project idea contributed by Phoebe (talk) 23:23, 22 March 2013 (UTC) [n.b.: I can't mentor on the tech side, but can give guidance on the ins and outs of various citation formats in the real world & how cite templates are used on WP].[reply]

Global, better URL to citation conversion functionality

Suppose that, in Wikipedia, all that needed to be done to generate a perfect citation was to provide a URL. That would be a tremendous step toward getting a much higher percentage of text in Wikipedia articles to be supported by inline citations.

There are already expanders (for the English Wikipedia, at least) that will convert an ISBN, DOI, or PMID, supplied by an editor, into a full, correct citation (footnote). These are in the process of being incorporated into the reference dialog of the VisualEditor extension, making it almost trivial (two clicks, paste, two clicks) to insert a reference.

For web pages, however, the existing functionality seems to be limited to a Firefox add-on. Its limits, besides the obvious requirement to use that browser (and to install the add-on), include an inability to extract the author and date from even the most standard pages (e.g., New York Times), and the lack of integration with MediaWiki.

For a similar approach, using a different plug-in/program, see this Wikipedia page about Zotero.

A full URL-to-citation engine would use the existing Cite4Wiki (Firefox add-on) code, perhaps, plus (unless these exist elsewhere) source-specific parameter specifications. For example, the NYT uses "<meta name="author" content="NICK BILTON" />" for its author information; that format would be known by the engine (via a specifications database). Each Wikipedia community would be responsible for coding these (except for a small starter set, as examples), in the way that communities are responsible for TemplateData for the new VisualEditor extension.
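
As an illustration of the "specifications database" idea, here is a small Python sketch that pulls common metadata tags out of a web page and drafts a {{cite web}} call. The set of tags covered and the template parameters are simplifications; a real engine would consult per-source specifications as described above.

# Sketch: extract <meta> metadata from a web page and draft a {{cite web}} call.
from html.parser import HTMLParser
import requests

class MetaCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.meta = {}
    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = attrs.get("name") or attrs.get("property")
        if key and attrs.get("content"):
            self.meta.setdefault(key.lower(), attrs["content"])

def cite_web(url):
    html = requests.get(url, headers={"User-Agent": "cite-demo/0.1"}).text
    collector = MetaCollector()
    collector.feed(html)
    m = collector.meta
    return ("{{cite web |url=%s |title=%s |author=%s |date=%s "
            "|accessdate=YYYY-MM-DD}}" % (
                url,
                m.get("og:title", ""),
                m.get("author", ""),              # e.g. NYT's <meta name="author">
                m.get("article:published_time", "")[:10],
            ))

print(cite_web("https://www.example.com/some-article"))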

(Project idea suggested by John Broughton.)

Education Program, outreach and projects

The Wikipedia Education Program helps professors and students contribute to Wikipedia as part of coursework. The current Education Program extension provides features for keeping track of the institutions, courses, professors, students and volunteers involved in this. However, the extension has several limitations and will be largely rewritten. Help is needed to design and build new software to support both the Education Program and other related activities, including topic-centric projects and edit-a-thons.

This project offers tons of opportunities to learn about different facets of software development. There's work to be done right away on UX, fleshing out details of requirements, and architecture design. On this last point, a fun challenge we'll face is creating elegant code that interfaces with a not-so-elegant legacy system. Another challenge will be to create small deliverables that are immediately useful, that can replace parts of the current software incrementally, and that can become components of the larger system we're planning.

Student developers eager to dive into coding tasks can also take bugs on the current version of the software—much of which will remain in production for a while yet. In doing so, they'll practice their code-reading skills, and will get to deploy code to production quickly. :)

  • Skills: PHP, Javascript, CSS, HTML, UI design, usability testing, and object-oriented design
  • Mentors: Andrew Green, Sage Ross.
Support for vizGrimoireJS-lib widgets

vizGrimoireJS-lib is a JavaScript visualization library for data about software development, collected by MetricsGrimoire tools. It is used, for example, for the MediaWiki development dashboard.

The idea of this project is to build a module for MediaWiki so that vizGrimoireJS-lib widgets can be included in MediaWiki pages with a special markup. All widgets get their information from JSON files, and accept several parameters to control visualization. Currently, vizGrimoireJS-lib widgets can be included in HTML pages with simple HTML markup (using HTML attributes to specify the parameters of the widget). This behavior would be translated into MediaWiki markup, so that the whole MediaWiki development dashboard could be inserted into MediaWiki pages.

Modernize Extension:EasyTimeline

EasyTimeline hasn't gotten a lot of maintenance in the last few years, in part perhaps because the perl script and the ploticus dependency make it harder for MW devs to tweak. Bringing the graphics generation "inside" also could enable fancier future things, such as in-browser visual editing of timelines if it can be translated into something manipulable on the web such as SVG. (comment 0. See more talk in bugzilla) Some currently "visible" problems include inflexible font configuration, non-Latin scripts, right-to-left text and accessibility (not sure whether this is actually fixable).

  • Skills: Perl (reading would be enough), PHP, knowledge of whatever backend finally chosen
  • Mentors: (choose from bug commenters there?)

Wikimedia Commons / multimedia

Sébastien Santoro (Dereckson) can mentor these project ideas.


Allow smoother and easier Wikimedia Commons pictures discovery

Skills: Programming. Design.

This project may overlap significantly with Extension:MultimediaViewer, so anyone taking it on should be in contact with the devs on that project and take advantage of the interface that's already built.

Wikimedia Commons is a repository of 20 million media files, all under a free license or in the public domain. It is a common repository used on Wikipedia and other Wikimedia projects, and is available for any other project in need of educational or informational pictures.

Previous usability and UI efforts were focused on the upload process and image reuse.

This project is to think about, design and develop a better interface to browse and discover pictures, from a user perspective. For example, it has been suggested to implement a lightbox system to switch to the next picture in a category. Part of the project could be to prepare an external website implementing this lightbox, and so offer a browsing experience similar to other popular picture sites. If the interface works well, a second phase could be to integrate it directly into Wikimedia Commons.

Another idea is a view mode allowing the user to browse a root category (e.g. the cats category or the roses category) and to see pictures in this category and also in its subcategories. This would satisfy the need "I want a cat photo" or "I want a rose photo" without having to browse a dozen specialized subcategories. In a second step, we could filter results with available information. If you're interested in implementing this approach, your project could be either:

  • the design and development of the viewer mode, with a focus on the UI and ergonomic browsing capability, or
  • to prepare this second step: identify the most relevant criteria (weight, resolution, date taken, color information, most used files on wikis, images with labels) and analyze the costs/benefits of caching these data; prepare a prototype with a subset of 1000 images to help create a performance model and see how, in the future, this information could be made available for several million pictures.

Mentor: Sébastien Santoro

Support for text/syntax/markup driven or WYSIWYG editable charts, diagrams, graphs, flowcharts etc.

Resuscitate Extension:WikiTeX and fold Extension:WikiTex into it.

Provide a way to create interactive 2D/3D timelines and infographics à la Java applets, AJAX, Flash

We almost surely don't want to invent our own markup, but SVG probably doesn't suffice and we surely won't use any proprietary format. Ideally we would adopt some syntax/format/technology already adopted and supported by a lively community, preferably offering a certain amount of existing timelines/infographics and other resources which we would then be able to directly use on Wikimedia projects, save copyright incompatibilities. Perhaps http://timeline.knightlab.com/ , used by Reasonator?

Support for Chemical Markup Language
Accessibility for the colour-blind

Commons has a lot of graphs and charts used on Wikipedia and elsewhere, but few consider how they look with colour blindness, mostly because the creator/uploader has no idea. m:Accessibility#Colour-blind-friendly images lists some tools that can be used to automatically transform images into how they are seen by colour-blind people. We could run such automated tools on all Commons graphs and charts and report the results, ideally after automatically assessing in some way whether the resulting images are discernible enough (i.e. flagging those below some score). The warnings could be relayed with some template on the file description or directly to the authors, and could have a huge impact on the usefulness of Commons media.

Depending on skills and time constraint, the project taker would do 1, 1-2 or 1-3 of these three steps: 1) develop the code for such an automatic analysis based on free software, 2) identify what are the images to check on the whole Commons dataset and run the analysis on it producing raw results, 3) publish such results on Commons via bot in a way that authors/users can notice and act upon.
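
For step 1, here is a hedged sketch of a starting point: simulating deuteranopia with a simple colour matrix using Pillow and NumPy. The matrix is a widely circulated rough approximation applied directly in sRGB for brevity; a real tool should use a validated model (e.g. Brettel/Viénot or Machado et al.), work in linear RGB, and add a proper discernibility score.

# Rough deuteranopia simulation; approximation only.
import numpy as np
from PIL import Image

# Commonly used approximate deuteranopia matrix (rows act on R, G, B).
DEUTERANOPIA = np.array([
    [0.625, 0.375, 0.0],
    [0.700, 0.300, 0.0],
    [0.000, 0.300, 0.7],
])

def simulate(path_in, path_out):
    rgb = np.asarray(Image.open(path_in).convert("RGB"), dtype=float)
    simulated = rgb @ DEUTERANOPIA.T
    Image.fromarray(np.clip(simulated, 0, 255).astype("uint8")).save(path_out)
    # A later step could compare `rgb` and `simulated` (e.g. distances between
    # the chart's palette colours) to score how discernible the chart remains.

simulate("chart.png", "chart_deuteranopia.png")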

Category suggestions

Let's crush the categorisation backlog once and for all!

Some categorization could be automated already.

Searching for pictures based on meta-data is called "Concept Based Image Retrieval", searching based on the machine vision recognized content of the image is called "Content Based Image Retrieval".

What I understood of Lars' request is an automated way of finding the "superfluous" concepts or meta-data for pictures based on their content. Of course, recognizing an image's content is very hard (and subjective), but I think it would be possible for many of these "superfluous" categories, such as "winter landscape", "summer beach" and perhaps also "red flowers" and "bicycle".

There exist today many open source "Content Based Image Retrieval" systems which, as I understand it, basically work like this: you give them a picture, and they find you the "matching" pictures accompanied by a score. Now suppose we show them a picture with known content (pictures from Commons with good meta-data); then we could, with some degree of trust, find pictures with overlapping categories. I am not sure whether this kind of automated reverse meta-data labelling should be done for only one category at a time, or if some kind of "category bundles" would work better. Probably adjectives and items should be compounded (e.g. "red flowers").
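
One cheap building block in this direction, well short of real CBIR, is perceptual hashing: find images that look very similar to a well-categorised image and propose its categories. A sketch using the third-party ImageHash and Pillow packages follows; the file names, reference set and threshold are placeholders.

# Not real CBIR: perceptual hashes only catch near-duplicate or visually very
# similar images, but they are cheap to compute at Commons scale.
from PIL import Image
import imagehash

def phash(path):
    return imagehash.phash(Image.open(path))

reference = {                      # images with trusted categories (placeholders)
    "well_categorised_cat.jpg": ["Cats", "Winter landscapes"],
}
candidate = "uncategorised_upload.jpg"

candidate_hash = phash(candidate)
for ref_path, categories in reference.items():
    distance = candidate_hash - phash(ref_path)   # Hamming distance of the hashes
    if distance <= 8:                             # arbitrary similarity threshold
        print("Suggest categories %s (distance %d to %s)"
              % (categories, distance, ref_path))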

Relevant articles and links from Wikipedia:

  1. w:Image_retrieval
  2. w:Content-based_image_retrieval
  3. w:List_of_CBIR_engines#CBIR_research_projects.2Fdemos.2Fopen_source_projects

Some demo links bawolff found:

I like the idea of automating categorisation, but I think we are a long way from being able to do much of it. So this would be a big longterm project. One of my concerns is that we are a global site, and we are trying to collect the most diverse set of images that anyone has ever assembled. Image recognition is a good way of saying that we now have another twenty images of this person, but it could be confused when we get our first images of one of the fox subspecies that we don't yet have a picture of. Or rather it would struggle to differentiate the rare and the unique from their more common cousins. There are also some spooky implications for privacy re image recognition and our pictures of people, aside from the obvious things like identifying demonstrators in a crowd or linking a series of shots of one person in such a way as to identify that this photograph of a face belongs to the same person as this photo of pubic hair because the hand is identical; We have had some dodgy things happening on Wikipedia with people wanting to categorise people ethnically and I worry that someone might use a tool such as this to try and semi accurately categorise people as say Jewish. Another major route for improved categorisation is geodata, and I think this could be a less contentious route. Not that everything has geodata, but if things have it could be a neat way to categorise a lot of images, especially if we can get boundary data so we can categorise images as being shot from within a set of boundaries rather than centroid data with all its problems that the parts of one place maybe closer to the centre of an adjacent place than the centre of the area they belong to. WereSpielChequers (talk) 09:58, 20 June 2014 (UTC)[reply]

MediaWiki core

Removing inline CSS/JS from MediaWiki

One of the future security goals of MediaWiki is to implement Content Security Policy. This is an HTTP header that disallows inline JavaScript and CSS as well as scripts and styles from disallowed domains. One of the big steps to achieving this is to remove all inline CSS and JavaScript from MediaWiki HTML. Some of the places inline scripting/styling is used:

  • Inline styling in wikitext is translated to inline styling in HTML
  • ResourceLoader is mostly good, but the loader script (at the top and bottom of page) is inline JavaScript
  • Data such as user preferences and ResourceLoader config variables is embedded into the HTML as inline JSON, when it should be in HTML attributes
  • Many extensions use inline styling rather than ResourceLoader modules

Fixing all of these inline scripts and styles is too big a task for a single mentorship program. However, working on one or two, and slowly chipping away at the inline JS and CSS, can help move closer toward the final goal. This project obviously requires, at the very least, basic HTML and JavaScript knowledge, but some parts are more difficult than others. For example, bullet points 2 and 3 require only basic MediaWiki knowledge, but bullet point 1 requires altering the Parser class, and thus mandates a deeper understanding of MediaWiki and how it parses wikitext.
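
For reference, this is the kind of header the finished work would let Wikimedia send, shown here as a tiny Python/WSGI sketch. MediaWiki itself would of course emit the header from PHP, and the exact policy below is only an example.

# Example policy only: blocks inline <script>/<style> and third-party origins.
from wsgiref.simple_server import make_server

CSP = "default-src 'self'; script-src 'self'; style-src 'self'"

def app(environ, start_response):
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Security-Policy", CSP),
    ])
    # With this policy, an inline <script>alert(1)</script> in the body
    # would be blocked by the browser and reported on its console.
    return [b"<p>Hello with CSP</p>"]

if __name__ == "__main__":
    make_server("localhost", 8080, app).serve_forever()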

Wikisource

Merge proofread text back into Djvu files

Wikisource, the free library, has an enormous collection of Djvu files and proofread texts based on those scans. However, while the DjVu files contain a text layer, this text is the original computer generated (OCR) text and not the volunteer-proofread text. There is some previous work about merging the proofread text as a blob into pages, and also about finding similar words to be used as anchors for text re-mapping. The idea is to create an export tool that will get word positions and confidence levels using Tesseract and then re-map the text layer back into the DjVu file. If possible, word coordinates should be kept.

Hopefully, it will be possible to reuse part of existing proofreading/OCR correction/OCR training software such as the OCR editor by the National Library of Finland.

See also m:Grants:IdeaLab/Djvu text layer editor.
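
A rough sketch of the final write-back step, using Tesseract's hOCR output for word boxes and DjVuLibre's djvused to replace a page's hidden text layer; both command-line tools are assumed to be installed. Note that this re-embeds OCR text rather than the proofread text (mapping the proofread words onto these boxes is the hard part of the project), and that escaping, multi-page handling and page geometry need more care than shown.

# Sketch: requires the `tesseract` and `djvused` command-line tools.
import re
import subprocess

def word_boxes(image_path):
    """OCR one page image and yield (text, x0, y0, x1, y1) word boxes (hOCR, top-left origin)."""
    subprocess.run(["tesseract", image_path, "page", "hocr"], check=True)
    hocr = open("page.hocr", encoding="utf-8").read()
    pattern = r"<span class='ocrx_word'[^>]*title='bbox (\d+) (\d+) (\d+) (\d+)[^']*'[^>]*>(.*?)</span>"
    for x0, y0, x1, y1, text in re.findall(pattern, hocr, re.S):
        text = re.sub(r"<[^>]+>", "", text).strip()
        if text:
            yield text, int(x0), int(y0), int(x1), int(y1)

def set_hidden_text(djvu_path, page_no, page_w, page_h, boxes):
    """Write a (page ... (word ...)) s-expression and apply it with djvused."""
    words = []
    for text, x0, y0, x1, y1 in boxes:
        escaped = text.replace("\\", "\\\\").replace('"', '\\"')
        # DjVu hidden-text coordinates use a bottom-left origin, so flip y.
        words.append('(word %d %d %d %d "%s")'
                     % (x0, page_h - y1, x1, page_h - y0, escaped))
    sexp = "(page 0 0 %d %d %s)" % (page_w, page_h, " ".join(words))
    with open("page.txt", "w", encoding="utf-8") as f:
        f.write(sexp)
    subprocess.run(["djvused", "-e",
                    "select %d; set-txt page.txt; save" % page_no,
                    djvu_path], check=True)

# set_hidden_text("book.djvu", 1, 2550, 3300, word_boxes("page-001.png"))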

Mentors and skills:

  • Project proposed by Micru. I have found an external mentor who could give a hand with Tesseract; now I'm looking for a mentor who would provide assistance with MediaWiki.
  • Aubrey can be a mentor providing assistance regarding Wikisource, and some past history of this issue. Not much, but glad to help if needed.
  • Rtdwivedi is willing to be a mentor.

Sysadmin

Distributed cron replacement

A common requirement in infrastructure maintenance is the ability to execute tasks at scheduled times and intervals. On Unix systems (and, by extension, Linux) this is traditionally handled by a cron daemon. Traditional crons, however, run on a single server and are therefore unscalable and create single points of failure. While there are a few open source alternatives to cron that provide for distributed scheduling, they either depend on a specific "cloud" management system or on other complex external dependencies; or are not generally compatible with cron.

Wikimedia Labs has a need for a scheduler that:

  • Is configurable by traditional crontabs;
  • Can run on more than one server, distributing execution between them; and
  • Guarantees that scheduled events execute as long as at least one server is operational.

The ideal distributed cron replacement would have as few external dependencies as possible.
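
One possible minimal approach, sketched below: every node parses the same crontab-style entries (here with the third-party croniter package) and uses a shared MySQL advisory lock so that only one node actually runs each occurrence of a job. A production design would also need catch-up for missed runs, protection against duplicate runs by slow nodes (e.g. recording the last run time), and better failure handling; host names and credentials are placeholders.

# Sketch: pip install croniter pymysql; all nodes point at the same MySQL server.
import subprocess
import time
from datetime import datetime

import pymysql
from croniter import croniter

CRONTAB = [
    ("*/5 * * * *", "echo hello-from-one-node-only"),   # example entry
]

def try_lock(conn, name):
    with conn.cursor() as cur:
        cur.execute("SELECT GET_LOCK(%s, 0)", (name,))  # 1 = we got the lock
        return cur.fetchone()[0] == 1

def release(conn, name):
    with conn.cursor() as cur:
        cur.execute("SELECT RELEASE_LOCK(%s)", (name,))

def main():
    conn = pymysql.connect(host="db.example", user="cron", password="secret")
    due = []
    for expr, cmd in CRONTAB:
        it = croniter(expr, datetime.now())
        due.append([it.get_next(datetime), cmd, it])
    while True:
        due.sort(key=lambda entry: entry[0])
        next_run, cmd, it = due[0]
        time.sleep(max(0, (next_run - datetime.now()).total_seconds()))
        if try_lock(conn, "cron:" + cmd):       # exactly one node wins this run
            try:
                subprocess.run(cmd, shell=True)
            finally:
                release(conn, "cron:" + cmd)
        due[0][0] = it.get_next(datetime)       # schedule the next occurrence

if __name__ == "__main__":
    main()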

— Coren (talk)/(enwp) 19:29, 23 November 2013 (UTC)[reply]

Testing

Documentation

System documentation integrated in source code

It would be really nice if inline comments, README files, and special documentation files could live in the source code but be exported into a formatted, navigable system (maybe wiki pages or maybe something else). It could be something like Doxygen, except better and oriented to admins rather than developers. Of course it should integrate with mediawiki.org and https://doc.wikimedia.org. (A toy sketch of the README export step appears after the list below.)

The idea would be that one could:

  • Keep documentation close to the code and thus far more up to date
  • Even enforce documentation updates to it with new commits sometimes
  • Reduce the tedium of making documentation by using minimal markup to specify tables, lists, hierarchy, and so on, and let a tool deal with generating the html (or wikitext). This could allow for a more consistent appearance to documentation.
  • When things are removed from the code (along with the docs in the repo), if mw.org pages are used, they can be tagged with a warning box and placed in a maintenance category.
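
As referenced above, a toy sketch of the README export direction: walk a checkout, collect README files and turn each into a draft wiki page. Pushing the drafts to mediawiki.org would then need an authenticated action=edit call, which is omitted; the repo path, title scheme and the {{Generated from source}} template are assumptions.

# Toy exporter: collects README files and writes draft wiki pages locally.
import os

REPO = "core"                      # path to a MediaWiki checkout (assumption)
OUT = "generated-docs"
os.makedirs(OUT, exist_ok=True)

for dirpath, _dirnames, filenames in os.walk(REPO):
    for name in filenames:
        if not name.upper().startswith("README"):
            continue
        rel = os.path.relpath(os.path.join(dirpath, name), REPO)
        title = "Documentation/%s" % rel.replace(os.sep, "/")   # assumed title scheme
        with open(os.path.join(dirpath, name), encoding="utf-8", errors="replace") as f:
            body = f.read()
        page = "{{Generated from source|path=%s}}\n<pre>\n%s\n</pre>\n" % (rel, body)
        out_path = os.path.join(OUT, title.replace("/", "__") + ".wiki")
        with open(out_path, "w", encoding="utf-8") as f:
            f.write(page)
        print("drafted", title)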

Proposed by Aaron Schulz.

Translation

Product development

Ranking articles by pageviews for wikiprojects and task forces in languages other than English

Currently we have an amazing tool which, every month, determines what pages are most viewed for a WikiProject and then provides a sum of the pageviews for all articles within that project. An example of the output for WikiProject Medicine in English.

The problem is that this tool only exists for English and is running on Toolserver rather than Wikimedia Labs. So while we know what people are looking at in English, and this helps editors determine what articles to work on, other languages do not have this ability.

Additionally, we do not know whether the topics people look up in English are the same as those they look up in other languages. In the subject area of medicine this could be the basis of a great academic paper, and I would be happy to share authorship with those who help to build these tools.

A couple of steps are needed to solve this problem:

  1. For each article within a WikiProject in English, take the interlanguage links stored at Wikidata and tag the corresponding article in the target language (a minimal sketch of this step follows the list).
  2. Figure out how to get Mr. Z's tool to work in other languages [1]. He is supposedly working on it and I am not entirely clear whether he is willing to have help. Another tool that could potentially be adapted to generate the data is already on Labs.
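
A minimal sketch of step 1: resolve an English Wikipedia title to its counterpart in another language through Wikidata sitelinks. The article and target wiki are examples.

# Resolve an enwiki title to its dewiki counterpart through Wikidata sitelinks.
import requests

API = "https://www.wikidata.org/w/api.php"

def counterpart(en_title, target_wiki="dewiki"):
    data = requests.get(API, params={
        "action": "wbgetentities",
        "sites": "enwiki",
        "titles": en_title,
        "props": "sitelinks",
        "sitefilter": target_wiki,
        "format": "json",
    }, headers={"User-Agent": "wikiproject-pageviews-demo/0.1"}).json()
    for entity in data["entities"].values():
        sitelink = entity.get("sitelinks", {}).get(target_wiki)
        if sitelink:
            return sitelink["title"]
    return None

print(counterpart("Diabetes mellitus"))   # prints the German article title, if any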

James Heilman (talk) 21:13, 14 September 2013 (UTC)[reply]

Improving MediaWikiAnalysis

MediaWikiAnalysis is a tool to collect statistics from MediaWiki sites, via the MediaWiki API. It is a part of the MetricsGrimoire toolset, and it is currently used for getting information from the MediaWiki.org site, among others.

The stats currently collected by MediaWikiAnalysis are only a part of what is feasible to collect, and the tool itself could be improved. Some possible directions (a small sketch for direction 4 follows the list):

  1. Explore in detail the MediaWiki API and extract as much information from it as possible.
  2. Improve efficiency and incremental retrieval of data
  3. Propose (and if possible, implement) changes to the MediaWiki API if needed, to support advanced collection of data.
  4. Use SQLAlchemy instead of MySQLdb for managing the MediaWikiAnalysis database.
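
For direction 4, a minimal sketch of what the SQLAlchemy layer could look like; the table and column names are assumptions for illustration, not MediaWikiAnalysis's actual schema.

# Hypothetical schema, illustrating the MySQLdb -> SQLAlchemy move.
from sqlalchemy import Column, DateTime, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

class Revision(Base):
    __tablename__ = "wiki_revisions"
    id = Column(Integer, primary_key=True)
    page_title = Column(String(255), nullable=False)
    user = Column(String(255))
    timestamp = Column(DateTime)
    comment = Column(String(1024))

engine = create_engine("mysql+pymysql://user:pass@localhost/mediawiki_analysis")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

with Session() as session:
    session.add(Revision(page_title="Project:Sandbox", user="Example"))
    session.commit()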

Design

Promotion

Beyond development

Featured projects that focus on technical activities other than software development.

Research & propose a catalog of extensions

Extensions on mediawiki.org are not very well organized and finding the right extension is often difficult. Listening to community members, you will hear about better management of extension pages with categorization, ratings on code quality, security, usefulness and ease of use, good visibility for good extensions, "Featured extensions", and better exposure and testing of version compatibility... This project is about doing actual research within our community and beyond to come up with a proposal that is both agreed upon and feasible: a plan that a development team can simply take and start implementing.

  • Skills: research, negotiation, fluent English writing. A technical background and knowledge of MediaWiki features and web development will get you to the actual work sooner.
  • Mentors: Yury Katkov + ?

Very raw projects

Taken from the former "Annoying large bugs" page.
  • Making our puppet servers HA and load balanced without having to change all of the security certificates
  • Global user preferences
    • architecturally very important (if not critical) to a number of projects
    • As far as I remember, the backend for this is largely completed, it just needs a sensible UI
    • Andrew Garrett writes:
    I tried to implement this when I completely refactored the preferences system in 2009. It was eventually reverted in r49932. The main blocker was basically considering a way to decide which preferences would have their values synchronised. A UI would need to be developed for that and you'd need some extensive consultation on that fact.
    If you were to implement this, you could potentially use my original implementation as a guide, though it is reasonably "in the guts" of MediaWiki so you'd have to be reasonably confident "code diving" into unfamiliar software packages.
  • using onscreen keymaps from Narayam's code base to build a mobile-focused app from which one could choose and load a keymap. This would be a great app to have in the mobile app stores for Boot2Gecko and Android.
  • HTML e-mail support
    • Requires some design expertise, but it'd be nice to have MediaWiki e-mails stop looking as though they're from 1995, especially as they're much more visible nowadays with ENotif (email notifications) enabled on Wikimedia wikis
    • Some of this was done as part of Notifications.
  • Fix user renames to be less fragile and horrible
    • Lots of breakages from renames of users with a lot of edits; old accounts need to be fixed in a sensible way and new breakages need to be properly prevented
  • Let users rename themselves
    • Restrict to those with zero edits?
    • Or not?
    • Major community policy issues.
  • Add a read-only API for CentralNotice

See also