Google Summer of Code/2007

The Wikimedia Foundation is applying as a mentoring organization for the 2007 Google Summer of Code.

Note: the project ideas list will receive a little more cleanup over the next few weeks. Additional ideas can be added here.

Heuristics for vandalism
Some heuristics for vandalism already exist for the IRC notification module. The next step would be to use those heuristics inside the web application itself, perhaps to flag suspicious edits or to support a workflow in which admins can “claim” a bad edit to work on. Extended heuristics might use Bayesian filters. The work breaks down into clean hooks for gathering information, clean hooks for tagging edits, making the tagged information available, and then building tools on top of it, along the lines of SpamAssassin's rules engine.
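As a minimal illustration of the Bayesian-filter idea, the sketch below scores the text added by an edit with a naive Bayes classifier trained on edits already labelled as vandalism or good faith. The class name, the two labels and the word-only tokenizer are assumptions made for the example; they are not part of MediaWiki or of the existing IRC heuristics.

```python
import math
import re
from collections import Counter


class NaiveBayesEditFilter:
    """Hypothetical naive Bayes scorer for the text an edit adds."""

    LABELS = ("vandal", "good")

    def __init__(self) -> None:
        self.token_counts = {label: Counter() for label in self.LABELS}
        self.edit_counts = {label: 0 for label in self.LABELS}

    @staticmethod
    def tokenize(text: str) -> list:
        # Lowercased word tokens only; a real filter would also use edit
        # metadata such as account age or how much text was removed.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, added_text: str, label: str) -> None:
        self.token_counts[label].update(self.tokenize(added_text))
        self.edit_counts[label] += 1

    def vandalism_probability(self, added_text: str) -> float:
        vocab = set().union(*self.token_counts.values())
        total_edits = sum(self.edit_counts.values())
        log_scores = {}
        for label in self.LABELS:
            # Add-one smoothed prior, so an untrained class is not log(0).
            log_p = math.log((self.edit_counts[label] + 1) / (total_edits + 2))
            label_total = sum(self.token_counts[label].values())
            for token in self.tokenize(added_text):
                # Laplace smoothing; the extra +1 reserves mass for unseen tokens.
                count = self.token_counts[label][token]
                log_p += math.log((count + 1) / (label_total + len(vocab) + 1))
            log_scores[label] = log_p
        # Normalize in log space to avoid underflow on long edits.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["vandal"] / sum(weights.values())
```

An edit scoring above some threshold (the exact value would have to be tuned on real data) could then be tagged through the hooks described above for an admin to claim, rather than being reverted automatically.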

Article quality rating analysis
Work out ways of combining ratings of different versions of articles from different sources to estimate the "true" quality profile of an article as accurately as possible, while rejecting noise and outliers and resisting attempts to game or spam the rating system. You may need to find a way to collect some real data.
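One simple way to reject noise and outliers is sketched below: count each rater once, discard scores far from the median as measured by the median absolute deviation (MAD), and average what remains. The function name, the `(rater_id, score)` input shape and the cutoff of 3 are assumptions for illustration; weighting raters by reputation or combining scores across article versions is not attempted here.

```python
from statistics import mean, median


def robust_quality_estimate(ratings, cutoff=3.0):
    """Estimate one article version's quality from (rater_id, score) pairs.

    Hypothetical helper: scores are assumed numeric, and each rater's
    latest score replaces any earlier ones.
    """
    # One vote per rater, so repeated submissions from one account
    # cannot stuff the ballot.
    per_rater = {}
    for rater_id, score in ratings:
        per_rater[rater_id] = score
    scores = list(per_rater.values())
    if not scores:
        return None

    center = median(scores)
    mad = median(abs(s - center) for s in scores)
    if mad == 0:
        # At least half the raters gave exactly the median score; keep only those.
        kept = [s for s in scores if s == center]
    else:
        # 1.4826 scales the MAD to the standard deviation for normal data.
        kept = [s for s in scores if abs(s - center) / (1.4826 * mad) <= cutoff]
    return mean(kept)
```

The median and MAD are used instead of the plain mean and standard deviation because a few extreme scores cannot drag them far, which is the property needed to resist spam.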

Deletion queue system
A clean system where pages or media files can be nominated for cleanup and deletion, and that process followed through to completion with a minimum of fuss, could help streamline processes on both large and small wikis.

Notification to the author(s) of nominated pages and a discussion system that's easy to get involved with and track are a must.
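The nomination-to-completion process can be modelled as a small state machine. The sketch below (Python rather than MediaWiki's PHP, for brevity) only allows a page to move from nomination to discussion and then to deleted or kept, accepts comments only while discussion is open, and records a notification to every author at each step. The stage names, the `Nomination` class and the in-memory notification list are hypothetical stand-ins for real talk-page messages or e-mail.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class Stage(Enum):
    NOMINATED = "nominated"
    UNDER_DISCUSSION = "under discussion"
    DELETED = "deleted"
    KEPT = "kept"


# Allowed transitions; anything else is rejected so a page cannot skip discussion.
TRANSITIONS = {
    Stage.NOMINATED: {Stage.UNDER_DISCUSSION},
    Stage.UNDER_DISCUSSION: {Stage.DELETED, Stage.KEPT},
    Stage.DELETED: set(),
    Stage.KEPT: set(),
}


@dataclass
class Nomination:
    page: str
    authors: List[str]
    reason: str
    stage: Stage = Stage.NOMINATED
    comments: List[Tuple[str, str]] = field(default_factory=list)
    notifications: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Notify every author as soon as the page is nominated.
        self._notify_authors(f"{self.page} was nominated for deletion: {self.reason}")

    def _notify_authors(self, message: str) -> None:
        # Stand-in for a talk-page message or e-mail; here it is only recorded.
        for author in self.authors:
            self.notifications.append(f"{author}: {message}")

    def advance(self, new_stage: Stage) -> None:
        if new_stage not in TRANSITIONS[self.stage]:
            raise ValueError(f"cannot move from {self.stage.value} to {new_stage.value}")
        self.stage = new_stage
        self._notify_authors(f"{self.page} is now {new_stage.value}")

    def comment(self, user: str, text: str) -> None:
        if self.stage is not Stage.UNDER_DISCUSSION:
            raise ValueError("discussion is not open")
        self.comments.append((user, text))
```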

Admin panel
While MediaWiki has had a web-based installer for some time, upgrades, maintenance and other administrative tasks remain difficult for smaller sites that don't have shell access to their hosts.

A password-protected web interface to the maintenance and update tools, and perhaps even some site configuration options, would be very useful for such environments.

Statistics counter
Wikimedia needs a counter program which will process ~30k log lines per second using a single processor. It should collect per-article page view counts and produce a report on demand. A web frontend should format the results and present them to the public.
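The core of such a counter is a single pass over the log that maps each request to an article title and increments a hash table. The sketch below shows that pass in Python for readability; the log layout (whitespace-separated fields with the requested URL in the third field) and the `/wiki/` path prefix are assumptions, and whether this reaches ~30k lines per second on one processor would have to be measured.

```python
from collections import Counter
from urllib.parse import unquote, urlsplit


def count_page_views(lines, url_field=2):
    """Count per-article views from whitespace-separated log lines.

    Assumes (hypothetically) that field `url_field` holds the requested URL
    and that articles are served under /wiki/<Title>.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) <= url_field:
            continue  # Skip malformed or truncated lines.
        path = urlsplit(fields[url_field]).path
        if not path.startswith("/wiki/"):
            continue  # Skip images, scripts and other non-article requests.
        title = unquote(path[len("/wiki/"):]).replace("_", " ")
        counts[title] += 1
    return counts


def report(counts, top=10):
    # On-demand report: the most-viewed articles, one "count<TAB>title" per line.
    return "\n".join(f"{n}\t{title}" for title, n in counts.most_common(top))
```

Fed from a pipe, this could be run as `print(report(count_page_views(sys.stdin)))` after `import sys`; the web frontend would then only need to format the report.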

Multimedia integration
Provide an API for extensions to add user preferences, so that, for example, a new tab appears in the preferences or a new preference appears in an existing tab. This would probably be best done alongside the admin panel idea above. It could define a few basic types, including strings, numbers, booleans and colours, as well as some simple rules for when to show/hide or enable/disable settings.
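A minimal sketch of what such an extension-facing API could look like: extensions register typed preferences against a tab, defaults and submitted values are checked against the basic types named above, and a `visible_if` predicate implements a show/hide rule. All names here (`Preference`, `PreferenceRegistry`, `visible_if`, the type strings, the `#rrggbb` colour format) are hypothetical, not existing MediaWiki interfaces, and the code is Python rather than PHP for brevity. Enable/disable rules could follow the same predicate pattern.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# Validators for the basic value types; colours are assumed to be "#rrggbb".
VALIDATORS: Dict[str, Callable[[Any], bool]] = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "colour": lambda v: isinstance(v, str) and re.fullmatch(r"#[0-9a-fA-F]{6}", v) is not None,
}


@dataclass
class Preference:
    name: str
    tab: str
    kind: str
    default: Any
    # Show/hide rule, evaluated against the user's current settings.
    visible_if: Optional[Callable[[Dict[str, Any]], bool]] = None


class PreferenceRegistry:
    def __init__(self) -> None:
        self._prefs: Dict[str, Preference] = {}

    def register(self, pref: Preference) -> None:
        # Called by an extension; rejects unknown types and bad defaults early.
        if pref.kind not in VALIDATORS:
            raise ValueError(f"unknown type {pref.kind!r}")
        if not VALIDATORS[pref.kind](pref.default):
            raise ValueError(f"invalid default for {pref.name!r}")
        self._prefs[pref.name] = pref

    def validate(self, name: str, value: Any) -> bool:
        # Check a submitted value before it is saved.
        return VALIDATORS[self._prefs[name].kind](value)

    def layout(self, settings: Dict[str, Any]) -> Dict[str, List[str]]:
        # Group visible preference names by tab, falling back to defaults
        # for anything the user has not set.
        values = {n: settings.get(n, p.default) for n, p in self._prefs.items()}
        tabs: Dict[str, List[str]] = {}
        for pref in self._prefs.values():
            if pref.visible_if is None or pref.visible_if(values):
                tabs.setdefault(pref.tab, []).append(pref.name)
        return tabs
```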

Media player
No-install in-browser display of video and audio clips for Wikimedia Commons, using reasonably common Java and/or Flash components. It needs to 1) play, or transparently pre-convert, Ogg Theora videos and 2) avoid the use of patent-encumbered formats.

Some work was done on this last year, adapting Fluendo's Cortado Java player applet. Completing this and integrating it into the primary code base would be a very valuable project.

Media conversion
Automated conversion of media formats on upload would also make it easier to get more media into the system.

Possible examples:
 * Conversion of uncompressed or FLAC audio to Ogg Vorbis at a bitrate suitable for streaming
 * Conversion of high-resolution, high-bitrate videos to lower-quality Ogg Theora suitable for streaming and casual download
 * Conversion of possibly patent-encumbered audio and video formats (MP3, MPEG-4, etc.) to free formats (Vorbis/Theora)

Conversion would have to run in an asynchronous queue to keep uploading snappy, with queue status reported back to the user interface (e.g. to indicate that a recompressed version is not yet available, or has become available).
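A minimal sketch of the asynchronous queue, using a single background worker thread: the upload handler calls `submit()` and returns immediately, and the interface polls `status()` to learn whether the recompressed version is "queued", "converting", "available" or "failed". The class, the status strings and the `convert` callable (a stand-in for a real Vorbis/Theora transcoder) are assumptions; a production setup would more likely use a persistent job queue so that pending jobs survive a restart.

```python
import queue
import threading


class ConversionQueue:
    """Hypothetical background conversion queue with per-file status."""

    def __init__(self, convert):
        self._convert = convert          # Stand-in for a real transcoder call.
        self._jobs = queue.Queue()
        self._status = {}
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, filename):
        # Called by the upload handler; returns at once so uploads stay snappy.
        self._set(filename, "queued")
        self._jobs.put(filename)

    def status(self, filename):
        # Polled by the user interface to show whether the converted file exists yet.
        with self._lock:
            return self._status.get(filename, "unknown")

    def wait(self):
        # Block until every submitted job has been processed (useful in tests).
        self._jobs.join()

    def _set(self, filename, state):
        with self._lock:
            self._status[filename] = state

    def _run(self):
        while True:
            filename = self._jobs.get()
            self._set(filename, "converting")
            try:
                self._convert(filename)
                self._set(filename, "available")
            except Exception:
                self._set(filename, "failed")
            finally:
                self._jobs.task_done()
```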

Upload form improvements
Lots of room for changes here: make a beefed-up JavaScript form that allows multi-file upload and shows progress during upload (ideally while staying as backwards compatible as possible). Make it possible to preview the upload summary, and add a nifty edit toolbar as for regular edits. Make it easy to add categories, perhaps with a lazy-loading AJAX category tree. Being able to run the upload form as a pop-up without the surrounding skin might also be useful for integration with edit pages.

Statistics
Integrate statistics into MediaWiki itself, including user statistics, characteristics of articles, and graphs of how they develop over time. Most such tools are currently external and do not yet work together.

Reports
The English Wikipedia has a variety of useful reports which editors use to zero in on problem articles. There is an existing Perl code base for some of these (see Toolserver/Reports) but it needs bug-fixes, integration with the Toolserver (or other platform, so they can be run automatically), the possibility for editors to mark false positives and "dealt-with" articles, improvements based on editor suggestions, and expansion to cover more reports.
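Marking false positives and "dealt-with" articles amounts to filtering each report run against editor-supplied marks. The sketch below assumes one possible policy: false positives stay hidden, while a dealt-with article reappears if it has been edited since it was marked (compared by revision ID). The `Mark` class, the status strings and this policy are hypothetical, and the sketch is in Python rather than the existing Perl code base.

```python
from dataclasses import dataclass


@dataclass
class Mark:
    status: str    # "false_positive" or "dealt_with" (hypothetical labels)
    revision: int  # the article revision the editor looked at


def filter_report(flagged, latest_revision, marks):
    """Return flagged titles that still need an editor's attention.

    flagged: titles produced by a report query.
    latest_revision: title -> current revision ID.
    marks: title -> Mark recorded by editors.
    """
    remaining = []
    for title in flagged:
        mark = marks.get(title)
        if mark is None:
            remaining.append(title)  # Nobody has looked at it yet.
        elif mark.status == "false_positive":
            continue  # Editors judged the report wrong here; keep it hidden.
        elif latest_revision.get(title, mark.revision) != mark.revision:
            remaining.append(title)  # Edited since it was dealt with: check again.
        # Otherwise it was dealt with and is unchanged, so it stays hidden.
    return remaining
```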