Summer of Code 2006/ideas

'''Submissions for the 2006 Summer of Code are now closed. This snapshot of 2006 ideas is preserved for historical interest here. Feel free to do these projects anyway, or elaborate on them at Talk:Proposals_for_new_projects where they're being improved. If you feel the process of proposing to SoC could be improved, see Summer of Code 2007 protocol and proposal protocol. Your help is appreciated.'''

The Wikimedia Foundation is a mentoring organization for the 2006 Summer of Code.

'''Note: the project ideas list still needs some paring down. SoC student applications don't open until May 1, let's make sure it's cleaned up by then. --brion 06:13, 25 April 2006 (UTC)'''


 * I've removed a few which I found too unspecific; feel free to restore any of those. I've also added a few more so we might still be even. ;-) Then again, the idea is that students can pick up to 10 from the ones below, right? In that case, it probably makes sense to have a reasonably high number of well-defined projects. If so, I suggest we focus on developing the individual proposals more and remove those which aren't adequate.--Eloquence 09:36, 25 April 2006 (UTC)


 * Hi folks, please don't add "AJAX" and "WYSIWYG" back again. "AJAX" refers to a large number of small, isolated potential projects which are too small for SoC. WYSIWYG is too big for SoC, requiring first a commitment to reviewing and formalizing the wiki markup and parser with an eye to requirements and compatibility. --brion 02:38, 26 April 2006 (UTC)

Add ideas for projects here:

Admin panel
Create a Web user interface for administration of a site. Move from using globals for config to some structured format, possibly in the database. Should have an interface for extensions so that they can be admin'd too. Only do basic stuff in first cut – look to FAQ and Angela for items to be included. Include extension management and auto-discovery. Advanced version could include Wordpress-style theme management.
 * User Groups management - A plugin that makes it easy to add, rename, and delete user groups and to change their rights.

Extensible user preferences
Provide an API for extensions to add user preferences, so that e.g. a new tab shows up in the preferences or a new preference shows up in an existing tab. This would probably be good to do alongside the admin panel idea above. It could define a few basic types including strings, numbers, booleans and colours, as well as some simple rules for when to show/hide or enable/disable settings.


 * This work is partly done already. --brion 21:09, 29 April 2006 (UTC)
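As a rough illustration of the kind of API described above, here is a minimal sketch in Python (used only for illustration; MediaWiki itself is PHP). All names here (`Preference`, `PreferenceRegistry`, `visible_if`) are hypothetical and do not correspond to any existing MediaWiki interface. It shows typed preferences, grouping by tab, and a simple show/hide rule.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional

# Hypothetical basic types an extension may use for a preference.
PREF_TYPES = {"string": str, "number": (int, float), "boolean": bool, "colour": str}


@dataclass
class Preference:
    name: str
    tab: str          # an existing tab, or a new one created on first use
    kind: str         # one of the keys of PREF_TYPES
    default: Any
    # Optional rule: given the user's current values, should this be shown?
    visible_if: Optional[Callable[[Dict[str, Any]], bool]] = None


class PreferenceRegistry:
    """Collects preferences registered by extensions (illustrative only)."""

    def __init__(self) -> None:
        self._prefs: List[Preference] = []

    def register(self, pref: Preference) -> None:
        if pref.kind not in PREF_TYPES:
            raise ValueError(f"unknown preference type: {pref.kind}")
        if not isinstance(pref.default, PREF_TYPES[pref.kind]):
            raise TypeError(f"default for {pref.name} does not match {pref.kind}")
        self._prefs.append(pref)

    def tabs(self, values: Dict[str, Any]) -> Dict[str, List[str]]:
        """Return the visible preference names per tab, in registration order."""
        layout: Dict[str, List[str]] = {}
        for pref in self._prefs:
            if pref.visible_if is None or pref.visible_if(values):
                layout.setdefault(pref.tab, []).append(pref.name)
        return layout
```

An extension would call `register()` once per setting; the preferences page would then call `tabs()` with the user's current values to decide what to render.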

IM and presence
Use Instant Messaging (IM) protocols to provide “presence” -- say whether a user is online in their chat client. User preference sets user ID, protocol. Where possible, provide links to launch IM client to chat with a user. This should be an extension.
 * See also ICQ_extension

Offline reader
An offline reader of places such as Wikipedia. This can include: software to browse the offline version, which enables an updated version of the page (or a full-size image) to be fetched; software to create a well-compressed dump of a certain size/with a selection of articles; decisions on what to do when a link leads to an article which is not on the CD or DVD.

Unified javascript toolkit
Wikipedia editors and administrators currently have a massive library of JavaScript bits and pieces to speed up repetitive work. However, it isn't easy for an admin to find these pieces and activate them in their account. Much code is copied rather than re-used between scripts. The idea is to manage the JavaScript library through the Wikimedia user interface itself: "the wiki is the code."
 * See also Wikipedia:WikiProject User scripts.

Video
No-install in-browser display of video (and audio?) clips for Wikimedia Commons, using reasonably common Java and/or Flash components. Needs to be able to 1) play or transparently pre-convert Ogg Theora videos, 2) avoid use of patent-encumbered formats. Consider integration of Fluendo's Cortado player applet as a starting point.

Upload form improvements
Lots of room for changes here: Make a beefed up JavaScript form that allows multi-file upload and shows progress during upload (all ideally maximally backwards compatible). Make it possible to preview the upload summary, and add a nifty edit toolbar as for regular edits. Make it easy to add categories -- lazy loading AJAX category tree? Being able to run the upload form as a pop-up without the surrounding skin might also be useful for integration with edit. It would be useful to point the form at a folder and have it upload all files within, presenting a summary list before confirmation. There should also be an option to specify a base filename plus incremental numbering, or to keep the file names of the originals.
 * Afaik, uploads can't be stored in any desired folder without php code hacking right now (only hashed dirs or just /images dir). Talking about the edit integration it would be useful to say that FCKeditor has some js api for uploads, but afaik it's not fully integrated with MW yet. As for the category tree, imho Duesentrieb Tool is the most suitable candidate.

Unified Recent Changes
Show information from multiple wikis (e.g., many Wikimedia sites, or many Wikia sites) on one page. Besides Special:Recentchanges, logical candidates would be Special:Contributions and Special:Watchlist; the latter two depend on a single identity being recognized across all wikis.
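The core of a unified Recent Changes page is merging several per-wiki change lists into one stream ordered by time. A minimal Python sketch, assuming each wiki can already supply its changes newest-first; the `Change` record and its fields are hypothetical, not an existing MediaWiki structure:

```python
import heapq
import itertools
from typing import Iterable, List, NamedTuple


class Change(NamedTuple):
    timestamp: str  # ISO 8601 in UTC, so string order equals time order
    wiki: str
    title: str
    user: str


def merge_recent_changes(feeds: Iterable[Iterable[Change]], limit: int = 50) -> List[Change]:
    """Merge per-wiki feeds (each already sorted newest-first) into one list."""
    merged = heapq.merge(*feeds, key=lambda c: c.timestamp, reverse=True)
    # heapq.merge is lazy, so only `limit` entries are ever pulled from the feeds.
    return list(itertools.islice(merged, limit))
```

Because the merge is lazy, the cost depends on the page size rather than on the total number of changes across all wikis.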

External Editor helper app
MediaWiki has support for external applications for editing images, sound files, and wiki pages -- see the linked page for details. This is done using a helper application which negotiates between MediaWiki and any external applications. Currently there is only a reference implementation of this helper application in Perl/GTK. A nice, easy to install, graphical cross-platform (Qt? wxWindows?) application would be useful.

Improvements to Recent Changes
There is already an enhanced, more JavaScripty Special:Recentchanges which any logged in user can activate in their prefs. The enhanced version is a playground where you could try out all kinds of nifty UI paradigms, from lazy loading of diff previews to personalized client-side filters for highlighting or hiding changes.

Performance optimization
Use iframes and/or JavaScript and divs to reduce traffic, increase speed, reduce server work, and preserve the existing caching scheme:
 * Avoid unchanged data reload;
 * Dynamic download of page changes;
 * Client-side XHTML transformation;
 * Client-side password encryption;
 * JS view/edit switching without requests;

XHTML2Wiki conversion
An uploaded HTML page (or any other uploaded document, or simply a URL) is automatically parsed and saved in the database as wikitext.
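To give a feel for the conversion step, here is a minimal Python sketch using the standard library's `html.parser`. It handles only a small, arbitrarily chosen subset of tags (headings, bold, italic, links, paragraphs, list items) and silently drops everything else; a real converter would need to cover tables, images, nested lists and much more. The class and function names are made up for this example.

```python
from html.parser import HTMLParser


class HtmlToWikitext(HTMLParser):
    """Converts a small subset of HTML to wikitext; unknown tags are dropped."""

    INLINE = {"b": "'''", "strong": "'''", "i": "''", "em": "''"}
    HEADINGS = {"h1": "=", "h2": "==", "h3": "===", "h4": "===="}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.href = None  # set while inside an <a href=...> element

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.out.append("\n" + self.HEADINGS[tag] + " ")
        elif tag == "li":
            self.out.append("\n* ")
        elif tag == "p":
            self.out.append("\n\n")
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.href = href
                self.out.append("[" + href + " ")

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag in self.HEADINGS:
            self.out.append(" " + self.HEADINGS[tag] + "\n")
        elif tag == "a" and self.href is not None:
            self.out.append("]")
            self.href = None

    def handle_data(self, data):
        self.out.append(data)


def html_to_wikitext(html: str) -> str:
    parser = HtmlToWikitext()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()
```

For example, `html_to_wikitext("<ul><li>one</li><li>two</li></ul>")` yields a two-item wikitext bullet list.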

Wiktionary Page/Form Templates
Allow wiki pages to follow a predefined structure, which Wiktionary (IMHO) badly, badly needs. The user gets an HTML form and fills in the concrete fields, instead of one big text editor field containing a big mess of wiki template code.

Client API
Removing this since a number of people are actively working on such already.

Yadis/OpenID/LID/i-names
(or a subset). A single sign-on solution between MediaWiki-using organizations (and for other sites). For example, a Wikipedia fr: user could edit Memory Alpha with their Wikipedia fr: login info. A Livejournal blogger could use their blog's URL to identify themselves on any MediaWiki install. May just require integrating one or two of the existing patches, e.g. those currently online at rss-extensions.org or iiw.windley.com


 * Two issues - one, an SUL project is underway already, and this isn't it, and two, this would require significant amounts of sysadmin work, but not that much actual work (so it's far, /far/ more suitable for a core developer, particularly because it would need a lot of political capital when we force some people to change their user accounts' names). Post-SUL unified sign-on could be a possibility, but that would probably need to wait until next year.
 * James F. (talk) 11:51, 26 April 2006 (UTC)


 * OpenID is orthogonal to our internal account unification. --brion 18:34, 27 April 2006 (UTC)


 * There's no reason for people to change their account names. Also, the patches above are consumer patches -- they let others log in to MediaWiki sites with OpenID identities. We need to also have producer status -- letting MediaWiki users log in to other sites (wiki sites, or other sites) with their MediaWiki accounts. --Evan 20:07, 12 May 2006 (UTC)

I18N re-architect.
Some developers have talked about re-writing the internationalization (i18n) section of MediaWiki to use different language files (e.g., gettext or XML).

I18n search index
MediaWiki search does not work well for languages written without spaces between words, such as CJK languages. An index system is needed to improve that. Further, search could be tolerant of spelling variants (see the Alemannic Wikipedia). Google search is a model, but it has the irritating habit of proposing alternatives with nothing behind them.
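One common approach for text without word boundaries is to index overlapping character bigrams instead of words. The Python sketch below illustrates that technique only; it is not how MediaWiki search works, the Unicode ranges are deliberately incomplete, and the function names are made up for this example.

```python
import re
from collections import defaultdict

# Hiragana, Katakana and CJK Unified Ideographs (an incomplete selection).
_CJK = "\u3040-\u30ff\u4e00-\u9fff"
TOKEN = re.compile("([" + _CJK + "]+)|([^\\W" + _CJK + "]+)")


def tokenize(text):
    """Split space-delimited text into words and CJK runs into bigrams."""
    tokens = []
    for match in TOKEN.finditer(text.lower()):
        cjk, word = match.groups()
        if word:
            tokens.append(word)
        elif len(cjk) == 1:
            tokens.append(cjk)
        else:
            tokens.extend(cjk[i:i + 2] for i in range(len(cjk) - 1))
    return tokens


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every token of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], ()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result
```

Bigram indexing trades precision for recall: a query for 京都 also matches a page containing 東京都, because the bigram occurs inside the longer name. A production index would need ranking or dictionary-based segmentation on top.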

RDF and Semantic Wiki
Use wiki technique to create machine-readable semantic data (such as RDF). Examples: geo-data, relationships, identifiers. Some extensions (like experimental Semantic MediaWiki or RDF extension in production on e.g. Wikitravel) already exist to do this, and this might remain in extensions. See also microformats entry, below.

Heuristics for vandalism
Some heuristics for vandalism already exist for the IRC notification module. The next step would be using those heuristics internally in the Web app, perhaps flagging suspicious edits, or workflow (let admins “claim” a bad edit to work on). Extended heuristics may use Bayesian filters. This needs clean hooks for gathering info and clean hooks for tagging things, then making the tagged info available, and then tool-building on top, like SpamAssassin's rules engine.
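A rules engine in the SpamAssassin style can be sketched as a list of named rules, each contributing a score, with an edit flagged when the total crosses a threshold. The Python sketch below is illustrative only: the rules, scores and threshold are invented for the example and are not the heuristics used by the IRC notification module.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Edit:
    old_text: str
    new_text: str
    is_anonymous: bool


@dataclass
class Rule:
    name: str
    score: float
    matches: Callable[[Edit], bool]


# The same character repeated ten or more times in a row.
REPEAT = re.compile(r"(.)\1{9,}")

# Invented example rules and scores; a real set would be tuned on real data.
RULES = [
    Rule("large_removal", 2.5,
         lambda e: len(e.old_text) > 200 and len(e.new_text) < 0.1 * len(e.old_text)),
    Rule("repeated_chars", 2.0,
         lambda e: bool(REPEAT.search(e.new_text)) and not REPEAT.search(e.old_text)),
    Rule("anonymous", 0.5, lambda e: e.is_anonymous),
]


def score_edit(edit: Edit, rules: List[Rule] = RULES) -> Tuple[float, List[str]]:
    """Return the total score and the names of the rules that fired."""
    fired = [rule for rule in rules if rule.matches(edit)]
    return sum(rule.score for rule in fired), [rule.name for rule in fired]


def is_suspicious(edit: Edit, threshold: float = 3.0) -> bool:
    return score_edit(edit)[0] >= threshold
```

Keeping the fired rule names alongside the score is what makes tagging possible: the tags can be stored with the edit and shown to whoever reviews it.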

Article quality rating analysis
Work out ways of combining ratings of different versions of articles from different sources to estimate the "true" quality profile of an article as well as possible, whilst rejecting noise and outliers, and resisting attempts to game or spam the ratings system. You may need to collect some real data somehow.
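As a starting point only, one simple combination rule is to count each rater once (their rating of the most recent revision they rated) and then take a trimmed mean, which discards the most extreme scores at both ends. The Python sketch below assumes ratings arrive as `(rater, revision_id, score)` tuples with increasing revision ids; that data layout and the 25% trim are assumptions for the example, not part of the proposal.

```python
from typing import Iterable, Tuple


def robust_quality(ratings: Iterable[Tuple[str, int, float]], trim: float = 0.25) -> float:
    """Trimmed mean over one rating per rater (the latest revision they rated)."""
    latest = {}
    for rater, revision_id, score in ratings:
        # Repeated votes by one rater collapse to a single entry.
        if rater not in latest or revision_id >= latest[rater][0]:
            latest[rater] = (revision_id, score)
    scores = sorted(score for _, score in latest.values())
    if not scores:
        raise ValueError("no ratings to combine")
    k = int(len(scores) * trim)          # number of scores cut from each end
    kept = scores[k:len(scores) - k] if len(scores) > 2 * k else scores
    return sum(kept) / len(kept)
```

This resists a single account voting many times and a few extreme votes, but not many colluding accounts; that harder case is where the real research in this project would lie.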

Evaluate PostgreSQL as an alternative database for MediaWiki
There are three parts to this:
 * get the existing PostgreSQL support working properly again
 * benchmark it on heavy simulated loads
 * study the scalability of using Postgres' built-in replication support for large installations such as Wikipedia
 * see: http://pgfoundry.org/projects/wikipedia/

Statistics
Integrate statistics into MediaWiki. These include user statistics, characteristics of articles, and graphs of their development. Most tools are currently external and do not work together. Yet.

Reports
The English Wikipedia has a variety of useful reports which editors use to zero in on problem articles. There is an existing Perl code base for some of these (see Toolserver/Reports) but it needs bug-fixes, integration with the Toolserver (or other platform, so they can be run automatically), the possibility for editors to mark false positives and "dealt-with" articles, improvements based on editor suggestions, and expansion to cover more reports.

Permissions
Expand the existing permissions model to allow fine-grained permissions based on namespaces, prefixes, etc.
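One way to picture fine-grained permissions is as a list of grants, each scoped by group, action, and optionally a namespace and title prefix, where the most specific matching grant decides. The Python sketch below is a hypothetical model, not MediaWiki's permission system; the `Grant` record, the specificity ordering and the tie-breaking rule (an allow wins among equally specific grants) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Grant:
    group: str
    action: str
    namespace: Optional[str] = None  # None means "any namespace"
    prefix: str = ""                 # title prefix; "" matches every title
    allow: bool = True


def is_allowed(groups: Set[str], action: str, namespace: str, title: str,
               grants: Iterable[Grant], default: bool = False) -> bool:
    """Decide by the most specific matching grant; fall back to `default`."""
    best = None  # (specificity, allow)
    for grant in grants:
        if grant.group not in groups or grant.action != action:
            continue
        if grant.namespace is not None and grant.namespace != namespace:
            continue
        if not title.startswith(grant.prefix):
            continue
        # A namespace match outranks any prefix length; longer prefixes outrank shorter.
        specificity = (grant.namespace is not None, len(grant.prefix))
        if (best is None or specificity > best[0]
                or (specificity == best[0] and grant.allow)):
            best = (specificity, grant.allow)
    return default if best is None else best[1]
```

With this model a wiki could, for instance, let ordinary users edit everywhere except the Template namespace, while still opening a sandbox prefix inside it.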

Export

 * Parse Wiki markup into XML, and then do fun things to that. Possibly export to other formats, maybe using XSL. See Magnus Manske's tool.
 * Export to various (possibly non-XML) formats: PDF, LaTeX, DocBook, OpenOffice, Office, etc. This has already been done; someone needs to integrate it into MediaWiki.
 *  DocBook BackEnd - add a DocBook based backend, so that contents can be saved natively in DocBook format in order to be directly available in a large number of arbitrary output formats, see RFE#4073 and Magnus Manske's tool

Message system
Send alerts over ICQ (or something similar) to users when something happens in Wikipedia, such as a change to a watched article. Could tie into existing e-mail notification.

Structured discussion
Add support for structured discussion pages. Option for discussion pages to be more like message boards with threads. The current discussion format is less than ideal (users can modify others' comments, the editing window is cluttered, etc). This could also include support for embedding polls in discussion pages, voting on proposals, etc. Basically, create a minimal message board system that is tailored for use in a wiki environment (see LiquidThreads for some ideas in that direction).

Media repository (Commons) integration
MediaWiki has basic support for a central media repository; this is to allow any Wikimedia wiki to transparently use files from the Wikimedia Commons, a free media archive containing over 500,000 files. However, integration with the other Wikimedia wikis could be improved in several ways:
 * Compute hashes of uploaded files and compare them for dupe checking both against the local and the repository wiki.
 * Track usage of files across wikis (currently this is done using an external tool)
 * If files are in use, show a warning message in the repository.
 * Support logging deletions from the repository to all using wikis.
 * Support uploading directly from a local wiki to the repository (may depend on single login work to be completed).
 * Support one-click "copy to Commons" -- copy an image from the local wiki to the commons wiki.
 * Support one-click "move to Commons" -- copy to Commons, and then delete the local version.


 * Isn't there somebody working on this already? --brion 02:39, 26 April 2006 (UTC)


 * With the exception of the check usage tool mentioned above, I'm not aware of any work in that area. If you're thinking of Magnus' tool, it only generates wikitext for copy and pasting from one wiki to the other.--Eloquence 09:44, 26 April 2006 (UTC)
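The duplicate check in the first list item above can be sketched as follows in Python. The choice of SHA-1 and the shape of the two indexes (a mapping from digest to file names, one for the local wiki and one for the repository) are assumptions made for this example.

```python
import hashlib
from typing import BinaryIO, Dict, List, Tuple


def content_hash(stream: BinaryIO, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large uploads need not fit in memory."""
    digest = hashlib.sha1()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(digest: str,
                    local_index: Dict[str, List[str]],
                    repo_index: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (wiki, filename) pairs whose stored content has the same hash."""
    matches = [("local", name) for name in local_index.get(digest, [])]
    matches += [("repository", name) for name in repo_index.get(digest, [])]
    return matches
```

The upload form would compute the hash once, look it up in both indexes, and warn the uploader before saving if any match is found. Note that this only catches byte-identical files, not re-encoded or resized copies.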

Storage through diffs
MediaWiki currently stores the full text of each revision of an article. This storage method is inefficient. An incremental storage method based on text diffs would be much more efficient. However, performance must be taken into account. It is probably a good idea to store the full text of the latest revision as well as n incremental revisions, just to keep the load down. For sites like Wikipedia, incremental revision storage will significantly reduce the size of the database. --The preceding unsigned comment was added by IndyGreg (talk • contribs) 18:05, 3 May 2006 (UTC)


 * I was under the impression that the software already did this. Relevant link? Mdd4696 21:36, 3 May 2006 (UTC)
 * MediaWiki can store old revisions in compressed format (using gzip), but does not currently store this data using text-based diffs. --IndyGreg 23:28, 3 May 2006 (UTC)
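To make the diff-based proposal concrete, here is a minimal Python sketch that keeps the latest revision in full and stores every older revision as a line-based reverse delta from its successor. It uses the standard library's `difflib`; the `RevisionStore` class and the delta format are invented for this example, and it omits anything that bounds the length of the delta chain, so reading very old revisions gets progressively slower.

```python
import difflib
from typing import List


def make_delta(source: List[str], target: List[str]) -> list:
    """Describe how to rebuild `target` using line ranges copied from `source`."""
    ops = []
    matcher = difflib.SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete" needs no op: those source lines are simply not copied.
    return ops


def apply_delta(source: List[str], delta: list) -> List[str]:
    out: List[str] = []
    for op in delta:
        if op[0] == "copy":
            out.extend(source[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out


class RevisionStore:
    """Latest revision in full; older ones as reverse deltas (illustrative)."""

    def __init__(self) -> None:
        self.latest = None
        self.reverse_deltas = []  # entry i rebuilds revision i from revision i + 1

    def save(self, text: str) -> None:
        new = text.splitlines(keepends=True)
        if self.latest is not None:
            self.reverse_deltas.append(make_delta(new, self.latest))
        self.latest = new

    def get(self, revision: int) -> str:
        newest = len(self.reverse_deltas)
        if self.latest is None or not 0 <= revision <= newest:
            raise IndexError("no such revision")
        lines = self.latest
        # Walk backwards from the newest revision to the requested one.
        for i in range(newest - 1, revision - 1, -1):
            lines = apply_delta(lines, self.reverse_deltas[i])
        return "".join(lines)
```

Storing reverse rather than forward deltas keeps the common case fast: the current revision is read directly, and only history views pay for reconstruction.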