Google Summer of Code/2011

The Wikimedia Foundation is again participating in the Google Summer of Code in 2011. If you are interested in being a mentor or a student, please add your name below, and add your ideas for potential projects.


 * Timeline
 * We will be asking students to fill out the Summer of Code 2011/Application template.
 * Useful manuals for students and mentors.

Mentor signup

 * Reedy - Potentially. Depending on time, and project interest (API, core MW, possibly some extensions. Not GUI centric stuff..)
 * Siebrand
 * Trevor Parscal
 * Michael Dale
 * Ariel Glenn (xml dumps!)
 * James Salsman
 * Brandon Harris - Potentially. Depends on project and interest.
 * Yaron Koren (Semantic MediaWiki)
 * Jeroen De Dauw (Semantic MediaWiki)
 * Benedikt Kämpgen (Semantic MediaWiki)
 * Markus Krötzsch (Semantic MediaWiki)
 * Denny Vrandecic (Semantic MediaWiki)
 * Jesse Wang (Semantic MediaWiki)
 * Daniel Herzig (Semantic MediaWiki)
 * Arthur Richards (Particularly offline, mobile, CiviCRM or general back-end related projects but open to most!)
 * Ryan Lane - Potentially, if it relates to the test/dev architecture or the OpenStackManager extension
 * Jack Phoenix - Potentially, for things related to SocialProfile extension and other social tools
 * JanPaul123 - Potentially, if it relates to (in-line) editing, or other GUI stuff.
 * This can be you!

Student signup

 * 1) Ashish Mukherjee
 * 2) Ankit Garg
 * 3) Akshay Goel
 * 4) Ashish Mittal
 * 5) Aishraj Dahal
 * 6) Neeraj Agarwal
 * 7) Boopathi Rajaa
 * 8) Daniel Bell
 * 9) Michael White (perhaps)
 * 10) David Stolfo
 * 11) Eric Zhu
 * 12) Brittany Wills
 * 13) Yuvi Panda -- XML dumps, performance
 * 14) Balanivash
 * 15) Mayank Singh
 * 16) A-M Horcher
 * 17) Lei Jiang
 * 18) Abhinav Sikri
 * 19) Shashi Gowda
 * 20) Kartik Mandaville
 * 21) Avner Maiberg
 * 22) Vlasenko Yevhenii
 * 23) Peng Wan
 * 24) Giridhar Prasanna
 * 25) Abhijit Kane
 * 26) Sakonard W
 * 27) Adarsh
 * 28) Aigerim K
 * 29) Sagie Maoz
 * 30) Akshay Agarwal -- AJAXifying logins
 * 31) Aayush Goel
 * 32) Salil -- API sandbox environment
 * 33) Lucas Tadeu Teixeira -- Mobile Website Rewriting
 * 34) Anirudh S
 * 35) drake
 * 36) Shrey (Kirti)
 * 37) Justin Drake
 * 38) This can be you!

This is just an informal signup. You must submit your proposal to our official Google Summer of Code page by 19:00 UTC on April 8th to be considered.

Before submitting your proposal, try chatting with us in #mediawiki on Freenode IRC. And check out how to become a MediaWiki hacker so you can jump in right away!

Project ideas
Below are some ideas for projects for this year's GSoC. Projects can be suggested by both potential mentors and potential students (or, for that matter, by anyone else). For inspiration, you can also check out the project ideas from 2010, and the list of past projects.

New extensions

 * Porting WP 1.0 Bot from a suite of tools written in Perl (that live on the toolserver) to a MediaWiki extension and expanding its feature set for improved, easier and more accessible offline content creation. Arthur Richards is especially interested in mentoring this: see his wikitech-l email.
 * Simple language Wikipedias in languages other than English -- this would mostly involve an extension changing the way namespaces (or maybe subpages) and watchlists work together, to allow people to easily watch simple versions of the articles already on their watchlists.
 * Audio upload with rtmplite and Adobe Flash/Flex and/or Gnash. Checking out rainbow is also recommended.
 * Most popular related articles
 * Inline Editing for MW/SMW: An extension to MW/SMW, e.g., based on Aloha Editor, RDFa, and SMWWriter. Expected results: Annotations from a single wiki page or displayed in Inline Queries can be edited without going into edit mode, visiting queried pages, or using wiki text. Short explanation: In this work we seek to make it easier for users to edit semantic annotations. Prerequisites: PHP, JavaScript
 * See Jan Paul's existing work
 * Write and implement cite templates in PHP extension
 * Saves something like 20s or more off parse time for large articles; low-hanging fruit
 * Fill-in-the-blanks

Existing extensions

 * Suggestions for extensions to be merged into core
 * Quite a few extensions need rewrites before going into core
 * Localize Captcha in ConfirmEdit (bug 5309, "Localize captcha images" and bug 14230, "Add a button to request a new fancy captcha (code)")
 * Extension:Gadgets could benefit from loading gadgets from one central wiki. This must of course be done via ResourceLoader to keep the number of HTTP requests to a minimum.
 * Extend Extension:Gadgets to support global gadgets
 * This is not an easy project, and there are tentative plans to get this done by WMF devs anyway --Catrope
 * TimedMediaHandler Road map features (transcoding non-free codec uploads, improved mobile support for wikimedia video, etc.)
 * Extension:Quiz with assessment content in Moodle's GIFT
 * Improve Maths handling. For example:
 * Ripping it out and turning it into a proper extension.
 * Improve cache storage/image storage.
 * Convert to a PHP-based system -> remove dependencies where possible.
 * All open maths bugs
 * Extension:OpenStackManager
 * Add support for the OpenStack API
 * Add console support
 * Add ajaxterm support (OpenStack already supports this)
 * Add support to OpenStack for Guacamole, and add support to OpenStackManager for this
 * Make a JavaScript interface that makes actions easier
 * Extension:Semantic MediaWiki
 * you can find further ideas for Semantic MediaWiki here and below
 * Extension:ArticleEmblems
 * Needs to be rewritten following comments at .
 * Extension:SocialProfile
 * add support for non-MySQL DBMS
 * PostgreSQL support (or rather, the lack of it) has been filed as bug #27732. Someone needs to test PostgreSQL support and maybe fix the schemas etc. There's also no Oracle support as far as I'm concerned (then again, support for Oracle in core MW isn't perfect yet, I think). --Jack Phoenix (Contact) 16:32, 26 March 2011 (UTC)
 * write UserStatus feature (Twitter/Facebook-like short "status updates" on user profile pages)
 * clean up UserImages feature and integrate it into core SocialProfile
 * I have some code related to this somewhere, just need to find it... --Jack Phoenix (Contact) 13:22, 25 March 2011 (UTC)
 * review SocialProfile for any and all security issues and fix whatever issues are found
 * Extension:Video
 * write the backend code for supporting 5min.com, Blip.tv, Nicovideo and Tangler.com
 * write the backend code for deleting videos properly from the database
 * I haven't yet released the cleaned-up code for this extension, but if you're interested in this project, just let me know and I'll be happy to publish it. --Jack Phoenix (Contact) 13:22, 25 March 2011 (UTC)
 * Other social tools (i.e. FanBoxes, PictureGame, PollNY, QuizGame...)
 * rewrite the upload form to work with MW 1.16+ (once done it can be copied to all other extensions; the only difference is the upload form class name IIRC)
 * support ResourceLoader
 * security audit
 * Again, these are unreleased extensions for the time being, but if someone is interested in playing around with these, I'll be happy to publish the source code. --Jack Phoenix (Contact) 13:22, 25 March 2011 (UTC)
 * Extension:ImageTagging
 * actually make it work (on all modern browsers; it had issues with IE7 in the past and those issues were never resolved properly IIRC)

Semantic MediaWiki-based extensions

 * Semantic Schemas - a potential new extension, meant to work with Semantic MediaWiki, that would centralize all information about a "class" (category, templates, properties, etc.) in a single wiki page. -Yaron Koren
 * Improve considerably and turn the code for Shortipedia into reusable extensions, and further work on extensions needed to use MediaWiki easily as a data wiki
 * SMW as Linked Data Browser: Component: A new extension, "Linked Data Browser" or an extension to Shortipedia. Expected results: SMW can be used to browse and selectively gather linked data. Information residing in the wiki is both more cross-linked and connected to external information sources, resulting in better possibilities for managing knowledge. Short explanation: More and more web pages offer their information not only in a human-readable but also in a machine-readable form. However, although this opens up the possibility of enhanced browsing for information, Semantic Web browsers have not yet reached their full potential (try for yourself using the LOD browser switch and Tim Berners-Lee's FOAF file). If browsing were possible directly from SMW, links between content from the wiki and information from the Web could be automatically identified (e.g., using simple matching rules, or through reasoning and machine learning). New information can first be checked for correctness and relevance, and possibly be curated, then partly gathered in the wiki. Search and gathering of information inside SMW would be enhanced. Prerequisites: PHP, JavaScript
 * Improve support for Semantic Internal Objects in other SMW extensions like Forms and Graphs and popular SMW extension bundles.

Scripts and other utilities

 * Gadget to fall back to a more compatible skin when an incompatible screen size geometry is detected (e.g., the new vector skin's search box disables the tabbed page menu on small screens)
 * Gadget to disable parts or all of the editing toolbar when user agent version suggests incompatibilities
 * Enable sitemaps for Commons with new Google Image Search specific extensions (such as geolocation and license)
 * Implement "Space Efficient Algorithms for Ordered Tree Comparison" on xml dumps and measure their performance on systems with different sizes of L1 cache RAM to support offline editing (for an updated version of the same paper, please see Wang, L. and Zhang, K. (2008) Algorithmica 51:283–97.)
 * moving some things that don't need to be MediaWiki-specific into the libraries folder. Example: the way we read image metadata
 * Implementing pre-commit checks in code repositories that would automatically look for security vulnerabilities, coding convention violations, broken code, etc., perhaps with a web interface to facilitate the process.
 * Nimish: "Basically the idea is to have something check for simple patterns in the code, and if any of them come up, send some warning email, like "hey, you might've done something silly. can you verify that you've 1) fixed this or 2) totally know what you're doing" we've had them at previous places I've worked, and they really do help."
 * RoanKattouw: "Platonides has a check-vars script that checks for some style things, like missing globals and unused vars or calling deprecated things, etc. Not sure what the full feature set is. (But it's not a post-commit hook)"
 * Reedy: "Certainly style, globals etc is fairly easy to do programmatically, "security" unless very explicit issues, must be all but impossible?"
 * RoanKattouw: "It can be done run-time with taint, & MW has some sort of taint support"
 * awjr (Arthur Richards): "those tools should be as RCS-agnostic as possible" (work with many modern version/source control systems)
 * awjr: you could likely use some fancy regexes and algorithms to look for patterns that would lead to security vulnerabilities
 * Requests for comment/Extension release management: Track compatibility of released extensions with core MediaWiki releases, and provide this information via special pages and the API
 * A Drupal/CiviCRM module to audit/reconcile donor contributions in our CiviCRM database with those recorded by our payment providers (PayPal and PayflowPro).
 * History visualisation, see http://en.wikipedia.org/wiki/Wikipedia:WikiProject_User_scripts/Requests#History_visualisation
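
The pre-commit check idea discussed above (scan changed code for simple patterns and send a warning when one shows up) could be sketched roughly as follows. This is only an illustration: the rule names and regular expressions in `RULES` are hypothetical and were picked for the example, not taken from any existing MediaWiki tool. The checker works on plain text, so it stays independent of the version control system, as suggested in the discussion.

```python
import re
from typing import List, NamedTuple


class Finding(NamedTuple):
    line_no: int  # 1-based line number in the checked source
    rule: str     # name of the rule that matched
    text: str     # the offending line, stripped of surrounding whitespace


# Hypothetical rules, for illustration only. A real rule set would come
# from the project's coding conventions and security guidelines.
RULES = {
    "eval-call": re.compile(r"\beval\s*\("),
    "raw-request-in-query": re.compile(r"query\s*\(.*\$_(GET|POST|REQUEST)"),
    "var-dump-left-in": re.compile(r"\bvar_dump\s*\("),
}


def check_source(source: str) -> List[Finding]:
    """Return one Finding per (line, rule) match in the given source text."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(line):
                findings.append(Finding(line_no, rule, line.strip()))
    return findings


def format_warning(path: str, findings: List[Finding]) -> str:
    """Build the body of a warning message for the committer."""
    lines = [f"Possible issues in {path}; please verify:"]
    for f in findings:
        lines.append(f"  line {f.line_no} [{f.rule}]: {f.text}")
    return "\n".join(lines)


# Made-up PHP snippet used as input for the example.
SAMPLE = (
    "<?php\n"
    '$res = $db->query("SELECT * FROM user WHERE id=" . $_GET["id"]);\n'
    "var_dump($res);\n"
    'echo "ok";\n'
)

print(format_warning("Example.php", check_source(SAMPLE)))
```

A hook script for any given version control system would only need to pass the text of each changed file to `check_source` and mail or print the result. As noted in the discussion, pattern matching of this kind catches style problems and very explicit mistakes; it is not a substitute for a security review.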

Unicode conversion of older standards
Automating the conversion of Unicode characters from the older standard to the new standard to make wikis properly searchable. Would be great to work on this one. - Neeraj Agarwal
 * Don't we already normalize deprecated unicode code points? Or do you mean something else? Bawolff 23:08, 24 February 2011 (UTC)
 * IIRC, we do, but only when the page is touched (e.g., a null edit or an edit), and on larger projects (e.g., en.wikip) there may be pages that haven't been touched for a few years. Peachey88 04:22, 1 March 2011 (UTC)
 * The key thing is that it has to be two-way. We keep our data in the latest version of Unicode, but many readers are still stuck with the old code. Just as relevant is that they search Google with the old Unicode and are richly served, but not by us. Having the latest Unicode and being able to serve it conforming to an old standard will gain us much more traffic. GerardM 22:24, 24 March 2011 (UTC)
 * That only works if we can reliably determine which version of the unicode code points the client supports (which I don't think we can). If we universally send to the older code points, we might as well just normalize to the old code points. Bawolff 13:51, 25 March 2011 (UTC)
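
One way to picture the search side of this discussion is to normalize both the stored text and the incoming query to the same form before comparing, so that a reader typing an older code point still finds the page. The sketch below uses Python's standard `unicodedata` module and Unicode Normalization Form C. The function names are made up for the example, and NFC only covers canonical equivalences, so it may not handle every old-versus-new encoding difference raised above; those might need a separate mapping table.

```python
import unicodedata


def normalize_for_search(text: str) -> str:
    """Fold text to NFC so canonically equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", text)


def query_matches(query: str, stored: str) -> bool:
    """Compare query and stored text after normalizing both sides."""
    return normalize_for_search(query) in normalize_for_search(stored)


# U+2126 OHM SIGN is canonically equivalent to U+03A9 GREEK CAPITAL LETTER
# OMEGA, so a query typed with the older sign matches text stored with omega.
stored_title = "10 k\u03a9 resistor"
old_style_query = "k\u2126"

print(query_matches(old_style_query, stored_title))
```

Normalizing at comparison time avoids having to guess which code points a client supports, which is the difficulty raised in the last comment: the stored data stays in one form and only the matching step is made tolerant.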

MediaWiki core

 * Add some kind of GUI for sidebar customization that mortals can comprehend
 * I have some plans for sidebar-style messages that something like this would depend on, or conflict with.
 * Add some kind of GUI for toolbar customization
 * Dumps:
 * Profiling the XML dump process and reducing processing time.
 * This is probably too hard / too long Hashar 07:39, 25 March 2011 (UTC)
 * We're not talking about a huge refactoring, just looking at, say, Export.php and friends for things that can be sped up. -- ArielGlenn 06:48, 26 March 2011 (UTC)
 * Generate JSON output for dump files.
 * Dumps are just recovering now. Will it ever get used by September? I would postpone JSON to 2012. Hashar 07:39, 25 March 2011 (UTC)
 * The smaller projects get dumps out on a regular basis. Why wouldn't they use it?  -- ArielGlenn 06:48, 26 March 2011 (UTC)
 * Dbzip2 (parallel compression/decompression) for the dumps.
 * Dumps 2.0: redesign from the ground up (see the March 23 or so wikitech thread about this).
 * Rewrite Python dumps job management to run pieces on multiple hosts.
 * Image import -- there is something on BZ already.
 * Image export suitable for Wikimedia
 * AJAX Login that looks really good, is easy to use, integrates well with popular skins, and is secure.
 * Is AJAX appropriate for mobile device logins? A loaded ~500 MHz ARM mobile device can take long enough rendering AJAX Javascript to delay keypress echo. Is there any reason that HTTPS login form submission isn't always better?
 * I can't imagine doing a single ajax request when the user clicks a submit button would be that expensive. -bawolff
 * Take a look at Wikia's AJAXLogin extension, which has an existing jQuery port; may be faster to write from scratch, with the Wikia extension as a reference/spec implementation Sumanah 17:56, 23 March 2011 (UTC)
 * This looks like a good project: it is small yet has enough potential to learn things. One might want to add Selenium tests. Hashar 07:39, 25 March 2011 (UTC)
 * API sandbox environment (a la Flickr API Explorer, which gives a form interface to help learn the API)
 * Email notification rewrite to make the code prettier, and fix various bugs, especially the bug where all log events are considered to be page creations (14901; there are also various other related bugs)
 * This is actually a hard/long task for a newcomer to our code. The feature is hooked everywhere in the code and it is hard to test it reliably. Probably not suitable for a summer session. Hashar 07:39, 25 March 2011 (UTC)
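
For the "Generate JSON output for dump files" idea in the Dumps list above, a minimal sketch of the conversion might look like the following. It streams an XML dump and emits one JSON object per page. The sample document and the choice of fields (`title`, `id`, `text`) are simplifying assumptions for illustration; real dump files carry an XML namespace and more metadata, and the actual dump code is not written in Python this way.

```python
import io
import json
import xml.etree.ElementTree as ET
from typing import IO, Iterator


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def pages_as_json(stream: IO[bytes]) -> Iterator[str]:
    """Yield one JSON object (as a string) per <page> element in the stream."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        record = {"title": None, "id": None, "text": None}
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                record["title"] = child.text
            elif name == "id" and record["id"] is None:
                # First <id> in document order; in the sample layout
                # this is the page id, not the revision id.
                record["id"] = child.text
            elif name == "text":
                record["text"] = child.text
        yield json.dumps(record, ensure_ascii=False)
        elem.clear()  # release the page's subtree; dumps can be very large


# Simplified, made-up sample; not a complete dump file.
SAMPLE = b"""<mediawiki>
  <page>
    <title>Example</title>
    <id>1</id>
    <revision><id>10</id><text>Hello</text></revision>
  </page>
</mediawiki>"""

for line in pages_as_json(io.BytesIO(SAMPLE)):
    print(line)
```

Writing one JSON object per line keeps the output streamable, so a consumer does not have to load a whole dump into memory, which matters for the larger projects.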