Summer of Code 2011/Signup archive

The project selection phase is over; see the results on the main page

Mentor signup

Students who signed up

You need to enter your official application into the Google Summer of Code web app here.
If you do not do this, you will not be eligible!

  1. Ashish Mukherjee
  2. Ankit Garg
  3. Akshay Goel
  4. Ashish Mittal
  5. Aishraj Dahal
  6. Neeraj Agarwal
  7. Boopathi Rajaa
  8. Daniel Bell
  9. Michael White (perhaps)
  10. David Stolfo
  11. Eric Zhu
  12. Brittany Wills
  13. Yuvi Panda -- XML dumps, performance
  14. Balanivash
  15. Mayank Singh
  16. A-M Horcher
  17. Lei Jiang
  18. Abhinav Sikri
  19. Shashi Gowda
  20. Kartik Mandaville
  21. Avner Maiberg
  22. Vlasenko Yevhenii
  23. Peng Wan
  24. Giridhar Prasanna
  25. Abhijit Kane
  26. Sakonard W
  27. Adarsh
  28. Aigerim K
  29. Sagie Maoz
  30. Akshay Agarwal -- AJAXifying logins
  31. Aayush Goel
  32. Salil -- API sandbox environment
  33. Lucas Tadeu Teixeira -- Mobile Website Rewriting
  34. Anirudh S
  35. drake
  36. Shrey (Kirti)
  37. Justin Drake
  38. Salvatore Ingala
  39. Michael Costello -- networking and security
  40. Kapil Goyal
  41. Kanwar Bhajneek
  42. Pablo Castellano -- Content migration from Drupal to Mediawiki
  43. Devayon Das -- Semantic MediaWiki.
  44. Christophe Van Gysel -- Extending SocialProfile
  45. This can be you!

This is just an informal signup. You must have submitted your proposal to our official Google Summer of Code page by 19:00 UTC on April 8th to be considered.

Try chatting with us in #mediawiki on Freenode IRC. And check out how to become a MediaWiki hacker so you can jump in right away!

Project ideas

Below are some ideas for projects for this year's GSoC. Projects were suggested by potential mentors, potential students, and anyone else who was interested. You can also check out the project ideas from 2010 and the list of past projects.

New extensions

  • Porting WP 1.0 Bot from a suite of Perl tools (that live on the Toolserver) to a MediaWiki extension and expanding its feature set for improved, easier, and more accessible offline content creation. Arthur Richards is especially interested in mentoring this: see his wikitech-l email.
  • Simple-language Wikipedias in languages other than English -- this would mostly involve an extension changing the way namespaces (or maybe subpages) and watchlists work together, to allow people to easily watch simple versions of the articles already on their watchlists
  • Audio upload with rtmplite and Adobe Flash/Flex and/or Gnash; also worth checking out rainbow
  • Most popular related articles
  • Inline Editing for MW/SMW: An extension to MW/SMW, e.g., based on Aloha Editor [1], RDFa (w:en:RDFa), and SMWWriter [2]. Expected results: Annotations on a single wiki page or displayed in Inline Queries can be edited without going into edit mode, visiting the queried pages, or using wikitext. Short explanation: This project seeks to make it easier for users to edit semantic annotations. Prerequisites: PHP, JavaScript
  • Write and implement cite templates in PHP extension
    • Saves something like 20s or more off parse time for large articles; low-hanging fruit
  • Fill-in-the-blanks
  • Convert RefToolbar into an extension - it currently lives in en.wiki's Common.js, so it isn't really a "gadget" anymore and needs to be turned into a proper extension!

Existing extensions

Semantic MediaWiki-based extensions

Further ideas for Semantic MediaWiki

  • Semantic Schemas - a potential new extension, meant to work with Semantic MediaWiki, that would centralize all information about a "class" (category, templates, properties, etc.) in a single wiki page. -Yaron Koren
  • Considerably improve the code for Shortipedia, turn it into reusable extensions, and do further work on the extensions needed to use MediaWiki easily as a data wiki
  • SMW as Linked Data Browser: Component: A new extension, "Linked Data Browser", or an extension to Shortipedia. Expected results: SMW can be used to browse and selectively gather linked data [3]. Information residing in the wiki becomes both more cross-linked and better connected to external information sources, resulting in better possibilities for managing knowledge. Short explanation: More and more web pages offer their information not only in human-readable but also in machine-readable form (see [4]). However, although this opens up the possibility of enhanced browsing for information, Semantic Web browsers have not yet reached their full potential (try for yourself using the LOD browser switch and Tim Berners-Lee's FOAF file: [5]). If browsing were possible directly from SMW, links between content in the wiki and information from the Web could be identified automatically (e.g., using simple matching rules, or through reasoning and machine learning; a toy matching sketch follows this list). New information could first be checked for correctness and relevance, and possibly curated, then partly gathered into the wiki. Search and gathering of information inside SMW would be enhanced. Prerequisites: PHP, JavaScript
  • Improve support for Semantic Internal Objects in other SMW extensions like Forms and Graphs and popular SMW extension bundles.
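
The "simple matching rules" mentioned in the Linked Data Browser idea could start out as small as the following Python sketch, which fetches the machine-readable (JSON) form of an external Linked Data resource and checks whether any of its labels match a value already stored on a wiki page. The endpoint URL, the label heuristic, and the example URIs are illustrative assumptions only, not part of any existing extension.

    import json
    import urllib.request

    def fetch_linked_data(resource_url):
        """Fetch the JSON serialisation of a Linked Data resource (hypothetical endpoint)."""
        with urllib.request.urlopen(resource_url) as response:
            return json.loads(response.read().decode("utf-8"))

    def labels_of(graph, subject_uri):
        """Collect label-like literal values for one subject in the fetched graph."""
        labels = []
        for predicate, objects in graph.get(subject_uri, {}).items():
            if predicate.endswith("label"):
                labels.extend(obj.get("value", "") for obj in objects)
        return labels

    def wiki_value_matches(wiki_value, resource_url, subject_uri):
        """True if the value stored in the wiki equals one of the external labels."""
        graph = fetch_linked_data(resource_url)
        return any(wiki_value.strip().lower() == label.strip().lower()
                   for label in labels_of(graph, subject_uri))

    # Hypothetical usage:
    # wiki_value_matches("Berlin",
    #                    "http://dbpedia.org/data/Berlin.json",
    #                    "http://dbpedia.org/resource/Berlin")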

Scripts and other utilities

  • Gadget to fall back to a more compatible skin when an incompatible screen geometry is detected (e.g., the new Vector skin's search box disables the tabbed page menu on small screens)
  • Gadget to disable parts or all of the editing toolbar when user agent version suggests incompatibilities
  • Enable sitemaps for Commons with new Google Image Search specific extensions (such as geolocation and license)
  • Implement "Space Efficient Algorithms for Ordered Tree Comparison" on xml dumps and measure their performance on systems with different sizes of L1 cache RAM to support offline editing (for an updated version of the same paper, please see Wang, L. and Zhang, K. (2008) Algorithmica 51:283–97.)
  • Move some things that don't need to be MediaWiki-specific into the libraries folder; for example, the way we read image metadata
  • Implementing pre-commit checks in code repositories that would automagically look for security vulnerabilities, bad coding conventions, broken code, etc., perhaps with a web interface to facilitate the process (a toy scanner along these lines is sketched after this list).
    • Nimish: "Basically the idea is to have something check for simple patterns in the code, and if any of them come up, send some warning email, like "hey, you might've done something silly. can you verify that you've 1) fixed this or 2) totally know what you're doing" we've had them at previous places I've worked, and they really do help."
    • RoanKattouw: "Platonides has a check-vars script that checks for some style things, like missing globals and unused vars or calling deprecated things, etc. Not sure what the full feature set is. (But it's not a post-commit hook)"
    • Reedy: "Certainly style, globals etc is fairly easy to do programmatically, "security" unless very explicit issues, must be all but impossible?"
    • RoanKattouw: "It can be done run-time with taint, & MW has some sort of taint support"
    • awjr (Arthur Richards): "those tools should be as RCS-agnostic as possible" (work with many modern version/source control systems)
    • awjr: you could likely use some fancy regexes and algorithms to look for patterns that would lead to security vulnerabilities
  • Requests for comment/Extension release management: Track compatibility of released extensions with core MediaWiki releases, and provide this information via special pages and the API
  • A Drupal/CiviCRM module to audit/reconcile donor contributions in our CiviCRM database with those recorded by our payment providers (PayPal and PayflowPro).
  • History visualisation, see w:en:Wikipedia:WikiProject_User_scripts/Requests#History_visualisation
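
As a concrete (and deliberately naive) illustration of the pre-commit check idea above, here is a minimal Python sketch that scans the files passed to it for suspicious patterns and prints warnings. The patterns, messages, and file-argument convention are illustrative assumptions, not an existing MediaWiki or Wikimedia tool.

    #!/usr/bin/env python
    # Toy pre-commit scan: warn about lines matching simple "risky" patterns.
    # Patterns and messages are illustrative; a real hook would use a reviewed
    # list and could e-mail the committer instead of printing to the console.
    import re
    import sys

    SUSPICIOUS = [
        (re.compile(r'\beval\s*\('), "eval() call -- are you sure?"),
        (re.compile(r'\$_(GET|POST|REQUEST)\b'), "raw request variable -- is it escaped/validated?"),
        (re.compile(r'\bmysql_query\s*\('), "raw SQL call -- use the database abstraction layer?"),
        (re.compile(r'\bvar_dump\s*\('), "debugging leftover?"),
        (re.compile(r'[ \t]+$'), "trailing whitespace"),
    ]

    def scan(path):
        """Return a list of 'file:line: message' warnings for one file."""
        warnings = []
        with open(path, encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, 1):
                for pattern, message in SUSPICIOUS:
                    if pattern.search(line):
                        warnings.append("%s:%d: %s" % (path, lineno, message))
        return warnings

    if __name__ == "__main__":
        for filename in sys.argv[1:]:
            for warning in scan(filename):
                print(warning)
        # Exit 0 so the commit is only warned about, not blocked;
        # return non-zero here to make the hook reject the commit instead.
        sys.exit(0)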

Unicode conversion of older standards

Automating the conversion of Unicode characters from older standards to the current standard, to make wikis properly searchable. Would be great to work on this one. - Neeraj Agarwal

Don't we already normalize deprecated Unicode code points? Or do you mean something else? Bawolff 23:08, 24 February 2011 (UTC)
IIRC, we do, but only when the page is touched (e.g. a null edit or edit), and on larger projects (e.g. en.wikip) there may be pages that haven't been touched for a few years. Peachey88 04:22, 1 March 2011 (UTC)
The key thing is that it has to be two-way. We keep our data in the latest version of Unicode, but many readers are still stuck with the old code. Just as relevant is that they search Google with the old Unicode and are richly served, but not by us. Having the latest Unicode and being able to serve it conforming to an old standard will gain us much more traffic. GerardM 22:24, 24 March 2011 (UTC)
That only works if we can reliably determine which version of the Unicode code points the client supports (which I don't think we can). If we universally send the older code points, we might as well just normalize to the old code points. Bawolff 13:51, 25 March 2011 (UTC)
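
For reference, the normalization Bawolff mentions amounts to bringing stored text into a single canonical form; a minimal Python sketch of that step follows. Remapping code points that were deprecated or reassigned between Unicode versions, and the two-way serving GerardM describes, would additionally need explicit mapping tables on top of this. The sample string is illustrative only.

    import unicodedata

    def normalize_text(text):
        """Return the NFC (composed) form so that equivalent sequences compare equal."""
        return unicodedata.normalize("NFC", text)

    decomposed = "Wikipe\u0301dia"       # 'e' followed by a combining acute accent
    composed = normalize_text(decomposed)

    assert composed == "Wikip\u00e9dia"  # single precomposed 'é'
    assert decomposed != composed        # raw sequences differ, which breaks naive search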

MediaWiki core

  • bug 16943 – Add some kind of GUI for sidebar customization that mortals can comprehend
    • <dantman> I have some plans for sidebar-style messages that something like that would depend on, or conflict with.
  • Add some kind of GUI for toolbar customization
  • Dumps:
    • Profiling the XML dump process and reducing processing time.
      This is probably too hard / too long. Hashar 07:39, 25 March 2011 (UTC)
      We're not talking about a huge refactoring, just looking at, say, Export.php and friends for things that can be sped up. -- ArielGlenn 06:48, 26 March 2011 (UTC)
    • Generate JSON output for dump files.
      Dumps are just recovering now. Will it ever get used by September? I would postpone JSON to 2012. Hashar 07:39, 25 March 2011 (UTC)
      The smaller projects get dumps out on a regular basis. Why wouldn't they use it? -- ArielGlenn 06:48, 26 March 2011 (UTC)
    • Dbzip2 (parallel compression/decompression) for the dumps.
    • Dumps 2.0: redesign from the ground up (see the March 23 or so wikitech thread about this).
    • Rewrite the Python dumps job management to run pieces on multiple hosts.
    • Image import -- there is something on BZ already.
    • Image export suitable for Wikimedia
  • AJAX Login that looks really good, is easy to use, integrates well with popular skins, and is secure.
    • Is AJAX appropriate for mobile device logins? A loaded ~500 MHz ARM mobile device can take long enough rendering AJAX JavaScript to delay keypress echo. Is there any reason that HTTPS login form submission isn't always better?
      • I can't imagine doing a single ajax request when the user clicks a submit button would be that expensive. -bawolff
        • Take a look at Wikia's AJAXLogin extension, which has an existing jQuery port; it may be faster to write from scratch, with the Wikia extension as a reference/spec implementation. Sumanah 17:56, 23 March 2011 (UTC)
    • This looks like a good project; it is small yet has enough potential to learn things. One might want to add Selenium tests. Hashar 07:39, 25 March 2011 (UTC)
  • API sandbox environment (a la the Flickr API Explorer, which gives a form interface to help learn the API [6]; a minimal example of the kind of request such a sandbox would build is sketched after this list)
  • Email notification rewrite to make the code prettier and fix various bugs, especially the bug where all log events are considered to be page creations (bugzilla:14901; here are some other related bugs [7])
    • This is actually a hard/long task for a newcomer to our code. The feature is hooked everywhere in the code and it is hard to test reliably. Probably not suitable for a summer session. Hashar 07:39, 25 March 2011 (UTC)
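
To give a concrete sense of what the API sandbox item above would wrap in a form, here is a minimal api.php query in Python, pointed at the English Wikipedia endpoint. The endpoint, parameters, and User-Agent string are just an example request, not part of any existing sandbox tool.

    import json
    import urllib.parse
    import urllib.request

    API = "https://en.wikipedia.org/w/api.php"   # example endpoint

    params = {
        "action": "query",
        "titles": "Main Page",
        "prop": "info",
        "format": "json",
    }

    url = API + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(url, headers={"User-Agent": "api-sandbox-sketch/0.1"})

    with urllib.request.urlopen(request) as response:
        data = json.loads(response.read().decode("utf-8"))

    # The default JSON response keys pages by page ID.
    for page in data["query"]["pages"].values():
        print(page["title"], page["touched"])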

Networking and Security

  • Make wikipedia.org and mediawiki.org accessible through IPv6
  • Enable secure logins for wikipedia.org and mediawiki.org