User:YuviPanda/GSoC/Proposal

Project summary
The SelectionSifter helps anyone choose which wiki articles to gather into a collection, based on Article Feedback Tool data.

Aim
The aim of the project is to convert the Wikipedia 1.0 Bot into a series of MediaWiki extensions. The bot currently collects assessment information and displays it in various formats for use by the Wikipedia 1.0 project. The main aim of the Wikipedia 1.0 project is to create offline-accessible versions of credible, good-quality Wikipedia articles. Right now, the process involves the Perl-based web interface here and a lot of manual effort. This GSoC project proposes to replace that with a set of MediaWiki extensions that remove most of the manual effort.

Rationale for Change
While the current bot does its job, it is written in Perl, is not easily usable outside enwiki, and needs a lot of manual intervention to run. Converting it to a set of loosely coupled MediaWiki extensions would enable faster development, cut out most of the manual work required to make collections, make i18n easier (via Translatewiki), enable use by other MediaWiki installations and eventually solve world poverty.

A graphical explanation of the current process and the proposed changes is available here. The newer workflow removes all purely manual steps.

Features
This GSoC project would replicate all the features of the current bot and remain backwards compatible with the current assessment tools. This means extensions for:


 * Special Page for assessment changes log. Replacement for this
 * Magic Keyword for including statistics about assessment classes. Special Page to view them. Replacement for this.
 * Special Page to filter articles based on arbitrary criteria, and export them as interim collections. Replacement for this
 * Special Page to manage interim collections: add flags, adjust which revisions of pages are selected, manually add or remove articles, create new interim collections from subsets of current interim collections, and export to Extension:Collection. Interim collections are just sets of articles at particular revisions with associated flags. Replacement for manual work.
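The description of an interim collection above (a set of articles at particular revisions with associated flags) can be sketched as a small data structure. This is only an illustrative Python model, not the proposed extension code, which would be written for MediaWiki. The names `CollectionEntry`, `InterimCollection`, and all method names are hypothetical and chosen for this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CollectionEntry:
    """One article pinned to a specific revision, with free-form flags."""
    title: str
    revision_id: int
    flags: frozenset = field(default_factory=frozenset)


@dataclass
class InterimCollection:
    """Hypothetical model of an interim collection: title -> pinned entry."""
    name: str
    entries: dict = field(default_factory=dict)

    def add(self, title, revision_id, flags=()):
        # Manual addition of an article at a chosen revision.
        self.entries[title] = CollectionEntry(title, revision_id, frozenset(flags))

    def remove(self, title):
        # Manual removal; removing a missing title is a no-op.
        self.entries.pop(title, None)

    def set_revision(self, title, revision_id):
        # Adjust which revision is selected, keeping existing flags.
        old = self.entries[title]
        self.entries[title] = CollectionEntry(title, revision_id, old.flags)

    def flag(self, title, flag):
        # Attach an arbitrary flag to an article already in the collection.
        old = self.entries[title]
        self.entries[title] = CollectionEntry(title, old.revision_id, old.flags | {flag})

    def subset(self, name, predicate):
        # Create a new interim collection from the entries matching predicate.
        new = InterimCollection(name)
        for entry in self.entries.values():
            if predicate(entry):
                new.entries[entry.title] = entry
        return new
```

For example, `collection.subset("checked", lambda e: "checked" in e.flags)` would produce a new interim collection containing only the entries flagged as checked, leaving the original untouched.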

While these are the user-facing extensions, several other supporting extensions would need to be developed as well. The ones identified so far are:


 * Pluggable assessment information parsing. Initially, an extension that is fully backwards compatible with the current assessment templates will be built. Later on, more extensions can be added to support other assessment methods (WikiTrust, for example). There are different ways this could be done, and the best method should be picked after some experimentation and consultation with MediaWiki developers. This decision should be made before the official coding period starts. Some of the methods discussed so far are:
   * A parse assessments -> store -> display architecture. This is how the current bot works. It would run once a day or so, parse all the assessment templates and store the statistics in a central database.
   * A parser hook that updates assessment data in Manual:Page_props_table whenever the talk page is parsed. Always up-to-date.
   * A parser hook that updates assessment data in a central database whenever the talk page is parsed. Always up-to-date.
 * Common data access code. This varies depending on which architecture is picked for assessment parsing.
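To make the idea of pluggable assessment parsing more concrete, here is a minimal Python sketch of a parser registry with one template-based plugin. It is an illustration only: the real extensions would hook into MediaWiki's own parser. The banner shape `{{WikiProject Name|class=B|importance=High}}` is a simplified assumption about the current templates, and `PARSERS`, `register_parser`, `parse_banner_templates`, `parse_assessments`, and the default values "Unassessed" and "Unknown" are hypothetical names and placeholders introduced here.

```python
import re

# Registry of pluggable parsers: name -> function(talk page wikitext) -> list of dicts.
PARSERS = {}


def register_parser(name):
    """Decorator that adds a parsing function to the registry."""
    def decorator(func):
        PARSERS[name] = func
        return func
    return decorator


# Assumed, simplified banner shape: {{WikiProject Name|class=B|importance=High}}
_BANNER = re.compile(
    r"\{\{\s*WikiProject\s+([^|}]+?)\s*((?:\|[^{}]*)?)\}\}",
    re.IGNORECASE,
)


@register_parser("templates")
def parse_banner_templates(wikitext):
    """Extract project, class and importance from assessment banners."""
    results = []
    for match in _BANNER.finditer(wikitext):
        project, raw_params = match.group(1), match.group(2)
        params = {}
        for part in raw_params.split("|"):
            if "=" in part:
                key, value = part.split("=", 1)
                params[key.strip().lower()] = value.strip()
        results.append({
            "project": project,
            # Placeholder defaults for banners without explicit ratings.
            "class": params.get("class", "Unassessed"),
            "importance": params.get("importance", "Unknown"),
        })
    return results


def parse_assessments(wikitext):
    """Run every registered parser and merge their results."""
    found = []
    for parser in PARSERS.values():
        found.extend(parser(wikitext))
    return found
```

Under this design, support for another assessment source would be a second function decorated with `register_parser`. Either architecture discussed above (a daily batch run or a parser hook) could call `parse_assessments` and differ only in when it runs and where it stores the result.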

Required deliverables

 * 1) Extension to provide common data access code for other extensions
 * 2) Extensions to parse Assessment Info (pluggable skeleton + implementation for current assessment templates)
 * 3) Extension to provide:
   * 4) Special page for assessment changes log
   * 5) Special page to view assessment class statistics (overall and per project)
   * 6) Magic word to include statistics about assessment (overall and per project) into wiki pages
   * 7) Special page for listing articles using arbitrary criteria
   * 8) Special page to manage interim collections and export to Extension:Collection
 * 9) Integration with Translatewiki
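The statistics and changes-log deliverables above boil down to two small computations over stored assessment data. The Python sketch below illustrates them under stated assumptions: ratings are available as `(article, project, class)` tuples, the overall count tallies ratings rather than unique articles, and a snapshot is a dict mapping `(article, project)` to a class. The function names `class_statistics` and `assessment_changes` are hypothetical.

```python
from collections import Counter, defaultdict


def class_statistics(ratings):
    """Count ratings per assessment class, overall and per project.

    `ratings` is an iterable of (article, project, assessment_class) tuples.
    The overall counter tallies ratings, so an article rated by two
    projects is counted twice.
    """
    overall = Counter()
    per_project = defaultdict(Counter)
    for _article, project, klass in ratings:
        overall[klass] += 1
        per_project[project][klass] += 1
    return overall, dict(per_project)


def assessment_changes(old, new):
    """Compare two snapshots and list what changed.

    Each snapshot maps (article, project) -> assessment class. The result
    is a list of (article, project, old_class, new_class) tuples, where
    None marks a rating that was added or removed.
    """
    changes = []
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes.append((key[0], key[1], before, after))
    return changes
```

A special page or magic word would then only need to render the returned counters or change tuples. How the snapshots are stored depends on the architecture chosen for assessment parsing.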

Future work

 * 1) Integration with Extension:Collection

Community Bonding Period

 * Familiarize myself more with the codebase and the current community workflow
 * Write a few more extensions.
 * Read through the current Perl source and familiarize myself with how it runs.
 * Figure out the best way to do assessment parsing, with input from the MediaWiki developers and some experimentation.
 * Figure out a nice way to test the extensions developed (probably the virtualization infrastructure being developed by the Wikimedia Foundation).
 * Get to know the current people running the Wikipedia 1.0 project. They are the primary users of this project and should be involved throughout in development and testing.
 * Crack jokes on IRC at appropriate times.

Official Coding Period

 * Target 1 (2 Weeks)
 * Create the assessment parsing extension. Test it out so it is backwards compatible with all current templates used.


 * Target 2 (1 Week)
 * Battle-test the assessment parsing extension. Write unit tests if necessary. Run it at scale. The assessment parsing is the most important part of the project, and needs to be done right.


 * Target 3 (1 Week)
 * Implement special page to view assessment change log
 * Implement special page to view assessment statistics (overall and per project)
 * Implement way to include assessment statistics into wiki pages


 * Code Cleanup (1 Week)
 * Clean up code wherever necessary and write documentation where none exists.


 * Target 4 (2 Weeks)
 * Implement special page to filter articles based on arbitrary criteria and export them as interim collections. Includes figuring out how interim collections will be represented.


 * Target 5 (2 Weeks)
 * Implement special page to manage interim collections. Support flagging (arbitrary flags), manual addition/removal of articles, changing which revision of each article is picked, and creating new interim collections from filtered subsets of other interim collections.


 * Code Cleanup (1 Week)
 * Clean up more code, and write more documentation.


 * Buffer Period (2 Weeks)
 * Because humans are by nature optimists, and tend to underestimate time needed for completing things.

Participation
I'm reasonably chatty, and don't hesitate to consistently bug people. I like working in long uninterrupted blocks at night, which makes me active at around the same time as most other people on #mediawiki. Skype conversations with User:awjrichards would also probably take place frequently. Ideally I would work via a git repo on GitHub or a private server that is constantly updated to match trunk, but I don't mind working on a branch in svn. I'll blog regularly about my progress, for both documentation and flaunting purposes :) Ideally, I'd like to get my changes deployed on the WMF servers before next summer.

Past open source experience
I was a Google Summer of Code participant last year, working on Cheese (part of GNOME). I maintain Busroutes.in, an open source, open data, crowd-sourced website for collecting and displaying local transportation information. I also contribute to PiTiVi, a Linux video editor. I write code whenever I can and release anything that could even remotely be useful to anyone else under a BSD License on GitHub. I've helped out at several open source workshops, and given talks. I run tawp.in, a short URL site for the Tamil Wikipedia. I also wrote a MediaWiki extension for short URLs, Extension:ShortUrl, which makes the short URL originate from the same domain as the wiki. I've also worked on forward porting a performance patch for MediaWiki (see 5303).