User:Johang~mediawikiwiki/Most popular related articles

'''This is a draft. Comments and feedback are welcome.'''

Identity
 * Name: Johan Gunnarsson
 * Email: johan.gunnarsson@gmail.com
 * Project title: Most popular related articles

Contact/working info
 * Timezone: CEST (UTC +2)
 * Typical working hours: 10:00-18:00
 * IRC or IM networks/handle(s): johang@freenode

Project summary
This project aims to resolve bug 21921, which discusses how to encourage contributions to Wikipedia and its sister projects. The bug reporter proposes adding a sidebar that lists the N most popular pages related to the page currently being viewed. This would introduce another way of navigating to articles the user is likely to be interested in, and therefore more likely to contribute to. Ranking by popularity also helps bring attention to articles about current events.

About me
I'm a student in Computer Science and Engineering at Lund Institute of Technology, Lund, Sweden. I'm in my final year, currently working on my Master's thesis project and hoping to graduate this summer.

Relevant experience

 * Participant in Google Summer of Code 2007 for GNU phpGroupWare.
 * Author of Wikitrends, a data-crunching project inspired by Google Trends and Twitter Trends that finds the pages with the greatest uptrend on Wikipedia right now. I am working on moving the project to Toolserver.
 * I have a toolserver.org account.
 * Fluent in computers and code.

Required deliverables

 * 1) A batch processing system that finds and ranks related pages using data sources such as wikilinks, categories, edit counts and page views. It would probably run on Toolserver as a batch job.
 * 2) A system that serves related pages to clients. It would probably run on Toolserver as a web application.
 * 3) A client that fetches related pages and injects them into the Wikipedia article layout. This could take different forms: client-side JavaScript, a Greasemonkey script, or integration into MediaWiki itself.
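
As a rough illustration of deliverable 1, the ranking step could look something like the Python sketch below. It is not a design decision: the function name `top_related`, the in-memory dictionaries and the choice to rank purely by page views are assumptions made for the example. A real batch job would read this data from database dumps or page view logs.

```python
from typing import Dict, List, Set


def top_related(page: str,
                links: Dict[str, Set[str]],
                categories: Dict[str, Set[str]],
                views: Dict[str, int],
                n: int = 5) -> List[str]:
    """Return the n most-viewed pages related to `page`.

    Hypothetical inputs (assumptions for this sketch):
      links      -- page title -> titles it links to (wikilinks)
      categories -- page title -> categories it belongs to
      views      -- page title -> page view count for some period
    """
    # Candidates: pages linked from `page` ...
    candidates = set(links.get(page, set()))
    # ... plus pages that share at least one category with it.
    page_cats = categories.get(page, set())
    for other, cats in categories.items():
        if cats & page_cats:
            candidates.add(other)
    # A page is never related to itself.
    candidates.discard(page)
    # Rank by view count, highest first; ties are broken alphabetically
    # so the output is stable between runs.
    return sorted(candidates, key=lambda p: (-views.get(p, 0), p))[:n]
```

Called with a small hand-made data set, the function returns the linked or same-category pages with the highest view counts, which is the list the sidebar would display.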

If time permits

 * 1) Investigate different ways of finding related articles. One way could be to combine different sources, like categories and wikilinks, with weights. It would also be interesting to walk further down the wikilink/category graph. Articles in subcategories could be counted as related too, although this would most likely be more computationally intensive.
 * 2) Which articles count as related is of course subjective and depends on the reader. If time permits, I could produce sets of related articles using different algorithms and ask people which they think is better.
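
One possible shape for the weighted combination mentioned in item 1 is sketched below. The weights `w_link` and `w_cat`, the `decay` factor for second-hop links and the function name `weighted_scores` are all hypothetical values chosen for illustration, not tuned or agreed parameters.

```python
from collections import defaultdict
from typing import Dict, Set


def weighted_scores(page: str,
                    links: Dict[str, Set[str]],
                    categories: Dict[str, Set[str]],
                    w_link: float = 1.0,
                    w_cat: float = 0.5,
                    decay: float = 0.5) -> Dict[str, float]:
    """Score candidate pages by combining wikilinks and shared categories."""
    scores: Dict[str, float] = defaultdict(float)
    # Directly linked pages get the full link weight; pages two links
    # away get a decayed weight (walking one step further down the graph).
    for first in links.get(page, set()):
        scores[first] += w_link
        for second in links.get(first, set()):
            scores[second] += w_link * decay
    # Every category shared with `page` adds the category weight.
    page_cats = categories.get(page, set())
    for other, cats in categories.items():
        shared = len(cats & page_cats)
        if shared:
            scores[other] += w_cat * shared
    # The page itself may have been reached via a link cycle; drop it.
    scores.pop(page, None)
    return dict(scores)
```

Varying the weights, or swapping in a different scoring function, would give the alternative sets of related articles that readers could be asked to compare in item 2.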

Project schedule
TODO

Mockups

 * Mockup of UI element in the sidebar of a regular Wikipedia page.