Onboarding new Wikipedians/Recommender system

This document describes a simple recommender system for new Wikipedians, to be delivered via Extension:GettingStarted.

Rationale
The current method for selecting the tasks delivered on the landing page depends on SuggestBot, a recommendation engine originally designed to recommend articles based largely on a user's past contribution history. Because of this dependency on a bot and its edits to an associated template, the Getting Started page is not easily localized or internationalized, and we have less control over the task recommendations. We need to build a simple task recommendation engine within the extension in order to A) deploy Extension:GettingStarted outside English Wikipedia, and B) gain greater control over the type of task delivered, its frequency, and the interface through which it is delivered.

User experience
Our end goal is to deliver compelling tasks to users that, when completed, improve Wikipedia. The primary interface through which users will experience this is the "Getting started" page, but to create a compelling task list within that page, we will ultimately need to discover what makes a good task for newcomers to Wikipedia.

We propose that, at a high level, great tasks for beginners on Wikipedia...


 * have a clear beginning and end
 * feel rewarding to do, even if they're small
 * don't require extensive knowledge of community rules and norms

One further assumption we're making at this stage is that, because we're beginning from a cold start with users who have no editing history, the tasks we'll be delivering will not require interest or expertise in the subject. As time goes on, we may use completed tasks to filter the recommendations, but for now we're not trying to personalize the recommendations upfront.

Architecture

Task recommendation process:
 1) Generation
    * What sources we derive tasks from
    * What attributes we filter tasks on
 2) Queueing
    * How we store tasks so they are ready for delivery
    * How often we refresh the queue of tasks
    * How large the task queue is
 3) Delivery
    * How tasks get delivered to extensions or other user interfaces
 4) Optimization
    * How the system learns from data about which tasks are chosen or completed to improve its recommendations

Task generation
Possible sources of tasks include:
 * Categories (such as those in Wikipedia:Backlog)
 * RecentChanges events
 * Extensions and the feeds related to them, such as NewPagesFeed/Page Curation or Echo
 * Wikitext parsing, e.g. to find spelling errors

Possible attributes we can filter tasks by include:
 * Length
 * Markup complexity, e.g. the presence of infoboxes or references
 * Categories
 * Media, e.g. pages which lack images
 * Pageviews
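
The filtering step above can be sketched as a set of small, composable predicates applied to candidate pages. This is only an illustration under stated assumptions: the `CandidatePage` record, its field names, the helper functions, and the category label in the example are all hypothetical and are not part of Extension:GettingStarted or any MediaWiki API.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List


@dataclass(frozen=True)
class CandidatePage:
    """Hypothetical record for a page that might become a task."""
    title: str
    length_bytes: int            # "Length"
    has_infobox: bool            # part of "Markup complexity"
    reference_count: int         # part of "Markup complexity"
    categories: FrozenSet[str]   # "Categories"
    image_count: int             # "Media"
    monthly_pageviews: int       # "Pageviews"


# A filter is any function that accepts a page and returns True to keep it.
PageFilter = Callable[[CandidatePage], bool]


def max_length(limit_bytes: int) -> PageFilter:
    """Keep pages no longer than limit_bytes."""
    return lambda page: page.length_bytes <= limit_bytes


def simple_markup() -> PageFilter:
    """Keep pages without an infobox (one possible proxy for simple markup)."""
    return lambda page: not page.has_infobox


def in_any_category(wanted: Iterable[str]) -> PageFilter:
    """Keep pages that belong to at least one of the wanted categories."""
    wanted_set = frozenset(wanted)
    return lambda page: bool(page.categories & wanted_set)


def lacks_images() -> PageFilter:
    """Keep pages that have no images."""
    return lambda page: page.image_count == 0


def min_pageviews(threshold: int) -> PageFilter:
    """Keep pages with at least `threshold` monthly pageviews."""
    return lambda page: page.monthly_pageviews >= threshold


def select_tasks(candidates: Iterable[CandidatePage],
                 filters: Iterable[PageFilter]) -> List[CandidatePage]:
    """Return the candidates that pass every filter, in their original order."""
    active = list(filters)
    return [page for page in candidates if all(f(page) for f in active)]


if __name__ == "__main__":
    # Made-up sample data; the category name is illustrative only.
    pages = [
        CandidatePage("Short stub", 1200, False, 0,
                      frozenset({"Articles needing copy edit"}), 0, 900),
        CandidatePage("Long article", 85000, True, 40,
                      frozenset({"Articles needing copy edit"}), 6, 50000),
    ]
    tasks = select_tasks(pages, [max_length(5000), simple_markup(), lacks_images()])
    print([page.title for page in tasks])
```

Keeping each attribute as a separate predicate would let the set of active filters change without touching the sources that generate candidates.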

Queueing
We plan to avoid creating new database tables to hold the task queue, and will first attempt to implement the queue in Redis or memcached.
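
To make the queueing questions concrete (how tasks are stored, how often the queue is refreshed, and how large it is), here is a minimal in-process sketch. It is a stand-in only: it does not use Redis or memcached, and the class name `TaskQueue`, its parameters, and its methods are assumptions made for illustration rather than a proposed interface.

```python
import time
from collections import deque
from typing import Callable, Deque, Iterable, List, Optional


class TaskQueue:
    """Bounded queue of task identifiers with a refresh interval.

    Hypothetical stand-in for a queue kept in an external store.
    """

    def __init__(self, max_size: int, refresh_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._items: Deque[str] = deque()
        self._max_size = max_size                # how large the queue may be
        self._refresh_seconds = refresh_seconds  # how often it is refreshed
        self._clock = clock                      # injectable for testing
        self._last_refresh: Optional[float] = None

    def needs_refresh(self) -> bool:
        """True if the queue is empty, never filled, or older than the interval."""
        if self._last_refresh is None or not self._items:
            return True
        return self._clock() - self._last_refresh >= self._refresh_seconds

    def refresh(self, tasks: Iterable[str]) -> None:
        """Replace the queue contents with up to max_size freshly generated tasks."""
        self._items.clear()
        for task in tasks:
            if len(self._items) >= self._max_size:
                break
            self._items.append(task)
        self._last_refresh = self._clock()

    def take(self, count: int) -> List[str]:
        """Remove and return up to `count` tasks, oldest first."""
        taken: List[str] = []
        while self._items and len(taken) < count:
            taken.append(self._items.popleft())
        return taken

    def __len__(self) -> int:
        return len(self._items)


if __name__ == "__main__":
    queue = TaskQueue(max_size=3, refresh_seconds=3600)
    if queue.needs_refresh():
        queue.refresh(["Page A", "Page B", "Page C", "Page D"])
    print(queue.take(2), len(queue))
```

A delivery layer would call `needs_refresh()` before `take()`, so the generation step runs only when the queue is empty or stale.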

Delivery
This recommender system will not itself provide an interface to users. The first release will be embedded inside Extension:GettingStarted, and thus will use the Special page created by that extension to deliver the tasks. In the future, we may deliver recommended tasks via other interfaces, such as guided tours or notifications.

Optimization and machine learning
We will defer this stage of the task recommendation process until working implementations of the other steps (generation, queueing, and delivery) are complete.

The potential attributes we could collect and filter tasks on include:
 * Type, e.g. copyediting, adding an image, adding a reference, etc.
 * Difficulty rating
 * Topic (of the article)
 * Time (estimated time to complete)
 * "Freshness"
 * Popularity (in pageviews or in frequency of completion)
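
Since this stage is deferred, the following is only a sketch of one way data about shown and completed tasks could feed back into recommendations: ranking task types by a smoothed completion rate. The technique (additive, or Laplace, smoothing) is our illustration and not a decision recorded in this document; the class `TaskTypeStats` and its methods are hypothetical.

```python
from typing import Dict, Iterable, List


class TaskTypeStats:
    """Tracks how often each task type was shown and completed (hypothetical)."""

    def __init__(self) -> None:
        self._shown: Dict[str, int] = {}
        self._completed: Dict[str, int] = {}

    def record_shown(self, task_type: str) -> None:
        """Count one delivery of a task of this type to a user."""
        self._shown[task_type] = self._shown.get(task_type, 0) + 1

    def record_completed(self, task_type: str) -> None:
        """Count one completed task of this type."""
        self._completed[task_type] = self._completed.get(task_type, 0) + 1

    def completion_rate(self, task_type: str) -> float:
        """Smoothed rate (completed + 1) / (shown + 2).

        The smoothing keeps types with little or no data near 0.5
        instead of at an extreme value.
        """
        shown = self._shown.get(task_type, 0)
        completed = self._completed.get(task_type, 0)
        return (completed + 1) / (shown + 2)

    def rank(self, task_types: Iterable[str]) -> List[str]:
        """Return task types ordered from highest to lowest smoothed rate."""
        return sorted(task_types, key=self.completion_rate, reverse=True)


if __name__ == "__main__":
    stats = TaskTypeStats()
    for _ in range(10):
        stats.record_shown("copyediting")
        stats.record_shown("add reference")
    for _ in range(6):
        stats.record_completed("copyediting")
    stats.record_completed("add reference")
    print(stats.rank(["add reference", "copyediting", "add image"]))
```

The same counts could be kept per difficulty rating or topic; task type is used here only because it is the first attribute in the list above.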