User:YuviPanda/GSoC

The SelectionSifter helps anyone choose which wiki articles to collect into a selection.

How to interpret estimates
The given values are lower bounds. Multiply by 3 to get the upper bound and by 2 to get the average. Overshooting timelines is to be expected; the estimates will be adjusted as the project moves along.
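As a worked example of this rule, the sketch below sums the per-task estimates quoted in the task lists on this page and applies the multipliers. It is written in Python purely for illustration; the dictionary layout and the function name are assumptions, not part of the plan.

```python
# Lower-bound estimates (hours), copied from the task lists on this page.
LOWER_BOUND_HOURS = {
    "Logs": [2, 14],
    "Query Engine": [12, 12, 12],
    "Querying Interface": [12 + 12, 12 + 12],  # design + implementation
    "Embedding Interface": [6, 8],
}


def estimate_range(lower_hours):
    """Return (lower, average, upper) using the x1 / x2 / x3 rule."""
    return lower_hours, lower_hours * 2, lower_hours * 3


total_lower = sum(sum(hours) for hours in LOWER_BOUND_HOURS.values())
lower, average, upper = estimate_range(total_lower)
print(f"lower={lower}h average={average}h upper={upper}h")
# prints: lower=114h average=228h upper=342h
```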

Current implementation

 * Perl!(?)
 * http://en.wikipedia.org/wiki/User:WP_1.0_bot
 * Runs as a batch-processing style bot

Rewrite specifications

 * Written in PHP
 * Backwards compatible with the assessment templates currently in use
 * Should be 'good enough' to be deployed on enwiki
 * Feature parity with WP 1.0 bot

Components

 * 1) Assessment Data Collector
 * 2) Update Assessment Data whenever it is changed
 * 3) Log changes to assessments
 * 4) Import initial data from current Bot
 * 5) Querying interface (Assessment Statistics + Articles List)
 * 6) Arbitrary Querying of assessment data
 * 7) Embedding of arbitrary query results in different forms inside wiki articles (Statistical Table embedding)
 * 8) Creating, managing and exporting 'interim collections'
 * 9) Usage of Extension:Collections tbd

Machine Readable Assessments
After talking with User:CBM and User:Awjrichards, we hit on a much better way of doing assessments. Since most assessment templates use w:en:Template:WPBannerMeta, we can modify that template to emit machine-readable assessment data that the assessment parser can then read. This eliminates the need to maintain WikiProjects separately.

Representing Assessment Info
The approach I favor is to modify the template to insert extra attributes on the link pointing to the WikiProject home page. Attributes placed on that link would denote the importance and quality assessments for that particular article from that particular WikiProject, and a marker added to the same tag would denote that it represents a WikiProject assessment. This is in line with the POSH principle from Microformats and puts the assessments in machine-readable form right in the HTML.
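To make the idea concrete, here is a rough sketch of how marked-up links could be read back out of rendered HTML. The attribute and class names (`data-wp-quality`, `data-wp-importance`, `wp-assessment`) are placeholders invented for this example, since the real names are not settled, and the sketch uses Python's standard-library parser rather than anything MediaWiki provides.

```python
from html.parser import HTMLParser

# Hypothetical names: the real attribute and class names are still undecided.
MARKER_CLASS = "wp-assessment"
QUALITY_ATTR = "data-wp-quality"
IMPORTANCE_ATTR = "data-wp-importance"


class AssessmentExtractor(HTMLParser):
    """Collect project link, quality and importance from marked-up <a> tags."""

    def __init__(self):
        super().__init__()
        self.assessments = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if MARKER_CLASS not in classes:
            return  # an ordinary link, not an assessment
        self.assessments.append({
            "project": attr_map.get("href"),
            "quality": attr_map.get(QUALITY_ATTR),
            "importance": attr_map.get(IMPORTANCE_ATTR),
        })


def extract_assessments(html):
    """Return a list of assessment dicts found in an HTML fragment."""
    parser = AssessmentExtractor()
    parser.feed(html)
    parser.close()
    return parser.assessments


sample = (
    '<a href="/wiki/Wikipedia:WikiProject_Physics" class="wp-assessment" '
    'data-wp-quality="B" data-wp-importance="High">WikiProject Physics</a>'
    '<a href="/wiki/Main_Page">Main Page</a>'
)
print(extract_assessments(sample))
```

Only the first link carries the marker class, so only that one is picked up; the plain link to the Main Page is ignored.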

Parsing out Assessment Info into Database
After each edit, we could either:
 * 1) Parse out the HTML (after it's generated) and pick out the assessment data
 * 2) Put an entry into the job queue, which executes code to pick out the assessment data

We then add it to the database if the info has changed, and record pertinent information in the log (user, timestamp, revision, etc.).
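The "store only if changed, then log" step might look roughly like the following. This is a sketch under stated assumptions: the table and column names are invented for the example, and an in-memory SQLite database stands in for the wiki's real database.

```python
import sqlite3


def open_store():
    """Create an in-memory database with hypothetical assessment tables."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE assessment (
            article TEXT, project TEXT, quality TEXT, importance TEXT,
            PRIMARY KEY (article, project)
        );
        CREATE TABLE assessment_log (
            article TEXT, project TEXT,
            old_quality TEXT, new_quality TEXT,
            old_importance TEXT, new_importance TEXT,
            changed_by TEXT, changed_at TEXT, rev_id INTEGER
        );
    """)
    return db


def record_assessment(db, article, project, quality, importance,
                      changed_by, changed_at, rev_id):
    """Store the assessment and log it only if it differs from the stored one.

    Returns True when something changed, False otherwise.
    """
    row = db.execute(
        "SELECT quality, importance FROM assessment "
        "WHERE article = ? AND project = ?",
        (article, project),
    ).fetchone()
    if row == (quality, importance):
        return False  # nothing changed, so nothing to store or log
    old_quality, old_importance = row if row else (None, None)
    db.execute(
        "INSERT OR REPLACE INTO assessment VALUES (?, ?, ?, ?)",
        (article, project, quality, importance),
    )
    db.execute(
        "INSERT INTO assessment_log VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (article, project, old_quality, quality,
         old_importance, importance, changed_by, changed_at, rev_id),
    )
    db.commit()
    return True
```

The same function would serve either option above: it can be called directly after the edit, or from a queued job, since it only needs the extracted values and the edit's metadata.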

Open Issues

 * 1) Okay to use   attributes on WMF properties? No issues with browser compat, but still would like to get this clarified.
 * 2) Is the metatemplate good enough to actually insert these   attributes properly? I tried reading it (three times!) and got a headache. Need to contact User:MSGJ.
 * 3) We'll be parsing HTML to get data out. Is this considered dirty and sinful? Will I be punished by the WMF cabal? This is perhaps the most important issue.
 * 4) Parse out right after edit, or put in queue? Needs performance testing.
 * 5) How do I parse out the HTML? OutputPage doesn't build a DOM afaik, and I'd like to avoid reparsing if possible. External library?

Logs
A log entry is recorded every time an assessment changes.

Tasks

 * 1) Develop logging model, with DA code (2 hours)
 * 2) Write a Special Page extension to view/filter the log (14 hours). Filter by:
   * Time of change
   * Type of change (Importance/Quality/Other)
   * User making the change
   * Direction of change (Improve/Deteriorate)
   * Category/Project of the article the change is made to
   * Article name
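One possible shape for the logging model and its filters is sketched below. The field names are assumptions, and the rating orders are the usual enwiki grades (Stub up to FA, Low up to Top), assumed here so that a direction of change can be derived; the sample users and articles are made up.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumed rating scales, lowest to highest (the usual enwiki grades).
QUALITY_ORDER = ["Stub", "Start", "C", "B", "GA", "A", "FA"]
IMPORTANCE_ORDER = ["Low", "Mid", "High", "Top"]


@dataclass(frozen=True)
class LogEntry:
    timestamp: datetime
    change_type: str  # "quality", "importance" or "other"
    user: str
    project: str
    article: str
    old_value: str
    new_value: str

    @property
    def direction(self):
        """Return "improve", "deteriorate" or "none" for this change."""
        scales = {"quality": QUALITY_ORDER, "importance": IMPORTANCE_ORDER}
        scale = scales.get(self.change_type, [])
        if self.old_value not in scale or self.new_value not in scale:
            return "none"
        delta = scale.index(self.new_value) - scale.index(self.old_value)
        if delta == 0:
            return "none"
        return "improve" if delta > 0 else "deteriorate"


def filter_log(entries, since=None, until=None, **equals):
    """Keep entries within [since, until] that match every keyword filter."""
    result = []
    for entry in entries:
        if since is not None and entry.timestamp < since:
            continue
        if until is not None and entry.timestamp > until:
            continue
        if all(getattr(entry, name) == value for name, value in equals.items()):
            result.append(entry)
    return result


log = [
    LogEntry(datetime(2011, 6, 1), "quality", "Alice", "Physics", "Quark", "Start", "B"),
    LogEntry(datetime(2011, 6, 2), "importance", "Bob", "Physics", "Quark", "High", "Mid"),
    LogEntry(datetime(2011, 6, 3), "quality", "Alice", "Chemistry", "Benzene", "GA", "C"),
]
improved = filter_log(log, change_type="quality", direction="improve")
print([entry.article for entry in improved])  # prints: ['Quark']
```

Each filter from the list maps to one argument: `since`/`until` for time, and keyword filters such as `change_type`, `user`, `direction`, `project` and `article` for the rest.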

Query Engine
A set of core components that can execute arbitrary queries, producing both statistics and article lists.

Tasks

 * 1) Build a basic querying engine, with abstract and well-defined interfaces, that can be extended in the future to other assessment backends (not just WikiProject-based assessments). The list of supported query operations would closely mirror that of LINQ. (est: 12 hours)
 * 2) Implement the querying engine for the WikiProject based assessments (Component #1) (est: 12 hours)
 * 3) Implement a specific statistical engine for WikiProject-based assessments, with support for overall and per-project tables (est: 12 hours)
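The three tasks above could fit together along these lines: an abstract backend interface, a small chainable query object with LINQ-style operations, and a statistics function built on top. All class, method and field names here are hypothetical, and the in-memory backend merely stands in for the WikiProject-based one.

```python
from abc import ABC, abstractmethod
from collections import Counter


class AssessmentBackend(ABC):
    """Interface that any assessment source would implement."""

    @abstractmethod
    def rows(self):
        """Return an iterable of assessment rows (dicts)."""


class InMemoryBackend(AssessmentBackend):
    """Stand-in for the WikiProject-based backend."""

    def __init__(self, rows):
        self._rows = list(rows)

    def rows(self):
        return iter(self._rows)


class Query:
    """Chainable, lazily evaluated query, loosely modelled on LINQ."""

    def __init__(self, source):
        self._source = source  # zero-argument callable returning an iterable

    @classmethod
    def over(cls, backend):
        return cls(backend.rows)

    def where(self, predicate):
        src = self._source
        return Query(lambda: (row for row in src() if predicate(row)))

    def select(self, projection):
        src = self._source
        return Query(lambda: (projection(row) for row in src()))

    def order_by(self, key, descending=False):
        src = self._source
        return Query(lambda: sorted(src(), key=key, reverse=descending))

    def count(self):
        return sum(1 for _ in self._source())

    def to_list(self):
        return list(self._source())


def statistics_table(query):
    """Count rows per (quality, importance) pair."""
    return Counter((row["quality"], row["importance"]) for row in query.to_list())


backend = InMemoryBackend([
    {"article": "Quark", "project": "Physics", "quality": "B", "importance": "High"},
    {"article": "Electron", "project": "Physics", "quality": "FA", "importance": "Top"},
    {"article": "Benzene", "project": "Chemistry", "quality": "B", "importance": "High"},
])
physics = Query.over(backend).where(lambda row: row["project"] == "Physics")
print(physics.order_by(lambda row: row["article"]).select(lambda row: row["article"]).to_list())
print(statistics_table(physics))                # per-project table
print(statistics_table(Query.over(backend)))    # overall table
```

Because each operation returns a new `Query`, the same object serves both article lists (`to_list`) and statistics (`statistics_table`), and a different backend only has to provide `rows()`.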

Querying Interface
A user interface for interactively querying the assessments, covering both overall statistics and article lists.


 * 1) Expose the query engine via a Special Page (est: 12 hours design + 12 hours implementation)
 * 2) Expose the statistical engine via a Special Page (est: 12 hours design + 12 hours implementation)

Embedding Interface
Magic Words (or similar) that let you embed statistical tables inside wikipages. Customizable.


 * 1) Build magic words to embed statistical tables/results in wikipages (est: 6 hours)
 * 2) Build magic words to embed query results in wikipages (est: 8 hours)
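Whatever form the magic words take, their output for a statistical table would presumably be an ordinary wikitext table. The sketch below shows only that rendering step, in Python for illustration; the function name and the shape of the `counts` mapping are assumptions, and hooking this into the MediaWiki parser is not shown.

```python
def render_stats_wikitable(counts, qualities, importances):
    """Render (quality, importance) counts as a wikitext table.

    counts maps (quality, importance) pairs to article counts;
    pairs that are missing are rendered as 0.
    """
    lines = [
        '{| class="wikitable"',
        "! Quality !! " + " !! ".join(importances) + " !! Total",
    ]
    for quality in qualities:
        cells = [counts.get((quality, importance), 0) for importance in importances]
        lines.append("|-")
        lines.append("! " + quality)
        lines.append("| " + " || ".join(str(n) for n in cells + [sum(cells)]))
    lines.append("|}")
    return "\n".join(lines)


sample_counts = {("FA", "Top"): 1, ("B", "High"): 2}
print(render_stats_wikitable(sample_counts, ["FA", "B"], ["Top", "High"]))
```

Customization (which rows and columns to show, per-project or overall) would then come down to which counts and which grade lists are passed in.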