Extension:E3 Experiments/Architecture

The extension combines:
 * a somewhat generic harness for making experimental changes to pages
 * specific PHP and JavaScript code for particular experiments
 * CSS and message resources for experiments that make changes to pages.

So the extension code only makes sense within an end-to-end explanation of a particular experiment.

E3Experiment for post-edit feedback
The Post-edit feedback experiment displays one of two messages after a successful edit, for a very limited set of users.

Because a MediaWiki edit redirects to a normal page URL, the extension hooks into ArticleUpdateBeforeRedirect (in Experiments.hooks.php) to append ?pe=1 to the URL if it is the result of an edit.
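In the extension this is PHP hook code in Experiments.hooks.php. As an illustration only, here is a minimal JavaScript sketch of the resulting URL change; `addPostEditFlag` is a hypothetical name, and the sketch assumes the redirect target is an absolute URL.

```javascript
// Hypothetical sketch: mark a post-save redirect URL with pe=1 so that
// client-side code can tell the page view is the result of an edit.
// (The real work is done by a PHP ArticleUpdateBeforeRedirect handler.)
function addPostEditFlag(redirectUrl) {
  const url = new URL(redirectUrl);
  // searchParams chooses "?" or "&" correctly, keeps existing parameters,
  // and keeps any #section fragment after the query string.
  url.searchParams.set("pe", "1");
  return url.toString();
}

console.log(addPostEditFlag("https://en.wikipedia.org/wiki/Sandbox"));
// -> https://en.wikipedia.org/wiki/Sandbox?pe=1
```

Using a URL parser rather than string concatenation avoids producing a second "?" when the redirect URL already has a query string, or putting the flag after a section anchor.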

Most of the rest of the extension code is JavaScript running in the browser. In its onBeforePageDisplay hook handler, the extension uses the MW resource loader to load everything in lib/*.js and experiments/*.js. For the Post-edit feedback experiment, onBeforePageDisplay also inserts experiment-specific data, and the JavaScript checks a series of conditions (for example, the ?pe query string that the PHP ArticleUpdateBeforeRedirect hook adds). Only if all of these checks pass does it do anything.
 * To test: use the latest dashboard (see ../Testing); it reports this information.
 * Perhaps the server-side onBeforePageDisplay hook code could skip loading the JS code if the page isn't appropriate.
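This page doesn't list every condition the client-side code checks, so the following is only a minimal JavaScript sketch of such a gate under stated assumptions: it requires pe=1 in the query string (from the text above), requires a user name (assumed, since bucketing hashes wgUserName), and then runs any further checks the caller supplies. The function name and the `extraChecks` parameter are hypothetical.

```javascript
// Hypothetical sketch of the client-side gate for Post-edit feedback.
// `config` stands in for MediaWiki's JS config values (e.g. wgUserName);
// `extraChecks` are any further eligibility predicates (assumed).
function shouldRunPostEditFeedback(search, config, extraChecks = []) {
  const params = new URLSearchParams(search);
  // Only page views produced by a successful edit carry pe=1.
  if (params.get("pe") !== "1") return false;
  // Bucketing is keyed on the user name, so anonymous users are skipped.
  if (!config.wgUserName) return false;
  // Every remaining check must pass before the experiment does anything.
  return extraChecks.every((check) => check(config));
}
```

In a browser this would be called as `shouldRunPostEditFeedback(window.location.search, { wgUserName: mw.config.get("wgUserName") })`.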

User bucketing
If the user is in the experiment, the JavaScript code calls a murmur3 hash function to turn wgUserName into a random-ish number. For Post-edit feedback, it assigns the user to the control, experimental_1, or experimental_2 bucket based on this number (i.e. its remainder modulo 3), then calls ClickTracking jQuery code to remember which bucket the user is in as the value of "PEF" in the userbuckets cookie (see ../Testing for other cookies). The JS code also calls $.trackActionWithOptions (more ClickTracking code) to track this as a PEF1BUCKETASSIGN event; we may turn off this event.
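The hash-then-modulo step can be sketched as follows. This implements standard MurmurHash3 (x86, 32-bit); the seed of 0, the UTF-8 encoding of the user name, and the order in which remainders map to buckets are assumptions, since this page does not record the extension's exact choices. `assignBucket` is a hypothetical name.

```javascript
// MurmurHash3, x86 32-bit variant, over the UTF-8 bytes of `key`.
function murmur3_32(key, seed = 0) {
  const data = new TextEncoder().encode(key);
  const c1 = 0xcc9e2d51;
  const c2 = 0x1b873593;
  const len = data.length;
  const nblocks = len >>> 2;
  let h = seed | 0;

  // Body: mix in each 4-byte little-endian block.
  for (let i = 0; i < nblocks; i++) {
    const j = i * 4;
    let k = data[j] | (data[j + 1] << 8) | (data[j + 2] << 16) | (data[j + 3] << 24);
    k = Math.imul(k, c1);
    k = (k << 15) | (k >>> 17);
    k = Math.imul(k, c2);
    h ^= k;
    h = (h << 13) | (h >>> 19);
    h = (Math.imul(h, 5) + 0xe6546b64) | 0;
  }

  // Tail: mix in the remaining 1-3 bytes.
  const tail = nblocks * 4;
  let t = 0;
  switch (len & 3) {
    case 3:
      t ^= data[tail + 2] << 16;
    // falls through
    case 2:
      t ^= data[tail + 1] << 8;
    // falls through
    case 1:
      t ^= data[tail];
      t = Math.imul(t, c1);
      t = (t << 15) | (t >>> 17);
      t = Math.imul(t, c2);
      h ^= t;
  }

  // Finalization: make every input bit affect every output bit.
  h ^= len;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // unsigned 32-bit result
}

// Assumed order: remainder 0 -> control, 1 -> experimental_1, 2 -> experimental_2.
const BUCKETS = ["control", "experimental_1", "experimental_2"];

function assignBucket(userName) {
  // Same user name -> same hash -> same bucket, on any browser or machine.
  return BUCKETS[murmur3_32(userName) % BUCKETS.length];
}
```

Because the bucket is a pure function of the user name, clearing cookies or switching browsers cannot move a logged-in user to a different bucket.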

This hashing is separate from ClickTracking's time-based hash that becomes the clicktracking-session cookie. The former is always the same for a given logged-in user; the latter changes when the user clears cookies or switches browsers.

The PEF action tracking includes this murmur3 hash, so in addition to being able to work backwards from the edited article's revision ID to the user who made the edit, we also have the murmur3 hash of the username. To see edits from these users, one could query the revision table for edits by new users who registered during the experiment's eligibility window, run the murmur3 hash on each username, and thereby know which bucket each user was in and which PEF message they got (none/confirmation/gratitude).
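A minimal JavaScript sketch of that offline analysis, under assumptions: the row shape (`revId`, `userName`, `userRegistration`) is hypothetical, the bucket-to-message mapping simply pairs the bucket list with the message list given above, and the hash function is passed in so the sketch doesn't depend on any particular murmur3 implementation. The bucket order must match whatever the client-side code uses.

```javascript
// Hypothetical mapping, pairing buckets with the messages listed above.
const MESSAGE_FOR_BUCKET = {
  control: "none",
  experimental_1: "confirmation",
  experimental_2: "gratitude",
};
const BUCKET_ORDER = Object.keys(MESSAGE_FOR_BUCKET); // insertion order

// rows: edits from the revision table joined with each user's registration
// time. MediaWiki timestamps are 14-digit YYYYMMDDHHMMSS strings, so plain
// string comparison orders them correctly.
function classifyEdits(rows, windowStart, windowEnd, hash) {
  return rows
    .filter((r) => r.userRegistration >= windowStart && r.userRegistration < windowEnd)
    .map((r) => {
      const bucket = BUCKET_ORDER[hash(r.userName) % BUCKET_ORDER.length];
      return { revId: r.revId, userName: r.userName, bucket, message: MESSAGE_FOR_BUCKET[bucket] };
    });
}
```

For real data, `hash` would be a murmur3 implementation byte-for-byte identical to the client's; any mismatch in seed or string encoding would silently put users in the wrong bucket.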

Note that this all concerns known, logged-in users. The same edit information appears on each user's "My contributions" page; we aren't tracking anything additional.

Performing the experiment
Then the JavaScript performs the experiment itself: it inserts different message text into the page depending on whether the user is in bucket experimental_1 or experimental_2, and inserts no message if the user is in the control bucket.
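The per-bucket choice can be sketched in a few lines of JavaScript. The bracketed strings are placeholders (the real extension takes its text from message resources), and both function names are hypothetical.

```javascript
// Placeholder texts, not the experiment's actual wording.
const POST_EDIT_MESSAGES = {
  experimental_1: "[confirmation message]",
  experimental_2: "[gratitude message]",
};

// Returns the text to show, or null for control (or any unknown bucket).
function postEditMessage(bucket) {
  return Object.prototype.hasOwnProperty.call(POST_EDIT_MESSAGES, bucket)
    ? POST_EDIT_MESSAGES[bucket]
    : null;
}

// Browser-only helper: insert the message at the top of `container`.
function showPostEditMessage(bucket, container) {
  const text = postEditMessage(bucket);
  if (text === null) return false; // control bucket: show nothing
  const note = container.ownerDocument.createElement("div");
  note.textContent = text;
  container.prepend(note);
  return true;
}
```

Keeping the bucket-to-text decision separate from the DOM insertion makes the experiment logic testable without a browser.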

Extension:ClickTracking

 * Event tracking describes a replacement approach that is in development

Extension:ClickTracking is used by various extensions besides E3, such as Extension:MoodBar, Article Feedback Tool version 4, and others; some of these are being turned off on English Wikipedia.


 * Provides user bucketing JS functions on jQuery $ that add information to the userbuckets cookie (in ext.UserBuckets.js). UserBuckets can also set up a campaign if you give it enough information in mw.activeCampaigns, but unlike, say, Account Creation, E3 Experiments doesn't do this.
 * Provides action tracking JS functions on jQuery $ such as $.trackAction('SOME INFO') (in jquery.clickTracking.js).

The action tracking functions:
 * check for a clicktrackingDebug cookie, and set it if ?clicktrackingDebug is in the query string
 * set a clicktracking-session cookie if one doesn't already exist
 * unless clicktrackingDebug is set, POST a clicktracking action to api.php using XMLHttpRequest (this is really expensive, since it can't be cached!)
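Those three steps can be sketched in JavaScript with the browser pieces injected (a Map for cookies, the query string, and a `post` callback standing in for the XMLHttpRequest to api.php). This is not ClickTracking's code: `trackActionSketch`, the payload shape, the cookie value "1", and the session-id format are all assumptions; only the cookie names come from the text.

```javascript
// Placeholder session id; ClickTracking derives its own time-based value.
function makeSessionId() {
  return Date.now().toString(36) + Math.random().toString(36).slice(2);
}

function trackActionSketch(event, info, { cookies, search, post }) {
  // 1. Enter debug mode if ?clicktrackingDebug is in the query string.
  if (new URLSearchParams(search).has("clicktrackingDebug")) {
    cookies.set("clicktrackingDebug", "1");
  }
  const debug = cookies.has("clicktrackingDebug");

  // 2. Create the session cookie only if it does not exist yet.
  if (!cookies.has("clicktracking-session")) {
    cookies.set("clicktracking-session", makeSessionId());
  }

  // 3. Unless debugging, send the event. Each call is an uncacheable
  //    api.php request, which is what makes this expensive.
  if (debug) return false;
  post({ event, session: cookies.get("clicktracking-session"), info });
  return true;
}
```

Note that step 3 runs for every tracked action, so the server cost scales with the number of events rather than the number of page views.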

ClickTracking also does lots of other things: it rewrites the left-side navbar so that clicks are tracked, provides a function to track a URL (by requesting it through api.php, which then redirects to the requested URL; also expensive), hooks into EditPage::showEditForm:fields to add clicktrackingsession and event hidden form fields, etc. Kill this code if it's no longer used!

API calls
ClickTracking's API module always (?) derives additional information beyond what the caller provides, even if the caller doesn't care about it! Its ClickTrackingHooks::trackEvent also retrieves:
 * whether the user isLoggedIn
 * if the user is logged in, the user's edit count, read from the database (but E3 Experiments already has this in wgEditCount, so the API call could avoid making a DB query)
 * if the user is logged in, the user's edit counts over three time periods, also from the database: the last 6 months, the last 3 months, and the last month
The latter trio of edit counts is provided by another extension, UserDailyContribs, explained later.

Then:
 * if $wgClickTrackingDatabase is set, it writes various data to the click_tracking database table, and writes the values of all buckets in the userbuckets cookie to the click_tracking_user_properties database table. But I think this is NOT enabled on Wikimedia production machines, so there is no database update.
 * if $wgClickTrackingLog is set, it calls wfErrorLog to log similar information, but not buckets.

Each log line contains, in order:
 * event name
 * timestamp
 * whether the user is logged in
 * a session ID
 * the namespace of the page
 * the user's edit count
 * the user's edits over the last 6 months / 3 months / month
 * additional information from the caller

Here's a sample line from a click action on the en:Wikipedia:Community Portal page in the m:Research:Community portal redesign experiment:
communityPortalClick	20120726225623	1	sUnbevNTg2FHRxMeqpAopbNQlH08dwS2w	4	722	21	21	5	https://en.wikipedia.org/wiki/India@/wiki/Main_Page

In this line:
 * the event name is communityPortalClick
 * the timestamp is July 26, 2012, 22:56:23
 * this user is logged in
 * the session ID is anonymized
 * the numerical namespace of the page is 4 (the "Wikipedia:" namespace)
 * since the user is logged in, edit counts are available: the user's total edit count is 722, and edits over the last 6, 3, and 1 months are 21, 21, and 5
 * the additional information this experiment passed to ClickTracking in trackAction is the referring page and the destination of the click, separated by '@': here, the India article followed by the Main Page.
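A minimal JavaScript parser for a line like the sample, assuming tab-separated fields in the order listed above and (as elsewhere in MediaWiki) UTC timestamps. The field names are hypothetical, and so is the convention that missing counts (e.g. for anonymous users) appear as empty strings.

```javascript
const FIELDS = [
  "event", "timestamp", "loggedIn", "session", "namespace",
  "editCount", "edits6Months", "edits3Months", "edits1Month", "additional",
];
const NUMERIC_FIELDS = ["namespace", "editCount", "edits6Months", "edits3Months", "edits1Month"];

function parseClickTrackingLine(line) {
  const parts = line.split("\t");
  if (parts.length < FIELDS.length) {
    throw new Error(`expected ${FIELDS.length} fields, got ${parts.length}`);
  }
  const last = FIELDS.length - 1;
  // Caller-supplied data comes last; re-join it in case it contained tabs.
  const values = parts.slice(0, last).concat(parts.slice(last).join("\t"));
  const rec = Object.fromEntries(FIELDS.map((name, i) => [name, values[i]]));
  rec.loggedIn = rec.loggedIn === "1";
  for (const key of NUMERIC_FIELDS) {
    rec[key] = rec[key] === "" ? null : Number(rec[key]); // assumed: "" = absent
  }
  return rec;
}

// 14-digit MediaWiki timestamp (YYYYMMDDHHMMSS, UTC) -> Date.
function parseMwTimestamp(ts) {
  const m = /^(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})$/.exec(ts);
  if (!m) throw new Error(`not a 14-digit timestamp: ${ts}`);
  const [y, mo, d, h, mi, s] = m.slice(1).map(Number);
  return new Date(Date.UTC(y, mo - 1, d, h, mi, s));
}
```

The free-form `additional` field is left as a raw string here, because its layout differs from one experiment to the next.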

Granular edit counts from UserDailyContribs
To get the user's edits over the last 6, 3, and 1 months, ClickTracking calls getUserEditCountSince once for each period. That function is provided by another extension, Extension:UserDailyContribs, and queries a user_daily_contribs table that the extension adds to the MediaWiki database. UserDailyContribs hooks into ArticleSaveComplete, and on each save it increments a counter for the user's edits that day.

Note that this happens on every call to ClickTracking's trackActionXxx functions, whether or not the caller cares about the information.
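To make the UserDailyContribs mechanism concrete, here is a minimal in-memory JavaScript model of it. It is not the extension's code: the class and method names are hypothetical, and days are modelled as YYYYMMDD strings, which may not match the real user_daily_contribs schema.

```javascript
// One counter per (user, day), bumped on each save and summed on demand.
class DailyContribs {
  constructor() {
    this.byUser = new Map(); // userId -> Map(day "YYYYMMDD" -> edits that day)
  }

  // Analogue of the ArticleSaveComplete handler: count one more edit today.
  recordEdit(userId, day) {
    if (!this.byUser.has(userId)) this.byUser.set(userId, new Map());
    const days = this.byUser.get(userId);
    days.set(day, (days.get(day) || 0) + 1);
  }

  // Analogue of getUserEditCountSince: total edits on or after sinceDay.
  editCountSince(userId, sinceDay) {
    let total = 0;
    for (const [day, n] of this.byUser.get(userId) || []) {
      if (day >= sinceDay) total += n;
    }
    return total;
  }
}
```

With this layout each "edits since" lookup scans a user's daily rows, and ClickTracking performs three such lookups for every action tracked for a logged-in user.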

Log processing
Squid logging explains how WMF processes logs, including the log output from ClickTracking. This processing prepends the name of the wiki the log came from (e.g. enwiki) to each log message that ClickTracking generates.

Python scripts (on emery?) post-process these logs, for example by unpacking an action's custom data into separate fields.
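The scripts themselves aren't reproduced here. As a rough illustration only, this JavaScript sketch shows the kind of unpacking step described; the schema table and the field names are hypothetical, and only the communityPortalClick layout (referrer and destination separated by '@') comes from the sample log line earlier on this page.

```javascript
// Per-event layout of the free-form "additional" field (hypothetical table).
const ADDITIONAL_SCHEMAS = new Map([
  ["communityPortalClick", { separator: "@", fields: ["referrer", "destination"] }],
]);

function unpackAdditional(event, additional) {
  const schema = ADDITIONAL_SCHEMAS.get(event);
  if (!schema) return { additional }; // unknown event: keep the raw string
  const parts = additional.split(schema.separator);
  const out = {};
  schema.fields.forEach((name, i) => {
    // The last field absorbs any extra separators (e.g. an "@" in a URL).
    out[name] = i === schema.fields.length - 1
      ? parts.slice(i).join(schema.separator)
      : parts[i];
  });
  return out;
}
```

Keeping the layouts in a table means a new experiment only needs a new schema entry, not new parsing code.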

Wikimedia researchers process these logs to explore and test hypotheses about the experiments. For Post-edit feedback, for example, see http://stat1.wikimedia.org/rfaulk/_build/experiments_log.html#post-edit-feedback