Extension:E3 Experiments/Architecture

The extension combines
 * a somewhat generic JavaScript harness for making experimental changes to pages
 * specific PHP and JavaScript code for particular experiments
 * and CSS and message resources appropriate to the experiment if it makes changes to pages.

So the extension code only makes sense within an end-to-end explanation of a particular experiment.

E3Experiment for post-edit feedback
The Post-edit feedback experiment puts up one of two messages on successful edit for a very limited set of users.

Because a MediaWiki edit redirects to a normal page URL, the extension hooks into ArticleUpdateBeforeRedirect (in Experiments.hooks.php) to append ?pe=1 to the URL if it is the result of an edit.

Most of the rest of the extension code is JavaScript running in the browser. The extension uses the MW resource loader to load everything in lib/*.js and experiments/*.js in onBeforePageDisplay. For the Post-edit feedback experiment, the code then checks a series of conditions (among them the ?pe query string that the PHP ArticleUpdateBeforeRedirect hook adds), etc. Only if all of these pass does it do anything.
 * test: use the latest dashboard (see ../Testing); it reports this information.
 * Perhaps server-side onBeforePageDisplay hook code could skip loading the JS code if the page isn't appropriate
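
The client-side gate described above can be sketched roughly as follows. This is only an illustration, not the extension's actual code: the function names isPostEditView and shouldRunExperiment are hypothetical, and the logged-in check stands in for whatever other eligibility conditions the real harness applies.

```javascript
// Returns true when the page URL carries the marker (?pe=1) that the PHP
// ArticleUpdateBeforeRedirect hook appends after a successful edit.
function isPostEditView(url) {
  const params = new URL(url).searchParams;
  return params.get("pe") === "1";
}

// Hypothetical eligibility gate: every check must pass before anything runs.
// The isLoggedIn test is an assumed stand-in for the real eligibility rules.
function shouldRunExperiment(url, user) {
  return isPostEditView(url) && user.isLoggedIn === true;
}
```

For example, shouldRunExperiment("https://example.org/wiki/Foo?pe=1", { isLoggedIn: true }) passes, while the same URL without ?pe=1 does not.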

User bucketing
If the user is in the experiment, the JavaScript code calls a murmur3 hash function to turn the wgUserName into a random-ish number. It assigns the user to the control, experimental_1, or experimental_2 bucket based on this (i.e. the remainder modulo 3). It calls ClickTracking jQuery code to remember which bucket the user is in as the value of "PEF" in the userbuckets cookie (see ../Testing for other cookies). The JS code also calls $.trackActionWithOptions (more ClickTracking code) to track this as PEF1BUCKETASSIGN (we may turn off this event).

This username hashing is consistently the same for a known (logged-in) user, so if the user clears cookies or logs in on a different browser, she will end up in the same bucket. It is separate/different from ClickTracking's time-based hash that becomes the clicktracking-session cookie.
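
A minimal sketch of this kind of deterministic bucketing follows. To keep the example short it uses a 32-bit FNV-1a hash as a stand-in for murmur3, so the bucket it computes for a given name will not match what the extension assigns. The names fnv1a32 and bucketForUser are hypothetical, and the order of the bucket list is an assumption.

```javascript
// Bucket names from the experiment; their order here is an assumption.
const BUCKETS = ["control", "experimental_1", "experimental_2"];

// 32-bit FNV-1a over the UTF-8 bytes of the input.
// NOTE: stand-in for murmur3, chosen only for brevity.
function fnv1a32(text) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (const byte of new TextEncoder().encode(text)) {
    hash ^= byte;
    hash = Math.imul(hash, 0x01000193); // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// The same user name always yields the same bucket, with no stored state.
function bucketForUser(userName) {
  return BUCKETS[fnv1a32(userName) % BUCKETS.length];
}
```

Because the bucket depends only on the user name, no cookie is needed to recompute it later, which is why clearing cookies or switching browsers does not move a user to a different bucket.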

The PEF action tracking includes this murmur3 hash, so even though we can work backwards from the edited article's revisionID to the user who made the edit, we also have the murmur3 hash of the username. To see edits from these users, one could query the revision table for edits from new users registered in the experiment eligibility window, run the murmur3 hash on their usernames, and know what bucket they were in and what PEF message they got (none/confirmation/gratitude).

Note that this is all for known logged-in users. The same information about edits is in their "My contributions" page; we aren't tracking anything additional.

Performing the experiment
Then it actually performs the experiment, inserting different message text into the page according to whether the user is in bucket experimental_1 or experimental_2, or no message if the user is in the control bucket.
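
As a rough sketch, the bucket-to-message step amounts to a lookup. The message wording below is invented for illustration, and which experimental bucket gets the confirmation versus the gratitude message is an assumption. messageForBucket is a hypothetical name.

```javascript
// Hypothetical message texts; the real wording lives in the extension's
// message resources. The bucket-to-message pairing is assumed.
const MESSAGES = {
  control: null, // control bucket sees no message
  experimental_1: "Your edit was saved.", // assumed "confirmation" variant
  experimental_2: "Thank you for your edit!", // assumed "gratitude" variant
};

// Returns the text to insert into the page, or null for no message.
function messageForBucket(bucket) {
  if (!Object.prototype.hasOwnProperty.call(MESSAGES, bucket)) {
    throw new Error("Unknown bucket: " + bucket);
  }
  return MESSAGES[bucket];
}
```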

Extension:ClickTracking

 * Event logging describes a replacement approach that is in development

Extension:ClickTracking is used by various extensions besides E3, like Extension:MoodBar, Article Feedback Tracking version 4, ... some in the process of being turned off on en wiki.


 * Provides user bucketing JS functions on jQuery $ that add information to userbuckets cookie (in ext.UserBuckets.js). UserBuckets can also set up a campaign if you provide it enough info in mw.activeCampaigns, but unlike say Account Creation, E3 Experiments doesn't do this.
 * Provides action tracking JS functions on jQuery $ such as $.trackAction('SOME INFO') (in jquery.clickTracking.js). The action tracking functions
 * check for a clicktrackingDebug cookie and set it if ?clicktrackingDebug is in the query string
 * if it doesn't already exist, set a clicktracking-session cookie
 * post using XMLHttpRequest a clicktracking action to api.php (This is hella expensive, it can't be cached!) unless clicktrackingDebug
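
The three steps of an action tracking call can be sketched as below. This is an illustration of the control flow only, not the code in jquery.clickTracking.js: a Map stands in for browser cookies, a callback stands in for the XMLHttpRequest to api.php, and the payload field names (event, session) and the newSessionId helper are hypothetical.

```javascript
// env.cookies:      Map standing in for the browser's cookies
// env.queryString:  the page's query string, e.g. "?clicktrackingDebug=1"
// env.post:         callback standing in for the XMLHttpRequest to api.php
// env.newSessionId: hypothetical helper that makes a new session id
function trackAction(eventName, env) {
  const { cookies, queryString, post, newSessionId } = env;

  // Step 1: ?clicktrackingDebug in the query string sets the debug cookie.
  if (new URLSearchParams(queryString).has("clicktrackingDebug")) {
    cookies.set("clicktrackingDebug", "1");
  }

  // Step 2: create the session cookie only if it does not exist yet.
  if (!cookies.has("clicktracking-session")) {
    cookies.set("clicktracking-session", newSessionId());
  }

  // Step 3: in debug mode nothing is posted to the server.
  if (cookies.has("clicktrackingDebug")) {
    return false;
  }
  post({ event: eventName, session: cookies.get("clicktracking-session") });
  return true;
}
```

The return value (whether a request was sent) is an addition for the sake of the sketch; it makes the debug short-circuit easy to observe.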

ClickTracking does lots of other things: it rewrites the left-side navbar to track clicks, provides a function to track a URL (by requesting it through api.php, which then redirects to the requested URL; expensive), hooks into EditPage::showEditForm:fields to add clicktrackingsession and event hidden form fields, etc. Kill this code if not used any more!

API calls
ClickTracking's API module always (?) derives additional information beyond what the caller provides, even if the caller doesn't care about it! Its ClickTrackingHooks::trackEvent also retrieves:
 * whether the user isLoggedIn
 * if the user is logged in, ClickTracking accesses the database to retrieve:
 * the user's edit count (but E3Experiments already has this info in wgEditCount, so the API call could avoid making a DB query)
 * the user's edit counts over three time periods: last 6 months, last 3 months, and last month

The latter trio of edit counts is provided by another extension, UserDailyContribs, explained later.

Then
 * if $wgClickTrackingDatabase is set then it writes various data to the click_tracking database table, and writes the values of all buckets in the userbuckets cookie to the click_tracking_user_properties database table. This is NOT enabled on MediaWiki production machines, so no database update.
 * if $wgClickTrackingLog is set then it calls wfErrorLog to log similar information, but not buckets:
 * event name
 * timestamp
 * whether user is logged in
 * a session id
 * the namespace of the page
 * the user's edit count
 * user edits over 6 months / 3 months / last month
 * additional information from the caller

Sample ClickTracking log entry

 * probably belongs in a ../Analysis subpage

The first item, event name, for an experiment is often experimentIdentifier@version-eventType-userbucket, e.g. ext.postEditFeedback@1-assignment-experimental_1 means the experiment is PostEditFeedback version 1, the particular event is "assignment", and the user is bucketed in "experimental_1".
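
For analysis it can help to split such an event name back into its parts. The sketch below assumes the version is a plain integer and that the event type contains no hyphen. parseEventName is a hypothetical helper, not part of ClickTracking.

```javascript
// Splits "experimentIdentifier@version-eventType-userbucket" into its parts.
// Returns null for names that do not follow the convention.
// Assumes: integer version, no "-" inside the event type.
function parseEventName(name) {
  const match = /^(.+)@(\d+)-([^-]+)-(.+)$/.exec(name);
  if (match === null) {
    return null;
  }
  return {
    experiment: match[1],
    version: Number(match[2]),
    eventType: match[3],
    bucket: match[4],
  };
}
```

Event names that do not use the convention, such as communityPortalClick, come back as null and have to be handled separately.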

Here's the entire line from this post-edit feedback experiment on a test wiki:
 * event name is as above

Here's a sample line from a click action on the en:Wikipedia:Community Portal page in the m:Research:Community portal redesign experiment:
 * event name is communityPortalClick (no user bucketing, so much simpler)
 * timestamp is July 26 2012
 * this user is logged in
 * the session ID is anonymized
 * the numerical namespace of the page (always '4' for the "Wikipedia:" namespace of Wikipedia:Community_Portal).
 * since the user is logged in, edit counts are available:
 * the user's total edits are 722; edits over the last 6 - 3 - 1 months are 21 - 21 - 5
 * the final field is additional information the specific experiment passes to ClickTracking in trackAction; in this case (Community Portal redesign) it is the referring page and the destination of the click, separated by '@', thus the India Wikipedia page followed by the home page.
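
Unpacking that final field is a matter of splitting on '@'. The sketch below assumes the split happens at the first '@'. The function name splitClickInfo and the page names in the usage note are made up for illustration.

```javascript
// Splits the experiment-specific field "referrer@destination".
// Assumes the first "@" is the separator; returns null if there is none.
function splitClickInfo(info) {
  const at = info.indexOf("@");
  if (at === -1) {
    return null;
  }
  return {
    referrer: info.slice(0, at),
    destination: info.slice(at + 1),
  };
}
```

With a made-up value such as "Some_Page@Main_Page", this gives referrer "Some_Page" and destination "Main_Page".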

Other ClickTracking clients

 * ext.articleFeedback (AFT4)
 * currently collecting data from enwiki, eswiki, ptwiki, zhwiki, testwiki; enwiki will be entirely disabled with completion of the AFTv5 ramp-up. I don't know why we are still collecting data from other wikis unless this was commissioned by Global Dev. I recommend we discontinue AFTv4 logging on all wikis.


 * ext.articleFeedbackv5 (AFT5)
 * entirely disabled on enwiki, but events keep trickling in from users running stale code. It can be safely discontinued unless there's a need to resume data collection for FeedbackPage usage or CTAs after the completion of the ramp-up.


 * ext.MoodBar
 * collecting email notification click-through data from enwiki, will be disabled at the latest by August 15.


 * Vector extension
 * has code to track actions for wgCollapsibleNavBucketTest and wgVectorSectionEditLinksBucketTest, both false by default.


 * other
 * bits of code calling the clicktracking API are lurking everywhere. If we want to discontinue it entirely, we will need to clean up lots of extensions that we have in production, on top of those listed. For example, WikiEditor still has the ability to track button clicks.

Granular edit counts from UserDailyContribs
To get the user's edits over 6 - 3 - 1 months, ClickTracking calls getUserEditCountSince for each of these periods. getUserEditCountSince is provided by another extension, Extension:UserDailyContribs. This queries a user_daily_contribs table that it adds to the MediaWiki database. The extension adds a hook to ArticleSaveComplete and on each call it increments a counter for the day's edits.
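
The per-day counter and the "edits since" query can be modelled in a few lines. The real extension is PHP backed by the user_daily_contribs database table. The in-memory class below is only a sketch of the idea, and the names DailyContribs, recordEdit, and editCountSince are hypothetical.

```javascript
// In-memory stand-in for the user_daily_contribs table.
class DailyContribs {
  constructor() {
    // user name -> Map(day "YYYY-MM-DD" -> number of edits that day)
    this.byUser = new Map();
  }

  // Called once per saved edit (the ArticleSaveComplete hook in the extension).
  recordEdit(user, day) {
    if (!this.byUser.has(user)) {
      this.byUser.set(user, new Map());
    }
    const days = this.byUser.get(user);
    days.set(day, (days.get(day) || 0) + 1);
  }

  // Sums the daily counters from sinceDay (inclusive) onward.
  // ISO dates compare correctly as plain strings.
  editCountSince(user, sinceDay) {
    let total = 0;
    for (const [day, count] of this.byUser.get(user) || []) {
      if (day >= sinceDay) {
        total += count;
      }
    }
    return total;
  }
}
```

Under this model, ClickTracking's three edit counts correspond to three editCountSince calls with cut-off dates 6, 3, and 1 months back.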

Note that this happens on every call to ClickTracking's trackActionXxx, whether the caller cares about the info or not. 

Log processing
Squid logging explains how WMF processes logs, including the log output from ClickTracking. This step prepends the wiki from which the log came, e.g. enwiki, to the log message that ClickTracking generates.

The WMF configuration sets $wgClickTrackingLog to send to machine emery over UDP, whence all ClickTracking logs are copied to machine stat1 in /a/aft/archive/clicktracking/ (including Extension:ArticleFeedback tracking). Python scripts (on emery?) do some post-processing on these logs, e.g. unpacking an action's custom data into separate fields.

Wikimedia researchers process these logs to explore and test hypotheses about the experiments. E.g. for Post-edit feedback, see http://stat1.wikimedia.org/rfaulk/_build/experiments_log.html#post-edit-feedback