Extension:EventLogging/Programming

How it works
When code logs an event, it must reference a schema by name, for example "GettingStarted". The name must exactly match the title of a Schema: page on Meta and should use InitialCaps.


 * 1) The schema is a JSON structure that specifies the fields in the event: their names, their types (integer, string, boolean, etc.), whether they are required, their allowed values, and so on.
 * 2) PHP code in core or an extension explicitly depends on a particular revision of a particular data model.
 * 3) For client-side event logging, the MediaWiki ResourceLoader gets the data model from its page in meta-wiki's Schema: namespace (e.g. http://meta.wikimedia.org/wiki/Schema:GettingStarted revision 4867730), caches it, and makes it available to client JavaScript such as a call to mw.eventLog.logEvent.
 * 4) For server-side event logging, PHP code simply calls the server-side logging function.
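The client-side call in this flow can be sketched as follows. This is not the page's original example: the event fields are hypothetical, and the small stand-in for the `mw` global exists only so the snippet can run outside a wiki page. On a real page, ResourceLoader provides mw.eventLog once the schema module is loaded.

```javascript
// Stand-in for the browser's `mw` global (assumption, for running outside MediaWiki).
const mw = globalThis.mw || {
  eventLog: {
    sent: [],
    logEvent(schemaName, event) {
      // The real function validates and transmits the event; this stub only records it.
      this.sent.push({ schema: schemaName, event });
    }
  }
};

// Hypothetical event fields; the real GettingStarted schema defines its own.
mw.eventLog.logEvent('GettingStarted', {
  action: 'impression',
  isAnon: false
});
```

The first argument is the schema name and the second is the event object whose fields must match the schema.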

How to make a data model
To create a data model:
 * Meet a researcher to determine what you're going to log, and name the fields, reusing well-known field names where possible.
 * Create a JSON structure representing this data model in the Schema: namespace on meta, and tweak it until it saves without errors.
 * Sample: m:Schema:OpenTask
 * Tip: http://jsonlint.com/ has better error reporting; copy and paste your JSON into it.
 * Tip: if you have a JSON file with the desired fields and values, http://www.jsonschema.net/ will guess at a schema for it that you can start with (it adds extra info like "id" that we don't currently use).
 * Use the schema's talk page (sample) to link to experiments that use it, discuss details, etc.
 * Always document which code logs the event and in what circumstances.
 * Developers write code to log events that match the data model.
 * The data model tells analysts what information is in the logs.
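As a rough illustration of what such a JSON structure might look like, here is a minimal sketch. The schema content is hypothetical (it is not the real OpenTask schema) and uses only the keywords this page says EventLogging pays attention to: type, required, and enum.

```javascript
// Hypothetical data model; field names and values are illustrative only.
const exampleSchema = {
  description: 'Hypothetical task-suggestion events',
  properties: {
    action: { type: 'string', required: true, enum: ['impression', 'click', 'submit'] },
    isAnon: { type: 'boolean', required: true },
    pageId: { type: 'integer' },
    version: { type: 'integer', required: true }
  }
};

// The text saved to the Schema: page would be the JSON serialization of this structure.
const schemaText = JSON.stringify(exampleSchema, null, 2);
console.log(schemaText);
```

Serializing with JSON.stringify guarantees the text is valid JSON before you paste it into the Schema: page.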

Versioning
If code tries to log an event that doesn't match the data model that EventLogging retrieved, EventLogging will log the event anyway but flag it as invalid. Since you always give a schema revision, you can edit the schema as much as you want without affecting existing code.
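Why pinning a revision protects existing code can be sketched with a toy in-memory history. The revision numbers and schema contents below are hypothetical, and the Map is only a stand-in for the Schema: page history on meta.

```javascript
// Hypothetical stand-in for a Schema: page history, keyed by revision ID.
const revisions = new Map([
  [100, { properties: { action: { type: 'string', required: true } } }],
  [101, { properties: { action: { type: 'string', required: true },
                        bucket: { type: 'string', required: true } } }]
]);

// Returns the names of the required fields in one specific revision.
function requiredFields(revisionId) {
  const schema = revisions.get(revisionId);
  return Object.keys(schema.properties)
    .filter((name) => schema.properties[name].required === true);
}

// Code that pinned revision 100 is unaffected by the later edit saved as revision 101.
console.log(requiredFields(100));
console.log(requiredFields(101));
```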

It's OK to have different kinds of events (often called actions) sharing one data model. That way the events go into one table, which may simplify querying and multi-dimensional analysis. Only add "required": true to the fields that apply to all events.

Built-in data fields
The client-side mw.eventLog.logEvent function always logs some "meta" fields of its own in addition to the object you pass to it:

"site":"my_wiki","schema":"MobileBetaWatchlist","revision":4921083,"isValid":true}
 * site
 * string, the wiki database ("enwiki", etc.)


 * schema
 * string, the schema name you passed to mw.eventLog.logEvent (in this example, "MobileBetaWatchlist")


 * revision
 * integer, the revision number you passed to mw.eventLog.logEvent


 * isValid
 * boolean, false if the event failed to validate against the schema revision you specified
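How the built-in fields wrap the caller's object can be sketched as below. The function name is hypothetical, and placing the caller's data under an "event" key is an assumption of this sketch; the page only shows the tail of the logged structure.

```javascript
// Wraps the caller's event object with the built-in fields listed above.
// Assumption: the caller's data is nested under an "event" key.
function addMetaFields(site, schemaName, revision, event, isValid) {
  return { event, site, schema: schemaName, revision, isValid };
}

// Hypothetical event payload; the meta values match the sample line above.
const logged = addMetaFields('my_wiki', 'MobileBetaWatchlist', 4921083, { action: 'view' }, true);
console.log(JSON.stringify(logged));
```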

Additions coming soon:
 * _token
 * string, random anonymous token per user (more precisely, per browser)

The log processing that handles this event stream detects whether an event was truncated in transmission. It also logs other fields, for example: palladium.eqiad.wmnet 327042 2012-12-11T20:53:20 208.NN.NN.NN (origin log server?, a sequence number, the server timestamp in UTC, and the client IP). The client IP is obfuscated.

SQL fields
The json2sql script that creates a SQL table on-the-fly for each schema+revision prepends an underscore to the built-in data fields, thus the SQL table for a client-side event has fields _site, _schema, _revision, _isValid, _origin, _seqId, _timestamp, _clientIp, and _truncated.
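The naming rule can be illustrated with a short sketch. This is not the json2sql script itself: the function name is hypothetical, and it assumes the caller's fields sit under an "event" key while every other top-level key is a built-in field.

```javascript
// Maps a logged record to SQL column names: built-in fields get a leading
// underscore, while the caller's event fields keep their own names.
function toColumns(record) {
  const columns = {};
  for (const [name, value] of Object.entries(record)) {
    if (name === 'event') {
      Object.assign(columns, value); // event fields are copied unchanged
    } else {
      columns['_' + name] = value;   // built-in field, e.g. site -> _site
    }
  }
  return columns;
}

const row = toColumns({
  event: { action: 'click' },
  site: 'enwiki',
  schema: 'OpenTask',
  revision: 1,
  isValid: true
});
console.log(Object.keys(row));
```

The underscore keeps built-in columns from colliding with event fields that happen to share a name, such as a schema that defines its own "site" field.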

Server-side events
The server-side PHP logging function also logs a similar set of "meta" fields of its own in addition to the array you pass it: site, schema, revision, isValid, and timestamp.

Note: A schema should describe a server-side event or a client-side event, but not both at the same time, since server-side events and client-side events are annotated with different sets of metadata.

Standard data fields
Future: maybe these should be consistently generated by the event logging code and have an underscore prefix, so that clients could just name the standard fields they want logged.


 * isAnon
 * boolean; true if the user has not logged in (the opposite of "authenticated"). In JavaScript, call mw.user.isAnon().


 * article / title
 * string; the title of the page the user is editing. In JavaScript, this is available through mw.config.
 * Note: this doesn't work for Special pages and other namespaces.

If the user has logged in (isAnon is false), then we often log:
 * editCount
 * integer; how many edits a logged-in user has made. In JavaScript, this is available through mw.config.


 * pageId
 * integer; the article ID of the current page. In JavaScript, this is available through mw.config.


 * pageNs
 * integer; the namespace of the current page. In JavaScript, this is available through mw.config.


 * revId
 * integer; the revision ID of the current page (meaningless for special pages, actions like View history, etc.). In JavaScript, this is available through mw.config.


 * userId
 * integer; the user ID of a logged-in user. Privacy note: information about the activities of logged-in users is already available in Special:RecentChanges, Special:UserContributions, etc. Not available "for free" in JavaScript; you must, for example, send the user's ID to the browser from a MakeGlobalVariablesScript hook.
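Gathering the standard fields before logging might look like the sketch below. The configuration keys used here (wgPageName, wgArticleId, wgNamespaceNumber, wgRevisionId, wgUserEditCount) are assumptions about which mw.config variables supply each value; this page does not name them. The Map is a stand-in for mw.config, which offers the same get() method, and its values are made up.

```javascript
// Stand-in for mw.config so the sketch runs outside a wiki page (values are made up).
const fakeConfig = new Map([
  ['wgPageName', 'Main_Page'],
  ['wgArticleId', 1],
  ['wgNamespaceNumber', 0],
  ['wgRevisionId', 12345],
  ['wgUserEditCount', 42]
]);

// Collects the standard fields described above. isAnon is passed in because
// it comes from mw.user rather than from the configuration map.
function standardFields(cfg, isAnon) {
  const fields = {
    isAnon,
    title: cfg.get('wgPageName'),
    pageId: cfg.get('wgArticleId'),
    pageNs: cfg.get('wgNamespaceNumber'),
    revId: cfg.get('wgRevisionId')
  };
  if (!isAnon) {
    // editCount is only meaningful for logged-in users.
    fields.editCount = cfg.get('wgUserEditCount');
  }
  return fields;
}

console.log(standardFields(fakeConfig, false));
```

On a wiki page you would pass mw.config and the result of mw.user.isAnon() instead of the stand-ins.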

Common data fields
There are no standard values for these, but different data models use the same field name for their own values.


 * action
 * string; identifying different actions the data model logs, such as 'impression', 'click', 'submit', 'accept' (a task), 'create'


 * bucket
 * enum strings; this records which alternative is presented to a user. For example, Account Creation User Experience randomly shows users either 'control_3' (original form) or 'acux_3' (fancy validating form).


 * campaign
 * string; the value of an incoming query parameter identifying the source of an action. For example, the Article Feedback Tool's "create an account" call to action links to account creation with a campaign parameter in the query string.


 * error
 * ??, optional; records if the user experienced an error attempting an action (filling in a form, saving an edit, etc.) and what it was.


 * token
 * a unique random persistent token per browser, stored in the (badly misnamed) mediaWiki.user.id cookie. In JavaScript, calling mw.user.id will generate this. Note: this will probably become a built-in data field named _token
 * Perhaps use an additional session token when logging data in a single browser session


 * version
 * integer; a number representing changes to the conditions (not the data model), e.g. bump it when deploying code that presents a different experience.

Available data models
Also see m:Category:Data models. Not all of these have been converted to Schema: pages on meta.


 * OpenTask
 * in m:Schema:OpenTask


 * AccountCreation
 * in m:Schema:AccountCreation, includes client-side assign/impression/submit events and a server-side account_create event, both logged by Extension:E3Experiments. Current ACUX experiment still uses ClickTracking for client-side events, alas.


 * edit
 * server-side event logged by EventLogging itself whenever a user creates or edits an article (on PageContentSaveComplete hook), with fields:
 * articleId, api (boolean), title, namespace, created (boolean), summary, timestamp, minor (boolean), loggedIn, userId, editCount, registered (integer timestamp)


 * mobile
 * see Event_logging/Mobile


 * onboarding
 * may reuse openTask

JSON schema validation
Each data model JSON file on meta-wiki is a JSON schema. This is an evolving standard to specify the format of JSON structures, in our case the logged event.
 * See the JSON schema draft.
 * As of December 2012 EventLogging only validates that the schemas on meta are valid JSON.
 * When code attempts to log an event, EventLogging only pays attention to a subset of JSON schema features; as of November 2012 this includes:
 * type: boolean, integer, number, string, timestamp
 * required: true/false
 * enum values
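The subset above can be sketched as a small validator. This is an illustration only, not EventLogging's implementation: the function names are hypothetical, and treating "timestamp" as a non-negative integer is an assumption.

```javascript
// Checks one value against a "type" keyword from the subset listed above.
function matchesType(value, type) {
  switch (type) {
    case 'boolean': return typeof value === 'boolean';
    case 'integer': return Number.isInteger(value);
    case 'number': return typeof value === 'number' && Number.isFinite(value);
    case 'string': return typeof value === 'string';
    // Assumption: a timestamp is a non-negative integer.
    case 'timestamp': return Number.isInteger(value) && value >= 0;
    default: return false;
  }
}

// Returns a list of problems; an empty list means the event is valid.
function validate(event, schema) {
  const errors = [];
  for (const [name, rule] of Object.entries(schema.properties || {})) {
    if (!(name in event)) {
      if (rule.required === true) errors.push('missing required field: ' + name);
      continue;
    }
    if (rule.type && !matchesType(event[name], rule.type)) {
      errors.push('wrong type for field: ' + name);
    }
    if (rule.enum && !rule.enum.includes(event[name])) {
      errors.push('value not in enum for field: ' + name);
    }
  }
  return errors;
}

// Hypothetical schema and event.
const demoSchema = {
  properties: {
    action: { type: 'string', required: true, enum: ['impression', 'click'] },
    pageId: { type: 'integer' }
  }
};
console.log(validate({ action: 'click', pageId: 7 }, demoSchema));
```

An event with a non-empty error list would still be logged, with isValid set to false.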

Error handling
If code attempts to log with an invalid format, EventLogging detects it's invalid and flags it, but logs it anyway.

Programming
Tips and debugging
 * The E3Experiments extension has working PHP setup code that declares and requires the "openTasks" schema resource, as well as sample JavaScript calls to EventLogging.
 * Your schema resource should depend on ext.eventLogging.
 * Require your schema wherever you need to log events (it will pull in ext.eventLogging, which implements the mw.eventLog module).
 * See the extension's API documentation for details.
 * In JavaScript code, use mw.eventLog.setDefaults to set common values for fields to log that don't change, such as version, the user's name, etc.
 * Client-side event logging works by requesting a beacon image event.gif with the log info in its query string. To see the log events you can:
 * watch for this request in your browser's network console,
 * look for it in your web server's access logs,
 * run the toy web server scripts/DevServer.php in the EventLogging extension which pretty-prints the query string,
 * in your browser's JavaScript console, enter mw.eventLog.schemas.Name.logged to see an array of logged events on the current page.
 * In your browser's JavaScript console, enter mw.eventLog.schemas to see which schemas have been loaded.
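The beacon mechanism can be sketched as below, which may help when reading the request in a network console or access log. The base URL is hypothetical, and encoding the event as URL-encoded JSON is an assumption of this sketch; the page only says the log info travels in the query string.

```javascript
// Builds the beacon URL: base URL, "?", then the URL-encoded JSON of the event.
function beaconUrl(baseUrl, data) {
  return baseUrl + '?' + encodeURIComponent(JSON.stringify(data));
}

// Reverses beaconUrl so a captured request can be inspected as an object.
function decodeBeacon(url) {
  const query = url.slice(url.indexOf('?') + 1);
  return JSON.parse(decodeURIComponent(query));
}

// Hypothetical host and event.
const url = beaconUrl('https://wiki.example/event.gif', {
  schema: 'OpenTask',
  revision: 1,
  event: { action: 'click' }
});
console.log(url);
console.log(decodeBeacon(url));
```

In a browser, assigning the URL to the src of a new Image object would be one way to issue the request.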