Wikimedia Product/Analytics Infrastructure/Schema fragments

Schema Fragments
In the schema, reference the fragment(s) you wish to use and list which fields are required in every event.

Example 1
Suppose we're running an A/B test on a new default skin for anonymous users and we are interested in measuring session length and average number of visited articles per session.

The schema would use the following fragments: core identifiers, page, UI, and A/B testing. Fields from those fragments would need to be included in the single event logged by the instrument on every page load. Screen information is included to assess the impact of the new skin on reading behavior by screen size (small, medium, large), and the page namespace is included to filter out events from non-article pages.
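As a rough illustration of how the events from this example might be analyzed, the sketch below computes the average session length and the average number of article views per session for each test group. The field names (`session_id`, `test_group`, `page_namespace`, `ts`) are hypothetical placeholders rather than the names defined by the fragments, and the timestamps are made-up values in seconds.

```python
from collections import defaultdict

def session_metrics(events):
    """Return {group: (mean session length in seconds, mean article views per session)}.

    Field names are hypothetical; session length is approximated as the time
    between the first and last article view in the session.
    """
    sessions = defaultdict(list)
    for e in events:
        # Keep only article pages (namespace 0), dropping e.g. Special pages.
        if e["page_namespace"] == 0:
            sessions[(e["test_group"], e["session_id"])].append(e["ts"])

    per_group = defaultdict(list)
    for (group, _session_id), stamps in sessions.items():
        per_group[group].append((max(stamps) - min(stamps), len(stamps)))

    return {
        group: (
            sum(length for length, _ in rows) / len(rows),
            sum(views for _, views in rows) / len(rows),
        )
        for group, rows in per_group.items()
    }

# Made-up events: one per page load.
events = [
    {"session_id": "a", "test_group": "control", "page_namespace": 0, "ts": 0},
    {"session_id": "a", "test_group": "control", "page_namespace": 0, "ts": 60},
    {"session_id": "a", "test_group": "control", "page_namespace": -1, "ts": 90},
    {"session_id": "b", "test_group": "new-skin", "page_namespace": 0, "ts": 10},
    {"session_id": "b", "test_group": "new-skin", "page_namespace": 0, "ts": 40},
    {"session_id": "b", "test_group": "new-skin", "page_namespace": 0, "ts": 130},
]
metrics = session_metrics(events)
```

Because only one event is logged per page load, both metrics can be derived from the same stream without any feature-specific fields.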

The remainder of this section describes the fields in those fragments.

Identifiers
All new schemas for product analytics would include the core identifiers fragment:
 * Core identifiers
 * (string)
 * Identifies a client across multiple sessions. This is the "app install ID" on mobile apps and enables calculation of retention metrics for anonymous users since we do not have a user ID for those. MediaWiki-based instrumentation does not include this identifier in the events it sends.


 * (string)
 * Identifies a session. On MediaWiki, a session lasts for the lifetime of the browser process (refer to T223931 for additional information) and can be retrieved with . On iOS and Android apps, where the app is allowed to enter a background state, sessions expire after 15 minutes of inactivity. If the app returns to the foreground after 15 minutes, a new session ID is generated.


 * (string)
 * Identifies a page view, applicable only on the web. Interactions with multiple features (instrumented separately) on the same page may be linked together via this identifier. On MediaWiki this is retrievable with.
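The app-side session expiry rule described above can be sketched as follows. This is a minimal illustration, not the apps' actual implementation: the `AppSession` class and its `touch` method are hypothetical, times are plain seconds, and treating exactly 15 minutes as expired is an assumption.

```python
import uuid

SESSION_TIMEOUT_SECONDS = 15 * 60  # inactivity window described above

class AppSession:
    """Hypothetical helper that issues a session ID and renews it after inactivity."""

    def __init__(self, now):
        self.session_id = uuid.uuid4().hex
        self.last_active = now

    def touch(self, now):
        """Record activity at time `now` (seconds) and return the current session ID."""
        if now - self.last_active >= SESSION_TIMEOUT_SECONDS:
            # The app was inactive for 15 minutes or more: start a new session.
            self.session_id = uuid.uuid4().hex
        self.last_active = now
        return self.session_id

session = AppSession(now=0)
first_id = session.touch(now=10 * 60)  # 10 minutes of inactivity: same session
```

Every event logged by the app would then carry the ID returned by `touch`, so that events separated by a long background period fall into different sessions.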


 * Activity sequencing (for reconstructing sequences of events)
 * (string)
 * Identifies a sequence of actions in the same context or funnel. In the past, teams have used terms like "session ID" and "sub-session ID" to refer to a set of connected events, such as interacting with a widget. This identifier is useful for grouping together impressions with corresponding clicks, and for grouping together steps in a process such as making an edit. The activity identifier can be randomly generated or a counter.


 * (integer)
 * Starting at 1, this is a counter for reconstructing the order of events in the same activity. For a variety of reasons we cannot trust the timestamp of receipt or the client-side timestamp of when the event was generated for putting events in order. In cases where the exact sequence of events needs to be established, this identifier can be used to record which event happened 1st, which happened 2nd, and so on.

For example, suppose the user is making an edit. We group the actions performed in this activity with the activity identifier; previously this would have been a feature-specific "editing_session_id". As the user interacts with various (instrumented) features and elements in the editor, previews the edit, continues editing, and finally publishes the edit, specific data about all of those interactions can be tracked in schema-specific fields, while the order in which those interactions happen is recorded in the sequence counter.
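The editing example can be sketched in a few lines: events are grouped by their activity identifier and then ordered by the sequence counter rather than by any timestamp. The field names `activity_id`, `sequence`, and `action` are hypothetical stand-ins for whatever the schema actually calls them.

```python
from collections import defaultdict

def reconstruct_activities(events):
    """Group events by activity and order each group by its sequence counter."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event["activity_id"]].append(event)
    return {
        activity: [e["action"] for e in sorted(evts, key=lambda e: e["sequence"])]
        for activity, evts in grouped.items()
    }

# Events as they might be received, out of order (made-up data).
received = [
    {"activity_id": "edit-1", "sequence": 3, "action": "publish"},
    {"activity_id": "edit-1", "sequence": 1, "action": "open_editor"},
    {"activity_id": "edit-2", "sequence": 1, "action": "open_editor"},
    {"activity_id": "edit-1", "sequence": 2, "action": "preview"},
]
ordered = reconstruct_activities(received)
```

Since the counter starts at 1 and increases within one activity, a gap in the sorted counters would also hint that an event was lost in transit.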

User
Information about the user associated with the event is contained in the  field.


 * Information about the user generating the event
 * (boolean)
 * Whether the user is logged in (false) or anonymous (true)


 * (integer)
 * User's MW user ID; 0 if user is anonymous. User ID is specific to the wiki that the event came from.


 * (string)
 * Cross-wiki username


 * (integer)
 * The total number of edits by the user at the time of the event. Growth team retrieves this with  to record it for their experiments. May be useful as a proxy for experience at the time of the event.

Page
Information about the page associated with the event is contained in the  field.


 * Information about the page the event was generated on
 * (integer)
 * Page's numeric ID in MediaWiki


 * (integer)
 * Page's namespace code in MediaWiki (e.g. 0 for Main/Article, -1 for Special)


 * (string)
 * Page's title


 * (boolean)
 * Whether the page is a redirect or not at the time of the event

User Interface
Information about the UI associated with the event is contained in the  field.


 * Information about the interface the user saw when the event was generated
 * (string)
 * Skin name (e.g. "Vector", "MinervaNeue", "Modern"); only applicable on MediaWiki, not mobile apps


 * (string)
 * "Light", "Sepia", "Dark", "Black"; currently only applicable on mobile apps, but Web is experimenting with it for MediaWiki


 * (object)
 * Information about the screen, such as dimensions


 * (integer)
 * Width of the screen in pixels


 * (integer)
 * Height of the screen in pixels
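Example 1 groups events by screen size (small, medium, large). One way to derive such a bucket from the screen width is sketched below. The function name and the cut-offs of 768 and 1280 pixels are arbitrary assumptions chosen for illustration, not values taken from any existing analysis.

```python
def screen_size_bucket(width_px, small_max=768, medium_max=1280):
    """Map a screen width in pixels to a coarse size bucket.

    The default thresholds are illustrative assumptions only.
    """
    if width_px < small_max:
        return "small"
    if width_px < medium_max:
        return "medium"
    return "large"

buckets = [screen_size_bucket(w) for w in (360, 1024, 1920)]
```

Logging the raw width and height and bucketing at analysis time keeps the thresholds adjustable after the data has been collected.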

A/B Testing
Information about the A/B test (experiment) associated with the event is contained in the  field.


 * Information about the experiment the user was enrolled in when the event was generated
 * (string)
 * Name of the A/B test the user is enrolled in (e.g. "Desktop Redesign (Phase 3)")


 * (string)
 * Name of the group (sometimes called "bucket") the user was randomly assigned to – e.g. "control", "variant-a", "variant-b", "variant-c"

Note: What if a user is enrolled in multiple A/B tests? One possible answer: test could be an array with (name, group) pairs as elements.
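To make the array idea in the note concrete, here is a minimal sketch of what such an event might look like and how an analyst could look up the group for one test. The `tests` key, the `group_for` helper, and the second test name are hypothetical; this is one possible shape, not a decided design.

```python
def group_for(event, test_name):
    """Return the group assigned in `test_name`, or None if the user is not enrolled."""
    for enrollment in event.get("tests", []):
        if enrollment["name"] == test_name:
            return enrollment["group"]
    return None

# Hypothetical event for a user enrolled in two experiments at once.
event = {
    "tests": [
        {"name": "Desktop Redesign (Phase 3)", "group": "variant-a"},
        {"name": "Hypothetical Search Test", "group": "control"},
    ]
}
redesign_group = group_for(event, "Desktop Redesign (Phase 3)")
```

An array keeps each event self-describing, at the cost of slightly more involved queries than a single name and group pair.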

Campaign Attribution
Information about the UTM parameters associated with the event is contained in the  field.


 * Information about where the user came from
 * Identifies which site sent the traffic, and is a required parameter (e.g. )


 * Identifies what type of link was used, such as cost per click or email (e.g. )


 * Identifies a specific product promotion or strategic campaign (e.g. )


 * Identifies search terms (e.g. )


 * Identifies what specifically was clicked to bring the user to the site, such as a banner ad or a text link. It is often used for A/B testing and content-targeted ads. (e.g. )
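As an illustration of where these values come from, the sketch below pulls the conventional UTM query parameters (`utm_source`, `utm_medium`, `utm_campaign`, `utm_term`, `utm_content`) out of a landing URL using Python's standard library. The URL is made up, and whether the fragment's field names mirror the query-parameter names is an assumption.

```python
from urllib.parse import urlparse, parse_qs

# Conventional UTM query-parameter names.
UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")

def extract_utm(url):
    """Return the UTM parameters present in `url`, keeping the first value of each."""
    query = parse_qs(urlparse(url).query)
    return {key: query[key][0] for key in UTM_KEYS if key in query}

# Made-up landing URL for illustration.
url = (
    "https://example.org/wiki/Main_Page"
    "?utm_source=newsletter&utm_medium=email&utm_campaign=spring_drive"
)
params = extract_utm(url)
```

Parameters that are absent from the URL are simply omitted from the result, which matches the idea that only the source is required.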