Reading/Web/EventLogging best practices

See also Reading/Web/Quantitative Testing and the WMF-wide EventLogging best practices at Extension:EventLogging/Guide.

Schemas

 * Schemas in MobileFrontend should be prefixed with
 * The talk page should be edited with the  template. For example Schema_talk:MobileWebSearch

Privacy
EventLogging data is subject to the Wikimedia Foundation's privacy policy and Data retention guidelines. In particular, sensitive data needs to be deleted after 90 days, in a purging process that (for EventLogging) is defined by a whitelist (of non-sensitive fields) for each schema.

The Analytics Engineering team offers some thoughts on privacy best practices for writing schemas.

See the Audiences department's Instrumentation DACI regarding the general process for creating and reviewing schemas.

Sampling rate
See Reading/Web/Quantitative_Testing regarding terminology.

Be sure to check events for any newly deployed schema in its Grafana dashboard (linked on the talk page of the schema).

Consideration for experimental features in beta
For experimental features in beta, consider a 100% sampling rate. If you choose this, state it in the acceptance criteria of the task that implements the schema.

Use of tokens

 * See also Reading/Web/Quantitative_Testing

Tokens link the events that make up a user's interaction. They connect events that occur within a single pageview or session, which gives a fuller picture of user behavior. A common scenario is a schema, such as Schema:Popups, that needs to record several actions occurring within one pageview and during one session.

Two types of tokens are commonly employed for this purpose: session tokens and page tokens.

Session Tokens
Session tokens are unique identifiers that are generated once and associated with a user's session. They persist until the user closes their browser (see the session ID documentation and its caveats, and T205569). They link multiple events occurring within the same session and provide continuity in data collection. Note that the uniqueness of session tokens comes with caveats, particularly in browsers that do not support the crypto API.

Page Tokens
Page tokens, on the other hand, link events occurring within a single pageview. When a user moves to a new page, a new page token is generated (see the page token documentation). This isolates events and associates them with a specific pageview.
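
As an illustration of how the two tokens relate, the following minimal sketch groups a handful of made-up events first by session and then by pageview. The event shape, the field names `sessionToken` and `pageToken`, and the action names are assumptions for this example only; they are not taken from any actual schema.

```javascript
// Hypothetical events: field names and values are made up for illustration.
const events = [
  { action: 'pageLoaded', sessionToken: 'aaaa', pageToken: 'p1' },
  { action: 'linkHovered', sessionToken: 'aaaa', pageToken: 'p1' },
  { action: 'pageLoaded', sessionToken: 'aaaa', pageToken: 'p2' },
  { action: 'pageLoaded', sessionToken: 'bbbb', pageToken: 'p3' },
];

// Group a list of events by the value of one token field.
function groupByToken(list, field) {
  const groups = new Map();
  for (const event of list) {
    const key = event[field];
    if (!groups.has(key)) {
      groups.set(key, []);
    }
    groups.get(key).push(event);
  }
  return groups;
}

// One entry per session, spanning several pageviews.
const bySession = groupByToken(events, 'sessionToken');
// One entry per pageview, each belonging to exactly one session.
const byPage = groupByToken(events, 'pageToken');
```

In this sketch the session `aaaa` spans two pageviews (`p1` and `p2`), while each page token appears in only one session.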

How is the page token generated?
TL;DR: The page token is created from strong random values supplied by the WebCrypto API or, if that is unavailable, by a fallback random-number function. This gives the token a high degree of entropy, making it highly unlikely to collide with other tokens and allowing it to serve as a unique identifier for the user's current page view context.

Detailed Breakdown

The page token, generated by the  function, is an 80-bit integer represented in hexadecimal format with padding. It serves as a unique identifier for the user's current page view context. The process of generating this token involves the following steps:


 * 1) When the   function is called, it checks if the   variable is already set.
 * 2) If   is not set (meaning it's the first call, or the token hasn't been generated yet), the function proceeds to generate the page token.
 * 3) To generate the page token, the function internally calls the   function.
 * 4) The   function tries two sources of entropy (randomness), in this order:
    * It first attempts to use the WebCrypto API's  method to obtain cryptographically strong random values.
    * If the WebCrypto API is not supported by the browser, or if  fails, the function falls back to a non-cryptographic random-number function.
 * 5) The result of the   function is an 80-bit integer, represented as a string in hexadecimal format with padding (20 characters).
 * 6) This generated page token is then cached in the   variable, so that it doesn't need to be regenerated for subsequent calls.
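
The steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the MediaWiki source: the names `generateRandomId`, `getPageToken` and `cachedPageToken` are hypothetical, and the use of five 16-bit values and of `Math.random()` as the fallback are assumptions chosen to match the 80-bit, 20-character description.

```javascript
// Hypothetical names throughout; a sketch, not the actual implementation.

// Build an 80-bit ID as 5 x 16-bit values, each padded to 4 hex characters.
function generateRandomId() {
  let values = null;
  const cryptoApi = globalThis.crypto;
  if (cryptoApi && typeof cryptoApi.getRandomValues === 'function') {
    try {
      // Preferred source: cryptographically strong random values.
      values = cryptoApi.getRandomValues(new Uint16Array(5));
    } catch (e) {
      values = null; // fall through to the fallback below
    }
  }
  if (values === null) {
    // Assumed fallback: weaker, non-cryptographic randomness.
    values = Array.from({ length: 5 }, () => Math.floor(Math.random() * 0x10000));
  }
  // Array.from yields a plain array of strings, also for a typed array input.
  return Array.from(values, (v) => v.toString(16).padStart(4, '0')).join('');
}

// Cache so that every call within one pageview returns the same token.
let cachedPageToken = null;

function getPageToken() {
  if (cachedPageToken === null) {
    cachedPageToken = generateRandomId();
  }
  return cachedPageToken;
}
```

Calling `getPageToken()` twice in the same page context returns the same 20-character hexadecimal string; a new page load starts with an empty cache and therefore a new token.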

The use of both session tokens and page tokens allows for comprehensive tracking and analysis of user behavior during a session and across pageviews. It ensures that events are correctly attributed to their respective contexts.

When a schema records both session tokens and page names, data retention needs to be considered. For instance, one of these identifiers should be purged after a certain duration, typically 90 days, for privacy and data storage reasons.

This token-based approach aids in organizing and analyzing user interactions effectively and provides valuable insights for quantitative testing and research, as outlined in T205569.

Testing
EventLogging events can be monitored in the browser's developer tools (in the Network tab, filter for request URLs containing "event" or "beacon"). A nicer display of events as MediaWiki notifications can be activated by applying the following code in the console while logged in (see also T188640):

One may need to deactivate Do Not Track and browser extensions that might be blocking EventLogging requests (e.g. Privacy Badger).

Token mining
It may be necessary to mine a token (session, pageview, or otherwise) in order to test instrumentation consistently, for example during the QA step(s).

If an A/B test is implemented using the  function, then the following scripts will mine a token:


 * https://gist.github.com/phuedx/d580f01c501d207398828b717bf9870b
 * https://gist.github.com/polishdeveloper/dd42b2372218331c442e70e521e595ac
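
The general idea behind token mining can be sketched as below: generate candidate tokens until one falls into the bucket you want to test. Everything here is an assumption for illustration. In particular, `hypotheticalGetBucket` is a stand-in with its own hashing scheme and does not reproduce the bucketing of any real function, so use the linked scripts for actual QA.

```javascript
// Hypothetical stand-in for a bucketing function: hashes the token to a
// position in [0, 1) and picks the bucket whose weight range contains it.
function hypotheticalGetBucket(token, buckets) {
  let hash = 2166136261; // FNV-1a offset basis
  for (let i = 0; i < token.length; i++) {
    hash ^= token.charCodeAt(i);
    hash = Math.imul(hash, 16777619) >>> 0; // keep as unsigned 32-bit
  }
  const position = hash / 0x100000000;
  let upperBound = 0;
  let lastName = null;
  for (const [name, weight] of Object.entries(buckets)) {
    upperBound += weight;
    lastName = name;
    if (position < upperBound) {
      return name;
    }
  }
  return lastName; // guards against floating-point rounding of the weights
}

// Candidate token: 20 random hexadecimal characters (non-cryptographic).
function randomToken() {
  let token = '';
  for (let i = 0; i < 20; i++) {
    token += Math.floor(Math.random() * 16).toString(16);
  }
  return token;
}

// Try candidates until one lands in the wanted bucket.
function mineToken(wantedBucket, buckets, maxTries = 10000) {
  for (let i = 0; i < maxTries; i++) {
    const token = randomToken();
    if (hypotheticalGetBucket(token, buckets) === wantedBucket) {
      return token;
    }
  }
  throw new Error('No token found for bucket ' + wantedBucket);
}

const buckets = { control: 0.5, treatment: 0.5 };
const minedToken = mineToken('treatment', buckets);
```

The mined token can then be reused across test runs so that the same bucket is hit every time, which is what makes QA of the instrumentation repeatable.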