
Reading/Web/EventLogging best practices

From mediawiki.org

See also Reading/Web/Quantitative Testing and, for WMF-wide EventLogging best practices, Extension:EventLogging/Guide.

Schemas

  • Schemas in MobileFrontend should be prefixed with MobileWeb.
  • Each schema's talk page should be edited with the SchemaDoc template. For example, see Schema_talk:MobileWebSearch.

Privacy


EventLogging data is subject to the Wikimedia Foundation's privacy policy and Data retention guidelines. In particular, sensitive data must be deleted after 90 days. For EventLogging, this purging process is defined per schema by a whitelist of non-sensitive fields: only whitelisted fields are kept beyond that period.
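To make the whitelist idea concrete, here is a minimal JavaScript sketch of how such a purge could treat a single event. It is an illustration, not the actual purging job: it assumes events are flat objects with a millisecond `timestamp` and that purged fields are set to `null`; the `purgeEvent` function and the field names in the usage lines are hypothetical.

```javascript
// Hypothetical sketch of whitelist-based purging; not the real EventLogging job.
const RETENTION_DAYS = 90;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Assumes `event` is a flat object with a `timestamp` in milliseconds and
// `whitelist` is a Set of field names considered non-sensitive.
function purgeEvent(event, whitelist, now) {
  const ageInDays = (now - event.timestamp) / MS_PER_DAY;
  if (ageInDays <= RETENTION_DAYS) {
    // Still inside the retention window: keep every field.
    return { ...event };
  }
  const purged = {};
  for (const [field, value] of Object.entries(event)) {
    // Whitelisted fields are retained; all other fields are nulled out.
    purged[field] = whitelist.has(field) ? value : null;
  }
  return purged;
}

// Usage with invented field names:
const whitelist = new Set(['timestamp', 'action']);
const oldEvent = {
  timestamp: Date.UTC(2024, 0, 1),
  action: 'click',
  sessionToken: 'a1b2c3',
};
const purged = purgeEvent(oldEvent, whitelist, Date.UTC(2024, 3, 30));
// purged.action === 'click', purged.sessionToken === null
```

In this sketch, events younger than 90 days pass through unchanged; only older events lose their non-whitelisted fields.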

The Analytics Engineering team offers guidance on privacy best practices for writing schemas.

See the Audiences department's Instrumentation DACI regarding the general process for creating and reviewing schemas.

Sampling rate


See Reading/Web/Quantitative_Testing#Sampling_and_bucketing regarding terminology.

Be sure to check events for any newly deployed schema in its Grafana dashboard (linked on the talk page of the schema).

Consideration for experimental features in beta


You may want to consider a 100% sampling rate in beta. If so, please include this in the acceptance criteria of the task that implements the schema.

Use of tokens

See also Reading/Web/Quantitative_Testing#Notes on tokens

Tokens link the separate events that make up a user's interaction. Because every event logged within a single pageview or session carries the same token, those events can be connected afterwards, giving a more complete picture of user behavior. A common scenario is a schema, such as Schema:Popups, that needs to record several actions occurring within one pageview and across a session.

Two types of tokens are commonly employed for this purpose: session tokens and page tokens.
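As an illustration of how a shared token links events, the following minimal sketch groups a list of logged events by either kind of token. The event shape, the `groupByToken` helper, and the action names are hypothetical; they are not the actual Schema:Popups fields.

```javascript
// Hypothetical sketch: grouping logged events by a shared token.
// Field names and action values are illustrative only.
function groupByToken(events, tokenField) {
  const groups = new Map();
  for (const event of events) {
    const token = event[tokenField];
    if (!groups.has(token)) {
      groups.set(token, []);
    }
    // Preserve logging order so each group reads as a sequence of actions.
    groups.get(token).push(event.action);
  }
  return groups;
}

const events = [
  { sessionToken: 's1', pageToken: 'p1', action: 'pageLoaded' },
  { sessionToken: 's1', pageToken: 'p1', action: 'previewOpened' },
  { sessionToken: 's1', pageToken: 'p2', action: 'pageLoaded' },
];

const byPage = groupByToken(events, 'pageToken');
// byPage.get('p1') → ['pageLoaded', 'previewOpened']
const bySession = groupByToken(events, 'sessionToken');
// bySession.get('s1').length → 3
```

Grouping by the page token isolates each pageview, while grouping by the session token joins pageviews into one session.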

Session Tokens

Session tokens are random identifiers generated once and associated with the user's session. They persist until the user closes their browser (session ID, cf. caveats; see also phab:T205569). They link multiple events occurring within the same session and provide continuity in data collection. Note, however, that session tokens are not guaranteed to be unique, particularly in browsers that do not support the crypto API.

Page Tokens

Page tokens, by contrast, link events occurring within a single pageview. When the user navigates to a new page, a new page token is generated (page token). This isolates events and associates them with specific pageviews.
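The difference between the two token lifetimes can be sketched as follows. This is a hypothetical simulation, not MediaWiki's code: a `Map` stands in for the session cookie that holds the session ID, and `makeToken` and `loadPage` are invented helpers with a deliberately simplified, non-cryptographic generator.

```javascript
// Hypothetical sketch contrasting the two token lifetimes; not MediaWiki's code.
// A Map stands in for browser state that survives navigation within a session.

function makeToken() {
  // Simplified generator for illustration only: 16 hex digits from
  // Math.random, which is not cryptographically strong.
  let token = '';
  for (let i = 0; i < 16; i++) {
    token += Math.floor(Math.random() * 16).toString(16);
  }
  return token;
}

// Simulates the tokens that instrumentation would see on each page load.
function loadPage(sessionStore) {
  if (!sessionStore.has('sessionToken')) {
    // Created once, then reused for every pageview in the session.
    sessionStore.set('sessionToken', makeToken());
  }
  return {
    sessionToken: sessionStore.get('sessionToken'),
    pageToken: makeToken(), // regenerated on every pageview
  };
}

const session = new Map();
const firstPage = loadPage(session);
const secondPage = loadPage(session);
// Both pages share a sessionToken; their pageToken values differ.
```

Closing the browser corresponds to discarding the `Map`: the next `loadPage` call on a fresh `Map` starts a new session token.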

How is the page token generated?

TL;DR: The page token is built from cryptographically strong random values from the WebCrypto API or, if that is unavailable, from the Math.random function. With 80 bits of randomness, the token is very unlikely to collide with other tokens, so it can serve as a unique identifier for the user's current pageview context.

Detailed Breakdown

The page token, generated by the getPageviewToken function, is an 80-bit integer represented in hexadecimal format with padding. It serves as a unique identifier for the user's current page view context. The process of generating this token involves the following steps:

  1. When the getPageviewToken function is called, it checks if the pageviewRandomId variable is already set.
  2. If pageviewRandomId is not set (i.e. no token has been generated yet for this pageview), the function proceeds to generate the page token.
  3. To generate the page token, the function internally calls the generateRandomSessionId function.
  4. The generateRandomSessionId function employs two methods to ensure a high level of entropy (randomness) in the generated ID:
    • It first attempts to use the WebCrypto API's getRandomValues method to obtain cryptographically strong random values.
    • In case the WebCrypto API is not supported by the browser or if getRandomValues fails, the function falls back to using Math.random.
  5. The result of the generateRandomSessionId function is an 80-bit integer, represented as a zero-padded, 20-character hexadecimal string.
  6. This page token is then cached in the pageviewRandomId variable, so it does not need to be regenerated for subsequent calls to getPageviewToken.
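The steps above can be sketched in standalone JavaScript as follows. This is a simplified illustration modeled on the description, not MediaWiki's actual source: the function and variable names are borrowed from the text, while holding the 80 bits as five 16-bit words and detecting WebCrypto through `globalThis.crypto` are assumptions of this sketch.

```javascript
// Sketch modeled on the steps above; a simplified stand-in, not MediaWiki's source.
function generateRandomSessionId() {
  // 80 bits of randomness, held as five 16-bit words.
  const words = new Uint16Array(5);
  let filled = false;
  const webCrypto = globalThis.crypto;
  if (webCrypto && typeof webCrypto.getRandomValues === 'function') {
    try {
      webCrypto.getRandomValues(words); // cryptographically strong values
      filled = true;
    } catch (e) {
      // Fall through to the Math.random fallback below.
    }
  }
  if (!filled) {
    for (let i = 0; i < words.length; i++) {
      // Weaker fallback: an integer in [0, 65535] for each word.
      words[i] = Math.floor(Math.random() * 0x10000);
    }
  }
  // Each 16-bit word becomes exactly 4 zero-padded hex digits: 5 x 4 = 20 chars.
  return Array.from(words, (w) => w.toString(16).padStart(4, '0')).join('');
}

// Cached for the lifetime of the page, so repeated calls return the same token.
let pageviewRandomId = null;

function getPageviewToken() {
  if (!pageviewRandomId) {
    pageviewRandomId = generateRandomSessionId();
  }
  return pageviewRandomId;
}
```

Within one page, every call to getPageviewToken() returns the same 20-character string; a new pageview starts with pageviewRandomId unset, so a fresh token is generated.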

The use of both session tokens and page tokens allows for comprehensive tracking and analysis of user behavior during a session and across pageviews. It ensures that events are correctly attributed to their respective contexts.

Note that when a schema records both session tokens and page names, the combination can reveal which pages a user viewed during a session. In that case one of these fields should be purged after a set period, typically 90 days, in line with the data retention guidelines described under Privacy.

This token-based approach helps organize and analyze user interactions for quantitative testing and research, as outlined in phab:T205569.

Testing


EventLogging events can be monitored in the browser's developer tools (in the Network tab, filter for request URLs containing "event" or "beacon"). For a more readable display, events can be shown as MediaWiki notifications by running the following code in the browser console while logged in:

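// Saves the "eventlogging-display-web" preference, which shows logged events as notifications.
// Assumes a MediaWiki version where saveOption() is part of the mediawiki.api module
// (older versions provided it through mediawiki.api.options).
mw.loader.using('mediawiki.api')
    .then(() => new mw.Api().saveOption('eventlogging-display-web', '1'));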

(See also phab:T188640)

You may need to disable Do Not Track, as well as browser extensions that might block EventLogging requests (e.g. Privacy Badger).

Token mining


It may be necessary to mine a token (session, pageview, or otherwise) in order to test instrumentation consistently, for example during the QA step(s).

If an A/B test is implemented using the mw.experiments.getBucket function, then the following scripts will mine a token: