User:DLynch (WMF)/VisualEditor/Instrumentation

From mediawiki.org

Schemas

A schema is a JSON file which defines the format of the data we want to log. It's a list of fields and the type of data that's allowed in each one.

Schemas live on metawiki, and the EventLogging extension knows how to use an API to fetch them regardless of which wiki they're being used on.
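To make the "list of fields and types" idea concrete, here is a minimal sketch of checking an event against a schema. The schema below is a hypothetical, heavily trimmed stand-in written in a JSON Schema-style layout (`properties` / `type`); the layout of the real schema files and the helper name `invalid_fields` are assumptions for illustration, not what EventLogging itself does.

```python
import json

# Hypothetical, heavily trimmed schema; a real schema defines many more fields.
SCHEMA_JSON = """
{
  "properties": {
    "action": {"type": "string"},
    "editing_session_id": {"type": "string"},
    "timing": {"type": "integer"}
  }
}
"""

# Only the two JSON types used in this sketch are mapped.
PYTHON_TYPES = {"string": str, "integer": int}


def invalid_fields(event, schema):
    """Return the names of fields that are unknown or hold the wrong type."""
    problems = []
    for name, value in event.items():
        spec = schema["properties"].get(name)
        if spec is None:
            problems.append(name)  # field is not defined by the schema
            continue
        expected = PYTHON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(name)
    return problems


schema = json.loads(SCHEMA_JSON)
print(invalid_fields({"action": "ready", "timing": "fast", "colour": "red"}, schema))
```

Here `timing` is rejected for being a string and `colour` for not being in the schema at all.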

EditAttemptStep

EditAttemptStep is our most used schema. It's a general-purpose definition of an abstract "editing session", and anywhere on-wiki that can conceptually be viewed as a user editing a page should be logging to it. In practice, this means:

  • WikiEditor (the 2010 editor, with toolbar)
  • VisualEditor
  • The 2017 wikitext editor (a.k.a. VisualEditor's wikitext mode)
  • MobileFrontend's editor
  • DiscussionTools

Everywhere that EditAttemptStep is used should be identifiable through some unique combination of these fields:

  • `integration`
  • `platform`
  • `editor_interface`

...but you might need to look at sources like VisualEditor Configurations to work out what combination you're seeking.
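As a rough sketch of what "identifiable through some unique combination" looks like in practice, the snippet below filters already-loaded events by those three fields. The events and the field values (`page`, `desktop`, `visualeditor`, and so on) are illustrative assumptions; check the schema and the configuration sources for the values that are really in use.

```python
def matches(event, **wanted):
    """Return True if every requested field in the event has the wanted value."""
    return all(event.get(field) == value for field, value in wanted.items())


# Hypothetical events; the values are examples, not an authoritative list.
events = [
    {"integration": "page", "platform": "desktop", "editor_interface": "visualeditor"},
    {"integration": "page", "platform": "phone", "editor_interface": "wikitext"},
    {"integration": "discussiontools", "platform": "desktop", "editor_interface": "wikitext"},
]

# Pick out one editor by its combination of identifying fields.
desktop_ve = [
    e for e in events
    if matches(e, integration="page", platform="desktop", editor_interface="visualeditor")
]
print(len(desktop_ve))
```

Filtering on fewer fields (for example only `platform="desktop"`) widens the match to every editor on that platform.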

A session?

EditAttemptStep is concerned with a theoretical "editing session", which begins when the editor is initialized and ends when an edit is saved or abandoned.

Each session should be linked to a unique `editing_session_id`.

Each event in EditAttemptStep has an `action` property, which is the type of the event. The shape of the session is thus these event-types occurring in this order:

  1. `init`: editor initialization begins
  2. `ready`: it's possible for the user to interact with the editor, though possibly in a restricted manner
  3. `loaded`: the editor is fully loaded with all functionality available
  4. `firstChange`: the user has made a change to the document, most likely by typing a character into the editor

...then these events could occur repeatedly as the user tries to save:

  1. `saveIntent`: the user has shown an interest in saving the edit, perhaps by opening a save dialog
  2. `saveAttempt`: the user has initiated an attempt to save the edit
  3. `saveFailure`: the attempted edit failed

...the session should then be ended by one of:

  1. `saveSuccess`: the attempted edit succeeded
  2. `abort`: the edit session ended, without saving

`abort` can happen at any point in this sequence; the `abort_type` field contains more information about why the edit was aborted. Unfortunately, you can't rely on `abort` being present, particularly on mobile devices, as pages can be discarded in the background when the user tabs away from them / leaves the app.
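Because a terminal event can be missing, analysis code probably wants to treat "no `saveSuccess` or `abort` seen" as its own outcome rather than an error. A minimal sketch, assuming the events are plain dictionaries that arrive in the order they were logged (the function name and the `unknown` label are made up for this example):

```python
from collections import defaultdict

TERMINAL_ACTIONS = {"saveSuccess", "abort"}


def session_outcomes(events):
    """Map each editing_session_id to 'saveSuccess', 'abort', or 'unknown'."""
    actions_by_session = defaultdict(list)
    for event in events:
        actions_by_session[event["editing_session_id"]].append(event["action"])

    outcomes = {}
    for session_id, actions in actions_by_session.items():
        terminal = [a for a in actions if a in TERMINAL_ACTIONS]
        # No terminal event is an expected case (e.g. a discarded mobile tab),
        # so report it as 'unknown' instead of raising.
        outcomes[session_id] = terminal[-1] if terminal else "unknown"
    return outcomes


# Hypothetical sessions: one saved, one aborted, one that simply stops.
sample = [
    {"editing_session_id": "a", "action": "init"},
    {"editing_session_id": "a", "action": "ready"},
    {"editing_session_id": "a", "action": "saveIntent"},
    {"editing_session_id": "a", "action": "saveAttempt"},
    {"editing_session_id": "a", "action": "saveSuccess"},
    {"editing_session_id": "b", "action": "init"},
    {"editing_session_id": "b", "action": "abort"},
    {"editing_session_id": "c", "action": "init"},
    {"editing_session_id": "c", "action": "ready"},
]
print(session_outcomes(sample))
```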

Every event includes a `timing` field, giving the time since some previous relevant event in the sequence. The reference event depends on the action:

  1. `init`: never set, as it's the event everything else is relative to
  2. `ready`: time since `init`
  3. `loaded`: time since `init`
  4. `firstChange`: time since `ready`
  5. `saveIntent`: time since `ready`
  6. `saveAttempt`: time since `saveIntent`
  7. `saveSuccess` / `saveFailure`: always logged as -1, for reasons that are before my time, but which I reconstruct as wanting to avoid the influence of the server round-trip on the timings
  8. `abort`: depends on when in the sequence it happens; generally time since `ready`, but will be since `init` if `ready` hasn't happened yet, or since `saveAttempt` if the user leaves the page while the save is occurring
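The list above can be written down as a lookup, which is handy when reconstructing a timeline from logged events. This is a sketch of the rules as described here, not code from any of the editors; in particular, reading "while the save is occurring" as "the most recent action was `saveAttempt`" is my assumption.

```python
# Which earlier action each event's timing is measured from.
# None means there is no usable reference: init has none, and the
# save results are logged as -1.
TIMING_REFERENCE = {
    "init": None,
    "ready": "init",
    "loaded": "init",
    "firstChange": "ready",
    "saveIntent": "ready",
    "saveAttempt": "saveIntent",
    "saveSuccess": None,
    "saveFailure": None,
}


def timing_reference(action, previous_actions):
    """Return the action that `timing` is relative to, or None."""
    if action != "abort":
        return TIMING_REFERENCE[action]
    # abort depends on how far the session got.
    if previous_actions and previous_actions[-1] == "saveAttempt":
        return "saveAttempt"  # assumption: a save was still in flight
    if "ready" in previous_actions:
        return "ready"
    return "init"


print(timing_reference("abort", ["init"]))
print(timing_reference("abort", ["init", "ready", "firstChange"]))
```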

VisualEditorFeatureUse

This is closely paired with EditAttemptStep. It's used to log some simple feature use data that's not part of a strictly defined session structure. As such, it's a simple key-value store, logging a `feature` and an `action`.

This schema intends to answer the question "did this session involve the user doing X?", without trying to store what the user actually did with that feature.

This is bulk data logged from generic handlers, so it exposes a lot of internal VisualEditor names for things. For example, almost all `feature` values are the internal VE name for a node type or tool.

There's a reasonably expansive set of feature and action descriptions at feature use data dictionary.
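The "did this session involve the user doing X?" question boils down to an existence check. A minimal sketch, assuming each feature-use event carries a session identifier; the key name `editing_session_id` is borrowed from EditAttemptStep and may be spelled differently in this schema, and the `feature` / `action` values are invented examples.

```python
def session_used_feature(events, session_id, feature, action=None):
    """Return True if the session logged the feature (optionally with a specific action)."""
    return any(
        e.get("editing_session_id") == session_id
        and e.get("feature") == feature
        and (action is None or e.get("action") == action)
        for e in events
    )


# Hypothetical feature-use events.
feature_events = [
    {"editing_session_id": "a", "feature": "link", "action": "dialog-open"},
    {"editing_session_id": "a", "feature": "link", "action": "dialog-done"},
    {"editing_session_id": "b", "feature": "table", "action": "dialog-open"},
]

print(session_used_feature(feature_events, "a", "link"))
print(session_used_feature(feature_events, "b", "link"))
```

Note that this only says the feature was touched; as described above, the schema does not record what the user did with it.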

Sampling

We don't want to log every single session, because that's a lot of data. Instead, we sample: by default, one session in every sixteen is logged.

This can be configured or overridden in various ways, though they're implementation-dependent.

There are two variants of this:

  1. Change the sampling rate
  2. Force sampling for this particular session

The latter explicitly flags the session as being "oversampled".
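Here is one way a per-session sampling decision could work, as a sketch only. Hashing the session ID so that every event in a session gets the same answer is my assumption, as are the function name and the `(logged, is_oversample)` return shape; the real integrations each have their own implementation.

```python
import hashlib

DEFAULT_SAMPLE_RATE = 16  # one session in every sixteen


def sampling_decision(session_id, rate=DEFAULT_SAMPLE_RATE, force=False):
    """Return (logged, is_oversample) for a session.

    The decision is derived from the session ID, so it is stable for
    every event within the same session.
    """
    if force:
        # Forced sessions are always logged and flagged as oversampled.
        return True, True
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    in_sample = int.from_bytes(digest[:4], "big") % rate == 0
    return in_sample, False


# Variant 1: change the sampling rate (a rate of 1 logs everything).
print(sampling_decision("example-session", rate=1))
# Variant 2: force sampling for this particular session.
print(sampling_decision("example-session", force=True))
```

Keeping the oversample flag separate from the logged flag means forced sessions can be filtered out later, so they don't skew anything computed from the regular sample.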

How do I test this?

Trackdebug

All the integrations listed above will respect this URL parameter: `trackdebug=1`

If it's present, the following will happen:

  • The session will be oversampled
  • The logged data will be sent to the browser console
  • The logged data will not be sent to EventLogging

This is convenient for developer usage, as it doesn't log anything permanently when you're performing strange and non-representative actions.
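The three effects can be modelled as a small routing decision. This is a Python model of the behaviour described above, not the client-side code; the function and key names are hypothetical, and it only recognises the documented `trackdebug=1` form.

```python
from urllib.parse import parse_qs, urlparse


def debug_routing(url):
    """Model where logged events go for a given page URL."""
    params = parse_qs(urlparse(url).query)
    trackdebug = params.get("trackdebug", [""])[-1] == "1"
    return {
        "force_oversample": trackdebug,     # session is oversampled
        "to_console": trackdebug,           # events are printed to the browser console
        "to_eventlogging": not trackdebug,  # nothing is logged permanently in debug mode
    }


# Hypothetical page URLs.
print(debug_routing("https://example.org/wiki/Sandbox?veaction=edit&trackdebug=1"))
print(debug_routing("https://example.org/wiki/Sandbox?veaction=edit"))
```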

EventLogging

You can see what's actually being logged to EventLogging by following the EventLogging debug guide. If you enable debug mode as it describes, you'll see popups on the right-hand side of your browser window for every logged event, and entries in your browser console for each event. If you click on the popup, it'll open a dialog window which contains a JSON representation of the event.

You can force oversampling with this URL parameter: `editingStatsOversample=1`
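If you're scripting test runs, a small helper can add the parameter to whatever URL you're opening. A sketch using only the standard library; the helper name and the example URL are made up.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_query_param(url, name, value):
    """Return url with the query parameter set, replacing any existing value."""
    parts = urlparse(url)
    query = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k != name
    ]
    query.append((name, value))
    return urlunparse(parts._replace(query=urlencode(query)))


# Hypothetical page URL.
print(with_query_param(
    "https://example.org/wiki/Sandbox?veaction=edit",
    "editingStatsOversample",
    "1",
))
```

The same helper works for `trackdebug=1`.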