Extension:WikiLambda/Metrics Implementation

From mediawiki.org

This page gives an overview of how metrics collection is implemented in WikiLambda, and links to the information needed to maintain or extend it. It covers Interaction Metrics and Product Metrics, as discussed in Extension:WikiLambda/Metrics. (Because collecting User / Usage Metrics does not require any WikiLambda-specific implementation, those metrics are not covered here.)

Interaction metrics

Wikifunctions' Interaction Metrics are collected by means of event logging. Metrics instruments in WikiLambda code generate events (key/value data structures) which are transferred into a database table in the Foundation's (Hadoop-based) Data Lake. From there, the data can be queried and presented on dashboards or in reports. This section provides information about these instruments. It gives a brief overview of the instruments' implementation and, mostly by means of links to relevant documentation, provides guidance for developing new instruments.

Metrics Platform is used for metrics instrumentation in WikiLambda. All current instruments are in Vue code files. Each instrument calls dispatchEvent, which is defined in the WikiLambda file mixins/eventLogUtils.js. (dispatchEvent, in turn, calls mw.eventLog.dispatch in Metrics Platform code.) All existing instruments can be found by searching the WikiLambda Vue files for dispatchEvent.
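The call chain described above can be sketched as follows. This is a minimal illustration, not the actual contents of mixins/eventLogUtils.js: the mw object is a stand-in so the code runs outside a wiki page, exampleComponent is hypothetical, and the custom data key zobjectid is an assumed name chosen only for the example.

```javascript
// Stand-in for the MediaWiki global, so the sketch runs outside a wiki page.
// On a real page, mw is provided by MediaWiki and this stub is not needed.
const dispatched = [];
const mw = {
	eventLog: {
		dispatch( eventName, customData ) {
			dispatched.push( { eventName, customData } );
		}
	}
};

// Assumed shape of the wrapper: forward to Metrics Platform only when
// its client (mw.eventLog) is available.
const eventLogUtils = {
	methods: {
		dispatchEvent( eventName, customData ) {
			if ( mw.eventLog ) {
				mw.eventLog.dispatch( eventName, customData );
			}
		}
	}
};

// Hypothetical Vue component acting as an instrument.
const exampleComponent = {
	mixins: [ eventLogUtils ],
	mounted() {
		// 'zobjectid' is an illustrative custom data key, not a confirmed one.
		this.dispatchEvent( 'wf.ui.defaultView.load', { zobjectid: 'Z801' } );
	}
};

// Simulate what Vue does: expose the mixin's methods on the instance,
// then run the mounted hook.
const instance = Object.assign( {}, eventLogUtils.methods );
exampleComponent.mounted.call( instance );

console.log( dispatched );
```

Routing every instrument through one wrapper keeps the dependency on Metrics Platform in a single place, which is also why a search for dispatchEvent finds all instruments.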

WikiLambda declares a single metrics stream, wikifunctions_ui, in ext-EventStreamConfig.php (mediawiki-config repository). As explained in Event_Platform/Stream_Configuration, this stream configuration specifies a set of contextual attributes. When an instrument creates an event, Metrics Platform inserts values for each of these contextual attributes into the event.

In addition to these contextual attributes, events can contain values for custom data elements, which are specified by the code of the instrument creating the event.

WikiLambda currently employs an early version of Metrics Platform, in which all events conform to a predefined schema, known as the monoschema. The monoschema defines the available contextual attributes, and also defines the valid values (strings, numbers, booleans, and null) that may be used in custom data. The monoschema's formal specification is in analytics/mediawiki/client/metrics_event/current.yaml, and its documentation page is Metrics_Platform/Event_Schema.
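The restriction on custom data values can be checked before an event is dispatched. The sketch below is an illustration only; isValidCustomValue and findInvalidCustomData are hypothetical helpers, not part of WikiLambda or Metrics Platform. Rejecting NaN and Infinity is an added assumption (they cannot be represented in JSON), not something the monoschema description above states.

```javascript
// Hypothetical helper: true when a value is one of the kinds the monoschema
// allows in custom data (string, number, boolean, or null).
function isValidCustomValue( value ) {
	return value === null ||
		typeof value === 'string' ||
		typeof value === 'boolean' ||
		// Assumption: only finite numbers are accepted.
		( typeof value === 'number' && Number.isFinite( value ) );
}

// Hypothetical helper: returns the keys whose values would not be valid
// custom data, so a developer can spot arrays, objects, or undefined values.
function findInvalidCustomData( customData ) {
	return Object.keys( customData ).filter(
		( key ) => !isValidCustomValue( customData[ key ] )
	);
}

console.log( findInvalidCustomData( { a: 'text', b: 3, c: true, d: null } ) );
console.log( findInvalidCustomData( { a: [ 1 ], b: {}, c: undefined } ) );
```

A check like this is most useful in unit tests or during development, since nested objects and arrays are the easiest invalid values to pass by accident.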

Metrics Platform is evolving, and WikiLambda instruments will evolve with it. In particular, future versions of Metrics Platform will require the specification of a schema for each instrument, and the monoschema will be phased out. Reusable schema fragments will be provided to make schema specification easier.

Creating a new instrument

Before creating a new instrument in WikiLambda, these preliminary steps should be taken:

  • Familiarize yourself with the existing WikiLambda instruments.
  • Decide on the name of the event to be created by your instrument (a string). The name should be consistent with names already used in the existing instruments, and with the best practices listed below. It can be an event name that's already in use, or a new one.
  • Decide on the custom data to accompany the event. The custom data key names should be consistent with key names already used in the existing instruments, and with the best practices listed below.

To code up a new instrument, follow the guidance presented in Metrics_Platform/Creating_An_Instrument, but with the following important modification:

  • Instead of calling mw.eventLog.dispatch directly, call dispatchEvent, which is defined in the WikiLambda file mixins/eventLogUtils.js.

Note that Metrics_Platform/Creating_An_Instrument includes a section on configuring the MediaWiki Docker development environment. That section links to the MediaWiki Docker EventLogging Configuration recipe, which gives useful details such as the additions to LocalSettings.php and docker-compose.override.yml that enable tracing of event generation. The tracing shows up in events.json and in the eventlogging log.

Best practices

  • Use the smallest number of instruments that will get the job done
  • Use the smallest number of distinct event names that will get the job done
  • Choose new event names to be consistent with existing ones
    • Generally using the pattern wf.ui.entity.verb (e.g., wf.ui.defaultView.load)
  • Reuse custom data property names across instruments
  • Remember that custom data property names must be in flatcase (all lowercase, with no separators between words)
  • Instrument locations should be informed by our code structure; e.g.
    • Vue mounted methods can work well for logging the start of a user activity
    • Take advantage of methods already designed to localize reusable functionality (e.g., WikiLambda's publishZObject, leaveTo, callFunction methods)
  • Consider general software engineering best practices around modularity, maintainability, etc.
  • See also Event_Data_Modeling_and_Schema_Naming
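Two of the naming conventions above lend themselves to a quick mechanical check. The sketch below is illustrative only: followsNamePattern and isFlatcase are hypothetical helpers, and the regular expressions are one assumed reading of "wf.ui.entity.verb" and "flatcase". Because the name pattern is described as a general guideline, a failed check is a prompt for review rather than an error.

```javascript
// Assumed reading of the wf.ui.entity.verb pattern: the fixed prefix "wf.ui."
// followed by exactly two alphanumeric segments (entity, then verb).
const EVENT_NAME_PATTERN = /^wf\.ui\.[A-Za-z][A-Za-z0-9]*\.[A-Za-z][A-Za-z0-9]*$/;

// Assumed reading of flatcase: lowercase letters and digits only,
// with no underscores, hyphens, or capital letters.
const FLATCASE_PATTERN = /^[a-z][a-z0-9]*$/;

// Hypothetical helper: does the event name follow the usual pattern?
function followsNamePattern( eventName ) {
	return EVENT_NAME_PATTERN.test( eventName );
}

// Hypothetical helper: is the custom data property name in flatcase?
function isFlatcase( propertyName ) {
	return FLATCASE_PATTERN.test( propertyName );
}

console.log( followsNamePattern( 'wf.ui.defaultView.load' ) );
console.log( isFlatcase( 'zobjectid' ), isFlatcase( 'zobjectId' ) );
```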

Product metrics

Content

Wikifunctions content metrics, also known as inventory metrics, are derived by querying two WikiLambda database tables: wikilambda_zobject_labels_table and wikilambda_zobject_function_join_table. These database tables are copied daily into the Data Lake by means of Apache Sqoop, as specified in sqoop.py (refinery repository). The script that invokes the copy operation is refinery-sqoop-wikifunctions-production.sh.erb (puppet repository).

Performance

Wikifunctions page load time is recorded by a Metrics Platform instrument (the instrument that dispatches events named wf.ui.newView.mounted, in App.vue). Metrics Platform instrument implementation is described in the preceding section, Interaction metrics. Because this instrument is not used for tracking a particular user interaction, we classify this metric under Product metrics rather than Interaction metrics.
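As a rough illustration of how such an instrument can report a load time, the sketch below measures the time elapsed between a start timestamp and the moment a mounted-style hook runs. It is not the code in App.vue: createLoadTimeReporter is a hypothetical helper, the custom data key loadtime is an assumed name, and the clock is passed in as a parameter only so the example is deterministic.

```javascript
// Hypothetical helper: builds a function, suitable for calling from a Vue
// mounted hook, that dispatches the elapsed time since startTime.
// dispatchEvent and now are injected so the sketch has no external dependencies.
function createLoadTimeReporter( dispatchEvent, now, startTime ) {
	return function reportLoadTime() {
		// 'loadtime' is an assumed flatcase key; the value is in milliseconds.
		const customData = { loadtime: now() - startTime };
		dispatchEvent( 'wf.ui.newView.mounted', customData );
		return customData;
	};
}

// Usage with a fake clock and a dispatcher that records events in memory.
const sent = [];
let fakeClock = 1000;
const reportLoadTime = createLoadTimeReporter(
	( name, data ) => sent.push( { name, data } ),
	() => fakeClock,
	fakeClock
);
fakeClock = 1250; // pretend 250 ms pass before the component is mounted
reportLoadTime();

console.log( sent[ 0 ] );
```

In a real component, the dispatcher would be the dispatchEvent method from the mixin, and the clock would be something like Date.now.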