Wikimedia Apps/App Analytics


Background

The apps currently use a combination of API-driven metrics (page views), ad-hoc EventLogging-based user events, and ad-hoc Piwik-based usage tracking. Because the EventLogging interface is oriented around tracking web-based events and around the creation of custom queries and dashboards, the apps teams have experimented with other solutions (Piwik, Appsee) for our specific needs.

Data In

However, there is now general consensus that we should work towards client libraries that sit in front of EventLogging as the data collection and storage layer. Android is already closest to this model, but a unified definition of how app analytics should work would help both teams build a uniform understanding of our users and reduce the complexity of analysis and testing.

Data Out

The other shortcomings identified by the apps teams are on the data modeling and querying side. This project will not directly address that, and a parallel effort by Reading PM and Data Analysis will be needed to set up worksheets, dashboards or other retrieval and presentation of cross-app usage data.

Project Goals

  • Create a client-side analytics layer for Android and iOS (a rough sketch of such a layer follows this list)
  • Send user events to EventLogging
  • Track and send offline events
  • Be smart about bandwidth and battery usage
  • Respect privacy and default to anonymity 
  • Make it easy to add events and support new features without significant schema or EventLogging overhead
  • Provide a consistent sampling regime and test user pooling processes across clients
  • Make app usage analytics testable without significant data analyst work
  • Use consistent names and data definitions across apps
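
As a rough illustration of the layer these goals describe, the sketch below shows what a shared client-side API might look like in Kotlin. All names here (Event, AnalyticsClient, log, flush) are hypothetical and do not refer to an existing Wikimedia library; this is a sketch of the shape of the layer under the goals above, not a design.

    // Hypothetical sketch only: none of these names are an existing Wikimedia
    // library. The intent is to show a layer that attaches shared capsule
    // fields, applies sampling, and queues events for offline use, so that
    // feature code only has to name and describe the event.
    data class Event(
        val name: String,                           // e.g. "article_saved"
        val context: String,                        // screen or article where it occurred
        val fields: Map<String, Any> = emptyMap(),  // feature-specific details
        val clientTimestampMs: Long = System.currentTimeMillis()
    )

    interface AnalyticsClient {
        // Record an event: the layer decides whether it is sampled in, attaches
        // the shared app capsule, and either sends it or stores it locally.
        fun log(event: Event)

        // Send any queued events; always the lowest-priority network work.
        fun flush()
    }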

Core Stories

Basic Feature Use Analysis

I want to know how many people use a feature or sub-feature. I need to understand the overall usage, as well as the usage relative to the larger population of all users. What % of users have used this feature since launch? What % use it over a set time period (% of users per day)? I also want to view these counts and rates by standard audience segments, such as primary language, device type (screen size), connection type, and new vs. returning users.

Inter-Context and Intra-Context Analysis

I want to know which features on a given screen or sub-screen are most used. This means being able to use the context in which events occur as a dimension of analysis. This context may be a full screen (the Explore feed) or a sub-screen (the Featured Article card on the Explore screen). Additionally, some actions are possible in multiple contexts. For example, you can Save an article to read later from many screens and sub-screens in the apps, and it is useful to compare these different sources of the same action. For example, on which screen do people most often save articles? Are there screens where saving is comparatively so low that it could be replaced with another action?

Sequence Analysis

I want to know which features are used in sequence, and how the sequence of events affects users' use of a feature or the app. For example, do articles opened from the Explore feed result in longer reading sessions than those found through a search? Do users who find an article from the Read More recommendations then click another recommendation? This is also useful for understanding the completion rates for features with multiple sequential steps. For example, how many people abandon the account sign-up process at each step?

Variant Experience Analysis

I want to release a version of the app with variant experiences, and be able to assign users to different variants and measure the effect of those variants on their use of the app and/or feature.

Subsequent Effect Analysis [This is a reach]

I want to know what effect the use of a feature or a design change has on subsequent actions by the user. For example, does saving an article from the Explore feed increase the likelihood of users reopening the app? Of actually reading the article? Do new users who turn on notifications use the app more 30 days later?

Extended App Capsule

In addition to the standard event capsule, all app events should share a "meta schema" that provides the app-specific context information. This capsule should include the following fields (one possible representation is sketched after the list):

  • Client timestamp (time of event as defined on device)
  • Context (either a specific screen in the app or the url for an article where the event occurs)
  • Test id (a unique test identifier; optional value passed to identify events sent as part of variant experience testing)
  • Variant pool id (a unique variant experience identifier; optional value passed to identify events sent as part of variant experience testing)
  • App version (recently made available directly in the new parsed user agent JSON, although there are still open questions around JSON handling)
  • Language choice (Primary language wiki/ device language)
  • Connection type (Wifi, Cell, Zero, Offline)
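
Purely for illustration, the capsule above might be carried on the client as something like the following Kotlin data class. The field names and types are assumptions, not an agreed EventLogging schema.

    // Hypothetical representation of the shared app event capsule; field
    // names and types are assumptions for illustration only.
    enum class ConnectionType { WIFI, CELL, ZERO, OFFLINE }

    data class AppEventCapsule(
        val clientTimestampMs: Long,        // time of event as defined on the device
        val context: String,                // screen in the app, or URL of the article
        val testId: String? = null,         // optional: variant experience test identifier
        val variantPoolId: String? = null,  // optional: variant identifier within that test
        val appVersion: String,             // also available via the parsed user agent JSON
        val primaryLanguage: String,        // primary language wiki / device language
        val connectionType: ConnectionType  // Wifi, Cell, Zero, Offline
    )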

User Segments

When performing analysis, it's often helpful to limit the data to a subtype of user, or to allow for different user groups to be a dimension of analysis.

Device Identity Based Segments

The system should support querying and grouping by basic aspects of a device's identity, including:

  • OS
  • OS Version
  • Device type/screen size
  • App Version
  • Connection type

Behavior Based Segments

In addition to inherent attributes, it is often useful to identify users based on their behavior. For example, do people who have notifications turned "on" use the app more or less than users overall?

Behavioral segments are not pre-determined, but defined at the time of analysis and then potentially stored, so that the same segment can be used over time. Behavior-based segments include (one possible representation is sketched after the list):

  • Days since install
  • User has done event X (e.g. user has saved a page)
  • User has seen screen X (e.g. user has been to the Places tab)
  • Primary language choice
  • Uses language
  • User has setting X with value Y (e.g. user has opted into sharing location)
  • Geography (using IP lookup or device info?)
  • In test X (and variant Y) (e.g. user is in test b of the new nav bar test)
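
As a hypothetical sketch of what "defined at the time of analysis, and then potentially stored" could mean in practice, a behavioral segment can be thought of as a named, reusable predicate over what is known about an install. The types below are illustrative assumptions, not part of any existing tooling.

    // Hypothetical sketch: a behavioral segment as a named, reusable predicate
    // over an install's recorded history and settings.
    data class InstallProfile(
        val daysSinceInstall: Int,
        val eventsSeen: Set<String>,        // e.g. "page_saved"
        val screensSeen: Set<String>,       // e.g. "places_tab"
        val settings: Map<String, String>,  // e.g. "share_location" -> "true"
        val primaryLanguage: String,
        val activeTest: Pair<String, String>? = null  // test id to variant id
    )

    data class BehavioralSegment(
        val name: String,
        val matches: (InstallProfile) -> Boolean
    )

    // Example: installs that saved a page within their first week.
    val newSavers = BehavioralSegment("new_savers") { profile ->
        profile.daysSinceInstall <= 7 && "page_saved" in profile.eventsSeen
    }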

Sampling

Most questions can be answered without the need to track and store all actions by all users. To support this, the app analytics layer will need to support sampling of users and events. Sampling may apply at different levels (one possible approach is sketched after this list):

  • Deterministic sampling based on install ID? All device/install actions have a default "in or out" sampling status for the lifetime of the install.
  • Arbitrary per-event sampling. Membership in sample for tracking a specific event, determined at event time, and overriding the default device sampling status.
  • Per-variant-test sampling rates. Membership in the test pool for tracking events related to a specific variant test, determined at variant choice time, and overriding the default sampling status. Rates cannot be changed after the test begins.
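
One possible way (an assumption, not an agreed design) to implement the deterministic install-level sampling above is to hash the install ID into a fixed number of buckets, so the in/out decision is stable for the lifetime of the install while still allowing a per-event override:

    // Hypothetical sketch of deterministic sampling. String.hashCode() is used
    // only for brevity; a real implementation would want a better hash.
    const val DEFAULT_SAMPLING_RATE = 0.1  // assumed placeholder; would come from server config

    // Map the install ID into one of 10,000 buckets. The install is "in" the
    // sample when its bucket falls below the configured rate, and the result
    // stays the same for the lifetime of the install.
    fun isInstallSampled(installId: String, rate: Double = DEFAULT_SAMPLING_RATE): Boolean {
        val bucket = Math.floorMod(installId.hashCode(), 10_000)
        return bucket < rate * 10_000
    }

    // Per-event sampling overrides the install-level default by hashing the
    // install ID together with the event name at the event's own rate.
    fun isEventSampled(installId: String, eventName: String, eventRate: Double?): Boolean =
        if (eventRate != null) isInstallSampled("$installId:$eventName", eventRate)
        else isInstallSampled(installId)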

Variant Experience Testing

Creating and Running Tests

A variant experience test is a way to try different experiences on different users and measure the impact of the variations on user behavior. Although these variants can be implemented and controlled entirely through ad-hoc client changes, it is often useful, especially for binary apps distributed through app stores, to move the control and configuration of variants to a server API. In this system, a test will be defined and controlled via a RESTbase service.

A test will consist of a test identifier and a variant identifier for each variant. These will be used in the client and in EventLogging to identify the tests and their variants for analysis. Each variant will also define a test pool size. This size represents the % of users to be included in the variant, i.e. the percent chance that any given user will receive this variant. Pool sizes cannot be changed after a test begins.
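
For illustration, a test definition and a deterministic assignment of installs to variant pools might look like the sketch below. The data model and the hash-based assignment are assumptions and do not describe the actual RESTbase service.

    // Hypothetical data model for a variant experience test; not the actual
    // RESTbase API.
    data class Variant(
        val variantId: String,
        val poolSize: Double    // fraction of users to receive this variant, e.g. 0.05
    )

    data class VariantTest(
        val testId: String,
        val variants: List<Variant>,
        val startDate: String,  // associated test date range
        val endDate: String
    )

    // Assign an install to at most one variant, deterministically, by hashing
    // the install ID together with the test ID and walking the cumulative pool
    // sizes. Installs that fall outside all pools keep the control experience.
    fun assignVariant(installId: String, test: VariantTest): Variant? {
        val roll = Math.floorMod("$installId:${test.testId}".hashCode(), 10_000) / 10_000.0
        var cumulative = 0.0
        for (variant in test.variants) {
            cumulative += variant.poolSize
            if (roll < cumulative) return variant
        }
        return null
    }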

In order to avoid complications of pools and managing potentially overlapping variants, only one variant experience test should be run for any platform at any given time.

The test will also have an associated test date range, which will identify when the test should be run.

Server Configuration

Configurable Behaviors

  • Overall sampling rate for the app (see the configuration sketch after this list)
  • Variant tests:
    • Pool distribution (defined as a decimal number which represents the % of users to be included in the variant)
    • Test dates (dates during which variant experiences should be used)
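
The behaviors above might be delivered to the client as a single configuration object along the lines of the following sketch, reusing the hypothetical VariantTest type from the previous section; the field names and structure are assumptions.

    // Hypothetical shape of the server-delivered configuration consumed by the
    // analytics layer; field names and structure are assumptions.
    data class AnalyticsConfig(
        val samplingRate: Double,            // overall sampling rate for the app
        val activeTest: VariantTest? = null  // at most one variant test per platform,
                                             // including pool distribution and test dates
    )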

Behavior When Configuration Is Not Available

We need to define how strictly the config file is required, and how the layer should behave (particularly with respect to sampling and variants) when the config file cannot be retrieved.

Offline Event Tracking and Queueing

Events which occur while the device is otherwise "offline" should be queued locally and sent once the user reconnects. This queue should always represent the lowest priority communication by the app (for example, requesting a content update should happen before any event transmission).

Events queued up but not sent after a certain time may be deleted. That is, we shouldn't send events months after they are relevant, just because someone turned on an old phone.

The queuing and reconnection behavior should apply to all events tracked in this layer, except for the check of the server configuration. That is, there should be only a single local queue/cache managed by the analytics layer.
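
As a closing sketch of the single local queue described above, reusing the hypothetical Event type from the earlier sketch: events are appended locally, expired after an assumed maximum age, and drained only once connectivity returns and higher-priority work is done. Persistence, threading, and the actual EventLogging transport are omitted.

    import java.util.concurrent.TimeUnit

    // Hypothetical sketch of the single local event queue managed by the
    // analytics layer.
    class OfflineEventQueue(
        private val maxAgeMs: Long = TimeUnit.DAYS.toMillis(30),  // assumed expiry cutoff
        private val send: (List<Event>) -> Boolean                // returns true on success
    ) {
        private val queue = ArrayDeque<Event>()

        fun enqueue(event: Event) {
            queue.addLast(event)
        }

        // Called when connectivity returns and no higher-priority requests
        // (for example, content updates) are pending.
        fun drain(nowMs: Long = System.currentTimeMillis()) {
            // Drop events that are too old to still be relevant.
            queue.removeAll { nowMs - it.clientTimestampMs > maxAgeMs }
            if (queue.isNotEmpty() && send(queue.toList())) {
                queue.clear()
            }
        }
    }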