Wikimedia Apps/App Analytics

Background
Apps use a combination of API-driven metrics (page views), ad-hoc EventLogging-based user events, and ad-hoc Piwik-based usage tracking. Because the EventLogging interface is oriented around tracking web-based events and the creation of custom queries and dashboards, the apps teams have sought out other solutions (Piwik, Appsee) for our specific needs.

Data In
However, there is now general consensus that we should work towards client libraries which front EventLogging as the data collection and storage layer. Android is already closest to this, but a unified definition of how app analytics should work would help both teams build a uniform understanding of our users and reduce the complexity of analysis and testing.

Data Out
The other shortcomings identified by the apps teams are on the data modeling and querying side. This project will not directly address those; a parallel effort by Reading PM and Data Analysis will be needed to set up worksheets, dashboards, or other retrieval and presentation of cross-app usage data.

Project Goals

 * Create a client side analytics layer for Android and iOS
 * Send user events to EventLogging
 * Track and send offline events
 * Be smart about bandwidth and battery usage
 * Respect privacy and default to anonymity
 * Make it easy to add events and support new features without significant schema or EventLogging overhead
 * Provide a consistent sampling regime and test user pooling processes across clients
 * Make app usage analytics testable without significant data analyst work
 * Use consistent names and data definitions across apps

App Event Definition
In addition to the standard event capsule, all app events should share a "meta schema" which provides app-specific context information. This capsule should include:
 * Client timestamp (time of event as defined on device)
 * Context (either a specific screen in the app or the url for an article where the event occurs)
 * Test id (a unique test identifier; an optional value passed to identify events sent as part of variant experience testing)
 * Variant pool id (a unique variant experience identifier; an optional value passed to identify events sent as part of variant experience testing)
 * App version (although this can be gathered from the user agent, it's useful to have a clear, queryable field for this)
 * Connection type (Wifi, Cell, Zero, Offline)
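As a sketch, the capsule fields above could be modeled client-side roughly as follows. This is a minimal Python illustration; the field names and types are assumptions for discussion, not a ratified EventLogging schema:

```python
from dataclasses import dataclass, asdict
from typing import Optional

# Hypothetical shape for the shared app "meta schema" capsule.
# Field names are illustrative only.
@dataclass
class AppEventCapsule:
    client_ts: str                      # time of event as defined on device (ISO 8601)
    context: str                        # screen name, or the URL of the article where the event occurred
    app_version: str                    # explicit, queryable app version
    connection_type: str                # one of "wifi", "cell", "zero", "offline"
    test_id: Optional[str] = None       # set only for variant experience tests
    variant_pool_id: Optional[str] = None

event = AppEventCapsule(
    client_ts="2017-05-01T12:00:00Z",
    context="explore_feed",
    app_version="2.5.194",
    connection_type="wifi",
)
payload = asdict(event)  # dict ready to merge into an EventLogging submission
```

Keeping the test/variant fields optional means the same capsule serves both ordinary instrumentation and variant-experience events.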

User Segments
When performing analysis it's often helpful to limit the data to a subtype of user, or to allow different user groups to be a dimension of analysis.

Device Identity Based Segments
The system should support querying and grouping by basic aspects of a device's identity, including:
 * OS
 * OS Version
 * Device type/screen size
 * App Version
 * Connection type

Behavior Based Segments
In addition to inherent attributes, it is often useful to identify users based on behavior. For example, do users who have notifications turned on use the app more or less than users overall?

Behavioral segments are not pre-determined but defined at the time of analysis, and then potentially stored so that the same segment can be used over time. Behavior-based segments include:
 * Days since install
 * User has done event X (eg. user has saved a page)
 * User has seen screen X (eg. user has been to the Places tab)
 * Primary language choice
 * Uses language
 * User has setting X with value Y (eg. user has opted into sharing location)
 * Geography (using IP lookup or device info?)
 * In test X (and variant Y) (eg. user is in test b of the new nav bar test)
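A segment of the "user has done event X" kind can be thought of as a predicate over a user's event history, evaluated at analysis time. A minimal sketch, with entirely hypothetical event names:

```python
# Hypothetical behavioral segment: "user has saved a page".
# Event dicts and the "action" key are illustrative, not a real schema.
def in_segment_saved_a_page(events):
    return any(e.get("action") == "save_page" for e in events)

users = {
    "u1": [{"action": "save_page"}, {"action": "search"}],
    "u2": [{"action": "search"}],
}
segment = [uid for uid, evs in users.items() if in_segment_saved_a_page(evs)]
```

Because the predicate is just a function of the event stream, the same definition can be stored and re-run later, which is what allows a segment to be tracked consistently over time.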

Sampling
Mathy stuff. Statistics even!
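The actual sampling regime is still to be defined, but one common approach worth considering is deterministic hashing: hash a stable per-install ID into [0, 1) and compare against the configured rate, so a given install is consistently in or out of the sample without any server round trip. A sketch, assuming a per-install ID exists:

```python
import hashlib

def in_sample(install_id: str, rate: float) -> bool:
    """Deterministic sampling sketch: map an install ID uniformly into
    [0, 1) and compare against the sampling rate. Illustrative only."""
    digest = hashlib.sha256(install_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform in [0, 1)
    return bucket < rate

# The same install always gets the same answer for a given rate,
# which keeps a user's events either all sampled or all unsampled.
```

Hashing rather than random-per-event draws also makes sampled funnels analyzable, since a sampled user's whole session is captured.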

Creating and Running Tests
A variant experience test is a way to try different experiences on different users and measure the impact of the variations on user behavior. Although these variants can be implemented and controlled entirely through ad-hoc client changes, it is often useful, especially in binary app-store land, to separate the control and configuration of variants into a server API. In this system, a test will be defined and controlled via a RESTbase service.

A test will consist of a test identifier and a variant identifier for each variant. These will be used in the client and in EventLogging to identify tests and their variants for analysis. Each variant will also define a test pool size. This size represents the percentage of users to be included in the variant, i.e. the percent chance that any given user will receive this variant.

The test will also have an associated test date range, which will identify when the test should be run.
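Putting the pieces above together, variant assignment could work roughly like this: hash the install ID with the test ID into [0, 1) and walk the variants' cumulative pool sizes, honoring the test's date range. The test definition shape and all names here are assumptions, not the service's actual schema:

```python
import hashlib
from datetime import date

# Hypothetical test definition as it might arrive from the server.
TEST = {
    "test_id": "new-nav-bar",
    "start": date(2017, 6, 1),
    "end": date(2017, 6, 30),
    "variants": [("a", 0.05), ("b", 0.05)],  # (variant id, pool size as a fraction of users)
}

def assign_variant(install_id, test, today):
    """Deterministically place a user in at most one variant pool,
    or return None for the control/default experience."""
    if not (test["start"] <= today <= test["end"]):
        return None  # outside the test date range
    digest = hashlib.sha256((install_id + test["test_id"]).encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform in [0, 1)
    cumulative = 0.0
    for variant_id, pool_size in test["variants"]:
        cumulative += pool_size
        if bucket < cumulative:
            return variant_id
    return None
```

Salting the hash with the test ID keeps pools independent across tests, so being in one test's variant doesn't correlate with being in another's.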

Configurable Behaviors

 * Overall sampling rate (per event or overall?)
 * Variant tests:
 ** Pool distribution (defined as a decimal number which represents the % of users to be included in the variant)
 ** Test dates (dates during which variant experiences should be used)

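For discussion purposes, the configurable behaviors above might serve a shape like the following. All keys and values here are illustrative assumptions, not a ratified configuration format:

```python
# Hypothetical server-delivered analytics configuration.
CONFIG = {
    "sampling_rate": 0.1,  # overall sampling rate (per event vs. overall is still an open question)
    "tests": [
        {
            "test_id": "new-nav-bar",
            "start": "2017-06-01",  # test dates: when variant experiences should be used
            "end": "2017-06-30",
            "variants": [
                {"variant_id": "a", "pool": 0.05},  # pool distribution as a fraction of users
                {"variant_id": "b", "pool": 0.05},
            ],
        }
    ],
}

# Sanity check a client might run: pools within a test cannot exceed 100% of users.
for test in CONFIG["tests"]:
    assert sum(v["pool"] for v in test["variants"]) <= 1.0
```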
Behavior When Configuration Is Not Available
We need to define how required the config file is, and how the layer should behave (particularly with respect to sampling and variants) when the config file cannot be retrieved.

Offline Event Tracking and Queueing
Events which occur while the device is otherwise "offline" should be queued locally and sent once the user reconnects. This queue should always represent the lowest priority communication by the app (for example, requesting a content update should happen before any event transmission).

Events queued up but not sent after a certain time may be deleted. That is, we shouldn't send events months after they are relevant, just because someone turned on an old phone.

The queuing and reconnection behavior should apply to all events tracked in this layer, except the check for server configuration. That is, there should be only a single local queue/cache managed by the analytics layer.
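A minimal sketch of the single local queue described above, assuming an illustrative 30-day cutoff for stale events (the actual retention window is to be decided):

```python
import time
from collections import deque

MAX_EVENT_AGE = 30 * 24 * 3600  # illustrative TTL: drop events older than 30 days

class OfflineEventQueue:
    """Single local queue for all analytics events, flushed only once
    the device is back online. A behavioral sketch, not the actual
    client implementation."""

    def __init__(self):
        self._queue = deque()

    def track(self, event, now=None):
        # Record the event with its client-side timestamp.
        self._queue.append(((now if now is not None else time.time()), event))

    def flush(self, send, now=None):
        """Send queued events oldest-first, discarding stale ones."""
        now = now if now is not None else time.time()
        while self._queue:
            ts, event = self._queue.popleft()
            if now - ts > MAX_EVENT_AGE:
                continue  # too old to still be relevant; drop it
            send(event)
```

`flush` would be invoked only after higher-priority traffic (e.g. content updates) has completed, keeping event transmission the app's lowest-priority communication.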

Basic Feature Use Analysis
I want to know how many people use a feature or sub-feature. I need to understand the overall usage, as well as the usage relative to the larger population of all users. What % of users have used this feature since launch? What % use it over a set time (% of users per day)? I also want to view these counts and rates by standard audience segments, such as primary language, device type (screen size), connection type, and new vs. returning users.

Inter-Context and Intra-Context Analysis
I want to know which features on a given screen or sub-screen are most used. This means being able to use the context in which events occur as a dimension of analysis. This context may be a full screen (the Explore feed) or a sub-screen (the Featured Article card on the Explore screen). Additionally, some actions are possible in multiple contexts. For example, you can Save an article to read later from many screens and sub-screens in the apps, and it's useful to compare these different sources of the same action. On which screens do people most often save articles? Are there screens where saving is comparatively so low that it could be replaced with another action?

Sequence Analysis
I want to know which features are used in sequence, and how the sequence of events affects users' use of the feature or app. For example, do articles opened from the Explore feed result in longer reading sessions than those found through a search? Do users who find an article from the Read More recommendations then click another recommendation? This is also useful for understanding completion rates for features with multiple sequential steps. For example, how many people abandon the account sign-up process at each step?

Subsequent Effect Analysis [This is a reach]
I want to track use of a feature or design change and measure its effect on subsequent actions by the user. For example, does saving an article from the Explore feed increase the likelihood of users reopening the app? Of actually reading the article? Do new users who turn on notifications use the app more 30 days later?

Variant Experience Analysis
I want to release a version of the app with variant experiences, and be able to assign users to different variants and measure the effect of those variants on their use of the app and/or feature.

Pageview-like events:
It is not scalable to track all basic consumption in the EventLogging layer; however, being able to join this information is needed for some of the funnel and usage analyses we would like to do. For specific tests and features, when needed, how can these be tracked via EventLogging? Should this tracking be part of the client analytics layer or a separate interface/path?
 * page views
 * page previews
 * search queries

Content anonymity:

 * iOS tracks the DOMAIN but not the article for article actions (view, save, share, preview, etc.) to add an additional layer of content consumption anonymity. Should that change?
 * Android defaults users to being tracked; iOS defaults to not tracked (and asks users explicitly during install). This makes apples-to-apples comparisons more complex. Should that change?

Minimize migration path for Android

 * Android already uses EventLogging and has significant instrumentation. Although this project would ideally treat the situation as a blank slate and define a "best" solution for apps generically, we should try to minimize changes needed in Android.

Passing timestamps for events

 * One of the core differences between app analytics and web analytics is that app events may occur while the user is offline. Additionally, users are sensitive to battery and data usage by apps (it's easy to track which apps use up your battery/bandwidth, to a level not generally possible in browsers). EventLogging assumes that an event will be dispatched as soon as it occurs, and places an automatic timestamp on each event as it's logged. We will need to work with Analytics to determine how asynchronous events can best be passed to EventLogging.

A/B testing capabilities

 * Android has done some limited A/B testing, and on the web, various teams (including Reading Web and Discovery) have been doing A/B tests for some time. Ideally the app analytics layer would be aware of variant tests and able to handle the pooling and tracking of users for variants. However that could also be a follow-on to a basic analytics layer definition.

Per language and per-wiki testing capabilities

 * In addition to arbitrary test pools, we often need to test or roll out features on a per-wiki basis. Being able to limit test pools to users of certain languages and wikis would be extremely valuable in these situations.