Wikimedia Apps/App Analytics

Background
The apps currently use a combination of API-driven metrics (page views), ad-hoc EventLogging-based user events and ad-hoc Piwik-based usage tracking. Because the EventLogging interface is oriented around tracking web-based events and requires the creation of custom queries and dashboards, the apps teams have tried other solutions (Piwik, Appsee) for our specific needs.

Data In
However, there is now general consensus that we should work towards client libraries which front EventLogging as the data collection and storage layer. Android is already the closest to this, but a unified definition of how app analytics should work would help both teams work towards a uniform understanding of our users, and reduce the complexity of analysis and testing.

Data Out
The other shortcomings identified by the apps teams are on the data modeling and querying side. This project will not directly address that, and a parallel effort by Reading PM and Data Analysis will be needed to set up worksheets, dashboards or other retrieval and presentation of cross-app usage data.

Project Goals

 * Create a client-side analytics layer for Android and iOS
 * Send user events to EventLogging
 * Track and send offline events
 * Be smart about bandwidth and battery usage
 * Respect privacy and default to anonymity
 * Make it easy to add events and support new features without significant schema or EventLogging overhead
 * Provide a consistent sampling regime and test user pooling processes across clients
 * Make app usage analytics testable without significant data analyst work
 * Use consistent names and data definitions across apps

Extended App Capsule
In addition to the standard event capsule, all app events should share a "meta schema" which provides the app specific context information. This capsule should include:
 * Client timestamp (time of event as defined on device)
 * Context (either a specific screen in the app or the url for an article where the event occurs)
 * Test id (a unique test identifier; an optional value passed to identify events sent as part of variant experience testing)
 * Variant pool id (a unique variant experience identifier; an optional value passed to identify events sent as part of variant experience testing)
 * App version (recently made directly available in the new parsed user agent JSON, although there are still open questions around JSON handling)
 * Language choice (Primary language wiki/ device language)
 * Connection type (Wifi, Cell, Zero, Offline)
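
The capsule fields above could be carried in a small structure that is attached to every event. The following is a minimal sketch only: the class name `AppCapsule`, the field names, the `wrap_event` helper and the JSON layout are all assumptions made for illustration, not an agreed schema.

```python
from dataclasses import dataclass, asdict
from typing import Optional
import json


@dataclass
class AppCapsule:
    """Hypothetical container for the app-specific context fields listed above."""
    client_timestamp: float                 # time of event as defined on the device (epoch seconds)
    context: str                            # screen name or article URL where the event occurs
    app_version: str
    language: str                           # primary language wiki / device language
    connection_type: str                    # e.g. "wifi", "cell", "zero", "offline"
    test_id: Optional[str] = None           # only set during variant experience testing
    variant_pool_id: Optional[str] = None   # only set during variant experience testing


def wrap_event(capsule: AppCapsule, schema: str, event: dict) -> str:
    """Serialize an event with its capsule, omitting optional fields that are unset."""
    meta = {key: value for key, value in asdict(capsule).items() if value is not None}
    return json.dumps({"schema": schema, "meta": meta, "event": event}, sort_keys=True)
```

Dropping the unset optional fields keeps events outside a variant test free of empty test columns; whether that is preferable to sending explicit nulls is an open design choice.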

User Segments
When performing analysis it's often helpful to limit the data to a subtype of user, or to allow for different user groups to be a dimension of analysis.

Device Identity Based Segments
The system should support querying and grouping by basic aspects of a device's identity, including:
 * OS
 * OS Version
 * Device type/screen size
 * App Version
 * Connection type

Behavior Based Segments
In addition to inherent attributes, it is often useful to identify users based on their behavior. For example, do people who have notifications "on" use the app more or less than users overall?

Behavioral segments are not pre-determined, but defined at the time of analysis, and then potentially stored, so that the same segment can be used over time. Behavior based segments include:
 * Days since install
 * User has done event X (e.g. user has saved a page)
 * User has seen screen X (e.g. user has been to the Places tab)
 * Primary language choice
 * Uses language
 * User has setting X with value Y (e.g. user has opted into sharing location)
 * Geography (using IP lookup or device info?)
 * In test X (and variant Y) (e.g. user is in variant b of the new nav bar test)

Sampling
Due to limitations in EventLogging and data warehousing, the app analytics layer will need to support sampling of events. Sampling may apply at different levels:
 * Deterministic sampling based on install ID? All actions for a device/install default to being tracked, or not tracked, for the lifetime of the install.
 * Also support random per-event sampling? Membership in sample for tracking a specific event, determined at event time, and overriding the default pool membership.
 * Per variant test sampling rates. Membership in test pool for tracking events related to a specific variant test, determined at variant choice time, and overriding the default test membership. Rates cannot be changed after test begins.
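
The first two sampling levels above could be sketched as follows. This is an illustration under stated assumptions, not a decided design: hashing the install ID with SHA-256 is one possible way to get a stable per-install decision, and the function names and the `salt` parameter are hypothetical.

```python
import hashlib
import random


def install_bucket(install_id: str, salt: str = "") -> float:
    """Map an install ID to a stable point in [0, 1)."""
    digest = hashlib.sha256((salt + install_id).encode("utf-8")).digest()
    # Keep 53 bits so the quotient is exactly representable and always below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def install_in_sample(install_id: str, rate: float, salt: str = "") -> bool:
    """Deterministic sampling: the same install always gets the same answer for a given rate."""
    return install_bucket(install_id, salt) < rate


def event_in_sample(rate: float, rng=random) -> bool:
    """Random per-event sampling, decided independently at event time."""
    return rng.random() < rate
```

Because the bucket depends only on the install ID (and the optional salt), raising the overall rate later only adds installs to the sample; installs already in the sample stay in it.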

Creating and Running Tests
A variant experience test is a way to try different experiences on different users and measure the impact of the variations on user behavior. Although these variants can be implemented and controlled entirely through ad-hoc client changes, it is often useful, especially in binary app store land, to move the control and configuration of variants out to a server API. In this system, a test will be defined and controlled via a RESTBase service.

A test will consist of a test identifier and a variant identifier for each variant. These will be used in the client and in EventLogging to identify the tests and their variants for analysis. Each variant will also define a test pool size. This size represents the % of users to be included in the variant, i.e. the percent chance any given user will receive this variant. Pool sizes cannot be changed after a test begins.

In order to avoid complications of pools and managing potentially overlapping variants, only one variant experience test should be run for any platform at any given time.

The test will also have an associated test date range, which will identify when the test should be run.
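
A test definition with pool sizes and a date range, and the client-side assignment of a user to a variant, might look like the sketch below. The class names, field names and the hash-based assignment are assumptions for illustration; the source only specifies that each variant has a pool size and that the test has a date range.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple
import hashlib


@dataclass(frozen=True)
class Variant:
    variant_id: str
    pool_size: float  # fraction of users, e.g. 0.05 for 5%


@dataclass(frozen=True)
class VariantTest:
    test_id: str
    variants: Tuple[Variant, ...]
    start: date
    end: date


def assign_variant(test: VariantTest, install_id: str, today: date) -> Optional[str]:
    """Return the variant ID for this install, or None if the install gets the default experience."""
    if not (test.start <= today <= test.end):
        return None  # outside the test date range
    digest = hashlib.sha256(f"{test.test_id}:{install_id}".encode("utf-8")).digest()
    point = (int.from_bytes(digest[:8], "big") >> 11) / 2**53  # stable point in [0, 1)
    upper = 0.0
    for variant in test.variants:
        upper += variant.pool_size
        if point < upper:
            return variant.variant_id
    return None  # install falls outside every variant pool
```

Mixing the test ID into the hash means an install's position is independent from one test to the next, and a fixed position is one way to keep membership stable given that pool sizes cannot change after the test begins.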

Configurable Behaviors

 * Overall sampling rate
 * Variant tests:
    * Pool distribution (defined as a decimal number which represents the % of users to be included in the variant)
    * Test dates (dates during which variant experiences should be used)

Behavior When Configuration not available
We need to define whether the config file is required, and how the layer should behave (particularly w/r/t sampling and variants) when the config file cannot be retrieved.

Offline Event Tracking and Queueing
Events which occur while the device is otherwise "offline" should be queued locally and sent once the user reconnects. This queue should always represent the lowest priority communication by the app (for example, requesting a content update should happen before any event transmission).

Events queued up but not sent after a certain time may be deleted. That is, we shouldn't send events months after they are relevant, just because someone turned on an old phone.

The queuing and reconnection behavior should apply to all events tracked in this layer; the only exception is the check of the server configuration. That is, there should only be a single local queue/cache managed by the analytics layer.
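
The queueing and expiry behavior described above can be sketched as a single in-memory queue. This is an assumption-laden illustration: the class name `EventQueue`, the `send` callback and the `max_age_seconds` parameter are hypothetical, the maximum age is deliberately left to the caller because no value has been decided, and a real client would also persist the queue to disk.

```python
from collections import deque
from typing import Callable
import time


class EventQueue:
    """Single local queue for analytics events, with age-based expiry."""

    def __init__(self, max_age_seconds: float, clock: Callable[[], float] = time.time):
        self._max_age = max_age_seconds
        self._clock = clock
        self._pending = deque()  # (queued_at, event) pairs, oldest first

    def __len__(self) -> int:
        return len(self._pending)

    def enqueue(self, event: dict) -> None:
        self._pending.append((self._clock(), event))

    def drop_expired(self) -> int:
        """Delete events that have waited longer than the maximum age."""
        cutoff = self._clock() - self._max_age
        dropped = 0
        while self._pending and self._pending[0][0] < cutoff:
            self._pending.popleft()
            dropped += 1
        return dropped

    def flush(self, send: Callable[[dict], bool], online: bool) -> int:
        """Send queued events oldest first; stop at the first failed send."""
        self.drop_expired()
        if not online:
            return 0
        sent = 0
        while self._pending:
            _, event = self._pending[0]
            if not send(event):
                break  # keep the event queued and retry on the next flush
            self._pending.popleft()
            sent += 1
        return sent
```

The app would call `flush` only after higher-priority requests, such as content updates, have completed, so that event transmission stays the lowest-priority communication.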

Basic Feature Use Analysis
I want to know how many people use a feature or sub-feature. I need to understand the overall usage, as well as the usage relative to the larger population of all users. What % of users use this feature since launch? What % use this over a set time (% of users per day)? I also want to view these counts and rates by standard audience segments, such as primary language, device type (screen size), connection type and new vs. returning users.
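
As an illustration of this kind of question, the share of users who used a feature, split by an audience segment, could be computed as below. The event layout (`install_id`, `event` and a segment key such as `connection_type`) is an assumption for the example, not the stored EventLogging format.

```python
from collections import defaultdict
from typing import Dict, Iterable


def feature_usage_rate(events: Iterable[dict], feature_event: str, segment_key: str) -> Dict[str, float]:
    """Per segment, the fraction of distinct installs that fired feature_event at least once."""
    all_users = defaultdict(set)
    feature_users = defaultdict(set)
    for event in events:
        segment = event[segment_key]
        all_users[segment].add(event["install_id"])
        if event["event"] == feature_event:
            feature_users[segment].add(event["install_id"])
    return {
        segment: len(feature_users[segment]) / len(users)
        for segment, users in all_users.items()
    }
```

Restricting `events` to a single day before calling the function gives the per-day rate; passing all events since launch gives the overall rate.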

Inter-Context and Intra-Context Analysis
I want to know which features on a given screen or sub-screen are most used. This means being able to use the context in which events occur as a dimension of analysis. This context may be a full screen (the Explore feed) or a sub-screen (the Featured Article card on the Explore screen). Additionally, some actions are possible in multiple contexts. For example, you can Save an article to read later from many screens and sub-screens in the apps, and it's useful to compare these different sources of an identical action. For example, from which screen do people most often save articles? Are there screens where saving is comparatively so low that it could be replaced with another action?

Sequence Analysis
I want to know what features are used in sequence, and how the sequence of events affects users' use of the feature or app. For example, do articles opened from the Explore feed result in longer reading sessions than those found through a search? Do users who find an article from the Read More recommendations then click another recommendation? This is also useful for understanding the completion rates for features with multiple sequential steps. For example, how many people abandon the account sign up process at each step in the process?
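
For the multi-step completion question, a simple funnel count shows how many installs reached each step in order. This is a sketch under assumptions: the event fields `install_id`, `event` and `client_timestamp`, and the step names in the usage note, are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def funnel_counts(events: Iterable[dict], steps: Sequence[str]) -> List[int]:
    """Count installs that reached each step, requiring the steps to occur in order."""
    progress: Dict[str, int] = {}  # install_id -> number of steps completed so far
    for event in sorted(events, key=lambda e: e["client_timestamp"]):
        done = progress.get(event["install_id"], 0)
        if done < len(steps) and event["event"] == steps[done]:
            progress[event["install_id"]] = done + 1
    return [
        sum(1 for completed in progress.values() if completed > index)
        for index in range(len(steps))
    ]
```

Called with steps such as `["signup_start", "signup_username", "signup_done"]`, the difference between adjacent counts is the number of installs that abandoned the process at that step.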

Subsequent Effect Analysis [This is a reach]
I want to track use of a feature or design change and know what effect it has on subsequent actions by the user. For example, does saving an article from the Explore feed increase the likelihood of users reopening the app? Of actually reading the article? Do new users who turn on notifications use the app more 30 days later?

Variant Experience Analysis
I want to release a version of the app with variant experiences, and be able to assign users to different variants and measure the effect of those variants on their use of the app and/or feature.