Notifications/Metrics



'''This document is a work in progress. Comments are appreciated but this is not a final draft.'''

This page outlines our metrics plan for Echo, our new notifications system for MediaWiki, with a focus on the first release of Echo in April 2013.

To learn more about Echo, check out this project hub, these testing tips, the Feature requirements page and other related documents. For a quick visual overview of this project, check the.

Research Goals
Here are our overall goals for collecting and analyzing data for the Echo notifications project.

We want to measure how effective Echo is in helping users:
 * learn about activity related to them
 * take action on notifications
 * participate on Wikipedia

We plan to answer these questions through a variety of measurements, some of which would be built into the Echo tools. Here are some examples of measurements we could take, for discussion purposes. After an initial feasibility study and prioritization session, we will identify a few key metrics that we think are required and practical for our first en-wiki deployment.

Target Users
For Echo's first release, our primary target user will be new editors, but the tool will be available to other user groups as well (albeit with different default preferences for each group). Here are the options we are considering for testing on the English Wikipedia in early 2013:

New users
 * Defined as users registered in 2013, who do not have special user rights (except for auto-confirmed users, who should be included in this group)
 * Enable individual email notifications
 * Disable onsite and email notifications for edit reversions

Current users
 * Defined as users who registered before 2013 and/or have special user rights (e.g.: reviewer, rollbacker, steward, administrator, etc.)
 * No email notifications (but send one email notification to invite them to opt-in)
 * Disable onsite and email notifications for page links

For more details on our first release plans, read these requirements for defaults by user group.

Research Questions
Here are some of the research questions we want to answer in coming months. These questions apply to all notifications as a whole, as well as individual notification categories (e.g. talk page messages, page reviews).

First questions
Here are the questions we aim to answer with the metrics tools available to us for the first release of Echo:

'''1. How many events generate notifications each day?''' Is that number growing? Tool: Events

'''2. How many users have notifications enabled, overall?''' How does this break down for web, email or both? Tool: Preferences

'''3. Are people turning off some notifications more than others?''' Are there significant differences between new and current users? Tool: Preferences

'''4. How many notifications are being viewed on the web?''' What's the breakdown between flyout and archive? Tool: Views

'''5. How many notifications are being clicked on?''' What's the click-through rate? Are some notifications receiving more clicks per view than others? Tools: Clicks / Views. Blocker: Click logging (E3 says EventLogging can do this in early April).

'''6. Do notifications help people become more productive?''' Do they edit pages more often? Are their edits successful? Tool: Productivity cohort analysis (bucketing)

'''7. Are people satisfied with the notifications tool?''' Tools: Satisfaction survey (new users) + project talk page (current users)

Future questions
These other research questions could be investigated once more tools become available to us, and if we can allocate the resources after the second release.

'''8. How many notifications are being viewed via email?''' What's the breakdown between single emails, daily or weekly digests? Tool: Views (HTML email only). Blocker: Legal is not comfortable with tracking email impressions at this time.

'''9. Are some notifications viewed more than others?''' Is that difference growing? Tools: Events, Preferences, Views. Blocker: need to normalize views for the same number of events and preferences; lower priority.

'''10. How many people took a follow-up action after viewing a notification?''' (e.g. editing an article, replying on a talk page) Tool: Funnel analysis of productivity by cohort. Blocker: This is really hard to measure without a lot of instrumentation; lower priority, much of this is beyond the scope of Echo.

First release
Here are some of the metrics and research tools available to us for the first release, as shown in this diagram:

Events
This metric tool can answer research questions such as: 'How many notifications are being generated every day?'. It counts the number of events that trigger notifications, server-side.

Status: This 'notification generation' tool is already in place for Echo, but doesn't have any dashboards.

Preferences
This metric tool can answer research questions such as: 'How many users have notifications turned on?'. It counts the number of users that have enabled or disabled notifications, server-side.

Status: This 'preference status' tool can be easily developed for Echo, but doesn't have any dashboards.

Views
This metric tool can answer research questions such as: 'How many notifications are being viewed?'. It counts the number of impressions on the web flyout or archive (and eventually on HTML emails), client-side.

Status: This would use the EventLogging tool and may be developed for Echo's first release, but will require some instrumentation and may have some limitations (e.g. can it identify which notifications were above the fold in the flyout?). Note that our current privacy policy may prevent us from collecting impressions on HTML emails.

Clicks
This metric tool can answer research questions such as: 'How many notifications are being clicked on?'. It counts the number of clicks on the web flyout or archive, client-side. It can also be combined with Views to provide Click-through Rates, which can help measure the effectiveness of notifications.

Status: The current EventLogging tool doesn't track clicks reliably, and we cannot count on having this tool available for the first release. However, we would like to measure this important metric as soon as practical. To that end, we are encouraging the E3 team to make this a high priority for the next version of EventLogging.
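As a rough illustration of how Clicks and Views could be combined into a click-through rate, here is a minimal Python sketch. It assumes each logged view or click has already been reduced to its notification category; the function name and the sample data are hypothetical and do not come from the Echo or EventLogging code.

```python
from collections import Counter


def click_through_rates(views, clicks):
    """Return clicks divided by views for each notification category.

    `views` and `clicks` are iterables of category names, one entry per
    logged impression or click. A category that was never viewed does not
    appear in the result, so there is no division by zero.
    """
    view_counts = Counter(views)
    click_counts = Counter(clicks)
    return {
        category: click_counts[category] / count
        for category, count in view_counts.items()
    }


# Hypothetical sample data, not real Echo logs.
sample_views = ["talk message"] * 4 + ["thanks"] * 2 + ["edit revert"] * 5
sample_clicks = ["talk message"] * 2 + ["thanks"] * 1
rates = click_through_rates(sample_views, sample_clicks)
for category, rate in sorted(rates.items()):
    print(f"{category}: {rate:.0%}")
```

The same division could be applied per source (flyout or archive) or per user type by changing what each list entry represents.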

Productivity
This one-time cohort study will answer research questions such as: 'Do notifications help new users become more productive?' (e.g. editing pages more often, getting fewer edits reverted). For this one-week study, we propose to bucket all new users into two cohorts: a main study group with notifications turned on; and a smaller control group with Echo notifications turned off completely (but keep current talkpage notifications enabled?). We will collect and compare the total number of productive actions taken by users from each cohort over that week, using a mix of client-side and server-side metrics.

Status: This study will require bucketing and some instrumentation on edit pages, but can be done in our time-frame. We propose to start it a few weeks after the first release, once the code and features are stable.

Priorities: At a later date, we may want to do other cohort studies, such as a funnel analysis to measure which follow-up actions were taken (post on talk page, edit article), for each notification category and each user group. But this type of study is time consuming and requires a lot of instrumentation -- and we consider it to be lower priority, as much of this is beyond the scope of Echo. For now, our first priority is comparing the productivity of these two new user groups: with notifications vs. no notifications.

Satisfaction
This simple research tool can answer questions such as: 'How satisfied are users with the notifications tool?'. It uses a simple survey to ask users if they find the tool useful, and can be done using the SurveyMonkey tool, with a link in the archive page (and/or a special notification asking them to take a survey). This provides both quantitative and qualitative feedback on how our users perceive this tool.

Status: This 'customer satisfaction' tool can be easily integrated in Echo with a link inviting users to take a quick survey -- and SurveyMonkey provides useful dashboards of live survey results, with a range of filters and other tools.

Target Pages
This method would answer research questions such as: 'How many people viewed the target page after clicking on a notification link?' (e.g. article or user page). It would count the number of impressions generated by notifications on their landing page, using a mix of client-side and server-side metrics.

Status: This would require a lot of instrumentation, because there are so many different target pages. We have considered using URL parameters in the notification links, but this would not be received well by our users and would require extensive back-end data processing of huge log files -- and this would be the first time that this method is being used by the product group (though the fundraising group has used it in the past). We view this as low priority for now.

Follow-up Actions
This method would answer research questions such as: 'How many people took a follow-up action after viewing a notification?' (e.g. editing a page). It counts the number of successful actions taken by notification users after viewing their landing page, using a mix of client-side and server-side metrics. See these examples of follow-up actions (e.g. visit the diff OR leave a message OR complete their first talk page edit within 24 hours of registration).

Status: These actions vary from one notification type to another and are difficult to measure, requiring a lot of instrumentation, so we cannot count on using this method for the first release. It may be possible to do a cohort analysis to measure these results at a later date. We view this as low priority for now.

First Dashboards
Here are examples of dashboards we would like to produce for the first release of Echo on en-wiki the week of April 8th.

Events Dashboards
These volume dashboards would show the number of notifications generated every day, by category and group. This would measure the number of events that triggered a notification, using server-side data.

Purpose: To provide a general sense of notification event volume by category, over time. We also want to determine whether specific classes of users receive too many notifications.

Sample displays:

Events by Category
 * talkpage messages
 * page reviews
 * page links
 * mentions
 * thanks
 * edit reverts
 * welcome
 * get started
 * user rights

SQL Script: SELECT event_notificationType, COUNT(DISTINCT(event_recipientuserId)) FROM log.Echo_5285750 WHERE wiki = 'mediawikiwiki' GROUP BY event_notificationType;

Events by Activity Group
 * interactive
 * positive
 * neutral
 * negative

Events by User Type
 * new users
 * current users

Notes:
 * This schema (meta.wikimedia.org/wiki/Schema:Echo) specifies what data needs to be logged
 * We implemented the corresponding hooks on MediaWiki.org
 * We have deployed data collection via EventLogging on MediaWiki.org
 * See Trello ticket for more info

Preferences Dashboards
This set of dashboards would show the number of users who have set their preferences at any given time, by type and category. This would be based on user preferences, using server-side data.

Purpose: To provide a general sense of how notifications preferences are changing over time, and whether or not people are turning off their notifications.

Sample displays:

Email Preferences by Frequency
 * No email notifications
 * Individual notifications
 * Daily digest
 * Weekly digest

Email Preferences by Category
 * Talk message
 * Thanks
 * Mention
 * Page link
 * Page review
 * Edit revert

Web Preferences by Category
 * Talk message
 * Thanks
 * Mention
 * Page link
 * Page review
 * Edit revert

Web Preferences by Type
 * Badge enabled
 * Badge disabled

Notes: 
 * Data for these dashboards needs to be cached on a daily basis with a cron job.

Views Dashboards
This set of dashboards will show how often notifications are viewed by users, by category and source type. This would be based on total impressions on the web flyout and the archive page, using a mix of client-side and server-side data, on a daily basis.

Purpose: To provide a general sense of how many notifications are viewed over time, and whether or not some are viewed more than others.

Examples:

Web Views by Category
 * Talk message
 * Thanks
 * Mention
 * Page link
 * Page review
 * Edit revert
 * Welcome
 * Get Started
 * User rights

Views by Source
 * Flyout
 * Archive

Views by User Type
 * new users
 * current users

Views by User Group (this one is lower priority)
 * new members
 * new editors
 * active editors
 * very active editors
 * inactive users

Notes:
 * Need to rewrite this initial draft of Schema:EchoInteraction.
 * Cannot measure email views, which are excluded for now, for legal reasons.

Clicks Dashboards
This set of dashboards will show how often notifications are being clicked on by users, by category and source type. It would also be combined with Views to display Clickthrough Rates, to measure the effectiveness of notifications. This would be based on total clicks on the web flyout and the archive page, divided by the number of views, client-side.

Purpose: To provide a general sense of how many notifications are clicked on over time, and whether or not some are clicked on more than others.

Examples:

Web Clicks by Category
 * Talk message
 * Thanks
 * Mention
 * Page link
 * Page review
 * Edit revert
 * Welcome
 * Get Started
 * User rights

Also provide the same dashboard for the 'Clickthrough Rate by Category' (dividing clicks by views).

Clicks by Source
 * Flyout
 * Archive

Also provide the same dashboard for the 'Clickthrough Rate by Source' (dividing clicks by views).

Clicks by User Type
 * new users
 * current users

Also provide the same dashboard for the 'Clickthrough Rate by User Type' (dividing clicks by views).

Clicks by User Group (this one is lower priority)
 * new members
 * new editors
 * active editors
 * very active editors
 * inactive users

Also provide the same dashboard for the 'Clickthrough Rate by User Group' (dividing clicks by views).

Notes:
 * Need to rewrite this initial draft of Schema:EchoInteraction.
 * An improved click logging tool may not be available from E3 until mid-April (EventLogging extension).
 * Cannot measure email clicks, which are excluded for now, for legal reasons.

First Cohort Study
This first study will show whether new users who receive Echo notifications become more productive than new users who don't have Echo. This would be based on total edits (or if time allows, number of edits that were not reverted within a week).

For this one-week study, we would bucket all new users into two cohorts: a main study group with notifications turned on; and a smaller control group with Echo notifications turned off completely (but with current talkpage and watchlist notifications enabled). We will collect and compare the total number of productive actions taken by users from each cohort over that week, using a mix of client-side and server-side metrics.

Notes:
 * This study will require bucketing and some instrumentation on edit pages, but can be done in our time-frame.
 * We propose to start it a few weeks after the first release, once the code and features are stable.
 * We need to specify our bucketing strategy (e.g. all new users with an odd user_id)
 * Determine what control group we can and want to set up for each type of notification and determine the dependencies
 * What do we need to disable? (set prefs, prevent writing of notifications to the talk page)
 * See also this Trello ticket for more info.
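One possible bucketing strategy can be sketched in Python as follows. This is only an illustration: it assumes user IDs are assigned roughly sequentially at registration, and the function name, cohort labels and the default of one control user in ten are hypothetical choices, not decisions made in this plan.

```python
def assign_cohort(user_id, control_every=10):
    """Assign a new user to the 'control' or 'echo' cohort.

    Every `control_every`-th user ID goes to the control group, so the
    control group stays smaller than the main study group. The default of
    10 is an illustrative assumption; control_every=2 splits users by odd
    and even user_id.
    """
    if control_every < 2:
        raise ValueError("control_every must be at least 2")
    return "control" if user_id % control_every == 0 else "echo"


# Hypothetical user IDs 1 to 20.
cohorts = [assign_cohort(user_id) for user_id in range(1, 21)]
print(cohorts.count("control"), cohorts.count("echo"))
```

Because the assignment depends only on the user ID, it can be recomputed at analysis time without storing the cohort separately.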

Future Dashboards
Here are some dashboards which we may want to consider in the future. (Some of them may be redundant with some of the first dashboards above.)

Specific dashboards
We would like some specific dashboards for each notification type, so that we (or developers using our API) can track their effectiveness.

User dashboards
We would like some individual user dashboards, so we can observe how typical users use this tool. It would also be useful to look at an average by user type as well (e.g. new user vs. active user average).

Volume Metrics
Note that some of the breakdowns proposed below (e.g. new vs. active user, or source breakdown) are useful across other types of metrics (e.g. posts vs. views vs. clicks). For example, it would be helpful to have a matrix table showing how many posts, views and clicks were generated by each user group across a range of sources.

Posts
How many notifications are being posted to users in a given week? (but not necessarily viewed)
 * for new users
 * for active users
 * for very active users

Note:
 * Rationale for this request: determine whether specific classes of users receive too many notifications

(This can be done by analyzing Echo data. Each notification is associated with a user and a triggering timestamp, so by using the user's edit count to group new, active and very active users, we can tell how many notifications are posted to each user group in a given week. However, a user's edit count can increase in a very short time: a new user from two weeks ago may be an active user today. In that case, we would want to use EventLogging to record the edit count at the time the notification is triggered.)
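The grouping described above could look like the following Python sketch. It assumes the recipient's edit count was recorded when each notification was triggered. The thresholds and names are hypothetical, since this plan does not define where 'new', 'active' and 'very active' begin.

```python
from collections import Counter

# Hypothetical edit-count thresholds; this plan does not define them.
ACTIVE_MIN_EDITS = 10
VERY_ACTIVE_MIN_EDITS = 100


def user_group(edit_count):
    """Map an edit count to a user group."""
    if edit_count >= VERY_ACTIVE_MIN_EDITS:
        return "very active"
    if edit_count >= ACTIVE_MIN_EDITS:
        return "active"
    return "new"


def posts_by_group(events):
    """Count notifications per user group.

    Each event is a (recipient_id, edit_count_at_trigger) pair, where the
    edit count was recorded when the notification was triggered.
    """
    return Counter(user_group(edit_count) for _, edit_count in events)


# Hypothetical events for one week, not real Echo data.
sample_events = [(1, 0), (1, 3), (2, 45), (3, 250), (3, 251)]
print(posts_by_group(sample_events))
```

Using the edit count at trigger time, rather than the current edit count, keeps a user who became active later from being counted as active for notifications received while still new.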

Views
How many notifications are actually viewed in a given week?
 * on the flyout
 * on the archive page(s)
 * via email (plain text vs. HTML)

Notes:
 * We expect to use EventLogging to track these views
 * We would need to discuss building view tracking into our HTML email template and a server endpoint to record views
 * Potential privacy concerns for HTML tracking, but OTRS has support for email impressions
 * For users with many pages of archive, how many click past the first page?

(We can track views of the flyout by tracking clicks on the badge, and we can track views of the archive page on page load. I am not sure if we can track views in emails unless users click on something in the email.)

Clicks
How many notifications are being clicked on in a given week?
 * interactive notifications (e.g.: talk page messages, user mentions)
 * positive notifications (e.g.: wiki love, page link, promoted feedback)
 * neutral notifications (e.g.: page reviews, user mentions)
 * negative notifications (e.g.: edit reverts, page deletions)

Notes:
 * Break out clickthrough stats by source (flyout/archive/text-email/html-email)

(All these clicks can be tracked by EventLogging)

Preference Metrics
We want to measure changes in user preferences for both web and email notifications.

The primary rationale is to track how many users have web and/or email preferences enabled, and how many are disabling preferences.

All preferences
 * How many web and email notification types are enabled per user, on average?
 * How many users disabled both web and email notifications altogether this week?

Web preferences
 * How many web notification types are enabled per user, on average?
 * How many users disabled web notifications altogether this week?

Email Preferences
 * How many email notification types are enabled per user, on average?
 * How many users disabled email notifications altogether this week?
 * How many users switched to daily digest? weekly digest?
 * How many emails are being sent in a week to new/active/very active users?
 * For users who disabled or reduced their email settings, how many emails had been sent to them in the past day/week?

Notes:
 * Control for users with authenticated email addresses
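To make the 'enabled per user, on average' questions concrete, here is a small Python sketch over a snapshot of preferences. The data layout (a dict of notification type to enabled flag per user) and all names are assumptions for illustration only and do not reflect how Echo stores preferences. Questions about users who disabled notifications 'this week' would also need a comparison between two such snapshots.

```python
def preference_summary(prefs_by_user):
    """Summarise notification preferences from one snapshot.

    `prefs_by_user` maps a user ID to a dict of
    {notification_type: enabled}. Returns the average number of enabled
    types per user and the number of users with every type disabled.
    """
    if not prefs_by_user:
        return {"avg_enabled": 0.0, "all_disabled": 0}
    # True counts as 1 and False as 0 when summed.
    enabled_counts = [sum(prefs.values()) for prefs in prefs_by_user.values()]
    return {
        "avg_enabled": sum(enabled_counts) / len(enabled_counts),
        "all_disabled": enabled_counts.count(0),
    }


# Hypothetical snapshot of email preferences for three users.
sample_prefs = {
    1: {"talk message": True, "thanks": True, "edit revert": False},
    2: {"talk message": False, "thanks": False, "edit revert": False},
    3: {"talk message": True, "thanks": False, "edit revert": False},
}
summary = preference_summary(sample_prefs)
print(summary)
```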

Users
How many unique users are clicking on their notifications:
 * every day
 * once a week
 * once a month
 * rarely or never

(We can analyze this by using the data from clicks)
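One way to derive these frequency groups from click data is sketched below in Python. The 28-day window and the cut-offs between groups are illustrative assumptions; this plan does not define them, and the function name is hypothetical.

```python
from datetime import date


def click_frequency(click_dates, window_days=28):
    """Classify one user by the number of distinct days with a click.

    `click_dates` holds the dates of that user's clicks within the
    window. The cut-offs below are illustrative assumptions.
    """
    active_days = len(set(click_dates))
    if active_days >= window_days:
        return "every day"
    if active_days >= window_days // 7:
        return "once a week"
    if active_days >= 1:
        return "once a month"
    return "rarely or never"


# Hypothetical click dates for three users over 1-28 April 2013.
daily_user = [date(2013, 4, day) for day in range(1, 29)]
weekly_user = [date(2013, 4, day) for day in (1, 8, 15, 22)]
occasional_user = [date(2013, 4, 3), date(2013, 4, 3)]

for clicks in (daily_user, weekly_user, occasional_user, []):
    print(click_frequency(clicks))
```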

Actions
How many users go on to take these follow-up actions?
 * make an article edit (ns0) -- this is the most important action
 * start a page
 * post a message
 * post feedback
 * upload a file
 * other contributions
 * do nothing

Notes:
 * We need to define what is the scope of a "follow up" action.
 * Since it is hard to determine which actions take place as a direct result of a notification, it might make sense to compare a group of users who are getting Echo notifications with a control group without Echo, then track the overall number of edits, new pages and other contributions from both groups over a period of time.
 * This would let us determine whether or not Echo is helping engage users who receive notifications more than other users.
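The comparison between the two groups could be sketched in Python as follows. The data layout and names are hypothetical, and the sketch only compares average edits per user. A difference in averages does not by itself show that the difference is statistically significant.

```python
from statistics import mean


def compare_cohorts(edits_by_user, cohort_by_user):
    """Return the mean number of edits per user for each cohort.

    `edits_by_user` maps user ID to edits made during the study period;
    users with no edits may be missing and count as zero.
    `cohort_by_user` maps user ID to a cohort name.
    """
    edits_per_cohort = {}
    for user_id, cohort in cohort_by_user.items():
        edits = edits_by_user.get(user_id, 0)
        edits_per_cohort.setdefault(cohort, []).append(edits)
    return {cohort: mean(edits) for cohort, edits in edits_per_cohort.items()}


# Hypothetical study data, not real results.
sample_cohorts = {1: "echo", 2: "echo", 3: "control", 4: "control"}
sample_edits = {1: 4, 2: 2, 3: 1}
averages = compare_cohorts(sample_edits, sample_cohorts)
print(averages)
```

The same function could be run on new pages or other contributions by passing a different count per user.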

(It is quite difficult to track user follow-up actions, especially when they involve multiple sub-actions. For example, if a user posts on my talk page about an article edit, I get the notification, click on the link in the flyout and get redirected to the talk page, click on the article link on the talk page, then start editing by clicking the edit link. This involves many intermediate actions, any of which can also be initiated from other places.)

Usage
Which notification types are used the most? the least?
 * Welcome
 * Talk page message
 * Started page message
 * Wikilove
 * etc.

Notes:
 * The purpose of this request is to determine whether any of these notifications should be eliminated due to insufficient usage
 * To normalize the data, we may want to focus on the ratio of clickthrough versus frequency (e.g.: measure the number of clicks divided by number of notifications)

Productivity
Which notification types appear to be most productive in terms of follow-up actions?
 * Welcome
 * Talk page message
 * Started page message
 * Wikilove
 * etc.

Usefulness
How useful are notifications to users overall? to new vs. experienced users?
 * Very useful
 * Useful
 * Neutral
 * Not very useful
 * Not useful at all

Notes:
 * This measurement of customer satisfaction could be based on a simple survey shown to users for a short period of time.

Related documents
Here are some useful links to related information:
 * Echo Metrics - Generating Notifications (Trello)
 * Echo Metrics - Interacting with Notifications (Trello)
 * Schema for tracking the generation of notifications (Meta)
 * Schema for tracking interactions with notifications (Meta)
 * Tables for tracking interactions with notifications (Google)
 * Facebook Click-To-Impression Ratio Info

We will keep updating this section as our metrics plan develops.