Notifications/Metrics

'''This document is a work in progress. Comments are appreciated but this is not a final draft.'''

This page outlines our metrics plan for Echo, our new notifications system for MediaWiki, with a focus on the first release of Echo in April 2013.

To learn more about Echo, check out the project hub, the testing tips, the Feature requirements page and other related documents. For a quick visual overview of this project, see the overview diagram.



Research Goals
Here are our overall goals for collecting and analyzing data for the Echo notifications project.

We want to measure how effective Echo is in helping users:
 * keep up with events that affect them
 * take action on notifications
 * participate on Wikipedia

We also want to improve this tool by tracking user engagement through a variety of metrics:
 * by category (e.g. message)
 * by user group (e.g. new users)
 * by source (e.g. flyout)

We've recommended below some key metrics for our first en-wiki deployment.

Target Users
For Echo's first release in April 2013, our primary target users will be new editors, who will get both web and email notifications by default. Current users will also have web notifications enabled, but email notifications turned off by default.

New users
 * Defined as users registered in 2013, who do not have special user rights (except for auto-confirmed users, who should be included in this group)
 * Enable individual email notifications
 * Disable web and email notifications for edit reversions

Current users
 * Defined as users who registered before 2013 and/or have special user rights (e.g. reviewer, rollbacker, steward, administrator)
 * No email notifications (but send one email notification to invite them to opt-in)
 * Disable web and email notifications for page links

For more details on our settings for each user group, read these requirements for defaults by user group.

Research Questions
Here are some of the research questions we want to answer in coming months. These questions apply to all notifications as a whole, as well as individual notification categories (e.g. talk page messages, page reviews).

First questions
Here are the questions we aim to answer with the metrics tools available to us for the first release of Echo:

'''1. How many events generate notifications each day?''' Is that number growing? Tool: Events

'''2. How many users have notifications enabled, overall?''' How does this break down for web, email or both? Tool: Preferences

'''3. Are people turning off some notifications more than others?''' Are there significant differences between new and current users? Tool: Preferences

'''4. How many notifications are being viewed on the web?''' What's the breakdown between flyout and archive? Tool: Views

'''5. How many notifications are being clicked on?''' What's the click-through rate? Are some notifications receiving more clicks per view than others? Tools: Clicks / Views. Blocker: Click logging (E3 says EventLogging can do this in early April)

'''6. Do notifications help people become more productive?''' Do they edit pages more often? Are their edits successful? Tool: Productivity cohort analysis (bucketing)

'''7. Are notifications useful to people who receive them?''' Tools: Satisfaction survey (new users) + project talk page (current users)

Future questions
These other research questions could be investigated once more tools become available to us, and if we can allocate the resources after the second release.

'''8. How many notifications are being viewed on email?''' What's the breakdown between single emails, daily or weekly digests? Tool: Views (HTML email only). Blocker: Legal is not comfortable with tracking email impressions at this time.

'''9. Are some notifications viewed more than others?''' Is that difference growing? Tools: Events, Preferences, Views. Blocker: need to normalize views for the same number of events and preferences; lower priority.

'''10. How many people took a follow-up action after viewing a notification?''' (e.g. editing an article, replying on a talk page) Tool: Funnel analysis of productivity by cohort. Blocker: this is really hard to measure without a lot of instrumentation; lower priority, and much of it is beyond the scope of Echo.

First Dashboards
Here are examples of dashboards we would like to produce for the first release of Echo on en-wiki the week of April 8th.

Events Dashboard
This set of volume dashboards would show how many notifications are generated every day, by category and group. This would measure the number of events that triggered a notification, using server-side data.

Purpose: To provide a general sense of notification event volume by category, over time. We also want to determine whether specific classes of users receive too many notifications.

Proposed dashboards: For the first release, all dashboards will share the same basic graphic display: a line graph with time on the horizontal axis (one point per day) and events on the vertical axis (as shown in these dashboards for Page Curation). Events for different categories or groups will be represented with different color lines, so we can easily tell them apart.

Events by Category
 * talkpage messages
 * page reviews
 * page links
 * mentions
 * thanks
 * edit reverts
 * welcome
 * get started
 * user rights

Events by Activity Group
 * interactive
 * positive
 * neutral
 * negative

Events by User Type
 * new users
 * current users

Notes:
 * This schema specifies what data needs to be logged
 * We implemented the corresponding hooks on MediaWiki.org
 * We have deployed data collection via EventLogging on MediaWiki.org
 * See (this Trello ticket) for more info
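To make the Events dashboard concrete, here is a minimal sketch of the daily aggregation it implies: one logged record per generated notification, counted per day and category. The field names and category values are illustrative assumptions, not the actual Echo schema.

```python
from collections import Counter
from datetime import date

# Hypothetical server-side event records, one per generated notification.
# Field names ("day", "category", "user_type") are illustrative only.
events = [
    {"day": date(2013, 4, 8), "category": "talkpage-message", "user_type": "new"},
    {"day": date(2013, 4, 8), "category": "page-review", "user_type": "current"},
    {"day": date(2013, 4, 8), "category": "talkpage-message", "user_type": "new"},
    {"day": date(2013, 4, 9), "category": "edit-revert", "user_type": "current"},
]

# Daily event counts by category: each (day, category) pair is one data
# point on one colored line of the dashboard.
daily_by_category = Counter((e["day"], e["category"]) for e in events)

print(daily_by_category[(date(2013, 4, 8), "talkpage-message")])  # 2
```

The same `Counter` keyed on `(day, user_type)` or an activity group would drive the "by Activity Group" and "by User Type" variants.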

Preferences Dashboard
This set of dashboards would show the number of users who have set their preferences at any given time, by type and category. It would also measure changes in preferences, for either web or email notifications. This would be based on user preferences, using daily server-side data.

Purpose: To track how many users have web and/or email preferences enabled, and how many are disabling preferences. To also show how notifications preferences are changing over time.

Proposed dashboards: For the first release, all dashboards will share the same basic graphic display: a line graph with time on the horizontal axis (one point per day) and preferences on the vertical axis (as shown in these dashboards for Page Curation). Preferences for different categories or groups will be represented with different color lines, so we can easily tell them apart.

Web Preferences by Category
 * Talk message
 * Thanks
 * Mention
 * Page link
 * Page review
 * Edit revert

Email Preferences by Category
 * Talk message
 * Thanks
 * Mention
 * Page link
 * Page review
 * Edit revert

Email Preferences by Frequency
 * No email notifications
 * Individual notifications
 * Daily digest
 * Weekly digest

Web Preferences by Type
 * Badge enabled
 * Badge disabled

Notes: 
 * Data for these dashboards needs to be cached on a daily basis with a cron job.
 * Control for users with authenticated email addresses
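A daily cron job could cache these counts along the following lines. This is a sketch under assumed preference keys (the real Echo preference names will differ); it also illustrates the note above about controlling for authenticated email addresses.

```python
# Hypothetical daily preference snapshot, as a cron job might cache it.
# Preference keys below are illustrative, not the actual Echo names.
users = [
    {"id": 1, "echo-email-talk": True,  "echo-web-talk": True,  "email-authenticated": True},
    {"id": 2, "echo-email-talk": False, "echo-web-talk": True,  "email-authenticated": False},
    {"id": 3, "echo-email-talk": True,  "echo-web-talk": False, "email-authenticated": True},
]

def snapshot(users, pref):
    """Count users with a given preference enabled (one data point per day)."""
    return sum(1 for u in users if u.get(pref))

web_talk = snapshot(users, "echo-web-talk")

# Control for authenticated email addresses: only count users who can
# actually receive email when reporting email preferences.
email_talk = snapshot(
    [u for u in users if u["email-authenticated"]], "echo-email-talk"
)
```

Storing one such snapshot row per day gives the time series the line graphs need, without re-scanning the preferences table at render time.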

Views Dashboard
This set of dashboards will show how often notifications are viewed on the web, by category, source type and user group. This would be based on total impressions on the web flyout and the archive page, using a mix of client-side and server-side data, on a daily basis.

Purpose: To provide a general sense of how many notifications are viewed over time, and whether or not some are viewed more than others.

Proposed dashboards: For the first release, all dashboards will share the same basic graphic display: a line graph with time on the horizontal axis (one point per day) and views on the vertical axis (as shown in these dashboards for Page Curation). Views for different categories or groups will be represented with different color lines, so we can easily tell them apart.

Views by Category
 * Talk message
 * Thanks
 * Mention
 * Page link
 * Page review
 * Edit revert
 * Welcome
 * Get Started
 * User rights

Views by Source (cannot track emails at this time, for legal reasons)
 * Flyout
 * Archive

Views by User Type
 * new users
 * current users

Views by User Group (this one is lower priority)
 * new members
 * new editors
 * active editors
 * very active editors
 * inactive users

Notes:
 * We expect to use EventLogging to track these views.
 * Need to rewrite this initial draft of Schema:EchoInteraction.
 * Cannot measure email views, for legal reasons, because they conflict with our privacy policy. So this is web-only for now.
 * Technically, it is possible to track views of HTML email notifications, but this requires a server endpoint to record the views.
 * We also hear that OTRS has support for email impressions, and will investigate.
 * It would be helpful to know how many users click past the first archive page, if that's easy to do.
 * We can track views of the flyout by tracking clicks on the badge.
 * We can also track views of the archive page on page load.
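The view-counting convention for the two web sources (a flyout view is a badge click, an archive view is an archive page load) can be sketched as a simple aggregation over client-side log events. Event names here are placeholders, not the real EchoInteraction schema.

```python
from collections import Counter

# Hypothetical client-side log events. A flyout "view" is recorded when
# the badge is clicked; an archive "view" when the archive page loads.
log = [
    {"event": "badge-click"},   # counts as one flyout view
    {"event": "archive-load"},  # counts as one archive view
    {"event": "badge-click"},
]

# Map raw events to the two dashboard sources.
SOURCE = {"badge-click": "flyout", "archive-load": "archive"}
views_by_source = Counter(SOURCE[e["event"]] for e in log)

print(views_by_source)  # Counter({'flyout': 2, 'archive': 1})
```

Note that a badge click counts the flyout as viewed even if some notifications in it were below the fold, which is one of the limitations mentioned under the Views tool.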

Clicks Dashboards
This set of dashboards will show how often notifications are being clicked on by users, by category and source type. It would also be combined with Views to display Clickthrough Rates, to measure the effectiveness of notifications. This would be based on total clicks on the web flyout and the archive page, divided by the number of views, client-side.

Purpose: To provide a general sense of how many notifications are clicked on over time, and whether or not some are clicked on more than others.

Proposed dashboards: For the first release, all dashboards will share the same basic graphic display: a line graph with time on the horizontal axis (one point per day) and clicks on the vertical axis (as shown in these dashboards for Page Curation). Clicks for different categories or groups will be represented with different color lines, so we can easily tell them apart.

Clicks by Category
 * Talk message
 * Thanks
 * Mention
 * Page link
 * Page review
 * Edit revert
 * Welcome
 * Get Started
 * User rights

Also provide the same dashboard for the 'Clickthrough Rate by Category' (dividing clicks by views).

Clicks by Source
 * Flyout
 * Archive

Also provide the same dashboard for the 'Clickthrough Rate by Source' (dividing clicks by views).

Clicks by User Type
 * new users
 * current users

Also provide the same dashboard for the 'Clickthrough Rate by User Type' (dividing clicks by views).

Clicks by User Group (this one is lower priority)
 * new members
 * new editors
 * active editors
 * very active editors
 * inactive users

Also provide the same dashboard for the 'Clickthrough Rate by User Group' (dividing clicks by views).
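Each of the clickthrough-rate dashboards above uses the same calculation: clicks divided by views for the matching category, source, or group. A minimal sketch, with a guard for days with zero views:

```python
def clickthrough_rate(clicks, views):
    """Clicks divided by views; undefined (None) when there are no views."""
    return clicks / views if views else None

# Example: 150 flyout clicks over 1,200 flyout views is a 12.5% CTR.
print(clickthrough_rate(150, 1200))  # 0.125
```

The same function applies per category, per source, per user type, and per user group; only the click and view tallies fed into it change.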

Notes:
 * Need to rewrite this initial draft of Schema:EchoInteraction.
 * An improved click logging tool may not be available from E3 until mid-April (EventLogging extension).
 * Cannot measure email clicks, which are excluded for now, for legal reasons.

Productivity Cohort Study
This first study will show whether new users who receive Echo notifications become more productive than new users who don't have Echo. This would be based on total edits (or if time allows, number of edits that were not reverted within a week).

Purpose: This would let us determine whether Echo is helping users who receive notifications become more engaged than users who don't.

Cohorts: For this one-week study, we would bucket all new users into two cohorts: a main study group with notifications turned on, and a smaller control group with Echo notifications turned off completely (but with current talkpage and watchlist notifications enabled). We will collect and compare the total number of productive actions taken by users in each cohort over that week (e.g. successful edits), using a mix of client-side and server-side metrics.

Notes: 
 * This study will require bucketing and some instrumentation on edit pages, but can be done in our time-frame.
 * We propose to start it a few weeks after the first release, once the code and features are stable.
 * We need to specify our bucketing strategy (e.g. all new users with an odd user_id)
 * Determine what control group we can and want to set up for each type of notification and determine the dependencies
 * What do we need to disable? (set prefs, prevent writing of notifications to the talk page)
 * See also (this Trello ticket) for more info.
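The bucketing strategy still needs to be specified, but the example given in the notes (all new users with an odd user_id) would look like this. This is a sketch of that one proposal, not a settled design.

```python
def bucket(user_id):
    """Assign a new user to a cohort by user_id parity.

    Proposed (not final) strategy from the notes above: odd ids go to the
    study group (notifications on), even ids to the control group.
    """
    return "study" if user_id % 2 == 1 else "control"

# Consecutive registrations alternate between cohorts, giving a roughly
# even split without storing any extra per-user state.
assert bucket(12345) == "study"
assert bucket(12346) == "control"
```

Parity bucketing is deterministic, so a user always lands in the same cohort; its drawback is a fixed 50/50 split, so a smaller control group would need a different modulus.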

Usefulness Survey
This measurement of customer satisfaction could be based on a simple survey shown to users for about a month after the first release, asking them how useful the tool is for them.

A link to the survey would appear in the notification emails and on the archive page (e.g. 'Share your feedback'). Clicking on it would display a small popup window with a survey form powered by SurveyMonkey, as we have done for other tools.

How useful are these notifications to you?
 * Very useful
 * Useful
 * Neutral
 * Not very useful
 * Not useful at all

We would then use SurveyMonkey's built-in filtering capabilities to display the perceived usefulness of this tool for all users, as well as for new vs. current users.

First release
Here are some of the metrics and research tools available to us for the first release, as shown in this diagram:

Events
This metric tool can answer research questions such as: 'How many notifications are being generated every day?'. It counts the number of events that trigger notifications, server-side.

Status: This 'notification generation' tool is already in place for Echo, but doesn't have any dashboards.

Preferences
This metric tool can answer research questions such as: 'How many users have notifications turned on?'. It counts the number of users who have enabled or disabled notifications, server-side.

Status: This 'preference status' tool can be easily developed for Echo, but doesn't have any dashboards.

Views
This metric tool can answer research questions such as: 'How many notifications are being viewed?'. It counts the number of impressions on the web flyout or archive (and eventually on HTML emails), client-side.

Status: This would use the EventLogging tool and may be developed for Echo's first release, but will require some instrumentation and may have some limitations (e.g. can it identify which notifications were above the fold in the flyout?). Note that our current privacy policy may prevent us from collecting impressions on HTML emails.

Clicks
This metric tool can answer research questions such as: 'How many notifications are being clicked on?'. It counts the number of clicks on the web flyout or archive, client-side. It can also be combined with Views to provide Click-through Rates, which can help measure the effectiveness of notifications.

Status: The current EventLogging tool is expected to track clicks reliably by mid-April, so we can use this tool soon after the first release. The E3 team has been kind enough to make this a high priority for the next version of EventLogging.

Cohort Study
This one-time cohort study will answer research questions such as: 'Do notifications help new users become more productive? (e.g. editing pages more often, getting fewer edits reverted)'. For this one-week study, we propose to bucket all new users into two cohorts: a main study group with notifications turned on, and a smaller control group with Echo notifications turned off completely (but with current talkpage notifications kept enabled). We will collect and compare the total number of productive actions taken by users in each cohort over that week, using a mix of client-side and server-side metrics.

Status: This study will require bucketing and some instrumentation on edit pages, but can be done in our time-frame. We propose to start it a few weeks after the first release, once the code and features are stable.

Priorities: At a later date, we may want to do other cohort studies, such as a funnel analysis to measure which follow-up actions were taken (post on talk page, edit article), for each notification category and each user group. But this type of study is time consuming and requires a lot of instrumentation -- and we consider it to be lower priority, as much of this is beyond the scope of Echo. For now, our first priority is comparing the productivity of these two new user groups: with notifications vs. no notifications.

Survey
This simple research tool can answer questions such as: 'How useful are notifications for our readers?'. It uses a simple survey to ask users if they find the tool useful, and can be done using the SurveyMonkey tool, with a link in the archive page (and/or a special notification asking them to take a survey). This provides both quantitative and qualitative feedback on how our users perceive this tool.

Status: This 'customer satisfaction' tool can be easily integrated in Echo with a link inviting users to take a quick survey -- and SurveyMonkey provides useful dashboards of live survey results, with a range of filters and other tools.

Target Pages
This method would answer research questions such as: 'How many people viewed the target page after clicking on a notification link? (e.g. article or user page)'. It would count the number of impressions generated by notifications on their landing pages, using a mix of client-side and server-side metrics.

Status: This would require a lot of instrumentation, because there are so many different target pages. We have considered using URL parameters in the notification links, but this would not be well received by our users and would require extensive back-end processing of huge log files -- and this would be the first time that this method is used by the product group (though the fundraising group has used it in the past). We view this as low priority for now.

Follow-up Actions
This method would answer research questions such as: 'How many people took a follow-up action after viewing a notification? (e.g. editing a page)'. It counts the number of successful actions taken by notification users after viewing their landing page, using a mix of client-side and server-side metrics. See these examples of follow-up actions (e.g. visit the diff, leave a message, or complete a first talk page edit within 24 hours of registration).

Status: These actions vary from one notification type to another and are difficult to measure, requiring a lot of instrumentation, so we cannot count on using this method for the first release. It may be possible to do a cohort analysis to measure these results at a later date. We view this as low priority for now.

Future research
Here are some other metrics tools which we may want to consider in the future. (Some of them may be redundant with some of the first dashboards above).

Specific dashboards
We would like some specific dashboards for each notification type, so that we (or developers using our API) can track their effectiveness.

Individual user dashboards
We would like a way to track some individual user dashboards, so we can observe how typical users use this tool.

User type dashboards
It would also be useful to look at an average by user type (e.g. new user vs. current user), along different vectors. For example, we could provide a matrix table showing how many posts, views and clicks were generated by each user group across a range of sources.

Preferences
Any of these questions could be investigated at a later date, once we've deployed the first release. Here are some examples.

All preferences
 * How many web and email notification types are enabled per user, on average?
 * How many users disabled both web and email notifications altogether this week?

Web preferences
 * How many web notification types are enabled per user, on average?
 * How many users disabled web notifications altogether this week?

Email Preferences
 * How many email notification types are enabled per user, on average?
 * How many users disabled email notifications altogether this week?
 * How many users switched to daily digest? weekly digest?
 * How many emails are being sent in a week to new/active/very active users?
 * For users who disabled or reduced their email settings, how many emails had been sent to them in the past day/week?

Emails sent
How many notifications are sent by our mail servers every day?
 * with single emails
 * with daily digests
 * with weekly digests

Users
How many unique users are clicking on their notifications:
 * every day
 * once a week
 * once a month
 * rarely or never

(We may be able to analyze this using the click data.)

Actions
How many users go on to take these follow-up actions?
 * make an article edit (ns0) -- this is the most important action
 * start a page
 * post a message
 * post feedback
 * upload a file
 * thank another user
 * review a page
 * other contributions
 * do nothing

Notes:
 * We need to define the scope of a "follow-up" action.
 * It is hard to determine which actions take place as a direct result of a notification.
 * It's quite difficult to track follow-up actions, especially when they involve multiple sub-actions. For example, if a user posts on my talk page about an article edit, I get the notification, click the link in the flyout, get redirected to the talk page, click the article link there, then click the edit link to start editing. This chain involves many intermediate actions, any of which can also be initiated from other places.

Usage
Which notification types are used the most? the least?
 * Welcome
 * Talk page message
 * Started page message
 * Wikilove
 * etc.

Notes:
 * The purpose of this request is to determine whether any of these notifications should be eliminated due to insufficient usage.
 * To normalize the data, we may want to focus on the clickthrough-to-frequency ratio (e.g. the number of clicks divided by the number of notifications).

Related documents
Here are some useful links to related information:
 * Echo Metrics - Generating Notifications (Trello)
 * Echo Metrics - Interacting with Notifications (Trello)
 * Schema for tracking the generation of notifications (Meta)
 * Schema for tracking interactions with notifications (Meta)
 * Tables for tracking interactions with notifications (Google)
 * Facebook Click-To-Impression Ratio Info

We will keep updating this section as our metrics plan develops.