Flow/Analytics

Goals:
 * Determine engagement with Flow boards (a Trello card).
   * We'll do this by running queries against the Flow DB.
   * We probably want to compare with regular talk pages.
 * Measure how people use the UI (Trello card plus others).
   * We'll do this using EventLogging to log m:Schema:FlowReplies and action events.
   * This also involves qualitative User Research.

Determining engagement
Flow can report metrics such as the number of new topics and the average number of replies per topic, because each new topic and each reply is a separate DB update.

We'll probably want to compare with regular talk pages. Wikimetrics can show edit metrics for regular talk pages, but only for a cohort (a defined group of users).
 * Wikimetrics page edits aren't currently talk-aware. Determining similar metrics (New topics, count replies, etc.) for regular user talk pages is harder. Echo has a   that can help, but it's intensive, parsing each revision.
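
As a rough illustration of the two Flow metrics mentioned above, here is a minimal Python sketch. The row layout (topic id, kind, day) is an assumption made up for the example; the real Flow DB schema will differ, and the actual query would run against the DB rather than an in-memory list.

```python
from collections import Counter
from statistics import mean

# Hypothetical rows: (topic_id, kind, day). Not the real Flow schema.
posts = [
    ("t1", "topic", "2014-01-01"),
    ("t1", "reply", "2014-01-01"),
    ("t1", "reply", "2014-01-02"),
    ("t2", "topic", "2014-01-02"),
]


def new_topics_per_day(rows):
    """Count rows of kind 'topic' for each day."""
    return Counter(day for _, kind, day in rows if kind == "topic")


def average_replies_per_topic(rows):
    """Mean number of replies per topic; topics with no replies count as 0."""
    replies = {}
    for topic_id, kind, _ in rows:
        if kind == "topic":
            replies.setdefault(topic_id, 0)
        else:
            replies[topic_id] = replies.get(topic_id, 0) + 1
    return mean(replies.values()) if replies else 0.0


print(new_topics_per_day(posts))
print(average_replies_per_topic(posts))
```

Doing the same for regular talk pages is the hard part, since topics and replies there have to be inferred from revision text.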

Cohort
Typical Wikimetrics use involves identifying a cohort ("People who signed up at our Editathon") and then tracking their page edit success.

Flow doesn't have obvious cohorts to compare; we could just pick a bunch of newly registered users. Danny has manually counted regular talk page edits vs. Flow board edits.

Implementation
http://flow-reportcard.wmflabs.org/ runs on limn1.eqiad.wmflabs

Dan Andreescu set up the analytics/limn-flow-data repository (see its gerrit patches), based on mobile's repo.
 * This commit deploys the Flow metadata to limn1
 * reportcard.json defines our default dashboard.
 * sets up a cron job to generate Flow statistics

Now the Flow team "only" needs to:
 * commit Python query scripts, based on mobile's, to the flow directory of our limn-flow-data repo
 * update reportcard.json to reference their output.
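
A query script of this kind mostly boils down to "run a query, write a dated CSV". The sketch below shows only the CSV half, and it is a guess at the shape rather than a copy of mobile's scripts: the function name `write_daily_csv` and the column names are hypothetical, and the date-first column layout is an assumption about what the dashboard expects.

```python
import csv
import io


def write_daily_csv(daily_counts, out):
    """Write one row per day, oldest first, with a header row.

    daily_counts maps an ISO date string to a dict with the
    (hypothetical) keys 'new_topics' and 'replies'.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date", "new_topics", "replies"])
    for day in sorted(daily_counts):
        counts = daily_counts[day]
        writer.writerow([day, counts["new_topics"], counts["replies"]])


# Example with made-up numbers; a real script would fill this from the Flow DB.
sample = {
    "2014-01-02": {"new_topics": 1, "replies": 0},
    "2014-01-01": {"new_topics": 3, "replies": 5},
}
buffer = io.StringIO()
write_daily_csv(sample, buffer)
print(buffer.getvalue())
```

The cron job would then run such a script periodically and leave the CSV where reportcard.json can reference it.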

How to get info to a dashboard?

 * Limn for now

The Mobile and Multimedia teams have automated this. Each has a labs server ( http://mobile-reportcard.wmflabs.org and http://multimedia-metrics.wmflabs.org ) that runs cron jobs and generates Limn graphs.

Multimedia team also has server-side graphing in Ganglia.

Dan Andreescu will tell us where the code is, how these teams do it, etc.

Example: Echo dashboard
http://ee-dashboard.wmflabs.org/dashboards/enwiki-features has Echo, AFT, Page curation, and WikiLove stats. (An interesting one is Echo views by category.) All dashboards are actually puppetized web hosts on a limn1 server.
 * EE Dashboard has some info about setting this up
 * enwiki-features dashboard definition has multiple graph_ids including "enwiki_echo_all"
 * enwiki_echo_all datasource definition points to URL
 * http://datasets.wikimedia.org/public-datasets/enwiki/echo/echo_all.csv
 * which is on Datasets.wikimedia.org
 * which is stat1001 where I think we can run cron jobs to create datasets, or possibly stat1003.
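
The chain above (dashboard, then graph id, then datasource, then CSV URL) can be pictured as three lookups. The dictionaries below are a deliberately simplified stand-in and not the real Limn JSON format; only the ids and the URL come from the notes above, and the key names are assumptions.

```python
# Simplified, hypothetical stand-ins for the dashboard, graph and
# datasource definition files. Real Limn definitions have more fields.
dashboards = {"enwiki-features": {"graph_ids": ["enwiki_echo_all"]}}
graphs = {"enwiki_echo_all": {"datasource": "enwiki_echo_all"}}
datasources = {
    "enwiki_echo_all": {
        "url": "http://datasets.wikimedia.org/public-datasets/enwiki/echo/echo_all.csv"
    }
}


def csv_urls_for(dashboard_id):
    """Follow dashboard -> graph -> datasource and collect the CSV URLs."""
    urls = []
    for graph_id in dashboards[dashboard_id]["graph_ids"]:
        datasource_id = graphs[graph_id]["datasource"]
        urls.append(datasources[datasource_id]["url"])
    return urls


print(csv_urls_for("enwiki-features"))
```

So for Flow, the only moving part we own is whatever produces the CSV at the end of the chain.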

(Note "ee-dashboard" sounds like a labs machine for the Editor engagement team (what the Flow team used to be called), but is actually editor engagement research (User:DarTar).)

Privacy: not too much data, not too long, not too personal

 * don't store data for long periods.
 * don't store personally identifiable information.
 * don't log for every single user

Note that Echo does all this for logged-in users who click on the Echo [NN] red badge.
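
The three privacy rules can be sketched in a few lines of Python. Everything concrete here is an assumption for illustration: the 10% sample rate, the 90-day retention window, and the field names treated as personal are placeholders, not agreed values.

```python
import random
from datetime import datetime, timedelta

SAMPLE_RATE = 0.1                 # assumption: log roughly 1 in 10 sessions
RETENTION = timedelta(days=90)    # assumption: keep events for 90 days
PII_FIELDS = {"user_name", "ip", "user_agent"}  # hypothetical field names


def should_log(rng=random):
    """Not every user: log only a random sample."""
    return rng.random() < SAMPLE_RATE


def scrub(event):
    """Not too personal: drop fields we treat as identifying."""
    return {key: value for key, value in event.items() if key not in PII_FIELDS}


def purge_old(events, now):
    """Not too long: keep only events inside the retention window."""
    return [event for event in events if now - event["timestamp"] <= RETENTION]


event = {"action": "reply", "ip": "192.0.2.1", "timestamp": datetime(2014, 5, 1)}
print(scrub(event))
print(purge_old([event], now=datetime(2014, 6, 1)))
```

Sampling per session rather than per event would keep a sampled user's actions together; which of the two we want is an open question.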

Next steps
Talk to Dan Andreescu

Make sure we define what success is
For comparison, Analytics has developed a well-defined funnel for "editor success": user registers, user edits successfully, and user sticks around.

Possible model

 * how many people visit a talk page
 * and never try again
 * or try to edit
 * or add a new topic/section
 * and "get their answer"
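
To make the possible model concrete, here is a minimal sketch that counts visitors at each stage. The event names are hypothetical, and "got_answer" in particular is a placeholder: we have not defined how to detect that someone got their answer.

```python
def funnel_counts(visitor_events):
    """Count visitors per funnel stage.

    visitor_events maps a visitor id to the set of (hypothetical)
    event names seen for that visitor.
    """
    counts = {
        "visited": 0,
        "no_further_action": 0,
        "tried_edit": 0,
        "added_topic": 0,
        "got_answer": 0,
    }
    for events in visitor_events.values():
        if "visit" not in events:
            continue  # the funnel starts with a talk page visit
        counts["visited"] += 1
        if "edit_attempt" in events:
            counts["tried_edit"] += 1
        if "new_topic" in events:
            counts["added_topic"] += 1
        if not ({"edit_attempt", "new_topic"} & events):
            counts["no_further_action"] += 1
        if "got_answer" in events:
            counts["got_answer"] += 1
    return counts


sample_visitors = {
    "a": {"visit"},
    "b": {"visit", "edit_attempt"},
    "c": {"visit", "new_topic", "got_answer"},
}
print(funnel_counts(sample_visitors))
```

Running the same counts for Flow boards and for regular talk pages would give us the comparison we are after.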

UI event logging
We understand this pretty well.

Can Extension:Flow simply require EventLogging, or can it be decoupled through a " " interface? (See how VisualEditor decouples.)