Flow/Analytics

From mediawiki.org

Goals:

  • determine engagement with Flow boards (a Trello card)
    • We'll do this by running queries against the Flow DB
    • Probably want to compare with regular talk pages.
  • measure how people use the UI (Trello card plus others)

Determining engagement

Flow can determine metrics such as the number of new topics and the average number of replies per topic, because each of these is a separate DB update.

We'll probably want to compare with regular talk pages. Wikimetrics can show edit metrics for regular talk pages, but it's only for a cohort, a defined group of users.

  • Wikimetrics page edits aren't currently talk-aware. Determining similar metrics (new topics, reply counts, etc.) for regular user talk pages is harder. Echo has a DiscussionParser that can help, but it is expensive because it parses each revision.
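
As a rough illustration of why separate DB updates make these metrics cheap to compute, here is a minimal sketch using an in-memory SQLite table. The `post` table and its `is_reply` column are hypothetical stand-ins, not Flow's actual schema; a real query would run against the Flow DB.

```python
import sqlite3

# Hypothetical, simplified schema: one row per post. is_reply = 0 marks a
# topic starter, is_reply = 1 marks a reply. Flow's real tables differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post (id INTEGER PRIMARY KEY, topic_id INTEGER, is_reply INTEGER)"
)
rows = [
    (1, 1, 0), (2, 1, 1), (3, 1, 1),  # topic 1 with two replies
    (4, 2, 0),                        # topic 2 with no replies
    (5, 3, 0), (6, 3, 1),             # topic 3 with one reply
]
conn.executemany("INSERT INTO post VALUES (?, ?, ?)", rows)


def engagement(conn):
    """Return the new-topic count and the average number of replies per topic."""
    new_topics, replies = conn.execute(
        "SELECT SUM(is_reply = 0), SUM(is_reply = 1) FROM post"
    ).fetchone()
    new_topics = new_topics or 0  # SUM over an empty table yields NULL
    replies = replies or 0
    avg = replies / new_topics if new_topics else 0.0
    return {"new_topics": new_topics, "avg_replies_per_topic": avg}


print(engagement(conn))
```

No comparable single query exists for regular talk pages, where topics and replies first have to be recovered by parsing revisions.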

Cohort

Typical Wikimetrics use involves identifying a cohort ("People who signed up at our Editathon") and then tracking their page edit success.

Flow doesn't have obvious cohorts to compare, so we could just pick a group of newly registered users. Danny has manually counted regular talk page edits vs. Flow board edits.

Implementation

http://flow-reportcard.wmflabs.org/ runs on the front-end web server limn1.eqiad.wmflabs

Dan Andreescu set up the analytics/limn-flow-data repository (see its gerrit patches), based on mobile's repo.

The Flow analytics repository is regularly checked out on the stats back-end machine stat1003 to /a/limn-flow-data. The generate.py cron job produces little log output; what there is appears in /var/log/limn-data/limn-flow-data.log

To generate new data, the Flow team "only" needs to

  • commit Python query scripts, modeled on mobile's, to the flow directory of our limn-flow-data repo
  • and update reportcard.json to reference their output.

Deploying new front-end code

Deploying new code on the front-end is a separate process. You need to check out https://github.com/wikimedia/limn-deploy locally. limn-deploy uses Fabric (www.fabfile.org) to execute commands remotely on limn1 via ssh, so you need to be able to ssh to limn1.eqiad.wmflabs. It has "stages" for deployment, and flow is one of them, thus:

$ cd your/git/analytics
$ git clone https://github.com/wikimedia/limn-deploy
$ cd limn-deploy
$ sudo pip install -e .
$ fab -l  # lists available stages and commands

Then to push changes to the Flow analytics front-end:

$ fab flow deploy.only_data

How to get info to a dashboard?

  • Limn for now

The mobile and multimedia teams have automated this. Each has a labs server (http://mobile-reportcard.wmflabs.org and http://multimedia-metrics.wmflabs.org) that runs cron jobs and generates Limn graphs.

Multimedia team also has server-side graphing in Ganglia.

Dan Andreescu will tell us where the code is, how these teams do it, etc.

Example: Echo dashboard

http://ee-dashboard.wmflabs.org/dashboards/enwiki-features has Echo, AFT, Page curation, and WikiLove stats. (An interesting one is Echo views by category.) All dashboards are actually puppetized web hosts on a limn1 server.

(Note: "ee-dashboard" sounds like a labs machine for the Editor engagement team, as the Flow team used to be called, but it is actually for editor engagement research (User:DarTar).)

Privacy: not too much data, not too long, not too personal

  • don't store data for long periods.
  • don't store personally identifiable information.
  • don't log for every single user

Note that Echo does all this for logged-in users who click on the Echo [NN] red badge.
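
The three guidelines above could be sketched roughly as follows. This is a minimal illustration, not how Echo or EventLogging implements them; the sample rate, the field whitelist, the retention window, and every function name are hypothetical.

```python
import hashlib

SAMPLE_RATE = 0.1                         # hypothetical: log about 1 in 10 sessions
ALLOWED_FIELDS = {"action", "timestamp"}  # hypothetical whitelist, no names or IPs
RETENTION_SECONDS = 90 * 24 * 3600        # hypothetical retention window


def in_sample(session_token, rate=SAMPLE_RATE):
    """Deterministically select a fraction of sessions from a hash of the token."""
    digest = hashlib.sha256(session_token.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    return bucket < rate


def scrub(event):
    """Keep only whitelisted, non-identifying fields."""
    return {key: value for key, value in event.items() if key in ALLOWED_FIELDS}


def maybe_log(event, session_token, sink):
    """Append a scrubbed event to sink, but only for sampled sessions."""
    if in_sample(session_token):
        sink.append(scrub(event))


def prune(events, now):
    """Drop events older than the retention window."""
    return [e for e in events if now - e["timestamp"] <= RETENTION_SECONDS]
```

Hashing the session token rather than drawing a random number per event keeps each session either fully in or fully out of the sample, so a sampled session's actions can still be read as a sequence.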

Next steps

Talk to Dan Andreescu

Make sure we define what success is

For comparison, Analytics has developed a well-defined funnel for "editor success": user registers, user edits successfully, and user sticks around.

Possible model

  • how many people visit a talk page
    • and never try again
    • or try to edit
    • or add a new topic/section
      • and "get their answer"
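
The model above can be sketched as a simple funnel count. This is a minimal sketch under assumptions: the stage names and the (visitor, stage) event format are hypothetical, and counting "never try again" as visitors who took no further action is one possible interpretation.

```python
from collections import defaultdict

# Hypothetical stage names mirroring the bullets above.
STAGES = ["visit", "edit_attempt", "new_topic", "got_answer"]


def funnel(events):
    """Count distinct visitors per stage from (visitor_id, stage) pairs."""
    seen = defaultdict(set)
    for visitor, stage in events:
        seen[stage].add(visitor)
    counts = {stage: len(seen[stage]) for stage in STAGES}
    # Visitors who viewed the page but neither tried to edit nor added a topic.
    counts["visit_only"] = len(
        seen["visit"] - seen["edit_attempt"] - seen["new_topic"]
    )
    return counts


events = [
    ("a", "visit"),
    ("b", "visit"), ("b", "edit_attempt"),
    ("c", "visit"), ("c", "new_topic"), ("c", "got_answer"),
]
print(funnel(events))
```

In this toy data, three people visit, one leaves without acting, one tries to edit, and one adds a topic and gets an answer.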


UI event logging

We understand this pretty well.

Can Extension:Flow simply require EventLogging, or can it be decoupled through a "track" interface? (See how VisualEditor decouples.)

See also