Talk:Flow/Analytics

Talk with Dan Andreescu 2014-11-05

There's one dashboard server in Labs, limn1; it's easy to add another site to it.

The Mobile Web team's report card is more automated than the editor-engagement hacks described on Flow/Analytics. Their repo, analytics-limn-mobile-data, defines both the data generation and the report card web site presentation. We'll clone this for Flow. Some key files:

  • config.yaml names the graphs and their SQL file
    • If we just point to a CSV it's easy; if we want to tweak things we have to point to a datasource.
  • edits-monthly-new-active.sql uses Jinja templating so the query is parameterized
  • generate.py is run by a cron job on stat1003 to actually generate the stats.
    • the SQL queries run against
      • databases hosted on analytics-store.eqiad.wmnet (replicated DB stuff, not just EventLogging but e.g. enwiki revision tables)
      • databases hosted on x1-analytics-slave
    • We need to make sure that Flow's special DB cluster, extension1 with flowdb on it, is also accessible here.
  • operations/puppet has limn config in modules/limn and manifests/misc/limn.pp
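As a hypothetical illustration of the Jinja-parameterized SQL mentioned above (the template text, variable names, and dates below are made up for illustration, not copied from edits-monthly-new-active.sql):

```python
# Hypothetical sketch of how a Jinja-templated query is parameterized
# before it runs; all names and values here are invented.
from jinja2 import Template

sql_template = """\
SELECT COUNT(*) AS edits
FROM {{ wiki }}.revision
WHERE rev_timestamp >= '{{ start }}'
  AND rev_timestamp < '{{ end }}'
"""

# Render the template into a concrete query string.
query = Template(sql_template).render(
    wiki="enwiki",
    start="20141001000000",
    end="20141101000000",
)
print(query)
```

The rendered string is plain SQL, so the same template file can drive the query for any wiki or date range the cron job plugs in.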

In development

There's no way to replicate the whole Limn setup locally. It's annoying: just run the SQL on stat1003 and, if it works, commit it to the flow-analytics repo and hope.

Working on stat1003

Set up access to stat1003 (through bast1001.wikimedia.org). For MySQL, borrow ~milimetric/.my.cnf.one-box:

$ ssh stat1003.wikimedia.org
$ mysql --defaults-file=/home/milimetric/.my.cnf.one-box
mysql:research@analytics-store.eqiad.wmnet [(none)]> show databases;

This has replicas of all the wiki databases, like enwiki (but not flowdb yet). Also, the log database has all the EventLogging tables, named SchemaName_revision after the schema pages on metawiki.

  • we'll need to add flowdb here; see To do below.
mysql:research@analytics-store.eqiad.wmnet [(none)]> use log
Database changed
mysql:research@analytics-store.eqiad.wmnet [log]> show tables like 'echo%';
+-------------------------+
| Tables_in_log (echo%)   |
+-------------------------+
| EchoInteraction_5539940 |
| EchoInteraction_5782287 |
| EchoMail_5467650        |
| EchoPrefUpdate_5488876  |
| Echo_5285750            |
| Echo_5364744            |
| Echo_5423520            |
| Echo_6081131            |
| Echo_7572295            |
| Echo_7731316            |
+-------------------------+
10 rows in set (0.00 sec)
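The table names above follow the EventLogging SchemaName_revision convention. A small stdlib-only sketch of splitting a name into its schema and schema-revision parts:

```python
# Split an EventLogging table name of the form SchemaName_revision
# into (schema, revision). Pure stdlib; table names copied from above.
def split_el_table(name):
    schema, _, rev = name.rpartition("_")
    return schema, int(rev)

tables = [
    "EchoInteraction_5539940",
    "EchoMail_5467650",
    "Echo_5285750",
]
for t in tables:
    print(split_el_table(t))
```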

In production

  • make changes to our repo
  • +2 them
  • puppet runs, updates stat1003
  • next cron job should pick up the changes

Limn data generation

Limn data generation also runs on stat1003:

  • The repo code is checked out to /a/limn-mobile-data
  • /a/limn-mobile-data/generate.py creates files in /a/limn-public-data/mobile/datafiles
  • the Limn log is /var/log/limn-mobile-data.log (we aren't in the stats group, so we can't see it)
    • if anything goes wrong, bother Dan.
  • whatever's in /a/limn-public-data/ gets rsync'd to http://datasets.wikimedia.org/limn-public-data

So we see the change a few hours later.
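Piecing the list above together, here is a toy, stdlib-only stand-in for what the generation step does (not the actual generate.py, which reads config.yaml and queries analytics-store; here an in-memory SQLite database fakes the data):

```python
# Toy stand-in for generate.py: run a query and write the result set
# as a CSV datafile for Limn to graph. All data here is invented.
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edits (month TEXT, count INTEGER)")
conn.executemany("INSERT INTO edits VALUES (?, ?)",
                 [("2014-09", 120), ("2014-10", 150)])

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["month", "edits"])  # header row supplies the column labels
writer.writerows(conn.execute("SELECT month, count FROM edits ORDER BY month"))
datafile = out.getvalue()
print(datafile)
```

In production the same idea runs per graph defined in config.yaml, and the resulting files land under /a/limn-public-data/ for the rsync.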

Help

  • milimetric on #wikimedia-analytics (IRC); also mforns and nuria

To do

  • Dan will create a new repo cloned from the mobile analytics one, but with a flow folder in place of mobile.
    • So we think we'd put our query definitions and related files in here.
  • Dan will set up a separate repository for our Flow report card that points to our virtual host and runs Flow's generate.py from a cron job.
  • Dan will make a puppet change to set up the new flow-reportcard.wmflabs.org (a report card can have multiple dashboards). Limn1 has a
  • ErikB will give Dan the DB details for Flow: flowdb on extension1
  • everyone make an RT request for stat1003 access
  • Dan Andreescu will give us access to limn1.
    • Mattflaschen
    • spage
    • Add your labs login here (the thing in the wikitech instance)