Wikimedia Release Engineering Team/DataDataData Sync Up/2019-03-12

= 2019-03-12 =

Last time

 * Meeting logistics
 * Let's meet every two weeks
 * Antoine and Dan talked with Joseph from Analytics about getting help on this project
 * They want to showcase their tooling to other teams, and we want to store and analyze a large volume of data
 * This could be a good match. Let's set up a meeting or draft an email describing what we need.
 * Dan is documenting schemas to help work out what kinds of storage and queries we'll need to uncover trends and relationships in our data
 * https://drive.google.com/drive/u/0/folders/1iNPPC4U5PjLoaFuixNQKV0kWlNtWurn9
 * It would also be good to show Analytics the raw formats our data will arrive in
 * Jenkins: https://github.com/jenkinsci/statistics-gatherer-plugin
 * Gerrit: https://gerrit-review.googlesource.com/Documentation/cmd-stream-events.html
 * and https://gerrit-review.googlesource.com/Documentation/json.html#change
 * Do we need a project/board in Phab?
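To make the "raw formats" point concrete for Analytics, here is a minimal sketch of reading Gerrit's stream-events output. It assumes only what the linked Gerrit documentation describes: one JSON object per line, each with a "type" field such as "patchset-created" or "change-merged". The function name count_event_types is hypothetical, and the input would be captured from something like `ssh -p 29418 <user>@<gerrit-host> gerrit stream-events` (host and user are placeholders).

```python
import json
from collections import Counter
from typing import Iterable


def count_event_types(lines: Iterable[str]) -> Counter:
    """Tally Gerrit stream-events output by each event's "type" field.

    Hypothetical helper: assumes one JSON object per line, as stream-events emits.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between events
        event = json.loads(line)
        # Events without a "type" key are counted under "unknown".
        counts[event.get("type", "unknown")] += 1
    return counts
```

A per-type tally like this is the simplest kind of trend query; the open question for Analytics is where to store the full event objects so that richer queries (per project, per branch, over time) stay possible.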

Today's Agenda

 * What's going on?
 * Vacations; not much progress since last time
 * JR created a Phabricator task: https://phabricator.wikimedia.org/T216085
 * Reviewed the dashboard mockup and metrics spreadsheet (now in the Data^3 folder)

Outline plan for Analytics
JR: Analytics may have ideas on how to get easy access to a large pool of data and query it flexibly as our needs change

What data we currently have or plan to collect

 * Schema
 * Data samples

How we might want to query that data

 * Our data is highly structured (see schemas)
 * Is Hadoop or Elasticsearch (ES) more appropriate for that? Would we lose structure by putting the data in Hadoop?
 * How much do we need to know about the data's structure before we put it in ES?
 * Can relationships/schema be changed after data is stored?

TODOs (by next meeting)

 * Dan to draft an email to Analytics (including the dashboard mockup)