Wikimedia Product/Better use of data/DACI

From mediawiki.org

During Q1 FY 2018-2019 (July 2018 - September 2018) the Better Use of Data Working Group and interested Wikimedia Audiences team members collaborated on the following DACI to help better formalize the instrumentation (usually taken to mean event logging) process. This artifact is part of Output 3.1 Instrumentation.

Activity Comments Head of Product Analytics Data Analyst / Data Scientist Reading Infrastructure Engineering Manager Reading Infrastructure Data Engineer Product Software Engineer Software Engineer Research Analytics Engineering Engineering Manager Analytics Engineering Software Engineer Product Manager / Program Manager / Researcher / Designer Legal / Privacy Security Management (usually, product Director)
Decide metrics needs Usually the product manager or researcher realizes a need for metrics and gets the OK from management. Sometimes, management was the one initiating the request in the first place. The workflow is initiated. Contributor Informed Informed Driver
Define research questions, metrics, instrumentation data The product manager defines questions that the instrumentation data needs to be able to help us answer.

The product manager and the data analyst determine the concrete metrics to be derived from the instrumentation, and define the concrete data that the instrumentation will need to generate for this purpose. The data engineer and product software engineer reality-check the instrumentation definition to ensure the data can actually be generated with the event model of the client.

Approver Contributor Contributor Contributor Driver
Instrumentation specification ticket and schema blob (precise definition of schema fields w/ validation, data dictionary cross-check, sampling approach, purging & whitelisting approach, queries, metadata) The schema, the queries, and the other attributes of the logging and its use are defined collaboratively in Phabricator and in a schema blob, during a session between the data analyst, product software engineer, AE software engineer (optional), and data engineer.

The data analyst creates the database queries that will be used to generate results (this is the equivalent of test-driven development (TDD), for data). Once they're done, the Head of Product Analytics reviews them for use of best practices with the Product Manager and then advances to the next step for privacy review. Everyone else is looped onto the ticket. A heads-up of 7-10 business days is given to Legal/Privacy via privacy@ that schema privacy review will be needed on date X.

Approver Contributor Informed Contributor Contributor Informed Contributor Driver Informed Informed Informed
Schema privacy review privacy@ is emailed with the link to the ticket the business day before (or the business morning of) date X for the review so that Legal/Privacy can review and approve the formulation or ask for re-engineering. Contributor Driver Approver Contributor
Schema creation with backlink to ticket The data analyst, data engineer, and product engineer finalize the schema, with a reference back to the Phabricator ticket. Ideally, all metadata is self-documented in the schema / schema registry itself. The Head of Product Analytics reviews with the Product Manager and confirms when the schema is satisfactory for being instrumented against. Approver Contributor Contributor Contributor Driver
Instrumentation, pipeline scripts (if needed), alerting & monitoring Product Software Engineer does the plumbing, with assistance from the data engineer as needed. The data analyst reviews and approves the code/config artifacts and testing is arranged. Approver Contributor Contributor Informed Driver
Testing The software engineers (usually, product software engineers), the data analyst and possibly a QA tester collaboratively test the instrumentation. The data analyst reviews the outcome with the product manager and, if the work is satisfactory, the work moves forward. Approver Contributor Contributor Contributor Driver
Update data dictionary if needed The data dictionary (initially on-wiki, probably built into the schema registry as a software fixture when the schema registry is built) is updated by the data analyst. Approver Contributor Contributor Contributor Contributor Driver
Activation The product software engineer or data analyst activates production level logging. Contributor Contributor Contributor Approver Driver
Verification and fixes Analytics Engineering Software Engineer confirms no system degradation. Approver Driver
Dashboard productionization If a dashboard visualizing some of the instrumentation data has been requested, the data analyst configures dashboarding, consulting with the data engineer, the product software engineer, and the Analytics Engineering software engineer as needed. Driver Contributor Contributor Contributor Approver
Long-term Support (LTS) The data analyst monitors the quality of the logged data and reports anomalies/bugs to the Reading Infrastructure Engineering Manager and Product Software Engineer. Driver Informed Contributor Informed Informed Approver
Decommissioning If logging is no longer needed, the Head of Product Analytics requests sign-off on decommissioning. The data analyst and software engineers provide consultation and do the work needed to actually decommission things. Driver Contributor Informed Contributor Contributor Informed Contributor Contributor Contributor Informed Approver
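
As an illustration of the "schema fields w/ validation" and "sampling approach" items in the instrumentation specification step, the following is a minimal Python sketch and not the actual event logging implementation. Everything in it is a hypothetical assumption made for the example: the field names, the allowed actions, the helper names `validate_event` and `in_sample`, and the choice of hashing the session ID for sampling. It shows one way a schema blob could be checked against an incoming event and how a deterministic, session-level sampling decision could be made.

```python
import hashlib

# Hypothetical schema blob: field name -> (expected type, required?).
# These fields are illustrative only and do not come from a real schema.
EXAMPLE_SCHEMA = {
    "session_id": (str, True),
    "action": (str, True),
    "page_id": (int, False),
}

# Hypothetical enumeration of allowed values for the "action" field.
ALLOWED_ACTIONS = {"click", "impression"}


def validate_event(event, schema=EXAMPLE_SCHEMA):
    """Return a list of validation errors; an empty list means the event is valid."""
    errors = []
    for field, (expected_type, required) in schema.items():
        if field not in event:
            if required:
                errors.append(f"missing required field: {field}")
            continue
        if not isinstance(event[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    # Reject fields that the schema does not define.
    for field in event:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    action = event.get("action")
    if isinstance(action, str) and action not in ALLOWED_ACTIONS:
        errors.append("unknown action")
    return errors


def in_sample(session_id, rate):
    """Deterministic sampling: map the session ID to a bucket in [0, 1)."""
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000
    return bucket < rate


if __name__ == "__main__":
    event = {"session_id": "abc123", "action": "click", "page_id": 42}
    print(validate_event(event))
    print(in_sample(event["session_id"], 0.1))
```

Hashing the session ID, rather than drawing a random number per event, keeps every event of a given session either in or out of the sample, which tends to make session-level metrics easier to derive. Whether that is the right sampling unit is one of the decisions the specification ticket is meant to record.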