Wikimedia Technology/Annual Plans/FY2019/TEC14: Smart Tools for Better Data/Goals

=Program Goals and Status for FY18/19=

TEC14: Smart Tools for Better Data
 * Goal Owner: Nuria Ruiz
 * Program Goals for FY18/19: We will maintain and increase public access to past, present and real-time data for Wikimedia projects. We will provide the infrastructure to measure the impact and reach of projects and features for editors, communities and WMF.
 * Annual Plan: TEC14: Smart Tools for Better Data
 * Primary Goal is Knowledge as a Service: Evolve our systems and structures
 * Tech Goal: Supporting our Community of contributors



= Q1 Goals =

Outcome 1 / Output
Wikimedia Cloud Services users have easy access to high quality analytics data to answer questions about content and contributors.
 * Provision a cluster for public Data Lake access in Cloud Services

Goal(s)

 * Order Data Lake hardware and provide a rationale for the SQL engine used to make data accessible in labs

Outcome 3 / Output 1
Foundation staff and community have better visual tools to access data about content, contributors and readers.
 * Wikistats 2.0 - Users (and programmatic tools) have access to most of the reports that the community consultation identified as important

Goal(s)

 * Build most prolific contributors report
 * Include metrics about total article count

Outcome 3 / Output 2
Foundation staff and community have better visual tools to access data about content, contributors and readers.
 * Wikistats 2.0 - Beta (items carried over from last quarter)

Goal(s)

 * Support annotations
 * Improvements on pageview data per country ✅

Outcome 3 / Output 3
Foundation staff and community have better visual tools to access data about content, contributors and readers.
 * Support for more data sources and programming languages for WMF Jupyter Notebook users.

Goal(s)

 * Better integration of Jupyter with Spark ✅

Outcome 4 / Output 1
Foundation staff and community have better visual tools to access data about content, contributors and readers.
 * Users see improvements on data computing and data quality.

Goal(s)

 * Data sanitization backend for Hadoop that includes the ability to salt and hash. ✅
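
As an illustration of what "salt and hash" means for a sensitive field, here is a minimal sketch in Python. It is not the Analytics team's implementation: the function names (`new_salt`, `salt_and_hash`, `sanitize_record`), the field names, and the choice of HMAC-SHA256 are all assumptions made for this example.

```python
import hashlib
import hmac
import secrets


def new_salt(num_bytes: int = 32) -> bytes:
    """Generate a random salt (hypothetical helper; rotation policy not shown)."""
    return secrets.token_bytes(num_bytes)


def salt_and_hash(value: str, salt: bytes) -> str:
    """Return a keyed hash of value. HMAC-SHA256 is an assumed choice."""
    return hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()


def sanitize_record(record: dict, sensitive_fields: set, salt: bytes) -> dict:
    """Replace sensitive fields with their salted hash; keep other fields as-is."""
    sanitized = {}
    for key, value in record.items():
        if key in sensitive_fields and value is not None:
            sanitized[key] = salt_and_hash(str(value), salt)
        else:
            sanitized[key] = value
    return sanitized


if __name__ == "__main__":
    salt = new_salt()
    # Field names below are illustrative only.
    event = {"ip": "203.0.113.7", "wiki": "enwiki", "action": "edit"}
    print(sanitize_record(event, {"ip"}, salt))
```

With the same salt, equal inputs map to equal hashes, so sanitized records can still be grouped or joined. Discarding or rotating the salt makes earlier hashes unlinkable to new ones.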

Outcome 4 / Output 2
Foundation staff and community have better visual tools to access data about content, contributors and readers.
 * MediaWiki content is available on the cluster on a recurring schedule

Goal(s)

 * STRETCH GOAL: Productionize MediaWiki content processing. Ingest and process the text of every Wikipedia page for later use in analytics-style computations

Outcome 4 / Output 3
Foundation staff and community have better visual tools to access data about content, contributors and readers.

Goal(s)

 * STRETCH GOAL: More efficient bot filtering on pageview data.

Outcome 5 / Output 1
We have scalable, performant and reliable software for data transport
 * Software maintenance on analytics stack to maintain current level of service

Goal(s)

 * Spin out a tiny EventLogging ResourceLoader (RL) module for lightweight logging



= Q2 Goals =

Outcome X / Output X
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
 * Nullam interdum, elit in malesuada aliquam, libero lorem auctor lacus, eu mattis lacus velit vitae mauris.

Dependencies on: ___________

Goal(s)

 * Ut eget sodales odio. Maecenas a varius leo.

Status
October 2018
 * Discussed...

November 2018
 * Discussed...

December 2018
 * Discussed...