Wikimedia Technology/Goals/2019-20 Q3

Technology Department Team Goals and Status for Q3 FY19/20 in support of the Medium Term Plan (MTP) Priorities and Annual Plan for FY19/20



Analytics
Team Manager: Nuria Ruiz
 * Modern Event Platform
 * Build a reliable, scalable, and comprehensive platform for creating services, tools and user facing features that produce and consume event data
 * Deploy a new event stream for analytics using the new Event Platform infrastructure
 * Client-side error logging enabled for 1 wiki, so that errors from browser clients are surfaced to developers
 * Public schema endpoint for event schemas  ✅


 * Smart Tools for Better Data. Make it easier to understand the history of all Wikimedia projects
 * Design (together with core platform team) an alternative architecture for historic data endpoints used by iOS application
 * Move stats.wikimedia.org domain to point to Wikistats2 by default
 * Wikistats2 UI is localized
 * Project Newpyter: First Class Jupyter Notebook system. Deliver Project plan
 * Enable Presto access for shell users, a much faster way to query Hadoop


 * Smart Tools for Better Data. Increase Data Quality, Privacy and Security
 * Bots: Label high volume bot spikes in pageview data as automated traffic


 * Core. Operational Excellence. Increase Resilience of Systems
 * Deploy Eventstreams with Kubernetes
 * Create a MySQL replica for backups of all MySQL instances we use, such as those for Oozie or Superset


 * Core. Operational Excellence. Reduce Operational Load by Phasing Out Legacy Systems/Technologies
 * Reimage at least one Hadoop worker to Debian Buster (to unlock the upgrade)
 * Spark 2.4 encryption working in Hadoop
 * Refresh 1004 with new host and GPU
 * Airflow as an easier job scheduling alternative, PoC for refine workflow

 Status 
 * January 21, 2020 status - updated above
 * February 2020 status -
 * March 2020 status -



Core Platform
Team Manager: Corey Floyd
 * Core Work
 * Enable MultiDC Reads


 * IP Masking
 * IP Masking Implementation


 * Platform Evolution / Modern Event Platform
 * Initial modularization of MediaWiki (planning) (continued from Q2)
 * Push notification service
 * Migrate Service - changeprop (continued from Q2)
 * Initial modularization of MediaWiki (one component)
 * FAWG (desktop refresh work)
 * API Gateway


 * Tech & Product Partnerships
 * Developer Portal Implementation
 * Integrate OAuth 2.0 into API, API Keys, rate limiting
 * Paid API project


 * Safe and Secure Spaces
 * Image Hash Checking (in beta)

Dependencies on:

 Status 
 * January 24, 2020 status
 * In review:
 * Migrate Service - changeprop
 * Enable MultiDC Reads
 * Developer Portal Implementation
 * IP Masking Implementation
 * February 2020 status -
 * March 2020 status -
 * IP Masking Implementation



Performance
Team Manager: Gilles Dubuc
 * Core Work
 * Figure out the right store to use for the main stash (continued from Q1)
 * Publish 8 blog posts about performance (continued from Q1)
 * Support and maintenance of MediaWiki's object caching and data access components (continued from Q1)
 * Support and maintenance of WebPageTest and synthetic testing infrastructure (continued from Q1)
 * Support and maintenance of MediaWiki's ResourceLoader (continued from Q1)
 * Support and maintenance of Fresnel (continued from Q1)
 * Add operational monitoring for 100% of the performance-team services
 * Have at least 2 years of retention for ArcLamp flame graphs
 * Organise and run the Web Performance devroom at FOSDEM 2020 (continued from Q2)
 * Make seen/unseen state of watched pages reliable (all affected users who reported the issue are satisfied)
 * DC-shared object caches are available and replicate even if 20% of the servers fail
 * DC-shared temporary data is written via queues or replicated stores (continued from Q2)
 * Shared caching and temporary data storage use established/documented patterns
 * A memory-only lightweight store for temporary data exists and supports global (cross-wiki) keys (continued from Q2)
 * A disk-backed lightweight store for temporary data exists and supports global (cross-wiki) keys (continued from Q2)
 * Document how to add your own User Timing and how to see it in RUM and synthetic testing
 * Collect and graph First Input Delay
 * Document how to add your own Element Timing and how to see it in synthetic testing
 * Document how to add your own user journey for synthetic testing
 * Document the search user journeys in synthetic testing


 * Platform Evolution / Modern Event Platform
 * Provide performance expertise to FAWG outcome (continued from Q1)

Dependencies on:

Quality and Test Engineering
Team Manager: JR Branaa
 * Core Work
 * Team inception, formalization, and assessment of current organizational practices (continued from Q1)
 * Add all repos deployed to production to the Code Health pipeline (Code Health Metrics)
 * Solicit feedback from current users of CHM POC and define phase 2 enhancements (continued from Q2)
 * Interview engineering teams to understand their current code review practices (continued from Q2)
 * Relaunch the Code Review Office Hours (continued from Q2)
 * Put in place Code Review performance metrics (continued from Q2)
 * Develop Test Strategy for CPT


 * Platform Evolution / Modern Event Platform
 * Make CI warn about slow tests, and publish a collated list of slow tests
 * Transfer maintainership/ownership of API Test Tooling from CPT

Dependencies on:

Release Engineering
Team Manager: Tyler Cipriani
 * Core Work
 * Set up an experimental Elasticsearch instance to store and analyze CI logs and metrics
 * Continuation of Phabricator and Gerrit improvement (in conjunction with SRE) (continued from Q2)
 * Migrate from Gerrit version 2.15 to 2.16


 * Platform Evolution / Modern Event Platform
 * Other service deployment pipeline migrations as prioritized between SRE/RelEng and relevant teams (continued from Q2)
 * A demonstration MediaWiki development environment hosts the full TimedMediaHandler front-end and back-end workflow

Dependencies on:

 Status 
 * January 2020 status -
 * Reduce weekly rate of RDBMS commit/shutdown errors to less than 10
 * DC-shared temporary data is written via queues or replicated stores
 * A memory-only lightweight store for temporary data exists and supports global (cross-wiki) keys
 * A disk-backed lightweight store for temporary data exists and supports global (cross-wiki) keys
 * Migrate prod localisation cache to faster static-array distribution
 * Improve MediaWiki PHP startup time and recover from PHP7 regression
 * Add all repos deployed to production to the Code Health pipeline (Code Health Metrics)
 * Develop Test Strategy for CPT
 * Develop Test Strategy for Editing Team
 * Transfer maintainership/ownership of API Test Tooling from CPT
 * (Stretch): MobileContentService


 * Reduce weekly rate of lost post-send updates to less than 10
 * February 2020 status -
 * March 2020 status -



Fundraising Tech
Team Manager: Erika Bjune
 * Core Work
 * Support Advancement in testing and planned Q3 campaigns
 * Make IDEAL payment processor campaign (support for Q4 campaigns)
 * Start Matching Gifts V2

Dependencies on:

 Status 
 * January 2020 status -
 * February 2020 status -
 * March 2020 status -



Machine Learning / Scoring Platform
Team Manager: Aaron Halfaker
 * Content Integrity
 * Build/improve models in response to community demand (ongoing every quarter)


 * Machine Learning Infrastructure
 * Jade expansion/Iteration
 * Session-model use, maintenance, and user-research

Dependencies on:

 Status 
 * January 2020 status -
 * February 2020 status -
 * March 2020 status -



Research
Team Manager: Leila Zia
 * Address Knowledge Gaps
 * Conduct a literature review, plan and set up collaborations for projects about understanding engagement with Wikimedia images around the world. (continued from Q2)
 * Build one formal collaboration in the disinformation space to start the research for building solutions, starting in Q3. (continued from Q2)

Dependencies on:

 Status 
 * January 2020 status -
 * February 2020 status -
 * March 2020 status -



Search Platform
Team Manager: Guillaume Lederrey
 * Core Work
 * 1.1. New query parser is used in production by the end of Q3
 * 2.2. WDQS storage expansion (continued from Q2)
 * 7.1. Increase understanding of our work outside our team, and outside the Foundation
 * 8.1. Improve search quality, especially for non-English wikis by prioritizing community requests - Positive feedback from speakers/community on changes made
 * 10.1. Newcomer task


 * Wikidata
 * Improve WDQS updater performance


 * Machine Learning Infrastructure
 * 3.1. Glent method 1 (comparison to other users' queries) offline tested, tuned, A/B tested and possibly deployed by end of Q3 (continued from Q2)


 * Address Knowledge Gaps
 * 6.1. Increase of training data retention (>90 days) is validated with Legal / Privacy (continued from Q2)
 * 6.2. Any new data retention requirements are implemented (validate with Legal) (continued from Q2)


 * Structured Data
 * 9.1. Proof of Concept SPARQL endpoint for SDoC is available on WMCS and updated weekly. (stretch) (continued from Q2)

Dependencies on: SDC, Legal

 Status 
 * January 2020 status -
 * February 2020 status -
 * March 2020 status -



Security
Team Manager: John Bennett
 * Core Work
 * Incident response Table Top and updates to security after action reports and improvement plans (continued from Q2)
 * Create design document for DAST implementation and development tools pen testing
 * Threat Intel/Hunt
 * NIST Assessments
 * Create or improve language-based best security practices documentation (continued from Q2)

Dependencies on:

 Status 
 * January 2020 status -
 * February 2020 status -
 * March 2020 status -



Site Reliability Engineering
Directors: Mark Bergsma and Faidon Liambotis
 * Cross-cutting

Service Operations
Team Manager: Mark Bergsma
 * Core Work

Data Persistence
Team Manager: Mark Bergsma
 * Core Work

Traffic
Team Manager: Brandon Black
 * Core Work

Infrastructure Foundations
Team Manager: Faidon Liambotis
 * Core Work

Observability
Team Manager: Faidon Liambotis
 * Core Work

Data Center Operations
Team Manager: Willy Pao
 * Core Work
 * Modify existing dc-ops processes to be able to measure SLAs effectively
 * Create a landing page that directs end users to the various types of data center requests, with the appropriate template and expected turnaround time for each
 * Partner with Joel and Automation team to establish reports that can measure SLAs via Phabricator and/or Netbox
 * Define SLAs for each type of dc-ops task
 * Order and receive all Q3 hardware procurement orders by end of quarter
 * Clean out eqiad storage room and send all decommissioned equipment and unneeded parts for recycling by end of January
 * Partner with Julianne and Automation team to revamp the decommission process, which currently involves manually entering information into a spreadsheet
 * Reduce total number of open data center tasks by 30%

Dependencies on:

 Status 
 * January 2020 status -
 * February 2020 status -
 * March 2020 status -



Technical Engagement
Team Manager: Birgit Müller

Developer Advocacy
Team Manager: Birgit Müller

Key Deliverables: Reduce Complexity of the Platform; Movement diversity
 * Create a blog by and for technical audiences where members of the technical community can post about their technical work
 * Publish at least 6 technical blog posts
 * Create a regular cadence of content on @MediaWiki and @Wikimediatech (strive for 3 times per week)
 * Run Wikimedia Technical Talks -- increase views on talks by 10%
 * Prepare release of 3rd edition of the Tech Community Newsletter (publishing date: April 2020)
 * Make further improvements to the dashboard for Wikimedia Cloud Services edit data and announce it on targeted channels.
 * Publish current numbers on technical contributions provided by Bitergia in the Quarterly Tech Community newsletter (by Jan 2020)
 * Coordinate with Bitergia and get data on "Avg. Time Open (Days)" for Gerrit patchsets per affiliation and "time to first review" data for patches by end of Q4.
 * Find out what is needed to get data on technical contributions/contributors (by Q3)
 * Provide a “showroom” that introduces newcomers to a variety of different tools to show what developers can do in Toolforge (by Q3, in collaboration with GCI students)
 * In Q2/Q3, at least 700 task instances are completed in Google Code-in.
 * At least five projects are successfully completed by Outreachy interns by end of Q3.
 * At least 12 projects are promoted in GSOC and Outreachy programs.
 * Kick off Friends of the Docs initiative (prep work in Q3; kick off in Q4)
 * Develop workshop concept with partner community for technical workshops in Q3/Q4
 * Provide continuous support for teams and individuals in Phabricator
 * Conduct at least 4 workshops + introductions into Phabricator at movement events by end of Q4
 * Establish Phabricator training for new staff members
 * A starter kit for small wikis containing a recommended set of templates, Gadgets, bots etc. is available by Q4

Wikimedia Cloud Services
Team Manager: Bryan Davis

tbd

Dependencies on:

 Status 
 * January 2020 status -
 * February 2020 status -
 * March 2020 status -