Wikimedia Technology/Goals/2019-20 Q3

Technology Department Team Goals and Status for Q3 FY19/20 in support of the Medium Term Plan (MTP) Priorities and Annual Plan for FY19/20



Analytics
Team Manager: Nuria Ruiz
 * Core Work
 * Limit our users on stats machines
 * Spike SWAP (notebook)
 * Airflow (proof of concept for scheduler system)
 * ML model building, how to?
 * Enable Kerberos for everybody


 * Platform Evolution / Modern Event Platform
 * Development work for Kafka Connect
 * Begin replacing (some) event Camus jobs with Kafka Connect
 * Replacement of eventlogging server side (event gate) and client side
 * Development on Stream Config Service starts.


 * Smart Tools for Better Data
 * Enqueue eventlogging requests for better performance (continued from Q1)
 * Bot Detection: “Remove automated traffic not identified as such from readers data” (continued from Q2)
 * Metrics for wikistats missing from Q4
 * Restructuring on event_longterm workflow
 * Mediacounts API, Phase 2
 * Development work for designer mocks of upcoming changes to wikistats UI.

Dependencies on: SRE and Legal

 Status 
 * January 2020 status -
 * February 2020 status -
 * March 2020 status -



Core Platform
Team Manager: Corey Floyd
 * Core Work
 * Enable MultiDC Reads


 * IP Masking
 * IP Masking Implementation


 * Platform Evolution / Modern Event Platform
 * Initial modularization of MediaWiki (planning) (continued from Q2)
 * Push notification service
 * Initial modularization of MediaWiki (one component)
 * FAWG (desktop refresh work)
 * Wikimedia Unified API


 * Tech & Product Partnerships
 * Developer Portal Implementation
 * Integrate OAuth 2.0 into API (Phase 2)
 * Paid API project

Dependencies on:

 Status 
 * January 2020 status -
 * February 2020 status -
 * March 2020 status -



Performance
Team Manager: Gilles Dubuc
 * Core Work
 * Figure out the right store to use for the main stash (continued from Q1)
 * Publish 8 blog posts about performance (continued from Q1)
 * Support and maintenance of MediaWiki's object caching and data access components (continued from Q1)
 * Support and maintenance of WebPageTest and synthetic testing infrastructure (continued from Q1)
 * Support and maintenance of MediaWiki's ResourceLoader (continued from Q1)
 * Support and maintenance of Fresnel (continued from Q1)
 * Add operational monitoring for 100% of the performance-team services
 * Have at least 2 years of retention for ArcLamp flame graphs
 * Organise and run the Web Performance devroom at FOSDEM 2020 (continued from Q2)
 * Make seen/unseen state of watched pages reliable (all affected users who reported the issue are satisfied)
 * DC-shared object caches are available and replicate even if 20% of the servers fail
 * DC-shared temporary data is written via queues or replicated stores (continued from Q2)
 * Shared caching and temporary data storage use established/documented patterns
 * A memory-only lightweight store for temporary data exists and supports global (cross-wiki) keys (continued from Q2)
 * A disk-backed lightweight store for temporary data exists and supports global (cross-wiki) keys (continued from Q2)
 * Document how to add your own User Timing and how to see it in RUM and synthetic testing
 * Collect and graph First Input Delay
 * Document how to add your own Element Timing and how to see it in synthetic testing
 * Document how to add your own user journey for synthetic testing
 * Document the search user journeys in synthetic testing


 * Platform Evolution / Modern Event Platform
 * Provide performance expertise to FAWG outcome (continued from Q1)

Dependencies on:

Quality and Test Engineering
Team Manager: JR Branaa
 * Core Work
 * Team inception, formalization, and assessment of current organizational practices (continued from Q1)
 * Add all repos deployed to production to the Code Health pipeline (Code Health Metrics).
 * Solicit feedback from current users of CHM POC and define phase 2 enhancements (continued from Q2)
 * Interview engineering teams to understand their current code review practices (continued from Q2)
 * Relaunch the Code Review Office Hours (continued from Q2)
 * Put in place Code Review performance metrics (continued from Q2)
 * Develop Test Strategy for CPT


 * Platform Evolution / Modern Event Platform
 * Make CI warn about slow tests, and publish a collated list of slow tests
 * Transfer maintainership/ownership of API Test Tooling from CPT

Dependencies on:

Release Engineering
Team Manager: Tyler Cipriani
 * Core Work
 * Set up an experimental Elasticsearch instance to store and analyze CI logs and metrics
 * Continuation of Phabricator and Gerrit improvement (in conjunction with SRE) (continued from Q2)
 * Migrate from Gerrit version 2.15 to 2.16


 * Platform Evolution / Modern Event Platform
 * Other service deployment pipeline migrations as prioritized between SRE/RelEng and relevant teams (continued from Q2)
 * A demonstration MediaWiki development environment hosts the full TimedMediaHandler front-end and back-end workflow

Dependencies on:

 Status 
 * January 2020 status -
 * February 2020 status -
 * March 2020 status -



Fundraising Tech
Team Manager: Erika Bjune
 * Core Work
 * Support Advancement in testing and planned Q3 campaigns
 * Make IDEAL payment processor campaign (support for Q4 campaigns)
 * Start Matching Gifts V2

Dependencies on:

 Status 
 * January 2020 status -
 * February 2020 status -
 * March 2020 status -



Machine Learning / Scoring Platform
Team Manager: Aaron Halfaker
 * Content Integrity
 * Build/improve models in response to community demand (ongoing every quarter)


 * Machine Learning Infrastructure
 * Jade expansion/Iteration
 * Session-model use, maintenance, and user-research

Dependencies on:

 Status 
 * January 2020 status -
 * February 2020 status -
 * March 2020 status -



Research
Team Manager: Leila Zia
 * Address Knowledge Gaps
 * Conduct a literature review, plan and set up collaborations for projects about understanding engagement with Wikimedia images around the world. (continued from Q2)
 * Build one formal collaboration in the disinformation space to start research on building solutions in Q3. (continued from Q2)

Dependencies on:

 Status 
 * January 2020 status -
 * February 2020 status -
 * March 2020 status -



Search Platform
Team Manager: Guillaume Lederrey
 * Core Work
 * 1.1. New query parser is used in production by the end of Q3
 * 2.2. WDQS storage expansion (continued from Q2)
 * 7.1. Increase understanding of our work outside our team, and outside the Foundation
 * 8.1. Improve search quality, especially for non-English wikis by prioritizing community requests - Positive feedback from speakers/community on changes made
 * 10.1. Newcomer task


 * Wikidata
 * Improve WDQS updater performance


 * Machine Learning Infrastructure
 * 3.1. Glent method 1 (comparison to other users' queries) offline tested, tuned, A/B tested and possibly deployed end of Q3 (continued from Q2)


 * Address Knowledge Gaps
 * 6.1. Increase of training data retention (>90 days) is validated with Legal / Privacy (continued from Q2)
 * 6.2. Any new data retention requirements are implemented (validate with Legal) (continued from Q2)


 * Structured Data
 * 9.1. Proof of Concept SPARQL endpoint for SDoC is available on WMCS and updated weekly. (stretch) (continued from Q2)

Dependencies on: SDC, Legal

 Status 
 * January 2020 status -
 * February 2020 status -
 * March 2020 status -



Security
Team Manager: John Bennett
 * Core Work
 * Incident response Table Top and updates to security after action reports and improvement plans (continued from Q2)
 * Create design document for DAST implementation and development tools pen testing
 * Threat Intel/Hunt
 * NIST Assessments
 * Create or improve language-based best security practices documentation (continued from Q2)

Dependencies on:

 Status 
 * January 2020 status -
 * February 2020 status -
 * March 2020 status -



Site Reliability Engineering
Directors: Mark Bergsma and Faidon Liambotis
 * Cross-cutting

Service Operations
Team Manager: Mark Bergsma
 * Core Work

Data Persistence
Team Manager: Mark Bergsma
 * Core Work

Traffic
Team Manager: Brandon Black
 * Core Work

Infrastructure Foundations
Team Manager: Faidon Liambotis
 * Core Work

Observability
Team Manager: Faidon Liambotis
 * Core Work

Data Center Operations
Team Manager: Willy Pao
 * Core Work
 * Modify existing dc-ops processes to be able to measure SLAs effectively
 * Create a landing page that directs end users to the various types of data center requests, with the appropriate template and expected turnaround time for each
 * Partner with Joel and Automation team to establish reports that can measure SLAs via Phabricator and/or Netbox
 * Define SLAs for each type of dc-ops task
 * Order and receive all Q3 hardware procurement orders by end of quarter
 * Clean out eqiad storage room and send all decommissioned equipment and unneeded parts for recycling by end of January
 * Partner with Julianne and Automation team to revamp the decommission process, which currently relies on manually entering information into a spreadsheet
 * Reduce total number of open data center tasks by 30%

Dependencies on:

 Status 
 * January 2020 status -
 * February 2020 status -
 * March 2020 status -



Technical Engagement
Team Manager: Birgit Müller
 * Cross-cutting
 * Hire Developer Advocate (continued from Q2)

Developer Advocacy
Team Manager: Birgit Müller
 * Movement Diversity
 * Pilot project India
 * Review Developer Metrics Case Study
 * Design & publish Tech Engagement quarterly newsletter Ed3
 * Create a blog by and for technical audiences where members of the technical community can post about their technical work. (continued from Q2)
 * Two Blog posts for Q3 - TBD
 * Develop Wiki bootstrap list jointly with community
 * Publish 6 (min) technical blog posts (continued from Q2)
 * Provide a “showroom” that introduces newcomers to a variety of tools and shows what developers can do in Toolforge, by Q3 (continued from Q2)
 * Coordinate with Bitergia and get data on "Avg. Time Open (Days)" for Gerrit patchsets per affiliation and "time to first review" data for patches (by end of Q4) (continued from Q2)
 * Develop workshop concept with partner community for technical workshops in Q3 (continued from Q2)
 * At least five projects are successfully completed by Outreachy interns by end of Q3.


 * Platform Evolution / Modern Event Platform
 * Gather and publish current numbers on technical contributions provided by Bitergia in the Quarterly Tech Community newsletter (by Jan 2020)
 * TBD: visualisation of other statistics (WMCS projects, maintainers …)
 * Technical internships + mentoring - Q3

Wikimedia Cloud Services
Team Manager: Bryan Davis
 * Core Work
 * Hypervisor growth + replace out-of-warranty servers
 * Migrate instance storage to Ceph cluster
 * Run systems failover test plan
 * Per-tool hostnames ( .toolforge.org)
 * Quota-limited, reliable backup service for Cloud VPS tenants (rsync.net clone)
 * Gather requirements for a Wiki Replica replacement service

Dependencies on:

 Status 
 * January 2020 status -
 * February 2020 status -
 * March 2020 status -