Wikimedia Technology/Goals/2019-20 Q3

Technology Department Team Goals and Status for Q3 FY19/20 in support of the Medium Term Plan (MTP) Priorities and Annual Plan for FY19/20



Analytics
Team Manager: Nuria Ruiz
 * Modern Event Platform
 * Build a reliable, scalable, and comprehensive platform for creating services, tools and user facing features that produce and consume event data
 * Deploy a new event stream for analytics using the new Event Platform infrastructure
 * Client-side error logging enabled for 1 wiki so that errors from browser clients are surfaced to developers
 * Public schema endpoint for event schemas ✅


 * Smart Tools for Better Data. Make it easier to understand the history of all Wikimedia projects
 * Design (together with core platform team) an alternative architecture for historic data endpoints used by iOS application
 * Move stats.wikimedia.org domain to point to Wikistats2 by default ✅
 * Wikistats2 UI is localized
 * Project Newpyter: First Class Jupyter Notebook system. Deliver Project plan
 * Enable Presto access for shell users, a much faster way to query Hadoop ✅


 * Smart Tools for Better Data. Increase Data Quality, Privacy and Security
 * Bots: Label high volume bot spikes in pageview data as automated traffic


 * Core. Operational Excellence. Increase Resilience of Systems
 * Deploy Eventstreams with Kubernetes
 * Create a MySQL replica for backups of every MySQL instance we run, such as those for Oozie or Superset


 * Core. Operational Excellence. Reduce Operational Load by Phasing Out Legacy Systems/Technologies
 * Reimage at least one Hadoop worker to Debian Buster (to unlock the upgrade)
 * Spark 2.4 encryption working in Hadoop
 * Refresh 1004 with new host and GPU
 * Airflow as an easier job scheduling alternative, PoC for refine workflow

 Status 
 * January 21, 2020 status - updated above
 * February 2020 status -
 * March 2020 status -



Core Platform
Team Manager: Corey Floyd
 * Core Work
 * Enable MultiDC Reads


 * IP Masking
 * IP Masking Implementation


 * Platform Evolution / Modern Event Platform
 * Initial modularization of MediaWiki (planning) (continued from Q2)
 * Push notification service
 * Migrate Service - changeprop (continued from Q2)
 * Initial modularization of MediaWiki (one component)
 * FAWG (desktop refresh work)
 * API Gateway


 * Tech & Product Partnerships
 * Developer Portal Implementation
 * Integrate OAuth 2.0 into API, API Keys, rate limiting
 * Paid API project


 * Safe and Secure Spaces
 * Image Hash Checking (in beta)

Dependencies on:

 Status 
 * January 24, 2020 status
 * In review:
 * Migrate Service - changeprop
 * Enable MultiDC Reads
 * Developer Portal Implementation
 * IP Masking Implementation
 * February 2020 status -
 * March 2020 status -
 * IP Masking Implementation



Performance
Team Manager: Gilles Dubuc
 * Core Work
 * Figure out the right store to use for the main stash (continued from Q1)
 * Publish 8 blog posts about performance (continued from Q1)
 * Support and maintenance of MediaWiki's object caching and data access components (continued from Q1)
 * Support and maintenance of WebPageTest and synthetic testing infrastructure (continued from Q1)
 * Support and maintenance of MediaWiki's ResourceLoader (continued from Q1)
 * Support and maintenance of Fresnel (continued from Q1)
 * Add operational monitoring for 100% of the performance-team services
 * Have at least 2 years of retention for ArcLamp flame graphs
 * Organise and run the Web Performance devroom at FOSDEM 2020 (continued from Q2)
 * Make seen/unseen state of watched pages reliable (all affected users who reported the issue are satisfied)
 * DC-shared object caches are available and replicate even if 20% of the servers fail
 * DC-shared temporary data is written via queues or replicated stores (continued from Q2)
 * Shared caching and temporary data storage use established/documented patterns
 * A memory-only lightweight store for temporary data exists and supports global (cross-wiki) keys (continued from Q2)
 * A disk-backed lightweight store for temporary data exists and supports global (cross-wiki) keys (continued from Q2)
 * Document how to add your own User Timing and how to see it in RUM and synthetic testing
 * Collect and graph First Input Delay
 * Document how to add your own Element Timing and how to see it in synthetic testing
 * Document how to add your own user journey for synthetic testing
 * Document the search user journeys in synthetic testing


 * Platform Evolution / Modern Event Platform
 * Provide performance expertise to FAWG outcome (continued from Q1)

Quality and Test Engineering
Team Manager: JR Branaa
 * Core Work
 * Team inception, formalization, and assessment of current organizational practices (continued from Q1)
 * Add all repos deployed to production to the Code Health pipeline (Code Health Metrics).
 * Solicit feedback from current users of CHM POC and define phase 2 enhancements (continued from Q2)
 * Interview engineering teams to understand their current code review practices (continued from Q2)
 * Relaunch the Code Review Office Hours (continued from Q2)
 * Put in place Code Review performance metrics (continued from Q2)
 * Develop Test Strategy for CPT


 * Platform Evolution / Modern Event Platform
 * Make CI warn about slow tests, and publish a collated list of slow tests
 * Transfer maintainership/ownership of API Test Tooling from CPT

Release Engineering
Team Manager: Tyler Cipriani
 * Core Work
 * Set up an experimental Elasticsearch instance to store and analyze CI logs and metrics
 * Continuation of Phabricator and Gerrit improvement (in conjunction with SRE) (continued from Q2)
 * Migrate from Gerrit version 2.15 to 2.16


 * Platform Evolution / Modern Event Platform
 * Other service deployment pipeline migrations as prioritized between SRE/RelEng and relevant teams (continued from Q2)
 * A demonstration MediaWiki development environment hosts the full TimedMediaHandler front-end and back-end workflow

Dependencies on:

 Status 
 * January 24, 2020 status
 * Reduce weekly rate of RDBMS commit/shutdown errors to less than 10
 * DC-shared temporary data is written via queues or replicated stores
 * A memory-only lightweight store for temporary data exists and supports global (cross-wiki) keys
 * A disk-backed lightweight store for temporary data exists and supports global (cross-wiki) keys
 * Migrate prod localisation cache to faster static-array distribution
 * Improve MediaWiki PHP startup time and recover from PHP7 regression
 * Add all repos deployed to production to the Code Health pipeline (Code Health Metrics)
 * Develop Test Strategy for CPT
 * Develop Test Strategy for Editing Team
 * Transfer maintainership/ownership of API Test Tooling from CPT
 * (Stretch): MobileContentService
 * Continuation of Phabricator and Gerrit improvement (in conjunction with SRE)
 * Other service deployment pipeline migrations as prioritized between SRE/RelEng and relevant teams.
 * Migrate from Gerrit version 2.15 to 2.16
 * Scap static PHP array for l10n cache
 * A demonstration MediaWiki development environment hosts the full TimedMediaHandler front-end and back-end workflow
 * Create Basic Local Development Environment for MediaWiki/Core
 * Create PipelineLib documentation to support deployment pipeline migrations
 * Reduce weekly rate of lost post-send updates to less than 10
 * February 2020 status -
 * March 2020 status -



Fundraising Tech
Team Manager: Erika Bjune
 * Core Work
 * Support Advancement in testing and planned Q3 campaigns
 * Make IDEAL payment processor campaign (support for Q4 campaigns)
 * Start Matching Gifts V2

Dependencies on:

 Status 
 * January 24, 2020 status
 * Support Advancement in testing and planned Q3 campaigns
 * Make IDEAL payment processor campaign (support for Q4 campaigns)
 * Recover from the holidays
 * February 2020 status -
 * March 2020 status -



Machine Learning / Scoring Platform
Team Manager: Aaron Halfaker
 * Content Integrity
 * Build/improve models in response to community demand (ongoing every quarter)


 * Machine Learning Infrastructure
 * Jade expansion/Iteration
 * Session-model use, maintenance, and user-research

Dependencies on:

 Status 
 * January 24, 2020 status
 * Expansion of Topic Model to ar, ko, and cswiki
 * Support operations infrastructure improvements (prometheus)
 * Deploy Entity UI for Jade
 * Deployment of session-based models
 * Build/improve models in response to community demand (ongoing every quarter)
 * Improve topic models based on initial use
 * February 2020 status -
 * March 2020 status -



Research
Team Manager: Leila Zia
 * Address Knowledge Gaps
 * Conduct a literature review, plan and set up collaborations for projects about understanding engagement with Wikimedia images around the world. (continued from Q2)
 * Build one formal collaboration in the disinformation space to start the research for building solutions starting Q3. (continued from Q2)

Dependencies on:

 Status 
 * January 24, 2020 status
 * Finalize the research brief for crosslingual topical model laying out the work that will be done in this space starting Q3.
 * Build one formal collaboration in the disinformation space to start the research for building solutions starting Q3.
 * Plan for a challenge: come up with an initial format, put a committee together, choose a venue for presentations.
 * Finalize a proposal for changes in Research based on learnings about Research's audience, what they expect from the team, our positioning within WMF, Movement, and the Research community, and the opportunities for impact.
 * Document and communicate with the team: expectations of the Research Scientist role and trajectory in the IC track.
 * Measuring the consistency of information between Wikipedia articles and Wikidata items
 * Run office hours WMF/Research&Analytics
 * A list of meaningful Commons Categories whose images can be used to train image classifiers
 * Link recommendation
 * Submit work on section alignment
 * Research showcase take-over transition
 * Start formal collaboration around the project "Understanding Readers' Image Usage in Wikipedia"
 * Investigate Knowledge Gaps in Multimedia
 * Supervise Outreachy Internship on Releasing data dumps for Citation Needed Classifiers
 * Start research project on navigation paths "How we read Wikipedia"
 * Build taxonomy of readership gaps
 * Define research agenda for external re-use
 * Submit paper on reader demographics surveys for peer-review
 * Research showcase improvements for 2020
 * Pilot social media traffic reports for English Wikipedia
 * Organize Wiki Workshop 2020
 * Submit Citation Usage Camera Ready Paper to the Web Conference 2020
 * Start the first version of the Research Internships program
 * Launch experimental API for Wikidata-based topic model
 * February 2020 status -
 * March 2020 status -
 * Launch experimental API for Wikidata-based topic model



Search Platform
Team Manager: Guillaume Lederrey
 * Core Work
 * 1.1. New query parser is used in production by the end of Q3
 * 2.2. WDQS storage expansion (continued from Q2)
 * 7.1. Increase understanding of our work outside our team, and outside the Foundation
 * 8.1. Improve search quality, especially for non-English wikis by prioritizing community requests - Positive feedback from speakers/community on changes made
 * 10.1. Newcomer task


 * Wikidata
 * Improve WDQS updater performance


 * Machine Learning Infrastructure
 * 3.1. Glent method 1 (comparison to other users' queries) offline tested, tuned, A/B tested and possibly deployed end of Q3 (continued from Q2)


 * Address Knowledge Gaps
 * 6.1. Increase of training data retention (>90 days) is validated with Legal / Privacy (continued from Q2)
 * 6.2. Any new data retention requirements are implemented (validate with Legal) (continued from Q2)


 * Structured Data
 * 9.1. Proof of Concept SPARQL endpoint for SDoC is available on WMCS and updated weekly. (stretch) (continued from Q2)

Dependencies on: SDC, Legal

 Status 
 * January 24, 2020 status
 * 8.1. Improve search quality, especially for non-English wikis by prioritizing community requests - Positive feedback from speakers/community on changes made
 * 10.1. Newcomer task
 * Import RDF data to answer questions on partitioning graph databases - a better understanding of the data
 * 3.2. Glent method 1 (comparison to other users' queries) offline tested, tuned, A/B tested and possibly deployed end of Q3
 * Improve WDQS updater performance
 * 1.1. New query parser is used in production by the end of Q3
 * 2.2. WDQS storage expansion
 * 6.2. Any new data retention requirements are implemented (validate with Legal)
 * 6.1. Increase of training data retention (>90 days) is validated with Legal / Privacy
 * 7.1. Increase understanding of our work outside our team, and outside the Foundation
 * 9.1. Proof of Concept SPARQL endpoint for SDoC is available on WMCS and updated weekly. (stretch)
 * February 2020 status -
 * March 2020 status -



Security
Team Manager: John Bennett
 * Core Work
 * Incident response Table Top and updates to security after action reports and improvement plans (continued from Q2)
 * Create design document for DAST implementation and development tools pen testing
 * Threat Intel/Hunt
 * NIST Assessments
 * Create or improve language-based best security practices documentation (continued from Q2)

Dependencies on:

 Status 
 * January 24, 2020 status
 * Create privacy engineering charter
 * Assess / Refine Phab Usage and Workflows
 * Create design document for DAST implementation and development tools pen testing
 * Enhance and Deploy (to beta) StopForumSpam
 * Evaluate providing a Tier 1 (Clinic-like) function for the Security team.
 * Develop organization-wide privacy training.
 * Incident response Table Top and updates to security after action reports and improvement plans
 * February 2020 status -
 * March 2020 status -



Site Reliability Engineering
Directors: Mark Bergsma and Faidon Liambotis
 * Cross-cutting

Service Operations
Team Manager: Mark Bergsma
 * Core Work

Data Persistence
Team Manager: Mark Bergsma
 * Core Work

Traffic
Team Manager: Brandon Black
 * Core Work

Infrastructure Foundations
Team Manager: Faidon Liambotis
 * Core Work

Observability
Team Manager: Faidon Liambotis
 * Core Work

Data Center Operations
Team Manager: Willy Pao
 * Core Work
 * Modify existing dc-ops processes to be able to measure SLAs effectively
 * Create a landing page that directs end users to the various types of data center requests, with the appropriate template and expected turnaround time for each
 * Partner with Joel and Automation team to establish reports that can measure SLAs via Phabricator and/or Netbox
 * Define SLAs for each type of dc-ops task
 * Order and receive all Q3 hardware procurement orders by end of quarter
 * Clean out eqiad storage room and send all decommissioned equipment and unneeded parts for recycling by end of January
 * Partner with Julianne and Automation team to revamp the decommission process, which relies on manually entering information into a spreadsheet
 * Reduce total number of open data center tasks by 30%

Dependencies on:

 Status 
 * January 2020 status -
 * February 2020 status -
 * March 2020 status -



Technical Engagement
Team Manager: Birgit Müller

Developer Advocacy
Team Manager: Birgit Müller

Key Deliverables: Reduce Complexity of the Platform; Movement diversity
 * Create a blog by and for technical audiences where members of the technical community can post about their technical work
 * Publish at least 6 technical blog posts
 * Create regular cadence of content -- strive for 3 times per week -- @MediaWiki and @Wikimediatech
 * Run Wikimedia Technical Talks -- increase views on talks by 10%
 * Prepare release of 3rd edition of the Tech Community Newsletter (publishing date: April 2020)
 * Make further improvements to the dashboard for Wikimedia Cloud Services edit data and announce it on targeted channels.
 * Publish current numbers on technical contributions provided by Bitergia in the Quarterly Tech Community newsletter (by Jan 2020)
 * Coordinate with Bitergia and get data on "Avg. Time Open (Days)" for Gerrit patchsets per affiliation and "time to first review" data for patches by end of Q4.
 * Find out what is needed to get data on technical contributions/contributors (by Q3)
 * Provide “showroom”, introducing newcomers to a variety of different tools to show what developers can do in Toolforge (by Q3, in collaboration with GCI students)
 * In Q2/Q3, at least 700 task instances are completed in Google Code-in.
 * At least five projects are successfully completed by Outreachy interns by end of Q3.
 * At least 12 projects are promoted in GSOC and Outreachy programs.
 * Kick off Friends of the Docs initiative (prep work in Q3; kick off in Q4)
 * Develop workshop concept with partner community for technical workshops in Q3/Q4
 * Provide continuous support for teams and individuals in Phabricator
 * Conduct at least 4 workshops + introductions to Phabricator at movement events by end of Q4
 * Establish Phabricator training for new staff members
 * A starter kit for small wikis containing a recommended set of templates, Gadgets, bots etc. is available by Q4

Wikimedia Cloud Services
Team Manager: Bryan Davis

tbd

Dependencies on:

 Status 
 * January 24, 2020 status
 * At least five projects are successfully completed by Outreachy interns by end of Q3.
 * Provide “showroom”, introducing newcomers to a variety of different tools to show what developers can do in Toolforge by Q3
 * Find out what is needed to get data on technical contributions/contributors (by Q3)
 * Coordinate with Bitergia and get data on "Avg. Time Open (Days)" for Gerrit patchsets per affiliation and "time to first review" data for patches (by end of Q4).
 * Create a blog by and for technical audiences where members of the technical community can post about their technical work.
 * Develop workshop concept with partner community for technical workshops in Q3
 * Advocate for better processes + practices to support developer productivity
 * Create regular cadence of content -- strive for 3 times per week -- @MediaWiki and @Wikimediatech
 * In Q2/Q3, at least 700 task instances are completed in Google Code-in.
 * Conduct at least 4 workshops + introductions to Phabricator at movement events by end of Q4
 * Provide “showroom”, introducing newcomers to a variety of different tools to show what developers can do in Toolforge (by Q3, in collaboration with GCI students)
 * Provide Bug Management support in Phabricator
 * Publish at least 6 technical blog posts
 * Develop workshop concept with partner community for technical workshops in Q3/Q4
 * At least five projects are successfully completed by Outreachy interns by end of Q3.
 * A starter kit for small wikis containing a recommended set of templates, Gadgets, bots etc. is available by Q4
 * Create a blog by and for technical audiences where members of the technical community can post about their technical work
 * At least 14 projects are promoted in GSOC and Outreachy programs by Q4
 * Replace Debian Jessie with newer operating system release on all physical hardware managed by the Cloud Services team before Jessie EOL to keep all software under active security patch management
 * All Debian Jessie instances are removed/replaced in Cloud VPS hosted projects
 * Provide a forecast analysis of CEPH resource utilization used to identify the full impact of migrating virtual machines disks to shared storage.
 * Upgrade OpenStack components from Pike to Queens
 * Provide a more modern, secure, and performant PaaS experience for Toolforge tools
 * Improve readability, and content where applicable, of 5-10 pages of Toolforge and Cloud VPS "Help" documentation on Wikitech, chosen based on data from the 2019 WMCS user survey
 * Publish current numbers on technical contributions provided by Bitergia in the Quarterly Tech Community newsletter (by Jan 2020)
 * Upgrade OpenStack components from Ocata to Pike
 * February 2020 status -
 * March 2020 status -