Wikimedia Technology/Goals/2018-19 Q1

Wikimedia Technology Goals, FY2018–19, Q1 (July - September)

Introduction

The Technology Department has a number of annual goals in support of the Wikimedia Foundation's Annual Plan; this work is detailed in the Annual Plan. Our remaining work falls into four broad areas: foundational, sustaining, supporting our technical community, and supporting overall community health.

All Technology programs fall under the primary goal of Knowledge as a Service/Foundational Strength - evolve our systems and structures, with the exception of TEC5: Scoring Platform and TEC9: Address Knowledge Gaps, which fall under the primary goal of Knowledge Equity - grow new contributors and content.

Purpose of this document

This document lists the goals for the Wikimedia Technology department for the first quarter of fiscal year 2018–19 (July - September 2018). The goal owner in each section is the person responsible for coordinating completion of the section, in partnership with the team(s) and relevant stakeholders.

Goals for the Audiences department are available on their own page.

Legend

ETA (Estimated Time of Arrival) fields may use the acronym EOQ (End of Quarter) or EOY (End of Year).

Status fields use one of the following values: In progress, To do, Postponed, Done, or Partially done.


Technology Departmental programs

TEC1: Reliability, Performance, and Maintenance

Tech Goal: Sustaining | Overall goal owner:

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome: We have scalable, reliable and secure systems for data transport Output: Analytics stack maintains current level of service. OS Upgrades. task T192642

Continue upgrading to Debian Stretch:

  • AQS cluster hosts, thorium
  • Hardware refresh of Hadoop master nodes and coordinator (analytics1003)
Analytics To do To do
Outcome: We have scalable, reliable and secure systems for data transport Output: Analytics stack maintains current level of service. Improved Monitoring. STRETCH GOAL: Add Prometheus metrics for varnishkafka instances running on caching hosts task T196066 Analytics To do
Output 1: Current levels of service are maintained and/or improved for all production sites, services and underlying infrastructure.

Output 1.1: Deploy, update, configure, and maintain and improve production services, platforms, tooling, and infrastructure (Traffic infrastructure, databases & storage, MediaWiki application servers, (micro)services, network, Infrastructure Foundations, Analytics infrastructure, developer & release tooling, and miscellaneous sites & services)

Perform a datacenter switchover

  • Successfully switch backend traffic (MediaWiki, Swift, RESTBase, and Parsoid) to be served from codfw with no downtime and reduced read-only time
  • Serve the site from codfw for at least 3 weeks
  • Refactor the switchdc script into a more re-usable automation library and update it to the newer switchover requirements

SRE

  • Core Platform (MediaWiki, RESTBase)
  • Performance
  • Release Engineering
  • Parsing (Parsoid)
  • Analytics (EventBus)
  • Community Liaisons

End of October 2018

To do To do
Output 1: Current levels of service are maintained and/or improved for all production sites, services and underlying infrastructure. Output 1.1: Deploy, update, configure, and maintain and improve production services, platforms, tooling, and infrastructure (Traffic infrastructure, databases & storage, MediaWiki application servers, (micro)services, network, Infrastructure Foundations, Analytics infrastructure, developer & release tooling, and miscellaneous sites & services) Deploy an ATS backend cache test cluster in core DCs:
  • 2x 4 node clusters
  • Puppetization
  • Application-layer routing
SRE / Traffic EOQ To do To do
Output 1: Current levels of service are maintained and/or improved for all production sites, services and underlying infrastructure. Output 1.1: Deploy, update, configure, and maintain and improve production services, platforms, tooling, and infrastructure (Traffic infrastructure, databases & storage, MediaWiki application servers, (micro)services, network, Infrastructure Foundations, Analytics infrastructure, developer & release tooling, and miscellaneous sites & services)

Deploy a scalable service for ACME (LetsEncrypt) certificate management:

  • Features:
    • Core DC redundancy
    • Wildcards + SANs
    • Multiple client hosts for each cert
  • Coding + Puppetization
  • Deploy in both DCs
  • Live use for one prod cert
SRE / Traffic

EOQ

To do To do
Output 1: Current levels of service are maintained and/or improved for all production sites, services and underlying infrastructure. Output 1.1: Deploy, update, configure, and maintain and improve production services, platforms, tooling, and infrastructure (Traffic infrastructure, databases & storage, MediaWiki application servers, (micro)services, network, Infrastructure Foundations, Analytics infrastructure, developer & release tooling, and miscellaneous sites & services)

Increase network capacity:

  • eqiad: 2 rows with 3*10G racks
  • codfw: 2 rows with 3*10G racks
  • ulsfo: replace routers
  • eqdfw: replace router
SRE / Traffic

EOQ

To do To do
Output 1: Current levels of service are maintained and/or improved for all production sites, services and underlying infrastructure. Output 1.1: Deploy, update, configure, and maintain and improve production services, platforms, tooling, and infrastructure (Traffic infrastructure, databases & storage, MediaWiki application servers, (micro)services, network, Infrastructure Foundations, Analytics infrastructure, developer & release tooling, and miscellaneous sites & services) Improve synthetic monitoring
  • Monitor more WMF sites, including some in each deploy group
  • Configure alerts such that they're directed to the teams that are able to address them
Performance EOQ To do To do
Output 1: Current levels of service are maintained and/or improved for all production sites, services and underlying infrastructure. Output 1.1: Deploy, update, configure, and maintain and improve production services, platforms, tooling, and infrastructure (Traffic infrastructure, databases & storage, MediaWiki application servers, (micro)services, network, Infrastructure Foundations, Analytics infrastructure, developer & release tooling, and miscellaneous sites & services) Improve Navigation Timing data
  • Update dashboards to use newer navtiming2 keys
  • Move navtiming data from Graphite to Prometheus
Performance EOQ To do To do
Output 1: Current levels of service are maintained and/or improved for all production sites, services and underlying infrastructure. Output 1.1: Deploy, update, configure, and maintain and improve production services, platforms, tooling, and infrastructure (Traffic infrastructure, databases & storage, MediaWiki application servers, (micro)services, network, Infrastructure Foundations, Analytics infrastructure, developer & release tooling, and miscellaneous sites & services) Remove dependency on jQuery from MediaWiki's base JavaScript module Performance EOQ In progress
Output 1: Current levels of service are maintained and/or improved for all production sites, services and underlying infrastructure. Output 1.1: Deploy, update, configure, and maintain and improve production services, platforms, tooling, and infrastructure (Traffic infrastructure, databases & storage, MediaWiki application servers, (micro)services, network, Infrastructure Foundations, Analytics infrastructure, developer & release tooling, and miscellaneous sites & services) Support and develop the MediaWiki ResourceLoader component Performance
  • Core Platform
EOQ In progress
Output 1: Current levels of service are maintained and/or improved for all production sites, services and underlying infrastructure. Output 1.1: Deploy, update, configure, and maintain and improve production services, platforms, tooling, and infrastructure (Traffic infrastructure, databases & storage, MediaWiki application servers, (micro)services, network, Infrastructure Foundations, Analytics infrastructure, developer & release tooling, and miscellaneous sites & services) Support and develop MediaWiki's data access components Performance
  • Core Platform
EOQ In progress
Output 2: Better designed systems Output 2.1: Assist in the architectural design of new services and making them operate at scale Research performance perception in order to identify specific metrics that influence user behavior Performance EOQ To do To do
Outcome 3: Users can leverage a reliable and public Infrastructure as a Service (IaaS) product ecosystem for VPS hosting. Output 3.1: Maintain existing OpenStack infrastructure and services
  • Develop timeline for Ubuntu Trusty deprecation
  • Communicate deprecation timeline to Cloud VPS community
  • Continue replacing Trusty with Debian Jessie/Stretch in infrastructure layer
WMCS EOQ To do To do
Outcome 3: Users can leverage a reliable and public Infrastructure as a Service (IaaS) product ecosystem for VPS hosting. Output 3.2: Replace the current network topology layer with OpenStack Neutron Migrate at least one Cloud VPS project to the eqiad1 region and its Neutron SDN layer WMCS EOQ To do To do

TEC2: Modern Event Platform

Tech Goal: Foundational | Overall goal owner:

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome 1: Output 1.1 - 1.4: TechCom RFCs underway and technical decisions made. Analytics, Services, SRE To do To do


TEC3: Deployment Pipeline

Tech Goal: Sustaining | Overall goal owner:

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome 1: Continuous Integration is unified with production tooling and developer feedback is faster
Output 1.1
Convert current CI builds to use the new tooling (Blubber).
Move verify stage from Minikube to CI k8s namespace in production context Release Engineering SRE EOQ To do


TEC4: PHP7 Migration

Tech Goal: Sustaining | Overall goal owner:

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome 1: Zend PHP7 is the only PHP runtime in use in the WMF environment Output 1.2: All wikis, including wikitech/office/etc, are being run under the Zend PHP7 runtime Allow MediaWiki requests to be served by PHP7 alongside HHVM
  • Install and configure php-fpm alongside HHVM on the application servers
  • Refactor Apache configuration to allow selection of PHP engine based on HTTP request
  • Stretch: Evaluate performance of PHP 7.0, 7.2 versus HHVM, and pick one.
  • Stretch: Refactor the puppet module "mediawiki" classes to role/profile structure
Service Operations
  • Traffic
  • Performance
EOQ To do To do
Outcome 1: Zend PHP7 is the only PHP runtime in use in the WMF environment Output 1.2: All wikis, including wikitech/office/etc, are being run under the Zend PHP7 runtime A sampling profiler that works under PHP7 has been identified and is prepared for use in the WMF production environment. If no appropriate profiler can be identified, then a statement of work for contractor effort to build one is prepared. Performance

Core Platform

EOQ To do To do


TEC5: Scoring Platform

Tech Goal: Supporting our Community of contributors | Overall goal owner:

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome 2: Grow the community of wiki decision process modelers and tool builders (staff, volunteers, academics) Documentation -- Threshold optimizations in The ORES Manual Scoring Platform Cloud Services EOQ To do To do
Outcome 3: Users of ORES-based-tools can build a repository of human judgement to contrast with model-predictions JADE --> Production Scoring Platform SRE EOQ To do To do
Outcome 2: Grow the community of wiki decision process modelers and tool builders (staff, volunteers, academics) Developing a focus group for JADE Scoring Platform n/a EOQ To do To do
Outcome 1: More wiki communities benefit from semi-automated curation support Keep ORES online and improve robustness Scoring Platform SRE Ongoing In progress

TEC6: Address Infrastructure Gaps

Tech Goal: Sustaining | Overall goal owner:

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome 3: Wikimedia projects and content are protected against major disasters that threaten availability. Output 4: Strengthen backups with reliable and redundant backup infrastructure Monitor database backup generation for failure or incorrect generation
  • Generate metrics and historic data about databases (objects, table and wiki sizes, growth over time, etc.)
  • Detect and alert on backup metrics anomalies
Data Persistence Infrastructure Foundations EOQ In progress In progress
Outcome 2: Technical staff have increased visibility into the operation of our services and infrastructure. Output 3: Modernize logging, alerting and metrics monitoring infrastructure Adopt Logstash
  • Review Logstash/Kibana's architecture and installation and identify next steps and gaps to be addressed.
  • Audit log producers across the infrastructure and plan their transition to centralized logging.
  • Investigate log shipping methods and standardize on them.
Infrastructure Foundations Search Platform EOQ In progress In progress
Outcome 4: Technical staff are able to implement and maintain services and infrastructure in an efficient manner with a minimal amount of manual tasks. Output 6: Automate common operational tasks around service deployment, maintenance and incident response and build automated workflows for data center infrastructure, network, and equipment lifecycle management Migrate the hardware inventory from Racktables to Netbox
  • Define Netbox existing and custom fields usage standards/best practices
  • Switch over from Racktables to Netbox
  • Stretch: Investigate Netbox reporting capabilities to automatically validate data
  • Stretch: Investigate Netbox potential future integrations, towards a single source of truth
Infrastructure Foundations, Data center operations Traffic EOQ In progress In progress
Outcome x: Output x: To do To do


TEC7: Environmental Sustainability

Tech Goal: Supporting our Community of contributors | Overall goal owner: Erika Bjune

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome: Assessment of WMF's environmental impact Output: Produce an environmental impact report and statement Identify and contract with an organization that can assess WMF's environmental footprint. To do To do
Outcome: Assessment of WMF's environmental impact Output: Produce a Sustainability Roadmap Through interactions with community members and teams at WMF, work on an actionable plan for reducing WMF's environmental footprint To do To do

TEC8: Search Platform

Tech Goal: Foundational | Overall goal owner: Erika Bjune

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome 1: Through incremental Search Platform component improvements, teams and developers can deliver more and better ways for readers and editors to search for content across languages. Output: Incorporate Natural Language Processing (NLP) in the machine learning analysis pipeline for search Select 1 or 2 NLP applications and prototype the features Search Platform Will need some short-term consulting help during implementation EOQ To do To do
Output: Evaluation of image features for search ranking Investigate and evaluate image-level features for image search ranking (i.e., image quality score in ML indexing) (Stretch goal) Search Platform EOQ To do
Output: Better understanding of the effectiveness of our improvements to search and the performance of our tooling on the back end Revise search metrics and dashboard Search Platform, Analytics (Audiences) EOQ To do To do
Output: Improved support for multiple languages by researching and deploying new language analyzers where feasible on individual language wikis. Morphological library investigations and implementations (specific languages TBD) Search Platform EOQ To do To do
Output: Specific media search filters for Wikidata/Wikibase and the relationships to the topics they represent will be better supported using structured data and other techniques. Lexeme search implementation: complete search implementation for all modes for Lemmas and Forms Search Platform, WMDE EOQ To do To do
Investigate applying machine-learning enabled ranking to Wikidata searches, start collecting click data for Wikidata completion searches and start developing machine-learning models for Wikidata search relevancy. Search Platform, WMDE EOQ To do To do
Outcome 2: Technical debt addressed and required maintenance completed for Search Platform components Output: Elasticsearch upgrades and server replacements
  • Continue to prepare for a major upgrade to Elasticsearch 6
  • Replace Elasticsearch servers which are at the end of their lease
  • Migrate Elasticsearch servers to RAID 0
Search Platform, SRE EOQ To do To do
Output: Higher capacity for WDQS to improve its ability to power features on-wiki for readers and the growing set of features for supporting structured data
  • Add storage to WDQS servers
  • Enable Kafka event consumption
  • Separate the Wikidata Elasticsearch implementation into a separate extension
  • Investigate Blazegraph support options and alternatives (Stretch goal)
Search Platform, SRE, WMDE EOQ To do To do

TEC9: Address Knowledge Gaps

Tech Goal: Supporting our Community of contributors | Overall goal owner: Leila Zia

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome 1: One or more of the following: Interested editors will be able to use recommendation services that will allow them to have relevant information about the articles they want to edit immediately at their repository. Newcomers can more easily contribute to Wikimedia projects. Editathon organizers benefit from automatically generated templates and recommendations that can help them in onboarding new or less experienced editors. More readers will have access to content in their local languages. Developers will be able to surface a more diverse set of recommendations for article expansion through their tools. Output 0: A unit test to measure the bias of recommendation algorithms
  • A report of the state of the art on bias detection and algorithm auditability in the context of recommendation systems (based on the review of the literature and industry interviews)
Research EOQ To do To do
Output 2: Section recommendation algorithm in many languages.
  • Build a section recommender system based on the section mapping algorithm
Research EOQ To do To do
Output 4: Public test (vs. production) APIs corresponding to algorithms designed and tested in other outputs.
  • Build a test API for the section recommendation algorithm in Output 2
Research EOQ To do To do
Outcome 2: Interested editors, developers, and partners can identify more types of gaps in content Output 1: An improved task recommendation gadget or API
  • Improve article recommendation API to completion (of the second stage improvements)
Research EOQ In progress In progress
Output 2: A framework for understanding and measuring the knowledge gaps and inequality of access to knowledge that includes reader representation by demographics and characterizes readers who come to Wikipedia based on their readership characteristics as well as demographics.
  • Explore the interlanguage navigation patterns as a first approach to understand knowledge gaps in specific languages.
  • Characterize Wikipedia readers across languages based on survey responses, request, article, and session activity.
Research Formal collaborators EOQ
  • To do To do
  • In progress In progress
Outcome 3: More minority voices and diverse newcomers in Wikimedia projects can stay longer on the projects to contribute. Output 1: An improved socio-technical framework to remove the barriers for contribution by populations that are currently considered minorities on our projects.
  • Run an experiment to test the effectiveness of the first design of the framework
  • Provide an early analysis of the experiment and iterate if needed
Research Formal collaborators, Legal EOQ
  • In progress In progress
  • To do To do
Output 2: An algorithm to address Wikipedia's cold-start problem when it comes to learning user interests when they join the project.
  • Run the experiment to test the quality of the algorithm to elicit user interests
  • Analyze the result of the experiment. Devise next steps.
Research Formal collaborators, Legal
  • In progress In progress
  • To do To do
Outcome 4: More decision makers can make more informed decisions about the audiences to target, the gaps to prioritize, and other research findings. More researchers can build on top of the knowledge generated through this research. Output 1: Citable knowledge about the state of gaps in Wikimedia projects, the needs of Wikimedia users by demographics, and beyond.
  • Submit the research on characterizing Wikipedia readers across languages for publication (stretch)
Research Formal collaborators EOQ
  • In progress In progress


TEC10: Build Technical Community

Tech Goal: Support our Technical Community | Overall goal owner: Bryan Davis

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome 1: Technical Writing Output 1.3: Attract and foster a robust community of skilled and aspiring technical writing contributors Research potential partnerships/joint programs with other FOSS-oriented communities and organizations DevAdv EOQ In progress In progress
Outcome 3: Support use of Wikimedia services Output 3.1: Promote Wikimedia products at relevant conferences, hackathons, and within the Wikimedia communities
  • Promote Cloud Services products at Wikimania Hackathon
  • Promote FOSS participation at Wikimedia Hackathon
  • Assist in New Developer mentoring program at Wikimedia Hackathon
  • Promote Technical Writing tasks at Wikimedia Hackathon
WMCS, DevAdv EOQ To do To do
Outcome 8: Developer Advocacy Output 8.1: Update MediaWiki.org homepage and other key content pages Update visual design and content of MediaWiki.org Main Page DevAdv EOQ In progress In progress
Outcome 8: Developer Advocacy Output 8.4: Collect or create content Investigate and improve MediaWiki Action API documentation DevAdv EOQ In progress In progress


TEC11: Support Fundraising Activities

Tech Goal: Foundational | Overall goal owner:

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome 1: Support fundraising activities Output 1.1: Support Q1 campaigns Make Ingenico Campaign Ready Fr-tech Ingenico EOQ To do To do
Outcome 1: Support fundraising activities Output 1.2 Ensure scalability and maintainability for Q2 English campaigns Run 1 integrity test of the incoming data between Kafkatee and Event Logging. Start bug fixing process based on test results. Fr-tech Analytics EOQ To do To do

TEC12: Developer Productivity

Tech Goal: Support our Technical Community | Overall goal owner:

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome 1: Local development is unified with testing and production All. Make a hire to create the capacity needed for this program. Release Engineering WMF Recruiting End of Q2 To do To do
Outcome 2: Developer satisfaction improves year over year.
Output 2.1
A survey is created and shared with all Wikimedia developers to measure overall satisfaction and productivity as well as identifying the most sought after needs. The top needs are shared as key goals for the team.
Write and share a survey to measure developer satisfaction and areas for investment. - task T197635 Release Engineering
  • WMF Learning & Evaluation
  • WMF Legal
EOQ To do To do


TEC13: Code Health

Tech Goal: Sustaining | Overall goal owner:

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome 1: Increase software stewardship levels of our deployed code
Output 1.1
Assess deployed code and prioritize stewardship gaps.
  • Investigate and propose record of origin (ROO) for deployed code (currently Developers/Maintainers page)
Release Engineering Code Health Group EOQ To do To do
Output 1.3
Using stewardship review process, create plan of action for top priority items each quarter.
  • Perform existing review process for Q1 cycle.
Release Engineering Development Teams EOQ To do To do
Outcome 2: We reduce the number of testable regressions from hitting our users
Output 2.1
Integrate regression testability evaluation into our on-going post-mortem process.
  • Add test evaluation to post mortem review process.
Release Engineering Release Engineering (re: TEC3, Outcome 2) EOQ To do To do
Output 2.2
Jointly create smoke tests addressing high priority needs for 15 projects over the year.
  • Review existing e2e test coverage.
  • Define prioritization scheme.
  • Prioritize e2e testing gaps.
Release Engineering Development Teams EOQ To do To do
Output 2.3
Pro-actively add unit tests to MediaWiki core and deployed extensions.
  • Make current unit testing coverage more visible by reporting out to Engineering Management.
  • (maybe, TBC) set coverage goals
Development Teams, Code Health Group EOQ To do To do
Outcome 3: Reduce Technical Debt
Output 3.2
Tech Debt Management process rolled out.
  • Platform and Search Platform teams are using TDM PoC
Release Engineering Development Teams EOQ To do To do
Output 3.4
Reduce technical debt in the MediaWiki core, by refactoring and improving internal interfaces and policies.
  • Identify key Tech Debt areas
  • Put in place Tech Debt management process for PEP
Release Engineering Platform Team EOQ To do To do
Outcome 4: Increase visibility into Code Health
Output 4.1
Define Code Health Metrics
  • Define base Code Health metric set.
Release Engineering Code Health Group EOQ To do To do


TEC14: Smart Tools for Better Data

Tech Goal: Supporting our Community of contributors | Overall goal owner:

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome: Wikimedia Cloud Services users have easy access to high quality analytics data to answer questions about content and contributors. Output: Provision a cluster for public Data Lake access in Cloud Services Order Data Lake hardware and provide rationale for SQL engine used to make data accessible in labs task T198424 Analytics To do
Outcome: Foundation staff and community have better visual tools to access data about content, contributors and readers. Output: Wikistats 2.0 - Users (and programmatic tools) have access to most reports that community consultation found of importance Analytics To do
Outcome: Foundation staff and community have better visual tools to access data about content, contributors and readers. Output: Wikistats 2.0 - Beta (carry on items from last quarter) Analytics To do
Outcome: Foundation staff and community have better visual tools to access data about content, contributors and readers. Output: Support for more data sources and programming languages for WMF Jupyter Notebook users. Better integration of Jupyter with Spark task T190443 Analytics To do
Outcome: Foundation staff and community have better visual tools to access data about content, contributors and readers. Output: Users see improvements on data computing and data quality. Data Sanitization backend for Hadoop that includes ability to salt & hash. task T198426 Analytics To do
Outcome: We have scalable, performant and reliable software for data transport Output: Software maintenance on analytics stack to maintain current level of service Spin out a tiny EventLogging RL module for lightweight logging task T187207 Analytics To do
Outcome: Users see improvements on data computing and data quality. Output: MediaWiki content is available on cluster on recurrent schedule STRETCH GOAL: Productionize MediaWiki content processing. Ingest and process text on every Wikipedia page to use later for analytics-style computations task T190858 Analytics To do
Outcome: Users see improvements on data computing and data quality. Output: More efficient Bot filtering on pageview data. STRETCH GOAL: task T190858 Analytics To do
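
As an illustration of the "salt & hash" sanitization named in the Data Sanitization goal above, here is a minimal sketch. It is not the Analytics team's implementation. It assumes a keyed hash (HMAC-SHA256) with a secret random salt; the function names `new_salt`, `sanitize_field` and `sanitize_record`, and the example record, are hypothetical.

```python
import hashlib
import hmac
import secrets


def new_salt(num_bytes: int = 32) -> bytes:
    """Generate a random secret salt (hypothetical helper)."""
    return secrets.token_bytes(num_bytes)


def sanitize_field(value: str, salt: bytes) -> str:
    """Replace a sensitive value with a salted (keyed) SHA-256 digest.

    The same value and salt always map to the same digest, so records can
    still be grouped, but the original value is not stored.
    """
    return hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()


def sanitize_record(record: dict, sensitive_fields: set, salt: bytes) -> dict:
    """Return a copy of the record with the sensitive fields hashed."""
    return {
        key: sanitize_field(str(val), salt) if key in sensitive_fields else val
        for key, val in record.items()
    }


if __name__ == "__main__":
    salt = new_salt()
    event = {"ip": "203.0.113.7", "page": "Main_Page"}  # made-up example record
    print(sanitize_record(event, {"ip"}, salt))
```

Under these assumptions, discarding or rotating the salt after a retention window would prevent hashed values from being linked across windows; whether the actual backend does this is not stated in the goal.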

Cross-departmental programs

CDP1: Privacy, Security, and Data Management

Segment 2 - Security

Overall goal owner: John Bennett

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome 1: Ensure the high-quality protection and security of our infrastructure and data. Output 1: Review and update current security policies, standards and procedures Review and mature security awareness functions:
  • Create Risk Taxonomy for evaluating IT Risk.
  • Analytics Security Review
  • CSP partial rollout
Security EOQ To do To do
Outcome 1: Ensure the high-quality protection and security of our infrastructure and data. Output 2: Reduce risk, improve application security practices, improve code quality, reduce vulnerabilities and attack surface and encourage a secure by design approach. Testing campaigns:
  • Incident Response tabletop exercises
  • Phishing/Security Awareness Campaign
  • Penetration testing for English Wikipedia site
Security EOQ To do To do
Outcome 1: Ensure the high-quality protection and security of our infrastructure and data. Output 3: Increase maturity and capabilities in the event of a security incident. Update security policies and do at least one security release Security EOQ To do To do

Segment 3 - Analytics

Overall goal owner: NRuiz (WMF)

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome x: Output x: Threat Model of Analytics stack Analytics Security To do To do
Outcome x: Output x: STRETCH GOAL: Prototype in labs new security measures for cluster task T198227 Analytics Security To do

Segment 10 - Technology

Overall goal owner:

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome x: Output x: To do To do
Outcome x: Output x: To do To do

CDP2: Platform Evolution

Segment 7 - Core Platform Team

Overall goal owner: Corey Floyd

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome 2:
Engineers are able to access more functionality of the stack using well encapsulated components and well defined APIs
Output 2.2:
Modularized RESTBase
Research, document and develop a specification for the storage API. Propose an RFC. Core Platform Team None EOQ To do To do

Segment 8 - Core Platform Team

Overall goal owner: Corey Floyd

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome 1:
Engineers have a clear understanding of our technology stack and the plan to better scale, maintain and test it
Output 1.1:
Architecture Spec for the WMF technology stack
Gather, analyze and prioritize requirements, feedback, issues and documentation gaps from stakeholders in order to develop the architecture plan and prepare for the Wikimedia Technical Conference Core Platform Team Platform Stakeholders including Audiences, Technology Platform and Services teams, and WMDE. EOQ To do


CDP3: Knowledge Integrity

Segment 1 - Research

Overall goal owner: Dario Taraborelli

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome 1: Wikimedia contributors are better able to focus and prioritize their sourcing efforts and Product teams can build the best user experiences to support readers’ learning goals and their digital literacy. Output 1: A map of verifiability of information in Wikimedia projects
  • Design and test an end-to-end machine learning framework to identify statements in need of a citation.
  • Improve the taxonomy of reasons why editors add citations to Wikipedia statements
  • Design the experiment and collect larger-scale data about reasons why people add citations
Research Formal collaborators EOQ In progress In progress
Output 2: Research to understand how readers use citations
  • Prepare the data and do preliminary analysis on the first data collection on citation usage based on data gathered via Citation Usage schema
Research Formal collaborators EOQ In progress In progress
Outcome 4: More knowledge professionals and other contributors are motivated to join the effort to build an open citation ecosystem, and are more able to actively improve the structure, quantity, and quality of citations on Wikimedia projects. Output 6: Funding the WikiCite event series
  • Fundraise for the annual meeting in the WikiCite series and set of satellite events, to improve the sustainability and global reach of the initiative.
Research Advancement EOQ Done

CDP4: Structured data

Segment 2 - Search Platform

Overall goal owner: Erika Bjune

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome: It is easier for people to discover, learn, and manage the free media stored on Commons and thereby incentivize higher contribution rates Output 1: Commons search will be extended via CirrusSearch and Elasticsearch and Wikidata Query Service, to support searching based on structured data elements describing media. Support the implementation of "depicts" search functionality Search Platform, SDC To do To do

Segment 4 - Technical Collaboration

Overall goal owner:

Annual Plan Outcome Annual Plan Output Quarterly Goal(s) Primary Team(s) Dependencies ETA Status
Outcome x: Output x: To do To do
Outcome x: Output x: To do To do

FY17-18 Segment 4 - Programs

Goal owner: Jonathan Morgan

Annual plan outcome Annual plan objective(s) Quarterly Work (or Goal) Primary Team(s) Dependencies Tech Goal ETA Status
Outcome 2: Develop a better understanding of existing needs for Structured Commons Objective 2: Write case studies and documentation for Commons and Wikidata projects that allow project development among Wikimedia Communities and allow us to identify gaps in existing tools (task T171252).
  • Publish Commons reuser research report (task T190228) (continued from FY17-18, Q4)
Research Community programs, Multimedia C Q1 FY18/19 In progress In progress