Data Platform Engineering


The Data Platform Engineering team was formed in July 2023, bringing together data teams and systems from across the Product and Technology department with the goal of delivering end-to-end data capabilities that meet the needs of data producers and consumers. The group is led by Olja Dimitrijevic, Director of Engineering, and Desiree Abad, Director of Product.

Our teams are:

  • Data Products team supports the development of various data products, such as curated datasets and instrumentation, while ensuring data management and modeling best practices. The team will partner closely with the Research & Decision Science teams to define, develop, and deliver trusted datasets for the wider Wikimedia movement to consume. It is also responsible for the user experience across the visualization (e.g., Superset and dashboards) and experimentation (A/B testing and Jupyter notebooks) platforms.
  • Search Platform team is responsible for the Search features and APIs for MediaWiki. This includes the CirrusSearch extension, which relies on Elasticsearch, the search backend used to support Wikimedia projects. It also includes the Wikidata Query Service, the SPARQL endpoint used to query Wikidata. The team provides both a direct user experience around Search and an API on which higher-level features can be developed.
  • Data Engineering team is responsible for the core capabilities of the data platform, including data storage, batch and streaming infrastructure, and distributed query engines. This platform supports ingestion of Wikimedia project content, web traffic, instrumentation, operational data, and other datasets into the Data Lake. The team manages the foundational data pipelines, whereas data producers manage their own data pipelines and data products. The team's responsibilities also include data quality, observability, and discoverability.
  • Data Platform SRE team supports all of the above teams in managing their infrastructure, applications, and operations.

The Data Platform group is supported by principal engineers Adam Baso, Andrew Otto, and Dana Bredemeyer.

Mission

Our mission is to empower wiki communities and the Wikimedia Foundation to gain insights, conduct research, and build compelling user experiences through access to privacy-aware data and data platform services.

What we do

We provide the infrastructure and services that empower our users to collect, discover, and use trustworthy data to derive insights, conduct research, and build new data products.

The data platform provides capabilities that include:

  • Ingestion
  • Storage
  • Transform and serve
  • Search and query
  • Exploration and analysis
  • Visualization and reporting
  • Publishing and reuse

For more details, see the Data Platform Overview.

Who we serve

We support the open-knowledge communities and the Wikimedia Foundation at large. Specifically:

  • Wiki administrators
  • Wiki readers and editors
  • GLAM programs
  • Analysts
  • Researchers and machine learning practitioners
  • WMF Trust and Safety
  • WMF SRE and Traffic teams
  • WMF Product feature teams
  • WMF Fundraising

Contact Us

To make a request, please see the Intake Process page, or contact one of our Product Managers.