The Data Engineering team is responsible for the core capabilities of the data platform, including data storage, batch and streaming infrastructure, and distributed query engines.
This platform supports ingestion of Wikimedia project content, web traffic, instrumentation, operational data and other datasets into the Data Lake. The team manages the ingress data pipelines, whereas the data producers manage their respective data pipelines and data products.
The team's responsibilities also include data quality, observability, and discoverability.
The Event Platform has been merged into this team.
Planning & Goal setting
The current quarterly plan (Q2) can be viewed here.
The corresponding OKRs are tracked in Asana.
Backlog & Sprint Backlog
The backlog and current sprint work of the Data Engineering team are tracked in the Data Engineering & Event Platform Phabricator board.
New backlog items are triaged every week. The current sprint cadence is three weeks.
We are currently working on organizing our documentation. In the meantime, have a look at Data Engineering.
To make a request, please see the Intake Process page or contact one of our Product Managers.