Data Platform Engineering/Data Platform SRE/Priorities

From mediawiki.org

Here are the high-level priorities of the DPE SRE team. The detailed backlog is on our main Phabricator board. Current work can be followed on our "milestone" Phabricator board; there is no stable link to the current milestone, but it is linked from the menu of our main board.

Current main projects

Archiva is our current solution for hosting artifacts for Java / Scala projects and for mirroring external Maven repositories. It is unsupported and, as a critical piece of our development and deployment infrastructure, needs to be replaced. GitLab provides the functionality that we need and is already deployed in our infrastructure, so it is the obvious replacement.

This project is driven by DPE SRE, but most of the implementation work is done by Search Platform, Data Engineering and Data Products. Those teams take it on in addition to their usual work, so it is slow moving.

Hadoop upgrade:

Kubernetes upgrade

We need to make sure that the dse-k8s-eqiad cluster runs the new version of Kubernetes. ServiceOps has largely prepared the upgrade and has begun rolling it out to wikikube, and we need to follow suit with our own cluster. Some plugins and operators are specific to the dse-k8s cluster, so the upgrade needs particular care. We have already completed some preparatory work in T369492 and T377875.

The plan is to test the upgrade on the new dse-k8s-codfw cluster first and confirm that all operators and plugins work, then apply the upgrade to the eqiad cluster as well.

Mutualized OpenSearch cluster:

Migrate Current-Generation Dumps to Airflow

Recently Completed Projects

To simplify operations and increase availability, we are migrating Airflow to k8s.

To support the deprecation and removal of Graphite

To support work by the Search Platform team. In particular, DPE SRE is focused on the migration of the internal WDQS clients and on operational support of the underlying servers and platform.

Migration of the Search cluster from Elasticsearch to OpenSearch:

Usual operational work

  • Incidents
  • Various minor software upgrades
  • Access requests
  • SPARQL Federation requests

High-level backlog of projects

  • Kafka upgrade: design doc
  • Spark upgrade
  • Migration of additional services to k8s
    • Presto
    • JupyterHub