Data Platform Engineering/Data Products/work focus

From mediawiki.org

Data Products Goals

At a high level, our team is currently focused on two hypotheses within the WMF Product & Technology FY 23/24 annual plan.

We are also working on the committed work of Commons Impact Metrics, along with other essential work to maintain the systems we steward and decrease their maintenance burden.

Sprint Goals

The goals for the current sprint (23/10/24 - 23/11/214) are:

  1. [HIGHEST] Commons Impact Metrics: Prep for GLAM Wiki Conference
  2. [HIGH] SDS 2.5: Core Interaction API Design, Implementation & Documentation
  3. [MEDIUM] Transition to 50/25/25 capacity structure
  4. [LOW] Sunset AQS 1

Past Sprints

23/10/23 - 23/10/23

  1. SDS 2.5.1: Prepare to onboard the rest of the team
  2. Traffic to all six services is routed to AQS 2. AQS 1 is ready to sunset.
  3. Technical strategy for Commons Impact Metrics prototype including implementation draft
  4. Dumps 2: Bring to completion, or pause with a plan for the future.
  5. Knowledge gaps: pause until we open work on SDS 3.4


23/09/12 - 23/10/02

  1. At least one client library is refactored to include the new data contract (core schema and schema fragments) and an existing instrument is prototyped [receiving live data?]
    1. Not yet
    2. Two client libraries are almost refactored
    3. Merge requests not quite landed
  2. [Continue] Generate XML dumps for simplewiki
    1. Not yet
    2. XML generated with everything, but with data quality issues from the input
    3. How we import is the remaining work
  3. 100% of traffic routed to Media, Pageviews [Edit and Editor Analytics next]
    1. Media done 🎉
    2. Pageviews is waiting on SRE
  4. Knowledge Gaps Index metrics receive production traffic
    1. Waiting on SRE
  5. Data dumps transition has been clearly communicated across stakeholders
    1. Done 🎉

23/08/28 - 23/09/08

  1. Generate XML dumps for simplewiki
  2. Core interaction schema and schema fragments are prototyped and tested in preparation for updating Metrics Platform client libraries next sprint
  3. 100% of traffic routed to Geo and Media Analytics
  4. Identify and mitigate risks associated with MediaWiki History pipeline