Topic on Talk:Wikimedia Engineering Architecture Principles

Audiences Engineering Feedback

ABaso (WMF) (talkcontribs)

We’ve consulted with the engineering units in the Audiences department at the Wikimedia Foundation, and the following are our recommendations. We generally agree with the sentiment of the document, although we want to express our strong support for a heightened emphasis on security and user privacy, as well as our consensus view on re-use and contemporary deployability.

-Audiences Engineering leads: Runa Bhattacharjee, Ryan Kaldari, Adam Baso

Before we dig into other items, first, the phrase “MediaWiki Platform Architecture Principles” should be changed to “Wikimedia Engineering Architecture Principles”.

Major Considerations

The framing of the document should make clear that the goal is not to stop all software development; rather, these principles describe the sort of architecture we’d like in the future. We suggest that the “Application” section be amended to note that investment in improving engineering sustainability should be in a healthy balance with investment in feature development.

The following three items should be changed to MUSTs:

  • our software and infrastructure SHOULD be designed in such a way to prevent unauthorized access to sensitive information, and to minimize the impact of individual components getting compromised.
  • resilience against data corruption SHOULD be a design goal for our system architecture, and be built into the software we write.
  • our software systems SHOULD be designed to only collect data we need, and retain it only as long as necessary.

The requirement “software components SHOULD be designed to be reusable, and be published for re-use” should be changed to “software components with broad applicability MUST be designed and published for re-use; software components limited to Wikimedia project-specific use MAY be designed without the need for re-use but MUST be published for auditability”.

The requirements “the MediaWiki stack SHOULD be easy to deploy on standard hosting platforms” and “small MediaWiki instances SHOULD function in low-budget hosting environments” should be amended to reflect the Wikimedia Technical Conference 2018 decision about shared hosting. Additionally, they should be amended so that the target audience and platform are specified for each component.

We’re unsure how this should be worded, but we believe that observability and analytic instrumentation should always be considered for Wikimedia project components. Not all new or changing components will require observability and analytic instrumentation, but there ought to be a pause to consider this.

The phrase “as well as potentially limited connectivity” should be changed to “with tradeoffs explicitly considered in design for mobile form factors and connectivity.”

Terminology Updates

“scripting languages” should be changed to “programming languages”.

The term “domain model” should be clarified where used.

“(annotated HTML)” should be changed to “(e.g., annotated HTML)”.

Follow Up Actions

As a follow-up after ratification of the principles, there is a desire for more concrete examples. For example, there’s a desire for standards on “high granularity” and versioning of web APIs, defined test coverage targets, and guidance on processing existing/discovered technical debt. There is some consideration for these specific examples in Foundation planning, although more concrete examples and some uniformity would be welcome.

DKinzler (WMF) (talkcontribs)

I have implemented several changes according to the feedback above:

Before we dig into other items, first, the phrase “MediaWiki Platform Architecture Principles” should be changed to “Wikimedia Engineering Architecture Principles”.

Done.

  • our software and infrastructure SHOULD be designed in such a way to prevent unauthorized access to sensitive information, and to minimize the impact of individual components getting compromised.
  • resilience against data corruption SHOULD be a design goal for our system architecture, and be built into the software we write.
  • our software systems SHOULD be designed to only collect data we need, and retain it only as long as necessary.

Done

The requirement “software components SHOULD be designed to be reusable, and be published for re-use” should be changed to “software components with broad applicability MUST be designed and published for re-use; software components limited to Wikimedia project-specific use MAY be designed without the need for re-use but MUST be published for auditability”.

Done somewhat differently

The phrase “as well as potentially limited connectivity” should be changed to “with tradeoffs explicitly considered in design for mobile form factors and connectivity.”

Done

“scripting languages” should be changed to “programming languages”.

Done. "community with ways to develop workflows using scripting languages" was written with an eye to Scribunto and Gadgets, but I think there is no harm in a broader phrasing. The original wording contained the phrase "on-wiki", but that is gone now anyway.

The term “domain model” should be clarified where used.

Done. Are there any other terms that need clarification, or that should be linked to a Wikipedia article?

“(annotated HTML)” should be changed to “(e.g., annotated HTML)”.

Done as well.

ABaso (WMF) (talkcontribs)

Thanks. Seeing as you asked about terms needing clarification, here are some more:

  • "APIs and libraries" might at present be misread as not including "services". This is always a difficult nomenclature problem, as an API often means the interface at the class level as well as a network-exposed API, but services may or may not be network-exposed. I usually throw my hands up at this point and use the term "component".
  • "through the integration of provenance information" could use an illuminating for-example.
  • the term "standard hosting platform" should be disambiguated.
DKinzler (WMF) (talkcontribs)

"APIs and libraries" might at present read incorrectly to not include "services".

Well, it includes the APIs of services. The kind of community-maintained code we are talking about here includes Gadgets, Lua modules, extensions, and bots. All of these use APIs, and it would be nice if we could supply them with libraries. I will add "services", but it seems redundant, and it may be taken to include service objects, as opposed to web-exposed services.

"through the integration of provenance information"

What this really means is "when exposing parts of a wiki page via an API, also expose the relevant citations". But that seems too concrete to include in the policy. Also, on re-reading this, MUST seems too strong here: this is rather hard to do, and a MUST would block any new feature that doesn't do it.
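
Purely as an illustration of that idea, here is a minimal Python sketch of what exposing citations alongside a page fragment could look like. Every name in it (Citation, Fragment, expose_fragment, and the shape of the returned dictionary) is hypothetical and invented for this example; none of it is an existing MediaWiki API or a proposed design.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Citation:
    ref_id: str  # hypothetical identifier of a reference on the page
    source: str  # human-readable description of the cited source


@dataclass
class Fragment:
    text: str                                          # the part of the page being exposed
    ref_ids: List[str] = field(default_factory=list)   # references this fragment relies on


def expose_fragment(fragment: Fragment, page_citations: Dict[str, Citation]) -> dict:
    """Return the fragment text together with the citations it references.

    References that cannot be resolved on the page are silently skipped;
    a real design would have to decide how to report them.
    """
    provenance = [
        {"ref_id": ref_id, "source": page_citations[ref_id].source}
        for ref_id in fragment.ref_ids
        if ref_id in page_citations
    ]
    return {"text": fragment.text, "provenance": provenance}
```

The sketch also hints at why this is hard in practice: the mapping from a fragment to the references it depends on has to come from somewhere, and it is assumed here rather than derived.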

"standard hosting platform"

This was intended to be future-compatible. It currently means "vanilla LAMP stack with no shell access and no admin rights". But if node.js support becomes standard in such environments, the policy should allow us to make use of that without having to amend it.

DKinzler (WMF) (talkcontribs)

Re this:

We’re unsure how this should be worded, but we believe that observability and analytic instrumentation should always be considered for Wikimedia project components. Not all new or changing components will require observability and analytic instrumentation, but there ought to be a pause to consider this.

I now added:

observability and analytic instrumentation SHOULD be explicitly considered in the design of new components and services.

Does that sound good?

ABaso (WMF) (talkcontribs)

That works. I think the translation here is that for new things there should be a solid reason for not considering it.

DKinzler (WMF) (talkcontribs)

Thanks for the feedback! This all sounds pretty reasonable. I'll probably get around to incorporating this and some of the other feedback next week. I'll let you know once that is done, and we can discuss whether the changes I made seem sufficient to you.

ABaso (WMF) (talkcontribs)

Thanks!

DKinzler (WMF) (talkcontribs)

You wrote:

The requirements “the MediaWiki stack SHOULD be easy to deploy on standard hosting platforms” and “small MediaWiki instances SHOULD function in low-budget hosting environments” should be amended to reflect the Wikimedia Technical Conference 2018 decision about shared hosting. Additionally, they should be amended so that the target audience and platform are specified for each component.

I now added:

for every component and feature, the intended target audience and supported target platform MUST be clearly defined.

However, I'm unsure how to incorporate the decision made at TechConf. It reads:

If we commit to an easy-to-use tool for MW platform installation, configuration, and maintenance, then we can drop support of "one-click installs" on shared hosting environments; A special interest group is necessary to further these goals and facilitate implementation (see Wikimedia Technical Conference/2018/Session notes/Choosing installation methods and environments for 3rd party users#Decisions)

The "if" part has not happened, there is no such commitment, and no such special interest group exists. I would be very happy to see this happening, but until then, the policy should document the status quo: MediaWiki has to run on shared hosting.

Tgr (WMF) (talkcontribs)

IMO the current wording is generic enough to incorporate that - if we provided, say, easy-to-use docker containers with a long-term support commitment, that would be a stack that's easy to deploy on standard low-budget hosting platforms (cloud providers being reasonably standard and low-budget these days).

The one thing I'd maybe change is "SHOULD be easy to deploy and maintain", as containers often tend to be easier to deploy than to operate over an extended period of time, and sufficient thought is not always given to how they can be kept up to date with OS security updates etc.

DKinzler (WMF) (talkcontribs)

Ease of maintenance is, I think, covered by the subsequent bullet points:

  • it SHOULD be possible to install and upgrade MediaWiki without much technical knowledge.
  • it MUST be possible to upgrade MediaWiki without the risk of losing content or disrupting operation.