Topic on Talk:Wikimedia Engineering Architecture Principles

More MUST, less SHOULD

DKinzler (WMF) (talkcontribs)

Perhaps more goals should be made into strong requirements (MUST), e.g. the principle of "data austerity", to collect and retain only the data we actually need.

While we should be careful not to create hard requirements that we cannot always meet, which would lead to such requirements not being taken seriously, we shouldn't be "doubly soft": We can for instance say that horizontal scalability MUST be a design goal for services with high load - it being a goal does not mean the software cannot be deployed if this goal is not fully met.

BDavis (WMF) (talkcontribs)

Currently the document uses:

  • SHOULD - 49 times
  • MUST - 8 times
  • MAY - 1 time
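
Counts like the ones above can be reproduced with a short script. The following is only a minimal sketch, not the method actually used for these numbers: it assumes that only uppercase RFC 2119 keywords are counted and that "MUST NOT" and "SHOULD NOT" are tallied separately from "MUST" and "SHOULD". The function name `count_keywords` and the sample text are hypothetical.

```python
import re
from collections import Counter

# RFC 2119 keywords of interest. Longer phrases come first so that
# "MUST NOT" is matched as one keyword rather than as a bare "MUST".
KEYWORDS = ["MUST NOT", "SHOULD NOT", "MUST", "SHOULD", "MAY"]
PATTERN = re.compile(r"\b(" + "|".join(KEYWORDS) + r")\b")


def count_keywords(text: str) -> Counter:
    """Count uppercase requirement keywords; lowercase 'should' etc. are ignored."""
    return Counter(PATTERN.findall(text))


if __name__ == "__main__":
    # Hypothetical sample text, not taken from the principles document.
    sample = (
        "Code MUST be tested. Docs SHOULD exist. "
        "Secrets MUST NOT be logged. You may skip this. Teams MAY opt in."
    )
    for keyword, n in count_keywords(sample).most_common():
        print(f"{keyword} - {n} time(s)")
```

Running it against the plain text of the document after each revision would make it easy to see how the MUST/SHOULD balance shifts over time.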

I just read the whole thing for the first time, and honestly it felt like a passive-aggressive attack from someone giving drive-by code review.

BDavis (WMF) (talkcontribs)

Even the document itself only has a SHOULD endorsement from the Wikimedia Technical Committee.

MHolloway (WMF) (talkcontribs)

Agreed. This struck me right away when reading the document. As a reflection of current reality, it seems pretty accurate, but if it's meant to be prescriptive (as I assume is the case), I'd like to see stronger stands, or at least some written justification of why SHOULDs are SHOULDs and not MUSTs.

Personally, I consider at least the following current SHOULDs to be MUSTs:

  • All points under the heading "To ensure the data integrity of the content on WMF systems, and protect the privacy of our users"
  • software that interacts with users MUST be designed to make key functionality available on devices with a variety of capability and restrictions [I'd also add form factors], as well as potentially limited connectivity
  • software that interacts with users MUST follow accessibility guidelines
  • data formats and APIs that provide access to user generated content MUST be designed to ensure verifiability through the integration of provenance information
  • data formats and APIs that provide access to user generated content MUST be designed to provide easy access to all necessary licensing information

Kaldari (talkcontribs)

Would be nice to change "comprehensive documentation SHOULD be maintained along with the code" to MUST. I don't think that would be controversial. I don't agree with MHolloway about making "follow accessibility guidelines" a MUST. There are rare cases where other considerations (including accessibility) override accessibility guidelines.

CPettet (WMF) (talkcontribs)

What does 'MUST' mean in this case? Is there teeth to it?

DKinzler (WMF) (talkcontribs)

> What does 'MUST' mean in this case? Is there teeth to it?

The "teeth" depend on the people enforcing this. At the very minimum, RFCs that violate a MUST will not be approved. Ideally, no code that violates a MUST is deployed.

If we are serious about the MUST, any code that is currently live but violates a MUST would have to be pulled. If we made everything suggested in this thread a MUST and pulled everything that doesn't comply, we'd have to shut down the site tomorrow. That's actually the reason for having a lot of SHOULD and not that many MUSTs.

Maybe it makes more sense to go with a softer interpretation of MUST, one that essentially applies it only to new code and to major changes and rewrites. If we interpret it that way, we can have a lot more MUSTs. Does that sound good?

BDavis (WMF) (talkcontribs)

I think it would be better to write the standard that the working group wants even if there are parts that are aspirational. A list of "grandfathered" applications with known violations could be offered as an appendix if needed and be footnoted into the standard when appropriate.

Tgr (WMF) (talkcontribs)

Uppercase MUST invokes RFC 2119 (MUST: This word, or the terms "REQUIRED" or "SHALL", mean that the definition is an absolute requirement of the specification.) to anyone familiar with it. I'd rather use it less often than water it down. We can make allowances for legacy code that's already in production, but for new code or changes to existing code MUST should really mean that anything violating that will not be merged or deployed ever, even if there's a deadline or a grant or a mob of editors with pitchforks at the WMF office entrance or whatever.

Tgr (WMF) (talkcontribs)

Also, maybe it would be a worthwhile exercise to list which existing practices (i.e. not legacy code but development practices we follow today) violate any of the MUSTs? E.g. "data we offer for re-use MUST use clearly specified data schemas" is probably not true for most things (wikitext? Action API response formats? file metadata? ...I guess it comes down to what exactly is meant by data schema).

DKinzler (WMF) (talkcontribs)

I'm happy with more MUST if we are really serious about enforcing it, and nobody comes crying once we do...

DKinzler (WMF) (talkcontribs)

I changed a number of SHOULDs to MUSTs now. The document now has 43 SHOULDs and 21 MUSTs.

Tgr (WMF) (talkcontribs)

The document now says "When existing code is discovered to violate a MUST or SHOULD principle, steps for making the code compliant with the architecture principles need to be planned." which is ambitious, but maybe that's a good thing :) Any plans on how this should work in practice? If I find such code, where do I report it, who is responsible for planning the steps, who is responsible for actually making it happen?

DKinzler (WMF) (talkcontribs)

The concrete steps necessary will be very different from case to case. Some such changes only take a 20 minute patch, some may need major refactoring or changes in infrastructure.

The only general answer I can give is "track it on phabricator, so it becomes visible".

As to who is responsible - I'd say either the person or group who wrote the offending code (mostly for newer code), or the group who owns it (mostly for older code).

Tgr (WMF) (talkcontribs)

Also comprehensive documentation is now a MUST. While most other things can be enforced in the planning or code review phase, documentation normally only happens when the code is live (at which point code authors often lose interest). How do we ensure that it does not get forgotten?

DKinzler (WMF) (talkcontribs)

Documentation of architecture and information flow should ideally be written before the code (as specifications, plans, RFCs, etc). It should be required to be merged into the repo along with the code, just like tests, and just like method-level and class-level documentation. Documentation (and testing) should not be an afterthought.

Documentation for end-users and wiki-owners will generally not live in the same repo as the code, since it's less "bound" to the code, but it should, ideally, also be written *before* the code, as user stories, UI designs, etc. Turning the plan into proper documentation may happen after the fact, but then it's the responsibility of the team who deployed the feature to make it happen.

In my mind though, the architecture principles don't really apply to documentation for end-users and wiki-owners. Combined with the lead sentence of the section, the principle reads: "To maintain a code base that can be modified with confidence and readily understood, comprehensive documentation MUST be maintained along with the code." I think it's pretty clear that this refers to documentation of the code. Do you think this should be made more explicit?
