Architecture meetings/Amsterdam Hackathon 2013 Day 2

From mediawiki.org

Meeting at Amsterdam Hackathon 2013

Date: 2013-05-25

Attendees: Robla, Daniel K, Roan, MaxSem, Yuri, Lucas, Brad, TheDJ, Sam, Gabriel, Niklas, Jeroen De Dauw, Hashar, Ori, Siebrand, Tim, MarkB, Faidon, Timo, Subbu, Tyler Romeo, MarkAHershberger

Recap from Day 1

  • No consensus yet on how to handle value objects separated from storage interfaces.
  • Reference material on the wiki to make review easier ("fight the document, not the reviewer").

DanielK: Identify problems to solve. Unit tests not very well (mostly integration tests instead of unit tests).

By nature due to classes often mixing logic currently.

Jeroen: In wikidata we have a policy that new components must have unit tests.

Separate classes make classes easier to understand.

Max: Too many classes makes code harder to learn for new comers?

it's important that running tests is easy for users. Jenkins may run them but that's after submitting to Gerrit. Though we shouldn't desire to run *everything* locally, the general test suite (phpunit on 1 database backend + parser tests) must be easy to run.

Timo: PHP code coverage is now enabled and runs on every commit: https://integration.wikimedia.org/cover/mediawiki-core/master/php/

  • Though not all that reliable for integration tests because high level methods call a bunch of lower level methods which are then considered "covered" even though they are not the methods being tested.

Timo: There are coverage tools that account for this now.

Roan: We don't have a lot of documentation about how different classes relate to each other (we have function and class level documentation using doxygen, but not very well maintained high-level documentation on mediawiki.org about how everything relates together - 10,000 foot view).

Yuri: A general rule for everything (separate classes vs. simple solo classes) may not be the best way. Whatever makes sense for the specific use case.

Robla: We're often on the hook for large reviews of the code base.

Yuri: Have a person on-call for advice on how to structure something that is about to be written.

Robla: Tim is often available for "Capital A" architecture reviews of new components.

You don't want to write it all and then have review to do it all over again.

Brion: Centralising design before code is being written.

Yuri: Adressing wikitech-l with a need for a structure, I;ll get too many replies and bikeshed.

MarkH: When code shows up in Gerrit that hasn't been pre-approved in terms of structure....

Brion: If from a contributor who isn't experienced in MediaWiki..

Robla: If a new contrib comes up and refactors EditPage, no matter the socialize, we need to agree on that beforehand.

Brad: As for socialising, having the guidelines written down will go a long way from reviewer's like "we do it this way" without a reference.

Robla: Should these arch guidelines be like a law? Examples to code reviews.

Gabriel: It's more important to know where you want to end up than to document lots of examples on how not to do things.

Hashar: Specific cases like: Parent object -> child object – or caching something. There should be defacto standard goto patterns for it.

> Though not the actual "goto" :P (hashar: nooo goto :D )
  • Avoid the bashslash "namespace" code convention problem again. Consistency is more important than some weird edge case exception where something might be a bit more efficient.

Yuri: You can't easily move a public class to a namespace.

Hashar: We should settle on whether or not to use namespaces. If not, we should rename some of our core classes to be less generic.

Robla: Namespaces is a good one to hash out, but we won't be able to make a decision right now.

Hashar (etherpad only): same should happen with the global variables!

Review feedback:

DanielK: Sometimes there's little to no feedback on gerrit about the structure. Sometimes a lot.

Robla: Some periodic meeting to walk through upcoming features and their patterns.

Timo: (knows will fil later0

Daniel: Bugzilla is not a great platform, but in Wikidata we try to use it for both bugs and features. This helps in encouraging discussion about patterns for discussions prior to implementation. In core we often only use it for actual bugs.

Robla: Improve our RFC process!

Timo: Lots of RFCs: https://www.mediawiki.org/wiki/RFC The most recent one that made it to implementation is from 2011-05.

  • Settle on simple principles. Just "don't mix logic with ui" (separation of concerns). The lack of this is hurting us exponentially.

TheDJ: Document our current state better (e.g. document the class as what it should be and point out what's wrong with it).

Yuri: From #1, todo comments

Gabriel: From #1, fooStorage/foo. Data objects that are easy to serialize.

DanielK: You can take the separation of concerns to an infinite level. We should stop at a reasonable level of separation. Basic separation is already a lot better than what we have now. Take small steps and achieve that first.

ssastry: mostly independent components/subsystems/services where each subsystem has some form of a model - controller (and view where appropriate, for example user-facing components/services) structure.

csteip: (for Security) Too hard too track origin of values (http, uploads, database, user input). Deep class inheritance and factory methods make it hard to track where stuff is coming from and who's responsible for what.

robla: chunked uploads issue from recent @csteip

csteip: Uploads can go to stash, chunked, by url etc. (different classes). Validation is different based on parameters. When it was actually written to the file system, it failed due to differences in how wmf file system and local filesystem. We mustn't have (web) executable files in (web) accessible places.

  • Always validate. But chunks are in nature partial so validation is less straight forward. Don't have chunks in web accessible places (move once complete and validated). -> validation can be slow on large files, so doing it during the upload is attractive.

Validate with streams in the background?

csteip: For now we validate chunks (which can escape the pattern), and then the file as a whole when it has completed. This will become problematic for e.g. a 2 GB file. Ideally we validate during the upload. sides

Moving forward with the RFC Greater will be written and more significant. Furhter work on the arc doc When we work on code, this is how we do things.


Mailing list / wiki page? Or more meetings?

We'll meet at Wikimania again at least.

Let's try to get a reasonable draft going before that (during online meetings we can co-edit over etherpad).

Goal for before Wikimania: Rountinely refer to this document in RFCs. Instead of writing essays on RFC responses, leave small responses and write up a generic answer in the arc doc and refer to that instead.