Wikimedia Technical Conference/2018/Session notes/Architecting Core: layers/components/libraries

Architecting our Core Platform: layers components & libraries

https://phabricator.wikimedia.org/T206080

= Session Setup = Facilitator Instructions: /Session_Guide#Session_Guidance_for_facilitators

= Questions to answer during this session =

What are the advantages of modular software over monolithic software?
Exercise: Propose/discuss/agree [5 min]


 * Understand module with no (or limited) understanding of other modules.
 * Clear ownership of module, can work on it without being blocked on discussions with and review by other teams.
 * Refactor module with no (or limited) impact on other modules.
 * Replace module with no (or limited) impact on other modules.
 * Re-use module without the need to pull in other modules.
 * Test module in isolation -> easier to write tests. (integration tests still needed).

Which units of modularizations are useful for this discussion? What defines a “modules” or “component” in this context?
Exercise: Propose/discuss/agree [5 min]


 * If two things depend on each other, they form one module, not two. Corollary: circular dependencies fuse modules into a monolith.  Corollary: to improve modularity, circular dependencies must be removed.
 * Useful units-of-modularity: methods, classes, namespaces, libraries (repos).
 * “Layers” may also be useful when thinking about separation of concerns.
 * “Vocabulary mapping” at “Domain boundaries” can be used to provide “framework isolation”.

Which components in MediaWiki are “knots” that need to be solved to remove circular dependencies?
Exercise: Plot dependencies / knots [15 min]


 * Language <-> Message <-> Parser -> Language
 * User <-> Title  Title: AC 224, EC 36; User AC 166, EC 55
 * Parser -> Linker -> IContextSource -> User+Title+Language+WikiPage
 * Message -> IContextSource
 * Revision -> Content <-> ContentHandler -> IContextSource -> WikiPage -> Revision

What best practices can we follow to reduce and avoid spurious dependencies and improve modularity?
Exercise: offer the below as a starting point, do small group discussion [15+5 min]


 * Define narrow public interfaces (follow caller needs, not implementation pattern; non-newable classes are ok instead of interfaces). Narrow interfaces that are compatible with existing classes are easier to establish and useful stepping stones, even if their contract is not ideal.
 * All dependencies are declared (natural via use clauses and constructor params, if we don’t use global state).
 * No public circular dependencies (introduce interfaces to break them; single-implementation interfaces are useful, they allow callers to declare that they use a given functionality, and don’t use other functionality).
 * Circular dependencies in internals only within the next larger module.
 * No dependency on internals (“internal classes”) of other module, except the ones that contain the code (method->class->namespace->parent-namespace…).
 * Enforce this with code sniffer rules or similar, when not enforced by language. Allow exceptions to be declared, use exceptions as to-do list.
 * It should be made clear to newcomers that the status quo is a bad example, and new code should not follow its example.

Who will do what to improve modularity? What should we all do, what will individuals do?
Exercise: get commitment from individuals (and ideally, budget owners) [10 min]


 * Follow principles.
 * Allocate time to plan and discuss refactoring of some classes.

= Action items =

= Detailed notes =


 * Image showing Plot of dependencies. But reality is probably fa worse. Highlighted ones are dependency for title. (!)
 * I'll be using term module as term for "encapsulating" units of code, starts with methods, goes to clases, and can be namespaces, and most importantly repos. Repos are things that are versioned together. Could be called libraries etc.
 * Slide [modules are good] Why is modularization good? 4 things.
 * Each can be understood sep  (180k lines in core alone)
 * You can change each module independently, without touching everything
 * Can reuse independently.
 * Clear ownership
 * E.g. special base page class, if i want to change, i;'ll have to talk to a lot of people and wait for review from all.
 * More isolation between components, enables moving quicker, and clearer interaction between org parts.
 * Overall Question: What strategy can we use to make MW more modular?
 * Modularization can be done 2 ways. Top down or bottom up. [slides]
 * Top-down - Storage layer value abstraction etc.
 * Bottom-up  - components emerge by identifying groups that have few dependencies on each other (and no circular dependencies).
 * Top down may sound better, but you might find during refactoring that it doesn't work that way.
 * If you have 2 units of modules but try to refactor them you might find they're really just one, because the depend on each other.
 * I propose rules: [slide]
 * No circular dependencies between packages (ideally, namespaces)
 * No circular dependencies between classes from different  packages
 * No public circular dependencies between classes - if a depends on b publicly, it can still depend on it internally, of they are on the same package. Useful for backward compat.
 * Where to start?
 * Vertical Refactoring. E.g. Title and User objects. They depend on each other publicly.
 * Can ask for page class on user object, depends on title, so a nice loop. Can refactor into vertical
 * Vertical = pull out base class for interface and depend on it.
 * Breaking the circle allows us to cut things into components
 * But depending on how you define the boundaries between the components, the components may still have a circular dependency, even of the classes no longer do.
 * [Boundaries] The boundaries that emerge when trying to avoid circular dependencies may be counter-intuitive. In the example, UserIdentity would not be in the same component as user, but in the same component as Title. This can be read as UserIdentity being Title’s idea of a User.
 * The other thing is that you end up with naturally growing components but don't always have clear conceptual names. Awkward. Right hand = storage concept s layer. Left hand other things. If you keep doing this and pulling out more stuff, can end up with narrow concepts on right, and old stuff on left. But perhaps this isn't so terrible. Note the 3 on left all depend on each other. Starting to refine them, allows us to stop using the old things eventually. I don't really see a way around that.
 * Horizontal refactoring - just pull out offending functionality - breakup - pull out the reason Title depends on User. You have e.g the user permissions separately.  For backwards compat, you still have these methods, and internally they use the new thing. This is where “allowed” private circular dependencies come in. As long as they're in the same component that's ok temporarily.
 * I'd love if we could enforce no circular dependencies per namespace. Will be hard. Per package/component is the minimum.
 * When we want to pull things into libraries (turning namespaces into packages), have to be strict.
 * That gets us to trying to cut the Gordian Knot.
 * GOAL: Find the tangles. Identify where we want to break them.
 * I suggest you use your laptops and look at the code. E.g. global states and dependencies can be hidden.
 * Task: Write down classes, names, namespaces, and the arrows between them.
 * Question: Why not just write new good things, and start using them instead of the old things (“strangle” approach)?
 * A: Because we can’t replace everything at once, so new code will have to depend on old code in some places.
 * “use” clauses are helpful, because they tell you what classes depend on. UNLESS (!) they use global state - please don't do that.
 * https://www.mediawiki.org/wiki/Manual:Code

Complete list of classes that were identified


 * Skin
 * OutputPage
 * Message
 * Language
 * LinkCache
 * Output Page
 * Linker
 * MediaHandler
 * File
 * Filerepo
 * Title
 * User
 * ContextSource
 * Parser
 * Interwiki / Interwiki-lookup
 * Content
 * ContentHandler
 * WikiPage
 * Revision Record
 * Watched Item

Proposed (existing) alternatives to some usages:


 * UserIdentity
 * TitleValue

Whiteboard:


 * Marked items, worst offenders, things to look at, high-impact. (both hard or low-hanging fruit):
 * Mediahandler. Filerepo. File.
 * Language. Message.
 * ContextSource.
 * User/Title.
 * Linker. (ignoring for now)
 * Legend:
 * Arrows - dependencies we identified ad-hoc
 * Red circles - things to talk about
 * Pink notes - class names with problematic circular dependencies
 * Yellow notes - things we could do to positively change the dependencies.

Group discussions report-out:


 * MediaHandler. FileRepo. File.
 * We definitely have deps between MediaTransform, FileRepo, Title classes.  They definitely have some tangled interconnections.
 * Method on File exposes MediaHandler and does some internal things using it as well. Could do in a more injection friendly way,
 * The worst thing is the way Linker uses File.
 * Weird logic on Title for renaming files, which prob belong more on WikiPage. Want to separate those things out. Already todo comments, but need to do them.
 * MediaTransform and Linker are the really ugly stuff.  Stuff goes into the other from both. Need to separate the logic of generating a frame from the logic of transforming the media better.
 * When it comes to foreign repos from commons for responsive images, a lot of hacks were introduced that are not maintainable.
 * Language. Message.
 * Our little loop is message and language, which also goes via WMFMessage.
 * We should kill WMFMessage because the make issues like this more visible
 * There’s a problem with LanguageSpecifier where you can’t specify the language.
 * This circ depended lists
 * Lists, lang variant names, numbers, ellipses, times, also view previous next have circular dependencies
 * Bottom right stuff should live with message and messagespecifier and not live with language.
 * Title. User.
 * For title and user, luckily they are already value objects that represent the core of these things.  They both implement ?? interface.
 * We have user and userentityvalue which provides migration path on that side.
 * We identified the methods that constitute these dependences and thought about where to put them.
 * On the Title object there’s a lot of permission checking going on, it should be handled in a stateless way on the User object. Would not use title or user.
 * Similarly on Title there are methods for moving a page; that should go into a PageStore service that’s already been dreamed up
 * There's already a pageaction object which could take some of the functionality
 * In user we found several groups of methods that refer to Title, stuff like getGroupPage, getUserPage
 * They should no longer return a [?] but should return a title object.
 * It also has isBlockedFrom, that should be part of the permissions manager
 * Watchlist management - [....]  should go into a watchlist management service.
 * Two things related to notification, one on Title and another on User, we didn’t quite figure out where to put them.  Still need to figure notifications out.
 * Core has notifications but we also have extensions doing other kinds of notifications.

Questions:


 * Gergo: Coming up with plans to fix these is not that hard, the hard part is investing the time, getting manager buy-in.
 * D: Try to think of these when coding, and budget for the necessary refactoring work. Also, talk to the teams that touch the code.
 * I'll prob work on Title and User stuff in future, based on ongoing revision storage work.
 * For MediaHandler things, naturally that would be with the Multimedia team, though they’re probably busy with SDC
 * Message, language, etc. -- natural place is i18n team, ← also hard pressed for resources.
 * TheDJ: Should we at least create Phab tickets (and link them together)?
 * D: yes!
 * BD: Let's make a phab project for it. Similar to #librarization project.
 * Gergo: If you find time during MCR work to decouple Title and User, you are a hero, but very few people will know it.  Won’t be clear to c-levels. Also will probably take longer.  Is this solvable?
 * D: Hard because relevant people aren't in room. But moving fast entails keeping track of technical debt they CREATED or FIXED. Need to do better at raising awareness of both these things.
 * (Brion: [ran out of time but did a quick check in with Daniel after] can we create an automated metric for how circular our dependencies are so we can show progress?)

Photos of posters: