User:AKlapper (WMF)/Code Review

We would like to merge better code faster. Code Review should be a tool and not an obstacle. We would like to prioritize reviewing patches submitted by volunteers. See T101686 and T78768.

The list below incorporates assorted literature and comments from T114419 and other Wikimedia Phabricator tasks. Note that Wikimedia will migrate from Gerrit to Differential (T114320).

Dimensions of potential influential factors and potential actions:
 * 3 aspects: social (soc), technical (tech), organizational (org).
 * 2 roles: contributor, reviewer.
 * 3 factors: Patch-Acceptance/Positivity-Likeliness (accept), Patch-Time-to-review/merge (time2rev); Contributor Onboarding (not covered here).

Unstructured review approach
⏴ [ soc | time2rev ] An unstructured review approach potentially demotivates first-time patch contributors, whereas fast and structured feedback is crucial for keeping them engaged.

Set up and document a multi-phase, structured patch review process for reviewers: Three steps proposed by Sarah Sharp for maintainers / reviewers, quoting:
 * 1) Is the idea behind the contribution sound? / Do we want this? Yes, no. If the contribution isn’t useful or it’s a bad idea, it isn’t worth reviewing further. Or “Thanks for this contribution!  I like the concept of this patch, but I don’t have time to thoroughly review it right now.  Ping me if I haven’t reviewed it in a week.” The absolute worst thing you can do during phase one is be completely silent.
 * 2) Is the contribution architected correctly? Squash the nit-picky, perfectionist part of yourself that wants to comment on every single grammar mistake or code style issue.  Instead, only include a sentence or two with a pointer to coding style documentation, or any tools they will need to run their contribution through.
 * 3) Is the contribution polished? Get to comment on the meta (non-code) parts of the contribution.  Correct any spelling or grammar mistakes, suggest clearer wording for comments, and ask for any updated documentation for the code

Lack of enough skillful, available, confident reviewers and mergers
⏴ [ org/soc | time2rev/accept ] Not enough skillful or available reviewers, and a potential lack of confident reviewers? Not enough reviewers with CR+2 rights to actually merge?


 * 1) ❌ Capacity building: Discuss handing out code review rights to more (trusted) volunteers by recognizing active CR+1 users (based on yet-to-create korma statistics? Or use Nemo's data at http://koti.kapsi.fi/~federico/crstats/ ?) and encourage them to become habitual and trusted reviewers; actively nominate them to become maintainers? Potentially also identify people who no longer exercise their CR+2 rights.
 * 2) ❌ Review current CR+2 handout practice - documentation at Gerrit/+2.
 * 3) ❌ Consider establishing prestigious roles for people, like "Reviewers"?
 * 4) ❌ "we recommend including inexperienced reviewers so that they can gain the knowledge and experiences required to provide useful comments to change authors"; reviewers who have prior experience give more useful comments as they have more knowledge about design constraints and implementation.
 * 5) Vague: "Project management can also identify weak reviewers and take necessary steps to help them become efficient." - but missing statistics?

Under-resourced or unclear responsibilities
⏴ [ org/soc | time2rev/accept ] Lack of repository owners / maintainers, or under-resourced or unclear responsibilities (everybody expecting another person to review). For MediaWiki core, cf. T115852 and T1287.

"Changes failing to capture a reviewer's interest remain unreviewed" due to the self-selecting process of reviewers, or because everybody expects another person in the team to review. "When everyone is responsible for something, nobody is responsible."


 * 1) ❌ Better statistics (http://korma.wmflabs.org/browser/gerrit_review_queue.html) to identify unmaintained areas within a codebase or codebases with unclear maintenance responsibilities.
 * 2) ❌ Define a role to "Assign reviews that nobody selects." (There might be (old) code areas that only one or zero developers understand.) Might need an overall Gerrit wrangler similar to a Bugwrangler.
 * 3) ❌ Clarify and centrally document which WMF Engineering/Product teams are responsible for which codebases, and Team/Maintainer ⟷ Codebase/Repository relations (Example: T121889: How the WMF Reading team manages extensions)
 * 4) ❌ Actively reach out to volunteers for unmaintained codebases via Gerrit/Project_ownership?
 * 5) ❌ Vague: Monthly "Project in need of a maintainer" campaign on wikitech-l?
 * 6) ❌ Identify Phabricator projects with zero members/watchers? Or irrelevant as interested people might use custom queries instead of notifications?
 * 7) ❌ Vague technical future: Automatic reviewer suggestion systems?

Hard to identify good reviewer candidates
⏴ [ tech/org | time2rev ] Hard for new contributors to identify and add good reviewers

"choice of reviewers plays an important role on reviewing time. More active reviewers provide faster responses" but "no correlation between the amount of reviewed patches on the reviewer positivity".
 * 1) ❌ Check "owners" tool in Phabricator "for assigning reviewers based on file ownership" so reviewers get notified of patches in their areas of interest. In Gerrit this exists via https://www.mediawiki.org/wiki/Gerrit/watched_projects but is limited.
 * 2) ❌ Either automate updating of the outdated, manually maintained Developers/Maintainers page, or replace individual names on Developers/Maintainers by links to Phabricator project description pages and encourage people (how?) to become project members/watchers.
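
Assigning reviewers based on file ownership could work roughly as sketched below. This is only an illustration under stated assumptions: the ownership map, the reviewer names and the function `suggest_reviewers` are hypothetical and do not reflect how Phabricator's owners tool or Gerrit's watched projects are implemented.

```python
from pathlib import PurePosixPath

# Hypothetical ownership map (path prefix -> reviewers); not real data.
OWNERS = {
    "includes/api": ["alice"],
    "includes": ["bob"],
    "resources": ["carol", "alice"],
}


def suggest_reviewers(changed_files, owners=OWNERS):
    """Suggest reviewers for a patch from the files it touches.

    For each changed file, the most specific (longest) owned path prefix
    wins. Files without any owner contribute no suggestion.
    """
    suggested = []
    for name in changed_files:
        parents = PurePosixPath(name).parents
        matches = [prefix for prefix in owners if PurePosixPath(prefix) in parents]
        if not matches:
            continue
        best = max(matches, key=len)
        for reviewer in owners[best]:
            if reviewer not in suggested:  # keep order, avoid duplicates
                suggested.append(reviewer)
    return suggested
```

A file that nobody owns yields no suggestion, which is exactly the case where a human with organizational knowledge would still have to step in.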

Unhelpful reviewer comments
⏴ [ soc | time2rev/accept ] Due to unhelpful reviewer comments, contributors spend time on creating many revisions/iterations before successful merge.


 * 1) ❌ Make sure documentation for reviewers states:
 * 2) Reviewers' CR comments considered useful by contributors: identifying functional issues; identifying corner cases potentially not covered; suggestions for APIs/designs/code conventions to follow.
 * 3) Reviewers' CR comments considered somewhat useful by contributors: coding guidelines; identifying alternative implementations or refactoring
 * 4) Reviewers' CR comments considered not useful by contributors: Authors consider reviewers praising on code segments, reviewers asking questions to understand the implementation, and reviewers pointing out future issues not related to the specific code (should be filed as tasks) as not useful.
 * 5) ❌ Agree and document how to use -1: «Some people tend to use it in an "I don't like this but go ahead and merge if you disagree" sense which usually does not come across well. OTOH just leaving a comment makes it very hard to keep track - I have been asked in the past to -1 if I don't like something but don't consider it a big deal, because that way it shows up in Gerrit as something that needs more work.»
 * 6) Stakeholders with different areas of expertise need to split up the review of a larger patch, each covering the aspects they know best.
 * 7) ❌ T115850: Agree and document whether there should be a guideline for reviewers to mark controversial patches (when it comes to project direction) as CR-2. If yes, have that guideline also link to the item "discussion prior to patch submission" under "Document for contributors".
 * 8) ❌ Related documentation pages to check / update (are there more?): Gerrit/Code_review, T207: Update Code Review related documentation on wiki pages

Weak review culture
⏴ [ org | time2rev ] Prioritization / weak review culture: more pressure to write new code than to review patches contributed?

Introduce and foster a routine among developers of spending a certain amount of time each day on reviewing patches (or make it part of standup), and of team peer review for complex patches.
 * 1) ❌ Contact the Team Practices Group for their thoughts on how this can be fostered and whether that is in their scope?
 * 2) ❌ Write code to display "a prominent indicator of whether or not you've pushed more changesets than you've reviewed"?
 * 3) Technical: Allow finding / explicitly marking first contributions by listing recent first contributions and their time to review on korma's code_contrib_new_gone in T63563. Someone should be responsible for pinging, following up, and (with organizational knowledge) adding potential reviewers to such first patches. Might need an overall Gerrit wrangler similar to a Bugwrangler.
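
The indicator of pushed versus reviewed changesets proposed in the list above could be computed along these lines. This is a minimal sketch: the `(actor, action)` event shape, the function name `review_balance` and the message wording are assumptions for illustration, and it does not query Gerrit or Differential.

```python
from collections import Counter


def review_balance(events, user):
    """Return (pushed, reviewed, message) for `user`.

    `events` is an iterable of (actor, action) tuples where action is
    "push" or "review". This record shape is a hypothetical simplification.
    """
    counts = Counter(action for actor, action in events if actor == user)
    pushed, reviewed = counts["push"], counts["review"]
    if pushed > reviewed:
        message = f"You pushed {pushed} changesets but reviewed only {reviewed}."
    else:
        message = f"Thanks: {reviewed} reviewed vs. {pushed} pushed."
    return pushed, reviewed, message
```

Whether such a message encourages reviewing or merely annoys people is a social question that the code cannot answer.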

Workload of existing reviewers
⏴ [ org | time2rev/accept ] Workload of existing reviewers; too many items on their list already

Reviewer's Queue Length: "the shorter the queue, the more likely the reviewer is to do a thorough review and respond quickly"; the longer the queue, the longer a review is likely to take, but the patch has a "better chance of getting in" (due to sloppier review?).
 * 1) ❌ Tool support to propose reviewers, or to display how many unreviewed patches a reviewer has already been added to so the author can choose other reviewers. Proposing reviewers for patches needs good knowledge of community members, as it otherwise creates noise.
 * 2) ❌ Potentially document that "two reviewers find an optimal number of defects - the cost of adding more reviewers isn't justified [...]"
 * 3) ❌ Documentation for reviewers: "we should encourage people to remove themselves from reviewers when they are certain they won't review the patch. A lot of noise and wasted time is created by the fact that people are unable to keep their dashboards clean"
 * 4) CR-1 gets lost when a Gerrit reviewer removes themselves (example), hence Gerrit lists (more) items which look unreviewed. ❌ Check if the same problem exists in Phabricator Differential?
 * 5) ❌ Agree and document for reviewers: Should 'Cannot be Merged' be a reason to CR-1 to have a 'cleaner' list?
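
Displaying reviewer queue lengths, as proposed in the list above, might look like the following sketch. The dictionary keys `reviewers` and `reviewed_by`, the function names and the default of two suggestions (echoing the "two reviewers" quote) are all assumptions, not an existing Gerrit or Differential feature.

```python
from collections import Counter


def queue_lengths(open_changes):
    """Count open changes each reviewer is added to but has not yet voted on.

    `open_changes` is a list of dicts with the hypothetical keys
    "reviewers" (list of names) and "reviewed_by" (set of names who voted).
    """
    queue = Counter()
    for change in open_changes:
        for reviewer in change["reviewers"]:
            if reviewer not in change["reviewed_by"]:
                queue[reviewer] += 1
    return queue


def least_loaded(candidates, open_changes, limit=2):
    """Return up to `limit` candidates with the shortest review queues."""
    queue = queue_lengths(open_changes)
    # Sort by queue length, then by name so ties resolve deterministically.
    return sorted(candidates, key=lambda name: (queue[name], name))[:limit]
```

Queue length alone ignores expertise, so in practice it would only narrow down a candidate list chosen by someone who knows the community.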

Poor quality of contributors' patches
⏴ [ soc | time2rev ] Due to poor quality of contributors' patches, reviewers spend time on reviewing many revisions/iterations before successful merge. Might make reviewers ignore a patch instead of reviewing it again and again with CR-1.


 * 1) ❌ Make sure documentation for contributors states:
 * 2) Patches should be "small, independent, and complete".
 * 3) When it comes to changesets, "[I]f there are more files to review [in your patch], then a thorough review takes more time and effort" and "review effectiveness decreases with the number of files in the change set."
 * 4) When it comes to changesets, small patches (max 4 lines changed) "have a higher chance to be accepted than average, while large patches are less likely to be accepted" (probability) but "one cannot determine that the patch size has a significant influence on the time until a patch is accepted" (time) Small, independent, complete patches are more likely to be accepted.
 * 5) Patch Size: "Review time [is] weakly correlated to the patch size" but "Smaller patches undergo fewer rounds of revisions"
 * 6) Reasons for rejecting a patch (not all are equally decisive; "less decisive reasons are usually easier to judge" when it comes to costs explaining rejections):
 * 7) Problematic implementation or solution: Compilation errors; Test failures; Incomplete fix; Introducing new bugs; Wrong direction; Suboptimal solution (works but there is a simpler or more efficient way); Solution too aggressive for end users; Performance; Security
 * 8) Difficult to read or maintain: Including unnecessary changes (to split into separate patch); Violating coding style guidelines; Bad naming (e.g. variable names); Patch size too large (but rarely matters as it's ambiguous - if necessary it's not a problem); Missing docs; Inconsistent or misleading docs; No accompanied test cases (❌ To what extent is "No accompanied test cases" a CR-1/-2 reason in Wikimedia? In which cases do we require unit tests? Should this be more deterministic?); Integration conflicts with existing code; Duplication; Misuse of API; risky changes to internal APIs; not well isolated
 * 9) Deviating from the project focus or scope: Idea behind is not of core interest; irrelevant or obsolete
 * 10) Affecting the development schedule / timing: Freeze; low urgency; Too late
 * 11) Lack of communication or trust: Unresponsive patch authors; no discussion prior to patch submission; patch authors' expertise and reputation
 * 12) cf. Upstream Phabricator reasons why patches can get rejected
 * 13) There is a mismatch of judgement: Patch reviewers consistently consider test failures, incomplete fix, introducing new bugs, suboptimal solution, inconsistent docs way more decisive for rejecting than authors.
 * 14) Propose guidelines for writing acceptable patches:
 * 15) Authors should make sure that patch is in scope and relevant before writing patch
 * 16) Authors should be careful to not introduce new bugs instead of only focussing on the target
 * 17) Authors should not only care if the patch works well but also whether it's an optimal solution
 * 18) Authors should not include unnecessary changes and should check that corner cases are covered
 * 19) Authors should update or create related documentation --- see Development policy
 * 20) Patch Writer Experience is relevant: Be patient and grow. "more experienced patch writers receive faster responses" plus more positive ones. Contributors' very first patch is likely to get positive feedback in WebKit; for their 3rd-6th patch it is harder.
 * 21) ❌ Related documentation pages to check / update (are there more?): mw:Manual:Coding_conventions, Gerrit/Code_review/Getting_reviews, T207: Update Code Review related documentation on wiki pages
 * 22) ❌ Agree on and document testing responsibility: "making clear who is responsible for testing. I often refrain from merging simple patches because I feel I should not merge code without testing it, but then never get around to do that as it might require setting up a whole new test environment and often figuring out exactly how to test. As I understand Differential will be an improvement there as it requires patch authors to fill out a test plan."

Hard to realize a repo is unmaintained
⏴ [ tech | time2rev ] Hard to realize how (in)active a repository is for a potential contributor


 * 1) ❌ Technical implementation to display "lack of recent activity" information in Gitblit/Diffusion and Gerrit/Differential?
 * 2) Allow contributor to act via Gerrit/Project_ownership
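
A "lack of recent activity" notice could be derived from the date of the last merged change, as in this sketch. The function name `inactivity_label`, the wording of the labels and the 180-day threshold are assumptions chosen for illustration; none of them come from Gitblit, Diffusion, Gerrit or Differential.

```python
from datetime import date


def inactivity_label(last_merge, today, stale_after_days=180):
    """Describe how long ago the last change was merged into a repository.

    `stale_after_days` is a hypothetical threshold after which the
    repository is flagged as possibly unmaintained.
    """
    days = (today - last_merge).days
    if days > stale_after_days:
        return f"No merged change for {days} days; this repository may be unmaintained."
    return f"Last merged change {days} days ago."
```

For example, `inactivity_label(date(2015, 1, 1), date(2016, 1, 1))` flags the repository, while a merge one month earlier does not.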

No culture to improve changesets by other contributors
⏴ [ org | time2rev/accept ] Changesets are rarely picked up by other developers


 * 1) ❌ Document best practices to amend a change written by another contributor if you are interested in bringing the patch forward: T121751 ("Phabricator doesn't really encourage it" and requires commandeering a revision)

Hard to find related patches
⏴ [ tech | accept ] Hard to find existing "related" patches in a certain code area when working on your own patch in that area. Hence more potential rebase/merge conflicts?


 * 1) ✅ Differential offers "Recent Similar Open Revisions". Gerrit might have such a feature in a newer version.

Lack of sync between teams
⏴ [ soc | time2rev ] Lack of sync between developer teams: team A stuck because team B doesn't review their patches?


 * 1) ✅ Blocked-on-* projects in Phabricator exist - likely not a bigger issue in Wikimedia?

Misc stuff that has not found a related section yet

 * Code review "application is inconsistent and enforcement uneven."
 * Followup fixing culture? "after the castle has been conquered and the change is in, it is very difficult to revert it or to get original developers to help fix some broken aspect of a merged change"
 * https://meta.wikimedia.org/wiki/Grants:IdeaLab/Making_Gerrit_access_easier_for_developers_new_to_MediaWiki
 * "among the factors we studied, non-technical (organizational and personal) ones are better predictors" (that is, possible factors that might affect the outcome and interval of the code review process) "compared to traditional metrics such as patch size or component, and bug priority."
 * Priority: significant correlation between priority in issue tracker and positivity
 * Component: "no relation between positivity and the component factor"
 * Organization: "both Apple and Google are more positive about their own patches than 'foreign' patches" to WebKit
 * Reviewers:
 * "[R]eviewers from different teams gave slightly more useful comments than reviewers from the same team [...] however, the magnitude of the difference is quite small"
 * Patch acceptance: Developer experience, patch maturity; Review time impacted by submission time, number of code areas affected, number of suggested reviewers, developer experience.