User:AKlapper (WMF)/Code Review

In Wikimedia we would like to review and merge better code faster. Especially patches submitted by volunteers. Code Review should be a tool and not an obstacle. Also see T101686, T78768.

The Wikimedia Foundation allowed me to spend some time on research earlier in 2016. The results might also be interesting for other free and open source software projects.

The list below incorporates random literature and comments from T114419 and other Wikimedia Phabricator tasks. Note that Wikimedia will migrate from Gerrit to Differential (T114320).

Benefits of Code Review are knowledge transfer, increased team awareness, and finding alternative solutions. Good debates help to get to a higher standard of coding and drives quality. I see three dimensions of potential influential factors and potential actions (that often cannot be cleanly separated):
 * 3 aspects: social (soc), technical (tech), organizational (org).
 * 2 roles: contributor, reviewer.
 * 3 factors: Patch-Acceptance/Positivity-Likeliness (accept), Patch-Time-to-review/merge (time2rev); Contributor Onboarding (not covered here).

In general, "among the factors we studied, non-technical (organizational and personal) ones are betters predictors" (means: possible factors that might affect the outcome and interval of the code review process) "compared to traditional metrics such as patch size or component, and bug priority."

Potential influential factors and ideas/actions
These are theses. Some might not apply to your project for various reasons.

Unstructured review approach
⏴ [ soc | time2rev ] Unstructured review approach potentially demotivates first patch contributors, but fast and structured feedback is crucial for keeping them engaged

Set up and document a multi-phase, structured patch review process for reviewers: Three steps proposed by Sarah Sharp for maintainers / reviewers, quoting:
 * 1) Fast feedback whether it is wanted: Is the idea behind the contribution sound? / Do we want this? Yes, no. If the contribution isn’t useful or it’s a bad idea, it isn’t worth reviewing further. Or “Thanks for this contribution!  I like the concept of this patch, but I don’t have time to thoroughly review it right now.  Ping me if I haven’t reviewed it in a week.” The absolute worst thing you can do during phase one is be completely silent.
 * 2) Architecture: Is the contribution architected correctly? Squash the nit-picky, perfectionist part of yourself that wants to comment on every single grammar mistake or code style issue.  Instead, only include a sentence or two with a pointer to coding style documentation, or any tools they will need to run their contribution through.
 * 3) Polishing: Is the contribution polished? Get to comment on the meta (non-code) parts of the contribution.  Correct any spelling or grammar mistakes, suggest clearer wording for comments, and ask for any updated documentation for the code

Lack of enough skillful, available, confident reviewers and mergers
⏴ [ org/soc | time2rev/accept ] Not enough skillful or available reviewers and potential lack of confident reviewers ? Not enough reviewers with rights to actually merge into the codebase?


 * 1) ❌ Capacity building: Discuss consider handing out code review rights to more (trusted) volunteers by recognizing active CR+1 users (based on yet-to-create korma statistics? Or use Nemo's data at http://koti.kapsi.fi/~federico/crstats/ ?) and encourage them to become habitual and trusted reviewers; actively nominate to become maintainers ? Potentially recognize people not executing their CR+2 rights anymore. This requires statistics (to identify very active reviewers) and stakeholders (to decide on nominations).
 * 2) ❌ Review current CR+2 handout practice - documentation at Gerrit/+2.
 * 3) ❌ Consider establishing prestigious roles for people, like "Reviewers"?
 * 4) ❌ "we recommend including inexperienced reviewers so that they can gain the knowledge and experiences required to provide useful comments to change authors" ; Reviewers who have prior experience give more useful comments as they have more knowledge about design constraints and implementation.

Under-resourced or unclear responsibilities
⏴ [ org/soc | time2rev/accept ] Lack of repository owners / maintainers, or under-resourced or unclear responsibilities (everybody expecting another person to review). For MediaWiki core, cf. T115852 and T1287.

"Changes failing to capture a reviewer's interest remain unreviewed" due to self-selecting process of reviewers, or everybody expects another person in the team to review. "When everyone is responsible for something, nobody is responsible"


 * 1) ❌ Better statistics (http://korma.wmflabs.org/browser/gerrit_review_queue.html) to identify unmaintained areas within a codebase or codebases with unclear maintenance responsibilities.
 * 2) ❌ Define a role to "Assign reviews that nobody selects." (There might be (old) code areas that only one or zero developers understand.) Might need an overall Code Review wrangler similar to a Bugwrangler/Bugmaster.
 * 3) ❌ Clarify and centrally document which Engineering/Development/Product teams are responsible for which codebases, and Team/Maintainer ⟷ Codebase/Repository relations (Example: How Wikimedia Foundation's Reading team manages extensions)
 * 4) ❌ Actively outreach to volunteers for unmaintained codebases via Gerrit/Project_ownership? Might need an overall Code Review wrangler similar to a Bugwrangler/Bugmaster.
 * 5) ❌ Advertise a monthly "Project in need of a maintainer" campaign on a technical mailing list and/or blog posts?

Hard to identify good reviewer candidates
⏴ [ tech/org | time2rev ] Hard for new contributors to identify and add good reviewers

"choice of reviewers plays an important role on reviewing time. More active reviewers provide faster responses" but "no correlation between the amount of reviewed patches on the reviewer positivity".
 * 1) ❌ Check "owners" tool in Phabricator "for assigning reviewers based on file ownership" so reviewers get notified of patches in their areas of interest. In Gerrit this exists via https://www.mediawiki.org/wiki/Gerrit/watched_projects but is limited.
 * 2) ❌ Encourage people to become project members/watchers..
 * 3) ❌ Either have automated updating of outdated manual Developers/Maintainers, or replace individual names on Developers/Maintainers by links to Phabricator project description pages.
 * 4) ❌ In the vague technical future, automatic reviewer suggestion systems could help, like automatically listing people who lately touched code in a code repository or related tasks in an issue tracking system and the length of their current review queue. (Proofs of concept have been published in scientific papers but code is not always made available.)

Unhelpful reviewer comments
⏴ [ soc | time2rev/accept ] Due to unhelpful reviewer comments, contributors spend time on creating many revisions/iterations before successful merge.


 * 1) ❌ Make sure documentation for reviewers states:
 * 2) Reviewers' CR comments considered useful by contributors: identifying functional issues; identifying corner cases potentially not covered; suggestions for APIs/designs/code conventions to follow.
 * 3) Reviewers' CR comments considered somewhat useful by contributors: coding guidelines; identifying alternative implementations or refactoring
 * 4) Reviewers' CR comments considered not useful by contributors: Authors consider reviewers praising on code segments, reviewers asking questions to understand the implementation, and reviewers pointing out future issues not related to the specific code (should be filed as tasks) as not useful.
 * 5) Reviewers' CR comments considered somewhat useful by contributors: coding guidelines; identifying alternative implementations or refactoring
 * 6) Avoid negativity and ask the right questions the right way. As a reviewer, ask questions instead of making demands to foster a technical discussion: "What do you think about...?" "Did you consider...?" "Can you clarify...?" "Why didn't you just..." provides a judgement, putting people on the defensive. Be positive.
 * 7) If you learned something or found something particular well, give compliments. (As code review is often about critical feedback only.)
 * 8) ❌ Agree and document how to use -1: «Some people tend to use it in an "I don't like this but go ahead and merge if you disagree" sense which usually does not come across well. OTOH just leaving a comment makes it very hard to keep track - I have been asked in the past to -1 if I don't like something but don't consider it a big deal, because that way it shows up in Gerrit as something that needs more work.»
 * 9) Stakeholders with different expertise areas to review aspects need to split reviewing parts of a larger patch.

Weak review culture
⏴ [ org | time2rev ] Prioritization / weak review culture: more pressure to write new code than to review patches contributed? Code review "application is inconsistent and enforcement uneven."

Introduce and foster routine and habit across developers to spend a certain amount of time each day for reviewing patches (or part of standup), and team peer review on complex patches.


 * 1) ✅ Contact Team Practices Group about their thoughts how this can be fostered and whether that is in their scope? -- Out of scope.
 * 2) ❌ Write code to display "a prominent indicator of whether or not you've pushed more changesets than you've reviewed" ?
 * 3) Technical: Allow finding / explicitly marking first contributions by listing recent first contributions and their time to review on korma's code_contrib_new_gone in T63563. Someone responsible to ping, follow up, and (with organizational knowledge) to add potential reviewers to such first patches. Might need an overall Code review wrangler similar to a Bugwrangler/Bugmaster.

Workload of existing reviewers
⏴ [ org | time2rev/accept ] Workload of existing reviewers; too many items on their list already

Reviewer's Queue Length: "the shorter the queue, the more likely the reviewer is to do a thorough review and respond quickly" and the longer the more likely it takes longer but "better chance of getting in" (due to more sloppy review?).
 * 1) ❌ Tool support to propose reviewers or display on how many unreviewed patches a reviewer is already added so the author can choose other reviewers. Proposal to add reviewers to patches but needs good knowledge of community members as otherwise creating noise.
 * 2) ❌ Potentially document that "two reviewers find an optimal number of defects - the cost of adding more reviewers isn't justified [...]"
 * 3) ❌ Documentation for reviewers: "we should encourage people to remove themselves from reviewers when they are certain they won't review the patch. A lot of noise and wasted time is created by the fact that people are unable to keep their dashboards clean"
 * 4) Tool specific: Gerrit's CR-1 gets lost when a reviewer removes themselves (T115851) hence Gerrit lists (more) items which look unreviewed. ❌ Check if same problem exists in Phabricator Differential?
 * 5) ❌ Agree and document for reviewers: Should 'Patch cannot be merged due to conflicts' be a reason to CR-1 to have a 'cleaner' list? (But depending on your Continuous Integration infrastructure tools, rejection via static analysis might happen automatically anyway.)

Poor quality of contributors' patches
⏴ [ soc | time2rev ] Due to poor quality of contributors' patches, reviewers spend time on reviewing many revisions/iterations before successful merge. Might make reviewers ignore instead of reviewing again and again with CR-1.


 * 1) ❌ Make sure documentation for contributors states:
 * 2) Small, independent, complete patches are more likely to be accepted.
 * 3) "[I]f there are more files to review [in your patch], then a thorough review takes more time and effort" and "review effectiveness decreases with the number of files in the change set."
 * 4) Small patches (max 4 lines changed) "have a higher chance to be accepted than average, while large patches are less likely to be accepted" (probability) but "one cannot determine that the patch size has a significant influence on the time until a patch is accepted" (time)
 * 5) Patch Size: "Review time [is] weakly correlated to the patch size" but "Smaller patches undergo fewer rounds of revisions"
 * 6) Reasons for rejecting a patch (not all are equally decisive; "less decisive reasons are usually easier to judge" when it comes to costs explaining rejections):
 * 7) Problematic implementation or solution: Compilation errors; Test failures; Incomplete fix; Introducing new bugs; Wrong direction; Suboptimal solution works but there is a more simple or efficient way); Solution too aggressive for end users; Performance; Security
 * 8) Difficult to read or maintain: Including unnecessary changes (to split into separate patch); Violating coding style guidelines; Bad naming (e.g. variable names); Patch size too large (but rarely matters as it's ambiguous - if necessary it's not a problem); Missing docs; Inconsistent or misleading docs; No accompanied test cases (❌ How much are "No accompanied test cases" a CR-1/-2 reason in Wikimedia? In which cases do we require unit tests? Should be more deterministic?); Integration conflicts with existing code; Duplication; Misuse of API; risky changes to internal APIs; not well isolated
 * 9) Deviating from the project focus or scope: Idea behind is not of core interest; irrelevant or obsolete
 * 10) Affecting the development schedule / timing: Freeze; low urgency; Too late
 * 11) Lack of communication or trust: Unresponsive patch authors; no discussion prior to patch submission; patch authors' expertise and reputation
 * 12) cf. Upstream Phabricator reasons why patches can get rejected
 * 13) There is a mismatch of judgement: Patch reviewers consistently consider test failures, incomplete fix, introducing new bugs, suboptimal solution, inconsistent docs way more decisive for rejecting than authors.
 * 14) Propose guidelines for writing acceptable patches:
 * 15) Authors should make sure that patch is in scope and relevant before writing patch
 * 16) Authors should be careful to not introduce new bugs instead of only focussing on the target
 * 17) Authors should not only care if the patch works well but also whether it's an optimal solution
 * 18) Authors should not include unnecessary changes and should check that corner cases are covered
 * 19) Authors should update or create related documentation --- see Development policy
 * 20) Patch Writer Experience is relevant: Be patient and grow. "more experienced patch writers receive faster responses" plus more positive ones. Contributors' very first patch is likely to get positive feedback in WebKit; for their 3rd-6th patch it is harder.
 * 21) ❌ Related documentation pages to check / update (are there more?): mw:Manual:Coding_conventions, Gerrit/Code_review/Getting_reviews, T207: Update Code Review related documentation on wiki pages
 * 22) ❌ Agree on who is responsible for testing and document responsibility. (Tool specific: Phabricator Differential can force patch authors to fill out a test plan.)

Likeliness of patch acceptance depends on: Developer experience, patch maturity; Review time impacted by submission time, number of code areas affected, number of suggested reviewers, developer experience.

Hard to realize a repository is unmaintained
⏴ [ tech | time2rev ] Hard to realize how (in)active a repository is for a potential contributor


 * 1) ❌ Implement displaying "recent activity" information somewhere in the code repository browser and code review tool, to communicate expectations.
 * 2) Have documentation that describe steps how to ask for help and/or take over maintainership, to allow contributors to act if interested in the code repository. For Wikimedia these docs are located at Gerrit/Project_ownership.

No culture to improve changesets by other contributors
⏴ [ org | time2rev/accept ] Changesets are rarely picked up by other developers. After merging, "it is very difficult to revert it or to get original developers to help fix some broken aspect of a merged change" regarding followup fixing culture.


 * 1) ❌ Document best practices to amend a change written by another contributor if you are interested in bringing the patch forward: T121751 ("Phabricator doesn't really encourage it" and requires commandeering a revision)

Hard to find related patches
⏴ [ tech | accept ] Hard to find existing "related" patches in a certain code area when working on your own patch in that area, or when reviewing several patches in the same code area. (Hence there might also be some potential rebase/merge conflicts to avoid if possible.)


 * 1) ✅ Differential offers "Recent Similar Open Revisions". Gerrit might have such a feature in a newer version.

Lack of synchronization between teams
⏴ [ soc | time2rev ] Lack of synchronization between developer teams: team A stuck because team B doesn't review their patches?


 * 1) ✅ Organization specific: Wikimedia has regular "Scrum of Scrum" meetings of all scrum masters to communicate when the work of a team is blocked by another team.

Misc stuff that has not found a related section yet

 * https://meta.wikimedia.org/wiki/Grants:IdeaLab/Making_Gerrit_access_easier_for_developers_new_to_MediaWiki (Obsoleted by moving from Gerrit to Differential?)
 * Priority: significant correlation between priority in issue tracker and positivity
 * Component: "no relation between positivity and the component factor"
 * Organization: "both Apple and Google are more positive about their own patches than 'foreign' patches" to WebKit
 * Reviewers:
 * "[R]eviewers from different teams gave slightly more useful comments than reviewers from the same team [...] however, the magnitude of the difference is quite small"