User:AKlapper (WMF)/Code Review

From mediawiki.org

In Wikimedia we would like to review and merge better code faster. Especially patches submitted by volunteers. Code Review should be a tool and not an obstacle. Also see T101686, T78768.

The Wikimedia Foundation allowed me to spend some time on research earlier in 2016. The results might also be interesting for other free and open source software projects.

The list below incorporates random literature and comments from T114419 and other Wikimedia Phabricator tasks. (Wikimedia considered migrating from Gerrit to Differential (T191182 a while ago).

Benefits of Code Review are knowledge transfer, increased team awareness, and finding alternative solutions. Good debates help to get to a higher standard of coding and drives quality.[1] I see three dimensions of potential influential factors and potential actions (that often cannot be cleanly separated):

  • 3 aspects: social (soc), technical (tech), organizational (org).
  • 2 roles: contributor, reviewer.
  • 3 factors: Patch-Acceptance/Positivity-Likeliness (accept), Patch-Time-to-review/merge (time2rev); Contributor Onboarding (not covered here).

In general, "among the factors we studied, non-technical (organizational and personal) ones are betters predictors" (means: possible factors that might affect the outcome and interval of the code review process) "compared to traditional metrics such as patch size or component, and bug priority."[2]

The following items are theses about potential influential factors and ideas/actions. Some might not apply to your project for various reasons.

Culture[edit]

Weak review culture[edit]

[ org | time2rev ] Prioritization / weak review culture: more pressure to write new code than to review patches contributed? Code review "application is inconsistent and enforcement uneven."[3]

Introduce and foster routine and habit across developers to spend a certain amount of time each day for reviewing patches (or part of standup), and team peer review on complex patches[1].

  1. Yes Done Contact mw:Team Practices Group about their thoughts? -- Out of scope.
  2. N Not done Write code to display "a prominent indicator of whether or not you've pushed more changesets than you've reviewed"[4]?
  3. Technical: Allow finding / explicitly marking first contributions by listing recent first contributions and their time to review on C_Gerrit_Demo. Someone responsible to ping, follow up, and (with organizational knowledge) to add potential reviewers to such first patches. Might need an overall Code review wrangler similar to a mw:Bugwrangler/Bugmaster.

No culture to improve changesets by other contributors[edit]

[ org | time2rev/accept ] Changesets are rarely picked up by other developers[5]. After merging, "it is very difficult to revert it or to get original developers to help fix some broken aspect of a merged change"[5] regarding followup fixing culture.

  1. N Not done Document best practices to amend a change written by another contributor if you are interested in bringing the patch forward: phab:T121751 ("Phabricator doesn't really encourage it" and requires commandeering a revision)

Skill / knowledge[edit]

Unstructured review approach[edit]

[ soc | time2rev ] Unstructured review approach potentially demotivates first patch contributors, but fast and structured feedback is crucial for keeping them engaged

Set up and document a multi-phase, structured patch review process for reviewers: Three steps proposed by Sarah Sharp for maintainers / reviewers [6], quoting:

  1. Fast feedback whether it is wanted: Is the idea behind the contribution sound? / Do we want this? Yes, no. If the contribution isn’t useful or it’s a bad idea, it isn’t worth reviewing further. Or “Thanks for this contribution! I like the concept of this patch, but I don’t have time to thoroughly review it right now. Ping me if I haven’t reviewed it in a week.” The absolute worst thing you can do during phase one is be completely silent.[6]
  2. Architecture: Is the contribution architected correctly? Squash the nit-picky, perfectionist part of yourself that wants to comment on every single grammar mistake or code style issue. Instead, only include a sentence or two with a pointer to coding style documentation, or any tools they will need to run their contribution through.[6]
  3. Polishing: Is the contribution polished? Get to comment on the meta (non-code) parts of the contribution. Correct any spelling or grammar mistakes, suggest clearer wording for comments, and ask for any updated documentation for the code[6]

Unhelpful reviewer comments[edit]

[ soc | time2rev/accept ] Due to unhelpful reviewer comments, contributors spend time on creating many revisions/iterations before successful merge.

  1. N Not done Make sure documentation for reviewers states:
    1. Reviewers' CR comments considered useful by contributors: identifying functional issues; identifying corner cases potentially not covered; suggestions for APIs/designs/code conventions to follow.[7]
    2. Reviewers' CR comments considered somewhat useful by contributors: coding guidelines; identifying alternative implementations or refactoring[7]
    3. Reviewers' CR comments considered not useful by contributors: Authors consider reviewers praising on code segments, reviewers asking questions to understand the implementation, and reviewers pointing out future issues not related to the specific code (should be filed as tasks) as not useful.[7]
    4. Reviewers' CR comments considered somewhat useful by contributors: coding guidelines; identifying alternative implementations or refactoring[7]
    5. Avoid negativity and ask the right questions the right way. As a reviewer, ask questions instead of making demands to foster a technical discussion: "What do you think about...?" "Did you consider...?" "Can you clarify...?" "Why didn't you just..." provides a judgement, putting people on the defensive. Be positive.[1]
    6. If you learned something or found something particular well, give compliments. (As code review is often about critical feedback only.)[1]
    7. N Not done Agree and document how to use -1: «Some people tend to use it in an "I don't like this but go ahead and merge if you disagree" sense which usually does not come across well. OTOH just leaving a comment makes it very hard to keep track - I have been asked in the past to -1 if I don't like something but don't consider it a big deal, because that way it shows up in Gerrit as something that needs more work.»[8]
    8. Stakeholders with different expertise areas to review aspects need to split reviewing parts of a larger patch.

Poor quality of contributors' patches[edit]

[ soc | time2rev ] Due to poor quality of contributors' patches, reviewers spend time on reviewing many revisions/iterations before successful merge. Might make reviewers ignore instead of reviewing again and again with CR-1.

  1. N Not done Make sure documentation for contributors states:
    1. Small, independent, complete patches are more likely to be accepted.[9]
    2. "[I]f there are more files to review [in your patch], then a thorough review takes more time and effort"[7] and "review effectiveness decreases with the number of files in the change set."[7]
    3. Small patches (max 4 lines changed) "have a higher chance to be accepted than average, while large patches are less likely to be accepted" (probability) but "one cannot determine that the patch size has a significant influence on the time until a patch is accepted" (time) [10]
    4. Patch Size: "Review time [is] weakly correlated to the patch size" but "Smaller patches undergo fewer rounds of revisions"[2]
    5. Reasons for rejecting a patch (not all are equally decisive; "less decisive reasons are usually easier to judge" when it comes to costs explaining rejections):[11]
      1. Problematic implementation or solution: Compilation errors; Test failures; Incomplete fix; Introducing new bugs; Wrong direction; Suboptimal solution works but there is a more simple or efficient way); Solution too aggressive for end users; Performance; Security
      2. Difficult to read or maintain: Including unnecessary changes (to split into separate patch); Violating coding style guidelines; Bad naming (e.g. variable names); Patch size too large (but rarely matters as it's ambiguous - if necessary it's not a problem); Missing docs; Inconsistent or misleading docs; No accompanied test cases (N Not done How much are "No accompanied test cases" a CR-1/-2 reason in Wikimedia? In which cases do we require unit tests?[12] Should be more deterministic?); Integration conflicts with existing code; Duplication; Misuse of API; risky changes to internal APIs; not well isolated
      3. Deviating from the project focus or scope: Idea behind is not of core interest; irrelevant or obsolete
      4. Affecting the development schedule / timing: Freeze; low urgency; Too late
      5. Lack of communication or trust: Unresponsive patch authors; no discussion prior to patch submission; patch authors' expertise and reputation[11]
      6. cf. Upstream Phabricator reasons why patches can get rejected
    6. There is a mismatch of judgement: Patch reviewers consistently consider test failures, incomplete fix, introducing new bugs, suboptimal solution, inconsistent docs way more decisive for rejecting than authors.[11]
    7. Propose guidelines for writing acceptable patches:[11]
      1. Authors should make sure that patch is in scope and relevant before writing patch
      2. Authors should be careful to not introduce new bugs instead of only focussing on the target
      3. Authors should not only care if the patch works well but also whether it's an optimal solution
      4. Authors should not include unnecessary changes and should check that corner cases are covered
      5. Authors should update or create related documentation[11] --- see Development policy#Documentation_policy
    8. Patch Writer Experience is relevant: Be patient and grow. "more experienced patch writers receive faster responses" plus more positive ones. Contributors' very first patch is likely to get positive feedback in WebKit; for their 3rd-6th patch it is harder.[2]
    9. N Not done Related documentation pages to check / update (are there more?): mw:Manual:Coding_conventions, mw:Gerrit/Code_review/Getting_reviews, T207: Update Code Review related documentation on wiki pages
  2. N Not done Agree on who is responsible for testing and document responsibility. (Tool specific: Phabricator Differential can force patch authors to fill out a test plan.)[8]

Likeliness of patch acceptance depends on: Developer experience, patch maturity; Review time impacted by submission time, number of code areas affected, number of suggested reviewers, developer experience.[13]

Resourcing[edit]

Lack of enough skillful, available, confident reviewers and mergers[edit]

[ org/soc | time2rev/accept ] Not enough skillful or available reviewers and potential lack of confident reviewers[14]? Not enough reviewers with rights to actually merge into the codebase?

  1. N Not done Capacity building: Discuss consider handing out code review rights to more (trusted) volunteers by recognizing active CR+1 users (based on yet-to-create statistics? Or use Nemo's data at http://koti.kapsi.fi/~federico/crstats/ ?) and encourage them to become habitual and trusted reviewers; actively nominate to become maintainers[15]? Potentially recognize people not executing their CR+2 rights anymore. This requires statistics (to identify very active reviewers) and stakeholders (to decide on nominations).
  2. N Not done Review current CR+2 handout practice - documentation at Gerrit/+2#Granting.
  3. N Not done Consider establishing prestigious roles for people, like "Reviewers"?[16]
  4. N Not done "we recommend including inexperienced reviewers so that they can gain the knowledge and experiences required to provide useful comments to change authors"[7]; Reviewers who have prior experience give more useful comments as they have more knowledge about design constraints and implementation.[7]

Under-resourced or unclear responsibilities[edit]

[ org/soc | time2rev/accept ] Lack of repository owners / maintainers, or under-resourced or unclear responsibilities (everybody expecting another person to review). For MediaWiki core, cf. T115852 and T1287.

"Changes failing to capture a reviewer's interest remain unreviewed"[17] due to self-selecting process of reviewers, or everybody expects another person in the team to review. "When everyone is responsible for something, nobody is responsible"[12]

  1. N Not done Better statistics (http://korma.wmflabs.org/browser/gerrit_review_queue.html) to identify unmaintained areas within a codebase or codebases with unclear maintenance responsibilities.
  2. N Not done Define a role to "Assign reviews that nobody selects."[17] (There might be (old) code areas that only one or zero developers understand.) Might need an overall Code Review wrangler similar to a mw:Bugwrangler/Bugmaster.
  3. N Not done Clarify and centrally document which Engineering/Development/Product teams are responsible for which codebases, and Team/Maintainer ⟷ Codebase/Repository relations (Example: How Wikimedia Foundation's Reading team manages extensions)
  4. N Not done Actively outreach to volunteers for unmaintained codebases via mw:Gerrit/Project_ownership#Requesting_repository_ownership? Might need an overall Code Review wrangler similar to a mw:Bugwrangler/Bugmaster.
  5. N Not done Advertise a monthly "Project in need of a maintainer" campaign on a technical mailing list and/or blog posts?

Workload of existing reviewers[edit]

[ org | time2rev/accept ] Workload of existing reviewers; too many items on their list already

Reviewer's Queue Length: "the shorter the queue, the more likely the reviewer is to do a thorough review and respond quickly" and the longer the more likely it takes longer but "better chance of getting in" (due to more sloppy review?)[2].

  1. N Not done Tool support to propose reviewers or display on how many unreviewed patches a reviewer is already added so the author can choose other reviewers. Proposal to add reviewers to patches [15] but needs good knowledge of community members as otherwise creating noise.
  2. N Not done Potentially document that "two reviewers find an optimal number of defects - the cost of adding more reviewers isn't justified [...]" [17]
    1. N Not done Documentation for reviewers: "we should encourage people to remove themselves from reviewers when they are certain they won't review the patch. A lot of noise and wasted time is created by the fact that people are unable to keep their dashboards clean" [18]
  3. Tool specific: Gerrit's CR-1 gets lost when a reviewer removes themselves (phab:T115851) hence Gerrit lists (more) items which look unreviewed. N Not done Check if same problem exists in Phabricator Differential?
  4. N Not done Agree and document for reviewers: Should 'Patch cannot be merged due to conflicts' be a reason to CR-1[5] to have a 'cleaner' list? (But depending on your Continuous Integration infrastructure tools, rejection via static analysis might happen automatically anyway.)

Documentation / communication[edit]

Hard to identify good reviewer candidates[edit]

[ tech/org | time2rev ] Hard for new contributors to identify and add good reviewers

"choice of reviewers plays an important role on reviewing time. More active reviewers provide faster responses" but "no correlation between the amount of reviewed patches on the reviewer positivity"[2].

  1. N Not done Check "owners" tool in Phabricator "for assigning reviewers based on file ownership"[19] so reviewers get notified of patches in their areas of interest. In Gerrit this exists via https://www.mediawiki.org/wiki/Gerrit/watched_projects but is limited.
  2. N Not done Encourage people to become project members/watchers.[20].
  3. N Not done Either have automated updating of outdated manual mw:Developers/Maintainers, or replace individual names on mw:Developers/Maintainers by links to Phabricator project description pages.
  4. N Not done In the vague technical future, automatic reviewer suggestion systems could help[7], like automatically listing people who lately touched code in a code repository or related tasks in an issue tracking system and the length of their current review queue. (Proofs of concept have been published in scientific papers but code is not always made available.) Also see https://tools.wmflabs.org/reviewers/ brought up in phab:T155851.
  5. Tried in early 2019: Enabled Gerrit's "reviewers-by-blame" plugin in phab:T101131 but quickly disabled again; needs fixes listed in phab:T214397

Hard to realize a repository is unmaintained[edit]

[ tech | time2rev ] Hard to realize how (in)active a repository is for a potential contributor

  1. N Not done Implement displaying "recent activity" information somewhere in the code repository browser and code review tool, to communicate expectations.
  2. Have documentation that describe steps how to ask for help and/or take over maintainership, to allow contributors to act if interested in the code repository. For Wikimedia these docs are located at mw:Gerrit/Project_ownership#Requesting_repository_ownership.

Hard to find related patches[edit]

[ tech | accept ] Hard to find existing "related" patches in a certain code area when working on your own patch in that area, or when reviewing several patches in the same code area. (Hence there might also be some potential rebase/merge conflicts[5] to avoid if possible.)

  1. Yes Done Differential offers "Recent Similar Open Revisions".[21] Gerrit might have such a feature in a newer version.[22]

Lack of synchronization between teams[edit]

[ soc | time2rev ] Lack of synchronization between developer teams: team A stuck because team B doesn't review their patches?

  1. Yes Done Organization specific: Wikimedia has regular "Scrum of Scrum" meetings of all scrum masters to communicate when the work of a team is blocked by another team.

See Also[edit]

References[edit]

  1. 1.0 1.1 1.2 1.3 Prior, D., ""Cultivating a Code Review Culture" at RailsConf 2015
  2. 2.0 2.1 2.2 2.3 2.4 Baysal, O.; Kononenko, O.; Holmes, R.; Godfrey, M.W., "The influence of non-technical factors on code review," in Reverse Engineering (WCRE), 2013 20th Working Conference on , vol., no., pp.122-131, 14-17 Oct. 2013
  3. Ori: Make code review not suck
  4. Gilles Debuc: Core review before writing new code
  5. 5.0 5.1 5.2 5.3 saper: Make code review not suck
  6. 6.0 6.1 6.2 6.3 Sarah Sharp: The Gentle Art of Patch Review
  7. 7.0 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 Amiangshu Bosu, Michaela Greiler, and Christian Bird. 2015. Characteristics of useful code reviews: an empirical study at Microsoft. In Proceedings of the 12th Working Conference on Mining Software Repositories (MSR '15). IEEE Press, Piscataway, NJ, USA, 146-156.
  8. 8.0 8.1 Tgr: Make code review not suck
  9. Peter C. Rigby, Daniel M. German, and Margaret-Anne Storey. 2008. Open source software peer review practices: a case study of the apache server. In Proceedings of the 30th international conference on Software engineering (ICSE '08). ACM, New York, NY, USA, 541-550.
  10. Peter Weißgerber, Daniel Neu, and Stephan Diehl. 2008. Small patches get in!. In Proceedings of the 2008 international working conference on Mining software repositories (MSR '08). ACM, New York, NY, USA, 67-76.
  11. 11.0 11.1 11.2 11.3 11.4 Yida Tao; Donggyun Han; Sunghun Kim, "Writing Acceptable Patches: An Empirical Study of Open Source Project Patches," in Software Maintenance and Evolution (ICSME), 2014 IEEE International Conference on , vol., no., pp.271-280, Sept. 29 2014-Oct. 3 2014
  12. 12.0 12.1 Brian Wolff: Make code review not suck
  13. Yujuan Jiang, Bram Adams, and Daniel M. German. 2013. Will my patch make it? and how fast?: case study on the Linux kernel. In Proceedings of the 10th Working Conference on Mining Software Repositories (MSR '13). IEEE Press, Piscataway, NJ, USA, 101-110.
  14. bd808: Make code review not suck
  15. 15.0 15.1 Sumana Harihareswara: "Suggestions to reduce code review backlog"
  16. jayvdb: Make code review not suck
  17. 17.0 17.1 17.2 Rigby, Cleary, Painchaud, Storey, German: Contemporary Peer Review Action: Lessons from Open Source Development, 2012
  18. Nemo_bis: "Number of reviewers"
  19. mmodell: Make code review not suck
  20. dzahn: Unclear maintenance responsibilities for some parts of MediaWiki core repository?
  21. mmodell: Make code review not suck
  22. matmarex: Make code review not suck