Gerrit/Code review

This is a guide to reviewing contributions to the MediaWiki codebase. While this guide is written primarily for MediaWiki developers doing code reviews, developers submitting code for review will also find the guide useful.

Goals

 * Encourage new developers to continue contributing to MediaWiki.
 * Be nice. Frame your comments as helpful suggestions.
 * Tolerate idiosyncrasies where possible. Developers should feel that their creative input is valued and preserved.
 * Review patches and commits soon after they've been contributed, to avoid bitrot and feelings of abandonment.
 * Encourage community developers to contribute to the code under review.
 * Ensure that the code is readable and follows the coding conventions.
 * Contributed code should work as advertised, that is, any bugs found in the code should be fixed.
 * Protect MediaWiki users by ensuring an absence of security vulnerabilities. The code should follow the best practices outlined in Security for developers.
 * Aim for interface stability and backwards compatibility.
 * Review public interfaces carefully, so that future changes can be kept to a minimum.

Choosing revisions
There is a lot of code to review. How do we break up the task into more manageable portions?

There are several basic patterns which reviewers have found to be useful:

Chronological (and Reverse Chronological)
 * Start at the earliest revision, read each diff in turn until you finish the queue. Alternately, start at the latest revision and read each diff in turn until you reach the end of the open issues.
 * This approach is good for minor revisions, but using it exclusively has a negative effect on reviewer sanity.
 * The way different projects are interleaved in the chronological view means that it feels like you're constantly suspending one project, doing something else for a while, then resuming where you left off.

By project
 * Identify major pieces of work or sets of related revisions. A quick chronological review can be a good way to do this.
 * Open all changeset pages as tabs in a browser window. Open relevant files in a text editor.
 * Review new or unfamiliar files in their entirety. This is an engaging and satisfying way to work, and so should be preferred when possible. Pick a changeset with a major change to the relevant file and put your comments on that Code Review page.
 * Be careful not to blame the current developer for code written by a previous developer. Use the  command to trace changes to the revision where they were introduced. You can ask the current developer to fix the previous developer's code, but be aware that they will not feel obliged to do so, so frame your request accordingly.
 * If a project involves a major change to a large file, and the change is spread over many revisions, you can use  to combine all the changes into a single file. This allows you to avoid reviewing superseded changes, and to get an overall view of the changes being made. This is faster than reviewing the revisions individually.
 * If multiple projects are interleaved on the same file or directory, it may still be possible to get a project-specific diff, by checking out a new copy and pulling the relevant patchsets. Then you can run  on the combined file.
 * This method does not work for minor changes that are not part of any project. It also has some overhead: it requires the reviewer to identify projects and changesets. In some cases this is easy (such as when reviewing a new extension), but at other times it is less obvious.

By author
 * Make a list of authors, pick one, load their revision list in Gerrit.
 * Work through the revisions chronologically, or split their work up into projects and review by project.
 * This method allows the reviewer to get to know individual developers: their skills, faults and interests. The work has a sense of progress and continuity.

Review Guidelines
If the commit doesn't fall into one of the special cases that allow for high-level review, an in-depth review is necessary.

Remember that when you mark a commit with +2, you're saying:
 * I've inspected this code, and
 * the code makes sense, and
 * the code works and does something that we want to do, and
 * the code doesn't do anything that we don't want it to do, and
 * the code follows our development guidelines, and
 * the code will still make sense in five years.

If any of these conditions isn't met, then the commit is problematic. The checklist below provides a list of issues to look for.

Style

 * Ensure the style guide is followed, such as whitespace, line length, brace placement, etc. This is easy to check. It is essential for core contributions. For extension contributions, style deviations can be tolerated initially.

Readability

 * Functions should do what their names say. Choosing the correct verb is essential, a get* function should get something, a set* function should set something.
 * Variables names should use English, abbreviations should be avoided where possible.
 * Doc comments on functions are preferred.
 * Overly-complicated code should be simplified. This usually means making it longer. For instance:
 * Ternary operators (?:) may need to be replaced with if/else.
 * Long expressions may need to be broken up into several statements.
 * Clever use of operator precedence, shortcut evaluation, assignment expressions, etc. may need to be rewritten.
 * Duplication should be avoided.
 * Duplication, whether within files or across files, is bad for readability since it's necessary to play "spot the difference" to work out what has changed. Reading many near-copies of some code necessarily takes longer than reading a single abstraction.
 * It's also bad for maintainability, since if a bug (or missing feature) is found in the copied code, all instances of it have to be fixed.
 * It's common for new developers to copy large sections of code from other extensions or from the core, and to change some minor details in it. If a developer seems to be writing code which is "too good" for their level of experience, try grepping the code base for some code fragments, to identify the source. Comparing the original and the copy allows you to get a true feeling for the developer's level of expertise. Then guide the developer towards either rewriting or refactoring, to avoid the duplication.

Security

 * The reviewer should have read and understood the security guide and should be aware of the security issues discussed there.
 * There should not be the remotest possibility of arbitrary server-side code execution. This means that there should be no eval or create_function, and no /e modifier on preg_replace.
 * Check for text protocol injection issues (XSS, SQL injection, etc.) Insist on output-side escaping.
 * Check for register_globals issues, especially classic arbitrary inclusion vulnerabilities.
 * Check any write actions for CSRF.
 * Be wary of special entry points which may bypass the security provisions in WebStart.php.
 * Be wary of unnecessary duplication of security-relevant MW core functionality, such as using $_REQUEST instead of $wgRequest, or escaping SQL with addslashes instead of $dbr->addQuotes.

Architecture

 * Names which are in a shared (global) namespace should be prefixed (or otherwise specific to the extension in question) to avoid conflicts between extensions. This includes:
 * Global variables
 * Global functions
 * Class names
 * Message names
 * Table names
 * The aim of modularity is to separate concerns. Modules should not depend on each other in unexpected or complex ways. The interfaces between modules should be as simple as possible without resorting to code duplication.

Logic

 * Point out shortcuts and ask the developer to do a proper job.

Programmers have a model of what the code is meant to do in their heads. As long as you're a fast typist, programming is easiest when you just translate that model into code in the obvious way. But sometimes programmers think that it will be easier to write the code if it is shorter, so they figure out a new model which is almost the same as the old one, except it requires slightly less code. They think that it will do the same thing in all the cases they have in their heads, therefore it is good enough. This is a common source of bugs, since unexpected cases later come to light.

Taking shortcuts is counterproductive, since the amount of time spent figuring out the shortcut and verifying that it works could have been spent just typing out the original idea in full. The exception to this is if the developer is a very slow typist, but most of our developers can actually type pretty well.


 * Read relevant bug reports or documentation.
 * Familiarise yourself with any relevant technical issues. Read relevant specifications or manual sections.
 * Schema changes or changes to high-traffic queries should be reviewed by a DB expert. Currently this means Asher, Domas or Tim.
 * "Hot" code, i.e. code that is run many times in a request, or code that is run on startup, should be reviewed for performance. Suspicious code may need to be benchmarked.
 * Any web-accessible code which is very inefficient (in time, memory, query count, etc.) should be flagged for a fix.

Sign off
If you want to help to review the code, but don't feel comfortable (yet) making the final decision, you can use "+1" and indicate whether you've "verified" and/or "inspected" the code.

Give the green or red light

 * If the revision is good and has passed all tests above, mark it +2. Praise is a good practice: if you are particularly impressed by someone's work, say so in a comment.
 * If the revision is trivial, broken, has no redeeming value and its author isn't available on IRC, mark the commit as "-2"
 * If the revision has issues but also has some utility, or is showing signs of a developer heading in the right direction, mark it -1 and add a comment explaining what is wrong and how to fix it. Never mark something -1 without adding a comment. If someone marks your code -1, this doesn't mean that your code sucks; it means that it's good, but needs improvement.

Annotations
FIXME: this section is obsolete, but hasn't been replaced with a better process (see 38239).

Even if the commit is all good, in some cases it's not enough to mark it as ok; you'll want to add some tags to flag it for some people's attention.

Dealing with problems submissions
You have to weigh up the costs and benefits of each course of action, they are all valid. If you reject the feature completely (-2), then that feature will be lost, and the developer may be discouraged. If you tolerate the fault, the end product will suffer, as described in the goals section above. If you fix it yourself, then you're letting yourself get distracted from the review task, and perhaps you're sending the wrong message to the developer, who may believe that it is acceptable to submit low-quality code and then let someone else worry about the details.

In cases where fixing it makes sense, say because the feature is too useful or the bug fix too important to let it wait, then consider whether it is possible to find someone else to fix it instead, so that your review work can continue uninterrupted.

Giving negative feedback
Here are a few guidelines in the event you need to reject someone's submission or ask them to clean up their work:
 * 1)  Focus your comments on the code and any objectively-observed behavior, not motivations; for example, don't state or imply assumptions about motivating factors like whether the developer was just too lazy or stupid to do things right.
 * 2)  Be empathetic and kind.  Recognize that the developer has probably put a lot of work in their idea, and thank them for their contribution if you feel comfortable and sincere in doing so (and try to muster the comfort and sincerity).  Most importantly, put yourself in their shoes, and say something that indicates you've done so.
 * 3)  Help them schedule their work.  If their idea is a "not yet" kind of idea, try to recommend the best way you know of to get their idea on a backlog (i.e. the backlog most likely to eventually get revisited).
 * 4)  Let them know where they can appeal your decision.  For example, if the contributor doesn't have a history of being disruptive or dense, invite them to discuss the issue on wikitech-l.
 * 5)  Be clear.  Don't sugarcoat things so much that the central message is obscured.
 * 6)  Most importantly, give the feedback quickly.  While tactful is better (and you should learn from past mistakes), you can always apologize for a poorly-delivered comment with a quick followup.  Don't just leave negative feedback to someone else or hope they aren't persistent enough to make their contribution stick.