User:Zakgreant/pairdoc/2010-11-08

Participants: Mark A. Hershberger and Zak Greant

Summary

 * ~25 minutes general discussion on Mark's experiences getting up to speed with the MediaWiki codebase and how the docs relate to that
 * ~35 minutes of walking through the Code review guide

General Discussion

 * Cited Tim Starling's documents (like Security for developers) as being particularly useful. In particular, Mark liked the concrete examples that Tim puts in the docs he writes.
 * Talked about Mark's work on Linker.php, which led to an interesting discussion on how important it is to capture what devs have learned when it's fresh in their minds.
 * Likes list approach taken in the edited version of Security for developers

Review of Code review guide

 * We walked through the Code review guide, reading back and forth, with Mark commenting as he felt that he had things to add to the document. Mark's suggestions are being rolled into the document now – anything that needs to be rolled in or that needs more research is noted below.

Review responses

 * Good revisions should be marked "ok". If you are particularly impressed by someone's work, say so in a comment. (Emphasize)
 * Trivial revisions which are broken and have no redeeming value should be reverted on the spot. Do not write "please revert this immediately" in a comment, such requests are almost never complied with, and probably seem arrogant.
 * Revisions which have issues but also have some utility, or are showing signs of a developer heading in the right direction, can be given a comment and a "fixme" status. (If someone marks your code fixme, this doesn't mean that your code sucks - it means that it's good, but needs improvement.)
 * Revisions which are bad but have been fixed in a subsequent commit can be marked "ok" or "resolved". (ok code often won't be looked at - be very careful marking things as ok.)
 * Revisions which have been reverted should be marked "reverted".
 * Reversions themselves are always "ok".
 * The following kinds of changes should be marked "deferred":
 * i18n updates. We don't mark them "ok" because we don't review the diffs.
 * Changes to extensions or other software which does not run on Wikimedia, with the possible exception of SMW.
 * Changes to private or development branches.
 * Very old changes made by a reviewer, where there is no hope that someone else will review their changes.
 * A backport to a release branch can be marked "ok" if it has suitable release notes, if the case for backporting was sufficiently strong, and if the trunk equivalent was marked "ok".
 * In most cases, a backport to a deployment branch can be marked "ok", on the basis that people who have access to the deployment branches know what they are doing. It can be "ok" even if the trunk equivalent is marked "fixme", since temporary hacks are acceptable in deployment branches.
 * A revision which should be backported to a branch should be tagged with the branch name, for instance "1.17" or "1.16wmf4". Once the backport is done, the tag should be removed.
 * A revision which was committed to a branch and needs to be ported to trunk should be tagged with "trunk".
 * Schema changes should be tagged "schema".
 * Changes which may cause problems when they are deployed to Wikimedia should be tagged "scaptrap". This includes:
 * Most schema changes.
 * Changes which require a simultaneous update to the configuration files.
 * Changes which put significant pressure on server capacity, such as increases in request rate, outgoing bandwidth or disk space.
 * Changes to extensions which rely on changes to core MediaWiki that have not been reviewed and merged

The following change resume the action to take given a revision.

Code review response cheatsheet

(scaptrap - what's the status of this tag? need to link to tag defs in CR. mah doesn't see it. why two different ways to tag schema changes?)

(look at http://wikitech.wikimedia.org/view/Scap )

link tags to http://www.mediawiki.org/wiki/Special:Code/MediaWiki/status/

Dealing with non-compliance
It's quite common for a review comment with fixme status to go unanswered by the relevant developer. This occurs with both volunteers and staff.

Some amount of nagging is probably fair and useful, and if the developer is an employee, you have the option of getting their supervisor to help with that. But with some people, at some point it will come down to a choice of three options:


 * 1) Revert
 * 2) Tolerate the fault
 * 3) Fix it yourself

You have to weigh up the costs and benefits of each course of action, they are all valid. If you revert, then that feature will be lost, and the developer may be discouraged. If you tolerate the fault, the end product will suffer, as described in the goals section above. If you fix it yourself, then you're letting yourself get distracted from the review task, you're letting the non-compliant developer dictate your priorities, and perhaps you're sending the wrong message to the developer, who may believe that it is acceptable to commit low-quality code and then wait for someone else to fix it.

In cases where fixing it makes sense, say because the feature is too useful or the bug fix too important to allow reversion, then consider whether it is possible to find someone else to fix it instead, so that your review work can continue uninterrupted.

How to review a change
(increase priority/importance)

Style

 * Ensure the style guide is followed, such as whitespace, line length, brace placement, etc. This is easy to check. It is essential for core contributions. For extension contributions, style deviations can be tolerated initially.

Readability
(compare with Security for developers, Manual:Coding conventions )


 * Functions should do what their names say. Choosing the correct verb is essential, a get* function should get something, a set* function should set something.
 * Variables names should use English, abbreviations should be avoided where possible.
 * Doc comments on functions are preferred.
 * Overly-complicated code should be simplified. This usually means making it longer. For instance:
 * Ternary operators (?:) may need to be replaced with if/else.
 * Long expressions may need to be broken up into several statements.
 * Clever use of operator precedence, shortcut evaluation, assignment expressions, etc. may need to be rewritten.
 * Duplication should be avoided.
 * Duplication, whether within files or across files, is bad for readability since it's necessary to play "spot the difference" to work out what has changed. Reading many near-copies of some code necessarily takes longer than reading a single abstraction.
 * It's also bad for maintainability, since if a bug (or missing feature) is found in the copied code, all instances of it have to be fixed.
 * It's common for new developers to copy large sections of code from other extensions or from the core, and to change some minor details in it. If a developer seems to be writing code which is "too good" for their level of experience, try grepping the code base for some code fragments, to identify the source. Comparing the original and the copy allows you to get a true feeling for the developer's level of expertise. Then guide the developer towards either rewriting or refactoring, to avoid the duplication.

Security

 * The reviewer should have read and understood the security guide and should be aware of the security issues discussed there.
 * There should not be the remotest possibility of arbitrary server-side code execution. This means that there should be no eval or create_function, and no /e modifier on preg_replace.
 * Check for text protocol injection issues (XSS, SQL injection, etc.) Insist on output-side escaping.
 * Check for register_globals issues, especially classic arbitrary inclusion vulnerabilities.
 * Check any write actions for CSRF.
 * Be wary of special entry points which may bypass the security provisions in WebStart.php.
 * Be wary of unnecessary duplication of security-relevant MW core functionality, such as using $_REQUEST instead of $wgRequest, or escaping SQL with addslashes instead of $dbr->addQuotes.

Architecture
(global vars - wg, global functions wf - list of prefixes - compare to docs on globals) (examples
 * Names which are in a shared (global) namespace should be prefixed (or otherwise specific to the extension in question) to avoid conflicts between extensions. This includes:
 * Global variables
 * Global functions
 * Class names
 * Message names
 * Table names
 * The aim of modularity is to separate concerns. Modules should not depend on each other in unexpected or complex ways. The interfaces between modules should be as simple as possible without resorting to code duplication.

Logic

 * Point out shortcuts and ask the developer to do a proper job.

Programmers have a model of what the code is meant to do in their heads. As long as you're a fast typist, programming is easiest when you just translate that model into code in the obvious way. But sometimes programmers think that it will be easier to write the code if it is shorter, so they figure out a new model which is almost the same as the old one, except it requires slightly less code. They think that it will do the same thing in all the cases they have in their heads, therefore it is good enough. This is a common source of bugs, since unexpected cases later come to light.

Taking shortcuts is counterproductive, since the amount of time spent figuring out the shortcut and verifying that it works could have been spent just typing out the original idea in full. The exception to this is if the developer is a very slow typist, but most of our developers can actually type pretty well.

(What is a shortcut - provide an example.) (Link up user names.)


 * Read relevant bug reports or documentation.
 * Familiarise yourself with any relevant technical issues. Read relevant specifications or manual sections.
 * Schema changes or changes to high-traffic queries should be reviewed by a DB expert. Currently this means Domas or Tim.
 * "Hot" code, i.e. code that is run many times in a request, or code that is run on startup, should be reviewed for performance. Suspicious code may need to be benchmarked.
 * Any web-accessible code which is very inefficient (in time, memory, query count, etc.) should be flagged for a fix. (How do devs identify inefficient code? Domas' profiler post?)