Manual:Coding conventions

This page describes the coding conventions used within the MediaWiki codebase and extensions which are intended for use on Wikimedia websites, including appropriate naming conventions.

Whitespace etc.
Lines should be indented with a single tab character per indenting level. You should make no assumptions about the number of spaces per tab. Most MediaWiki developers find 4 spaces per tab to be best for readability, but many systems are configured to use 8 spaces per tab.

All text files should be checked in to Subversion with svn:eol-style set to "native". This is necessary to prevent corruption by certain Windows-based text editors.

All text files are encoded with UTF-8. Be sure that your editor supports this.

Do not use MS Notepad to edit files. Notepad inserts Unicode byte order marks, which stop PHP files from working.

Indenting and alignment
MediaWiki's indenting style is similar to the so-called "One True Brace Style". Braces are placed on the same line as the start of the function, conditional, loop, etc.

Multi-line statements are written with the second and subsequent lines being indented by one extra level:
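
A minimal sketch of such a statement; the variable names and the title text are hypothetical and only the layout matters:

```php
<?php
// Hypothetical title text, used only to show the layout.
$text = 'Main Page';

// The second line is indented one level deeper than the first.
$dbKey = str_replace( ' ', '_',
	ucfirst( trim( $text ) ) );
```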

Use indenting and line breaks to clarify the logical structure of your code. Expressions which nest multiple levels of parentheses or similar structures may begin a new indenting level with each nesting level:
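
One way this might look, assuming a hypothetical nested settings array:

```php
<?php
// Hypothetical settings; each nesting level starts a new indenting level.
$settings = array(
	'skin' => array(
		'name' => 'monobook',
		'options' => array(
			'sidebar' => true,
			'footer' => false,
		),
	),
);
```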

Mid-line vertical alignment should be achieved with spaces. For instance this:
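
A sketch of the kind of aligned array meant here, mirroring the dotted rendering that follows. The define() calls are stand-ins added so the snippet runs on its own:

```php
<?php
// Stand-in definitions so the snippet is self-contained; in MediaWiki
// these constants are defined elsewhere.
define( 'NS_MEDIA', -2 );
define( 'NS_SPECIAL', -1 );
define( 'NS_MAIN', 0 );

// One tab at the start of each entry, then spaces to align the arrows.
$namespaceNames = array(
	NS_MEDIA            => 'Media',
	NS_SPECIAL          => 'Special',
	NS_MAIN             => '',
);
```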

Is achieved as follows, with spaces rendered as dots and tabs as arrows:

$namespaceNames·=·array(
→    NS_MEDIA············=>·'Media',
→    NS_SPECIAL··········=>·'Special',
→    NS_MAIN·············=>·'',
);

In general, you should avoid vertical alignment. The width allowed for the left column constantly has to be increased as more items are added, which tends to create diffs that are hard to interpret.

Line continuation
Lines should be broken at between 80 and 100 columns. There are some rare exceptions to this. Functions which take lots of parameters are not exceptions.

The operator separating the two lines may be placed on either the following line or the preceding line. An operator placed on the following line is more visible and so is more often used when the author wants to draw attention to it:
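
For illustration, with hypothetical flags, a condition split this way could read:

```php
<?php
// Hypothetical flags, used only to show the layout.
$isAnon = false;
$isBlocked = false;
$hasRight = true;

// The leading && on each continuation line is hard to miss.
$canEdit = !$isAnon
	&& !$isBlocked
	&& $hasRight;
```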

An operator placed on the preceding line is less visible, and is used for more common types of continuation such as concatenation and comma:
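
A small sketch using concatenation; the values are hypothetical:

```php
<?php
// Hypothetical values, used only to show the layout.
$user = 'Example';
$count = 3;

// The trailing dot marks each line as continued on the next one.
$summary = 'User ' . $user . ' has made ' .
	$count . ' edits ' .
	'today.';
```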

When continuing "if" statements, a switch to Allman-style braces makes the separation between the condition and the body clear:
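
A sketch of this layout with hypothetical flags. The condition is indented by two levels here, which is only one of the options discussed below:

```php
<?php
// Hypothetical flags, used only to show the layout.
$isLoggedIn = true;
$isBlocked = false;
$pageExists = true;
$result = 'denied';

// The opening brace on its own line separates the condition from the body.
if ( $isLoggedIn
		&& !$isBlocked
		&& $pageExists )
{
	$result = 'allowed';
}
```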

Opinions differ on the amount of indentation that should be used for the conditional part. Using an amount of indentation different to that used by the body makes it more clear that the conditional part is not the body, but this is not universally observed.

Continuations of conditionals and very long expressions tend to be ugly whichever way you write them, so it is sometimes best to break them up by means of temporary variables.

Spaces
MediaWiki favours a heavily-spaced style for optimum readability.

Put spaces on either side of binary operators, for example:
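
A minimal illustration with hypothetical variables:

```php
<?php
// Hypothetical values.
$b = 2;
$c = 3;

// Spaces on both sides of = and +.
$a = $b + $c;
```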

NOT
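
The same kind of statement without the spaces, shown only as the discouraged form (hypothetical variables again):

```php
<?php
// Hypothetical values.
$b = 2;
$c = 3;

// Discouraged: no spaces around = and +.
$a=$b+$c;
```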

Put spaces next to parentheses on the inside, except where the parentheses are empty. Do not put a space following a function name.
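
As a sketch, using two built-in functions and a hypothetical input string:

```php
<?php
// Hypothetical input.
$text = ' Main Page ';

// Spaces inside the parentheses, none between the function name and "(".
$trimmed = trim( $text );

// Empty parentheses take no space.
$now = time();
```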

Opinions differ as to whether control structures if, while, for and foreach should be followed by a space; the following two styles are acceptable:
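
The two styles might look like this; the list and the limit are hypothetical:

```php
<?php
// Hypothetical list, used only to show the two layouts.
$items = array( 1, 2, 3 );
$total = 0;

// Style 1: a space between the keyword and the parenthesis.
foreach ( $items as $item ) {
	$total += $item;
}

// Style 2: no space between the keyword and the parenthesis.
if( $total > 5 ) {
	$total = 5;
}
```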

Single-line comments should have a space between the # or // and the comment text.

To help developers fix code with an inadequately spacey style, a tool called stylize.php has been created, which uses PHP's tokenizer extension to add spaces at the relevant places.

Braceless control structures
Single-line if statements are rarely used. They reduce the readability of the code by moving important statements away from the left margin, where the reader is looking for them.

Remember that making code shorter doesn't make it simpler. The goal of coding style is to communicate effectively with humans, not to fit computer-readable text into a small space.

Most MediaWiki developers favour fully-braced control structures:
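
A sketch of the fully-braced form, with a hypothetical flag and status:

```php
<?php
// Hypothetical flag and status.
$done = true;
$status = 'running';

// Braces are written even though the body is a single statement.
if ( $done ) {
	$status = 'finished';
}
```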

This avoids a common logic error, which is especially prevalent when the developer is using a text editor which does not have a "smart indenting" feature. The error occurs when a single-line block is later extended to two lines:
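
A hypothetical starting point for such an error could be:

```php
<?php
// Hypothetical flag and counter.
$done = false;
$cleanups = 0;

// Braceless form: only the next statement belongs to the if.
if ( $done )
	$cleanups++;
```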

Later changed to:
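
A sketch of the extended version, with hypothetical names. The second statement is indented as if it were conditional, but PHP runs it regardless of the flag:

```php
<?php
// Hypothetical flag, counter and status.
$done = false;
$cleanups = 0;
$status = 'running';

// The indentation suggests both statements are conditional, but without
// braces only the first one is.
if ( $done )
	$cleanups++;
	$status = 'finished';
```

Here $status ends up as 'finished' even though $done is false.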

This has the potential to create subtle bugs.

emacs style
In emacs (see also php-mode), you can approximate this style with a custom minor mode in your .emacs file, i.e.

Assignment expressions
Using assignment as an expression is surprising to the reader and looks like an error. Do not write code like this:
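
An illustration of the discouraged pattern, using a hypothetical lookup table:

```php
<?php
// Hypothetical lookup table and key.
$names = array( 'a' => 'Alice' );
$key = 'a';
$greeting = '';

// Discouraged: the single = inside the condition looks like a mistyped ==.
if ( $name = $names[$key] ) {
	$greeting = "Hello, $name";
}
```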

Space is cheap, and you're a fast typist, so instead use:
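
A sketch of the clearer form, with a hypothetical lookup table:

```php
<?php
// Hypothetical lookup table and key.
$names = array( 'a' => 'Alice' );
$key = 'a';
$greeting = '';

// Assign first, then test the result in a separate statement.
$name = $names[$key];
if ( $name ) {
	$greeting = "Hello, $name";
}
```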

Using assignment in a while clause used to be legitimate, for iteration:
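
The older idiom might have looked roughly like this. A plain array stands in for a result set here, which is an assumption made so the snippet runs on its own:

```php
<?php
// Hypothetical stand-in for a result set.
$queue = array( 'first', 'second', 'third' );
$seen = array();

// Older idiom: assign inside the while condition; the loop stops when
// array_shift() returns null for the empty array.
while ( $row = array_shift( $queue ) ) {
	$seen[] = $row;
}
```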

This is unnecessary in new code; instead use:
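
A sketch of the foreach form, again with a plain array standing in for a result set:

```php
<?php
// Hypothetical stand-in for a result set.
$rows = array( 'first', 'second', 'third' );
$seen = array();

// No assignment inside a condition is needed.
foreach ( $rows as $row ) {
	$seen[] = $row;
}
```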

Ternary operator
The ternary operator can be used profitably if the expressions are very short and obvious:
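
For instance, with a hypothetical parameter array:

```php
<?php
// Hypothetical parameters.
$params = array( 'wiki' => 'enwiki' );

// Both branches are short and obvious, so a ternary reads well.
$wiki = isset( $params['wiki'] ) ? $params['wiki'] : false;
```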

But if you're considering a multi-line expression with a ternary operator, please consider using an if block instead. Remember, disk space is cheap, code readability is everything, "if" is English and ?: is not.

String literals
For simple string literals, single quotes are slightly faster for PHP to parse than double quotes. Perhaps more importantly, they are easier to type, since you don't have to press shift. For these reasons, single quotes are preferred in cases where they are equivalent to double quotes.

However, do not be afraid of using PHP's double-quoted string interpolation feature:
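
A minimal sketch with hypothetical values:

```php
<?php
// Hypothetical values.
$user = 'Example';
$count = 3;

// Variables are expanded inside the double-quoted string.
$message = "User $user has made $count edits.";
```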

This has slightly better performance characteristics than the equivalent using the concatenation (dot) operator, and it looks nicer too.

Heredoc-style strings are sometimes useful:
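
A sketch of a heredoc; the markup, the class name and the EOT token are illustrative choices:

```php
<?php
// Hypothetical value interpolated into the heredoc.
$title = 'Main Page';

// The closing token sits at the start of its own line.
$html = <<<EOT
<div class="mw-example">
	<h1>$title</h1>
</div>
EOT;
```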

Some authors like to use END as the ending token, which is also the name of a PHP function. This leads to IRC conversations like the following:

<Simetrical>	vim also has ridiculously good syntax highlighting.
<TimStarling>	it breaks when you write <<<END in PHP
<Simetrical>	TimStarling, but if you write <<<HTML it syntax-highlights as HTML!
<TimStarling>	I have to keep changing it to ENDS so it looks like a string again
<brion-codereview>	fix the bug in vim then!
<TimStarling>	brion-codereview: have you ever edited a vim syntax script file?
<brion-codereview>	hehehe
<TimStarling>	http://tstarling.com/stuff/php.vim
<TimStarling>	that's half of it...
<TimStarling>	here's the other half: http://tstarling.com/stuff/php-syntax.vim
<TimStarling>	1300 lines of sparsely-commented code in a vim-specific language
<TimStarling>	which turns out to depend for its operation on all kinds of subtle inter-pass effects
TimStarling: it looks like some franken-basic language.

C borrowings
The PHP language was designed by people who love C and wanted to bring souvenirs from that language into PHP. But PHP has some important differences from C.

In C, constants are implemented as preprocessor macros and are fast. In PHP, they are implemented by doing a runtime hashtable lookup for the constant name, and are slower than just using a string literal. In most places where you would use an enum or enum-like set of macros in C, you can use string literals in PHP.

PHP has three special literals: <tt>true</tt>, <tt>false</tt> and <tt>null</tt>. Homesick C developers write <tt>null</tt> as <tt>NULL</tt> because they want to believe that it is a macro defined as <tt>((void*)0)</tt>. This is not necessary.

Use <tt>elseif</tt> not <tt>else if</tt>.

PHP pitfalls

 * Understand and read the documentation for <tt>isset</tt> and <tt>empty</tt>. Use them only when appropriate.
 * <tt>empty</tt> is inverted conversion to boolean with error suppression. Only use it when you really want to suppress errors. Otherwise just use <tt>!</tt>. Do not use it to test if an array is empty, unless you simultaneously want to check if the variable is unset.
 * Do not use <tt>isset</tt> to test for <tt>null</tt>. Using <tt>isset</tt> in this situation could introduce errors by hiding mis-spelled variable names. Instead, use <tt>$var === null</tt>
 * The same advice applies to array keys. Instead of <tt>isset</tt> or <tt>empty</tt>, consider using <tt>array_key_exists</tt> to verify that a key exists.
 * Study the rules for conversion to boolean. Be careful when converting strings to boolean.
 * Be careful with double-equals comparison operators. Triple-equals is often more intuitive.
 * <tt>'foo' == 0</tt> is true in PHP versions before 8.0 (PHP 8 compares the number as a string here, giving false)
 * <tt>'000' == '0'</tt> is true
 * <tt>'000' === '0'</tt> is false
 * Array plus does not renumber the keys of numerically-indexed arrays, so <tt>array('a') + array('b') == array('a')</tt>
 * Make sure you have <tt>error_reporting</tt> set to <tt>E_ALL | E_STRICT</tt> for PHP 5. This will notify you of undefined variables and other subtle gotchas that stock PHP will ignore.
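
Several of the pitfalls above can be seen in a short sketch. The array and its keys are hypothetical; the results noted in the comments follow from the documented behaviour of each construct:

```php
<?php
// Hypothetical array with a key whose value is null.
$options = array( 'limit' => null, 'offset' => 0 );

// isset() reports false for a key that exists but holds null,
// whereas array_key_exists() only checks that the key is present.
$issetLimit = isset( $options['limit'] );
$hasLimit = array_key_exists( 'limit', $options );

// empty() treats 0 as empty, even though the key is set.
$emptyOffset = empty( $options['offset'] );

// Loose comparison treats numeric strings as numbers; strict does not.
$looseEqual = ( '000' == '0' );
$strictEqual = ( '000' === '0' );

// Array plus keeps the left operand's value for key 0 and does not renumber.
$sum = array( 'a' ) + array( 'b' );
```

Here $issetLimit is false while $hasLimit is true, $emptyOffset is true, $looseEqual is true while $strictEqual is false, and $sum contains only 'a'.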

Classes
As a holdover from PHP 4.x's lack of private class members and methods, older code will be marked with comments such as <tt>/** @private */</tt> to indicate the intention; respect this as if it were enforced by the compiler.

Use proper visibility in new code, including <tt>public</tt> if the function could be confused with old code, but do not add visibility to existing code without first checking, testing and refactoring as required, because the above rule has been broken in several places.

Files
Files which contain include code should be named in <tt>UpperCamelCase</tt>. Name the file after the most important class it contains; most files will contain only one class, or a base class and a number of descendants. For instance, Title.php contains only the <tt>Title</tt> class; HTMLForm.php contains the base class <tt>HTMLForm</tt>, but also the related class <tt>HTMLFormField</tt> and its descendants.

Name other files, such as JavaScript, CSS, images and SQL, in <tt>lowercase</tt>. Maintenance scripts are generally in <tt>lowerCamelCase</tt>, although this varies somewhat. Files intended for the end user, such as readmes, licenses and changelogs, are usually in <tt>UPPERCASE</tt>.

Never include spaces in filenames or directories, or use non-ASCII characters. For lowercase titles, hyphens are preferred to underscores.

Code elements
Use lowerCamelCase when naming functions or variables. Use UpperCamelCase when naming classes: <tt>class ImportantClass</tt>. Use uppercase with underscores for global and class constants: <tt>DB_WRITE</tt>, <tt>Revision::REV_DELETED_TEXT</tt>. Other variables are usually lowercase or lowerCamelCase; avoid using underscores in variable names.
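
The casing rules above might combine as in the following sketch. Every name in it other than <tt>ImportantClass</tt> is invented for illustration; only the letter case matters:

```php
<?php
// Global constant: uppercase with underscores (hypothetical name).
define( 'MAX_EXAMPLE_ITEMS', 10 );

// Class: UpperCamelCase.
class ImportantClass {
	// Class constant: uppercase with underscores (hypothetical name).
	const DEFAULT_MODE = 'read';

	// Function and variables: lowerCamelCase.
	public function getItemCount( $itemList ) {
		$currentCount = count( $itemList );
		return min( $currentCount, MAX_EXAMPLE_ITEMS );
	}
}

$importantObject = new ImportantClass();
$count = $importantObject->getItemCount( array( 'a', 'b' ) );
```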

There are also some prefixes used in different places:

Functions

 * <tt>wf</tt> (wiki functions) - Top-level functions, e.g.
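
A sketch of such a function. The name <tt>wfExampleGreeting</tt> is hypothetical and is not part of MediaWiki:

```php
<?php
// Hypothetical top-level function carrying the wf prefix.
function wfExampleGreeting( $name ) {
	return "Hello, $name";
}

$greeting = wfExampleGreeting( 'Example' );
```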

Verb phrases are preferred: use <tt>getReturnText</tt> instead of <tt>returnText</tt>.

Variables

 * <tt>wg</tt> - global variables, e.g. <tt>$wgVersion</tt>, <tt>$wgTitle</tt>. Always use this for new globals, so that it's easy to spot missing "global $wgFoo" declarations.
 * <tt>m</tt> - object member variables: <tt>$this->mPage</tt>. This is discouraged in new code, but try to stay consistent within a class.

Extension Functions and Variables

 * <tt>ef</tt> - extension functions: top-level functions added by user extensions
 * <tt>eg</tt> - extension globals.

You should include the Extension name as a namespace delimiter, whether or not you use the <tt>eg</tt> prefix: <tt>$wgAbuseFilterConditionLimit</tt>, <tt>$egCentralNoticeTables</tt>.

HTTP and session stuff
The following may be seen in old code but are discouraged in new code:


 * <tt>ws</tt> - Session variables, e.g. <tt>$_SESSION['wsSessionName']</tt>
 * <tt>wc</tt> - Cookie variables, e.g. <tt>$_COOKIE['wcCookieName']</tt>
 * <tt>wp</tt> - Post variables (submitted via form fields), e.g. <tt>$wgRequest->getText( 'wpLoginName' )</tt>

Database

 * Table names should be singular nouns: <tt>user</tt>, <tt>page</tt>, <tt>revision</tt>, etc. There are some historical exceptions: <tt>pagelinks</tt>, <tt>categorylinks</tt>...
 * Column names are given a prefix derived from the table name: the name itself if it's short, or an abbreviation:
 * <tt>page</tt> &rarr; <tt>page_id</tt>, <tt>page_namespace</tt>, <tt>page_title</tt>...
 * <tt>categorylinks</tt> &rarr; <tt>cl_from</tt>, <tt>cl_namespace</tt>...

Common local variables
It is common to work with an instance of the <tt>Database</tt> class; we have a naming convention for these which helps keep track of the nature of the server to which we are connected. This is of particular importance in replicated environments, such as Wikimedia and other large wikis.


 * <tt>$dbw</tt> - a Database object for writing (a master connection)
 * <tt>$dbr</tt> - a Database object for non-concurrency-sensitive reading (may be a read-only slave, slightly behind master state)

Inline documentation

 * The Doxygen documentation style is used (it is very similar to PHPDoc for the subset that we use). A typical comment gives a description of the function or method, the parameters it takes (using <tt>@param</tt>) and what it returns (using <tt>@return</tt>), and may add tags such as <tt>@ingroup</tt> or <tt>@author</tt>. Please use "@" rather than "\" as the escape character (e.g. use <tt>@param</tt> rather than <tt>\param</tt>): both styles work in Doxygen, but the <tt>@param</tt> style works with PHPDoc too, whereas the <tt>\param</tt> style does not.


 * The general format for parameters is <tt>@param $varname [type] [description]</tt>, so make sure you don't put <tt>[type]</tt> before <tt>$varname</tt>.
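
A documented function following these points might look like the sketch below. The helper <tt>wfExamplePrefixedTitle</tt> is hypothetical, written only to show the comment layout:

```php
<?php
/**
 * Join a namespace name and a page name into a prefixed title.
 *
 * Hypothetical helper, not part of MediaWiki.
 *
 * @param $namespace String: namespace name, or '' for the main namespace
 * @param $page String: page name
 * @return String: the prefixed title
 */
function wfExamplePrefixedTitle( $namespace, $page ) {
	if ( $namespace === '' ) {
		return $page;
	}
	return $namespace . ':' . $page;
}
```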

Messages

 * When creating a new message, use hyphens (-) where possible. So for example, "some-new-message" is a good name, while "someNewMessage" and "some_new_message" are not.
 * If the message is going to be used as a label which can have a colon after it, don't hardcode the colon; instead, put the colon inside the message text. Some languages (such as French) need to handle colons in a different way, which is impossible if the colon is hardcoded.
 * HTML class and ID names should be prefixed with "mw-". It seems most common to hyphenate them after that, like "mw-some-new-class" instead of "mw-somenewclass" or "mw-some_new_class", but there doesn't appear to be a clear convention at present.

To do

 * Naming
 * Function parameter choice