User:Aaron Schulz/programming pitfalls

From mediawiki.org

Here is a list of common causes of errors and problems. Some things are PHP specific.

General tips[edit]

  • Please do simple case testing. If you add items to a page list, spend the 2 minutes to play around with it. Does it page correctly? Do the other filters work? It is easy for even fairly small changes to have annoying bugs.
  • Always double check and diff before committing anything
  • Ask someone to look over your code if possible
  • Try to avoid large code duplication. This is an easy source of errors and maintainability issues
  • Avoid long functions; encapsulate!
  • Make nice OOP code (use data abstraction)
  • Don't forget to test the obvious cases too!
  • Use specific and accurate names
  • Document how things work

Errors[edit]

  • Failure to initialize variables due to assumptions that they would be set in some non-trivial while/if/else set of blocks.
  • Typos, especially when renaming variables. Always use search & replace tools for this, one by one.
  • Being lazy with weak ("==") comparisons. Various MediaWiki errors have been made this way. Note that in PHP:
    • var_dump( '0' == false) => bool(true)
    • var_dump( 'Hello' == 0) => bool(true) yet var_dump( 'Hello' == false) => bool(false)
  • Laziness with mixed function return types, so checks may fail.
    • Say a function returns both false and null on failure possible. One might accidentally code "if( !is_null($return) )" when the check needs to catch "false" too.
  • Coding extensions using hooks and making erroneous assumptions about the timing and consistency of hooks getting called. This is a messy area and it can be hard to tell what hooks are called when exactly. One way to avoid this if possible is to cache the results and query if the cache is not empty, rather than *assume* one hooked function gets called twice, and setting the stuff there only and referencing possibly undefined variables in latter hooked functions
  • DB write queries that are not wrapped in a nice transaction.
    • Say we want to rename a title, and if it is an image, rename the file. It is best to start with $dbw->begin(), make the title DB changes, then rename the image. If the image rename fails, call $dbw->rollback(). In this case the image move was done last, since it is hard to rollback.
  • Not using successors for lazy-initialized objects. Just saying $obj->mSourceText can cause errors if that field is not set on construction. Using a nice accessor should avoid this, as it should load the needed data if not already present.
  • Failure to properly escape URL parameters
  • Using DB slaves for data that really needs to be curent (such for edits)
  • Using + for arrays does not do what is expected most likely
  • array_merge() is bad for arrays where the keys are numeric
  • Assuming certain variables always are set (like globals). empty() checks may be needed.

Slowness[edit]

  • Initializing everything all the time
    • Say "$a = new Object( $par )" does a DB query on __construct(), and we want to override $a to use a different parameter. This requires extra DB queries on load.
  • Parsing and/or fetching things twice for no reason. Hooks that try to deal with edits and cache are vulnerable to this. It may be useful to pass and object through the hook that caches the data. For example, $article objects have getText() and prepareTextForEdit() that cache values.
  • Extensions that initialize all messages unconditionally on startup. It is best to delay it until something that needs the messages is about to trigger.
  • Be careful with very dynamic queries constructed by a host of user params, it is easy to end up with combinations that do filesorts and scan hundreds of rows.
  • EXPLAIN SELECT your queries! Mistakes (like joining `revision` on `page` and forgetting to say 'page_id = rev_page') can create massive Cartesian joins and filesorts that bring down the site. You may have made your DB tables nicely, but if you forget to overspecify things to trigger index use, it may still go hella slow. Also, MySQL doesn't yet have a near perfect query planner, so some 'obvious' indexes may not be used.
  • INDEX clutter. Say you want to mark a `recentchanges` row corresponding to rev_id = Y with rc_patrolled = x. Instead of playing around with an index on rc_patrolled, we can overspecify the row. That is say "SET rc_patrolled = x ... WHERE rc_timestamp = Z AND rc_user_text = 'K'". This allows use of the "user_timestamp" index, which already exists.
  • Failure to cache variables. If we have $wgObject and we keep calling $wgObject->getUsageList(), make sure that it caches that list if possible.
  • Use of expensive objects (like Titles) in huge arrays. If all you need to do is make a list of titles, perhaps use prefixed DB keys or namespace (int) -> title (string) pars. Otherwise, you may hit memory limits and PHP will fail.