How to become a MediaWiki hacker/2011 Workshop

A workshop to teach developers how to hack MediaWiki.

Allot three to four hours, including a 20-min biobreak in the middle. Works all right with 3-4 teachers and about 20 learners.

Originally taught to about 24 participants on Aug 2, 2011, Hecht House, Haifa, Israel (part of Developers' Days).

Overview
In this workshop, we will go over the basic coding toolchain and our code intake/review/merge/deploy/release workflow. We will cover the topics in [[How to become a MediaWiki
 * Format: unified diff.

We review it: Then experienced developers review your code and either tell you how to improve it, or accept it into our code repository. You can see recent updates to our code repository at Special:Code/MediaWiki. Then there is an additional, more thorough code review step; we run automated tests to see if there's a problem, and someone visually inspects your change, and marks it as "ok" or tells you how to fix it. (We mark code FIXME if the change can't be deployed to production sites because there's something wrong with it, and then you have to fix it.)

We deploy and release it: Every few months, we take all the new code that's been marked "ok" and deploy it onto Wikimedia servers. Then we fix any bugs that arise and release a new version of MediaWiki (the "tarball").

Once you are a little more experienced, you can start running automated tests on your own work before you commit it, to find problems before you submit the patch.

And eventually, you should request commit access -- it makes sense to do this when you've submitted enough (good) patches (through Bugzilla) that it's more convenient for everyone to give you access. This might take a few weeks of dedicated effort, or months or years if you don't have time to submit patches very often.


 * Q: Is there a place in Git where you can download a good working copy (without fixmes etc.)?
 * A: No. But we do have Git branches for stable releases (eg. /mediawiki/branches/REL1_18) and we have the tarball releases as downloadables.


 * Recommendation: Check out the latest branch that is being prepared for a release.
 * Download from Git
 * Tip: If you submit patches, they should be made based on trunk to avoid merge conflicts.
 * Do not submit patches based on a branch or a stable tarball download!
 * Also note that if there are serious FIXMEs they will not be in trunk, because we will revert them. Trunk must always be stable enough to run a small wiki on.  For example, https://translatewiki.net also runs on the live trunk!

Our goal is to get trunk into an always-deployable state by getting closer and closer to Continuous integration, getting tests to automatically run, and rejecting changes that break tests.


 * Time for a 20-30 min break.

User preferences
The easiest way to play around with MediaWiki is to change user preferences. And if you're running your own MediaWiki site, you can change/customize default values, like the default skin, etc. Examples: enhanced recent changes, gadgets!

Gadgets are a nice way for new developers (who know JavaScript) to start customizing their wiki experience, and eventually get involved in MediaWiki development. There's a detailed intro at Gadget kitchen and a detailed training document and video at Gadget kitchen/Training. And you can use jQuery (as of MediaWiki 1.17).


 * Q: How is the JavaScript from Gadgets executed?
 * A: The gadget extension is loaded on every page and checks if the user has the preference for this page is enabled. If that is the case, it loads the javascript as a &lt;script> page.

The gadgets are only editable/publishable by wiki administrators and are stored as wiki pages on MediaWiki:Gadget-{gadgetname}.js. Inside the script a gadget maker can do things like. This way, the script will only be loaded on that page name.


 * To "create" a new Gadget, the wiki administrator adds a list item on MediaWiki:Gadgets-definition. More about this is in Gadget kitchen.
 * Non-ResourceLoader gadgets all run in the global scope. This means that you should wrap your script in a closure to avoid naming conflicts and scope leakage.  This will be fixed in MediaWiki 1.19 (in 1.19+ gadgets are always loaded through ResourceLoader, which means each module executes in a local (new) scope by default).

Configuration options
There are also things you can do to configure MediaWiki, as the site's administrator, to customize how it runs. See Manual:Configuration settings.

Skins
"Skins" are what we call themes. The best-designed standard skin currently is "Vector" (now default, and you can modify it or write your own. For more information, see
 * the "extensions and skins" part of the architecture manual
 * User:Dantman/Skinning system
 * Manual:Skinning/Tutorial

Extensions
Extensions let you customize how MediaWiki looks and works. There are a bunch of kinds of extensions.


 * More documentation is at Manual:Extensions and Manual:Developing extensions. Also see these beginners' resources, especially this intro video.

In the Subversion tree, there is an  directory. This is where extensions are. If you write one and commit it to the extensions directory, it will be translated by the translatewiki.net translators. Most of these extensions aren't suitable for the Wikimedia sites, either because they're specialized, unneeded (e.g. proofreading tools on Wikisource) or unsecure for sites like Wikipedia (for example a "Who is online" extension is more of a social extension, not for an encyclopedia). On any MediaWiki wiki, you can see the list of installed extensions by looking at your wiki's Special:Version page.

You can consider an extension fairly safe & stable if it's used on a Wikimedia production wiki (indicated by a box at the bottom of the extension's page -- example).


 * Please look at the example extensions to understand how extensions should work in practice. In Git:  . Web-browsable version at https://phabricator.wikimedia.org/diffusion/EXAM/

You may need to learn Parser functions to do some of the things you want with extensions.

Special pages
It is fairly easy to write a small, simple "Special page". See Manual:Special pages for more details, including a template you can work from. A special page is just a PHP file - you can output whatever HTML you like to do what you want to do with it. There is example code in the "examples" Git subdirectory.

Magic words
Manual:Magic words

A magic word lets you add your own wikitext tags to extend wikitext. Extensions can introduce both magic words and tags.

There are three kinds of magic words: Parser functions, variables and boolean triggers.

Look like
 * Parser functions

There are core parser functions included in core MediaWiki -- see Manual:Parser functions. Extensions can introduce their own parser functions, with the most popular one being Extension:ParserFunctions.

etc.
 * Variables


 * Behaviour switches (like  would place the table of contents in that position, extension can add their own behaviour triggers (like  ).

Parser tags
Individual projects will often find it useful to extend the built-in wiki markup with additional capabilities, whether simple string processing, or full-blown information retrieval. Tag Extensions allow users to create new custom tags that do just that. For more, see Manual:Tag extensions

New tags can be introduced by an extension. For example, 

could be a parser tag created by an extension to show recent posts of a blog inside a wiki page).

Tags cannot be nested, because the contents are handled by the tag function and not by the wikitext parser.

Hooks
A hook is something that calls a function whenever a specific event happens, and it can do things like notifying the IRC channel when a specific page is changed. More info: Manual:Hooks.

MediaWiki core overview
If none of the approaches above do what you want, you may have to actually alter MediaWiki core, which lives in.

Everything comes in through index.php which dispatches to  class, determines your action parameter (?), with logic handled in the   class, & that dispatches various aspects.

Check out the  class, which affects all special pages. You can thus modify Preferences, Contributions, Version, etc. Easy place to jump into code, because it's self-contained.

Here is the structure of the database schema.


 * This is a good time to dive into hands-on bug triage or testing for people who don't want to contribute just yet, and Annoying little bugs work for people who want to dive in and code. One way to structure the workshop: this hands-on work would last for the rest of the workshop, and if one of the developers leading it felt the need to burst into a few minutes of lecturing (because a few people were having the same problem), that would be fine.

During the coding/workshop, if anyone is ready to actually use trunk (to install or to suggest a patch), be ready to explain the directory structure, e.g., what's ?
 * “ ” is Magnus Manske’s fault (it actually is the “third rewrite”) and historically called so. For more history on this see Manual:MediaWiki architecture and MediaWiki history(This link is redirecting to the South Sudan page, and should be set to the actual MediaWiki history page).

Important Git directories:
 * https://phabricator.wikimedia.org/diffusion/MW/ MediaWiki core
 * https://phabricator.wikimedia.org/diffusion/ MediaWiki extensions

File structure overview of MediaWiki core ( https://phabricator.wikimedia.org/diffusion/MW/ ):

This is a small file that does some initialization and then goes off to WebStart. All article views, edits, actions, special pages -- all of it goes through this file.
 * api.php
 * /skins/
 * /includes: the "php includes directory"
 * /includes: the "php includes directory"
 * /includes: the "php includes directory"

Contains default settings, global functions, and all classes (such as the Special pages, Wiki page actions (read, edit, etc.), ResourceLoader, API modules and lots more.

Example case: a reading list for a user to track what they've read or not in a given category and to make an educational game which asks questions about the pages that have been read.
 * needs a hook to do something in the DB when a user reads something
 * needs a special page to show the user what they've read

You could do this as a gadget, but you wouldn't have access to the database, so it would be simpler to do it as an extension written in PHP.

Permissions and vandalism
In MediaWiki everybody with the 'edit' right can make an edit. Users can use the 'undo' button from history page to undo any previous edit.

There is little to no vandalism detection in core, but there are many good extensions to help with this. For example Wikipedia uses CheckUser, Nuke, AbuseFilter, AntiBot, AntiSpoof, ConfirmEdit, GlobalBlocking, SimpleAntiSpam, SpamBlacklist, Title Blacklist, TorBlock, etc.


 * Sidetrack: ideas for things to work on:
 * * rollback state in database for edits
 * * Perhaps a new FlaggedRevisions state? (i.e. a "added vandalism" and "repairing" status)

Bots
Bots are not in the MediaWiki source tree: they're programs that have (for example) Wikipedia accounts and use the action API to (for example) edit the text of pages or look at the recent changes. They are often written in Python (using the pywikibot framework) or in any programming language that can do HTTP requests (e.g., in PHP with cURL to the API).

See Pywikipediabot/Basic use and API; there is a detailed tutorial at API/Tutorial.

Plugins to other systems
If you use the MediaWiki API you can create plugins that integrate Wikipedia or other MediaWiki installations into other apps or tools.


 * Now the teacher has a choice -- code tour or individual hacking.

Start hacking!

 * Installing MediaWiki
 * Annoying little bugs
 * Developer hub
 * Manual:Code

Code Tour
Now the teacher goes over the general architecture of MediaWiki on a code level, covering important tables and classes.


 * ''Note to teacher: you may want to show the students the execution workflow of a web request within MediaWiki.

There is a lot of stuff in this diagram of MediaWiki's database tables! But you don't need to know most of it. We'll go over the important tables.

Tables
The Page, Revision, and User tables are the most important ones.


 * 'page' table

Page table is the centerpiece of lots of interactions.
 * every page has a numerical ID
 * exposed in the interface sometimes.
 * autoincrement. Every page as it is created gets one.
 * Counterintuitively, the namespace of a page is a number, not a textfield. (Surprising.  You'd expect the text to be somewhere.)
 * Text is a configuration thing.... pages in the user namespace have a "2" in that field (includes/Defines.php).
 * A separate file says that 2 means user. This is for localisation.
 * And so you can rename namespaces. If you add a new namespace you have to give a number...
 * it should be over 15 for core.
 * For extensions adding namespaces: use something higher than 100
 * 0-99 are "reserved" as core namespaces.
 * Different extensions reserve diff namespaces. Semantic, for example, uses 100-199.
 * Negative namespaces are special, don't try to use them)
 * Wikimedia-specific namespaces are not in the main code
 * see $wgExtraNamespaces — you can create your own.
 * Many languages have "Portal", but not all, and it's not in the core.
 * Example: on Lithuanian Wikipedia, for articles that are lists, they put those articles in a different namespaces.
 * How high can you go? 32 bits.  So 2 or 4 billion namespaces are possible for a single wiki.
 * Every namespace should have an associated talk namespace.
 * Subjects are even numbers and Talk namespaces are odd. If you break this convention, MediaWiki breaks!
 * title itself, stored without a namespace prefix.
 * Example: for User:Sumanah the database stores: namespace=2, title=Sumanah
 * Is this page a redirect?
 * is it new?
 * Relates to: Revision table! specifically, this relates to a row in that table.
 * Is this page a redirect?
 * is it new?
 * Relates to: Revision table! specifically, this relates to a row in that table.
 * Is this page a redirect?
 * is it new?
 * Relates to: Revision table! specifically, this relates to a row in that table.
 * Relates to: Revision table! specifically, this relates to a row in that table.
 * Relates to: Revision table! specifically, this relates to a row in that table.
 * Relates to: Revision table! specifically, this relates to a row in that table.
 * Relates to: Revision table! specifically, this relates to a row in that table.


 * 'revision' table

Revision table records metadata for every revision, including creation.


 * a unique ID for a revision
 * sometimes visible in the interface as “oldid”
 * see the text the text table for the text of the edit
 * ID, blob of text, & some flags. In theory, just text.  In practice, gzipped (then the gzip flag would be flagged), or stored externally, etc.
 * edit summary
 * user who made the edit
 * timestamp
 * edit summary
 * user who made the edit
 * timestamp
 * user who made the edit
 * timestamp
 * timestamp
 * timestamp


 * 'user' table


 * autoincrement ID
 * user's name
 * user's name
 * user's name

The rest of the stuff in a user record is all sorts of random metadata about a user.


 * ‘recentchanges'

is a separate summary table (although the data can be inferred from other tables) for performance reasons.

Classes
We'll show you the two most important classes,  and. For those and all the others, the best reference is https://doc.wikimedia.org/ which is always up-to-date.
 * Doxygen documentation, similar to PHPdoc, but can be used in many languages.
 * Autogenerated documentation, comments start with /** instead of just /*


 * About a wiki page (Title class, WikiPage class)

Every page is uniquely identified in two ways:
 * "page id", stored in

or:
 * "namespace & title" pair, that together are always unique (there can be two pages with the same title (Project:Foo, User:Foo, Category:Foo), but there can be only one with the same namespace + title pair.
 * (A page like "Brion Vibber" could be a User page, Article, Category, or anything; the page title itself isn't enough to be unique within a wiki. You need the namespace to know exactly what the page is.)


 * Title class

Inside MediaWiki, pages are identified through instances of the "Title" class, not by passing s or namespace/title combos. So if an extension hooks into core functionality it will get the instance of the  class for the current page (stored in the global  ) and can use methods like   to see if a page is a redirect.

To create an instance of  use any of the constructor methods such as   or   etc.

The  class has several static utility functions to offer. Most used is. Always use safe methods when your input is coming from users, or if you aren't sure that it's sanitized. It may be ok to use non-safe methods when your input is coming directly from the database, through.

A  may return. This indicates that the title is invalid. Always check for s!


 * Q: We see getters, but no setters. How do we set methods?
 * A: That's not possible, title objects would otherwise be mutable/not be reliable. Besides, they represent a title, not a modifiable page.

On that subject: to actually touch the database and modify an actual wiki page, we use the  class.


 * WikiPage class

On a high level: $myTitle = Title::newFrom***( ... ); $myArticle = new WikiPage( $myTitle ); $myArticle->doEdit( ... );

Wrap-up
Time for Q&A!


 * Q: There is an extension that calculates statistics -- how does it work?
 * A: The view-counters are actually not an extension but part of MediaWiki core; the functionality uses  and there's a hit counter table.  Whenever a user hits a page, it adds a row.  That row is a temporary buffer and gets dumped to the hit counter table periodically.
 * Since large sites may not want to increment a database table field, it is possible to disable this feature (See $wgDisableCounters) (Wikipedia has disabled it).

We're happy to continue mentoring you in person or online (chat, mailing lists). Everyone on the core team once started out as a beginner!

What participants want/like about this workshop
They find it informative, even if they already know some of the material. They find the database overview and code architecture behind Wikipedia interesting:


 * what makes it run
 * process of being able to interact with the code
 * bugtracking system
 * how to check out code
 * edit it
 * check it back in
 * what happens after that