How to become a MediaWiki hacker/2011 Workshop

A workshop to teach developers how to hack MediaWiki.

What to have prepared ahead of time?

 * Have the LAMP stack installed: Linux, Apache, MySQL (or SQLite) and PHP
 * If you are using Windows, you may want to install an Ubuntu Linux virtual machine by downloading the latest Ubuntu ISO file and VirtualBox
 * Have MediaWiki installed: download the 1.17.0 tarball and use the installer
 * If you're adventurous, try downloading and installing from trunk, but don't worry if that doesn't work out
 * Install an IRC client such as xchat or ChatZilla
 * Have an account on http://bugzilla.wikimedia.org
 * Install a Subversion (SVN) client

And if you run into trouble, ask on IRC or on the wikitech-l mailing list.

Workshop time
The Workshop will take place on Tuesday, August 2 at 10:00 in Beit Hecht, Hanassi Ave 138.

Process
Cover the coding toolchain and the code intake/review/merge/deploy/release workflow. Go through "HOWTO Become A MediaWiki Hacker".

Ask the workshop participants

 * what's your reason/interest/focus?
 * Then point out things of interest in MediaWiki or Wikimedia

What You Can Do
Explain how one might change MediaWiki's behavior in a few scenarios. Start with easy changes and work up to more time-consuming, pioneering work.
 * User preferences
 * Config options
 * Skins
 * Extensions (refer to the example extensions, which are up-to-date)
 * Gadgets (JavaScript-based site extensions; they require the Gadgets extension to be installed. Especially good for eye candy.)
 * Special pages
 * Parser hooks
 * Hooks in general
 * Parser functions & parser tags
 * Modifying MediaWiki core

Shallowest Possible Overview of the Application
Everything comes in through index.php, which dispatches to the MediaWiki class. That class determines the requested action (the action URL parameter); the logic is handled in the Article class, which in turn dispatches the various aspects.
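As a rough mental model of the dispatch described above, here is a minimal Python sketch. It is not MediaWiki's actual code: the handler functions, the `dispatch` name and the `Main_Page` default are assumptions made for illustration. The sketch reads the `title` and `action` query parameters, falls back to a plain view when no action is given, and sends `Special:` titles down a separate path. That mirrors the split between page actions and special pages in these notes.

```python
from urllib.parse import parse_qs, urlsplit


# Hypothetical handlers standing in for MediaWiki's page actions.
def view(title):
    return f"viewing {title}"


def edit(title):
    return f"editing {title}"


def history(title):
    return f"history of {title}"


ACTIONS = {"view": view, "edit": edit, "history": history}


def dispatch(url):
    """Route an index.php-style request URL to a handler (illustrative only)."""
    query = parse_qs(urlsplit(url).query)
    title = query.get("title", ["Main_Page"])[0]
    # With no ?action= parameter, the request is treated as a plain page view.
    action = query.get("action", ["view"])[0]
    if title.startswith("Special:"):
        # Special pages are handled by their own classes, not by page actions.
        return f"special page {title[len('Special:'):]}"
    handler = ACTIONS.get(action)
    if handler is None:
        return f"unknown action {action}"
    return handler(title)


print(dispatch("https://example.org/w/index.php?title=Amsterdam&action=edit"))  # prints: editing Amsterdam
```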

SpecialPage class: the base for all special pages (Preferences, Contributions, Version, etc.). Special pages are self-contained, which makes them an easy place to jump into the code.

There is a nice diagram of the database schema: http://www.mediawiki.org/wiki/File:MediaWiki_database_schema_1-17_%28r82044%29.png (on Wikimedia Commons, follow the link to the most recent version)

Then: workshop! Bug triage or testing for people who don't want to contribute just yet, and Annoying Little Bug work for people who want to dive in and code. This would last for the rest of the 90 minutes, and if one of the developers leading it felt the need to burst into a few minutes of lecturing (because a few people were having the same problem), that would be fine.

Annoying Little Bug list: http://www.mediawiki.org/wiki/Annoying_Little_Bug (Roan to prescreen a few?)

During the coding/workshop, if anyone is ready to actually use trunk (to install or to suggest a patch), be ready to explain the directory structure, e.g., what's phase3?

= Workshop notes =

Aug 2, 2011, Hecht House, Haifa, Israel (part of Developers' Days prior to Wikimania). About 24 participants.

Experts present at this session:
 * Roan Kattouw 
 * Timo Tijhof 
 * Andrew Garrett

10:10 Sumana introducing the agenda

10:10 asking who doesn't have MediaWiki installed on their laptop

10:11 Roan going to go over coding chain, code review, etc.

10:12 asking why people are here and what they're interested in

10:15 Sumana talking about bugs https://bugzilla.wikimedia.org/buglist.cgi?keywords=easy http://www.mediawiki.org/wiki/Annoying_Little_Bug

10:18 It's a good idea for beginners to register for an account on Bugzilla. Help and discussion about MediaWiki development: #mediawiki on freenode

 * 10:19 David: fix easy and simple bugs
 * 10:20 Waldir: watchlist RSS
 * 10:20 MingLi: improving the API
 * 10:20 Nikos: skins & templates; semantic extension
 * Innocenti Maresin: just lookin'
 * 10:21 <?>: overview
 * <?>: spectator as well
 * 10:22 Finne: did dev years ago, mostly working with MySQL for school purposes
 * Jeremyb: low-hanging fruit; knows a bit about MW, hasn't hacked on it yet
 * Kalan: easy bugs, usability issues
 * Alolita: learn about the API
 * 10:23 Lea: parser (geographical coordinates from wikitext)
 * Holger: Drupal and CiviCRM dev
 * Oren: interested in everything, but specifically search, parser, UI improvements for Wiktionary
 * 10:24 Dror: everything; already developing a bit
 * Eran: visual editor
 * 10:25 <?>: inaudible
 * Alan: programmer IRL, WP editor, trying to bring the two together
 * <?>: just listenin'
 * 10:26 Amir: promote open source

10:30 

10:37 chatter not finished yet

10:40 Roan talking about http://www.mediawiki.org/wiki/Annoying_little_bugs

10:42 http://www.mediawiki.org/wiki/How_to_become_a_MediaWiki_hacker

10:44 New people joined, including Daniel Kinzler (from WM-DE)

10:45 http://www.mediawiki.org/wiki/MediaWiki_on_IRC

10:47 Sumana describing the process of providing patches through Bugzilla: an experienced developer reviews the patch and commits it, and then the patch is deployed and included in a release

10:48 Patch format: unified diff; http://en.wikipedia.org/wiki/Diff#Unified_format

10:49 Short version: $ svn di > /Users/you/Desktop/my_first_patch.diff (svn di is short for svn diff)
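For a quick look at what a unified diff contains, the sketch below runs Python's standard `difflib.unified_diff` on two versions of a hypothetical `hello.php`. `svn diff` writes the same `---`/`+++`/`@@` hunk format, plus an `Index:` header per file. This only illustrates the shape of a patch. For real patches against trunk, use `svn di` as in the notes.

```python
import difflib

# Two versions of a hypothetical file, as lists of lines (newlines kept).
old = ["<?php\n", "echo 'Helo';\n"]
new = ["<?php\n", "echo 'Hello';\n"]

patch = "".join(
    difflib.unified_diff(old, new, fromfile="a/hello.php", tofile="b/hello.php")
)
# Lines starting with "-" are removed, "+" are added, " " are unchanged context.
print(patch)
```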

10:50 Roan showing an example bug (http://bugzilla.wikimedia.org/28296) and going through the bug comments

10:52 Roan presenting Code Review http://www.mediawiki.org/wiki/Special:Code/MediaWiki - http://www.mediawiki.org/wiki/Code_review

10:54 Discussing URLs that every developer should remember: http://mediawiki.org/wiki/Special:Code/MediaWiki/status/fixme and the like

10:57 CodeReview statuses. fixme: the commit can't be deployed to production sites because there's something wrong with it. But we really want people to TEST their work BEFORE they commit it, in order to keep trunk in a usable state. This helps the code review process and means changes get deployed and released more regularly. Some areas of the code are covered by tests. See also http://www.mediawiki.org/wiki/Continuous_integration

You usually request commit access when you submit enough (good) patches (through bugzilla) that it's more convenient for everyone to give you access

Q: Is there a place in SVN where you can download a good working copy (without fixmes etc.)? A: No. But we do have SVN branches for stable releases (e.g. /mediawiki/branches/REL1_18), and we have the tarball releases as downloads. Recommendation: check out the latest branch that is being prepared for a release (right now this is REL1_18). http://www.mediawiki.org/wiki/Download_from_SVN Tip: If you submit patches, base them on trunk to avoid merge conflicts. Do not submit patches based on a branch or a stable tarball download! Also note that serious fixmes do not stay in trunk: they get reverted, because trunk must always be stable enough to run a small wiki on (http://translatewiki.net also runs on the live trunk!). Our goal is to get trunk into an always-deployable state by moving closer and closer to continuous integration: getting tests to run automatically and bouncing changes that break them.

11:05 break

Source code of the WordPress plugin "PhotoCommons": http://svn.wikimedia.org/viewvc/mediawiki/trunk/tools/wp-photocommons/ Workflow: a dialog to enter search keywords finds images through the Wikimedia Commons API; click a thumbnail to insert it into a WordPress post/page. Authors: Krinkle and Husky

Drupal plugin to parse wikitext and render it through the MediaWiki API: http://svn.wikimedia.org/viewvc/wikimedia/branches/wmse_civicrm/wmse_mediawikiparser/ Authors: Holger Motzkau (User:Prolineserver), Manuel Schneider (User:80686)

11:36 almost everyone back

Roan presenting MediaWiki preferences; if you're running your own MediaWiki site, you can change/customize default values, like the default skin, etc.

11:40 Yay, enhanced recent changes!

11:43 Yaaay, gadgets! http://mediawiki.org/wiki/Extension:Gadgets

Gadgets are a nice way for new developers (who know JavaScript) to get involved in MediaWiki development

11:48 Q: How is the JavaScript from Gadgets executed? A: The Gadgets extension is loaded on every page and checks whether the user has enabled the preference for this gadget. If so, it loads the gadget's JavaScript into the page. Gadgets are only editable/publishable by wiki administrators and are stored as wiki pages at MediaWiki:Gadget-{gadgetname}.js. Inside the script, a gadget maker can do things like "if ( wgPageName == 'Amsterdam' )"; that way the script only takes effect on the page with that name.

11:52 To "create" a new gadget, the wiki administrator adds a list item on MediaWiki:Gadgets-definition. Starting with MediaWiki 1.17, you can use jQuery in Gadgets.

11:54 Non-ResourceLoader gadgets all run in the global scope. This means you should wrap your script in a closure to avoid naming conflicts and scope leakage. This will be fixed in MediaWiki 1.19 (in 1.19+ gadgets are always loaded through ResourceLoader, which means each module executes in a local (new) scope by default).
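To make the MediaWiki:Gadgets-definition step more concrete, here is a small Python sketch that parses list items of the form `* name|file.js|file.css`. An item may also carry bracketed options after the name, as in `* name[ResourceLoader]|file.js`. The sketch maps each file to the `MediaWiki:Gadget-<file>` page that holds it. The line format is an assumption based on the Extension:Gadgets documentation, so check that page for the options your version supports. The parser is a simplification, not the extension's own code.

```python
import re

# "* name[options]|page|page..."; the [options] part is optional and kept as-is.
GADGET_LINE = re.compile(r"^\*\s*([A-Za-z][\w.-]*)\s*(?:\[([^\]]*)\])?\s*((?:\|[^|]+)+)$")


def parse_gadgets_definition(text):
    """Map gadget name -> options and the MediaWiki: pages holding its code."""
    gadgets = {}
    for raw_line in text.splitlines():
        match = GADGET_LINE.match(raw_line.strip())
        if match is None:
            continue  # headings such as "== tools ==", blank lines, etc.
        name, options, pages = match.groups()
        files = [p.strip() for p in pages.split("|") if p.strip()]
        gadgets[name] = {
            "options": options or "",
            # Each file lives on a wiki page in the MediaWiki namespace.
            "pages": [f"MediaWiki:Gadget-{f}" for f in files],
        }
    return gadgets


definition = """== tools ==
* clock|clock.js|clock.css
* hotcat[ResourceLoader]|HotCat.js
"""
print(parse_gadgets_definition(definition)["clock"]["pages"])
```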

next 30 min: extensions

http://www.mediawiki.org/wiki/Manual:Extensions

http://www.mediawiki.org/wiki/Manual:Developing_extensions There's a separate extensions repository in SVN trunk.

11:58 Go to http://mediawiki.org/wiki/Manual and read it, and fix it if it's out of date.

Extensions
11:59 In the SVN tree, there is an extensions directory; this is where extensions live. If you write one and commit it to the extensions directory, it will be translated by the translatewiki.net translators. Most of these extensions aren't suitable for the Wikimedia sites. Some are too specialized or simply unneeded there (e.g. proofreading tools outside of Wikisource). Others are insecure or out of place on a site like Wikipedia (a "Who is online" extension, for example, is more of a social feature, not something for an encyclopedia). On any MediaWiki wiki, you can see the list of installed extensions on the wiki's Special:Version page.

An extension can be considered fairly safe and stable if it's used on a Wikimedia production wiki (indicated by a box at the bottom of the extension page).

Example extensions: /trunk/extensions/examples/ in SVN, http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions/examples

12:05 http://mediawiki.org/wiki/Manual:Parser_functions

Special pages
http://www.mediawiki.org/wiki/Manual:Special_pages

12:08:03 Andrew (Werdna) arrived! He will give a talk/workshop during the conference.

Magic words
http://www.mediawiki.org/wiki/Manual:Magic_words

Magic words let you extend wikitext with your own keywords. Extensions can introduce both magic words and tags.

There are three kinds of magic words: parser functions, variables and behaviour switches (boolean triggers).

What they look like:
 * Parser functions, e.g. {{lc: ... }} (core) or {{#if: ... }} (from ParserFunctions). Parser functions included in core MediaWiki: http://www.mediawiki.org/wiki/Manual:Parser_functions Extensions can introduce their own parser functions; the most popular ones: http://www.mediawiki.org/wiki/Extension:ParserFunctions
 * Variables, e.g. {{CURRENTYEAR}} or {{PAGENAME}}
 * Behaviour switches, e.g. __TOC__, which places the table of contents at that position, or __NOINDEX__. Extensions can add their own behaviour switches too.
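The three kinds of magic words are easy to tell apart by their syntax. The Python sketch below uses simplified regular expressions to pick out parser functions such as `{{#if: ...}}`, variables such as `{{CURRENTYEAR}}` and behaviour switches such as `__NOTOC__`. The patterns are an assumption made for illustration. MediaWiki's parser instead matches against registered, localised magic word IDs.

```python
import re

# Simplified patterns; the real parser matches registered (and localised) magic words.
PARSER_FUNCTION = re.compile(r"\{\{\s*#(\w+)\s*:")        # {{#if: ... }}
VARIABLE = re.compile(r"\{\{\s*([A-Z][A-Z0-9]*)\s*\}\}")    # {{CURRENTYEAR}}
BEHAVIOUR_SWITCH = re.compile(r"__([A-Z]+)__")              # __TOC__, __NOTOC__


def classify_magic_words(wikitext):
    """Group the magic words found in a piece of wikitext by kind."""
    return {
        "parser_functions": PARSER_FUNCTION.findall(wikitext),
        "variables": VARIABLE.findall(wikitext),
        "behaviour_switches": BEHAVIOUR_SWITCH.findall(wikitext),
    }


print(classify_magic_words("__NOTOC__ Year: {{CURRENTYEAR}} {{#if: x | yes | no}}"))
```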

Tags
http://www.mediawiki.org/wiki/Manual:Tag_extensions

New tags can be introduced by an extension. For example, an extension could add a parser tag that shows recent posts of a blog inside a wiki page.

Tags cannot be nested, because the contents are handled by the tag function and not by the wikitext parser.
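A toy version of tag extraction shows why tags don't nest. In this Python sketch, the tag name `blogposts` and the `render_tags` and `show_raw` helpers are hypothetical. The raw text between an opening tag and the first matching closing tag goes to a handler function without being parsed, so a second opening tag inside is just more text. Real tag extensions also receive attributes and a parser object. This sketch only models the raw-content handoff.

```python
import re


def render_tags(wikitext, tag, handler):
    """Replace each <tag>...</tag> with handler(raw_inner_text).

    The inner text is passed through unparsed, and the match ends at the first
    closing tag, so a nested <tag> is just more raw text to the handler.
    """
    name = re.escape(tag)
    pattern = re.compile(f"<{name}>(.*?)</{name}>", re.DOTALL)
    return pattern.sub(lambda m: handler(m.group(1)), wikitext)


def show_raw(inner):
    # Hypothetical handler; a real one might render a list of blog posts.
    return f"[{inner}]"


print(render_tags("x <blogposts>'''raw'''</blogposts> y", "blogposts", show_raw))
print(render_tags("<blogposts>a<blogposts>b</blogposts>c</blogposts>", "blogposts", show_raw))
```

In the second call the outer tag is cut off at the first `</blogposts>`, leaving `c</blogposts>` behind as stray text.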

12:10 A special page is just a PHP file: you can output whatever HTML you need to do what you want. There is example code in the "examples" SVN subdirectory.

Hooks
http://www.mediawiki.org/wiki/Manual:Hooks

A hook lets you register a function that MediaWiki calls whenever a specific event happens. Such a function can do things like notifying an IRC channel when a specific page is changed.
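Conceptually, hooks are a registry of callbacks keyed by event name. In PHP, an extension registers a handler with something like `$wgHooks['EventName'][] = 'myFunction';`. A hook function that returns false stops the remaining handlers from running. The Python sketch below mimics that pattern. The `PageSaved` event name and the IRC stand-in are hypothetical.

```python
from collections import defaultdict

# Stand-in for PHP's $wgHooks: event name -> list of handler callables.
HOOKS = defaultdict(list)


def register(event, handler):
    HOOKS[event].append(handler)


def run_hooks(event, *args):
    """Call handlers in registration order; a handler returning False stops the chain."""
    for handler in HOOKS[event]:
        if handler(*args) is False:
            return False
    return True


irc_messages = []  # stand-in for an IRC channel


def notify_irc(title, user):
    """Hypothetical handler: announce an edit."""
    irc_messages.append(f"{user} edited {title}")
    return True


register("PageSaved", notify_irc)  # "PageSaved" is a made-up event name
run_hooks("PageSaved", "Amsterdam", "Sumanah")
print(irc_messages)
```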

12:13:16 “Andrew’s Captain Hook!” (and Daniel Kinzler is Inspector Gadget) WP:BJAODN ;-]

MediaWiki core overview
12:16 The name “phase3” is historical and is Magnus Manske’s fault (it actually means the “third rewrite”). http://www.mediawiki.org/wiki/MediaWiki_history

Important Subversion directories:
 * http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3 "MediaWiki core"
 * http://svn.wikimedia.org/viewvc/mediawiki/trunk/extensions "MediaWiki extensions"
 * http://svn.wikimedia.org/viewvc/mediawiki/trunk/tools "Tools that are not written in or for MediaWiki but are somehow related (like a WordPress plugin for MediaWiki, a dump script, etc.)"

File structure overview of MediaWiki core (http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/ ):
 * index.php: a small file that does some initialization and then hands off to WebStart. All article views, edits, actions and special pages go through this file.
 * api.php ..
 * /skins/ ..
 * /includes/: the "PHP includes directory". Contains default settings, global functions and all classes (such as the special pages, wiki page actions (read, edit, etc.), ResourceLoader, API modules and lots more).

Example case: a reading list that tracks what a user has or hasn't read in a given category, used to make an educational game that asks questions about the pages that have been read. It needs a hook to record in the database when a user reads something, and a special page to show the user what they've read.

You could do this as a gadget, but you wouldn't have access to the database, so it would be simpler to do it as a PHP extension.
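Here is a minimal sketch of the two pieces the reading-list example needs, using an in-memory SQLite database in Python. The `reading_list` table and both functions are hypothetical. `page` and `categorylinks` are cut-down versions of the core tables. In core, `categorylinks.cl_from` holds the page_id of a member page and `cl_to` holds the category name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT NOT NULL);
CREATE TABLE categorylinks (cl_from INTEGER NOT NULL, cl_to TEXT NOT NULL);
-- Hypothetical extension table: one row per (user, page) already read.
CREATE TABLE reading_list (
    rl_user INTEGER NOT NULL,
    rl_page INTEGER NOT NULL,
    PRIMARY KEY (rl_user, rl_page)
);
""")


def on_page_view(user_id, page_id):
    """What the extension's hook handler would do when a user views a page."""
    conn.execute("INSERT OR IGNORE INTO reading_list VALUES (?, ?)", (user_id, page_id))


def unread_in_category(user_id, category):
    """What the extension's special page would list: pages not yet read."""
    rows = conn.execute(
        """SELECT page_title FROM page
           JOIN categorylinks ON cl_from = page_id
           WHERE cl_to = ?
             AND page_id NOT IN (SELECT rl_page FROM reading_list WHERE rl_user = ?)
           ORDER BY page_title""",
        (category, user_id),
    )
    return [title for (title,) in rows]
```

Viewing the same page twice is harmless here. The primary key plus `INSERT OR IGNORE` keeps one row per user and page.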

Bots
Bots are not in the MediaWiki source tree. They're programs that have (for example) a Wikipedia account and use the API to edit the text of pages, look at recent changes, and so on. They are often written in Python (using the pywikibot framework) but can be written in any programming language that can make HTTP requests (e.g. PHP using cURL to call the API). See http://meta.wikimedia.org/wiki/Pywikipediabot/Basic_use and http://mediawiki.org/wiki/API (for more about the API or about bots on Wikipedia, you can also ask Roan or Krinkle).
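At its core a bot is an HTTP client. This Python sketch builds an `api.php` request for the recent-changes list (`action=query`, `list=recentchanges`, `format=json`) and extracts page titles from a JSON response. The endpoint URL, the `rcprop` choice and the sample response are illustrative. A real editing bot would also need to log in, and the pywikibot framework handles much of this for you.

```python
import json
from urllib.parse import urlencode

API_URL = "https://www.mediawiki.org/w/api.php"  # other wikis expose api.php similarly


def recent_changes_url(limit=10):
    """Build a query for the wiki's most recent changes, as JSON."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|user|timestamp",
        "rclimit": limit,
        "format": "json",
    }
    return f"{API_URL}?{urlencode(params)}"


def titles_from_response(body):
    """Pull the changed page titles out of an API response body."""
    data = json.loads(body)
    return [change["title"] for change in data["query"]["recentchanges"]]


# A trimmed, made-up response of the shape the API returns for this query.
sample = '{"query": {"recentchanges": [{"title": "Amsterdam", "user": "Sumanah"}]}}'
print(recent_changes_url(5))
print(titles_from_response(sample))
```

To fetch live data, pass the URL to `urllib.request.urlopen` and feed the decoded body to `titles_from_response`.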

Vandalism
In MediaWiki, everybody with the 'edit' right can make an edit. Users can use the 'undo' link on a page's history to undo any previous edit. There is little to no vandalism detection in core, but there are many good extensions to help with this. For example Wikipedia

Side track (from Daniel): a rollback state in the database for edits? Perhaps a new FlaggedRevisions state (e.g. an "added vandalism" and a "repairing" status).

 * http://www.mediawiki.org/wiki/Special:Code/MediaWiki/
 * http://www.mediawiki.org/wiki/Extension:Gadgets#Usage
 * http://www.mediawiki.org/wiki/Manual:Configuration_settings

Start hacking!

 * http://www.mediawiki.org/wiki/Annoying_Little_Bug
 * https://bugzilla.wikimedia.org/buglist.cgi?query_format=advanced&keywords=easy&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&product=MediaWiki&list_id=20029
 * http://www.mediawiki.org/wiki/How_to_become_a_MediaWiki_hacker#Installing_MediaWiki

14:15 Code Tour or individual Hacking

14:15 Sumana asking the remaining people what they’re going to do

14:19 it started

14:20 Roan: http://mediawiki.org/wiki/Developer_hub and http://mediawiki.org/wiki/Manual:Code

Code Tour
There is a lot of random stuff in this diagram. Roan did not even realize we had so many random tables.

Tables

 * 'page' table

Page table is the centerpiece of lots of interactions
 * page_id
    * every page has a numerical ID
    * sometimes exposed in the interface
    * autoincrement: every page gets one as it is created
 * page_namespace
    * The namespace of a page is a number, not a text field. Surprising: you'd expect the text to be somewhere.
    * The text is a configuration thing: pages in the User namespace have a 2 in that field (includes/Defines.php).
    * A separate file says that 2 means User. This is for localisation,
    * and so that you can rename namespaces. If you add a new namespace, you have to give it a number:
    * it should be over 15 (core uses the numbers up to 15).
    * For extensions adding namespaces: use something higher than 100.
    * 0-99 are "reserved" as core namespaces.
    * Different extensions reserve different namespaces. Semantic MediaWiki, for example, uses 100-199.
    * Negative namespaces are special; don't try to use them.
    * Wikimedia-specific namespaces are not in the main code.
    * See $wgExtraNamespaces: you can create your own.
    * Many languages have "Portal", but not all, and it's not in core.
    * The Lithuanian Wikipedia puts articles that are lists in a separate namespace.
    * How high can you go? 32 bits, so 2 or 4 billion namespaces are possible.
    * Every namespace should have an associated talk namespace.
    * Subjects are even numbers.
    * Talk namespaces are odd.
    * If you break this convention, MediaWiki breaks.
 * page_title
    * the title itself, stored without a namespace prefix
    * Example: for User:Sumanah the database stores namespace=2, title=Sumanah
 * page_restrictions
 * page_counter
 * page_is_redirect
    * Is the page a redirect?
 * page_is_new
    * Is the page new?
 * page_random
 * page_touched
 * page_latest
    * Points to a row in the revision table: the rev_id of the page's latest revision
 * page_len
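Because subjects are even and talk namespaces odd, the two can be converted with bit operations. This Python sketch follows that convention. The function names are made up for illustration. A talk namespace is the subject number with its lowest bit set. Negative (virtual) namespaces such as Special: have no talk page.

```python
# A few core namespace numbers, as defined in includes/Defines.php.
NS_MEDIA, NS_SPECIAL = -2, -1
NS_MAIN, NS_TALK, NS_USER, NS_USER_TALK = 0, 1, 2, 3


def is_talk(ns):
    """Talk namespaces are the odd, non-negative numbers."""
    return ns >= 0 and ns % 2 == 1


def talk_of(ns):
    """Talk namespace paired with ns: the subject number with its lowest bit set."""
    if ns < 0:
        raise ValueError("virtual namespaces such as Special: have no talk namespace")
    return ns | 1


def subject_of(ns):
    """Subject namespace paired with ns: clear the lowest bit."""
    if ns < 0:
        return ns  # virtual namespaces are their own subject
    return ns & ~1


# A hypothetical extension namespace pair: 100 (subject) and 101 (talk).
print(talk_of(NS_USER), subject_of(101))  # prints: 3 100
```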


 * 'revision' table

Revision table records metadata for every revision, including creation.


 * rev_id
    * a unique ID for a revision
    * sometimes visible in the interface as “oldid”
 * rev_page
 * rev_text_id
    * see the text table for the text of the edit
    * a text row holds an ID, a blob of text and some flags. In theory, just text. In practice, it may be gzipped (then the gzip flag is set), stored externally, etc.
 * rev_comment
    * edit summary
 * rev_user
    * user who made the edit
 * rev_user_text
 * rev_timestamp
    * timestamp
 * rev_minor_edit
 * rev_deleted
 * rev_len
 * rev_parent_id
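To see how page_latest, rev_text_id and the text table fit together, here is a cut-down schema in an in-memory SQLite database (Python). The columns are a small subset of the real ones, and the `save` and `current_text` helpers are hypothetical. Text rows carry comma-separated flags. When `gzip` is present, the sketch assumes the blob is raw DEFLATE data, as PHP's `gzdeflate()` produces. The unique (namespace, title) constraint reflects that two pages can share a title only in different namespaces.

```python
import sqlite3
import zlib

SCHEMA = """
CREATE TABLE page (
    page_id        INTEGER PRIMARY KEY AUTOINCREMENT,
    page_namespace INTEGER NOT NULL,
    page_title     TEXT NOT NULL,        -- stored without the namespace prefix
    page_latest    INTEGER NOT NULL,     -- rev_id of the current revision
    UNIQUE (page_namespace, page_title)
);
CREATE TABLE revision (
    rev_id      INTEGER PRIMARY KEY AUTOINCREMENT,
    rev_page    INTEGER NOT NULL,        -- page_id this revision belongs to
    rev_text_id INTEGER NOT NULL         -- old_id of the row holding the text
);
CREATE TABLE text (
    old_id    INTEGER PRIMARY KEY AUTOINCREMENT,
    old_text  BLOB,
    old_flags TEXT                       -- e.g. "utf-8" or "utf-8,gzip"
);
"""

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)


def gzdeflate(text):
    """Raw DEFLATE without a zlib/gzip header, like PHP's gzdeflate()."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, -15)
    return compressor.compress(text.encode("utf-8")) + compressor.flush()


def decode_text(blob, flags):
    """Turn a text-table row back into a string, honouring its flags."""
    if "gzip" in (flags or "").split(","):
        blob = zlib.decompress(blob, -15)
    return blob.decode("utf-8") if isinstance(blob, bytes) else blob


def save(ns, title, content, compress=False):
    """Store a new revision, creating the page row on the first save."""
    if compress:
        blob, flags = gzdeflate(content), "utf-8,gzip"
    else:
        blob, flags = content, "utf-8"
    text_id = db.execute(
        "INSERT INTO text (old_text, old_flags) VALUES (?, ?)", (blob, flags)
    ).lastrowid
    row = db.execute(
        "SELECT page_id FROM page WHERE page_namespace = ? AND page_title = ?",
        (ns, title),
    ).fetchone()
    if row is None:
        page_id = db.execute(
            "INSERT INTO page (page_namespace, page_title, page_latest) VALUES (?, ?, 0)",
            (ns, title),
        ).lastrowid
    else:
        page_id = row[0]
    rev_id = db.execute(
        "INSERT INTO revision (rev_page, rev_text_id) VALUES (?, ?)", (page_id, text_id)
    ).lastrowid
    db.execute("UPDATE page SET page_latest = ? WHERE page_id = ?", (rev_id, page_id))
    return rev_id


def current_text(ns, title):
    """Follow page.page_latest -> revision.rev_id, then rev_text_id -> text.old_id."""
    row = db.execute(
        """SELECT old_text, old_flags FROM page
           JOIN revision ON rev_id = page_latest
           JOIN text ON old_id = rev_text_id
           WHERE page_namespace = ? AND page_title = ?""",
        (ns, title),
    ).fetchone()
    return None if row is None else decode_text(*row)
```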


 * 'user' table


 * user_id
    * autoincrement ID
 * user_name
    * user's name

The rest is all sorts of random metadata about a user.


 * 'recentchanges' table

recentchanges is a separate summary table, kept for performance reasons even though its data could be inferred from other tables.

Classes
http://svn.wikimedia.org/doc/
 * Doxygen documentation, similar to PHPdoc, but usable with many languages.
 * Autogenerated from documentation comments, which start with /** instead of just /*


 * About a wiki page (Title class, WikiPage class)

Every page is uniquely identified in two ways: by its "page id", stored in page.page_id, or by its "namespace & title" pair, which together are always unique. There can be several pages with the same title (Project:Foo, User:Foo, Category:Foo), but only one with a given namespace + title pair.

A page like "Brion Vibber" could be a User-page, Article, Category, anything. The namespace will make it obvious what the page is exactly, the page title itself is not enough.


 * Title class

Inside MediaWiki, pages are identified through instances of the "Title" class, not by passing page_ids or namespace/title combos around. So if an extension hooks into core functionality, it will get the instance of the Title class for the current page (stored in the global $wgTitle) and can use methods like $myTitle->isRedirect() to see if a page is a redirect.

To create an instance of Title, use one of the static constructor methods such as Title::newFromID or Title::newFromText.

The Title class has several static utility functions to offer; the most used is "makeTitleSafe". Always use the SAFE methods when your input comes from users or you aren't sure it's sanitized. It may be OK to use the non-safe methods when your input comes directly from the database, though.

A Title constructor may return null, which indicates that the title is invalid. Always check for nulls!

Q: We see getters, but no setters. How do we set values? A: That's not possible; otherwise title objects would be mutable and not reliable. Besides, they represent a title, not a modifiable page.
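A Python sketch of the Title ideas above: an immutable value (a frozen dataclass) built through a factory that returns None for invalid input, so callers must check the result. Several parts are simplifications. The namespace table is cut down. The illegal-character set is only a subset of what MediaWiki rejects. First-letter capitalisation copies MediaWiki's default $wgCapitalLinks behaviour. The names only loosely mirror Title::newFromText.

```python
from dataclasses import dataclass
from typing import Optional

# A few canonical namespace names (text -> number); real wikis localise these.
NAMESPACES = {"Talk": 1, "User": 2, "User talk": 3, "Project": 4, "Category": 14}
# A subset of the characters MediaWiki never allows in titles.
ILLEGAL_CHARS = set("#<>[]|{}")


@dataclass(frozen=True)
class Title:
    namespace: int
    db_key: str  # title without namespace prefix, spaces stored as underscores

    @staticmethod
    def new_from_text(text: str) -> Optional["Title"]:
        """Return a Title, or None if the text is not a valid title."""
        text = text.replace("_", " ").strip()
        namespace = 0
        prefix, sep, rest = text.partition(":")
        if sep and prefix.strip() in NAMESPACES:
            namespace, text = NAMESPACES[prefix.strip()], rest.strip()
        if not text or ILLEGAL_CHARS & set(text):
            return None
        # First-letter capitalisation, as with the default $wgCapitalLinks = true.
        text = text[0].upper() + text[1:]
        return Title(namespace, text.replace(" ", "_"))


title = Title.new_from_text("User:sumanah")
if title is None:  # always check before using the result
    raise ValueError("invalid title")
print(title.namespace, title.db_key)  # prints: 2 Sumanah
```

Assigning to `title.namespace` raises `dataclasses.FrozenInstanceError`, which plays the role of the missing setters.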

To actually touch the database or modify an actual wiki page, we use the WikiPage class.


 * WikiPage class

On a high level: $myTitle = Title::newFrom***( ... ); $myArticle = new WikiPage( $myTitle ); $myArticle->doEdit( ... );

Wrap-up

 * Q: There is an extension that calculates statistics -- how does it work?
 * A: The view counters are actually not an extension but part of MediaWiki core (they use page.page_counter).
 * Since large sites may not want to increment a database table field, it is possible to disable this feature (See $wgDisableCounters) (Wikipedia has disabled it).
 * additional notes from Sumana: hmm, core developers do not understand request-context??
 * view counters are in MediaWiki -- you can disable them, as we have on WMF because of performance
 * page counter field in the page table
 * and there is a hit counter table
 * periodically, whenever you hit a page, it adds a row, this is a temporary buffer, then ever

Feedback
Some feedback on the How To Start Hacking MediaWiki session:


 * "very informative -- I mostly knew the material, but I'd had to learn it over several years!"
 * understanding the architecture of the code behind Wikipedia
 * what makes it run
 * process of being able to interact with the code
 * bugtracking system
 * how to check out code
 * edit it
 * check it back in
 * what happens after that
 * database overview was extremely useful
 * “such a big topic. NONE of the discussion was wasted -- a short discussion under time pressure”