MediaWiki architecture document/How and why

Hashar

 * If you had to pick, what are 5 key decisions MediaWiki's developers made that were very insightful?
 * 1) Making MediaWiki reusable by other people. At one time it was hard to install: you had to run a command-line installer to set it up, and there were plenty of references to Wikipedia and hardcoded paths everywhere. Releasing tarballs is great for that.
 * 2) Rewriting the schema early to better support web scaling. I think it was in MediaWiki 1.5.
 * 3) Using a tokenizer to parse wikitext (JeLuF wrote it). Unfortunately, poor performance caused by PHP array memory allocations led to a revert after 3 days of running it on the live site. We have been back to the huge pile of regexps since then.
 * 4) Writing our own database abstraction layer and load balancer. Well, at that time there was not much around, so we HAD to write one :-b
 * 5) Opening the source code to a lot of volunteer developers. We have tons of people able to commit :-)
 * 6) Peer reviewing of every single patch since day 1.
 * 7) Migrating to svn (next step git?)
 * 8) Finally using jQuery for JavaScript.


 * And if you had to pick, what are 5 key decisions MediaWiki's developers made that were, in retrospect, the wrong choice?
 * 1) Reusing MediaWiki to build Commons. It was not, and still is not, suited to handling millions of media files. We should have started a dedicated project with the goal of handling media files, something less like a wiki and more like Flickr or Picasa.
 * 2) Relying on PHP/MySQL, which are probably not the best choices for performance. On the other hand, both are very popular, so most software developers can submit patches :-)
 * 3) The templating architecture. Tim Starling can talk about it much more than I can.
 * 4) Using per-language encodings. That led to a lot of issues. Eventually everything migrated to UTF-8, which makes things easier when you deal with hundreds of different languages.
 * 5) We still have metadata (categories, interwiki links) in the body of the text. This should really have been handled in a different table / interface. Users have to edit the whole page just to add/remove categories and interwikis :-)
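
To illustrate the last point, here is a minimal sketch in Python (not MediaWiki code) of what adding a category involves when the metadata lives in the page body: the whole text is scanned and a whole new text is produced, even though only metadata changed. The `[[Category:Name]]` link syntax is ordinary wikitext; the function names and the simplified regular expression are assumptions made for this example.

```python
import re

# Simplified pattern for category links such as [[Category:Physics]] or
# [[Category:Physics|sort key]]. Real wikitext parsing is far more involved.
CATEGORY_LINK = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")


def list_categories(page_text):
    """Return the category names found in the body of a page."""
    return [name.strip() for name in CATEGORY_LINK.findall(page_text)]


def add_category(page_text, category):
    """Return a new full page text with the category link appended.

    The entire body is rewritten even though only metadata changed.
    """
    if category in list_categories(page_text):
        return page_text
    return page_text.rstrip("\n") + "\n[[Category:" + category + "]]\n"


page = "Some article text.\n\n[[Category:Physics]]\n"
updated = add_category(page, "Science")
print(list_categories(updated))
```

With categories kept in a separate table, the same change would be a single row insert that never touches the article text.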


 * How would you describe our attitude to backwards compatibility?

AFAIK, we are really conservative. Most old methods / functions are kept in the code; nowadays they are marked as deprecated and removed after 2 or 3 releases. We still support 10-year-old skins.

Extensions that are part of our svn repository are more or less maintained by core developers, at least when it comes to API changes and sometimes coding style. The wikitext parser and renderer still support hacks we really want to remove for performance reasons. Still, since people use them, we keep the features :D


 * Did MediaWiki get more customizable over time (with extensions, user scripts, gadgets, and skins) or less? Why?

It got more customizable. Given a skin, users can now apply their own stylesheet just by editing an article (something like User:username/skinname.css).

Gadgets are great. You will want to talk to Brion about them.

At one time, you could not write any extension. I am not sure who added the first hooks. I know I proposed the idea to some developers, and eventually the hook system was added by someone else.
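
For readers unfamiliar with the idea, a hook system can be sketched roughly as follows. This is a minimal Python illustration, not MediaWiki's actual implementation: the registry, the `PageSave` hook name, and the convention that a handler returning `False` aborts the operation are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical registry: hook name -> list of handler callables.
_hooks = defaultdict(list)


def register_hook(name, handler):
    """Attach a handler to a named hook (called by extension code)."""
    _hooks[name].append(handler)


def run_hooks(name, *args):
    """Call each handler in registration order.

    A handler that returns False stops the chain; the caller then
    treats the operation as aborted.
    """
    for handler in _hooks[name]:
        if handler(*args) is False:
            return False
    return True


# "Core" code: exposes an extension point without knowing who uses it.
def save_page(title, text):
    page = {"title": title, "text": text}
    if not run_hooks("PageSave", page):
        return None  # an extension vetoed the save
    return page


# "Extension" code: changes behaviour without patching save_page().
def strip_trailing_whitespace(page):
    page["text"] = page["text"].rstrip()


def block_empty_pages(page):
    if not page["text"]:
        return False


register_hook("PageSave", strip_trailing_whitespace)
register_hook("PageSave", block_empty_pages)

saved = save_page("Sandbox", "hello   ")
rejected = save_page("Empty", "   ")
print(saved, rejected)
```

The point of the design is that the core only names the extension points; everything else can live in separately maintained code.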


 * What decisions and improvements improved or hurt MediaWiki's performance?
 * Rewriting the database schema improved MW.
 * Adding support for memcached (in-memory cache) and APC (PHP opcode cache) had a HUGE impact on performance.
 * The template system degraded it. People came up with creative uses of the template system, which eventually led to templates that take seconds to render. The worst of them are probably the citation templates. We should really have coded those templates as a PHP extension, which would get us better performance.
 * Our broken parser had, and still has, bad performance.
 * ResourceLoader !!! :)
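
As a rough illustration of why an in-memory cache helps so much, here is a minimal Python sketch of the usual "look in the cache first, render only on a miss" pattern. It is not MediaWiki code: `TinyCache` is an in-process stand-in for a memcached client, and `render_page`, the key format, and the 300-second expiry are assumptions made for the example.

```python
import time


class TinyCache:
    """In-process stand-in for a memcached client (dict with expiry)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: behave like a miss
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._store[key] = (value, time.monotonic() + ttl_seconds)


render_calls = 0


def render_page(title):
    """Hypothetical expensive step (parsing, template expansion, ...)."""
    global render_calls
    render_calls += 1
    return "<h1>" + title + "</h1>"


def get_rendered(cache, title, ttl_seconds=300):
    """Return cached HTML if present, otherwise render and store it."""
    key = "rendered:" + title
    html = cache.get(key)
    if html is None:
        html = render_page(title)
        cache.set(key, html, ttl_seconds)
    return html


cache = TinyCache()
first = get_rendered(cache, "Main Page")
second = get_rendered(cache, "Main Page")  # served from the cache
print(render_calls)
```

The second request never reaches the expensive rendering step, which is where most of the gain comes from on pages that are read far more often than they are edited.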


 * Anything else you think should be included in a document about MediaWiki's architecture and history?