MediaWiki architecture document/How and why

Hashar

 * If you had to pick, what are 5 key decisions MediaWiki's developers made that were very insightful?
 * 1) Making MediaWiki reusable by other people. At one time it was hard to install: you had to run a command-line installer to set it up, and there were plenty of references to Wikipedia and hardcoded paths everywhere. Releasing tarballs was great for that.
 * 2) Rewriting the database schema early to better support scaling on the web. I think that was in MediaWiki 1.5.
 * 3) Using a tokenizer to parse wikitext (JeLuF wrote it). Unfortunately, poor performance caused by PHP array memory allocations led to a revert after three days of running it on the live site. We have been back to the huge pile of regexps ever since.
 * 4) Writing our own database abstraction layer and load balancer. Well, at that time there was not much around, so we HAD to write one :-b
 * 5) Opening the source code to a lot of volunteer developers. We have tons of people able to commit :-)
 * 6) Peer review of every single patch since day 1.
 * 7) Migrating to SVN (next step: Git?)
 * 8) Finally using jQuery for JavaScript.


 * And if you had to pick, what are 5 key decisions MediaWiki's developers made that were, in retrospect, the wrong choice?
 * 1) Reusing MediaWiki to build Commons. It was not, and still is not, adapted to handling millions of media files. We should have started a dedicated project with the goal of handling media files, something less like a wiki and more like Flickr or Picasa.
 ** I think we could do a lot better while still working within the MediaWiki framework, but we never really did that either. --brion
 * 2) Relying on PHP/MySQL, which are probably not the best choices for performance. On the other hand, they are both very popular, so most software developers can submit patches :-)
 * 3) The templating architecture. Tim Starling can talk about it much more than I can.
 * 4) Using per-language character encodings. That led to a lot of issues. Eventually everything migrated to UTF-8, which makes things easier when you deal with hundreds of different languages.
 ** We eliminated this in 1.5, along with the "big schema change". Definitely wish we'd done it first, though! --brion
 * 5) We still have metadata (categories, interwiki links) in the body of the page text. This should really have been stored in a separate table / interface. Users have to edit the whole page just to add or remove categories and interwikis :-)
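To make the metadata point concrete, here is a hypothetical Python sketch. It is not MediaWiki's PHP code. It assumes category links use the plain English `[[Category:Name]]` or `[[Category:Name|sort key]]` form. It ignores localized namespace names, title case rules, and interwiki links such as `[[fr:Page]]`, which behave similarly. Even in this simplified form, reading or removing one category means scanning the whole page text and producing a new copy of it.

```python
import re

# Hypothetical illustration, not MediaWiki code: categories are embedded in the
# wikitext itself, so every read or change goes through the full page text.
# Simplification: only the English "Category:" prefix is recognized.
CATEGORY_RE = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+)(?:\|[^\]]*)?\]\]", re.IGNORECASE
)


def categories(wikitext: str) -> list[str]:
    """Return the category names found in the wikitext, in order of appearance."""
    return [name.strip() for name in CATEGORY_RE.findall(wikitext)]


def remove_category(wikitext: str, name: str) -> str:
    """Return a new copy of the entire page text with one category link removed."""

    def drop(match: re.Match) -> str:
        # Keep every link except the one being removed.
        return "" if match.group(1).strip() == name else match.group(0)

    return CATEGORY_RE.sub(drop, wikitext)
```

For example, removing `Cities` from a page with `[[Category:Cities|Paris]]` returns the whole article text again, minus that one link. If categories lived in their own table, the same change would touch only that one row.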


 * How would you describe our attitude to backwards compatibility?

AFAIK, we are really conservative. Most old methods/functions are kept in the code; nowadays they are marked as deprecated and removed after two or three releases. We still support the 10-year-old skins.

Extensions that are part of our SVN repository are more or less maintained by core developers, at least when it comes to API changes and sometimes coding style. The wikitext parser and renderer still support hacks we really want to remove for performance reasons. Still, since people use them, we keep the features :D


 * Did MediaWiki get more customizable over time (with extensions, user scripts, gadgets, and skins) or less? Why?

It got more customizable. Given a skin, users can now apply their own stylesheet just by editing a page (something like User:username/skinname.css).

Gadgets are great. You may want to talk to Brion about them.

At one time, you could not write extensions at all. I am not sure who added the first hooks. I know I proposed the idea to some developers, and eventually the hook system was added by someone else.
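The following is a hypothetical Python sketch of a named-hook registry in general terms. It shows why hooks make extensions possible: core code announces named events, and extension code registers handlers for them without editing core. The class, its method names, and the `PageSave` hook name are illustrative assumptions and are not MediaWiki's API. The sketch loosely follows MediaWiki's convention that a handler returning false stops the remaining handlers from running.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical sketch of a hook registry; not MediaWiki's actual classes or API.


class HookRegistry:
    def __init__(self) -> None:
        # Maps a hook name to its handlers, kept in registration order.
        self._handlers: dict[str, list[Callable[..., bool]]] = defaultdict(list)

    def register(self, name: str, handler: Callable[..., bool]) -> None:
        """Called by extension code to attach a handler to a named hook."""
        self._handlers[name].append(handler)

    def run(self, name: str, *args) -> bool:
        """Called by core code at the hook point.

        Handlers run in order. If one returns False, later handlers are
        skipped and False is returned to the caller.
        """
        for handler in self._handlers.get(name, []):
            if handler(*args) is False:
                return False
        return True
```

In this design, core code only needs a `run("PageSave", title, text)` call at the right place. Any number of extensions can then observe or veto the operation without patching core.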


 * What decisions and improvements improved or hurt MediaWiki's performance?
 * Rewriting the database schema improved MediaWiki's performance.
 * Adding support for memcached (in-memory cache) and APC (PHP opcode cache) had a HUGE impact on performance.
 * The template system degraded performance. People came up with creative uses of the template system, which eventually led to templates that take seconds to render. The worst of them are probably the citation templates. We should really have coded those as a PHP extension, which would have given us better performance.
 * Our broken parser had, and still has, bad performance.
 * ResourceLoader !!! :)
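The caching point can be illustrated with a hypothetical cache-aside sketch in Python. Rendered HTML is stored under a key that includes the revision ID, so an expensive, template-heavy render runs only once per revision, and a new edit naturally gets a fresh entry. A plain dict stands in for memcached. The class name and key format are assumptions for illustration.

```python
from typing import Callable

# Hypothetical cache-aside sketch; a dict stands in for memcached.


class RenderCache:
    def __init__(self, render: Callable[[str], str]) -> None:
        self._render = render              # the expensive wikitext -> HTML step
        self._store: dict[str, str] = {}   # memcached stand-in
        self.misses = 0                    # number of times we had to render

    def get_html(self, page_id: int, rev_id: int, wikitext: str) -> str:
        # Including the revision ID in the key means an edit never sees stale HTML.
        key = f"parsercache:{page_id}:{rev_id}"  # hypothetical key format
        html = self._store.get(key)
        if html is None:
            # Cache miss: render once, then store the result for later readers.
            self.misses += 1
            html = self._render(wikitext)
            self._store[key] = html
        return html
```

Because most page views are reads of an unchanged revision, nearly all requests take the cheap lookup path. This helps explain why adding a shared in-memory cache had such a large effect.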


 * Anything else you think should be included in a document about MediaWiki's architecture and history?

Brion

 * If you had to pick, what are 5 key decisions MediaWiki's developers made that were very insightful? Why?


 * 1) FOSS from the beginning. While we're not always the best at accepting patches & sharing, we're far from the worst. :) MediaWiki's first couple of years involved zero engineering budget and a lot of volunteer turnover: the original authors of the phase 2 and phase 3 PHP codebases were Wikipedians with a technical bent, as were most of our other early devs; folks like me started on fixes, internationalization, and feature support because we could see the source in CVS & the bug tracker on SourceForge.
 * 2) Regular releases and installer. Making sure the software was easy to set up was a BIG factor in getting more volunteer developers involved, both directly (people already potentially interested needed only a small barrier to start working) and indirectly (easier third-party usage led to people sending fixes and customizations upstream). Release frequency has become less regular, but we're still pushing them out, and the installer got a big boost in 1.17.
 * 3) Extension architecture. While I'm not 100% happy with every detail of how we do it, we have a fairly flexible infrastructure which has helped us to make specialized code more modular, keeping the core software from expanding (too) much and making it easier for 3rd-party reusers to build custom stuff on top.
 * 4) Site/user JS/CSS and gadgets: hugely impactful, this has greatly increased the democratization of MediaWiki's software development. Individual users are empowered to add features for themselves; power users can share these with others both informally and through globally-configurable admin-controlled systems.
 * 5) Templates. While we have plenty of things to whinge about in the syntax, management, etc., the ability to create partial page layouts and reuse them in thousands of articles with central maintenance has been a big boon.


 * And if you had to pick, what are 5 key decisions MediaWiki's developers made that were, in retrospect, the wrong choice? Why?


 * 1) A cleaner markup syntax near the beginning would have simplified our lives a lot with templates & editing.
 * 2) The flat namespace for articles is too simple. For Wikipedia it encourages overly long pages, which leads to performance problems (we have to parse and copy around huge chunks of text that will not usually get read all at once) and makes it harder to navigate to relevant, more digestible chunks of data. For other sites like Wikibooks, Wikisource, Wikiversity, heck, even mediawiki.org, we could benefit a lot from more structured entities that consist of multiple subpages.
 * 3) Not implementing structured messaging / discussion / chat systems from the beginning has left us with a legacy of "talk pages" that are horrid to work with and unaccountable IRC backchannels.
 * 4) DB & filesystem storage layout for media files is very awkward, with a number of problems that hinder our ability to mirror, cache, and do major online maintenance.
 * 5) A cleaner accounts system that spanned multiple sites from the beginning would have saved lots of trouble; CentralAuth is still a bit hacky to work with.


 * What are MediaWiki's idiosyncrasies? What makes it special compared to other PHP software?




 * What would you say were the main milestones in MediaWiki's history?


 * creation, testing, and initial deployment of Magnus' "phase 2"
 * creation, testing, and initial deployment of Lee's "phase 3"
 * Refactorings & performance improvements in the early Brion & Tim years
 * internationalization & Unicode
 * addressable logs
 * 1.5 schema refactor
 * compression & external storage
 * web-installable package
 * regular releases
 * Early empowerment of end-users
 * user/site JS/CSS
 * extensions
 * templates
 * CentralAuth
 * Gadgets
 * API
 * 1.12 or so: preprocessor
 * 1.17: ResourceLoader


 * What decisions and improvements improved or hurt MediaWiki's performance?


 * improvements above? :)
 * hurt: awful, awful syntax, which makes it harder to plug in a better-performing parser and similar bits
 * hurt: PHP has not benefited from the performance improvements that some other dynamic languages have seen in recent years (e.g. JavaScript VMs now have aggressive JITs, but Zend's PHP still doesn't ship an opcode cache, much less try to actually compile anything)
 * hurt: MySQL has lagged in a few specific areas that have been problematic:
 ** lack of full native UTF-8 support (this is finally in the latest versions, but you have to jump through some hoops, and we have years of legacy databases)
 ** no or limited online ALTER TABLE, which makes even simple schema changes painful to deploy and slows some development
 ** the data dump format is very hard to parallelize well; even with few changes to the database, it takes forever to build one due to the compression.


 * How would you describe our attitude to backwards compatibility?


 * ... varies. :)


 * Anything else you think should be included in a document about MediaWiki's architecture and history?


 * will add more at some point