Manual:MediaWiki architecture

This page contains the actual text of the MediaWiki architecture document project. This content will be merged into the appropriate mediawiki.org pages once the specific formatting required for inclusion in the AOSA book is not needed any more.

Wikipedia's software

 * context
 * Wikipedia and sister sites
 * optimization and performance because of requirements of high-profile sites like those operated by the WM


 * software originally written for a very specific purpose: serving a community for the creation & curation of knwoledge; not the case of most CMS.
 * constant design decisions that never come up in other CMSes
 * openness of wikis; technology response: abusefilter, etc.
 * community defining roles, processes, to deal with vandalism, for example. That doesn't come up in other contexts: technological answers that don't come up with regular CMS. other CMSes: corporate customers, existing workflows
 * needs of the community
 * collaboration
 * the community and its need evolved over time; it wouldn't have been possible to foresee the features the community would need
 * interplay of editing community and development

the architecture of MW has been driven many times by initiatives started of requested by the community. e.g. creation of Wikimedia Commons, FlaggedRevs, Wiki loves monuments


 * Performance
 * Security
 * Languages
 * open-source


 * Regular releases and installer. Making sure the software was easy to set up was a BIG factor in getting more volunteer developers involved, both directly (people already possibly interested wanting a small impedence to start working) and indirectly (easier for 3rd-party usage, leading to people sending fixes and customizations upstream).


 * Deploy-then-release


 * endless compromise between performance and features. e.g. template system degraded performance. One possibility is to turn them into dedicated extensions


 * hurt performance: awful awful syntax making it harder to plug in better-performing parser etc bits


 * Factors that contribute to a 'performance culture':
 * We have a few very expert people on board who know a lot about performance optimization (DB performance is a big chunk of it, but the rest is important too)
 * MediaWiki must run on a top-ten web site. Things that would not scale to that size are fixed, reverted, or put behind a config var
 * MediaWiki must run on a top-ten web site operated on a shoestring budget, which has additional implications for performance and caching


 * couple slides from talk at CCC in 2004

Reuse

 * reusability of MediaWiki by other people, not just for our purposes; used to be difficult to install (command line installer, many references to Wikipedia, hardcoded paths); now easier with the web-based installer
 * installer. a quick note about the installer would be good. During the early days (ask Tim Starling), you add to run a shell script to install MW. Then it was made to just upload file and run the /config/ wizard.
 * Experimental web-based installer since 1.2 (March 2004)
 * Making it FLOSS since the very beginning was very important for its popularity. MediaWiki has become the 800-lb gorilla of wiki software, and it wouldn't have happened with a closed development model.
 * Reusing MediaWiki to build commons: not adapted to handling millions of media files.

backwards compatibility

 * Some aspects such as hooks or configuration variables, remain very stable for a long time. When they change, they typically go through a slow deprecation process to allow users and extension authors to catch up.
 * However our internal apis change all the time which can be frustrating to extension authors (and even core devs!)
 * I think this will improve in the coming releases though. A lot of our "omg rewrite" situations in the past ~2 years have been to bring ancient code into the 21st century.


 * Inconsistent. Such is the result of having many volunteer developers with many different opinions on this fraught issue.


 * Most old methods / functions are kept in the code, nowaday they are marked as deprecated and removed after 2 or 3 releases. We still support the 10 years old skins.
 * The wikitext parser and render still supports hacks we really want to remove for performances reasons.

Phase I: UseModWiki
Wikipedia was launched in January 2001. At the time, it was mostly an experiment, to try and boost the production of content for Nupedia, a free-content, but peer-reviewed, encyclopedia created by Jimmy Wales. Because it was an experiment, Wikipedia was originally powered by UseModWiki, an existing GPL wiki engine written in Perl, using CamelCase and storing all pages in individual text files.

It soon appeared that CamelCase wasn't really appropriate to name encyclopedia articles. In late January 2001, UseModWiki developer and Wikipedia participant Clifford Adams added a new feature to UseModWiki: free links, e.g. the ability to link to pages with a special syntax (double square brackets), instead of automatic CamelCase linking. A few weeks later, Wikipedia enabled the new version of UseModWiki supporting free links, and enabled them.

While this initial phase isn't about MediaWiki per se, it provides some context and shows that, even before MediaWiki was created, Wikipedia started to shape the features of the software that powered it. UseModWiki also influenced some of MediaWiki's features, for example its basic text formatting syntax.

Phase II: the PHP script
In 2001, Wikipedia was not yet a top 10 website; it was an obscure project sitting in a dark corner of the Interwebs, unknown to most search engines, and hosted on a single server. Still, performance was already an issue, notably because UseModWiki stored its content in a flat file database. At the time, Wikipedians were worried about being "inundated with traffic" following articles in the New York Times, Slashdot or Wired.

In Summer 2001, Wikipedia participant Magnus Manske (then a university student) started to work on a dedicated wiki engine for Wikipedia in his free time. The goal was to improve Wikipedia's performance using a database-driven software, but also to be able to develop Wikipedia-specific features that couldn't be provided by a "generic" wiki engine. Written in PHP and MySQL-backed, the new engine was simply called the "PHP script", "PHP wiki", "Wikipedia software" or "phase II".

The "PHP script" was made available in August 2001, shared on SourceForge in September, and tested until late 2001. As Wikipedia suffered from recurring performance issues because of increasing traffic, the English language Wikipedia eventually switched from UseModWiki to the PHP script in January 2002. Other language versions also created in 2001 were slowly upgraded as well, although some of them would stay powered by UseModWiki until 2004. An automated program, called "User:Conversion script", converted the last version of the existing articles to the phase II format; previous revisions of the articles' history pre-January 2002 were partly restored by developer Brion Vibber in September 2002.

As a PHP software using a MySQL database, the PHP script was the first iteration of what would later become MediaWiki. Not only that, but it also introduced many critical features still in use today, like namespaces to organize content (including talk pages), skins, and special pages (including maintenance reports, contributions list and user watchlist).

Phase III: MediaWiki
Despite the improvements from the PHP script and database back-end, the combination of increasing traffic, expensive features and limited hardware continued to cause performance issues. In 2002, developer Lee Daniel Crocker rewrote the code again, calling the new software "Phase III". Because the site was experiencing frequent difficulties, Lee thought there "wasn't much time to sit down and properly architect and develop a solution", so he "just reorganized the existing architecture for better performance and hacked all the code".

The Phase III software kept the same basic interface, and was designed to look and behave as closely to the Phase II software as possible. A few new features were also added, like a new file upload system, side-by-side diffs of content changes, and interwiki links.

It was deployed to the English Wikipedia in July 2002, along with a hardware move to a new (but still single) server. Other features were added over 2002, like new maintenance special pages, or the "edit on double click" option. Performance issues quickly reappeared, though: view count and site stats, which were causing two database writes on every page view, were for example disabled in November 2002, then re-enabled. The site would occasionally be switched to read-only mode to maintain the service for readers, and expensive maintenance pages would be disabled during high-access times because of table locking problems.

In early 2003, developers discussed whether they should properly re-engineer and re-architect the software from scratch, before the fire-fighting became unmanageable, or whether they should rather continue to tweak and improve the existing code base. The latter solution was chosen, mostly because most developers were sufficiently happy with the code base, and confident enough that further iterative improvements would be enough to keep up with the growth of the site.

In June 2003, a second server was added, serving as a database server separate from the web server. The database server also served as the web server for non-English Wikipedia sites. Load-balancing between the two servers would be set up later that year. A new page caching system was also enabled, that used the file system to cache rendered, ready-to-output pages for anonymous users.

June 2003 is also when Jimmy Wales created the Wikimedia Foundation, a nonprofit to support Wikipedia, and manage its infrastructure and day-to-day operations. The "Wikipedia software" was officially named "MediaWiki" in July, as a word play on the Wikimedia Foundation's name. What was thought at the time to be a clever pun would confuse generations of users and developers.

New features were added in July, like the automatically-generated table of contents, and the ability to edit page sections, both still in use today. The first release under the name "MediaWiki" happened in August 2003, concluding the long genesis of a software whose overall structure would remain fairly stable from there on.

The Coming of age
2003
 * December 2003: MediaWiki 1.1: User-editable interface messages through "MediaWiki namespace" and Magic words

2004
 * January 2004: still performance problems
 * January 29th, 2004: new features: Edit toolbar "can be enabled in prefs (works perfectly in IE, near perfect in Mozilla, not so great in most others); will be refined and possibly made default in the future", Extended image syntax "allowing automatic generation of small versions of images, and alignment of images without HTML"
 * January 30, 2004: Nine new Wikimedia servers, purchased using about $20K of the money generously donated in the December/January fundraising drive. Brought online over February 2004
 * Late May 2004: MediaWiki 1.3
 * December 12, 2004: Spam blacklist


 * Categorization system
 * 1.3: The MonoBook skin, categories, templates and extensions.
 * 1.5 schema refactor; Proposed Database Schema Changes/October 2004

2005
 * January 2005: MediaWiki 1.4 on the English Wikipedia
 * Hooks!
 * Major database redesign decoupling text storage from revision tracking
 * Page content must be encoded in UTF-8.

2006
 * ParserFunctions extension
 * API?

2007
 * Gadgets extension

2008
 * FlaggedRevisions extension
 * May: CentralAuth & Unified login
 * 1.12 - preprocessor rewrite (Tim); improvements to template performance

2009-2010
 * ResourceLoader; beginning of strong JavaScript module APIs
 * Usability initiative (Vector skins/WikiEditor extension etc.)

MediaWiki code base

 * Unprefixed class names.
 * PHP core and PECL developers have the attitude that all class names that are English words inherently belong to them, and that if any user code is broken when new classes are introduced, it is the fault of the user code.
 * e.g. 12294 Namespace class renamed to MWNamespace for PHP 5.3 compatibility, fixed in 1.13
 * Prefixing e.g. with "MW" would have made it easier to embed MediaWiki inside another application or library.


 * We try to do things cleanly if there are benefits to it (e.g. separation of logic and output in the architecture) but at the same time we're not afraid to ignore standards or rules if that's better for us (e.g. not fully complying with stupid provisions of HTML4, denormalizing the DB schema where that brings performance benefits)


 * MediaWiki seems to have been started mostly by people who weren't really very expert in their field, and as a result a lot of ugly old code is lying around that lacks proper logic/view separation and has other nasty issues

MediaWiki grew organically and is still evolving. It's hard to criticze the founders for not implementing some abstraction which we now find to be critical, when the initial code base was so small, and the time taken to develop it so short.


 * relying on PHP/MySQL: probably not the best choices for performance. But very popular and facilitates recruitment of new devs
 * hurt: PHP has not benefited from performance improvements that some other dynamic languages have seen in recent years (eg JavaScript VMs now have aggressive JITs etc, but Zend's PHP still doesn't ship an opcode cache, much less try to actually compile anything)
 * Obviously using Java would have been much better for performance, and scaling up the execution of backend maintenance tasks would have been simpler.

We've seen major new architectural elements introduced to MediaWiki throughout its history, for example:


 * The Parser class
 * The SpecialPage class
 * The Database class
 * The Image class, then later the media class hierarchy and the filerepo class hierarchy
 * ResourceLoader
 * The upload class hierarchy
 * The Maintenance class
 * The action hierarchy

MediaWiki started without any of these things, despite the fact that all of them support features that have been around since the beginning. Many developers are driven primarily by feature development -- architecture is often left behind, only to catch up later as the cost of working within an inadequate architecture becomes apparent.

Security
Because MediaWiki is the platform for high-profile sites such as Wikipedia, security is paramount. A strong security-minded development culture has been established among MediaWiki developers; to make it easier to write secure code, they are provided with wrappers around HTML output and database queries to handle escaping.

User input sanitization is done with the  class, which analyzes data passed in the URL or via a POSTed form. It removes "magic quotes" slashes, strips illegal input characters and normalizes Unicode sequences. Cross-site request forgery (CSRF) is avoided by using tokens, and cross-site scripting (XSS) by validating inputs and escaping outputs, usually using PHP's  function. MediaWiki also provides (and uses) an XHTML sanitizer with the  class, and database functions that prevent SQL injection.

Configuration
MediaWiki offers hundreds of configuration settings, stored in global PHP variables. Their default value is set in, and the system administrator can override them by editing.

Placing configuration variables in the global namespace is mostly a legacy from older versions of PHP, and has caused several issues over time. It had serious security implications with PHP's  function (not needed any more by MediaWiki since version 1.2). This system also limits potential abstractions for configuration, and makes optimization of the start-up process more difficult. Moreover, the configuration namespace is shared with variables used for registration and object context, leading to potential conflicts. From a user perspective, global configuration variables have also made MediaWiki seem difficult to configure and maintain, and hurt third-party users.

Database
MediaWiki has been using a relational database back-end since the Phase II software.

The default (and best supported) DBMS for MediaWiki is MySQL, which is the one used by all Wikimedia sites. Because alternate DBMSes (e.g. PostgreSQL, Oracle, SQLite) are not a priority for Wikimedia, support has been long to come and inconsistent. For example, PostgreSQL support was "largely working" in MediaWiki 1.4, dropped in 1.5, experimental in 1.7, and fully restored in 1.8. In recent years, volunteer developers have made efforts to support alternate DBMSes, to cater for the needs of third-party users. The DBMS can be chosen by the system administrator during installation, and MediaWiki provides both a database abstraction and a query abstraction layer, that simplify database access for developers.

The database went through dozens of schema changes over the years, the most notable being the decoupling of text storage and revision tracking in MediaWiki 1.5. This change, aiming to make better use of the database cache and disk I/O, resulted in significant performance boosts for some operations, like rename and delete operations on pages with very long edit histories.

The current layout contains dozens of tables, many of which related to the wiki's content (e.g.,  ,  , &  ). Other tables include data about users, media files , caching and internal tools (  for ResourceLoader,   for the job queue), among others.

Because of the size of Wikimedia sites, their databases, and their readership, database handling in MediaWiki is extremely optimized for performance. As of 1.4.5 (2005), MediaWiki can store page text externally on a separate database server, and old page revisions can be compressed to reduce the size of the database. Indices and summary tables are used extensively in MediaWiki, as SQL queries that scan huge numbers of rows can be very expensive, particularly in the context of Wikimedia. As a matter of fact, unindexed queries are usually discouraged in MediaWiki.

MediaWiki has built-in support for load balancing, added as early as 2004 in MediaWiki 1.2 (when Wikipedia got its second server — a big deal at the time). Load balancing is now a critical part of Wikimedia's infrastructure, which explains its influence on some algorithm decisions in the code. Also visible in the code is the influence of Wikimedia's multi-database environment, where writes are made to a unique master database, and then replicated to read-only slaves. Native management of the replication lag is one of the features offered by MediaWiki that few software developers usually have to worry about.


 * hurt performance: MySQL has had a few specific areas it's lagged in that have been problematic:
 * lack of full native UTF-8 (this is finally in in the latest versions, but you have to jump through some hoops and we have years of legacy databases)
 * no or limited online alter table makes even simple schema changes painful to deploy, slowing some development
 * data dump format is very hard to parallelize well; even with few changes to the database it takes forever to build one due to the compression.

Execution workflow of a web request
is the main entry point for MediaWiki, and handles most requests processed by the application servers (i.e. requests that were not served by the caching infrastructure; see below). The code executed from  performs security checks, loads configuration settings from , instantiates a   object , and creates a   object depending of the title and action parameters from the URL request.

can take a variety of action parameters in the URL request; the default action is, which shows the regular view of an article's content. For example, the request  displays the content of the article "Apple" on the English Wikipedia. Other frequent actions include  (to open an article for editing),   (to preview or save an article),   (to show an article's history) and   (to add an article to the user's watchlist). Administrative actions include  (to delete an article) and   (to prevent edits to an article).

is then called to handle most of the URL request. It checks for bad titles, read restrictions, local interwiki redirects, redirect loops, and determines whether the request is for a normal or a special page.

Normal page requests are handed over to, to create an   object for the page, and then to  , which handles "standard" actions. Once the action has been completed,  finalizes the request by committing DB transactions, outputting the HTML and launching deferred updates through the job queue. commits the deferred updates and closes the task gracefully.

If the page requested is a Special page (i.e. not a regular wiki page, but a special page of the software),  is called instead of  ; the corresponding PHP script is then called. Special pages can do all sorts of magic things, and all have a very specific purpose, usually independent of any specific article or its content. Special pages include various kinds of reports (recent changes, logs, uncategorized pages) and wiki administration tools (user blocks, user rights changes), among others. Their execution workflow depends on their function.

For debugging, many functions contain profiling code, which makes it possible to follow the execution workflow, if profiling is enabled.

Caching
MediaWiki itself is improved for performance because it plays a central role on Wikimedia sites, but it is also part of a larger operational ecosystem that has influenced its architecture. Wikimedia's caching infrastructure has imposed limitations in MediaWiki; developers worked around the issues, not by trying to shape Wikimedia's extensively optimized caching infrastructure around MediaWiki, but rather by making MediaWiki more flexible, so it could work within that infrastructure, without compromising on performance and caching needs.

On Wikimedia sites, most requests are handled by reverse caching proxies (Squids), and never even make it to the MediaWiki application servers. Squids contain static versions of entire rendered pages, served for simple reads to users who aren't logged in to the site. MediaWiki natively supports Squid and Varnish, and integrates with this caching layer by, for example, notifying them to purge a page from the cache when it has been changed.

For logged-in users, and other requests that can't be served by Squids, Squid forwards the requests to the web server (Apache), adding the "X-Forwarded-For" header containing the remote address; preserving the client's address is important since MediaWiki uses IPs for a variety of purposes, e.g. to block users and to credit edits made without a user account.

The second level of caching happens when MediaWiki renders and assembles the page from multiple objects, many of which can be cached to minimize future calls. Such objects include the page's interface (sidebar, menus, UI text) and the content proper, parsed from wikitext. The in-memory object cache has been available in MediaWiki since the early 1.1 version (2003), and is particularly important to avoid re-parsing long and complex pages. The default object/parser cache uses memcached.

In 2011, Wikimedia developed a disk-backed object cache to supplement memcached, and increase the amount of space dedicated to caching parsed pages. The system uses a MySQL back-end, splitting the cache into several tables, to avoid table lock contention on servers with a high write load. This feature, now included in MediaWiki, tripled Wikimedia's parser cache hit ratio, now over 90%.

Login session data can also be stored in memcached, which lets sessions work transparently on multiple front-end web servers in a load-balancing setup (Wikimedia heavily relies on load balancing, using LVS with PyBal).

Since version 1.16, MediaWiki uses a dedicated object cache for localized UI text; this was added after noticing that a large part of the objects cached in memcached consisted of UI messages localized into the user's language. The system is based on fast fetches of individual messages, minimizing memory overhead and start-up time in the typical case.

The last caching layer consists of the PHP opcode cache, commonly enabled to speed up PHP applications. Compilation can be a lengthy process; to avoid compiling PHP scripts into opcode every time they're invoked, a PHP accelerator can be used to store the compiled opcode and execute it directly without compilation. MediaWiki will "just work" with many accelerators such as APC, PHP accelerator and eAccelerator.

Because of its Wikimedia bias, MediaWiki is optimized for this complete, multi-layer, distributed caching infrastructure. Nonetheless, it also natively supports alternate setups for smaller sites. For example, it offers an optional simplistic file caching system that stores the output of fully rendered pages, like Squid does. Also, MediaWiki's abstract object caching layer lets it store the cached objects in a number of different places, including the file system, the database, or the opcode cache.

ResourceLoader
Like in many web applications, MediaWiki's interface has become more interactive and responsive over the years, mostly through the use of JavaScript. Usability efforts initiated in 2008, as well as advanced media handling (e.g. online editing of video files), called for dedicated front-end performance improvements.

ResourceLoader was developed to optimize the delivery of JavaScript and CSS assets. Started in 2009, it was completed in 2011 and has been a core feature of MediaWiki since version 1.17. ResourceLoader works by loading JS and CSS assets on demand, thus reducing loading and parsing time for unused features, for example in older browsers. It also minifies the code, groups resources to save requests, and can embed images as data URIs.

ResourceLoader is a particularly interesting case, as it's one of the few core components of MediaWiki that benefited from proper architecting prior to development. The main reason is that its developers not only wanted to "do it right", but also had the opportunity to do so, as the Wikimedia Foundation grew its engineering staff and more developers were available.

Languages
Imagine a world in which every single human being can freely share in the sum of all knowledge. This is Wikimedia's vision. A central part of effectively contributing and disseminating free knowledge to all is to provide it in as many languages as possible. Wikipedia is available in more than 280 languages, and encyclopedia articles in English represent less than 20% of all articles.

Because Wikipedia and its sister sites exist in so many languages, developers and translators are committed to fully localizing MediaWiki. And because Wikimedia sites are based on collaboration, it is important not only to provide the content in the readers' native language, but also to provide a localized interface, and effective input and conversion tools, so participants can contribute content. Some languages are really rare on the Internet; Wikimedia provides them with a platform to create educational content, through MediaWiki.

For this reason, localization and internationalization (l10n & i18n) are a central component of MediaWiki. The i18n system is pervasive, and impacts many parts of the software; it's also one of the most flexible and feature-rich. Translator convenience is usually preferred to developer convenience, but this is believed to be an acceptable cost.

MediaWiki is currently localized in more than 350 languages, including non-latin and right-to-left (RTL) languages, with varying levels of completion. The interface and content can be in different languages, and have mixed directionality.

Content language
MediaWiki originally used per-language encoding, which led to a lot of issues. Latin-1 support was dropped in 2005, along with the major database schema change in MediaWiki 1.5; content must now be encoded in UTF-8.

Special characters can be customized and inserted via MediaWiki's "Edittools", or its JavaScript version. The WikiEditor extension for MediaWiki, developed as part of a usability effort, merges special characters with the edit toolbar. Another extension, called "Narayam", provides additional input methods and key mapping features for non-latin characters.

Recent and future improvements include better support for right-to-left text, bidirectional text (LTR and RTL text on the same page) and WebFonts.

Interface language
Localization of the user interface messages was implemented in many different ways in the early years of MediaWiki, especially in MediaWiki extensions. Efforts were made to standardize them; interface messages are now all stored in PHP arrays of key-values pairs. Each message is identified by a unique key, which is assigned different values across languages. This standard was established for legacy reasons, and also because other systems were deemed not to be flexible enough for MediaWiki. For example, gettext doesn't support plural forms for multiple variables.

MediaWiki messages can embed parameters provided by the software, which will often influence the grammar of the message. In order to support virtually any possible language, MediaWiki's localization system has been improved and complexified over time to accommodate their specific traits and exceptions, often considered oddities by English speakers.

For example, adjectives are invariable words in English, but languages like French require adjective agreement with nouns. If the user specified their gender in their preferences, the  files, where   is the ISO-639 code of the language (e.g.   for French); default messages are in English and stored in. MediaWiki extensions use a similar system, or host all localized messages in an  file. Along with translations, Message files also include language-dependent information such as date formats.

Contributing translations used to be done by submitting PHP patches for the  files. In December 2003, MediaWiki 1.1 introduced "database messages", a subset of wiki pages in the MediaWiki namespace containing interface messages. The content of the wiki page  is the message's text, and overrides its value in the PHP file. Localized versions of the message are at, e.g..

This feature has allowed power users to translate (and customize) interface messages locally on their wiki, but the process doesn't update i18n files shipping with MediaWiki. In 2006, MediaWiki developer Niklas Laxström created a special, heavily hacked MediaWiki website (now hosted at ) where translators can easily localize interface messages, simply by editing a wiki page. The  files are then updated in the MediaWiki code repository, where they can be automatically fetched by any wiki, and updated using the LocalisationUpdate extension. On Wikimedia sites, database messages are now only used for customization, and not for localization any more.

In order to help translators understand the context and meaning of an interface message, it is considered a good practice in MediaWiki to provide documentation for every message. This documentation is stored is a special Message file, with the  language code, which doesn't correspond to a real language. The documentation for each message is then displayed in the translation interface on translatewiki.net. Another help tool is the  language code: there is no associated   file, and it isn't possible to select this fake language in the user's preferences. But when used with the  parameter to display a wiki page (e.g.  ), MediaWiki will display the message keys instead of their values in the user interface: this is very useful to identify which message to translate or change.

Registered users can set their own interface language in their preferences, in which case it overrides the site's default interface language. MediaWiki also supports fallback languages: if a message isn't available in the chosen language, it will be displayed in the closest possible language, and not necessarily in English. For example, the fallback language for Breton is French.

Authentication

 * A cleaner accounts system that spanned multiple sites from the beginning would have saved lots of trouble; CentralAuth is still a bit hacky to work with.

Permissions & blocks

 * Lack of a unified, pervasive permissions concept.
 * This stymied the development of new user rights and permissions features, and led to various security issues.


 * User rights management within the wiki: since 1.2 (March 2004)
 * 1.8: Allow blocks on anonymous users only.


 * Manual:User.php

Content structure

 * Page title: CamelCase, then free links. Allowed for greater flexibility in link text and page names and made it less confusing. Free links have since become the de-facto standard for internal links in most wiki software now
 * subpages (/talk subpages initially)
 * namespaces, parentheses in titles; related to the abandon of subpages decided by Larry Sanger in November 2001 after intense debate.


 * categories: since 1.3 (August 2004)


 * namespaces
 * The flat namespace for articles is too simple: for Wikipedia it encourages overly long pages (leads to performance problems as we have to parse and copy around huge chunks of text that will not usually get read all at once, and makes it harder to navigate to relevant, more digestable chunks of data). For other sites like wikibooks, wikisource, wikiversity, heck even mediawiki.org we could benefit a lot from more structured entities that consist of multiple subpages. It also means it's harder to separate draft or provisional pages from the published article space.


 * Magnus Manske implemented them in the first PHP script, then they were reimplemented a few times
 * way to separate the different spaces for the community to evolve
 * created the necessary preconditions for the community for meta-level discussions, community processes, user profile

Wikitext & Parser

 * Manual:Parser.php


 * tokenizer to parse wikitext (JeLuF wrote it). Unfortunately lack of performances with PHP array memory allocations led to a revert after 3 days of having it running on live site. We are back to the huge pile of regexp since them.
 * Cleaner markup syntax near the beginning would simplify our lives a lot with template & editing stuff


 * The parser wasn't formally spec'd from the beginning--it just morphed and evolved as needs have demanded. This makes it difficult for alternative parsers to exist and has made changing the parser hard. The parser's spec is whatever the parser spits out, plus a few hundred test cases.


 * The parser has to remain very very stable. Hundreds of millions of wikipages worldwide depend on the parser to continue outputting HTML the way it always has. It makes changing the parser difficult.

Making wikitext such a complex and idiosyncratic language that parsing it with 'normal' parsers is very hard was definitely a bad move, and we're feeling the pain now


 * Editing toolbar for learning wiki syntax: since 1.2 (March 2004)

1.12: preprocessor rewrite: "The main motivation for making these changes was performance. The new two-pass preprocessor can skip "dead branches" in template expansion, such as unfollowed #switch cases and unused defaults for template arguments. This provides a significant performance improvement in template-heavy test cases taken from Wikipedia.

Editing

 * metadata (categories, interwiki) in the body of text. This should really have been coded in a different table / interface.


 * The visual editor project is way overdue. We're fixing it now, and that's good, but it's kind of ridiculous that, in 2011, the main interface of one of the largest sites on the web is still a  from the 90s

Templates
1.3 (August 2004) Templates have been expanded with parameters, and separated from the MediaWiki: localization scheme. 1.6: default values for template parameters


 * While we have plenty of things to whinge about in the syntax, management etc, the ability to create partial page layouts and reuse them in thousands of articles with central maintenance has been a big boon.


 * The template argument default parameter feature was ultimately the most costly feature in terms of performance. It enabled the construction of a functional programming language implemented on top of PHP, starting with.

ParserFunctions

 * ParserFunctions never should've seen the light of day. Granted we were responding to the needs of the time, but if we did it over, we probably should've taken the time to solve the problem properly rather than putting PFuncs as a stopgap measure. This page has some interesting history on the subject (Also this and also older versions of this page)

Collaboration and wiki features

 * Automatic merging of edit conflicts when possible since 1.3 (August 2004)
 * Notifications: RSS, e-mail
 * RSS 2.0 & Atom 0.3 feeds for Recent Changes, New Pages since 1.3; upgraded to to Atom 1.0 in MW1.6
 * RSS for watchlist since 1.16


 * Other features related to collaboration, and how they influenced the architecture of MW ?


 * spam, logs, vandalism
 * 'Recentchanges Patrol' to mark new edits that haven't yet been viewed.: MW1.4
 * rel="nofollow" activated and applied to all external links to prevent linkspam since 1.4


 * 1.5: "Updates to user talk pages and watchlist entries can optionally send e-mail notifications."
 * 1.9: Undo
 * 1.10: cascading protection
 * 1.14: edit notices

Media

 * DB & filesystem storage layout for media files is very awkward, with a number of problems that hinder our ability to mirror, cache, and do major online maintenance.


 * Image resizing and thumbnail generation: since 1.2 (March 2004)


 * File repository code could've been done slightly differently. Ideally wikis	 should be able to upload *to* foreign repos, rather than just read from. Also, most of the code assumes a local filesystem or NFS which isn't very flexible--other backends like the database, Swift, etc. shouldn't be so hard to add.


 * Replication support


 * SVG rasterization support since 1.4
 * thumbnails for DJVU files and multipage DJVU display support since 1.8


 * InstantCommons & ForeignRepo
 * External images

Beyond MediaWiki

 * Several levels:


 * System administrator:


 * Wiki administrator:


 * Common.js, Common.css, and skin-specific counterparts
 * Gadgets (per wiki, but can be enabled/disabled by each user)
 * New developments on gadgets (central repo, UI improvements)


 * User:


 * User:Foo/vector.css, User:Foo/common.js

Side programs

 * interaction with other pieces of software (i.e. LaTeX, ImageMagick, etc.)
 * January 6, 2003: Inline TeX math formulas are now supported

API

 * w:User:Yurik/Query API
 * API:Wikimania 2006 API discussion
 * 1.9: "experimental machine API interface" enabled by default, read-only
 * The creation of api.php, and the addition of write actions (including edit) to it

Skins & extensions

 * hooks; http://thread.gmane.org/gmane.science.linguistics.wikipedia.technical/14227
 * Extension architecture. fairly flexible infrastructure which has helped us to make specialized code more modular, keeping the core software from expanding (too) much and making it easier for 3rd-party reusers to build custom stuff on top.


 * The skin system has been terrible since the beginning. It's damn near impossible for 3rd parties to write their own custom skins without reinventing the wheel.


 * Extension registration based on code execution at startup rather than cacheable data.
 * Limits abstraction and optimisation

"Optional modules" in 1.3: WikiHiero, Timeline


 * 1.4: "Skins system more modular: templates and CSS are now in /skins/ New skins can be dropped into this directory and used immediately."
 * "More extension hooks have been added. Authentication plugin hook."


 * Extensions are sometimes moved to Core


 * the extension registration systems hurt MediaWiki's performance.

Gadgets and site/user JS/CSS

 * hugely impactful, this has greatly increased the democratization of MediaWiki's software development. Individual users are empowered to add features for themselves; power users can share these with others both informally and through globally-configurable admin-controlled systems.


 * Using jQuery for javascript.

Future

 * Current RfCs, etc.
 * New parser (specification), wiki dom, Visual editor


 * MW is a tool that's used for very different purposes: media categorizations, mgmt of structured data, curation of content, patrolling of content, corporate CMS
 * we'll see an increased need for specialized uses
 * ex: MW used for Commons. Not built for that. Benefits for doing that, but not built for that. So needs improvements for that specific use case
 * other example: structured data
 * ways to make MW better : look at the workarounds (toolserver, bots, etc.)

Notes and references

 * Automatically-generated MediaWiki documentation:
 * Domas Mituzas, Wikipedia: site internals, configuration, code examples and management issues, MySQL Users conference, 2007. Full text available at