MediaWiki architecture document/Initial ideas

This page helps structure and plan a potential chapter about MediaWiki in the Architecture of Open Source Applications book.


 * Architects look at thousands of buildings during their training, and study critiques of those buildings written by masters. In contrast, most software developers only ever get to know a handful of large programs well—usually programs they wrote themselves—and never study the great programs of history. As a result, they repeat one another's mistakes rather than building on one another's successes.


 * This book's goal is to change that. In it, the authors of twenty-five open source applications explain how their software is structured, and why. What are each program's major components? How do they interact? And what did their builders learn during their development? In answering these questions, the contributors to this book provide unique insights into how they think.

Possible key points

 * MediaWiki history general overview
 * Critical architectural components/concepts
 * what is the overall workflow of a user request?
 * a "page"
 * key classes
 * parser
 * diagrams
 * index.php dispatches to MediaWiki class; SpecialPage class; page, revision, & user tables, Title & WikiPage classes
 * DB API, cache handling (Squid purge, memcached).
 * from hashar & Chad:
 * hashar: lastly, the top question is: how do you handle billions of page views per day? (tip: HEAVY CACHING)
 * hashar: that is more or less like the "handling a user request workflow" question
 * hashar: the idea is to show that every time the workflow requests data, there is a cache to speed it up
 * hashar: the first one being web caches (Squid, Varnish), then memcached for data/parser, then database query cache (not sure it is used)
 * Chad: (If you're not caching it in 3 places, you're not caching enough!)
 * hashar: that could also lead to "scaling a top site"
 * hashar: so the plan could be: what is the workflow of a user request? How do you handle that workflow with billions of page views (i.e. caching)? How do you scale further (split queries across multiple databases, partition caches, ...)?
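The layered caching hashar describes above can be sketched generically: a lookup falls through the web cache (Squid/Varnish), then the object cache (memcached), before touching the database, with results written back so later requests hit earlier layers. The classes and function names below are illustrative, not MediaWiki's actual API.

```python
# Illustrative sketch of a layered cache lookup: each layer is consulted
# in order; a miss falls through to the next, and the result is written
# back so later requests are served by the fastest layer.
# These names are hypothetical, not MediaWiki's real classes.

class Layer:
    """A single cache layer backed by a plain dict."""
    def __init__(self, name):
        self.name = name
        self.store = {}

    def get(self, key):
        return self.store.get(key)

    def put(self, key, value):
        self.store[key] = value


def fetch(key, layers, render_from_db):
    """Walk the layers (web cache, object cache, ...) before falling
    back to the database, then back-fill every layer on the way out."""
    for i, layer in enumerate(layers):
        value = layer.get(key)
        if value is not None:
            # Back-fill the faster layers that already missed.
            for earlier in layers[:i]:
                earlier.put(key, value)
            return value, layer.name
    value = render_from_db(key)
    for layer in layers:
        layer.put(key, value)
    return value, "database"


layers = [Layer("web cache"), Layer("memcached")]
html, source = fetch("Main_Page", layers, lambda k: f"<html>{k}</html>")
print(source)   # first request goes all the way to the database
html, source = fetch("Main_Page", layers, lambda k: f"<html>{k}</html>")
print(source)   # a repeat request is served by the web cache
```

This is Chad's "caching it in 3 places" in miniature: the expensive render happens once, and every layer closer to the user absorbs the repeat traffic.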


 * WebRequest / Sanitizer
 * user input sanitization. We have WebRequest to fetch the parameters supplied by a user and make them safe. So the question could be: "how do you avoid code injection?"
 * Big decisions we made, snapshots of what the codebase/architecture was like at
 * "phase 1"
 * "phase 2"
 * "phase 3"
 * 1.5 <-- schema rewrite IIRC Hashar 08:31, 31 August 2011 (UTC)
 * Proposed Database Schema Changes/October 2004
 * couple slides from talk at CCC in 2004
 * 1.12 or so? <-- Tim's major parser preprocessor rewrite made big improvements to template performance
 * 1.15
 * 1.17 <-- ResourceLoader: beginning of strong JavaScript module APIs
 * installer. A quick note about the installer would be good. In the early days (ask Tim Starling), you had to run a shell script to install MW. Then it was changed so that you only had to upload the files and run the /config/ wizard. Lastly, it is now a complete wizard :) hashar: "so the question could be: "How did you install MW? How did we improve it for external usage (aka non-WMF)? What is it now (the new installer)?"
 * Drawbacks, decisions we have rued
 * aspects of parser & templates
 * PHP?
 * lack of extension management?
 * any database stuff?
 * Our unique position
 * Deploy-then-release
 * Heightened security and performance needs
 * User freedom is a very high priority
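The WebRequest/Sanitizer point above — route all user input through one gateway that coerces values to an expected type before they reach SQL or HTML — can be sketched as follows. The names here are hypothetical stand-ins, not MediaWiki's actual WebRequest API.

```python
# Hypothetical sketch of the WebRequest idea: every user-supplied
# parameter passes through one gateway that coerces it to a safe,
# expected type, so raw strings never reach SQL or HTML untouched.
# Not MediaWiki's real API; names are illustrative only.
import html


class Request:
    def __init__(self, params):
        self._params = params  # e.g. the parsed query string

    def get_int(self, name, default=0):
        """Return the parameter as an int, or the default if absent or malformed."""
        try:
            return int(self._params.get(name, default))
        except (TypeError, ValueError):
            return default

    def get_text(self, name, default=""):
        """Return the parameter HTML-escaped, defusing markup injection."""
        return html.escape(str(self._params.get(name, default)))


req = Request({"oldid": "1234", "comment": "<script>alert(1)</script>"})
print(req.get_int("oldid"))       # 1234
print(req.get_int("limit", 50))   # absent parameter falls back to 50
print(req.get_text("comment"))    # markup arrives escaped, not executable
```

The design point is that callers ask for a type ("give me this as an int"), never for the raw string, which is one answer to the "how do you avoid code injection?" question above.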

Questions to answer
See MediaWiki architecture document/How and why/questions

Sources to review
Current
 * Manual:Cache
 * Manual:File cache
 * Disk-backed object cache
 * This probably won't be very interesting to anyone but us. ^demon 20:43, 29 September 2011 (UTC)
 * Manual:Code
 * Localisation
 * PHP configuration
 * Requests for comment
 * MediaWiki on Wikipedia

Historical content
 * w:Wikipedia:Milestones 2001 and following
 * w:Special:PrefixIndex/Wikipedia:PHP script
 * w:Wikipedia:MediaWiki (archive)
 * http://meta.wikimedia.org/w/index.php?title=MediaWiki&dir=prev&action=history
 * http://en.wikipedia.org/w/index.php?title=Wikipedia:Software_Phase_III&action=history
 * w:User:Conversion script
 * http://nostalgia.wikipedia.org
 * Release notes
 * m:Cache strategy
 * m:PHP caching and optimization
 * m:Wikimedia cache strategy evolution during 2003
 * m:Wikimedia servers
 * m:Separate database and web servers
 * m:Upgrade to MySQL 4.0
 * One-pass parser
 * m:Kernel upgrade
 * m:Compiler optimizations
 * m:Category:MediaWiki archives
 * m:Move Text to Filesystem
 * m:Main causes of lag
 * PHPTal

(and associated talk pages)