Incremental dumps/File format/XML output

The XML output from incremental dumps should be exactly the same as the current XML dumps, with the following exceptions. Any difference not listed here is most likely a bug and should be reported.

The exceptions (from most to least serious):


 * 1) In history dumps, revisions of a page are ordered by their id. The current XML dumps don't specify any order.
 * 2) The <restrictions> tag is omitted. The page_restrictions field in the database is not used anymore, so the <restrictions> tag doesn't provide accurate information about the restrictions of a page.
 * 3) The id attribute is missing for the <text> tag in stub dumps. The dump infrastructure currently uses this attribute when creating pages dumps, but it is not useful to users.
 * 4) Comments that are 255 bytes long and end in an invalid UTF-8 sequence are shortened. In the current dumps, the invalid sequence is replaced with U+FFFD REPLACEMENT CHARACTER. In the XML produced by incremental dumps, the invalid sequence is removed. This applies only to the last character of full-length comments. In other cases, incremental dumps use U+FFFD REPLACEMENT CHARACTER, just like current dumps.
 * 5) Addresses of anonymous IPv6 contributors that are not in full form (i.e. they contain ::) are normalized to full form. This should be very rare, because the addresses should almost always be in full form already.
 * 6) The <minor /> tag is consistently written as <minor /> (with space). In current dumps, this is inconsistent: pages dumps use <minor />, while stub dumps use <minor/>. This could affect users who read the dumps using regular expressions or similar methods, but it doesn't make any difference for those who use XML parsers.
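
Exceptions 4 and 5 can be illustrated with a short Python sketch. It is not the code used by incremental dumps. The function names are hypothetical, the handling of the trailing bytes is a simplified reading of exception 4, and the exact spelling of the "full form" of an IPv6 address (letter case, leading zeros) is an assumption: the sketch uses the exploded form from Python's ipaddress module as a stand-in.

```python
import ipaddress

COMMENT_LIMIT = 255  # size in bytes of a full-length comment (exception 4)


def strip_truncated_tail(raw: bytes) -> bytes:
    """Drop an incomplete UTF-8 sequence at the very end of raw, if any."""
    i = len(raw) - 1
    steps = 0
    # Walk back over at most three continuation bytes (10xxxxxx).
    while i >= 0 and steps < 3 and (raw[i] & 0xC0) == 0x80:
        i -= 1
        steps += 1
    if i < 0:
        return raw
    lead = raw[i]
    if lead >= 0xF0:
        needed = 4
    elif lead >= 0xE0:
        needed = 3
    elif lead >= 0xC0:
        needed = 2
    else:
        return raw  # ASCII or stray continuation byte: nothing to strip
    # Remove the sequence only if it has fewer bytes than its lead byte requires.
    return raw[:i] if len(raw) - i < needed else raw


def decode_comment(raw: bytes, incremental: bool = True) -> str:
    """Decode a comment roughly as described in exception 4.

    With incremental=True, a cut-off sequence at the end of a full-length
    comment is removed. Every other invalid sequence becomes U+FFFD.
    """
    if incremental and len(raw) == COMMENT_LIMIT:
        raw = strip_truncated_tail(raw)
    return raw.decode("utf-8", errors="replace")


def expand_ipv6(address: str) -> str:
    """Expand an abbreviated IPv6 address (one containing ::) to eight groups."""
    return ipaddress.IPv6Address(address).exploded


if __name__ == "__main__":
    cut_off = b"a" * 254 + b"\xc3"  # 255 bytes, ends in half of a 2-byte sequence
    print(len(decode_comment(cut_off)))                     # sequence removed
    print(len(decode_comment(cut_off, incremental=False)))  # sequence replaced by U+FFFD
    print(expand_ipv6("2001:db8::1"))
```

Users who compare comments or contributor addresses between the two kinds of dumps could apply a similar normalization to both sides before comparing.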