The XML output from incremental dumps should be exactly the same as the current XML dumps, with the following exceptions. Any exception not listed here is most likely a bug and should be reported.
The exceptions (from most serious to least):
- Revisions of a page are ordered by their id in history dumps. Current XML dumps don't specify any order.
- The <restrictions> tag is omitted.
The page_restrictions field in the database is not used anymore, so the <restrictions> tag doesn't provide accurate information about the restrictions of a page.
- The id attribute is missing for the <text> tag in stub dumps.
This attribute is currently used in the dump infrastructure for creating pages dumps, but is not useful to users.
- Comments that are 255 bytes long and end in an invalid UTF-8 sequence are shortened.
In the current dumps, the invalid sequence is replaced with U+FFFD REPLACEMENT CHARACTER. In the XML produced by incremental dumps, the invalid sequence is removed.
This applies only to the last character of full-length comments. In other cases, incremental dumps use U+FFFD REPLACEMENT CHARACTER, just like current dumps.
- Addresses of anonymous IPv6 contributors that are not in full form (i.e. they contain ::) are normalized to full form.
This should be very rare; the addresses should almost always be in full form already.
- The minor tag is consistently written as <minor /> (with space).
In current dumps, this is inconsistent: pages dumps use <minor />, while stub dumps use <minor/> (without space).
This could affect users who read the dumps using regular expressions or similar methods; it doesn't make any difference for those who use XML parsers.
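
For readers who consume the dumps with an XML parser, the structural exceptions listed above (revision order, the omitted <restrictions> tag, the missing id attribute on <text>, and the spelling of the minor tag) can be absorbed in a few lines. The following Python sketch is only an illustration: the SAMPLE fragment and the read_page helper are hypothetical, and the fragment leaves out the XML namespace and most of the elements that real dumps contain.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified page fragment (no namespace, few elements).
# It mixes both spellings of the minor tag and revisions in arbitrary order.
SAMPLE = """<page>
  <title>Example</title>
  <id>1</id>
  <revision><id>12</id><minor/><text id="34" bytes="5" /></revision>
  <revision><id>10</id><minor /><text bytes="7" /></revision>
  <revision><id>11</id><text bytes="6" /></revision>
</page>"""


def read_page(xml_text):
    """Return (restrictions, revisions) without relying on optional parts."""
    page = ET.fromstring(xml_text)
    # None when the <restrictions> tag is omitted.
    restrictions = page.findtext("restrictions")
    revisions = []
    for rev in page.findall("revision"):
        text = rev.find("text")
        revisions.append({
            "id": int(rev.findtext("id")),
            # An XML parser sees <minor/> and <minor /> as the same element.
            "minor": rev.find("minor") is not None,
            # The id attribute may be absent in stub dumps.
            "text_id": text.get("id") if text is not None else None,
        })
    # Sort explicitly instead of assuming any particular order in the input.
    revisions.sort(key=lambda r: r["id"])
    return restrictions, revisions


restrictions, revisions = read_page(SAMPLE)
print(restrictions, [r["id"] for r in revisions])
```

Sorting by id in the reader means the same code works whether or not the input already guarantees an order.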
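
The comment exception above can be reproduced with a small experiment. This Python sketch assumes that the invalid sequence at the end of a full-length comment is an incomplete multi-byte character left over from cutting the comment at 255 bytes. The function names are hypothetical and the code only models the described behaviour; it is not the actual dump implementation.

```python
import codecs

MAX_COMMENT_BYTES = 255  # length of a full-length comment, per the text above


def decode_like_current(raw: bytes) -> str:
    """Every invalid sequence becomes U+FFFD, including a trailing one."""
    return raw.decode("utf-8", errors="replace")


def decode_like_incremental(raw: bytes) -> str:
    """Drop an incomplete trailing sequence of a full-length comment.

    Other invalid sequences are still replaced with U+FFFD.
    """
    if len(raw) != MAX_COMMENT_BYTES:
        return raw.decode("utf-8", errors="replace")
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    # With final=False the decoder holds back an incomplete trailing
    # sequence instead of reporting it as an error.
    return decoder.decode(raw, final=False)


# 253 ASCII bytes plus a 3-byte character, cut at 255 bytes:
# the last character is split after its second byte.
raw = ("a" * 253 + "\u20ac").encode("utf-8")[:MAX_COMMENT_BYTES]
print(repr(decode_like_current(raw)[-3:]))
print(repr(decode_like_incremental(raw)[-3:]))
```

Tools that compare comments between the two kinds of dumps may want to treat a trailing U+FFFD in a full-length comment as equivalent to its absence.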
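
Because of the IPv6 exception above, the same contributor address may be spelled differently in the two kinds of dumps. One way to compare such addresses is to parse them instead of comparing strings, as in this Python sketch. The helper names are hypothetical, and the exact spelling that the dumps use for the full form (letter case, leading zeros) is not specified here, so the output of full_form is only one possible full form.

```python
import ipaddress


def same_ipv6(a: str, b: str) -> bool:
    """Compare two IPv6 addresses by value, ignoring how they are written."""
    return ipaddress.IPv6Address(a) == ipaddress.IPv6Address(b)


def full_form(addr: str) -> str:
    """Expand '::' into explicit groups (zero-padded, lower case).

    This is Python's notion of a full form; the dumps may format it differently.
    """
    return ipaddress.IPv6Address(addr).exploded


print(same_ipv6("2001:db8::1", "2001:DB8:0:0:0:0:0:1"))
print(full_form("2001:db8::1"))
```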