Requests for comment/Structured data push notification support for recent changes

"Structured data push notification support for recent changes" is a long title for a goal that has been named and proposed in various forms. The following is a list of related buzzwords:
 * Structured data
 * JSON and/or XML
 * Jabber / XMPP
 * WebSockets (HTML5 / AJAX)
 * PubSubHubbub
 * Push Notification Service
 * Socket.io: http://socket.io/

Specification

 * Recent changes packets should be easily machine-readable (JSON)
 * Should not be influenced by local wiki modifications (e.g. interface messages)
 * Should have a way for a client to present a localized sentence describing the event (i.e. which i18n messages to use, which variables to replace with what)
 * This could probably be done by an API module that returns a map of log type/actions and message-keys. With the new logging framework as of 1.18, the order of variables is more logical, making this easier to implement.
 * Properties (depending on the implementation, some could be made optional / toggleable):
 * timestamp
 * user (name, id)
 * user rights (array)
 * user groups (array)
 * page (current pageid, fullpagename, is_redirect)
 * page_namespace (canonical, localized, id)
 * page_title
 * revision size (bytes before, bytes after, bytes diff: "-100" or "+12")
 * revision ids (revision oldid, revision diffid)
 * revision comment (raw comment, parsed comment)
 * url to diff (page edit), page (log event) or oldid (page creation)
 * rc id
 * rc type (new, edit, log)
 * rc flags (rc_minor: m, rc_bot: b, rc_patrolled: !) (should these be in the revision table instead?)
 * tags (revision tags)
 * log specific stuff
 * Push order must match order in which events occurred
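
To make the list above more concrete, here is a rough sketch of what one event and its client-side localization could look like. It is only an illustration: the field names, the message keys (example-rc-edit, example-log-delete), the message texts and the helper functions are all made up for this example and are not an agreed format.

```python
import json
import re

# Hypothetical map of rc type / log action -> message key and parameter order.
# In the proposal this would be served by an API module; these keys are invented.
MESSAGE_MAP = {
    "edit": {"key": "example-rc-edit", "params": ["user", "page_title"]},
    "log/delete": {"key": "example-log-delete", "params": ["user", "page_title"]},
}

# Hypothetical message texts a client might have fetched for its own language.
MESSAGES_EN = {
    "example-rc-edit": "$1 edited $2",
    "example-log-delete": "$1 deleted page $2",
}


def build_event(rc_id, rc_type, user, page_title, old_len, new_len, comment):
    """Build one event with stable, localization-independent keys."""
    diff = new_len - old_len
    return {
        "rc_id": rc_id,
        "rc_type": rc_type,
        "timestamp": "2011-01-01T00:00:00Z",  # placeholder value
        "user": {"name": user, "id": 42},  # placeholder id
        "page": {"title": page_title, "namespace": {"id": 0, "canonical": ""}},
        "revision": {
            "size": {"before": old_len, "after": new_len, "diff": f"{diff:+d}"},
            "comment": {"raw": comment},
        },
        "flags": {"minor": False, "bot": False, "patrolled": False},
        "tags": [],
    }


def describe(event, message_map, messages):
    """Render a localized sentence from the message key and parameter order."""
    entry = message_map[event["rc_type"]]
    values = {
        "user": event["user"]["name"],
        "page_title": event["page"]["title"],
    }
    params = [values[name] for name in entry["params"]]
    # Replace $1, $2, ... in a single pass so inserted values are not re-scanned.
    return re.sub(
        r"\$(\d+)",
        lambda m: params[int(m.group(1)) - 1],
        messages[entry["key"]],
    )


event = build_event(1, "edit", "Alice", "Main Page", 100, 112, "typo")
packet = json.dumps(event)  # what would travel over the wire
print(describe(json.loads(packet), MESSAGE_MAP, MESSAGES_EN))
```

The point of the split is that the packet itself never contains localized text; a client that wants a sentence in its own language only needs the message map and the message texts for that language.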

Current: UDP / netcat / ircd

 * MediaWiki emits a UDP packet to a specified server (see $wgRC2UDPAddress etc.)
 * This packet contains a single localized string, similar to the text of the list item on Special:RecentChanges but flattened to plain text without HTML.
 * The UDP receiver (netcat) pipes the message as-is to a channel on a known IRC server (ircd running in the background)
 * Clients join the channel through an IRC socket
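
The flow above can be imitated on a single machine with plain UDP sockets. This is a loopback sketch only: the function names are made up, a Python list stands in for the IRC channel, and nothing here reflects how MediaWiki, netcat or ircd are actually configured.

```python
import socket


def open_receiver(host="127.0.0.1"):
    """Stand-in for the netcat listener: bind a UDP socket on a free port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, 0))  # port 0 lets the OS pick a free port
    sock.settimeout(2.0)  # do not block forever if the packet is lost
    return sock


def emit_change(line, address):
    """Stand-in for MediaWiki: fire one UDP packet and forget about it."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), address)


def relay_one(receiver, channel):
    """Read one packet and pass it on unchanged to the 'channel' (a list here)."""
    data, _sender = receiver.recvfrom(65535)
    message = data.decode("utf-8", errors="replace")
    channel.append(message)
    return message


receiver = open_receiver()
channel = []
emit_change("[[Main Page]] edited by Alice", receiver.getsockname())
print(relay_one(receiver, channel))
receiver.close()
```

Note that the emitter gets no acknowledgement: if nobody is listening, or the packet is dropped, the sender never finds out.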

Problems

 * No machine-readable structure (a single human-readable string instead of key/value pairs)
 * Hard to parse, unstable/variable output:
 * IRC color codes are embedded in the string
 * Requires periodic downloading of interface messages from the target wiki (which can change at any time, either due to software updates or when a user on the local wiki changes the message in the MediaWiki namespace)
 * Messages can be cut off (because a UDP packet has a limited length). Right now this usually doesn't break the notification, because the last part of the string is the edit summary and there is no closing tag after it, so the receiver just reads it as if the edit summary were shorter. It only becomes a problem if the message is cut off before the edit summary starts, because then it no longer matches the expected pattern.
 * Not flexible / extendable
 * UDP is (apparently?) unreliable in that packets can go missing or arrive in the wrong order.
 * Is this in general due to how UDP works, or because of the way we use it?
 * Can we fix it, or do we have to use UDP this way in order to be performant enough (since it is emitted from within the web request)?
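
The parsing and truncation problems can be shown with a small sketch. The sample line below only approximates the shape of the color-coded feed for an English-language wiki; the exact format, the regular expressions and the parse_line helper are assumptions made for this example.

```python
import re

# IRC formatting: \x03 followed by optional color digits, plus bold/reset/etc.
IRC_FORMATTING = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0f\x16\x1d\x1f]")

# Pattern for the flattened line. It depends on the wording and punctuation
# of the string, so it breaks whenever the underlying messages change.
LINE_PATTERN = re.compile(
    r"^\[\[(?P<title>.+?)\]\] (?P<flags>\S*) (?P<url>\S+) "
    r"\* (?P<user>.+?) \* \((?P<diff>[+-]\d+)\) (?P<summary>.*)$"
)

# Approximate sample, not copied from a real feed.
SAMPLE = (
    "\x0314[[\x0307Main Page\x0314]]\x034 M\x0310 "
    "\x0302http://example.org/w/index.php?diff=2&oldid=1\x03 "
    "\x035*\x03 \x0303Alice\x03 \x035*\x03 (+12) \x0310fixed a typo"
)


def parse_line(raw):
    """Strip IRC formatting and extract fields, or return None on mismatch."""
    plain = IRC_FORMATTING.sub("", raw)
    match = LINE_PATTERN.match(plain)
    return match.groupdict() if match else None


# Complete line: all fields are recovered.
print(parse_line(SAMPLE))

# Cut off inside the edit summary: still parses, with a silently shorter summary.
print(parse_line(SAMPLE[:-5]))

# Cut off before the edit summary starts: the pattern no longer matches.
print(parse_line(SAMPLE[:SAMPLE.index("Alice")]))
```

The second case is arguably the worse one, since the client has no way to tell that the summary was truncated.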

UDP / NodeJS / socket-io

 * MediaWiki emits a UDP packet
 * This packet contains a JSON string with stable (localization-independent) keys (it would look much like the JSON response of api.php?action=query&list=recentchanges)
 * We'll have to figure out a way to deal with cut-off messages, or rather make it so that MediaWiki does not cut off the packet and instead spreads the message over multiple packets.
 * Idea 1: Spread it over multiple UDP packets. If we just cut the string at an arbitrary point and send another packet with the rest, this causes problems:
 * UDP packets do not always arrive in the right order
 * Clients can't parse a fragment of a JSON string (it is invalid JSON), so a single part is not useful on its own.
 * Idea 2: Use D-JSON to send the JSON string in multiple UDP packets (when it doesn't fit into one). D-JSON has a protocol for connecting begin, middle and end parts and re-assembling them. So if needed there are multiple UDP packets for one event, but (once they have all arrived) only one packet is sent to the subscribers. Best of both?
 * The UDP receiver (NodeJS) forwards the JSON string to a topic in the (socket-io powered) socket (running in the same Node process)
 * Clients subscribe to topic(s) through the socket, and parse the JSON.
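
As a sketch of the receiver side, the following shows one possible way to re-assemble a JSON string that was spread over several packets and then forward it to topic subscribers. This is not D-JSON and not Socket.io: the chunk layout (message id, index, total, data), the Reassembler and TopicHub classes and the "enwiki" topic name are all invented for illustration.

```python
import json


def split_into_chunks(message_id, payload, max_chunk_bytes):
    """Sender side: split payload bytes into (message_id, index, total, data)."""
    parts = [
        payload[i:i + max_chunk_bytes]
        for i in range(0, len(payload), max_chunk_bytes)
    ] or [b""]
    return [(message_id, i, len(parts), part) for i, part in enumerate(parts)]


class Reassembler:
    """Receiver side: collect chunks in any order, emit the event once complete."""

    def __init__(self):
        self._pending = {}  # message_id -> {index: data}

    def add(self, chunk):
        message_id, index, total, data = chunk
        parts = self._pending.setdefault(message_id, {})
        parts[index] = data
        if len(parts) < total:
            return None  # still waiting for more chunks
        del self._pending[message_id]
        payload = b"".join(parts[i] for i in range(total))
        return json.loads(payload.decode("utf-8"))


class TopicHub:
    """In-process stand-in for the socket layer: topic -> list of callbacks."""

    def __init__(self):
        self._subscribers = {}

    def subscribe(self, topic, callback):
        self._subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, event):
        message = json.dumps(event)  # subscribers always get complete JSON
        for callback in self._subscribers.get(topic, []):
            callback(message)


hub = TopicHub()
hub.subscribe("enwiki", print)

event = {"rc_id": 7, "rc_type": "edit", "title": "Main Page"}
chunks = split_into_chunks(1, json.dumps(event).encode("utf-8"), 10)

reassembler = Reassembler()
for chunk in reversed(chunks):  # simulate out-of-order arrival
    complete = reassembler.add(chunk)
    if complete is not None:
        hub.publish("enwiki", complete)
```

A real receiver would also need to expire pending messages whose chunks never arrive, and would have to decide how to preserve push order when a later event completes before an earlier one.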