Extension:EventLogging/Data flow notes

Event as it is generated by developer-provided code executing in the user's browser:

{wasClicked: false}

JavaScript EventLogging code validates that record, annotates it with 'EventCapsule' fields, and URL-encodes it so that it can be embedded in a URL and sent to the bits servers:

http://bits.wikimedia.org/event.gif?%7B%22event%22%3A%7B%22wasClicked%22%3Afalse%7D%2C%22clientValidated%22%3Atrue%2C%22revision%22%3A5329872%2C%22schema%22%3A%22BannerImpression%22%2C%22webHost%22%3A%22127.0.0.1%22%2C%22wiki%22%3A%22enwiki%22%7D;
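The encoding step above can be sketched in a few lines. This is a hypothetical illustration (the real code is JavaScript running in the browser, not this Python): the capsule wraps the raw event, and the whole record is serialized to compact JSON and percent-encoded into the query string of the event.gif beacon URL.

```python
import json
from urllib.parse import quote

# The raw event, wrapped in its EventCapsule fields (values taken from
# the example above).
capsule = {
    "event": {"wasClicked": False},
    "clientValidated": True,
    "revision": 5329872,
    "schema": "BannerImpression",
    "webHost": "127.0.0.1",
    "wiki": "enwiki",
}

# Serialize to compact JSON and percent-encode every reserved character
# so the record can ride in a URL query string.
url = "http://bits.wikimedia.org/event.gif?" + quote(
    json.dumps(capsule, separators=(",", ":")), safe=""
)
print(url)
```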

Varnish running on bits, using special configuration, emits a log entry recording the user's request carrying this data. The log entry is transmitted to vanadium using udp2log and looks like this:

?%7B%22event%22%3A%7B%22wasClicked%22%3Afalse%7D%2C%22clientValidated%22%3Atrue%2C%22revision%22%3A5329872%2C%22schema%22%3A%22BannerImpression%22%2C%22webHost%22%3A%22127.0.0.1%22%2C%22wiki%22%3A%22enwiki%22%7D; niobium.wikimedia.org 12363 2013-03-18T19:32:47 216.38.130.161

The data now includes:
a) the event data (wasClicked: false), as encoded by the user's browser
b) the event capsule data (webHost, clientValidated, wiki, etc.), as encoded by the user's browser
c) the extra annotations added by the bits server

EventLogging code running on vanadium knows about the structure of the data and uses that knowledge to decode the ugly '%7B..' line above into something structured and readable:

{
    "wiki": "enwiki",
    "uuid": "533cde05d407554888e871f65ce60fec",
    "webHost": "127.0.0.1",
    "timestamp": 1363635487,
    "clientValidated": true,
    "recvFrom": "niobium.eqiad.wmnet",
    "seqId": 664496,
    "clientIp": "e6553bbd10a51a2c6270147ea8617a5080863ac6",
    "schema": "BannerImpression",
    "event": {
        "wasClicked": false
    },
    "revision": 5329872
}
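The decoding step amounts to splitting the udp2log line into the percent-encoded query string and the bits-server annotations, then percent-decoding and JSON-parsing the query string. A simplified sketch (the real EventLogging parser on vanadium is more involved; field handling here is illustrative only):

```python
import json
from urllib.parse import unquote

# One raw udp2log line, as shown above: percent-encoded record, then
# the annotations appended by the bits server.
raw = (
    "?%7B%22event%22%3A%7B%22wasClicked%22%3Afalse%7D%2C"
    "%22clientValidated%22%3Atrue%2C%22revision%22%3A5329872%2C"
    "%22schema%22%3A%22BannerImpression%22%2C"
    "%22webHost%22%3A%22127.0.0.1%22%2C%22wiki%22%3A%22enwiki%22%7D; "
    "niobium.wikimedia.org 12363 2013-03-18T19:32:47 216.38.130.161"
)

# Split off the server-side annotations, then decode the record itself.
query, host, seq_id, timestamp, client_ip = raw.split(" ")
capsule = json.loads(unquote(query.lstrip("?").rstrip(";")))

# Fold the annotations into the capsule (simplified: the real code also
# normalizes the timestamp, hashes the client IP, and assigns a uuid).
capsule.update({"recvFrom": host, "seqId": int(seq_id)})
print(capsule["schema"], capsule["event"]["wasClicked"])
```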

This decoded, validated record is published for any and all interested clients running on the cluster using ZeroMQ.

Currently subscribed to this stream of decoded, validated events are:

1) a client that writes all events into a MySQL database (db1047)
2) a client that writes all events into a MongoDB database
3) a mobile client that generates real-time metrics about mobile app usage
4) a Hadoop client that writes the data into Kraken / HDFS
5) .... your client here! ...
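A new consumer needs nothing more than a ZeroMQ SUB socket. A minimal sketch using pyzmq, with both ends in one process over an inproc endpoint for illustration (the endpoint name is an assumption; a real client would connect to the TCP endpoint the EventLogging publisher exposes on vanadium):

```python
import json
import zmq

ctx = zmq.Context()

# Stand-in publisher, playing the role of the EventLogging forwarder.
pub = ctx.socket(zmq.PUB)
pub.bind("inproc://eventlogging")   # hypothetical endpoint name

# A subscriber: connect and subscribe to everything (empty topic filter).
sub = ctx.socket(zmq.SUB)
sub.connect("inproc://eventlogging")
sub.setsockopt(zmq.SUBSCRIBE, b"")

# Publish one decoded, validated event as a JSON string.
pub.send_string(json.dumps({"schema": "BannerImpression",
                            "event": {"wasClicked": False}}))

# The subscriber receives and parses it.
record = json.loads(sub.recv_string())
print(record["schema"])
```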

END OF UNIFIED EVENT PROCESSING DATA PIPELINE -- THINGS GO IN DIFFERENT DIRECTIONS FROM HERE

Let's focus on what the 'json2sql' client, which writes into MySQL, does. When it gets an event, it checks whether a table for it already exists. If the table does not exist, it retrieves the schema from metawiki and uses it to construct a SQL statement instructing the database to create a table with appropriate columns for the data:

CREATE TABLE `BannerImpression_5329872` (
    id INTEGER NOT NULL AUTO_INCREMENT,
    uuid VARCHAR(255),
    `clientIp` VARCHAR(255),
    `clientValidated` BOOL,
    `isTruncated` BOOL,
    timestamp VARCHAR(14),
    `webHost` VARCHAR(255),
    wiki VARCHAR(255),
    `event_wasClicked` BOOL,
    PRIMARY KEY (id),
    CHECK (`clientValidated` IN (0, 1)),
    CHECK (`isTruncated` IN (0, 1)),
    CHECK (`event_wasClicked` IN (0, 1))
) ENGINE=InnoDB CHARSET=utf8
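The schema-to-DDL step can be sketched as a type mapping over the schema's properties. This is a hypothetical simplification (the real json2sql client builds its DDL via a proper SQL toolkit and handles more types and the full capsule; the TYPE_MAP and function name here are illustrative):

```python
# Map JSON Schema property types to MySQL column types (assumed mapping).
TYPE_MAP = {
    "boolean": "BOOL",
    "string": "VARCHAR(255)",
    "integer": "INTEGER",
    "number": "DOUBLE",
}

def create_table_sql(schema_name, revision, properties):
    """Build a CREATE TABLE statement for one schema revision.

    Event fields get an 'event_' column prefix, matching the
    `event_wasClicked` column in the example above.
    """
    cols = ["id INTEGER NOT NULL AUTO_INCREMENT"]
    for prop, spec in properties.items():
        cols.append("`event_%s` %s" % (prop, TYPE_MAP[spec["type"]]))
    cols.append("PRIMARY KEY (id)")
    return "CREATE TABLE `%s_%d` (%s) ENGINE=InnoDB CHARSET=utf8" % (
        schema_name, revision, ", ".join(cols))

ddl = create_table_sql("BannerImpression", 5329872,
                       {"wasClicked": {"type": "boolean"}})
print(ddl)
```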

With the table newly created (or if the table already existed), it issues a SQL statement instructing the database to insert the event as a new record in the table:

INSERT INTO `BannerImpression_5329872`
    (uuid, `clientIp`, `clientValidated`, timestamp, `webHost`, wiki, `event_wasClicked`)
VALUES
    ('fb378cdda3fe58799c334f9565365246', 'e6553bbd10a51a2c6270147ea8617a5080863ac6', 1, '20130318192909', '127.0.0.1', 'enwiki', 0)
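Note how the nested event object is flattened into prefixed columns (`wasClicked` becomes `event_wasClicked`) and booleans become 0/1 for MySQL. A hypothetical sketch of that flattening step (function name and boolean handling are assumptions, not the client's exact code):

```python
def flatten(capsule):
    """Flatten a decoded capsule into a single-level row dict.

    Capsule fields map to columns directly; fields of the nested
    'event' object get an 'event_' prefix; booleans become 0/1.
    """
    row = {}
    for key, value in capsule.items():
        if key == "event":
            for k, v in value.items():
                row["event_" + k] = int(v) if isinstance(v, bool) else v
        else:
            row[key] = int(value) if isinstance(value, bool) else value
    return row

row = flatten({"uuid": "fb378cdda3fe58799c334f9565365246",
               "clientValidated": True,
               "event": {"wasClicked": False}})
print(row)
```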