Extension:EventLogging/Todos

If you're interested in diving in, get in touch with Ori Livneh.

Schemas

 * ✅ Make sure all properties have helpful "description" fields.
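
A check like this could run before a Schema: page is saved. The sketch below walks a JSON Schema's `properties`, recursing into nested objects, and lists every property that lacks a non-empty "description". It is only an illustration: the `missing_descriptions` helper and the example schema are hypothetical.

```python
def missing_descriptions(schema, path=""):
    """Return dotted paths of properties that lack a non-empty "description"."""
    missing = []
    for name, spec in schema.get("properties", {}).items():
        full = f"{path}.{name}" if path else name
        if not str(spec.get("description", "")).strip():
            missing.append(full)
        # Recurse into nested object properties so their fields are checked too.
        if spec.get("type") == "object":
            missing.extend(missing_descriptions(spec, full))
    return missing


# Hypothetical example schema with two undocumented properties.
example = {
    "description": "Hypothetical example schema",
    "properties": {
        "action": {"type": "string", "description": "What the user did"},
        "pageId": {"type": "integer"},
        "context": {
            "type": "object",
            "description": "Extra context",
            "properties": {"skin": {"type": "string"}},
        },
    },
}

print(missing_descriptions(example))  # ['pageId', 'context.skin']
```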

Server-side schema handling

 * ✅ Write Python abstraction for grabbing schemas from metawiki.
 * ✅ Validate incoming events against declared schema.
 * ✅ Generate SQL schema from JSON Schema (WIP: see 'glass' project in Gerrit).
 * ✅ Automatically CREATE TABLE when a new schema is encountered. (But carefully consider the security and scalability implications.)
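
Validating an event and deriving a table from its schema can be illustrated with a small stdlib-only sketch. It assumes draft-03-style schemas in which each property carries its own `"required": true` flag; the type mapping, the `<SchemaName>_<revision>` table naming and the helper names are hypothetical choices, not the actual glass or EventLogging code. Table and column names are checked against a strict identifier pattern before being interpolated into SQL, which addresses one of the security concerns about creating tables automatically.

```python
import re

# Hypothetical mapping from JSON Schema primitive types to MySQL column types.
SQL_TYPES = {
    "string": "VARCHAR(255)",
    "integer": "BIGINT",
    "number": "DOUBLE",
    "boolean": "BOOLEAN",
}

# Python types accepted for each JSON Schema primitive type.
PY_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}

IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def validate_event(event, schema):
    """Return a list of error strings; an empty list means the event is valid."""
    errors = []
    props = schema.get("properties", {})
    for name, spec in props.items():
        if spec.get("required") and name not in event:
            errors.append(f"missing required property {name!r}")
    for name, value in event.items():
        if name not in props:
            # This sketch rejects properties the schema does not declare.
            errors.append(f"unexpected property {name!r}")
            continue
        expected = props[name].get("type")
        ok = isinstance(value, PY_TYPES.get(expected, object))
        # bool is a subclass of int in Python, so reject it for numeric types.
        if expected in ("integer", "number") and isinstance(value, bool):
            ok = False
        if not ok:
            errors.append(f"{name!r} should be {expected}")
    return errors


def create_table_sql(schema_name, revision, schema):
    """Build a MySQL CREATE TABLE statement for one schema revision."""
    table = f"{schema_name}_{revision}"
    if not IDENT.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    columns = []
    for name, spec in sorted(schema.get("properties", {}).items()):
        if not IDENT.match(name):
            raise ValueError(f"unsafe column name: {name!r}")
        sql_type = SQL_TYPES.get(spec.get("type"))
        if sql_type is None:
            raise ValueError(f"no SQL mapping for {name!r} (type {spec.get('type')!r})")
        not_null = " NOT NULL" if spec.get("required") else ""
        columns.append(f"`{name}` {sql_type}{not_null}")
    return f"CREATE TABLE IF NOT EXISTS `{table}` (\n  " + ",\n  ".join(columns) + "\n);"


schema = {
    "properties": {
        "action": {"type": "string", "required": True},
        "pageId": {"type": "integer"},
    }
}
print(validate_event({"pageId": "5"}, schema))
print(create_table_sql("Example", 123, schema))
```

Nested `object` properties have no SQL mapping here and raise an error; a real generator would need to flatten them into prefixed columns or store them some other way.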

Monitoring

 * ✅ Watch for truncated events (tell-tale sign: missing trailing ';' in query string).
 * ✅ Keep sequence ID counters (one per host) and watch for gaps, which indicate packet loss.
 * Keep tabs on the rate of incoming invalid events and emit alerts as appropriate.
 * Emit alerts as bona fide, subscribable events.
 * Write gmond plugin to send stats to Ganglia.
 * Add a new $wgDebugLogGroups entry that writes to vanadium, and use it to log EventLogging alerts from the Apaches.
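
The truncation and packet-loss checks can both run over the raw log in a single pass. The sketch below assumes a hypothetical line format of `<host> <sequence id> <query string>`; the real raw log format may differ, and `check_raw_log` is an illustrative name. A query string without the trailing ';' is flagged as truncated, and a forward jump in a host's sequence ID means the events in between were lost. Backward jumps (for example, a counter reset after a restart) are not reported.

```python
from collections import defaultdict


def check_raw_log(lines):
    """Scan raw log lines of the hypothetical form "<host> <seq> <query>".

    Returns (truncated, gaps):
      truncated: 1-based line numbers whose query string lacks the trailing ';'
      gaps: {host: [(first_missing_seq, seq_seen), ...]} for forward jumps
    """
    last_seq = {}
    truncated = []
    gaps = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        host, seq_text, query = line.rstrip("\n").split(" ", 2)
        seq = int(seq_text)
        if not query.endswith(";"):
            truncated.append(lineno)
        prev = last_seq.get(host)
        # Only forward jumps count as loss; a smaller seq is treated as a reset.
        if prev is not None and seq > prev + 1:
            gaps[host].append((prev + 1, seq))
        last_seq[host] = seq
    return truncated, dict(gaps)


log = [
    "hostA 1 %7B%22a%22%3A1%7D;",
    "hostA 2 %7B%22a%22%3A1",      # cut off: no trailing ';'
    "hostA 5 %7B%22a%22%3A1%7D;",  # seq 3 and 4 never arrived
    "hostB 7 %7B%22a%22%3A1%7D;",
]
print(check_raw_log(log))  # ([2], {'hostA': [(3, 5)]})
```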

Storage / archiving

 * ✅ Set up automatic archiving and log rotation of raw event log data dump.
 * Figure out a sane MySQL permissions scheme.
 * Make sure Hadoop is getting all events, not just those from esams.
 * ✅ Make sure MySQL insert failures are handled gracefully.
 * Failover & replication plans.
 * If required, write up specs for an additional machine.
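
One way to keep MySQL insert failures from being fatal is to try the whole batch first and, if that fails, fall back to inserting rows one at a time, setting aside only the rows that fail. The sketch uses Python's built-in `sqlite3` purely as a runnable stand-in for MySQL; with a MySQL driver the exception classes and placeholder style would differ. The `insert_events` helper and the example table are hypothetical.

```python
import sqlite3


def insert_events(conn, sql, rows):
    """Insert rows with `sql`; return (row, error) pairs that could not be stored.

    The batch is tried first inside one transaction. If it fails, the
    transaction is rolled back and each row is retried on its own, so one
    bad event does not cost the whole batch.
    """
    try:
        with conn:  # commits on success, rolls back on exception
            conn.executemany(sql, rows)
        return []
    except sqlite3.Error:
        pass
    rejected = []
    for row in rows:
        try:
            with conn:
                conn.execute(sql, row)
        except sqlite3.Error as exc:
            # Keep the failed row and the reason for later retry or inspection.
            rejected.append((row, str(exc)))
    return rejected


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, action TEXT NOT NULL)")
rows = [(1, "click"), (2, None), (3, "view")]  # row 2 violates NOT NULL
print(insert_events(conn, "INSERT INTO events VALUES (?, ?)", rows))
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])  # 2
```

The rejected rows could then be written to a separate file or queue so they can be replayed once the cause is fixed.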

Client-side

 * ✅ As and when the Mobile team begins to use EventLogging features, deploy the extension to wikis beyond enwiki.
 * Migrate remaining ClickTracking clients (see Trello card for list).
 * Reliably generate the anonymous user cookie and token (currently done by E3Experiment's openTask.js, using a generateId function copy-pasted from mediawiki.user.js).
   * Always supply this as _token, like _rv and _id?
 * Provide default implementations for common fields.
 * If we continue with a userbuckets cookie to determine client-side behavior, adopt the code from ClickTracking's ext.UserBuckets.js (and mediawiki.user.js) and fix its bugs.
 * Handle excessively long query strings. This matters because Varnish only logs the first 255 characters of the query string.
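
How the 255-character limit interacts with the trailing ';' can be shown with a short sketch. It assumes the event is serialized as compact JSON, percent-encoded, and terminated with ';'; that wire format and the helper names are assumptions for illustration. An encoded event longer than the limit loses its ';' when Varnish cuts it, so it would look like a truncated event in the logs.

```python
import json
from urllib.parse import quote

# Varnish logs only the first 255 characters of the query string.
VARNISH_QUERY_LIMIT = 255


def encode_event(event):
    """Serialize an event as percent-encoded compact JSON plus a trailing ';'.

    This wire format is an assumption for illustration.
    """
    payload = json.dumps(event, separators=(",", ":"), sort_keys=True)
    return quote(payload, safe="") + ";"


def fits_in_log(query):
    """True if Varnish would log the whole query string, trailing ';' included."""
    return len(query) <= VARNISH_QUERY_LIMIT


short = encode_event({"action": "click"})
long_ = encode_event({"text": "x" * 300})
print(short, fits_in_log(short))  # %7B%22action%22%3A%22click%22%7D; True
print(fits_in_log(long_), long_[:VARNISH_QUERY_LIMIT].endswith(";"))  # False False
```

A client could run a check like `fits_in_log` before sending and drop or shorten oversized events, rather than letting them be cut silently.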

PHP-side

 * Assuming we continue to log events on the server (currently account_create events), reimplement an appropriate subset of client-side logging in PHP.

Misc

 * ✅ Puppetize.
 * More unit tests.
 * Documentation.
 * ✅ DevServer.php should validate schema (WIP, staged in Ori's repo)
 * Improve dev tooling on Metawiki. Write a small JavaScript module for Schema: pages that:
   * generates the $wgResourceModules declaration, so one can simply copy and paste the schema module setup code;
   * provides a textarea for pasting a JSON object and checking whether it validates against the schema.
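
The declaration generator is essentially string templating from a schema title and revision ID. The sketch below is in Python for illustration (the proposed tool itself would be JavaScript) and prints a snippet that could be pasted into LocalSettings.php or an extension's setup file. MediaWiki's module registry variable is `$wgResourceModules`; the `schema.<Name>` module key and the `ResourceLoaderSchemaModule` class with `schema`/`revision` keys are assumptions about EventLogging's conventions and should be checked against the extension's own documentation.

```python
import re


def resource_module_declaration(schema, revision):
    """Return PHP code registering a schema module for the given revision.

    The 'schema.<Name>' key and the ResourceLoaderSchemaModule class are
    assumptions about EventLogging's conventions.
    """
    if not re.fullmatch(r"[A-Za-z0-9_]+", schema):
        raise ValueError(f"unexpected schema name: {schema!r}")
    if not isinstance(revision, int) or revision <= 0:
        raise ValueError("revision must be a positive integer")
    return (
        f"$wgResourceModules['schema.{schema}'] = array(\n"
        f"\t'class' => 'ResourceLoaderSchemaModule',\n"
        f"\t'schema' => '{schema}',\n"
        f"\t'revision' => {revision},\n"
        f");"
    )


# Hypothetical schema name and revision ID.
print(resource_module_declaration("ExampleSchema", 12345))
```

Pinning the revision ID in the declaration means a later edit to the Schema: page does not silently change what the client validates against.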


 * ✅ Test varnish patch referenced in RT 4094. Let Mark know how it goes.
 * Deploy CodeEditor to Meta (see Gerrit change 36343).
 * Override the JSON validation error messages on Meta with a nicer template.
 * Read the JSON Schema spec in full and do a "conceptual lint": figure out what we're doing wrong or not utilizing.