Extension:EventLogging/Todos

From mediawiki.org

If you're interested in diving in, get in touch with Ori Livneh.

Schemas

  • Done: Make sure all properties have helpful "description" fields.

Server-side schema handling

  • Done: Write Python abstraction for grabbing schemas from metawiki.
  • Done: Validate incoming events against declared schema.
  • Done: Generate SQL schema from JSON Schema (WIP: see 'glass' project in Gerrit).
  • Done: Automatically CREATE TABLE when a new schema is encountered. (But carefully consider security and scalability implications.)
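
The last two items can be illustrated with a minimal sketch, assuming a flat JSON Schema whose properties use only primitive types and mark required fields with a per-property "required" flag. The type mapping, the create_table_sql helper, and the identifier check are hypothetical illustrations, not the code in the 'glass' project.

```python
import re

# Hypothetical mapping from JSON Schema primitive types to MySQL column types.
SQL_TYPES = {
    'boolean': 'BOOLEAN',
    'integer': 'INTEGER',
    'number': 'DOUBLE',
    'string': 'VARCHAR(255)',
}

# Table and column names cannot be passed as bound parameters, so only
# plain ASCII identifiers are accepted (an assumed, conservative policy).
SAFE_IDENTIFIER = re.compile(r'[A-Za-z0-9_]+')


def create_table_sql(name, schema):
    """Build a CREATE TABLE statement for a flat JSON Schema (assumed shape)."""
    if not SAFE_IDENTIFIER.fullmatch(name):
        raise ValueError('unsafe table name: %r' % name)
    columns = []
    for prop, spec in sorted(schema['properties'].items()):
        if not SAFE_IDENTIFIER.fullmatch(prop):
            raise ValueError('unsafe column name: %r' % prop)
        column = '`%s` %s' % (prop, SQL_TYPES[spec['type']])
        if spec.get('required'):
            column += ' NOT NULL'
        columns.append(column)
    return 'CREATE TABLE IF NOT EXISTS `%s` (%s)' % (name, ', '.join(columns))
```

Rejecting unusual identifiers up front is one way to address the security concern noted above, since schema names and property names come from wiki pages and end up inside SQL text.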

Monitoring

  • Done: Watch for truncated events (tell-tale sign: missing trailing ';' in query string).
  • Done: Keep sequence ID counters (one per host) and watch for gaps, which indicate packet loss.
  • Keep tabs on rate of incoming invalid events and emit alerts as appropriate.
  • Emit alerts as bona fide, subscribable events.
  • Write gmond plugin to send stats to Ganglia.
  • Create new $wgDebugLogGroup that writes to vanadium; use it to log EventLogging alerts from Apaches.
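
The first two monitoring items can be sketched as follows. This is a minimal illustration, assuming each event arrives with its originating host and an integer sequence ID; the is_truncated function and SequenceMonitor class are hypothetical names, not part of EventLogging.

```python
def is_truncated(query_string):
    """Return True if the query string lacks the trailing ';' sentinel."""
    return not query_string.endswith(';')


class SequenceMonitor:
    """Track the last sequence ID seen per host and count skipped IDs."""

    def __init__(self):
        self.last_seen = {}  # host -> most recent sequence ID
        self.lost = {}       # host -> total number of skipped IDs

    def observe(self, host, seq_id):
        """Record seq_id for host and return how many IDs were skipped."""
        previous = self.last_seen.get(host)
        self.last_seen[host] = seq_id
        if previous is None or seq_id <= previous:
            # First event from this host, or a counter reset or
            # out-of-order ID: nothing can be concluded about loss.
            return 0
        gap = seq_id - previous - 1
        self.lost[host] = self.lost.get(host, 0) + gap
        return gap
```

A caller would feed every received event through observe() and raise an alert when the per-host totals in lost grow faster than some chosen threshold.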

Storage / archiving

  • Done: Set up automatic archiving and log rotation of raw event log data dump.
  • Figure out a sane MySQL permissions scheme.
  • Make sure Hadoop is getting all events, not just esams.
  • Done: Make sure MySQL insert failures are handled gracefully.
  • Failover & replication plans.
  • If required: write up specs for an additional machine.

Client-side

  • Migrate remaining ClickTracking clients (see Trello card for list).
  • Reliably generate the anonymous user cookie and token (currently done by E3Experiment's openTask.js, with a generateId() function copy-pasted from mediawiki.user.js).
      • Always supply this as _token, like _rv and _id?
  • Provide default implementations for common fields.
  • If we continue with a userbuckets cookie to determine client-side behavior, then take over code from ClickTracking's ext.UserBuckets.js (and mediawiki.user.js) and fix bugs.
  • Handle excessively long query strings. This matters because varnish logs only the first 255 characters of the query string.
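
The query-string limit in the last item can be checked before an event is sent. The sketch below is written in Python for illustration only (the real client is JavaScript) and assumes events are sent as URL-encoded JSON followed by the trailing ';' mentioned under Monitoring; encode_event and fits_in_log are hypothetical helpers.

```python
import json
from urllib.parse import quote

MAX_QUERY_LENGTH = 255  # number of query-string characters varnish logs


def encode_event(event):
    """URL-encode an event as compact JSON plus ';' (assumed wire format)."""
    return quote(json.dumps(event, separators=(',', ':'))) + ';'


def fits_in_log(event):
    """Return True if the encoded event would be logged without truncation."""
    return len(encode_event(event)) <= MAX_QUERY_LENGTH
```

Note that percent-encoding triples the length of every escaped character, so an event can exceed the limit well before its raw JSON reaches 255 characters.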

Done: As and when the Mobile team begins to use EventLogging features, deploy the extension to wikis beyond enwiki.

PHP-side

  • Assuming we continue to log events on the server (currently account_create events), reimplement an appropriate subset of client-side logging in PHP.

Misc

  • Done: Puppetize.
  • More unit tests.
  • Documentation.
  • Done: DevServer.php should validate schema (WIP, staged in Ori's repo).
  • Improve dev tooling on Metawiki. Write a small JavaScript module for Schema: pages that:
      • generates the $wgResourceLoaderModules declaration, so one can simply copy/paste schema module setup code.
      • provides a textarea for pasting a JSON object and checking if it validates against the schema.
  • Done: Test varnish patch referenced in RT 4094. Let Mark know how it goes.
  • Deploy CodeEditor to Meta (see Gerrit change 36343).
  • Override JSON validation error messages (see includes/JsonSchema.i18n.php) on Meta with nicer template.
  • Read the JSON Schema spec in full and do a "conceptual lint": figure out what we're doing wrong or not utilizing.