Extension:EventLogging/Data model

This page describes the data model for event tracking. We're using Redis.

Tagging using Bitmaps
Bitmaps should make tagging fast and space-efficient: with one bit per user ID, a single tag covering all of enwiki's ~17 million users takes about 17,000,000 / 8 ≈ 2.1 MB.
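The bitmap idea can be illustrated with a small Python sketch. It does not talk to a real Redis server; it emulates the Redis bitmap commands SETBIT, GETBIT, BITCOUNT and BITOP AND in memory, assuming one bitmap per tag with the user ID used as the bit offset. The class name BitmapStore and the tag names are hypothetical.

```python
# Sketch: emulate Redis bitmap commands in memory to illustrate
# one-bit-per-user tagging. Not a Redis client; names are hypothetical.


class BitmapStore:
    def __init__(self):
        # Each tag key maps to a growable byte string, like a Redis string value.
        self._data = {}

    def setbit(self, key, offset, value):
        """Like SETBIT: set bit `offset` of `key`, return the previous bit."""
        buf = self._data.setdefault(key, bytearray())
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(buf):
            # Grow with zero bytes so the offset fits, as Redis does.
            buf.extend(bytes(byte_index + 1 - len(buf)))
        mask = 0x80 >> bit_index  # Redis counts bit 0 as the most significant bit.
        old = 1 if buf[byte_index] & mask else 0
        if value:
            buf[byte_index] |= mask
        else:
            buf[byte_index] &= ~mask & 0xFF
        return old

    def getbit(self, key, offset):
        """Like GETBIT: bits beyond the end of the string read as 0."""
        buf = self._data.get(key, bytearray())
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(buf):
            return 0
        return 1 if buf[byte_index] & (0x80 >> bit_index) else 0

    def bitcount(self, key):
        """Like BITCOUNT: number of set bits, i.e. users carrying the tag."""
        return bin(int.from_bytes(self._data.get(key, b""), "big")).count("1")

    def bitop_and(self, dest, *keys):
        """Like BITOP AND: intersect tags; shorter strings are zero-padded."""
        if not keys:
            raise ValueError("BITOP needs at least one source key")
        bufs = [self._data.get(k, bytearray()) for k in keys]
        length = max(len(b) for b in bufs)
        acc = int.from_bytes(bytes(bufs[0]).ljust(length, b"\x00"), "big")
        for b in bufs[1:]:
            acc &= int.from_bytes(bytes(b).ljust(length, b"\x00"), "big")
        self._data[dest] = bytearray(acc.to_bytes(length, "big"))
        return length

    def memory_bytes(self, key):
        """Size of the underlying byte string for `key`."""
        return len(self._data.get(key, b""))


if __name__ == "__main__":
    store = BitmapStore()
    for user_id in (1, 5, 16_999_999):
        store.setbit("tag:editor", user_id, 1)
    store.setbit("tag:mobile", 5, 1)
    store.setbit("tag:mobile", 7, 1)
    print(store.bitcount("tag:editor"))      # 3
    print(store.memory_bytes("tag:editor"))  # 2125000
    store.bitop_and("tag:editor+mobile", "tag:editor", "tag:mobile")
    print(store.getbit("tag:editor+mobile", 5), store.bitcount("tag:editor+mobile"))  # 1 1
```

Setting the bit for user ID 16,999,999 grows the bitmap to 2,125,000 bytes, roughly 2.1 MB for ~17 million possible user IDs, no matter how few users actually carry the tag. Intersecting two tags with BITOP AND answers "which users carry both tags" without touching any per-user record.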

S's notes
The server vanadium will run Redis, an in-memory key-value store. Redis has features beyond those of memcached:
 * hashes: a key can hold a nested set of field-value pairs, e.g. key 'ori' can have user_id => xy and lastview => abc
 * has sorted sets
 * members of a sorted set are ordered by a numeric score, e.g. a timestamp or a user ID
 * when an event comes in, it first goes into the sorted set for its event_id
 * this is useful for revision tagging, i.e. annotating an edit with additional information (rather than adding columns to the page table)
 * the Redis server can override an event's timestamp if it is too far out of sync
 * has its own pub/sub, so a connector can subscribe to certain kinds of incoming events and write them back into MySQL; alternatively, the data can be batch-imported.
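The flow in the list above can be tied together in a short Python sketch. It does not use a Redis client; it emulates hashes, sorted sets and pub/sub with plain Python structures so that each step is visible: an incoming event is stored as a hash, added to a sorted set for its event_id scored by timestamp, and published to subscribers such as a MySQL connector. The key naming scheme (event:<key>, events:<event_id>), the 300-second skew threshold and the class EventStore are assumptions made for illustration.

```python
import bisect
import time
from collections import defaultdict

# Hypothetical threshold: how far a client timestamp may drift from the
# server clock before it is replaced with the server's own time.
MAX_CLOCK_SKEW_SECONDS = 300


class EventStore:
    """In-memory stand-in for the Redis structures above (not a Redis client)."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.hashes = defaultdict(dict)       # key -> {field: value}, like HSET/HGETALL
        self.zsets = defaultdict(list)        # key -> sorted [(score, member)], like ZADD
        self.subscribers = defaultdict(list)  # channel -> callbacks, like SUBSCRIBE

    def hset(self, key, mapping):
        self.hashes[key].update(mapping)

    def zadd(self, key, score, member):
        # A member appears at most once; re-adding it updates its score.
        zset = self.zsets[key]
        zset[:] = [(s, m) for s, m in zset if m != member]
        bisect.insort(zset, (score, member))

    def zrangebyscore(self, key, min_score, max_score):
        return [m for s, m in self.zsets[key] if min_score <= s <= max_score]

    def subscribe(self, channel, callback):
        self.subscribers[channel].append(callback)

    def publish(self, channel, message):
        # Returns the number of receivers, as Redis PUBLISH does.
        for callback in self.subscribers[channel]:
            callback(message)
        return len(self.subscribers[channel])

    def ingest(self, event_id, event_key, fields, client_ts):
        """Store one event: a hash for its fields, a sorted-set entry, a notification."""
        now = self.clock()
        ts = client_ts if abs(client_ts - now) <= MAX_CLOCK_SKEW_SECONDS else now
        self.hset(f"event:{event_key}", {**fields, "timestamp": ts})
        self.zadd(f"events:{event_id}", ts, event_key)
        self.publish(f"events:{event_id}", event_key)
        return ts


if __name__ == "__main__":
    store = EventStore(clock=lambda: 1_000_000.0)
    mysql_rows = []  # stand-in for a connector that writes to MySQL
    store.subscribe("events:edit", mysql_rows.append)
    store.ingest("edit", "e1", {"user_id": "42"}, client_ts=999_990.0)  # kept
    store.ingest("edit", "e2", {"user_id": "7"}, client_ts=123.0)       # overridden
    print(store.zrangebyscore("events:edit", 999_000, 1_000_001))  # ['e1', 'e2']
    print(store.hashes["event:e2"]["timestamp"])                    # 1000000.0
    print(mysql_rows)                                                # ['e1', 'e2']
```

With a real Redis server these operations correspond to HSET, ZADD, ZRANGEBYSCORE, SUBSCRIBE and PUBLISH. In this sketch the timestamp override happens in the ingesting code before anything is written.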

So Redis on vanadium stores all the key-value data from events. For analysis, we could import data sets from vanadium into a Redis instance on another system, or into a conventional SQL database.
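For the conventional SQL database option, a batch import could look roughly like the following Python sketch. It uses the standard-library sqlite3 module as a stand-in for MySQL and takes event hashes already read out of Redis (for example with HGETALL) as a plain dict. The table name event and its key/field/value layout are hypothetical.

```python
import sqlite3


def batch_import(conn, event_hashes):
    """Copy {redis_key: {field: value}} into a hypothetical `event` table.

    Returns the number of rows written.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS event ("
        " event_key TEXT NOT NULL,"
        " field TEXT NOT NULL,"
        " value TEXT,"
        " PRIMARY KEY (event_key, field))"
    )
    rows = [
        (key, field, str(value))
        for key, fields in event_hashes.items()
        for field, value in fields.items()
    ]
    # INSERT OR REPLACE is SQLite syntax; MySQL would use REPLACE or
    # INSERT ... ON DUPLICATE KEY UPDATE.
    with conn:  # one transaction for the whole batch; commits on success
        conn.executemany(
            "INSERT OR REPLACE INTO event (event_key, field, value) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    written = batch_import(conn, {
        "event:e1": {"user_id": "42", "timestamp": 999990.0},
        "event:e2": {"user_id": "7", "timestamp": 1000000.0},
    })
    print(written)  # 4
    print(conn.execute(
        "SELECT value FROM event WHERE event_key = ? AND field = ?",
        ("event:e2", "user_id"),
    ).fetchone())  # ('7',)
```

Storing one row per hash field mirrors the Redis layout and needs no schema per event type; a real import would more likely map each event type to typed columns so that analysis queries stay simple.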

The Redis component might be replaced by the Analytics team's Kraken project.

Hadoop might be more efficient for much larger datasets, but Redis, which keeps its data in memory, is very fast.

Questions

 * What queries will be easy to write/run?
 * What queries will be hard to write/run?
 * Capacity
 * Performance
 * Chance of collisions
 * Persistence
 * Recovery

Reference
Useful articles
 * en:Entity–relationship model
 * en:Relational model
 * en:Data model
 * Ori's Redis link dump