|This page is an archive. Do not edit the contents of this page. Please direct any additional comments to the current talk page.|
|This page is obsolete. It is kept for historical interest only. It may document extensions or features that are obsolete and/or no longer supported. Do not rely on the information here being up-to-date.|
EventLogging is an extension to MediaWiki. There is a useful guide here: Extension:EventLogging/Guide.
Draft features/stories as of 2014-05-21. This is an attempt to start articulating the work that needs to be done from a product development perspective.
|Monitoring system fires alert when event volume is high||5 points; tasked on Etherpad|
|Product manager specifies sampling rate for his EL schema||https://bugzilla.wikimedia.org/show_bug.cgi?id=65500|
|Product manager specifies schema ownership||We need to know who owns a schema so we can send alerts to them if the event volume exceeds what the database can handle.|
|Automated process handles old data||This is a large task that needs to be better defined and then broken down. Some features related to this are:
|User has old data for ServerSideAccountCreation||Scrub or aggregate it so that it remains available beyond 90 days. It is used by others.|
|User has old data for NavigationTiming||Scrub or aggregate it so that it remains available beyond 90 days. It is used by others.|
|Product manager extends persistence of events||Suppose we're two months into a data collection job and the researcher realizes he needs the data for 180 days. Provide a mechanism to extend the persistence of a set of events. At the very least, provide a mechanism to aggregate or anonymize the data so the researcher can keep it for a longer period.|
|User suppresses EventLogging for his actions||Define a mechanism for user to opt-out of the EventLogging process.|
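The per-schema sampling rate story above could be illustrated roughly as follows. This is only a sketch, not EventLogging's actual mechanism: the function name `in_sample`, the use of a session token, and the hash-based bucketing are all assumptions made for illustration.

```python
import hashlib


def in_sample(token: str, schema: str, rate: float) -> bool:
    """Decide whether events for this token are logged under the schema.

    Hypothetical helper: `token` stands in for any stable per-session
    identifier, and `rate` is the fraction of tokens to keep (0.0 to 1.0).
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    # Hash schema and token together so each schema samples independently.
    digest = hashlib.sha256(f"{schema}:{token}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    # Keep the token if its bucket falls below the requested fraction.
    return bucket < rate * 2**64
```

Hashing rather than drawing a random number keeps the decision stable for a given token, so a session is either fully in or fully out of the sample for a schema.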
EventLogging is a widely used library in the Foundation. The Analytics team and Ori have discussed the details of the Analytics team taking over responsibility for this Extension. This document is that proposal.
- Formalize agreement with Ori, Ops
- Talk to RobLa/Platform
- Figure out what ask to make of Ori in terms of regular commitment
- Discuss this document
- Send out support email
- Target handover start: 4/24/16 (needs agreement from Analytics, Ops, Platform teams)
Probably the most common EventLogging support task is schema review. We'd like to make this a revolving responsibility among the users.
- Create EventLogging review group in Gerrit
- Ask people for consent before adding them
- Announce / request social convention of adding people to the review group once they've successfully instrumented something
We'd also like users to take responsibility for their own data generated by EventLogging. The Analytics team isn't staffed to follow up on invalid data from a single schema but we will invest in automated tools and notifications.
- Announce that generating invalid data is a software bug and that schema owners are expected to fix it promptly.
- Invite people to subscribe to eventlogging-alert
- Provide information about notification and debugging tools
- Bugs reported in Bugzilla should be acknowledged and resolved.
- Create a Graphite script that shows valid and invalid events for each schema, thereby satisfying the requirement that EventLogging be, in principle, self-service
- Add alert for number of events
- A daily report should go out reporting the number of valid and invalid events logged, broken down by schema.
- Operational support by analytics team: Event_logging/OperationalSupport
- Data recovery plan - Ori thinks we shouldn't hack the EventCapsule or validation model again
- Dario thinks we will have some high-priority data recovery needs related to DB outages
- Create and respond to alerts
- Once a month, the backup process (vanadium -> stat1001 -> tridge) should get a quick lookover to ensure that it is functioning.
- Once every six months, a drill should be conducted to test system failover and recovery procedures.
- Sean Pringle is supporting db replication
- Failover for Vanadium
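The daily report and event-count alert items in the list above could be sketched roughly like this. It is an illustration under stated assumptions, not an existing EventLogging tool: the function name `daily_report`, the `(schema, is_valid)` input shape, and the single `alert_threshold` are all hypothetical.

```python
from collections import Counter


def daily_report(events, alert_threshold):
    """Summarize one day of events per schema.

    `events` is an iterable of (schema_name, is_valid) pairs; this input
    shape is an assumption made for the sketch. Returns the report lines
    and the list of schemas whose total volume exceeds `alert_threshold`.
    """
    valid = Counter()
    invalid = Counter()
    for schema, is_valid in events:
        if is_valid:
            valid[schema] += 1
        else:
            invalid[schema] += 1

    lines = []
    alerts = []
    for schema in sorted(set(valid) | set(invalid)):
        # Counter returns 0 for schemas missing from one of the tallies.
        lines.append(f"{schema}: valid={valid[schema]} invalid={invalid[schema]}")
        if valid[schema] + invalid[schema] > alert_threshold:
            alerts.append(schema)
    return lines, alerts
```

In practice the threshold would probably differ per schema, and the alert list would be routed to each schema's owner.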