Extension:EventLogging/Data representations

From mediawiki.org

This page gives an overview of the various representations of EventLogging data available on the WMF production cluster, and of the expectations around those representations.


MySQL / MariaDB database on m2

This database is the best place to consume EventLogging data from.

It is available as the log database on m2 replicas, such as analytics-store.eqiad.wmnet.

Only validated events enter the database.

In case of bugs, this database is the only place that receives fixes, such as cleanup of historic data or live fixes.


'all-events' JSON log files

Use this data source only to debug issues around ingestion into the m2 database.

Entries are JSON objects.

Only validated events get written.

In case of bugs, historic data does not get fixed.

These files are available as:

  • stats1002:/a/eventlogging/archive/all-events.log-$DATE.gz
  • stats1003:/srv/eventlogging/archive/all-events.log-$DATE.gz
  • vanadium:/var/log/eventlogging/...
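
As a rough illustration of reading one of these archives, the following sketch iterates over a gzipped all-events file. It assumes that each line holds exactly one JSON object and that every event carries a top-level "schema" field; neither is stated above, so treat both as assumptions. The function names iter_events and count_by_schema are made up for this example.

```python
import gzip
import json
from collections import Counter
from typing import Iterator


def iter_events(path: str) -> Iterator[dict]:
    """Yield one decoded event per non-empty line of a gzipped log file.

    Assumption: the file contains one JSON object per line.
    """
    with gzip.open(path, mode="rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            yield json.loads(line)


def count_by_schema(path: str) -> Counter:
    """Count events per schema name.

    Assumption: events expose the schema name under the key "schema".
    """
    return Counter(event.get("schema", "<unknown>") for event in iter_events(path))
```

Since this data source is meant for debugging ingestion only, a count like this is mainly useful for comparing against what arrived in the m2 database for the same day.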


Raw client and server side log files

Use this data source only to debug issues around ingestion into the m2 database.

Entries are the raw parameters of the event.gif request. They are not decoded at all.

In case of bugs, historic data does not get fixed. Hot-fixes need not reach these files either.

These files are available as:

  • stats1002:/a/eventlogging/archive/client-side-events.log-$DATE.gz
  • stats1002:/a/eventlogging/archive/server-side-events.log-$DATE.gz
  • stats1003:/srv/eventlogging/archive/client-side-events.log-$DATE.gz
  • stats1003:/srv/eventlogging/archive/server-side-events.log-$DATE.gz
  • vanadium:/var/log/eventlogging/...
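
Because the raw entries are not decoded, a consumer has to decode them itself. The sketch below shows one way this might look, assuming the request parameters are a percent-encoded JSON object, optionally preceded by "?" and followed by ";". That layout is an assumption and is not specified on this page, and decode_query_string is a hypothetical helper. How the parameters are located within a full log line is deliberately left out, since the line format is not described here.

```python
import json
from urllib.parse import unquote


def decode_query_string(raw_query: str) -> dict:
    """Decode the raw event.gif request parameters into a Python dict.

    Assumptions: the payload is percent-encoded JSON, possibly wrapped
    in a leading '?' and a trailing ';'.
    """
    payload = raw_query.lstrip("?").rstrip(";")
    return json.loads(unquote(payload))
```

Note that this only decodes the payload. Unlike the m2 database and the all-events files, it performs no validation.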


Kafka

EventLogging data has not been separately fed into Kafka since 2014-06-12.

The EventLogging data in Kafka had no users.

Turning it on again is tracked in task T68528.


MongoDB

EventLogging data has not been fed into MongoDB since 2014-02-13.

The EventLogging data in MongoDB did not appear to get used.


ZMQ

ZMQ is available from vanadium.

ZMQ only carries the live stream, so in case of bugs, historic data cannot get fixed :-)

Data coming from the forwarders (ports 8421, 8422) is not validated and need not see hot-fixes.

Data coming from the processors (ports 8521, 8522) and the multiplexer (port 8600) is validated.
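
The port layout above can be captured in a small lookup table, which makes it harder to subscribe to an unvalidated stream by accident. The following is a minimal sketch, not the EventLogging implementation. It assumes the streams are ZMQ PUB sockets reachable over TCP and that messages are UTF-8 text; the host name is used exactly as written above, and PORTS, endpoint, is_validated and stream are hypothetical names. The stream function needs the third-party pyzmq package.

```python
from typing import Iterator

# Port -> (role, validated?), taken from the description above.
PORTS = {
    8421: ("forwarder", False),
    8422: ("forwarder", False),
    8521: ("processor", True),
    8522: ("processor", True),
    8600: ("multiplexer", True),
}


def endpoint(port: int, host: str = "vanadium") -> str:
    """Build a ZMQ endpoint URL for one of the known ports."""
    if port not in PORTS:
        raise ValueError(f"unknown EventLogging ZMQ port: {port}")
    return f"tcp://{host}:{port}"


def is_validated(port: int) -> bool:
    """Return True if data on this port has passed validation."""
    return PORTS[port][1]


def stream(port: int, host: str = "vanadium") -> Iterator[str]:
    """Yield messages from a stream (assumes a PUB socket and text messages)."""
    import zmq  # pyzmq; imported lazily so the helpers above work without it

    context = zmq.Context.instance()
    socket = context.socket(zmq.SUB)
    socket.connect(endpoint(port, host))
    socket.setsockopt_string(zmq.SUBSCRIBE, "")  # subscribe to everything
    try:
        while True:
            yield socket.recv_string()
    finally:
        socket.close()
```

For example, is_validated(8600) is True for the multiplexer, while is_validated(8421) is False for a forwarder.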


Nginx pipeline

Since EventLogging data typically arrives over HTTPS, and the EventLogging payload is encoded in the URL, EventLogging data is available in all the log targets of the SSL terminators.

In case of bugs, historic data does not get fixed. Hot-fixes need not reach this pipeline either.


Varnish pipeline

Since EventLogging data is extracted at the bits caches, and the EventLogging payload is encoded in the URL, EventLogging data is available in all the log targets of the bits caches.

In case of bugs, historic data does not get fixed. Hot-fixes need not reach this pipeline either.