Requests for comment/Structured logging

This is a request for comment about adding Structured logging to MediaWiki. For the purposes of this discussion "logging" refers to  and it's related family of functions and not audit logs stored in the database by Special:Log or similar systems. Ie it is appealing to wiki administrator and developers, not end users.

Problem
Logging in MediaWiki (and WMF in general?) is optimized for human consumption. This works well for local development and testing and can scale to management of a small to mid size wiki depending on the number of eyes applied to the logs. The operational support personnel can dig in it using common processing tools such as grep, cut, sed, awk or custom scripts, but it is not ideal.

Unfortunately what works well for a single developer does not work as well for analyzing the log output of a large production site. Wikimedia cluster generates millions of lines per day.

What we would need:
 * tools for wider audience
 * aggregation
 * de-duplication
 * cross system correlation
 * alerting
 * reporting

This is not a wholly new idea. Let's look at what's out there and see if we can find a solution or at least borrow the best bits.

Current logging

 * wfDebug( $text, $logonly = false )
 * Logs developer provided free-form text + optional global prefix string
 * Possibly has time-elapsed since start of request and real memory usage inserted between prefix and message
 * Delegates to


 * wfDebugMem( $exact = false )
 * Uses  to log "Memory usage: N (kilo)?bytes"


 * wfDebugLog( $logGroup, $text, $public = true)
 * Logs either to a custom log sink defined in  or via
 * Default
 * Prepends "[$logGroup] " to message
 * Custom sink
 * May log only a fraction of occurrences via  sampling
 * Prepends  to message
 * Delegates to  to actually write to sink


 * wfLogDBError( $text )
 * Enabled/disabled with  sink location
 * Logs "$date\t$host\t$wiki\$text" via  to    sink
 * Date format is  with possible custom timezone specified by


 * wfErrorLog( $text, $file )
 * Writes  to either a local file or a UDP packet depending on the value of
 * UDP
 * If  ends with a string following the host name/IP it will be used as a prefix to
 * The final message with optional prefix added will be trimmed to 65507 bytes and a trailing newline may be added
 * FILE
 * will be appended to file unless the resulting file size would be >= 0x7fffffff bytes (~2G)


 * wfLogProfilingData
 * Delegates to  using   sink
 * Creates a tab delimited log message including timestamp, elapsed request time, requesting IPs, and request URL followed by a newline and the profiler output.
 * Date is from

Serialization Format
Rather than dive down a rabbit hole of trying to find a universal spec for log file formats let's just keep things simple. PHP loves dictionaries (well they call them arrays but whatever; key=value collections) and has a pretty fast json formatter. So the simplest thing that will work reasonably well would be to keep log events internally as PHP arrays and serialize them as json objects. This will be relatively easy to recreate on other internally developed applications as well with the possible exception of apps written in low level languages such as C that don't have ready made key=value data structures.

Data collected
Here's a list of the data points that we should definitely have:
 * timestamp
 * Local system time that event occurred either as UNIX epoch timestamp or ISO 8601 formatted string


 * host
 * FQDN of system where event occurred


 * source
 * Name of application generating events; correlates to APP-NAME of RFC 5424


 * pid
 * Unix process id, thread id, thread name or other process identifier


 * severity
 * Message severity (RFC 5424 levels)


 * channel
 * Log channel. Often the function/module/class creating message (similar to  groups)


 * message
 * Message body

Additionally we suggest adding a semi-structured "context" component to logs. This would be a collection of key=value pairs that the developers determine to be useful for debugging. There should be two different methods available to add such data. The first is as an optional argument to the logging method itself and the second is a global collection patterned after the Log4J Mapped Diagnostic Context (MDC).

The local collection is useful for obvious reasons such as attaching class/method state data to the log output and deferring stringification of resources in the event that runtime configuration is ignoring messages of the provided level.


 * file
 * Source file triggering message


 * line
 * Source line triggering message


 * errcode
 * Numeric or string identifier for the error


 * exception
 * Live exception object to be stringified by the log event emitter


 * args
 * key=value map of method arguments

The global collection is very useful for attaching global application state data to all log messages that may be emitted. Examples of data that could be included:
 * vhost
 * Apache vhost processing request


 * ip
 * Requesting ip address


 * user
 * Authenticated user identity


 * req
 * Request ID; UUID or similar token that can be used to correlate all log messages connected to a given request

API
FIXME: Propose a PHP API for logging and a migration plan for existing wfErrorLog calls


 * The PSR-3 logging interface standard would be a reasonable approach if we decide to move away from global functions as the API. It may be reasonable to adopt this standard interface and alter the existing global functions to proxy to the new interface.
 * One possible implementation would be to use an existing PHP logging framework such as monolog. This could be exposed directly to the MediaWiki codebase or hidden behind a custom facade.