Parsoid/Adding instrumentation how-to


Parsoid's performance statistics, including time elapsed before response, size of inputs/outputs, number of requests, etc., are currently collected using Graphite, Node-txstatsd, StatsD and Grafana.

This guide will serve as a brief walkthrough of adding performance instrumentation to Parsoid.

Notes on the Libraries used

Graphite

Graphite is a real-time graphing system written in Python. It is made up of three components: a Twisted daemon that listens for numeric time-series data (carbon), a database library for storing that time-series data (whisper), and a Django app for rendering graphs (Graphite-webapp).

carbon, Graphite's backend storage daemon, listens for time-series data over several protocols including UDP and writes these metrics to disk using whisper. Once metrics have been received by carbon they can be rendered into graphs from within the Graphite web-app.
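To make the shape of that time-series data concrete, here is a minimal sketch of one data point formatted for carbon's plaintext protocol, which is one of the protocols carbon accepts. The helper name `formatCarbonLine` and the metric name and value are illustrative assumptions, not part of Parsoid or Graphite.

```javascript
// Format one data point for carbon's plaintext protocol:
//   "<metric path> <value> <unix timestamp in seconds>\n"
// formatCarbonLine is a hypothetical helper written for this example.
function formatCarbonLine(metricPath, value, dateMs) {
    // carbon expects whole seconds, while Date.now() yields milliseconds.
    const timestamp = Math.floor(dateMs / 1000);
    return metricPath + ' ' + value + ' ' + timestamp + '\n';
}

// Hypothetical metric name and value, for illustration only.
const line = formatCarbonLine('parsoid.html2wt.total.mean', 42.5, 1400000000000);
console.log(JSON.stringify(line));
// prints "parsoid.html2wt.total.mean 42.5 1400000000\n"
```

In the setup described in this guide, Parsoid does not write these lines itself; StatsD forwards the aggregated values to carbon.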

StatsD and Node-txstatsd

StatsD is a simple Node.js network daemon developed by Etsy to aggregate and relay application metrics to virtually any monitoring system. It listens for metrics sent over UDP, aggregates metrics received within a certain interval and flushes them to the specified backend monitoring system.

StatsD allows for capturing metrics in several forms including Gauges (to measure the value of a particular thing at a particular time), Counters (to track how many times an event occurred per second, averaged over one minute), Timers (to track how long an event took to complete), and Increments/Decrements.

Note that Timers can also be used to aggregate measurements that are not timing-related, such as input and output sizes.
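The metric types above differ only in a short type suffix on the wire. The following sketch builds StatsD protocol lines of the form `<bucket>:<value>|<type>`; the helper `formatStatsdLine` and the sample values are assumptions made for illustration, and a client library would normally build these lines for us.

```javascript
// StatsD line format: "<bucket>:<value>|<type>"
// c = counter, ms = timer, g = gauge
const STATSD_TYPES = { counter: 'c', timer: 'ms', gauge: 'g' };

// formatStatsdLine is a hypothetical helper written for this example.
function formatStatsdLine(bucket, value, kind) {
    if (!Object.prototype.hasOwnProperty.call(STATSD_TYPES, kind)) {
        throw new Error('Unknown metric kind: ' + kind);
    }
    return bucket + ':' + value + '|' + STATSD_TYPES[kind];
}

console.log(formatStatsdLine('parsoid.html2wt.total', 320, 'timer'));
// prints parsoid.html2wt.total:320|ms

// A Timer carrying a size rather than a duration (sample value).
console.log(formatStatsdLine('parsoid.html2wt.size.input', 51200, 'timer'));
// prints parsoid.html2wt.size.input:51200|ms
```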

Node-txstatsd is a WMF port of Node-statsd, a Node.js client for Etsy's StatsD server. It collects metrics from within Node.js applications and sends them to the StatsD server, which in turn forwards the metrics to Graphite's carbon over UDP. The Node-txstatsd implementation replaces the StatsD Increment/Decrement method with a Count method.

Grafana

Grafana is a feature-rich open source front-end for visualizing time-series data. Grafana can be configured to use Graphite as its metric storage backend and Elasticsearch for storing its dashboards.

Although metrics from Graphite can be rendered on various front-ends, including gdash and graphite.wikimedia.org, Grafana's richer graphing and dashboarding options won out as Parsoid's choice of front-end.

Wikimedia's Grafana installation is located at grafana.wikimedia.org. In addition to reading the docs, we can play around with Grafana features, see example graphs and new updates, here.

Once our metrics have been captured by Graphite, we can view and render the metrics into graphs/dashboards by visiting either the Wikimedia Grafana install site mentioned above or betalabs, searching under the parsoid.* namespace and creating the relevant graphs.

Before adding Instrumentation

Timers

Node-txstatsd does not include a Timer function to obtain timing information. Parsoid currently uses JavaScript's Date.now() method to obtain timing data. Date.now() returns the number of milliseconds elapsed since 1 January 1970 00:00:00 UTC.
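As a minimal sketch of this approach, elapsed time is the difference between two Date.now() readings taken before and after the work being measured. The helper `timeIt` and the workload are hypothetical stand-ins, not Parsoid code.

```javascript
// Run fn once and report how long it took, in milliseconds.
// timeIt is a hypothetical helper written for this example.
function timeIt(fn) {
    const start = Date.now();
    const result = fn();
    const elapsedMs = Date.now() - start;
    return { result: result, elapsedMs: elapsedMs };
}

// Hypothetical workload standing in for a Parsoid operation.
const measured = timeIt(function () {
    let total = 0;
    for (let i = 0; i < 1e6; i++) {
        total += i;
    }
    return total;
});

console.log('result=' + measured.result + ' elapsed=' + measured.elapsedMs + 'ms');
```

The elapsed value is a whole number of milliseconds, so very short operations may report 0.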

Namespacing

In addition to deciding on parts of the codebase to instrument, it is important to establish a proper namespace and namespace hierarchy for each metric.

Metrics in Graphite are defined by dot-separated camelCase namespaces, for example html2wt.selser.domDiff. These namespaces serve as buckets where our metrics are collected and aggregated. In addition to each metric's unique namespacing, all Parsoid metrics sent to StatsD/Graphite are prepended with the parsoid. prefix.
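The naming scheme can be illustrated with a short sketch that joins namespace segments with dots and shows the full name as it would appear in Graphite. The helper `metricName` and its validation rule (letters and digits only in each segment) are assumptions made for this example; since the prefix is applied to every Parsoid metric, instrumentation code passes only the unprefixed name.

```javascript
const PARSOID_PREFIX = 'parsoid.';

// Build the full Graphite name for a metric from its namespace segments.
// metricName and its validation rule are hypothetical, for illustration.
function metricName(...segments) {
    segments.forEach(function (segment) {
        // A dot inside a segment would silently create an extra namespace level.
        if (!/^[A-Za-z0-9]+$/.test(segment)) {
            throw new Error('Invalid namespace segment: ' + segment);
        }
    });
    return PARSOID_PREFIX + segments.join('.');
}

console.log(metricName('html2wt', 'selser', 'domDiff'));
// prints parsoid.html2wt.selser.domDiff
```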

For further information, this article provides a comprehensive overview of metric namespacing.

Settings and Configs

Parsoid includes a Node-txstatsd wrapper which can be found here. Our performance instrumentation code will make calls to Util.StatsD's timing and count methods.

StatsD settings (instrumentation toggle, hostname and port) are located here; the values for beta labs and production are set here and here respectively. The Node-txstatsd instance is then instantiated here.

Graphite/carbon does not require us to declare namespaces or special configurations prior to sending each new metric. Instead, on receiving a metric, carbon checks to see if it has configurations for that namespace and if none, creates a new configuration. All metrics that carbon receives share the same data retention and flush interval schemas, which can be found here.

Once a metric is received, Graphite aggregates statistics for it, including the 99th and 99.9th percentiles (99%tile and 999%tile), count, max, mean, min, rate and standard deviation.
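To give a feel for what some of these aggregated statistics mean, the sketch below summarizes a handful of sample timer values. The function `summarize` is hypothetical, the values are made up, and the nearest-rank percentile used here is only one common definition; the production aggregation may compute percentiles differently.

```javascript
// Summarize a batch of timer values received within one flush interval.
// summarize is a hypothetical helper; it is not StatsD or Graphite code.
function summarize(values, percent) {
    if (values.length === 0) {
        throw new Error('No values to summarize');
    }
    const sorted = values.slice().sort(function (a, b) { return a - b; });
    const count = sorted.length;
    const sum = sorted.reduce(function (acc, v) { return acc + v; }, 0);
    // Nearest-rank percentile: the smallest value with at least
    // `percent` percent of the samples at or below it.
    const rank = Math.max(1, Math.ceil((percent / 100) * count));
    return {
        count: count,
        min: sorted[0],
        max: sorted[count - 1],
        mean: sum / count,
        percentile: sorted[rank - 1]
    };
}

// Made-up html2wt timings in milliseconds.
const stats = summarize([120, 80, 100, 300, 90], 99);
console.log(stats.count, stats.min, stats.max, stats.mean, stats.percentile);
// prints 5 80 300 138 300
```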

Adding Instrumentation

Determine the relevant code-path

For example, if we want Parsoid to report on the time it took to serialize HTML to wikitext (html2wt), we would add instrumentation to the html2wt function located in the api/routes.js file.

Instrumentation might not be as straightforward as adding code to a single file, so it might take some grepping around to find all the relevant places to instrument. If you are less familiar with the Parsoid codebase, chatting with the Parsoid team at #mediawiki-parsoid can provide guidance as to the relevant parts of the codebase to instrument.

Add Instrumentation

Along with the time taken to serialize HTML to wikitext (html2wt), we are interested in grabbing the input and output sizes associated with each html2wt request.

Here's what our code might look like:

var html2wt = function( req, res, html ) {
    var env = res.local( 'env' );
    // Access the Node-txstatsd instance via the parser environment variable 'env'.
    var timer = env.conf.parsoid.performanceTimer,
        startTimers;

    if ( timer ) {
        // To avoid race conditions, keep the start time in a Map created for
        // this request; the elapsed time is sent to Node-txstatsd further down.
        startTimers = new Map();
        startTimers.set( 'html2wt.total', Date.now() );
    }

    //.....

    html = html.replace( /\r/g, '' );

    if ( timer ) {
        // Sizes are reported through timing() too, since Timers can
        // aggregate measurements that are not timing-related.
        timer.timing( 'html2wt.size.input', '', html.length );
    }

    //.....

    if ( timer ) {
        // 'output' is produced by the elided serialization code above.
        timer.timing( 'html2wt.total', '', Date.now() - startTimers.get( 'html2wt.total' ) );
        timer.timing( 'html2wt.size.output', '', output.length );
    }

    //.....
    // end of html2wt function.
};
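Before the change reaches beta labs, it can help to check locally that the instrumented code path emits the metrics we expect. The sketch below is one way to do that under stated assumptions: `createRecordingTimer` is a hypothetical stand-in for the object at env.conf.parsoid.performanceTimer that only mirrors the three-argument timing() call shape used in the example, and `instrumentedCleanup` is a simplified, made-up function rather than the real html2wt.

```javascript
// Hypothetical stand-in for the timer object: records calls instead of
// sending them to StatsD.
function createRecordingTimer() {
    const calls = [];
    return {
        calls: calls,
        timing: function (name, suffix, value) {
            calls.push({ name: name, suffix: suffix, value: value });
        }
    };
}

// Simplified stand-in for an instrumented function. As in the example,
// a missing timer disables instrumentation entirely.
function instrumentedCleanup(html, timer) {
    const start = timer ? Date.now() : 0;
    const cleaned = html.replace(/\r/g, '');
    if (timer) {
        timer.timing('html2wt.size.input', '', cleaned.length);
        timer.timing('html2wt.total', '', Date.now() - start);
    }
    return cleaned;
}

const recordingTimer = createRecordingTimer();
instrumentedCleanup('<p>a</p>\r\n', recordingTimer);
console.log(recordingTimer.calls.map(function (c) { return c.name; }).join(', '));
// prints html2wt.size.input, html2wt.total

// With instrumentation switched off, the function still works.
console.log(JSON.stringify(instrumentedCleanup('a\rb', null)));
// prints "ab"
```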

Once our code is merged, metrics can then be viewed in betalabs, and afterwards, on grafana.wikimedia.org.

That's it!

See also