Parsoid/Adding instrumentation how-to

Performance information about Parsoid, such as the time elapsed before a response, the size of input and output, and the number of requests, is currently collected with txstatsd (a WMF port of node-statsd, itself a node.js client for Etsy's StatsD server) and Graphite, and visualized with Grafana.

This guide briefly walks through the process of adding performance instrumentation to Parsoid.

Graphite
Graphite is a real-time graphing system written in Python. It is made up of three components: a Twisted daemon that listens for numeric time-series data (carbon), a database library for storing that time-series data (whisper), and a Django app for rendering graphs (graphite-webapp).

carbon, Graphite's backend storage daemon, listens for time-series data over several protocols, including UDP, and writes it to disk using whisper. Once carbon has received metrics, they can be visualized with the Graphite webapp or with other front-end visualization services.
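To make "time-series data" concrete, the sketch below formats a datapoint in carbon's plaintext protocol, which is one line of the form `<metric path> <value> <unix timestamp>` terminated by a newline. It is written in Python purely to show the wire format. It assumes a carbon instance on the hypothetical host `localhost` at the default plaintext port 2003 over TCP; carbon can also accept the same lines over UDP when its UDP listener is enabled in `carbon.conf`. The metric path used below is hypothetical.

```python
import socket
import time
from typing import Optional


def format_plaintext_line(path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one datapoint as '<path> <value> <timestamp>\\n' (carbon plaintext protocol)."""
    if timestamp is None:
        timestamp = int(time.time())  # Unix epoch seconds
    if " " in path:
        # Spaces separate the three fields, so they cannot appear in the path.
        raise ValueError("metric path must not contain spaces")
    return f"{path} {value} {timestamp}\n"


def send_to_carbon(line: str, host: str = "localhost", port: int = 2003) -> None:
    """Send plaintext lines to carbon over TCP (hypothetical host; 2003 is the usual default)."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

For example, `format_plaintext_line("parsoid.example.latency", 12.5, 1400000000)` yields `"parsoid.example.latency 12.5 1400000000\n"`. In the setup described here Parsoid does not talk to carbon directly; StatsD does that on its behalf.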

StatsD and node-txstatsd
StatsD is a simple node.js network daemon developed by Etsy to aggregate and relay application metrics to virtually any monitoring system. It listens for metrics sent over UDP, aggregates the metrics received within a certain interval, and then flushes them to the configured backend monitoring system.
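The "aggregate, then flush" cycle can be pictured with a toy model of counter handling. This is a simplified Python sketch, not StatsD's actual code: it sums counter updates received during one flush interval (StatsD's default interval is 10 seconds), scales sampled updates back up by their sample rate, and on flush reports both the total count and a per-second rate before resetting. The class and metric names are hypothetical.

```python
from collections import defaultdict
from typing import Dict


class CounterAggregator:
    """Toy model of StatsD-style counter aggregation over one flush interval."""

    def __init__(self, flush_interval_s: float = 10.0) -> None:
        self.flush_interval_s = flush_interval_s
        self._counts: Dict[str, float] = defaultdict(float)

    def receive(self, name: str, value: float = 1.0, sample_rate: float = 1.0) -> None:
        # A counter sent with sample rate 0.5 stands for twice as many events.
        self._counts[name] += value / sample_rate

    def flush(self) -> Dict[str, Dict[str, float]]:
        """Return the count and per-second rate of every counter, then reset."""
        out = {
            name: {"count": total, "rate": total / self.flush_interval_s}
            for name, total in self._counts.items()
        }
        self._counts.clear()
        return out
```

Three plain increments plus one increment sampled at 0.5 give a count of 5 for the interval, which over 10 seconds is a rate of 0.5 per second.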

StatsD can capture metrics in several forms, including Gauges (to measure the value of a particular thing at a particular time), Counters (to track how many times an event occurred per second, averaged over one minute), Timers (to track how long an event took to complete), and Increment/Decrement.
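On the wire, each of these metric types is a short text datagram of the form `<name>:<value>|<type>`. The type suffix is `c` for counters, `ms` for timers, and `g` for gauges, and an optional `|@<rate>` records a sample rate. In upstream StatsD an increment or decrement is simply a counter update of +1 or -1. The Python helper below is a hypothetical illustration of that format, not part of node-txstatsd, and the metric names in the examples are made up.

```python
from typing import Optional

# StatsD type suffixes: c = counter, ms = timer, g = gauge.
_TYPE_SUFFIX = {"counter": "c", "timer": "ms", "gauge": "g"}


def format_statsd(name: str, value: float, kind: str,
                  sample_rate: Optional[float] = None) -> str:
    """Build a StatsD datagram body such as 'name:1|c' or 'name:250|ms|@0.1'."""
    if kind not in _TYPE_SUFFIX:
        raise ValueError(f"unsupported metric kind: {kind!r}")
    line = f"{name}:{value}|{_TYPE_SUFFIX[kind]}"
    if sample_rate is not None and sample_rate < 1:
        line += f"|@{sample_rate}"  # tells StatsD how to scale the value back up
    return line
```

For example, `format_statsd("parsoid.requests", 1, "counter")` produces `parsoid.requests:1|c`, and `format_statsd("parsoid.requests", -1, "counter")` is the equivalent of a decrement.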

Note that Timers are not restricted to timing measurements: they can also aggregate other numeric statistics, such as output size.
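Timers work for sizes because StatsD treats every timer value as a plain number and computes summary statistics over all values received in a flush interval. The Python function below is a simplified sketch of a subset of those statistics (count, sum, lower, upper, mean). Real StatsD also reports percentiles and other fields. The function name and sample sizes are hypothetical.

```python
from typing import Dict, List


def summarize_timer(values: List[float]) -> Dict[str, float]:
    """Compute a subset of the per-flush statistics StatsD reports for a timer."""
    if not values:
        return {"count": 0}
    total = sum(values)
    return {
        "count": len(values),
        "sum": total,
        "lower": min(values),
        "upper": max(values),
        "mean": total / len(values),
    }
```

Feeding it three output sizes in bytes, `[1200, 800, 4000]`, gives a count of 3, a sum of 6000, a lower of 800, an upper of 4000, and a mean of 2000.0. The same numbers would be meaningful as milliseconds or as bytes.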

node-txstatsd is a WMF port of node-statsd, a node.js client for Etsy's StatsD server. It collects metrics from within node.js applications and sends them to the StatsD server, which in turn forwards the metrics to Graphite's carbon over UDP. This implementation removes the increment/decrement and (when operating in txstatsd mode) sets metric types.
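Under the hood, a StatsD client such as node-txstatsd sends each metric as a UDP datagram. The Python sketch below shows only that transport step. It does not reproduce node-txstatsd's API, and the host `localhost` is a hypothetical default, while 8125 is StatsD's conventional port. UDP is typically chosen because sending is fire-and-forget: if the StatsD daemon is down, the application does not block or fail, and it only loses the metric.

```python
import socket


def send_statsd(payload: str, host: str = "localhost", port: int = 8125) -> None:
    """Send one StatsD datagram over UDP without waiting for any reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("utf-8"), (host, port))
```

For example, `send_statsd("parsoid.requests:1|c")` would record one request with a StatsD daemon listening locally.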

Grafana
Grafana is a feature-rich, open source front end for visualizing time-series data. Grafana can be configured to use Graphite as a metric storage backend and Elasticsearch for storing its dashboards. Although metrics from Graphite can be rendered by various front ends (including gdash and graphite.wikimedia.org), Grafana offers richer graphing and dashboarding options than the others.

Wikimedia's Grafana installation is located at grafana.wikimedia.org. In addition to reading the Grafana documentation, you can experiment with its features and browse example graphs and recent updates. Once Graphite has captured your metrics, you can view and render them either on the Wikimedia Grafana installation mentioned above or, if you included Beta Labs in the config settings, on Beta Labs.

Before adding instrumentation
In addition to deciding which parts of the codebase to instrument, it is important to establish a proper namespace for each metric. Metrics are identified by dot-separated namespaces, for example Application.Server.Host. These namespaces serve as buckets in which your metrics are collected and aggregated. This article provides a comprehensive overview of metric namespacing.
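Because dots separate namespace levels, a dot (or other separator character) inside a single component, such as a hostname, would silently create extra levels. The hypothetical Python helper below builds a metric name from components and replaces unsafe characters inside each one. The component names in the examples are illustrative only and are not Parsoid's actual metric names.

```python
import re
from typing import Iterable

# Keep only letters, digits, '_' and '-' inside a component; anything else
# (including '.', ' ', ':' and '|', which act as separators) becomes '_'.
_UNSAFE = re.compile(r"[^A-Za-z0-9_-]")


def metric_name(parts: Iterable[str]) -> str:
    """Join namespace components with dots after sanitizing each component."""
    cleaned = [_UNSAFE.sub("_", str(p)) for p in parts]
    if not cleaned or any(c == "" for c in cleaned):
        raise ValueError("namespace components must be non-empty")
    return ".".join(cleaned)
```

For example, `metric_name(["Application", "Server", "Host"])` gives `Application.Server.Host`, while a hypothetical host component `worker.example.org` becomes the single level `worker_example_org`.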