User:JDrewniak (WMF)/notes/Using Turnilo to debug WebClientError events

Accessing the WebRequest table in Turnilo
Turnilo gives us GUI access to the Webrequest logs, which cover all traffic that hits our servers, sampled at a rate of 1/128 and kept for at most 90 days.
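Because of the 1/128 sampling, a raw row count is only a fraction of the real traffic. A minimal sketch of the scaling arithmetic, assuming the number you are looking at is a raw count of sampled rows that has not already been scaled up (the function name is made up for illustration):

```python
SAMPLING_RATE = 128  # one request in every 128 is kept in the sampled data

def estimate_total(sampled_count, sampling_rate=SAMPLING_RATE):
    """Estimate the real number of requests from a count of sampled rows."""
    return sampled_count * sampling_rate

# 10 sampled rows stand in for roughly 1280 real requests.
estimated = estimate_total(10)
```

Small sampled counts are noisy, so treat estimates built from a handful of rows as rough.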

To start working with the webRequest data in Turnilo, access it from the sidebar.

Filter Webrequests to only WebClientError traffic
WebClientError traffic goes to a URL like this:
https://en.m.wikipedia.org/beacon/statsv?MediaWiki.minerva.WebClientError.loggedin=1c
Or, for anon users:
https://en.m.wikipedia.org/beacon/statsv?MediaWiki.minerva.WebClientError.anon=1c
The value of the query parameter (1c in these examples) is a counter with the number of errors.
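To see which parts of such a URL map onto the Uri Host, Uri Path and Uri Query fields, here is a small sketch that takes one of the example URLs apart. It assumes that a trailing "c" on the value marks a counter, so that "1c" means a count of 1; the function name is hypothetical.

```python
from urllib.parse import urlsplit, parse_qsl

def parse_statsv_beacon(url):
    """Split a statsv beacon URL into (host, path, metric, count)."""
    parts = urlsplit(url)
    # Take the first metric=value pair from the query string.
    metric, raw_value = parse_qsl(parts.query)[0]
    # Assumption: a trailing "c" marks a counter, so "1c" means "count of 1".
    if not raw_value.endswith("c"):
        raise ValueError(f"not a counter value: {raw_value!r}")
    return parts.hostname, parts.path, metric, int(raw_value[:-1])

example = "https://en.m.wikipedia.org/beacon/statsv?MediaWiki.minerva.WebClientError.anon=1c"
host, path, metric, count = parse_statsv_beacon(example)
```

Here host is the wiki's domain, path is /beacon/statsv, and metric is the name we will filter on in the query.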

All traffic from all wikis goes to the statsv endpoint, so let’s start by filtering by Uri Path, using the /beacon/statsv path from the URLs above.

There are many analytics schemas that go to this path, so let's pick out ours. Add another filter, this time a Uri Query (regex) matching the query parameter of our URL.

Now we can add a split on Time (Day) and filter by the last 90 days (our max length) to see the fancy graph of WebClientErrors!
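The path filter and the query regex can be tried out offline before typing them into Turnilo. This is only a sketch: the pattern below is a hypothetical one that matches both example URLs, not necessarily the regex used for the graph, and it has not been checked against Turnilo's own regex engine.

```python
import re

# Hypothetical pattern: matches both the logged-in and the anonymous counter.
WEB_CLIENT_ERROR = re.compile(
    r"MediaWiki\.minerva\.WebClientError\.(loggedin|anon)=\d+c"
)

def is_web_client_error(uri_path, uri_query):
    """Mimic the two filters: exact Uri Path plus a regex on Uri Query."""
    return (
        uri_path == "/beacon/statsv"
        and WEB_CLIENT_ERROR.search(uri_query) is not None
    )
```

The dots are escaped so that they only match literal dots in the metric name, and search() is used so that a leading "?" in the query does not matter.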

Link to the graph above

Warning: at this point, you might start hitting some 502 or 503 errors. Try a few times and it might work. Filtering all this data is no small feat!

Filtering WebClientErrors by wiki
If you’re looking for the most recent errors, select a group 0 or group 1 wiki like Hebrew or Catalan (since those are deployed first) and filter by Uri Host. Most wikis will trail Catalan & Hebrew by exactly 1 day because of the deployment cycle.

Filtering WebClientErrors by page
The webRequest table contains a “referer” field. In this context, referer is the page the request was sent from, i.e. the current page (whereas in browserland referer typically means “the previous page”).

The referer field lets us see which page is producing the most errors. Expect the error count per page to correlate with the page's overall popularity.
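If you export referer values and want to group them by page outside Turnilo, something like the following could work. It is a sketch under assumptions: the referer values are made up for illustration, and it only recognises referers whose path starts with /wiki/.

```python
from collections import Counter
from urllib.parse import urlsplit, unquote

def page_from_referer(referer):
    """Return (host, page title) for a /wiki/ referer, or None otherwise."""
    parts = urlsplit(referer)
    prefix = "/wiki/"
    if not parts.path.startswith(prefix):
        return None
    # Decode percent-encoding so titles read naturally.
    return parts.hostname, unquote(parts.path[len(prefix):])

# Made-up referer values, for illustration only.
referers = [
    "https://en.m.wikipedia.org/wiki/Main_Page",
    "https://en.m.wikipedia.org/wiki/Main_Page",
    "https://en.m.wikipedia.org/wiki/Caf%C3%A9",
]
pages = (page_from_referer(r) for r in referers)
errors_by_page = Counter(p for p in pages if p is not None)
```

Calling errors_by_page.most_common() then lists the pages that sent the most error beacons first.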

Filtering WebClientErrors by user-agent
Just like adding a split by referer, we can add a split by User Agent as well.

This probably also correlates with the popularity of each user agent.