Analytics/Reports/Clients without JavaScript

Goal
The goal of this project is to get a rough estimate of what percentage of our page requests come from browsers with partial or no JavaScript support. The methodology used will provide a rough estimate, not a detailed one. The idea is to be able to tell whether the share of such clients is below 10%, 1% or 0.1%.

Methodology
Every request to any Wikimedia project is stored in Hadoop for about 30 days. Requests are segregated into "text" (requests to desktop websites), "mobile" (requests to apps and the mobile website) and "bits" (requests to our static domain, from which JavaScript and CSS are served). In addition, for every request to "text" and "mobile" we store whether the request is a pageview or not, according to the new pageview definition.

Our method for the rough estimation goes as follows:

 * 1. For a time period T, get all requests to text and mobile.
 * 2. Calculate browser percentages for all those requests (let's call this 'set#1'). At the end of this step we know, for example, that "1.2% of our pageviews come from IE10".
 * 3. For the same time period T, get all requests for JavaScript files to bits.
 * 4. Calculate browser percentages on the JavaScript bits data (let's call this 'set#2').
 * 5. Compare set#2 with set#1; set#1 should be a superset of set#2. Browsers that are in set#1 but do not appear in set#2 represent the set of browsers for which JavaScript is not enabled.
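The comparison in the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the sample browser labels are hypothetical, and the input is assumed to be one already-parsed browser label per request rather than raw Hadoop data.

```python
from collections import Counter


def browser_percentages(browsers):
    """Return {browser: percent of requests} for an iterable of browser labels."""
    counts = Counter(browsers)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {browser: 100.0 * n / total for browser, n in counts.items()}


def share_without_javascript(pageview_browsers, js_request_browsers):
    """Compare set#1 (text + mobile) with set#2 (JavaScript requests to bits).

    Returns the browsers present in set#1 but absent from set#2, together
    with the summed percentage of pageviews they account for.
    """
    set1 = browser_percentages(pageview_browsers)
    set2 = browser_percentages(js_request_browsers)
    missing = {b: pct for b, pct in set1.items() if b not in set2}
    return missing, sum(missing.values())


# Hypothetical sample data: one browser label per request.
pageviews = ["Chrome 39"] * 6 + ["IE10"] * 3 + ["Lynx 2.8"] * 1
js_requests = ["Chrome 39"] * 4 + ["IE10"] * 2

missing, total_pct = share_without_javascript(pageviews, js_requests)
print(missing, total_pct)
```

Only the presence or absence of a browser in set#2 is used here, not its percentage, since the two sets are not expected to have matching proportions.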

Caveats
This methodology will not catch users who navigate with a modern browser (say Chrome 39) but with JavaScript turned off. To detect those, a very specific experiment is needed. We also need to choose a precision at which to report our data, since below a certain percentage browser numbers are imprecise.
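One possible way to apply a reporting precision is to fold every browser below a chosen floor into a single bucket. The sketch below assumes a floor of 0.1%, taken from the smallest threshold mentioned in the goal; the actual value has not been decided, and the function name and the "Other" label are hypothetical.

```python
def apply_precision_floor(percentages, floor=0.1):
    """Fold browsers whose share is below `floor` percent into an "Other" bucket.

    `floor` is an assumed value for illustration, not a figure chosen by the project.
    """
    reported = {}
    other = 0.0
    for browser, pct in percentages.items():
        if pct >= floor:
            reported[browser] = pct
        else:
            other += pct
    if other > 0:
        reported["Other"] = other
    return reported


# Hypothetical browser percentages.
sample = {"Chrome 39": 60.0, "IE10": 39.9, "Lynx 2.8": 0.06, "NetSurf 3": 0.04}
result = apply_precision_floor(sample, floor=0.1)
print(result)
```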

Since we count some bot requests as pageviews, our report will include bot requests and count them as clients without JavaScript.

We do not expect browser percentages on text and mobile to match those on bits, because requests for static files are subject to different cache ratios than requests for main content, which in the case of our projects is never cached on the client side.