Analytics/Reports/Clients without JavaScript

Goal
The goal of this project is to get a rough estimate of the percentage of our page requests that come from browsers with partial or no JavaScript support. The methodology gives an order-of-magnitude figure rather than a detailed one: the idea is to know whether the share of such clients is smaller than 10%, 1% or 0.1%.

Methodology
Every request to all Wikimedia projects is stored in Hadoop for about 30 days. Requests are segregated into "text" (requests to the desktop websites), "mobile" (requests to the apps and the mobile website) and "bits" (requests to our static domain, from which JavaScript and CSS are served). In addition, for every request to "text" and "mobile" we store whether the request is a pageview according to the new pageview definition.

Our method for the rough estimation goes as follows:

 * 1. For a time period T get all requests to text and mobile.
 * 2. Calculate browser percentages for all those requests (this would be 'set#1').

At the end of step 2 we know, for example, that "1.2% of our pageviews come from IE10". We do not take the device into account.

 * 3. For time period T get all requests of JavaScript files from bits.
 * 4. Calculate browser percentages on JavaScript bits data (this would be 'set#2').
 * 5. Compare set#2 with set#1; set#1 should be a superset of set#2. Browsers that are in set#1 but do not appear in set#2 represent the set of browsers for which JavaScript is not enabled.
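The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the job that was actually run. It assumes each request has already been reduced to a parsed user-agent dict with `browser_family` and `browser_major` keys (the field names that appear in the listings of this report). The function names and the toy data are hypothetical.

```python
from collections import Counter


def browser_percentages(requests):
    """Steps 2 and 4: map (family, major) -> percentage of the given requests."""
    counts = Counter((r["browser_family"], r["browser_major"]) for r in requests)
    total = sum(counts.values())
    return {browser: 100.0 * n / total for browser, n in counts.items()}


def browsers_without_javascript(pageview_requests, bits_js_requests):
    """Step 5: browsers present in set#1 (pageviews) but absent from set#2 (bits)."""
    set1 = browser_percentages(pageview_requests)
    set2 = browser_percentages(bits_js_requests)
    return {browser: pct for browser, pct in set1.items() if browser not in set2}


# Toy data (hypothetical): four pageview requests, two JavaScript requests on bits.
pageviews = [
    {"browser_family": "Chrome", "browser_major": "39"},
    {"browser_family": "Chrome", "browser_major": "39"},
    {"browser_family": "Chrome", "browser_major": "39"},
    {"browser_family": "Lynx", "browser_major": "2"},
]
bits_js = [
    {"browser_family": "Chrome", "browser_major": "39"},
    {"browser_family": "Chrome", "browser_major": "39"},
]

no_js = browsers_without_javascript(pageviews, bits_js)
print(no_js)
```

In this toy input Lynx 2 is the only browser that never requests JavaScript from bits, so it is the only entry returned, together with its share of pageviews.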

The time period we chose was the 2nd week of January.

Caveats
This methodology will not detect users who navigate with a modern browser (say Chrome 39) but with JavaScript turned off. Detecting those requires a dedicated experiment. Our report will include bot requests and count them as clients without JavaScript.

We do not expect browser percentages on text and mobile to match exactly those on bits, because requests for static files are subject to different cache ratios than requests for main content, which in the case of our projects is never cached on the client side. The ratios between browser percentages should match, however. For example: if 0.6% of our pageview requests come from Chrome 39 on Mac OS X 10.6 and 0.4% of pageviews come from Chrome 39 on Mac OS X 10.9, the ratio 0.6/0.4 should be about the same in the browser percentages on bits.
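The ratio check can be written down as a small sketch. The pageview figures are the ones from the example above. The bits figures, the dictionary keys and the 25% tolerance are hypothetical values chosen only to illustrate the comparison.

```python
def ratio(percentages, a, b):
    """Ratio between the percentages of two browser/OS combinations."""
    return percentages[a] / percentages[b]


def ratios_agree(set1, set2, a, b, tolerance=0.25):
    """True if the a/b ratio on bits is within `tolerance` (relative) of the pageview ratio."""
    r1 = ratio(set1, a, b)
    r2 = ratio(set2, a, b)
    return abs(r1 - r2) <= tolerance * r1


# Pageview percentages from the example in the text.
pageview_pct = {"Chrome 39 / OS X 10.6": 0.6, "Chrome 39 / OS X 10.9": 0.4}
# Hypothetical bits percentages: lower in absolute terms (caching), same ratio.
bits_pct = {"Chrome 39 / OS X 10.6": 0.3, "Chrome 39 / OS X 10.9": 0.2}

consistent = ratios_agree(
    pageview_pct, bits_pct, "Chrome 39 / OS X 10.6", "Chrome 39 / OS X 10.9"
)
print(consistent)
```

The absolute percentages differ between the two sets, but both ratios are close to 1.5, which is what the methodology relies on.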

Phones that do little or no caching of JavaScript are overrepresented in the bits data. The most extreme case is Windows Phone 7, which is responsible for a very large share of the requests to the static domains.

Results
Results use data from the 2nd week of January. Pageview data was sampled 1/1000, which amounts to 4 million requests; bits data was raw (200 million requests). The approximate total of pageview requests without JavaScript enabled is about 10%, but note that this includes bot requests. If we remove the main bots we see (Bingbot, YandexBot, Googlebot, TwitterBot and other clients self-labeled "Python Requests"), the percentage is much lower, about 3%.
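The bot adjustment amounts to dropping the bot user-agent families from the per-browser listing and summing what remains. Below is a minimal sketch, assuming rows in the `<percent> {...}` form used in the Details listing. The set of families treated as bots is an assumption made for illustration, and the sample uses only five rows from that listing, so its totals are not the 10% and 3% figures reported here.

```python
import re

# Assumption: families treated as bots. Bingbot, YandexBot, Googlebot, TwitterBot
# and "Python Requests" are named in this report; "Slurp" (Yahoo's crawler) is
# added here for illustration.
BOT_FAMILIES = {
    "bingbot", "yandexbot", "googlebot", "twitterbot", "slurp", "python requests",
}

# Matches e.g.: 0.0035 {"browser_major": "2", "browser_family": "Lynx"}
ROW = re.compile(r'^\s*(\d+(?:\.\d+)?)\s+\{.*"browser_family":\s*"([^"]*)"\s*\}\s*$')


def parse_row(line):
    """Return (percent, browser_family) for one row of the listing."""
    match = ROW.match(line)
    if match is None:
        raise ValueError(f"unrecognised row: {line!r}")
    return float(match.group(1)), match.group(2)


def total_percent(lines, exclude_bots=False):
    """Sum the percentages of all rows, optionally skipping bot families."""
    total = 0.0
    for line in lines:
        percent, family = parse_row(line)
        if exclude_bots and family.lower() in BOT_FAMILIES:
            continue
        total += percent
    return total


# Five rows taken from the Details listing.
sample_rows = [
    '6.0593 {"browser_major": "2", "browser_family": "bingbot"}',
    '0.882 {"browser_major": "-", "browser_family": "Slurp"}',
    '0.1387 {"browser_major": "4", "browser_family": "IE"}',
    '0.1377 {"browser_major": "5", "browser_family": "IE"}',
    '0.0035 {"browser_major": "2", "browser_family": "Lynx"}',
]

with_bots = total_percent(sample_rows)
without_bots = total_percent(sample_rows, exclude_bots=True)
print(f"with bots: {with_bots:.4f}, without bots: {without_bots:.4f}")
```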

Details
Percentage of total pageviews for browsers that do not request JavaScript files on bits.

Without OS info
With bots removed, the percentage of browsers without JavaScript enabled is still in the same ballpark (2.3%). Note that this list reports browsers responsible for at least 0.001% of pageviews at the OS level, which makes Opera Mini visible.

0.0012 {"browser_major": "15", "browser_family": "Chrome Frame"}
0.0012 {"browser_major": "21", "browser_family": "Opera Mini"}
0.0013 {"browser_major": "2", "browser_family": "Opera Mini"}
0.0013 {"browser_major": "22", "browser_family": "Opera Mini"}
0.0014 {"browser_major": "0", "browser_family": "Maxthon"}
0.0014 {"browser_major": "25", "browser_family": "Opera Mini"}
0.0014 {"browser_major": "7", "browser_family": "Opera"}
0.0014 {"browser_major": "8530", "browser_family": "BlackBerry"}
0.0017 {"browser_major": "5", "browser_family": "Baidu Browser"}
0.0017 {"browser_major": "9700", "browser_family": "BlackBerry"}
0.0019 {"browser_major": "0", "browser_family": "Kazehakase"}
0.0019 {"browser_major": "11", "browser_family": "Opera Mobile"}
0.0019 {"browser_major": "14", "browser_family": "Opera Mobile"}
0.0019 {"browser_major": "2", "browser_family": "iBrowser"}
0.002 {"browser_major": "4", "browser_family": "SEMC-Browser"}
0.002 {"browser_major": "537", "browser_family": "WebKit Nightly"}
0.0021 {"browser_major": "0", "browser_family": "Python Requests"}
0.0021 {"browser_major": "2010", "browser_family": "Outlook"}
0.0021 {"browser_major": "720", "browser_family": "CFNetwork"}
0.0023 {"browser_major": "1", "browser_family": "K-Meleon"}
0.0024 {"browser_major": "10", "browser_family": "Opera Mobile"}
0.0027 {"browser_major": "3", "browser_family": "Nokia OSS Browser"}
0.0029 {"browser_major": "24", "browser_family": "Thunderbird"}
0.0031 {"browser_major": "0", "browser_family": "K-Meleon"}
0.0031 {"browser_major": "548", "browser_family": "CFNetwork"}
0.0035 {"browser_major": "2", "browser_family": "Lynx"}
0.0035 {"browser_major": "9", "browser_family": "Opera Mini"}
0.0036 {"browser_major": "2007", "browser_family": "Outlook"}
0.0037 {"browser_major": "-", "browser_family": "CFNetwork"}
0.0046 {"browser_major": "31", "browser_family": "Thunderbird"}
0.0053 {"browser_major": "4", "browser_family": "Ovi Browser"}
0.0078 {"browser_major": "3", "browser_family": "NetFront"}
0.0084 {"browser_major": "18", "browser_family": "Chromium"}
0.0101 {"browser_major": "454", "browser_family": "CFNetwork"}
0.0164 {"browser_major": "609", "browser_family": "CFNetwork"}
0.0207 {"browser_major": "3", "browser_family": "Ovi Browser"}
0.0221 {"browser_major": "1", "browser_family": "TwitterBot"}
0.0339 {"browser_major": "7", "browser_family": "Nokia Browser"}
0.0368 {"browser_major": "2", "browser_family": "Ovi Browser"}
0.0551 {"browser_major": "6", "browser_family": "UP.Browser"}
0.1104 {"browser_major": "8", "browser_family": "Opera"}
0.124 {"browser_major": "2", "browser_family": "Python Requests"}
0.1377 {"browser_major": "5", "browser_family": "IE"}
0.1387 {"browser_major": "4", "browser_family": "IE"}
0.2476 {"browser_major": "711", "browser_family": "CFNetwork"}
0.5503 {"browser_major": "-", "browser_family": "YandexBot"}
0.882 {"browser_major": "-", "browser_family": "Slurp"}
6.0593 {"browser_major": "2", "browser_family": "bingbot"}

What about IE6?
We do see user agents on bits for IE6, namely this one: {"browser_major":"6","os_family":"Windows XP","os_major":"-","device_family":"Other","browser_family":"IE","os_minor":"-"}. Most likely this browser is not identified by our code as IE6 and is therefore being served JavaScript (this is a bug). This browser represents about 1% of total pageviews.

We need to do a bit more research here to see which JavaScript requests are being served to this user agent.