Requests for comment/Content-Security-Policy

The purpose of this RFC is to propose using the Content-Security-Policy HTTP header on Wikimedia wikis.

Background
Content-Security-Policy (CSP) (and its friend, Content-Security-Policy-Report-Only) is an HTTP header which allows a web server to tell a web browser to disable certain features which are commonly used when exploiting XSS vulnerabilities. It can instruct the browser to restrict the locations from which it loads JavaScript (and other types of resources: CSS, images, iframes, etc.). It can also be used to disable inline execution of JavaScript (e.g. via event handler attributes like onclick, via javascript: URLs, via inline <script> tags, etc.). For inline scripts in script tags, it also has the option of allowing them only if they have a nonce attribute that matches the nonce value supplied in the HTTP header.

With these restrictions in place, it can be very difficult for an attacker who has found an XSS vulnerability to actually exploit it. While the current architecture of MediaWiki limits the applicability of this protection somewhat (in particular, we allow users to create their own JS pages, and users expect to be able to load arbitrary wiki pages as JS files), I still believe that we can benefit significantly from using CSP.

CSP is of course no substitute for proper security practices like escaping. However, I hope that it can be a defense-in-depth measure that severely restricts the exploitability of escaping-related bugs.

I have split this RFC into stages for different uses of the CSP header. I expect stages 0-2 to be relatively uncontroversial, while the later stages will be more controversial because they will inconvenience some gadget developers. I believe all the stages are worthwhile, but stages 0-2 are particularly important.

Potential attacks that are relevant to this discussion:
 * 1) [Persistent XSS] User discovers a way to inject an attribute like onmouseover, which should not be allowed, into parser output (a surprisingly large number of XSS vulnerabilities let the user inject an attribute, but not, in general, new elements). [For the purpose of this numbering scheme, I'm not counting injecting data attributes here.]
 * 2) [Persistent XSS] User discovers a method of injecting arbitrary HTML into the parser.
 * 3) [DOM XSS] User discovers a method of injecting arbitrary HTML into DOM with the help of client side scripts.
 * 4) [Persistent XSS-data] Attacker discovers a method of injecting data-mw-foo or data-ooui attributes into parser output. (The sanitizer bans data attributes starting with data-mw and data-ooui so that client-side JS can treat them as trusted input.)
 * 5) [Upload XSS] User uploads a malicious file that the browser interprets as containing JavaScript (e.g. an SVG that bypasses the anti-script check, or an HTML file that somehow gets past the upload checks).
 * 6) [Privacy fail] A misguided administrator adds Google Analytics to MediaWiki:Common.js
 * 7) [Privacy fail] Somebody loads images/iframes/etc from an external source. Similar to the Google Analytics case, but not as severe

I believe that by using CSP (and a related server-side blacklist patch) we can mitigate all of these attacks except #3 and #4. See the nonce part of stage 4 for discussion of attack #3, and the section on data-mw attributes below for a potential workaround for #4 that needs more feedback. DOM-based attacks (#3) are generally out of scope for the protection this RFC is trying to provide: MediaWiki's architecture, in which users are allowed to load arbitrary custom JavaScript, makes them practically impossible to defend against with tools like CSP. CSP may still help against DOM-based XSS in certain circumstances where the attacker has only limited control.

The MediaWiki portion of the code to implement stage 3+ is at I80f6f469. The API endpoint is Id9212

Implementation summary
CSP can control which resources the browser loads and which scripts it executes. It has two modes: report-only and enforce. I want to first enable it in report-only mode to see what breaks. Once there is sufficient awareness and enough scripts have been migrated, we will switch to enforce mode. The aim is to make it very difficult to exploit an XSS bug in the parser or in the upload file filter.


 * [Stages 0-2]: We use a CSP header to ban JavaScript and external resources in files served from upload.wikimedia.org.
 * [Stages 3-6]:
 * We use CSP to ban JS from domains not controlled by Wikimedia.
 * We ban inline JavaScript (inline <script> tags, inline event handler attributes like onclick, links to javascript: URLs).
 * For the inline script tags that MW needs, a nonce value is supplied in an HTTP header. Script tags like <script nonce="..."> carrying the matching value are still allowed to run. The nonce changes on every page view for users who bypass Varnish, and whenever the cache is cleared for users served from Varnish.
 * On the server side, we use a regex to look for <script> tags and remove them from parser output. This lets us keep the benefit of CSP while still allowing custom user JS. The assumption is that it's difficult to ban every way of loading a script, but easy to blacklist literal script tags.
 * [Stage 7]: Ban loading images, iframes, etc. from other domains, to ensure misguided admins don't break the privacy policy and to make it more difficult for an XSS to be used to track users.
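The per-page nonce from the summary above can be sketched as follows. This is a minimal Python illustration, not the code in I80f6f469: the function names are hypothetical, and the choice of 15 random bytes is an assumption based on the example nonces in this RFC being 20 base64 characters long.

```python
import base64
import secrets


def generate_csp_nonce(num_bytes: int = 15) -> str:
    """Return a fresh, unpredictable nonce.

    15 random bytes encode to exactly 20 base64 characters (no padding),
    matching the length of the example nonces in this RFC (an assumption).
    """
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def nonce_source(nonce: str) -> str:
    """Format the nonce as a CSP source expression for script-src."""
    return f"'nonce-{nonce}'"


def script_tag(nonce: str, body: str) -> str:
    """Emit an inline script that the browser runs because it carries the nonce."""
    return f'<script nonce="{nonce}">{body}</script>'
```

The same nonce value must appear in both the header and every inline script tag MediaWiki itself emits; anything injected through the parser cannot know it in advance, which is the whole point.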

✅: Stage 0 (Create CSP reporting end-point)
Content-Security-Policy (and Content-Security-Policy-Report-Only) lets you specify a URL to which the browser sends violation reports.

Stage 0 was to merge https://gerrit.wikimedia.org/r/#/c/274022/ so we have an API end-point to receive these reports. This was completed in 2016.
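The reports sent to this endpoint are JSON documents. A minimal Python sketch of reading one, assuming the CSP Level 2 report format (a top-level "csp-report" object with hyphenated keys); the class and function names are hypothetical, and this is not the PHP API module from the change above.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class CspViolation:
    document_uri: str
    violated_directive: str
    blocked_uri: str
    source_file: Optional[str]
    line_number: Optional[int]


def parse_csp_report(body: str) -> CspViolation:
    """Parse the JSON body a browser POSTs to the report-uri.

    Browsers wrap the fields in a top-level "csp-report" object; optional
    fields such as source-file may be missing.
    """
    report = json.loads(body)["csp-report"]
    return CspViolation(
        document_uri=report.get("document-uri", ""),
        violated_directive=report.get("violated-directive", ""),
        blocked_uri=report.get("blocked-uri", ""),
        source_file=report.get("source-file"),
        line_number=report.get("line-number"),
    )
```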

Stage 1 (Enable CSP reporting for uploads)
Write a Varnish patch to add the following headers to files served from upload.wikimedia.org:

Content-Security-Policy-Report-Only: default-src 'none'; style-src 'unsafe-inline' data:; font-src data:; img-src data:; media-src data:; sandbox; report-uri https://commons.wikimedia.org/w/api.php?reportonly=1&src=image
X-Content-Security-Policy-Report-Only: default-src 'none'; style-src 'unsafe-inline' data:; font-src data:; img-src data:; media-src data:; sandbox; report-uri https://commons.wikimedia.org/w/api.php?reportonly=1&src=image
X-Webkit-CSP-Report-Only: default-src 'none'; style-src 'unsafe-inline' data:; font-src data:; img-src data:; media-src data:; sandbox; report-uri https://commons.wikimedia.org/w/api.php?reportonly=1&src=image

This should ensure we get a report any time something on upload.wikimedia.org tries to execute a script or load an external resource.

Stage 2 (CSP enforce for uploads)
This will stop attack #5.

If there are no unexpected consequences of stage 1, remove the report-only part of the headers from stage 1, to start enforcing CSP.
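Switching from stage 1 to stage 2 amounts to dropping the "-Report-Only" suffix from the header names. A minimal Python sketch of generating both variants from one directive list; the function name is hypothetical, and dropping reportonly=1 from the report URI in enforce mode is an assumption (the stage 1 headers only show the report-only URI).

```python
# Directives from the stage 1 headers; only the report URI differs by mode.
UPLOAD_DIRECTIVES = [
    "default-src 'none'",               # block anything not explicitly allowed
    "style-src 'unsafe-inline' data:",
    "font-src data:",
    "img-src data:",
    "media-src data:",
    "sandbox",                          # no flags: no scripts, no forms, opaque origin
]

# The modern header plus the prefixed names used by older browsers.
HEADER_NAMES = ["Content-Security-Policy", "X-Content-Security-Policy", "X-Webkit-CSP"]


def upload_csp_headers(report_only: bool) -> dict[str, str]:
    """Return the CSP headers for a file served from upload.wikimedia.org."""
    # Assumption: enforce mode reports without the reportonly=1 flag.
    query = "reportonly=1&src=image" if report_only else "src=image"
    report_uri = f"report-uri https://commons.wikimedia.org/w/api.php?{query}"
    value = "; ".join(UPLOAD_DIRECTIVES + [report_uri])
    suffix = "-Report-Only" if report_only else ""
    return {name + suffix: value for name in HEADER_NAMES}
```

Keeping a single directive list means the report-only policy that was observed in stage 1 is exactly the policy that gets enforced in stage 2.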

This will also make any old SVGs that were uploaded before the current filters were in place render in the browser much closer to how we render them, so it is probably a good move for consistency's sake too.

Once we start doing this on upload.wikimedia.org, I would also suggest we start outputting the headers from parts of MediaWiki that can stream files (e.g.,  ,  ).

I think it would also be a good idea to put a .htaccess file that adds these headers in the upload directory of a default MediaWiki install. This would make third-party sites running Apache more secure against malicious uploads.

Stage 3 (Prepare Wikimedia wikis for general CSP)
The next stage would be to try and enable CSP on our main sites. Before doing that, we should fix cases we already know are going to break:
 * Using event handler attributes in extension HTML was already strongly discouraged; it must now be banned entirely.
 * CharInsert needs to stop using inline event handlers (I2d298040ca3)
 * CleanChanges needs to not use inline event handlers
 * We should allow mw.util.addPortletLink to take a JavaScript function as an alternative to a URL for the target of the portlet link. This seems to be a major use case for javascript: URLs in Wikimedia projects.
 * Announce to the Wikimedia community that use of inline event handlers and javascript: URLs is now deprecated.
 * Merge the XSS filter patch. This will remove <script> tags from MediaWiki parser output. CSP can ban most ways of loading scripts, but we still need to be able to load normal user JavaScript, and CSP cannot distinguish between legitimate JavaScript loaded from a user subpage and an attacker loading evil JS from a user subpage. However, since no modern MediaWiki extension should output script tags directly in the body of the page, we can simply scan the parser output for <script. Thus we blacklist script tags on the server and rely on CSP to block all the other ways of loading JS. This does mean we can't really protect against DOM-based XSS.
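A server-side filter along the lines of the XSS filter patch could look like the Python sketch below. It is not the actual patch; the regex and function names are hypothetical. One deliberate swap: instead of deleting matched tags outright (which can splice fragments like <scr<script>ipt> back into a working tag), this sketch escapes the opening bracket so no script tag survives.

```python
import re

# Opening or closing script tag, case-insensitive, allowing whitespace or a
# slash after the tag name (e.g. "<script/src=...>" or "</SCRIPT >").
SCRIPT_TAG_RE = re.compile(r"<\s*/?\s*script\b[^>]*>", re.IGNORECASE)


def contains_script_tag(html: str) -> bool:
    """True if parser output still contains anything that looks like a script tag."""
    return SCRIPT_TAG_RE.search(html) is not None


def neutralize_script_tags(html: str) -> str:
    """Escape the '<' of every script tag so the browser renders it as text."""
    return SCRIPT_TAG_RE.sub(lambda m: "&lt;" + m.group(0)[1:], html)
```

Because the replacement removes the literal "<", the output can never contain a new script tag, so a single pass is enough.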

Stage 4 (Report only CSP on testwiki)
This requires I80f6f469b.

We enable a Content-Security-Policy-Report-Only: header on testwiki. This will allow us to see what parts of MediaWiki trigger CSP, without getting overwhelmed with reports.

We could set $wgCSPReportOnlyHeader = true; to do this, but that would produce a very long header:
Content-Security-Policy-Report-Only: script-src 'unsafe-eval' 'self' 'nonce-2jGMIpo4g6rZ+pvkjldH' 'unsafe-inline' *.wikipedia.org *.wikinews.org *.wiktionary.org *.wikibooks.org *.wikiversity.org *.wikisource.org wikisource.org *.wikiquote.org www.wikidata.org m.wikidata.org test.wikidata.org *.wikivoyage.org www.mediawiki.org m.mediawiki.org wikimediafoundation.org advisory.wikimedia.org affcom.wikimedia.org auditcom.wikimedia.org boardgovcom.wikimedia.org board.wikimedia.org chair.wikimedia.org checkuser.wikimedia.org collab.wikimedia.org commons.wikimedia.org donate.wikimedia.org exec.wikimedia.org grants.wikimedia.org incubator.wikimedia.org internal.wikimedia.org login.wikimedia.org meta.wikimedia.org movementroles.wikimedia.org office.wikimedia.org otrs-wiki.wikimedia.org outreach.wikimedia.org quality.wikimedia.org searchcom.wikimedia.org spcom.wikimedia.org species.wikimedia.org steward.wikimedia.org strategy.wikimedia.org usability.wikimedia.org wikimaniateam.wikimedia.org; default-src * data: blob:; style-src * data: blob: 'unsafe-inline'; report-uri /w/api.php?action=cspreport&format=json&reportonly=1

Hopefully header compression in HTTP/2 will reduce the impact of the header length; nonetheless, it is very long. We want to ban loading scripts from domains like Google for privacy reasons. However, given that users can create their own script pages, there's probably not much security benefit in being strict about which Wikimedia subdomains we load from. So instead, let's just use wildcards such as *.wikimedia.org:

 $wgCSPReportOnlyHeader = [
     'includeCORS' => false,
     'script-src' => [
         '*.wikipedia.org',
         '*.wikinews.org',
         '*.wiktionary.org',
         '*.wikibooks.org',
         '*.wikiversity.org',
         '*.wikisource.org',
         'wikisource.org',
         '*.wikiquote.org',
         '*.wikidata.org',
         '*.wikivoyage.org',
         '*.mediawiki.org',
         '*.wikimedia.org',
         'wikimediafoundation.org',
     ]
 ];

This produces a header like:
Content-Security-Policy-Report-Only: script-src 'unsafe-eval' 'self' 'nonce-ukjJjdMn5uApeN2LKiqN' *.wikipedia.org *.wikinews.org *.wiktionary.org *.wikibooks.org *.wikiversity.org *.wikisource.org wikisource.org *.wikiquote.org *.wikidata.org *.wikivoyage.org *.mediawiki.org *.wikimedia.org wikimediafoundation.org 'unsafe-inline'; default-src * data: blob:; style-src * data: blob: 'unsafe-inline'; report-uri /w/api.php?action=cspreport&format=json&reportonly=1

This is still fairly long, but significantly shorter.
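The header above is assembled mechanically from the domain list in the config. A minimal Python sketch of that assembly, mirroring the structure of the example headers rather than the actual MediaWiki code; the function and constant names are hypothetical, and the 'includeCORS' option is not modeled.

```python
WIKIMEDIA_SCRIPT_DOMAINS = [
    "*.wikipedia.org", "*.wikinews.org", "*.wiktionary.org", "*.wikibooks.org",
    "*.wikiversity.org", "*.wikisource.org", "wikisource.org", "*.wikiquote.org",
    "*.wikidata.org", "*.wikivoyage.org", "*.mediawiki.org", "*.wikimedia.org",
    "wikimediafoundation.org",
]

REPORT_URI = "/w/api.php?action=cspreport&format=json&reportonly=1"


def build_report_only_header(nonce: str, script_domains: list[str],
                             report_uri: str = REPORT_URI) -> str:
    """Build the full report-only header line for a given nonce and domain list."""
    script_src = ["'unsafe-eval'", "'self'", f"'nonce-{nonce}'",
                  *script_domains, "'unsafe-inline'"]  # last one: pre-CSP2 fallback
    directives = [
        "script-src " + " ".join(script_src),
        "default-src * data: blob:",
        "style-src * data: blob: 'unsafe-inline'",
        "report-uri " + report_uri,
    ]
    return "Content-Security-Policy-Report-Only: " + "; ".join(directives)
```

Passing a shorter list, e.g. ["*.wikipedia.org", "*.wikimedia.org"], yields the more aggressive variant discussed next, so the length trade-off comes down to which domains go into that list.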

If we're willing to break some things, we could assume that most cross-wiki user scripts live either on a different language edition of the same project, at Commons (e.g. HotCat), or at Meta, and simply whitelist *.wikimedia.org and *.<project>.org. I'm not sure how many things that would break, or whether it is worth the back-compat cost, but if we're willing to accept more breakage, we can shorten the header to:

Content-Security-Policy-Report-Only: script-src 'unsafe-eval' 'self' 'nonce-ukjJjdMn5uApeN2LKiqN' *.wikipedia.org *.wikimedia.org 'unsafe-inline'; default-src * data: blob:; style-src * data: blob: 'unsafe-inline'; report-uri /w/api.php?action=cspreport&format=json&reportonly=1

[Assuming we're on en.wikipedia.org. On a different project, replace *.wikipedia.org with *.<project>.org. *.wikimedia.org is always needed because Meta has a special role, and probably loginwiki too. We could even drop *.<project>.org if we really wanted to.]
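Deriving the shortened script-src list for the current wiki is a small string operation. A minimal Python sketch, assuming every wiki's project domain is the last two labels of its host name; the function names are hypothetical.

```python
def project_wildcard(host: str) -> str:
    """'en.wikipedia.org' -> '*.wikipedia.org' (assumes a two-label project domain)."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError(f"not a fully qualified host: {host!r}")
    return "*." + ".".join(labels[-2:])


def minimal_script_domains(host: str) -> list[str]:
    """Script domains for the shortest header variant on the given wiki."""
    domains = [project_wildcard(host)]
    # Meta (and probably loginwiki) live under wikimedia.org, so always include it.
    if "*.wikimedia.org" not in domains:
        domains.append("*.wikimedia.org")
    return domains
```

Note that a wildcard such as *.wikisource.org matches subdomains only, not the bare wikisource.org; on the multilingual Wikisource itself, 'self' covers the wiki's own scripts.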

So what does this do? It tells the browser to send a report if any of the following happens:
 * Loading a script from a non-Wikimedia controlled domain. (Attack #6)
 * A javascript: URL is executed
 * An inline event handler (an attribute starting with on) is executed
 * A script tag that has neither a src attribute nor a matching nonce attribute is executed, e.g. an inline <script>...</script> block. More on the nonce attribute below.
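The decision the browser makes for each of these cases can be modeled roughly as below. This is a simplified Python sketch of CSP Level 2 script-src rules (schemes, ports, paths and the * source are ignored), written only to illustrate the list above; the function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit


def host_matches(host: str, pattern: str) -> bool:
    """Simplified CSP host-source match.

    '*.wikipedia.org' matches subdomains such as en.wikipedia.org but not the
    bare wikipedia.org, which is why a bare domain must be listed separately.
    """
    host, pattern = host.lower(), pattern.lower()
    if pattern.startswith("*."):
        return host.endswith(pattern[1:])
    return host == pattern


def script_allowed(sources: list[str], page_host: str, *, src: Optional[str] = None,
                   nonce: Optional[str] = None, event_handler: bool = False) -> bool:
    """Would a script run under this script-src source list?"""
    nonces = {s[len("'nonce-"):-1] for s in sources if s.startswith("'nonce-")}
    # When a nonce source is present, CSP2 browsers ignore 'unsafe-inline'.
    unsafe_inline = "'unsafe-inline'" in sources and not nonces
    if event_handler:
        # onclick="..." attributes and javascript: URLs cannot carry a nonce.
        return unsafe_inline
    if nonce is not None and nonce in nonces:
        return True
    if src is None:
        # Inline <script> block without a matching nonce.
        return unsafe_inline
    host = urlsplit(src).hostname or page_host  # relative URL -> same host
    if "'self'" in sources and host == page_host.lower():
        return True
    return any(host_matches(host, s) for s in sources if not s.startswith("'"))
```

For example, with the shortened policy on en.wikipedia.org, a Google Analytics script is rejected while a user script loaded from meta.wikimedia.org is allowed.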

The actual header can be parsed as follows:


 * 'unsafe-eval' means that you can use eval() and new Function() in JS code. This is needed for ResourceLoader module caching. Most uses of eval in core are sane (with the exception of jquery.ui.datepicker). The main benefit of disabling eval would be to stop users from doing unwise things in gadgets, since eval is easy to misuse. However, we need it, so it stays enabled.
 * 'self' means we can load JS from our own origin. This is somewhat redundant since we list basically every Wikimedia domain, but it's a good fallback just in case.
 * 'nonce-...' means that inline script tags are not allowed unless they carry the matching nonce, i.e. <script nonce="..."> is allowed. It also causes javascript: URLs and event handler attributes to be blocked. If the user is logged in (or otherwise bypassing Varnish), the nonce value changes on every request; otherwise it is cached. The caching is not as bad as it sounds, since the primary goal here is to prevent XSS attacks in the parser, and any time someone saves a page the nonce changes unpredictably. The nonce attribute also helps in certain DOM-based XSS scenarios where the attacker has limited control (e.g. they can inject just an attribute). In general, however, it does not protect against DOM-based XSS, since an attacker who can inject a <script> tag could simply load an arbitrary user subpage containing their evil code.
 * The domains obviously allow loading scripts from those domains.
 * 'unsafe-inline' is a fallback for browsers that don't support nonces. Browsers that support nonces ignore it when a nonce is present.
 * For style, we allow CSS from everything.  is probably not needed here strictly speaking, but it doesn't seem to hurt anything.
 * 'unsafe-inline' in style-src means we allow style attributes on elements.
 * default-src covers everything else (images, iframes, etc.). We allow all domains, data: URLs and blob: URLs (blob: is needed for Special:Upload and UploadWizard).
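Reading a policy the way the walk-through above does (directive by directive, with unlisted fetch directives such as img-src falling back to default-src) can be sketched in Python as follows. The parsing rules follow the CSP specification (directives split on ";", names case-insensitive, first occurrence wins); the fallback is simplified (it skips the child-src step for frames), and the function names are hypothetical.

```python
from typing import Optional

# Directives that fall back to default-src when absent (simplified list).
FETCH_DIRECTIVES = {"script-src", "style-src", "img-src", "font-src",
                    "media-src", "connect-src", "object-src", "frame-src"}


def parse_csp(header_value: str) -> dict[str, list[str]]:
    """Split a policy into {directive name: [source expressions]}."""
    policy: dict[str, list[str]] = {}
    for chunk in header_value.split(";"):
        tokens = chunk.split()
        if not tokens:
            continue
        name = tokens[0].lower()
        if name not in policy:  # duplicate directives are ignored
            policy[name] = tokens[1:]
    return policy


def effective_sources(policy: dict[str, list[str]], directive: str) -> Optional[list[str]]:
    """Sources governing `directive`, or None if the policy leaves it unrestricted."""
    if directive in policy:
        return policy[directive]
    if directive in FETCH_DIRECTIVES and "default-src" in policy:
        return policy["default-src"]
    return None
```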

Note: This uses some CSP2 features, so we only use the modern Content-Security-Policy header (not the X- variants), which has less browser support than the headers used in stages 0-2. This should work in modern Firefox, Chrome and Opera.

Todo: Tyler suggested adding a meta tag with a more restrictive policy after page load. This would help mitigate the fact that anon users get repeated nonces, and also help provide better protection for browsers that don't support CSP2. Need to investigate this.

Stage 5 (Enforce CSP on testwiki, report on others)
If everything works out OK in report-only mode on testwiki, we now enable enforce mode on testwiki so users have a place to test their JavaScript with the new settings.

At this point we start enabling report-only mode on other wikis (perhaps starting with smaller wikis first so we don't overwhelm ourselves with reports). We start working with users to fix their scripts. It is expected that this stage will take a long time.

Open questions:
 * how to differentiate personal scripts and site-wide situations (like Common.js or a default gadget),
 * This will be fairly easy: if there are more than 10,000 reports in a half-hour period, it is probably a sitewide script. The reports also include the line number and file name (though that's less useful in the context of non-debug-mode ResourceLoader). -bawolff
 * how to forward privacy policy violations to stewards (e.g. on SN) so that they can speedy disable the violation as per praxis,
 * The privacy-violation aspects probably won't come until much later, once we have the main script security violations figured out; the initial deployment will be hard enough with just the anti-XSS features. When that eventually happens, I imagine someone (like me) will be monitoring the log and, if action is needed, will forward it to the appropriate people. -bawolff
 * how to communicate with developers/local administrators for the rest.
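The half-hour heuristic from the first answer above can be sketched in Python as follows. Grouping the counts by the report's source-file field is an assumption added here (the answer only gives the overall threshold), and the names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Report:
    timestamp: float   # Unix time the report was received
    source_file: str   # "source-file" from the report ("" if absent)


def probable_sitewide_sources(reports: Iterable[Report], window_start: float,
                              window_seconds: float = 1800,
                              threshold: int = 10_000) -> set[str]:
    """Source files with more than `threshold` reports in the time window.

    Such volume suggests a sitewide script (Common.js, a default gadget)
    rather than one user's personal script.
    """
    window_end = window_start + window_seconds
    counts = Counter(r.source_file for r in reports
                     if window_start <= r.timestamp < window_end)
    return {src for src, n in counts.items() if n > threshold}
```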

loginwiki might also be a good first candidate to start enforcing CSP on.

Stage 6 (Actually enforce CSP)
Start enforcing on some wikis as popular gadgets get fixed. I expect this will not happen all at once; rather, we'll gradually notice things like "hmm, frwikisource stopped sending CSP reports, let's enable it there", etc.

Once we've done the first 6 stages, we should have mitigated attacks #1, #2, #5 and #6 for all users, and attack #3 fully for logged-in users and partially for logged-out users, leaving just #4 and #7.

Stage 7 (CSP to prevent tracking pixels, etc.)
For privacy reasons, we could start enforcing CSP for other resources (images, iframes, etc.). The idea is that even in the event of an XSS (or just an unfiltered style attribute), the attacker should not be able to leak things like our users' IP addresses to a third party, or embed tracking images with third-party cookies.

This will help ensure our privacy policy is upheld in the face of a misguided admin.

Note: The MapSources extension used on Wikivoyage embeds https://tools.wmflabs.org/wiwosm/osm-on-ol/embed-labs.html in an iframe. I'm unclear whether that violates the privacy policy, but if we continue doing it, we'll have to remember to add a CSP exception allowing Tool Labs as an iframe source. There is a similar issue with WikiMiniAtlas and with the OpenStreetMap features used on some wikis.

This will fix attack #7, at least against a naive attacker or someone who is merely misguided rather than malicious. There are still side-channel attacks that a sophisticated attacker could probably use.

The header would have to look like:
Content-Security-Policy: script-src 'unsafe-eval' 'self' 'nonce-1yaAqSy3ScolhtfIS+Kz' *.wikipedia.org *.wikinews.org *.wiktionary.org *.wikibooks.org *.wikiversity.org *.wikisource.org wikisource.org *.wikiquote.org *.wikidata.org *.wikivoyage.org *.mediawiki.org *.wikimedia.org wikimediafoundation.org 'unsafe-inline'; default-src 'self' data: blob: *.wikipedia.org *.wikinews.org *.wiktionary.org *.wikibooks.org *.wikiversity.org *.wikisource.org wikisource.org *.wikiquote.org *.wikidata.org *.wikivoyage.org *.mediawiki.org *.wikimedia.org wikimediafoundation.org; style-src 'self' data: blob: *.wikipedia.org *.wikinews.org *.wiktionary.org *.wikibooks.org *.wikiversity.org *.wikisource.org wikisource.org *.wikiquote.org *.wikidata.org *.wikivoyage.org *.mediawiki.org *.wikimedia.org wikimediafoundation.org 'unsafe-inline'; report-uri /w/api.php?action=cspreport&format=json

Unfortunately, that makes the header quite long, without the same benefit that restricting JavaScript has. I'm unsure whether it's worth it. Note: we can't just put upload.wikimedia.org in default-src, since we want to be able to load CSS from any Wikimedia wiki, as users do have cross-wiki gadgets. The alternative would be to force everyone to put their cross-wiki gadgets and scripts on Meta (which would shorten not just default-src but also script-src), but that would probably antagonize users.

Stage 8 (Option in the installer?)
Eventually I think it would be a good idea to add CSP as an option in the installer for security-conscious third-party users. However I don't think it should be enabled by default since it breaks many legitimate use cases for third-parties.

Potential solutions to data-mw attributes (Attack #4)
The largest hole in this scheme is the data-mw and data-ooui prefix attributes. The MediaWiki sanitizer bans attributes starting with those prefixes so that trusted input can be passed to the client side. Some parts of MediaWiki use data-mw attributes to mark interface elements; an attacker forging them could potentially trick the user into doing something they don't want, or mess up the interface. Older versions of OOJS allowed constructing links to javascript: URLs if a malicious person could construct a data-ooui attribute. Newer versions of OOJS don't have this issue, and CSP would ban the javascript: URL anyway. However, TimedMediaHandler does allow arbitrary XSS via forged attributes. And it's probably safe to assume that other code will rely on this feature in the future, enabling worse attacks for anyone who can bypass the sanitizer.

One idea I had to mitigate this issue, which I would like feedback on, is to do something similar to the nonce attribute for script tags. We could add a per-page nonce variable to the client-side configuration (readable via mw.config.get). When we want to use a trusted data attribute, we would add an attribute like data-mw-nonce="...". Then, instead of reading data attributes directly, client-side scripts could use some new function. This function would look for the data-mw-nonce attribute and only return the other data attributes if both the number of data attributes and the nonce value are correct. The nonce would change every time the page is parsed.

Thus, to inject an evil data attribute, one would need an XSS directly in the middle of an existing element that contains a data-mw-nonce attribute. One would also need to be able to do more than just add attributes, since the number of data attributes is included in the nonce check (e.g. if someone only forgot to escape quote marks, that's still not enough, as it only lets you add extra attributes, not end the element early).
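The proposed check can be sketched as follows. The real function would be client-side JavaScript; this Python version only illustrates the logic. The "<count>:<nonce>" value format, the attribute-dict model of an element, and all function names are hypothetical, since the proposal does not fix an encoding.

```python
from typing import Mapping, Optional

NONCE_ATTR = "data-mw-nonce"


def nonce_attribute_value(data_attrs: Mapping[str, str], page_nonce: str) -> str:
    """Server side: hypothetical '<number of data attributes>:<page nonce>' value."""
    return f"{len(data_attrs)}:{page_nonce}"


def trusted_data_attributes(attrs: Mapping[str, str],
                            page_nonce: str) -> Optional[dict[str, str]]:
    """Return the element's data-* attributes only if data-mw-nonce vouches for them."""
    marker = attrs.get(NONCE_ATTR)
    if marker is None:
        return None
    count, sep, nonce = marker.partition(":")
    if not sep or not count.isdigit() or nonce != page_nonce:
        return None
    data = {k: v for k, v in attrs.items()
            if k.startswith("data-") and k != NONCE_ATTR}
    # An injected extra attribute changes the count, so the element is rejected.
    if int(count) != len(data):
        return None
    return data
```

Binding the count into the marker is what defeats the "forgot to escape quotes" case: an attacker who can only append attributes inevitably changes the count.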