Wikimedia Security Team/ApplicationScanning

Overview & Goals
The Wikimedia Foundation Security Team seeks to complement its manual security testing processes with automated security scanning. The team will evaluate several tools to determine their feasibility for use within the organization.

Overall Tasks

 * Define selection criteria
 * Test and compare features and performance of tools in trial installations
 * Select and deploy tool in labs
 * Configure weekly automated scanning of beta from labs (coordinated with RelEng)
 * Record baseline scan results for core and one extension

Active Scanning Areas/Checks

 * Accuracy of this table is questionable; no source code inspection was done.

DVWA, Authenticated
Arachni never reached a completion state. Some error condition causes the scan to restart from the beginning, but the logs contain too little information to determine the root cause of the error or why Arachni restarts the scan. On the positive side, the equivalent of Ctrl-C when running a scan from the command line (q in the console) causes an .afr file to be written, from which arachni_report may be used to generate a report. However, the other tools allow real-time inspection of scan results, so this is not an advantage over them.

T73394
None of the tools detected this vulnerability using default and XSS-specific scan profiles, with scope limited to a single page. (Scans with full application scope do not complete.)

MediaWiki, Authenticated
Using default settings, none of the scanners completed a scan of MediaWiki in a timely manner. Rule and scope tuning and re-invocation is in progress.
 * It would be great to also run these scanners on known vulnerable versions of MediaWiki and extensions, and see which ones pick up those issues (or how much customization of the default scan rules is required to find them). A lot of the recent #vuln-xss issues would be reasonable tests. CSteipp (WMF) (talk) 00:21, 30 July 2015 (UTC)
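One hypothetical way to tabulate such a comparison: triage each scanner's findings by hand into the task IDs of the known issues they correspond to, then record which scanner caught which issue. The function names and the "set of triaged task IDs per scanner" input format are assumptions for illustration, not part of any scanner's output.

```python
from typing import Dict, List, Set


def detection_matrix(
    known_issues: Set[str], findings: Dict[str, Set[str]]
) -> Dict[str, Dict[str, bool]]:
    """For each scanner, record whether it reported each known issue.

    `findings` maps a scanner name to the (hypothetical) set of task IDs its
    findings were triaged to; IDs that are not known issues are ignored.
    """
    return {
        scanner: {issue: issue in reported for issue in sorted(known_issues)}
        for scanner, reported in findings.items()
    }


def missed_by_all(known_issues: Set[str], findings: Dict[str, Set[str]]) -> List[str]:
    """Known issues no scanner reported: candidates for custom scan rules."""
    detected: Set[str] = set().union(*findings.values())
    return sorted(known_issues - detected)
```

Issues returned by `missed_by_all` are the ones where the question "how much customization of the default scan rules is required" becomes relevant.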

General Observations and Questions
The tools have varying levels of support for automated scanning. It is less important that a tool can run headless than that it can be invoked through some non-GUI mechanism (REST API, command line, etc.). Burp, unfortunately, seems to be the weakest in this respect.

All of the tools have a concept of a scan profile, which exists separately from a scan invocation. Arachni requires that all scan-specific variables (form field values, authentication credentials, etc.) be stored in a profile, so adjusting any of them requires duplicating and modifying an existing profile. Burp and ZAP allow some parameterization of profiles, storing parameter values with the scan data itself.
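The difference between the two models can be sketched with plain data classes. This is a conceptual illustration only, assuming a profile is just a name plus a list of checks; it does not mirror either tool's actual profile format, and all class and function names are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class EmbeddedProfile:
    """Arachni-style: run-specific values are stored inside the profile."""
    name: str
    checks: Tuple[str, ...]
    username: str
    password: str


@dataclass(frozen=True)
class Profile:
    """Reusable settings only (which checks to run)."""
    name: str
    checks: Tuple[str, ...]


@dataclass(frozen=True)
class Invocation:
    """ZAP/Burp-style: run-specific values are stored with the scan itself."""
    profile: Profile
    target: str
    username: str
    password: str


def copies_per_account(
    base: EmbeddedProfile, accounts: List[Tuple[str, str]]
) -> List[EmbeddedProfile]:
    # Each credential pair forces a duplicated, modified copy of the profile.
    return [
        replace(base, name=f"{base.name}-{user}", username=user, password=pw)
        for user, pw in accounts
    ]


def runs_per_account(
    profile: Profile, target: str, accounts: List[Tuple[str, str]]
) -> List[Invocation]:
    # One shared profile; only the per-run invocation varies.
    return [Invocation(profile, target, user, pw) for user, pw in accounts]
```

With N test accounts, the first model leaves N near-identical profiles to keep in sync, while the second keeps a single profile.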

Both Burp and ZAP offer some configurability through their APIs; however, our most flexible route to scan configuration is to use the desktop tools to create empty sessions, then upload those sessions to tool instances running on our scan servers.
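For ZAP, the "upload a prepared session, then scan" route can be driven over its HTTP API, which exposes paths such as /JSON/core/action/loadSession/ and /JSON/ascan/action/scan/. The sketch below only builds the request URLs. It assumes a ZAP daemon on its default port on localhost with an API key enabled, and that the session file has already been copied to the server (the upload step is not shown); the session name is hypothetical. Parameter names should be checked against the API reference of the ZAP version actually deployed.

```python
from urllib.parse import urlencode

# Assumption: a ZAP daemon listening on its default port on the scan server.
ZAP_BASE = "http://localhost:8080"


def zap_api_url(component: str, kind: str, operation: str, **params: str) -> str:
    """Build a ZAP JSON API URL of the form /JSON/<component>/<kind>/<operation>/."""
    return f"{ZAP_BASE}/JSON/{component}/{kind}/{operation}/?{urlencode(params)}"


def load_session_then_scan(session_name: str, target: str, api_key: str) -> list:
    """Calls to issue in order: open an uploaded session, then active-scan the target."""
    return [
        zap_api_url("core", "action", "loadSession", name=session_name, apikey=api_key),
        zap_api_url("ascan", "action", "scan", url=target, apikey=api_key),
    ]
```

Each URL could then be fetched with `urllib.request.urlopen`; the scan call returns a scan ID whose progress can be polled via `ascan/view/status` with a `scanId` parameter.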

What do we scan?
It won’t be feasible to let the chosen scanner spider all links reachable from http://en.wikipedia.beta.wmflabs.org/wiki/Main_Page, for example. Spidering alone would not complete in a timely manner, and the scan itself would take a very long time. Should we scan all Special: pages, plus a manually selected subset of entry pages?
 * Ideally, the authors of the various features/extensions should provide us with a way to discover the most relevant paths. My initial thinking was to use browser tests to collect paths through the application, then actively scan those URLs (and possibly spider one level out from them, if that completes in a reasonable timeframe). We could also have users submit a manifest of the URLs and parameters their feature accepts, similar to how every API module must explicitly list each parameter and its type. CSteipp (WMF) (talk) 23:11, 29 July 2015 (UTC)
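A minimal sketch of both ideas, assuming a hypothetical manifest format in which each entry gives a path and a map of parameter names to declared types (loosely modeled on how API modules declare parameters). Manifest entries are expanded into concrete target URLs, and a separate scope rule (any Special: page, or a hand-picked entry page) decides whether a spidered link should be followed. The manifest format, sample values, and function names are all assumptions.

```python
from typing import Dict, List, Set
from urllib.parse import urlencode

BASE = "http://en.wikipedia.beta.wmflabs.org"

# Hypothetical sample values for each declared parameter type.
SAMPLE_VALUES: Dict[str, str] = {"string": "test", "integer": "1", "boolean": "1"}


def manifest_urls(manifest: List[dict]) -> List[str]:
    """Expand each manifest entry into a URL with sample parameter values."""
    urls = []
    for entry in manifest:
        params = {
            name: SAMPLE_VALUES[kind]
            for name, kind in entry.get("params", {}).items()
        }
        query = f"?{urlencode(params)}" if params else ""
        urls.append(f"{BASE}{entry['path']}{query}")
    return urls


def in_scope(path: str, entry_pages: Set[str]) -> bool:
    """Scope rule for spidered links: any Special: page, or a hand-picked entry page."""
    return path.startswith("/wiki/Special:") or path in entry_pages
```

The scanner would then actively scan the manifest URLs and only follow spidered links for which `in_scope` returns true.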

Does scanning ability take precedence over support for automation?
I think they're about equal in weight for our decision. I'm assuming, CSteipp (WMF) (talk) 23:30, 29 July 2015 (UTC)
 * If we find a scanner that is clearly better, but needs to be manually run, we can run it manually periodically in addition to our automated scanning
 * Whenever someone reports a crazy vulnerability in MediaWiki that our scanning doesn't currently detect, we should be able to write a test for that scenario into our scanner and check for that vulnerability across all of MediaWiki. So I'm assuming our scanning ability will improve over time.
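As a hypothetical example of such a scenario test, here is the core of a reflected-XSS check written with only the Python standard library: inject a unique marker tag into a query parameter, then classify the response as reflecting it raw (likely vulnerable), escaped (handled), or not at all. It does not use any scanner's own rule or plugin API, the HTTP fetch is omitted, and all function names are illustrative.

```python
import html
import secrets
from urllib.parse import urlencode


def make_payload() -> str:
    """A unique, harmless marker tag so any reflection can be traced to this probe."""
    return f'<svg data-probe="{secrets.token_hex(4)}">'


def probe_url(page_url: str, param: str, payload: str) -> str:
    """URL that submits the payload in the given query parameter."""
    return f"{page_url}?{urlencode({param: payload})}"


def reflected_unescaped(payload: str, body: str) -> bool:
    """True if the raw payload appears in the response body (likely XSS)."""
    return payload in body


def reflected_escaped(payload: str, body: str) -> bool:
    """True if an HTML-escaped copy of the payload appears (output was encoded)."""
    return html.escape(payload) in body
```

Running the same probe against every entry point in scope is what would turn a single reported bug into a check "across all of MediaWiki".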

Why choose only one?
At the end of the application review period, we will have a decent understanding of running each of these tools unattended. It may serve us well to run the two tools not chosen on a monthly basis, in addition to whichever tool we decide to run weekly. Each tool has strengths; they may be complementary and improve our overall security posture.
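A minimal sketch of that mixed cadence, assuming scans start on Mondays and the monthly tools run on the first Monday of each month; the schedule and tool names are placeholders, not a decision.

```python
from datetime import date
from typing import List


def tools_due(day: date, weekly: str, monthly: List[str]) -> List[str]:
    """Which scanners to start on a given day (assumed schedule: Mondays only)."""
    if day.weekday() != 0:  # date.weekday(): Monday == 0
        return []
    due = [weekly]
    if day.day <= 7:  # the first Monday of a month always falls on day 1 to 7
        due.extend(monthly)
    return due
```

Running this daily from a scheduler (for example cron) and starting whatever it returns keeps the weekly and monthly scans on one code path.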

Arachni
Arachni arguably requires the least work to run headless. By nature of its architecture, it is semi-headless out of the box.

Arachni has a built-in finding review system, which allows for discussion of issues, flagging as valid or false-positive, etc.

Internal server errors were observed during relatively normal operations, such as editing the textual description of a scan after it has completed. While scans run, server-side errors are output to a textarea in the middle of the page, with a message that these errors may need to be reported to the developers. This hinders viewing of scan results, which are populated via XMLHttpRequest-based polling in a pane below the error textarea. Viewing the errors themselves is also hindered, because the textarea refreshes every second or so and moves focus to the bottom of its content. Additionally, the fact that these errors appeared during a basic, unauthenticated scan of WebGoat is concerning.

Running Arachni in a local VM causes CPU spikes, and more investigation is needed. I believe the cause is the frequency of polling and in-browser activity, but I’m not certain. If it is in-browser activity, it’s not a major obstacle to headless operation, but it may be a concern when inspecting and reviewing results.

At some point my Arachni VM hung. I hard reset the machine, logged in, restarted Arachni, and found that all scan results were gone. It was as if the database (SQLite in this instance) had been corrupted beyond repair, and silently rebuilt. More investigation is needed (and we’d be running on PostgreSQL in production anyway, so it’s maybe not a huge problem).

Arachni cannot select from a group of usernames and passwords for authenticating during scanning.

Burp
Burp appears to require the most work to run headless. The vendor does not support this at all, but the community has made efforts to support automated scanning with Burp.

Burp seems to have the best scanner of the bunch. A default scan of DVWA found command injection, stored XSS, and reflected XSS. Burp also seems to