Security for developers/Tutorial

Introduction
The Secure coding and code review for MediaWiki Tutorial is based on MediaWiki security best practices and was originally created for the Berlin Hackathon 2012. It is designed to teach skills, to be reusable and, like all things Wikimedia, to be a living, editable document. Please contribute your thoughts on the discussion page and update this tutorial as needed.

Course Outline

 * 1) Problems we see on MediaWiki
 * 2) Top Vulnerabilities overview (20 mins)
 * 3) "Spot the Vulnerability" (5 mins)
 * 4) Secure Design Principles (5-10 mins)
 * 5) Collaboratively write some secure code (15 mins)
 * 6) Security review of volunteer's code

Top Problems We See

 * Stuff we keep seeing
 * Cross-site Scripting (XSS)
 * Cross-site Request Forgery (CSRF)
 * Register Globals
 * SQL Injection

XSS

 * Definition: an attacker is able to inject client-side script into a web page viewed by other users.
 * Results in:
    * Authenticated requests
    * Session hijacking
    * Clickjacking
    * XSS worms
    * Internal network port scanning
 * Types:
    * Reflected
    * Stored / 2nd order
    * DOM / 3rd order
    * XSSI / "JavaScript hijacking"

XSSI
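Also known as "JavaScript hijacking". The attacking page loads a script-formatted response from the target site with a script tag, so the request carries the victim's cookies and the attacking page can read the returned data. This was used in 2006 to compromise Gmail. On evil.com:  document.write("");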

More about XSS

 * For the theory behind XSS, and why certain filters should be applied, read: OWASP XSS Prevention Cheat Sheet
 * Quick reference for how to escape data in different document contexts: OWASP Abridged XSS Prevention Cheat Sheet
 * Writing text into the document
    * Encode with HTMLEntities
 * Writing values into attributes
    * Quote the value with single or double quotes
    * HTMLEntities encode data
    * Encode or escape double or single quotes, depending on what the string is quoted with
    * Strictly validate unsafe attributes such as background, id and name
 * Writing HTML
    * Writing element names
       * HTML Validation
    * Writing attributes
       * HTML Validation
 * Writing into JavaScript variables
    * Except for alphanumeric characters, escape all characters with the \uXXXX Unicode escaping format (X = hexadecimal digit).
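
The escaping rules above can be sketched in plain PHP. This is a minimal illustration, not MediaWiki's own helpers: the function names `escapeForHtml` and `escapeForJsString` are hypothetical, and the sketch assumes UTF-8 input. In MediaWiki code you would normally reach for the Html and Xml helper classes instead.

```php
<?php
// Hypothetical helpers for illustration only; not part of MediaWiki.

// For text nodes and for quoted attribute values: encode &, <, > and both quote types.
function escapeForHtml( string $s ): string {
    return htmlspecialchars( $s, ENT_QUOTES, 'UTF-8' );
}

// For data placed inside a JavaScript string literal:
// every non-alphanumeric character becomes a \uXXXX escape.
function escapeForJsString( string $s ): string {
    $escaped = preg_replace_callback(
        '/[^a-zA-Z0-9]/u',
        function ( array $m ): string {
            $ch = $m[0];
            if ( strlen( $ch ) === 1 ) {
                // Single-byte (ASCII) character.
                return sprintf( '\\u%04x', ord( $ch ) );
            }
            // Multi-byte character: json_encode emits \uXXXX (with surrogate
            // pairs where needed); strip the surrounding double quotes.
            return substr( json_encode( $ch ), 1, -1 );
        },
        $s
    );
    // preg_replace_callback returns null on invalid UTF-8; fail closed.
    return $escaped ?? '';
}

echo escapeForHtml( '<script>alert(1)</script>' ), "\n";
echo escapeForJsString( "</script>'" ), "\n";
```

Note that the right function depends on where the data lands: HTML encoding does not make a value safe inside a script block, and JavaScript escaping does not make it safe inside markup.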

Same Origin Policy
 * To understand the cross-site aspects of XSS, you will need to understand the Same Origin Policy (SOP)
 * Lots of useful information about how browsers handle cross-domain situations
 * SOP is NOT the same for JavaScript vs. Flash vs. XHR
 * SOP is changing with CORS
 * OWASP .ppt on SOP
 * http://code.google.com/p/browsersec/wiki/Part2
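
As a rough illustration of what "same origin" means, the sketch below compares the classic (scheme, host, port) tuple of two URLs. The function names `originOf` and `isSameOrigin` are hypothetical, and, as noted above, real browsers apply different variations of this rule depending on the technology involved, so treat this as a simplified model only.

```php
<?php
// Hypothetical helpers for illustration; a simplified model of the SOP tuple.

// Returns [scheme, host, port] for a URL, or null if it cannot be parsed.
function originOf( string $url ): ?array {
    $parts = parse_url( $url );
    if ( $parts === false || !isset( $parts['scheme'], $parts['host'] ) ) {
        return null;
    }
    $scheme = strtolower( $parts['scheme'] );
    $defaultPorts = [ 'http' => 80, 'https' => 443 ];
    // Fall back to the scheme's default port when none is given.
    $port = $parts['port'] ?? ( $defaultPorts[$scheme] ?? null );
    return [ $scheme, strtolower( $parts['host'] ), $port ];
}

// Two URLs share an origin only if scheme, host and port all match.
function isSameOrigin( string $a, string $b ): bool {
    $originA = originOf( $a );
    return $originA !== null && $originA === originOf( $b );
}

var_dump( isSameOrigin( 'http://example.org/a', 'http://example.org:80/b' ) );
var_dump( isSameOrigin( 'http://example.org/', 'https://example.org/' ) );
```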

Understanding CSRF
 * When an HTTP session is tracked with a cookie, the browser automatically attaches that cookie to every request sent to the cookie's originating server.
 * This includes requests to a server for an image or an iframe.
 * If a user has an authenticated session established, a remote site can therefore cause the user's browser to send requests that carry that session.

Preventing

 * Tokens are given out just prior to editing, and checked to authorize the edit
 * Tokens are used in addition to authentication / authorization checks (not as a replacement for them)
 * Tokens must be difficult to predict / guess (e.g., md5( $username . $timestamp ) would be bad, because an attacker can compute it; md5( $username . $secretKey . $timestamp ) would be ok)
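
A minimal sketch of such a token follows, assuming a per-session secret that the attacker cannot read. The function names, the `|` separator and the one-hour lifetime are all assumptions made for illustration; this sketch swaps the md5 concatenation mentioned above for an HMAC, and in MediaWiki itself you would use the edit token facilities rather than rolling your own.

```php
<?php
// Hypothetical helpers for illustration; not MediaWiki's edit token code.

// Token = HMAC over username and timestamp, keyed with a secret, plus the timestamp.
function makeEditToken( string $secret, string $username, int $timestamp ): string {
    $mac = hash_hmac( 'sha256', $username . '|' . $timestamp, $secret );
    return $mac . '|' . $timestamp;
}

// Recompute the token from the embedded timestamp and compare in constant time.
function checkEditToken(
    string $token, string $secret, string $username, int $now, int $maxAge = 3600
): bool {
    $parts = explode( '|', $token );
    if ( count( $parts ) !== 2 || !ctype_digit( $parts[1] ) ) {
        return false;
    }
    $timestamp = (int)$parts[1];
    if ( $timestamp > $now || $now - $timestamp > $maxAge ) {
        return false; // from the future, or expired
    }
    return hash_equals( makeEditToken( $secret, $username, $timestamp ), $token );
}
```

The token is only checked after the normal authentication and authorization checks pass; it proves the request came from a form the site itself issued, nothing more.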

Register Globals

 * If register_globals is on, then an attacker can set variables in your script

Dangers

 * Remote File Inclusion (RFI), if allow_url_fopen is also true
 * Alter code execution

Protections

 * Don't use globals in script paths
 * Ensure your script is called in the correct context
 * if ( !defined( 'MEDIAWIKI' ) ) die( 'Invalid entry point.' );
 * Sanitize defined globals before use
 * Define security-critical variables before use as 'false' or 'null'
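
The last point can be illustrated with a small simulation. register_globals was removed in PHP 5.4, so the sketch below uses `extract()` as a stand-in for what it used to do before any script code ran. The function names and the `authorized` parameter are hypothetical.

```php
<?php
// Hypothetical functions for illustration; extract() simulates register_globals.

function canDeleteVulnerable( array $requestParams, bool $userIsSysop ): bool {
    extract( $requestParams, EXTR_SKIP ); // attacker-chosen names become variables
    if ( $userIsSysop ) {
        $authorized = true;
    }
    // $authorized was never initialized, so an injected value survives.
    return !empty( $authorized );
}

function canDeleteSafe( array $requestParams, bool $userIsSysop ): bool {
    extract( $requestParams, EXTR_SKIP );
    $authorized = false; // defined before use: overwrites anything injected
    if ( $userIsSysop ) {
        $authorized = true;
    }
    return $authorized;
}

$attack = [ 'authorized' => '1' ];
var_dump( canDeleteVulnerable( $attack, false ) ); // bool(true): attacker wins
var_dump( canDeleteSafe( $attack, false ) );       // bool(false)
```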

Understanding SQL Injection

 * Poorly validated data received from the user is used as part of a database (SQL) statement
 * Often occurs when attacker-controlled values are concatenated into INSERT statements or WHERE clauses

Dangers

 * Authentication bypass
    * SELECT * FROM users WHERE username='$username' AND password='$pass';
    * What if $pass is set to "' OR 1='1"?
 * Data corruption
    * DROP TABLE, UPDATE data (esp. user tables)
 * In the worst case, complete system compromise
    * xp_cmdshell on SQL Server
    * SELECT INTO OUTFILE on MySQL
    * Lots of other bad stuff...
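
To make the authentication bypass concrete, the sketch below builds the query from the example above by string interpolation and prints what the database would actually receive. The function name `buildLoginQueryUnsafe` is hypothetical, and the code is deliberately vulnerable; it only builds a string and never touches a database.

```php
<?php
// Deliberately vulnerable, for illustration only. Never build queries like this.
function buildLoginQueryUnsafe( string $username, string $pass ): string {
    return "SELECT * FROM users WHERE username='$username' AND password='$pass';";
}

echo buildLoginQueryUnsafe( 'alice', 'secret' ), "\n";
// SELECT * FROM users WHERE username='alice' AND password='secret';

echo buildLoginQueryUnsafe( 'admin', "' OR 1='1" ), "\n";
// SELECT * FROM users WHERE username='admin' AND password='' OR 1='1';
```

Because AND binds more tightly than OR, the second query's WHERE clause reads as (username='admin' AND password='') OR 1='1', which is true for every row.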

Preventing

 * Use MediaWiki's built-in db classes and pass variables by key=>value for CRUD
    * select
    * selectRow
    * insert
    * insertSelect
    * update
    * delete
    * deleteJoin
    * buildLike
 * If you really have to, use database::addQuotes to escape a single value
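
The idea behind passing conditions as key=>value can be sketched with a toy query builder. Everything here is an assumption for illustration: `sketchAddQuotes` and `sketchSelect` are hypothetical stand-ins, not MediaWiki's database classes, and the quoting shown is only standard-SQL quote doubling. Real escaping is engine-specific, which is exactly why the wrapper should do it for you.

```php
<?php
// Toy stand-ins for illustration; NOT MediaWiki's database classes.

// Standard-SQL quoting: wrap in single quotes and double any embedded ones.
// Real database layers apply engine-specific rules instead.
function sketchAddQuotes( string $value ): string {
    return "'" . str_replace( "'", "''", $value ) . "'";
}

// Conditions arrive as field => value, so every value is quoted in one place
// and the caller never concatenates user input into SQL.
function sketchSelect( string $table, array $fields, array $conds ): string {
    $where = [];
    foreach ( $conds as $field => $value ) {
        $where[] = $field . ' = ' . sketchAddQuotes( (string)$value );
    }
    $sql = 'SELECT ' . implode( ', ', $fields ) . ' FROM ' . $table;
    if ( $where ) {
        $sql .= ' WHERE ' . implode( ' AND ', $where );
    }
    return $sql;
}

echo sketchSelect(
    'users',
    [ 'id' ],
    [ 'username' => 'admin', 'password' => "' OR 1='1" ]
), "\n";
```

With the value quoted by the layer, the attack string from the earlier example stays inside the string literal instead of changing the structure of the query.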

Top Vulnerabilities in Web Apps: OWASP Top 10

 * A1: Injection
 * A2: Cross-Site Scripting (XSS)
 * A3: Broken Authentication and Session Management
 * A4: Insecure Direct Object References
 * A5: Cross-Site Request Forgery (CSRF)
 * A6: Security Misconfiguration
 * A7: Insecure Cryptographic Storage
 * A8: Failure to Restrict URL Access
 * A9: Insufficient Transport Layer Protection
 * A10: Unvalidated Redirects and Forwards

Play "Spot the Vulnerability"
Vulnerabilities (including extra credit):
 * RFI in lang parameter
 * SQLi in auth
 * $lang XSS in form
 * errMsg reflected XSS
 * $translations['login'] reflected XSS (with register globals on)
 * Insecure cookie for authentication
 * nextUrl header injection
 * $translations['bad_login'] header injection (with register globals on)
 * No salt for MD5 hash of passwords
 * PHP_SELF reflected XSS

Secure Design Principles
When designing / architecting:
 * Simplicity
 * Secure by Default
 * Secure the Weakest Link
 * Defense-in-depth
 * Least Privilege

Simplicity (Demonstrable Security)
Attack surface area and simplicity go hand in hand. Certain software engineering fads prefer overly complex approaches to what would otherwise be relatively straightforward code. Developers should avoid double negatives and complex architectures when a simpler approach would be faster and easier to review.

In MediaWiki, keep your security-related code as simple and easy to review as possible, even if there is a more "elegant" solution. Clearly document security assumptions.

Secure by Default
There are many ways to deliver an “out of the box” experience for users. However, by default, the experience should be secure, and it should be up to the user to reduce their security, if they are allowed to. For example, password aging and complexity should be enabled by default. Users might be allowed to turn these two features off to simplify their use of the application, at the cost of increased risk.

In the real world, software will not achieve perfect security, so designers should assume that security flaws will be present. To minimize the harm that occurs when attackers target these remaining flaws, the software's default state should promote security. For example, software should run with the least necessary privilege, and services and features that are not widely needed should be disabled by default or accessible only to a small population of users.

In MediaWiki, make the more secure option the default, and allow administrators / users the option to lower the security if they desire. If you're not sure what the most secure option is, ask!

Secure the Weakest Link
Attackers are more likely to attack a weak spot in a software system than to penetrate a heavily fortified component. For example, an attacker will not spend time circumventing the MediaWiki parser functions to inject XSS if they can socially engineer users into giving up their passwords.

Defense-in-Depth
Defense in depth is an information assurance (IA) concept in which multiple layers of security controls (defenses) are placed throughout an information technology (IT) system. Its intent is to provide redundancy in case a security control fails or a vulnerability is exploited. The layers can cover personnel, procedural, technical and physical aspects of security for the duration of the system's life cycle.

The principle of defense-in-depth is that layered security mechanisms increase the security of the system as a whole. If an attack causes one security mechanism to fail, other mechanisms may still provide the necessary security to protect the system. For example, it is not a good idea to rely entirely on a firewall to provide security for an internal-use-only application, as firewalls can usually be circumvented by a determined attacker (even if it requires a physical attack or a social engineering attack of some sort). Other security mechanisms that address different attack vectors (e.g., surveillance cameras and security awareness training) should be added to complement the protection that a firewall affords.

Security controls should:
 * Prevent
 * Detect
 * Contain
 * Recover

Least Privilege
In information security, computer science, and other fields, the principle of least privilege (also known as the principle of minimal privilege or the principle of least authority) requires that in a particular abstraction layer of a computing environment, every module (such as a process, a user or a program, depending on the subject) must be able to access only the information and resources that are necessary for its legitimate purpose.

"Just enough authority to get the job done."

MediaWiki Secure Coding Checklist

 * Get cookies with $wgRequest->getCookie
 * Do not use eval, create_function
 * Regexes
    * Don't use the /e modifier
    * Escape user-controlled strings that get used in a regex with preg_quote
 * Execute external programs with wfShellExec
 * Use MW's HTMLForm class, or include/check $wgUser->editToken to prevent CSRF
 * Use $wgRequest instead of $_GET / $_POST for passed in variables:
    * getVal - String
    * getArray - Array
    * getIntArray - Get array of ints
    * getInt - get int or default
    * getIntOrNull - get int, or null if empty
    * getBool - Boolean
    * getFuzzyBool - Boolean, accepts 'false'
    * getCheck - true if the value is set
    * getText - String, without carriage returns, transliterations applied
 * Defend against register-global variable injections
 * Use Html and Xml helper classes to write out text
 * Use ResourceLoader for CSS and JavaScript
 * Use Sanitizer::checkCss for any CSS from users
 * Use database wrappers to communicate with DB
 * Clearly comment unexpected / odd parts of your code
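
As one example from the checklist, the sketch below shows the preg_quote item in plain PHP. The function name `countMatches` is hypothetical; the point is only that a user-supplied term is quoted, with the pattern delimiter passed as the second argument, before it becomes part of a regex.

```php
<?php
// Hypothetical helper for illustration: count literal, case-insensitive
// occurrences of a user-supplied term in some text.
function countMatches( string $userTerm, string $text ): int {
    // preg_quote escapes regex metacharacters; the second argument also
    // escapes the delimiter used around the pattern.
    $pattern = '/' . preg_quote( $userTerm, '/' ) . '/i';
    return (int)preg_match_all( $pattern, $text );
}

// Without preg_quote, "a.c" would also match "abc".
var_dump( countMatches( 'a.c', 'abc a.c A.C' ) ); // int(2)
```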

Collaborative Development of Code

 * Starting with a skeleton of a SpecialPage
 * Create a Special Page that allows searching, and showing results
 * Assume a database of text data
    * CREATE TABLE `myData` (`id` INT, `name` varchar(80), `body` TEXT);
 * Presents a search box to users of your content.
 * When a search is received, search the database for a match in the `name` or `body` fields, then display to the user the search term, a list of article names (each linking to its article), and the beginning of each body.
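
One possible starting point for the output half of this exercise is sketched below in plain PHP, outside of MediaWiki. The function name `renderSearchResults`, the `/wiki/` link prefix and the 40-byte snippet length are assumptions for illustration, and the database lookup is left out on purpose. In a real SpecialPage you would use the Html helper classes and the database wrappers from the checklist above.

```php
<?php
// Hypothetical renderer for illustration; $rows is assumed to be a list of
// arrays with 'name' and 'body' keys, as in the myData table above.
function renderSearchResults( string $term, array $rows, int $snippetLength = 40 ): string {
    // ENT_SUBSTITUTE keeps output usable if substr() cuts a multi-byte character.
    $flags = ENT_QUOTES | ENT_SUBSTITUTE;
    $html = '<p>Results for: ' . htmlspecialchars( $term, $flags, 'UTF-8' ) . "</p>\n<ul>\n";
    foreach ( $rows as $row ) {
        // URL-encode the name for the path, then HTML-encode the attribute value.
        $href = '/wiki/' . rawurlencode( $row['name'] );
        $snippet = substr( $row['body'], 0, $snippetLength );
        $html .= '<li><a href="' . htmlspecialchars( $href, $flags, 'UTF-8' ) . '">'
            . htmlspecialchars( $row['name'], $flags, 'UTF-8' ) . '</a>: '
            . htmlspecialchars( $snippet, $flags, 'UTF-8' ) . "</li>\n";
    }
    return $html . "</ul>\n";
}

echo renderSearchResults( '<script>', [ [ 'name' => 'A&B', 'body' => '<b>hi</b>' ] ] );
```

Every value that came from the user or the database is escaped at the point of output, for the context it is written into (URL path, attribute, text).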