OAuth (obsolete info)/Issues

This page serves as an overview of issues in both the OAuth 1 and OAuth 2 specs.


 * https://tools.ietf.org/html/rfc5849 - OAuth 1
 * https://tools.ietf.org/html/draft-ietf-oauth-v2-30 - OAuth 2
 * https://tools.ietf.org/html/draft-ietf-oauth-v2-bearer-22 - OAuth 2 Bearer tokens
 * https://tools.ietf.org/html/draft-ietf-oauth-v2-http-mac-01 - OAuth 2 MAC tokens
 * https://tools.ietf.org/html/draft-jones-oauth-jwt-bearer-04 - OAuth 2 JSON Web Tokens Bearer tokens
 * https://tools.ietf.org/html/draft-jones-json-web-token-10 - JSON Web Tokens (JWT)
 * https://tools.ietf.org/wg/jose/ - JOSE; The WG behind JWA, JWS, JWE, JWT, ...
 * http://hueniverse.com/2012/07/oauth-2-0-and-the-road-to-hell/

Please use the talkpage for any questions or comments on the topic.

--Daniel Friesen (Dantman) (talk)

Signatures
Signatures are an important part of the security of these specs. Signatures protect the important client secrets from being extracted by 3rd parties.
 * They are absolutely necessary when the request is being sent over HTTP due to its inherent insecurity.
 * While TLS is supposed to protect the transport, that only protects the secret from middlemen. If a bug in the client, or something related to discovery, causes the client to make the OAuth request to a server on a different domain name, all the owner of that domain name needs to do is use a valid certificate for their own domain name. When signatures are not used, the client will then happily hand over its primary secret credentials to the malicious party.
 * Additionally there are known issues with the use of TLS by client developers.
 * The roots for TLS are not always configured by default and this often gets in the way of client developers.
 * As stupid as it is, many client developers decide to deal with this not by installing the root certificates but instead by setting verify_peer to false and effectively completely disabling all of TLS' security features.
 * In fact, while cURL's default for verify_peer is true, the verify_peer setting for the ssl:// and https:// wrappers in PHP's own network libraries is false (as of this writing; PHP 5.6 later changed that default to true). In other words: insecurity by default.
 * Basically this means that client developers can't even be trusted to use TLS correctly. They will happily write apps that will leak client secrets to middlemen. So signatures become an important method of protecting the secret from 3rd parties.
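
To make the verify_peer point concrete, here is a minimal sketch in Python (used only for illustration; the page itself talks about PHP and cURL). It contrasts a TLS context that keeps verification on with the "just turn it off" workaround described above. The function names are hypothetical.

```python
import ssl


def verified_context():
    """TLS context that checks the certificate chain and the hostname."""
    return ssl.create_default_context()


def unverified_context():
    """Roughly what verify_peer = false amounts to: any certificate is accepted."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # has to be cleared before verify_mode is relaxed
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


safe = verified_context()
unsafe = unverified_context()
print("verified:", safe.verify_mode == ssl.CERT_REQUIRED, safe.check_hostname)
print("unverified:", unsafe.verify_mode == ssl.CERT_NONE, unsafe.check_hostname)
```

Even the verified context only proves that the server owns the name the client asked for. It does nothing if the client was pointed at the wrong name, which is the case signatures are meant to cover.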

OAuth 1
OAuth 1 defines 3 different signatures:
 * PLAINTEXT - This method isn't actually a signature. The client secret is simply sent to the server.
 * HMAC-SHA1 - This signature utilizes a shared secret that both the client and server know. An HMAC signature using the SHA-1 hash algorithm is used to sign the request, keeping the actual secret from being sent over the wire.
 * RSA-SHA1 - This signature utilizes an RSA private key that only the client knows; the server is told the client's matching public key. An RSASSA-PKCS1-v1_5 signature is used to sign the request with the client's private key.
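
The difference between the first two methods can be sketched in a few lines of Python. This is a rough illustration based on RFC 5849, not a complete client: the function names, the secrets, and the base string are made up for the example.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def _encode(value):
    # Percent-encode everything except the unreserved characters.
    return quote(value, safe="-._~")


def signing_key(client_secret, token_secret=""):
    # The key is the two encoded secrets joined by "&".
    return f"{_encode(client_secret)}&{_encode(token_secret)}"


def plaintext_signature(client_secret, token_secret=""):
    # PLAINTEXT: the "signature" is simply the key itself, secrets included.
    return signing_key(client_secret, token_secret)


def hmac_sha1_signature(base_string, client_secret, token_secret=""):
    # HMAC-SHA1: only a keyed digest of the base string goes over the wire.
    key = signing_key(client_secret, token_secret).encode("ascii")
    digest = hmac.new(key, base_string.encode("ascii"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")


base = "GET&http%3A%2F%2Fexample.org%2Fapi&a%3D2"  # hypothetical base string
sig = hmac_sha1_signature(base, "client-secret", "token-secret")
print(plaintext_signature("client-secret", "token-secret"))
print(sig)
```

With PLAINTEXT anyone who sees the request sees the secrets. With HMAC-SHA1 they only see a value that is useless for signing any other request.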

Overall OAuth 1 is better in the signature area than OAuth 2, though it does have one sticking point and a side point.

Issues with the signature base string
The OAuth 1 base string (the text that is signed and verified) includes the following:
 * The HTTP method
 * A base string uri consisting of:
 * http:// or https:// depending on if SSL/TLS is used
 * The hostname in the Host header
 * The port after a : if the port is non-default. Nothing if the port is default (ie: 80 for http, 443 for https)
 * The parameters to the request:
 * The query component of the request uri
 * The parameters inside of the HTTP Authorization header if used
 * The parameters inside the entity body (eg: POST body) if application/x-www-form-urlencoded is used

This data is encoded and put together into an & separated string and used to create the signature.

There are some tricky details to this whole ordeal:
 * The use of a http://example.org:8080/ style base string incurs extra code. You now have to be careful to omit the port when it is the default port for the scheme. This makes it easy to mistakenly include a :80 in the base URL, breaking the signature implementation. Basically it just makes signatures trickier to implement with no benefit.
 * The whole query component bit has a number of issues with it:
 * The spec tells clients and servers to work with the query data after it is decoded, and to treat it as one blob together with the other parameters, rather than just using the raw query string sent in the request. Implementations end up needing to be careful about how they order a long list of query parameters. If the order of the parameters changes even slightly, the signature no longer verifies.
 * URL encoding itself is inherently troublesome. There are multiple ways to encode one chunk of text for use in a URL (ie: it's not a consistent encoding). As a result the OAuth spec actually goes to the trouble of defining the exact URL encoding algorithm that must be used inside of a signature.

While not impossible to implement, OAuth 1 signatures are more complex than necessary. This comes from adding a conditional that doesn't need to exist, lumping together data that could have been serialized as separate items, and picking methods of encoding the data that are historically not consistent enough to trust in signing. An ideal spec could have found a way to build the signature base string that was much less prone to mistakes without affecting the security of the signature.

This is an example OAuth 1 signature base string:
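
As a minimal sketch (the request, parameter names, and values here are hypothetical, and this is not a complete implementation of the spec), the following Python builds a base string along the lines of RFC 5849. Note that the base string URI also carries the request path. The default-port conditional, the re-encoding, and the sorting step are exactly the places where the mistakes described above creep in.

```python
from urllib.parse import parse_qsl, quote, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def pct(value):
    # The one exact encoding the spec insists on: only unreserved characters
    # stay as they are.
    return quote(value, safe="-._~")


def base_string_uri(url):
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = parts.hostname  # urlsplit already lowercases the hostname
    # The conditional: the port is included only when it is not the default.
    if parts.port is not None and parts.port != DEFAULT_PORTS.get(scheme):
        host = f"{host}:{parts.port}"
    return f"{scheme}://{host}{parts.path or '/'}"


def signature_base_string(method, url, extra_params=()):
    # Query parameters are decoded, merged with the other parameters,
    # re-encoded, and sorted before being joined again.
    params = parse_qsl(urlsplit(url).query, keep_blank_values=True)
    params += list(extra_params)
    encoded = sorted((pct(k), pct(v)) for k, v in params)
    normalized = "&".join(f"{k}={v}" for k, v in encoded)
    return "&".join([method.upper(), pct(base_string_uri(url)), pct(normalized)])


example = signature_base_string(
    "get", "http://Example.org:80/api?b=1&a=2", [("oauth_nonce", "abc")]
)
print(example)
```

Under these assumptions the request above yields GET&http%3A%2F%2Fexample.org%2Fapi&a%3D2%26b%3D1%26oauth_nonce%3Dabc: the :80 is gone, and the parameters have been reordered and encoded a second time.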

RSA-SHA1
OAuth 1 includes the ability to use a client private key for the client credentials. Unfortunately using it isn't as ideal as one would wish.

OAuth 1 has these two things to say about the use of private keys in OAuth:

4.1. RSA-SHA1 Signature Method

Authenticated requests made with "RSA-SHA1" signatures do not use the token shared-secret, or any provisioned client shared-secret. This means the request relies completely on the secrecy of the private key used by the client to sign requests.

[...]

4.11. SHA-1 Cryptographic Attacks

SHA-1, the hash algorithm used in "HMAC-SHA1" and "RSA-SHA1" signature methods, has been shown to have a number of cryptographic weaknesses that significantly reduce its resistance to collision attacks. While these weaknesses do not seem to affect the use of SHA-1 with the Hash-based Message Authentication Code (HMAC) and should not affect the "HMAC-SHA1" signature method, it may affect the use of the "RSA-SHA1" signature method. NIST has announced that it will phase out use of SHA-1 in digital signatures by 2010 [NIST_SHA-1Comments].

Practically speaking, these weaknesses are difficult to exploit, and by themselves do not pose a significant risk to users of this protocol. They may, however, make more efficient attacks possible, and servers should take this into account when considering whether SHA-1 provides an adequate level of security for their applications.

As a result the use of private keys in OAuth 1 is less than ideal. I have a feeling that it's possible to write a standard that can make secure and effective use of private keys, likely by using some algorithm that succeeds RSA-SHA1, and perhaps even by combining an RSA key with a shared HMAC secret key (ie: something like encrypting a temporary shared secret using the key, then signing with HMAC-SHA1). But that won't happen anywhere under either version of OAuth.

OAuth 2
OAuth 2 takes signatures in a whole different direction. Well, actually it doesn't take them anywhere. Instead of talking about signatures, OAuth 2 just talks about 'tokens' and says that some types of access tokens can use signatures.

Some key points here:
 * Client credentials don't use signatures anywhere. OAuth 2 says to use the insecure HTTP Basic auth and relies entirely on broken TLS for the security of client<->server communication. It says that you 'can' use other HTTP authentication systems, however you must store a mapping of which clients use which authentication system.
 * Access tokens are actually the only part of the spec that supports signatures. This means that the refresh token, which is all-important to keep secure because it has more rights and persistence than the access token, is sent as a plain, unsigned value and relies on broken TLS. This also means that it is impossible to use OAuth 2 without the authorization server using TLS. The only place you can use http (and I'll point out why that's not even as secure as OAuth 1 later) is on the resource server using MAC based access tokens. However in reality the authorization server and the resource server are typically the same server. That means that a wiki actually has to have https support to safely use OAuth. (Something that is not as cheap and easy as people would like to say it is, nor something to expect of the thousands of small wikis which have as much right to an auth scheme as Wikipedia.)
 * OAuth 2 actually doesn't have any access token implementation inside of it. It defers that to two different specs oauth-v2-bearer and oauth-v2-http-mac.
 * OAuth 2 doesn't use client credentials in any request using an access token. As a result if an access token is leaked it can be used by a third party without needing to also get access to the client credentials.
 * OAuth 2 does not include any public/private key based signature. Unlike OAuth 1 you have no option besides shared secrets.
 * While not referred to from the OAuth 2 spec there is another method of signing things proposed. JWT, JSON Web Tokens. But it has issues too:
 * While JWT is based on the JOSE WG's standards, which include JWE (JSON Web Encryption), JWT tokens for OAuth 2 still don't use private keys.
 * Ultimately you'd best not even consider JWT. JWT is non-trivial and completely excessive. It's designed entirely for the enterprise space where they can sell high priced consulting services to people. It's not something designed to make OAuth 2 something securely usable for everyone.

Bearer tokens have no security. They are basically the same as PLAINTEXT in OAuth 1.

OAuth 2 MAC access tokens
Signatures in OAuth 2 access tokens are defined by oauth-v2-http-mac. These signatures have some nice points as well as some faults.

Firstly the format of the MAC signature base string consists of:
 * The timestamp of the request (included in the MAC Authorization header)
 * The nonce
 * The HTTP Request method
 * The raw request-uri (The part after {GET,HEAD,...} and before HTTP/1.1, ie: just /foo, with no http://example.com:8080 prefix)
 * The hostname inside of the Host header
 * The port used in the request (unconditionally, unlike in OAuth 1 where you have to omit a :80 over http)
 * The value of the ext="" field in the Authorization header if used ("" otherwise)

These components are combined together into a simple string, each one followed by a newline. For example (\n shows where the newlines are):

1336363200\ndj83hs9s\nGET\n/resource/1?b=1&a=2\nexample.com\n80\n\n
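
The layout described above is simple enough to sketch in a few lines of Python. This is an illustration only: the function names are hypothetical, and the choice of HMAC-SHA-256 and the key value are assumptions made for the example.

```python
import base64
import hashlib
import hmac


def mac_base_string(timestamp, nonce, method, request_uri, host, port, ext=""):
    fields = [str(timestamp), nonce, method.upper(), request_uri,
              host.lower(), str(port), ext]
    # Every field, including an empty ext, is followed by a newline.
    return "".join(field + "\n" for field in fields)


def mac_signature(mac_key, base_string):
    # Assumed algorithm for this sketch: HMAC-SHA-256 over the base string.
    digest = hmac.new(mac_key.encode("utf-8"), base_string.encode("utf-8"),
                      hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


base = mac_base_string(1336363200, "dj83hs9s", "GET",
                       "/resource/1?b=1&a=2", "example.com", 80)
print(repr(base))
print(mac_signature("hypothetical-mac-key", base))
```

There is no conditional on the port, no sorting, and no second round of URL encoding: the raw request-uri goes in exactly as it was sent.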

As a result, despite all the issues I listed before with signatures in OAuth 2, the actual format of the signature base string in OAuth 2 is much cleaner. This format is more consistent and easier to implement without making a critical mistake.

Unfortunately... the actual tokens themselves are inferior to the signatures used inside of OAuth 1.

In OAuth 1 the entity-body of a request was included inside of the signature if it was application/x-www-form-urlencoded data. However OAuth 2 signatures do not include the entity-body anywhere in the signature. As a result OAuth 2 signs the URL of a request, but it's possible for a middleman to freely modify the POST body of a request without breaking the signature. So if you include API query parameters entirely inside the POST body instead of putting most of them in the query, a middleman will be free to make an entirely different API request.

Temporary credentials and flows
OAuth 1 uses the following pattern for grants:
 * Client makes a request to the server's "Temporary Credential Request" endpoint with a callback URI signed by the client credentials.
 * Some temporary credentials are returned from the server.
 * Client sends user agent to the server's "Resource Owner Authorization" endpoint with the public token given in response to the temporary credentials request.
 * User logs in and authorizes the client.
 * Server authorizes the temporary credentials.
 * Server redirects user back to the client's endpoint with the same token and a verifier attached.
 * Client makes a request to the server's "Token Request" endpoint with the public token signed by the client credentials and the secret temporary credentials.
 * Server returns permanent credentials to the client.
 * Client now makes resource/API requests to the server as the user by signing each request with the client credentials and the permanent credentials returned.

OAuth 2 uses the following pattern for the closest type of grants:
 * Client sends user agent to the server's "Authorization Endpoint" with response_type=code, the client id, a state parameter, and the callback URI.
 * User logs in and authorizes the client.
 * Server redirects user back to the client's endpoint with an authorization code and the same state parameter.
 * Client makes a request to the server's "Token Endpoint" with grant_type=authorization_code, the authorization code, and the client credentials.
 * Server returns an access token, expiry time, and refresh token.
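
The client side of the OAuth 2 steps above can be sketched as follows. This is a rough illustration, not a full client: the endpoint URL, client id, and redirect URI are hypothetical, and the Basic header helper assumes plain ASCII credentials.

```python
import base64
from urllib.parse import parse_qs, urlencode, urlsplit

AUTHORIZE_URL = "https://wiki.example.org/oauth/authorize"  # hypothetical endpoint


def authorization_url(client_id, redirect_uri, state):
    # Step 1: where the user agent is sent.
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "state": state,
    })
    return f"{AUTHORIZE_URL}?{query}"


def check_callback(callback_url, expected_state):
    # Step 3: the server redirects back with the code and the same state.
    params = parse_qs(urlsplit(callback_url).query)
    if params.get("state", [""])[0] != expected_state:
        raise ValueError("state mismatch")
    return params["code"][0]


def token_request_body(code, redirect_uri):
    # Step 4: form body sent to the token endpoint.
    return urlencode({
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
    })


def basic_auth_header(client_id, client_secret):
    # The client credentials travel as HTTP Basic auth: encoded, not signed.
    raw = f"{client_id}:{client_secret}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


url = authorization_url("abc", "https://client.example/cb", "xyz")
code = check_callback("https://client.example/cb?code=C1&state=xyz", "xyz")
print(url)
print(token_request_body(code, "https://client.example/cb"))
print(basic_auth_header("id", "secret"))
```

Nothing in this exchange is signed. The Basic header is only base64, so anyone who can read the token request can read the client secret.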

The issue with temporary credentials
One problem with the OAuth 1 pattern was the "Temporary Credentials". The server first grants and stores some credentials. The user authorizes them. And then the client asks for permanent credentials. However a good portion of the time the user might decide they don't want to authorize the client and just close the window. As a result any OAuth 1 implementation is stuck storing temporary credentials for every single time a client says it wants to let a user authorize it. And it is stuck holding on to this data for an indefinite amount of time, because if it lets the data go and the user then tries to authorize the client, they will get an error.

In the OAuth 2 flow, instead of asking for temporary credentials first, the client sends the user to the server. The user authorizes the client, and only then does the server create some temporary credentials (the authorization code) and send them back along with the user to the client's site. The authorization code is protected from 3rd parties by requiring a combination of both the authorization code and the client credentials to get an access token back. Though this does mean you need multiple authorization flows for non-browser authorizations to work.
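
A server-side sketch of that idea, in Python, might look like the following. It is a toy in-memory model under stated assumptions: the class name, the 10 minute lifetime, and the single-use rule are choices made for this example, not something the page or the spec text quoted here prescribes.

```python
import hmac
import secrets
import time

CODE_TTL_SECONDS = 600  # assumed lifetime for the sketch


class AuthorizationCodeStore:
    def __init__(self, client_secrets):
        self._client_secrets = dict(client_secrets)
        self._codes = {}  # code -> (client_id, user, expires_at)

    def issue(self, client_id, user, now=None):
        """Called only after the user has approved the client."""
        now = time.time() if now is None else now
        code = secrets.token_urlsafe(16)
        self._codes[code] = (client_id, user, now + CODE_TTL_SECONDS)
        return code

    def redeem(self, code, client_id, client_secret, now=None):
        """Return the user the code was issued for, or None if anything is off."""
        now = time.time() if now is None else now
        entry = self._codes.pop(code, None)  # a code can be tried only once
        if entry is None:
            return None
        owner, user, expires_at = entry
        expected = self._client_secrets.get(client_id, "")
        valid = (
            owner == client_id
            and now <= expires_at
            and expected != ""
            and hmac.compare_digest(expected.encode(), client_secret.encode())
        )
        return user if valid else None


store = AuthorizationCodeStore({"wiki-client": "s3cret"})
good = store.issue("wiki-client", "alice", now=1000)
print(store.redeem(good, "wiki-client", "s3cret", now=1001))
```

Nothing is stored until a user has actually approved a client, and whatever is stored expires quickly, so abandoned authorization windows cost the server nothing.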

Multiple flows
The final version of OAuth 1 included a single authorization flow for applications (multiple flows were merged into one). While this one flow was designed to work in multiple environments, in practical use it was discovered that using this one authorization flow for multiple profiles of users led to a substandard user experience on these other profiles.

OAuth 2 includes multiple authorization flows to fix this. However, while it defines flows for semi-offline, client-side JavaScript only, and potentially browser built-in profiles, and includes various other flows, it still does not provide an explicit description of how authorization of things such as mobile devices should work without the use of passwords.

An ideal spec can probably go to the effort to think of profiles at the user level and then come up with what authorization flows fit each one, rather than just listing authorization flows with no reference to all the cases they may be used in.

Protocol
OAuth 2 actually doesn't define a protocol. OAuth 1 was developed as a "protocol", while OAuth 2 is a "framework".

This basically means OAuth 1 and 2 are completely different in fundamental goals. OAuth 1 was going to be a standardized protocol, meaning you could take nearly any properly implemented OAuth client and any properly implemented OAuth server and connect them together. OAuth 1 could actually be used in combination with discovery to use random websites for user auth.

OAuth 2 on the other hand calls itself a "framework" and lists a disclaimer "this specification is likely to produce a wide range of non-interoperable implementations."

Basically OAuth 2 is in essence a how-to guide on writing a proprietary auth setup for your own proprietary service. Implementation of the specification gives you no actual advantages. It doesn't give you security advantages because you could do just as well in your own proprietary home brewed system. And it doesn't give you any interoperability, so you still end up writing your own server and client implementation, because OAuth 2 isn't a protocol you can implement, it's a guide on how to write something yourself. And when it comes down to it there is absolutely no advantage to even sticking to the spec.

The spec says "This framework was designed with the clear expectation that future work will define prescriptive profiles and extensions necessary to achieve full web-scale interoperability." But that's basically just some hand-waving saying "Not our problem. Someone in the future will magically turn this into something interoperable across the whole web." which has no value because someone could just as easily write a brand new standard that serves that purpose and doesn't have the flaws.

Discovery
Discovery is supposed to allow OAuth to work in situations besides the one where you have one proprietary service like Facebook offering access to their API. Namely it allows a client which uses an API which is the same on many different websites — like the MediaWiki API — to work on all those websites. This is done with a standard for discovering the location of OAuth endpoints and also either a static client_id for generic use or a protocol for dynamically registering a client.

Discovery isn't very useful to proprietary services. And all the web folks have left OAuth 2 and it's basically being driven by committee and enterprise. The only discovery related to OAuth is dead or enterprise focused at this point.


 * OAuth 1's attempt at discovery was defined by OAuth Discovery 1.0 Draft 2 which was marked as obsolete and never finished.
 * OAuth Extension for Specifying User Language Preference gave OAuth 1 language preferences.
 * OAuth Dynamic Client Registration Protocol is a draft spec for dynamic client registration in OAuth 2.

Interoperability
Interoperability between server implementations is important both for discovery (for the goal of general login authentication) and for being able to write client libraries. Unless everyone follows a standard protocol and sticks to it, you can't write a client that will work everywhere. OAuth 2 is already a failure in this area since it is not a protocol. But having the first adopters not even stick to the framework just makes things worse.

This doesn't directly matter to us, since these proprietary services don't have an open API we need to be interoperable with. I'll include it here anyway since it's a good reminder that using OAuth 2 doesn't give us any advantages: you can see no-one actually cares about OAuth 2 even when they say they do, and the first adopters are already getting in the way of having proper robust client libraries.

Looking over the Facebook, Google, and Meetup implementations of OAuth 2:
 * Facebook's JSON error responses violate the spec by including an error object instead of an error code and separate error description. They don't include the actual OAuth error type anywhere, so they're useless for an OAuth 2 client.
 * Google introduces non-standard access_type={online|offline} and approval_prompt={force|auto} parameters into its authorization endpoint usage.
 * Google introduces a non-standard TokenInfo endpoint to the "Implicit" flow.
 * Meetup uses non-standard X-OAuth-Scopes and X-Accepted-OAuth-Scopes headers to indicate what scopes were authorized.
 * Meetup appears to ignore the spec and mangle the &scope= value you pass through in some cases.
 * token_type is a mess:
 * Facebook completely ignores the token portion of the standard and gives out a form urlencoded string containing access_token and expires parameters (even ignoring the fact that the spec uses expires_in instead of expires) instead of using the JSON output the standard specifies. Naturally there is no token_type anywhere.
 * Meetup violates the spec by not including any token_type parameter inside of the grant response.
 * Google, well, they do follow the spec. They return a token_type of "Bearer". The spec says that token_type is case-insensitive, so it's correct, even though the spec says it's "bearer" not "Bearer". "Bearer" is used in one of the oauth-v2-bearer examples, so it looks like someone at Google copied and pasted from an example instead of strictly reading the spec.
 * The use of access tokens is inconsistent:
 * Google supports the Authorization header as well as access_token inside the query parameters. Though it does not specify if it supports it inside of the post body.
 * Facebook and Meetup exclusively use the access_token query parameter and don't appear to support the Authorization header.
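
To show what these differences cost a client library, here is a rough Python sketch of a token-response parser that tries to cope with the behaviours listed above. The function name is hypothetical, the provider behaviour is taken from the list rather than re-verified, and falling back to "bearer" when token_type is missing is a guess a client is forced to make, not something the spec allows.

```python
import json
from urllib.parse import parse_qs


def parse_token_response(body, content_type=""):
    # The spec says JSON; some providers answer with a form-encoded string.
    if "json" in content_type.lower() or body.lstrip().startswith("{"):
        data = json.loads(body)
    else:
        data = {key: values[0] for key, values in parse_qs(body).items()}
    if "access_token" not in data:
        raise ValueError("no access_token in response")
    # The spec says expires_in; accept a bare "expires" as well.
    expires = data.get("expires_in", data.get("expires"))
    return {
        "access_token": data["access_token"],
        # token_type is case-insensitive and sometimes simply absent.
        "token_type": str(data.get("token_type", "bearer")).lower(),
        "expires_in": int(expires) if expires is not None else None,
    }


print(parse_token_response(
    '{"access_token": "a", "token_type": "Bearer", "expires_in": 3600}',
    "application/json"))
print(parse_token_response("access_token=b&expires=5000", "text/plain"))
```

Every branch in this function exists only because one implementation strayed from the spec, and a generic client library still cannot know which header or parameter to use when it actually sends the token.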

With all these inconsistencies behind us, the way these three use client_secret is, on the other hand, quite consistent: they all violate the spec. OAuth 2 defines a way of sending the client credentials in an Authorization: header with Basic HTTP auth, and a way of including them as a client_secret parameter inside of a POST body, which it strongly recommends against. Facebook, Google, and Meetup all ignore this and exclusively make use of client_secret as a URI query parameter. (This was never valid in any draft of OAuth 2. Not even old drafts these providers are supposed to have implemented.)

Discovery for us
OAuth 1 discovery is obsolete. OAuth 2 has numerous issues. And OAuth 2 discovery while it contains some interesting ideas, does not fit our needs.

Some notes on things we'd need out of discovery:
 * The fact that software like MediaWiki can be installed below the root and with no arbitrary url handling needs to be considered. So discovery cannot depend purely on /.well-known/host-meta patterns.
 * We need to consider the fact that many wikis and potentially clients won't have https. We need to consider some way to securely get client credentials for dynamic client registration to the client without leakage. (The whole Convergence idea comes into play here)
 * We need to consider the possibility of clients hosting their client information in a 3rd party service (eg: A large WMF hosted registry of MW client information for extraction) in case they don't have https themselves.

Some other thoughts:
 * The client information should probably also define some other information such as whether a client is confidential or public. Perhaps even things that it supports.

Summary
Based on some of the topics above and some other thoughts an ideal auth spec would have the following points:
 * The protocol will use the basic pattern of the "Authorization Code Grant" auth flow to avoid the issues with temporary credential storage.
 * The use of signatures would be mandated throughout the spec.
 * Signatures would use OAuth 2 style signature base string encoding to avoid implementation bugs but find some way to deal with the flaw of the entity-body not being signed:
 * Existing server and client libraries would be examined. Situations where the script won't have direct access to the data will be noted.
 * If the raw entity-body is used some sort of hash of the entity-body will be used inside of the signature base string instead of the raw body. This will prevent the unwanted restriction of forcing the server to buffer the whole body.
 * APIs that make use of XML, JSON, etc... in the POST body instead of form encoded values will be taken into account to avoid restricting these protocols to insecurity.
 * Client credentials will be used in all client requests. ie: Signatures using access tokens but not client credentials will not be allowed.
 * From the start only signatures will be supported. PLAINTEXT/Bearer style credentials/tokens will ONLY be provided if a valid case for their inclusion can be presented.
 * Given the noted insecurity in depending purely on TLS the argument that TLS is enough will not be accepted.
 * Arguments that PLAINTEXT/Bearer style credentials/tokens are easier to use in places like command line cURL will not be accepted.
 * An effort will be made to develop some standard libraries / SDKs so that the auth system can reasonably be used inside apps and hopefully servers too without a heavy implementation burden.
 * A test system should probably be provided. Base it on say the first python implementation of the auth system and provide some command line scripts to test that a server implementation conforms to various components in the specification.
 * To ease development and testing in places like a command line cURL environment, a "Test" signature profile that works more like Bearer tokens (or omits a portion of signing) may be provided in the spec. This will only be permitted for use in non-production phases of development. A recommendation will be made that servers either provide a development toggle switch for client registrations that places limits on the client and enables the "Test" signature profile, or give clients separate development and production client credentials and impose the same limits on development clients. Or perhaps even (especially in the case of big things such as financial environments) provide separate "Test" and "Production" environments with different credentials for each.
 * The current state of public/private key signing will be examined and an attempt will be made to find some way to incorporate key signing into the signature portion of the spec without encountering the negatives in OAuth 1.
 * The specification will be written with support only for credentials inside of an HTTP Authorization: header and will not include any support for credentials in the query or POST body. This is due both to the implementation burden that such support would impose on library development with no benefit, getting in the way of security, and to the fact that mandating things about query parameters will reduce the value of the specification as a way to provide a non-interfering auth layer for resources. (If it can be shown that there is a significant portion of potential implementations where there are issues with an Authorization header, a method of using something like an {AuthName}-Authorization header may be considered and submitted to the IETF as specified by RFC 3864.)
 * Various physical user flows will be examined and the best auth flow for them will be picked out. When auth flows are included into the specification it will retain that information and give details on the use of these flows in the different physical flows that clients may implement them in.
 * The specification will examine and mandate what flows are required of all implementations and what ones are optional. For instance the Authorization Code grant style flow would be a requirement as this flow will likely be used for general website authentication. While if an OAuth 2 style "Client Credentials Grant" style flow is provided in the spec it may be noted as an optional component since some auth systems may not have any use for auth without a linked user.
 * The specification of a token revocation endpoint will be baked into the spec instead of written as an optional side spec.
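
The body-hash idea from the signature points in the list above can be sketched as follows. This is only an illustration under assumptions: the field layout, the use of SHA-256, and the function names are hypothetical, and the timestamp and nonce are left out for brevity.

```python
import base64
import hashlib
import hmac


def body_hash(chunks):
    # Hash the entity-body incrementally so it never has to be buffered whole.
    hasher = hashlib.sha256()
    for chunk in chunks:
        hasher.update(chunk)
    return base64.b64encode(hasher.digest()).decode("ascii")


def sign_request(key, method, request_uri, host, port, chunks):
    # Newline-terminated fields, with the body hash as the final field.
    fields = [method.upper(), request_uri, host.lower(), str(port),
              body_hash(chunks)]
    base = "".join(field + "\n" for field in fields)
    mac = hmac.new(key, base.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(mac).decode("ascii")


key = b"hypothetical-shared-secret"
original = sign_request(key, "POST", "/api.php", "wiki.example.org", 80,
                        [b"action=edit&", b"title=Sandbox"])
tampered = sign_request(key, "POST", "/api.php", "wiki.example.org", 80,
                        [b"action=delete&title=Main_Page"])
print(original != tampered)
```

Because only the hash is signed, this works the same for form encoded, JSON, or XML bodies, and a middleman who changes the body invalidates the signature.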

Some various thoughts:
 * An "Implicit Grant" style flow will likely be suggested. However when this is done the spec should consider using the #! pattern instead of just appending it to #.
 * The possibility of using a Convergence style scheme for secure discovery and establishing a secure channel over http using the involvement of 3rd parties is an idea to consider. (Writing this as a standard that can apply to other websites and engines will potentially greatly help in getting people besides just Wikimedia to set up notaries)
 * When dealing with adding the use of private/public keys as signing methods consider the option of providing some way of using the client private key to securely obtain a temporary shared secret to use in HMAC based signing.