Talk:Requests for comment/Service-oriented architecture authentication

Similarity to AuthStack
How different is this from AuthStack? This RfC seems like just a generalized version of the latter. Can they just be merged? Parent5446 (talk) 18:14, 9 June 2014 (UTC)


 * This RFC focuses on authentication in a SOA world, and formulates some architectural goals. One of those goals is a separation of concerns and isolation. Most code should not have access to sensitive user information, so that security issues in random features don't lead to an exposure of sensitive information. Another goal is to push authentication to the lowest layers (storage service) wherever possible to avoid the risk of a confused deputy & address the issues of different services collaborating to provide specific functionality.


 * The solution presented in the AuthStack RFC does not seem to address several of these goals. This leads me to believe that the goals of the two RFCs are actually different. -- Gabriel Wicke (GWicke) (talk) 18:32, 9 June 2014 (UTC)


 * I think "Authentication" is a bad name for this RFC, but can't think of a better one. Maybe "Inter-service user identification, authorization, and session management"?
 * While AuthStack deals (primarily) with Authentication in MediaWiki, this RFC is about MediaWiki acting as an Identity Provider for other services, and how to efficiently make those assertions. If we make a MediaWiki Authentication Service, then it would need to account for all of the stuff discussed in AuthStack, as well as how MediaWiki core would consume those. I don't think that discussion should happen until we have the inter-service session management pieces working in production. CSteipp (talk) 22:39, 12 June 2014 (UTC)

Tokens
gwicke and I had talked about using JWTs for identification. I did a quick test to see the size: encoding basic information about the user, issuer, validity timestamps, and the list of user rights that user has generates a JWT that's about 4k. The RS256 signature was about 600 bytes larger than HS256, using a 4096-bit RSA key. CSteipp (talk) 22:40, 12 June 2014 (UTC)


 * I'd expect the size to be mostly determined by the key size. Do we need 4096 bits, especially with key rotation? We might also be able to gzip + base64 encode the value for the benefit of plain HTTP users, although this should not do much for the signature. -- Gabriel Wicke (GWicke) (talk) 22:41, 12 June 2014 (UTC)


 * 2048 would probably be ok if we actually do key rotation. Key management is hard, but if someone is willing to stay on top of it, we can assume that. So that takes the signature down to about 300 bytes. Also, we would want to use RS512 to get equivalent security to HS256, so using my test user gives:
 * Uncompressed: RS512 JWT = 4054 B, HS256 JWT = 3755 B
 * Compressed: RS512 JWT = 2482 B, HS256 JWT = 2183 B
 * So ~2.5k overhead on every request. CSteipp (talk) 23:35, 12 June 2014 (UTC)
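As a rough cross-check of the signature sizes quoted above, here is a minimal arithmetic sketch. It assumes the signature is carried as unpadded base64url (as in JWS), that an RSA signature is as long as the key modulus regardless of the hash used, and that an HS256 MAC is 32 bytes. The helper name `b64url_len` is made up for illustration.

```python
import base64


def b64url_len(n_bytes: int) -> int:
    """Length in characters of n_bytes encoded as unpadded base64url."""
    return len(base64.urlsafe_b64encode(b"\x00" * n_bytes).rstrip(b"="))


# Raw signature sizes in bytes.
HS256_SIG = 256 // 8      # HMAC-SHA-256 output
RSA_2048_SIG = 2048 // 8  # RSA signature length equals the modulus length
RSA_4096_SIG = 4096 // 8

hs256_chars = b64url_len(HS256_SIG)
rsa2048_chars = b64url_len(RSA_2048_SIG)
rsa4096_chars = b64url_len(RSA_4096_SIG)

# Extra token length relative to HS256 for each RSA key size.
extra_2048 = rsa2048_chars - hs256_chars
extra_4096 = rsa4096_chars - hs256_chars
print(extra_2048, extra_4096)
```

Under these assumptions the 2048-bit difference comes out at 299 characters, which lines up with 4054 B versus 3755 B in the uncompressed numbers above.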


 * My understanding is that RS256 is recommended (SHA-2, signed with a 2048-bit RSA key). Why do you feel that RS512 is necessary for 2048-bit RSA?
 * My understanding is also that HS256 is just a SHA-2 over the message & a shared secret: "The HMAC SHA-256 MAC is generated per RFC 2104, using SHA-256 as the hash algorithm "H", using the octets of the ASCII [USASCII] representation of the JWS Signing Input as the "text" value, and using the shared key."
 * Based on the size, I'm guessing that HS256 in your data is RS256?
 * So assuming we go with the recommended RS256, it looks like we'd end up with 2183 bytes. This compares with 441 bytes worth of cookies in production, although some of those will still be needed.
 * Without SPDY / HTTP2 this would not be impossible, but also not ideal. By the current stats at http://caniuse.com/spdy this would affect about 30% of all HTTPS traffic, and all of HTTP traffic. SPDY support will further improve soon with Apple just announcing support and IE gaining it fairly recently.
 * We might still want to wait with using full tokens until SPDY support is more common, and we actually support it as well. Until then we can start using this for API requests. We could also consider storing the tokens in memcached based on the session id, and retrieving those for API requests with a session cookie only. -- Gabriel Wicke (GWicke) (talk) 04:20, 13 June 2014 (UTC)
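For reference, the HS256 construction quoted above (an HMAC per RFC 2104 with SHA-256 over the JWS Signing Input) can be sketched with the Python standard library alone. This is an illustrative sketch, not a proposal for the actual implementation: the function names and the example claims are hypothetical, and it does not check any validity timestamps.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Unpadded base64url, as used for JWS segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, key: bytes) -> str:
    """Build header.payload.signature with an HMAC-SHA-256 signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    mac = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url(mac)


def verify_hs256(token: str, key: bytes) -> bool:
    """Recompute the MAC over the signing input and compare in constant time."""
    if token.count(".") != 2:
        return False
    signing_input, signature = token.rsplit(".", 1)
    mac = hmac.new(key, signing_input.encode("utf-8"), hashlib.sha256).digest()
    return hmac.compare_digest(b64url(mac).encode("ascii"), signature.encode("utf-8"))


# Hypothetical example claims and a 256-bit shared key.
example_key = b"k" * 32
example_token = sign_hs256({"sub": 42, "iat": 1402600000}, example_key)
print(len(example_token), verify_hs256(example_token, example_key))
```

Because verification needs the same shared key as signing, every service that checks an HS256 token could also mint one, which is the main practical difference from the RSA variants discussed here.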


 * Since the secret key mixed into the hash is unknown, the attacker essentially has to brute force the key that we use, which happens to be 256 bits when we use HS256 for OAuth right now. To spoof a signature, they "just" need to find a collision in the hash. SHA-256 takes a lot of work to find collisions in (I think it's still well over 128 bits of work, which is virtually impossible), but it's less than 256 bits, so a larger hash ensures that the hash is not the weakest part of the signature. The 2048-bit key is approximately equivalent to 112 bits of brute forcing, so that becomes the weaker link. Again, not that any of those attacks are feasible right now, but in 5 years, it's anyone's guess. And no, I did mean HS256 in my test, not RS256.


 * Hmm, but isn't 2048-bit RSA then the weakest link even with SHA-2?
 * I'm surprised that the overall size is that large even with just a SHA-2 signature. It sounds like the JSON itself is fairly large. Could you paste the JSON somewhere? I could try to see if I can represent the user data a bit more compactly. -- Gabriel Wicke (GWicke) (talk) 05:24, 13 June 2014 (UTC)


 * Correct, the JSON is very large. Like I said, the signature is 300-600 bytes of the 4k. The biggest section is the array of user rights. Much smaller, but second largest, is the array of groups the user is a member of. Since groups have different rights per wiki, I think we want both, so that a service can know it will grant certain abilities to Stewards, or to users with the revisionsuppress right.


 * I was actually thinking about only encoding membership in the 'user' group in the JSON. That's sufficient for the bulk of all requests & actions, and can be represented in a single boolean (if it isn't already implicit in having a token). We could encode more group memberships in a bitmap, but at that point it's IMO fine to call back into the auth service to check whether the user has this rare right or that. As a side effect, this also lets us revoke more sensitive group memberships more quickly than the token validity period.


 * Regarding variance of rights associated with groups across wikis: In the longer term this can be stored per bucket in the storage service. In the shorter term, the storage service can fetch the right info per group from the auth service. -- Gabriel Wicke (GWicke) (talk) 20:51, 13 June 2014 (UTC)
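To make the bitmap idea concrete, here is a small sketch of packing group memberships into a single integer. The group ordering in `GROUP_BITS` is an assumption made up for this example; a real token format would need an agreed and versioned ordering. A group that is not in the bitmap is reported as `None`, which stands for "ask the auth service".

```python
# Hypothetical fixed bit assignment; position in the tuple is the bit index.
GROUP_BITS = ("user", "autoconfirmed", "bot", "sysop", "bureaucrat", "steward")


def encode_groups(groups) -> int:
    """Pack the known group memberships into an integer bitmap."""
    bitmap = 0
    for index, name in enumerate(GROUP_BITS):
        if name in groups:
            bitmap |= 1 << index
    return bitmap


def has_group(bitmap: int, name: str):
    """True/False for groups covered by the bitmap, None for anything else."""
    if name not in GROUP_BITS:
        return None  # not encoded in the token: call back into the auth service
    return bool((bitmap >> GROUP_BITS.index(name)) & 1)


example_bitmap = encode_groups({"user", "sysop"})
print(example_bitmap, has_group(example_bitmap, "sysop"), has_group(example_bitmap, "oversight"))
```

Six groups fit in a single small JSON number, compared with an array of group name strings.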


 * I think it will give us much more flexibility long-term to have the rights explicitly in whatever we pass to the service. SAML and OpenID both behave this way. Kerberos, in its basic form, doesn't, although Microsoft extended Kerberos to have the central authority add an assertion about the user's group memberships into the protocol. So if history is an indication, I think we're going to want to have an assertion of the user's permissions in the token. And the basic unit of that is a user right.
 * If we want each service to first register which rights it's interested in, and the token contains 0/1 values for a specific array of user rights, we could do that. That obviously means that if the permissions change, you have to wait for current authorizations to time out before the service can check for those authorizations, but it will make the token much smaller. CSteipp (talk) 20:36, 16 June 2014 (UTC)


 * I'd like to cover the vast majority of requests (95+%) with the minimal overhead possible. This means that cookies / tokens should be fairly small, and that there should be no extra per-request network calls in services while processing such common requests.


 * To me it also seems that calling back into the auth service for rare and sensitive actions would provide us with *more*, not less, flexibility in how quickly we'd like to revoke such sensitive rights. Could you describe a case where the reverse would be true? -- Gabriel Wicke (GWicke) (talk) 20:46, 16 June 2014 (UTC)


 * As we discussed, at minimum, the service needs to be able to say if a user has a right in the context of a given title, so that services can do the equivalent of $title->userCan, and so that checks by extensions implementing the userCan hook are consulted. Because of that, this service would potentially be impacted by Requests_for_comment/AuthStack, so we need to make sure the implementation of that RFC takes this use case into account. CSteipp (talk) 21:19, 17 June 2014 (UTC)


 * Arbitrary access right schemes will always require arbitrary code, which means that they'll involve a callback into the auth service. A goal of this RFC is to still support such requirements in the auth service while speeding up the typical case where users can read all articles in a wiki. -- Gabriel Wicke (GWicke) (talk) 21:46, 17 June 2014 (UTC)


 * Also, the services should authenticate to the authorization service. I'd prefer mutually authenticated TLS. CSteipp (talk) 21:19, 17 June 2014 (UTC)


 * Ideally we would not place special trust in ordinary services. There should be no way to retrieve non-public information about a user from the auth service without presenting a valid token provided by the user. This means that services can only act on a user's behalf. Mutual authentication of services can additionally help in a belt-and-suspenders kind of way, but IMHO it should probably not be the primary protection. -- Gabriel Wicke (GWicke) (talk) 21:46, 17 June 2014 (UTC)


 * Added a goal sub-bullet of small token sizes for non-SPDY clients in the RFC. -- Gabriel Wicke (GWicke) (talk) 21:20, 16 June 2014 (UTC)


 * Gwicke, what is your vision for how these tokens would be issued? At login? By calling another service? CSteipp (talk) 21:19, 17 June 2014 (UTC)


 * As we discussed, there can be two flows: a) a cookie-based flow, and b) an OpenID Connect-based non-browser flow. Let's focus on a) for now.
 * The cookie-based flow is basically identical to that of current session cookies, except that the additional signed information in the cookie lets services authenticate most requests without a need for additional backend / auth service requests. A cookie with a signed and time-limited token is issued on log-in. Timed-out tokens (validity elapsed) can be implicitly refreshed by asking the authentication service for a fresh token, and then sending the result in a set-cookie header. The refresh business can be handled generically by a front-end proxy (restface for example), so that back-end services don't need to deal with it. -- Gabriel Wicke (GWicke) (talk) 21:46, 17 June 2014 (UTC)
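The cookie-based flow described above could look roughly like this inside a front-end proxy. This is only a sketch under several assumptions: the claims have already passed a signature check, `exp` is the expiry timestamp, and `refresh` is a hypothetical stand-in for the call to the authentication service, which returns fresh claims or `None` if it declines (for example because the session is gone).

```python
def authenticate(claims, session_id, now, refresh):
    """Return (claims, needs_set_cookie) for one incoming request.

    claims:     signature-checked token claims (dict with "exp"), or None
    session_id: the session identifier sent alongside the token
    now:        current time as a Unix timestamp
    refresh:    hypothetical callable standing in for the auth service
    """
    if claims is not None and now < claims["exp"]:
        return claims, False  # common case: no backend call needed
    fresh = refresh(session_id)
    if fresh is None:
        return None, False    # refresh declined: treat the request as anonymous
    return fresh, True        # proxy should send the new token in Set-Cookie


# Usage with a fake auth service that only knows one live session.
refresh_calls = []


def fake_refresh(session_id):
    refresh_calls.append(session_id)
    return {"sub": 7, "exp": 2000} if session_id == "live" else None


print(authenticate({"sub": 7, "exp": 900}, "live", 1000, fake_refresh))
```

Back-end services behind such a proxy would only ever see unexpired claims or an anonymous request.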


 * This makes the validity time period of the token meaningless, unless there's some other authentication happening. If I steal a cookie from a user, I can keep using it forever as long as I periodically send it to the authz service and get a new one. At the very least, it needs to be tied to the user's session so a logout invalidates it. It would be better for the user's browser to do the refresh.


 * How would you propose to do a browser-initiated refresh? -- Gabriel Wicke (GWicke) (talk) 18:45, 18 June 2014 (UTC)


 * For your case b), I'm definitely not convinced. Can you come up with a use case for when a non-browser would use this, but wouldn't use OAuth? CSteipp (talk) 00:45, 18 June 2014 (UTC)


 * Case b) is actually OAuth2. But as I said, let's shelve that for now. -- Gabriel Wicke (GWicke) (talk) 18:45, 18 June 2014 (UTC)

Alternate proposal
I think we're narrowing in on a proposal that will work. I wanted to list what I think we're in pretty close agreement on, and we can work through further details.
 * 1) We use a JWT assertion of the user's basic identity details. We'll set it in a cookie when the user logs in, and periodically as core sees it hasn't been issued in a while and it's convenient to restore.
 * 2) To keep the JSON small, we'll store:
 * 3) the user's id (which shows they have an account)
 * 4) if they have read access on the wiki
 * 5) if they are blocked
 * 6) current username?
 * 7) The time the token was issued
 * 8) Each service can decide how old a token can be to still trust it. Maybe recommend 5 minutes for non-security-critical services?
 * 9) At any point, a service can exchange the user's session id (or login token?) for a current basic assertion JWT. The reference implementation will be a separate MediaWiki endpoint in PHP, but we'll agree on an SLA that it will serve 90% of requests in under XXms (we can work out an exact, reasonable number). That may require writing it in a faster language, or HHVM may be fast enough.
 * 10) If a service does this, they should return the new JWT in the user's secure cookie.
 * 11) We'll provide two other methods / services to allow you to:
 * 12) Exchange the basic JWT, or the user's session ID, for a full assertion of all rights on the wiki.
 * 13) Give an authorization determination for a (user + title + action) triple. This will interact pretty deeply with the AuthStack RFC I think.
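To illustrate the basic assertion in the list above, here is a sketch of the payload and of the per-service age check. The claim names `sub` and `iat` are the registered JWT names for subject and issued-at; `name`, `read` and `blocked` are invented here for illustration. The 300-second default mirrors the suggested 5 minutes.

```python
import json


def basic_assertion(user_id, username, can_read, blocked, issued_at):
    """Build the small claims set: id, username, read access, block flag, issue time."""
    return {
        "sub": user_id,
        "name": username,
        "read": can_read,
        "blocked": blocked,
        "iat": issued_at,
    }


def is_fresh(claims, now, max_age_seconds=300):
    """Each service picks its own max age; tokens issued in the future are rejected."""
    age = now - claims["iat"]
    return 0 <= age <= max_age_seconds


example_claims = basic_assertion(42, "ExampleUser", True, False, 1402600000)
payload_size = len(json.dumps(example_claims, separators=(",", ":")))
print(payload_size, is_fresh(example_claims, 1402600200))
```

A security-critical service could pass a much smaller `max_age_seconds`, or ignore the cached assertion entirely and ask for a current one.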

Open Questions:
 * How do we do key rotation and revocation for the JWT signing key? A service that returns the current key?
 * Key sizes

-- CSteipp (WMF)


 * There is indeed a lot of agreement. We are both shooting for roughly the same system, which allows most requests to be authenticated by checking signatures only. The remaining discussion is about details, which should all be pretty straightforward to work out:


 * Introducing per-wiki right information doesn't scale for SUL. For reads on most private wikis all we need to know is whether the user is authenticated, which is already vouched for by having a valid token in the first place. Additional group memberships can be checked by calling back into the authentication service. Same for blocks.
 * I think the authentication service should provide a way to check individual group membership / block assertions, and perhaps a way to retrieve group memberships / block status per wiki. The translation of group memberships to rights is ultimately specific to each service. As an example, read access for a bucket in a storage service might be restricted to a specific group within the domain.
 * I don't see a need to handle per-page restrictions in the authentication service. In the short and medium term those will continue to be handled by core. In the longer term they could potentially be handled by the storage service.
 * For the implementation, the main things I care about are:
 * Really good response times, so that writes and other less common actions remain snappy. Ballpark based on experience with similar services: < 5ms at the 95th percentile.
 * Isolation from app code, so that eventually the authentication service will be the only service with access to user data, and exploits in app code can't directly compromise this data.


 * -- Gabriel Wicke (GWicke) (talk) 18:33, 3 September 2014 (UTC)


 * Regarding the open question you listed:
 * How do we do key rotation and revocation for the JWT signing key? A service that returns the current key?
 * The authentication service should be able to provide this. Instead of a single key it can also return a set of public keys to be used in the future with their validity time ranges so that services don't need to poll this all the time to remain up to date. We already assume that the clocks are reasonably synchronized (and use NTP everywhere).
 * Key sizes
 * I'm inclined to follow the recommendations / best practices on this one, which from my reading of the RFCs is currently RS256.
 * -- Gabriel Wicke (GWicke) (talk) 19:40, 3 September 2014 (UTC)
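The idea of publishing a set of keys with validity time ranges can be sketched as follows. The record layout and field names are assumptions for this example, not an existing format; the point is only that a service with a reasonably synchronized clock can pick the applicable keys locally, without polling the authentication service on every request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PublishedKey:
    key_id: str
    public_key_pem: str  # placeholder for the actual public key material
    not_before: int      # Unix timestamp, inclusive
    not_after: int       # Unix timestamp, exclusive


def keys_valid_at(keys, timestamp):
    """Return the published keys whose validity range covers the timestamp."""
    return [key for key in keys if key.not_before <= timestamp < key.not_after]


# Two hypothetical keys with overlapping ranges, so rotation has no gap.
example_keys = [
    PublishedKey("2014-09", "<pem omitted>", 1000, 2100),
    PublishedKey("2014-10", "<pem omitted>", 2000, 3100),
]
print([key.key_id for key in keys_valid_at(example_keys, 2050)])
```

During the overlap a verifier would accept signatures from either key, which gives the issuer room to switch over without coordinating the exact moment with every service.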

RfC meeting
This RFC has been scheduled to be discussed in the Architecture RfC meeting today, 2014-09-03. Sorry for the late notice.--Qgil-WMF (talk) 15:49, 3 September 2014 (UTC)