Topic on Talk:Requests for comment/SessionStorageAPI

Tgr (WMF) (talkcontribs)

I'm not sure I understand the threat model this change is supposed to protect against. Is it about an attacker enumerating Redis keys via a PHP remote code execution attack? I don't see how that would be improved by a separate service - as long as the attacker can write keys from PHP (which seems unavoidable), he can just create his own sessions for all targeted accounts which has the same impact.

Or is it about an attacker obtaining the information needed to connect to Redis directly (without a remote code execution vulnerability, which would make that unnecessary) and then enumerating sessions via a direct connection? How would that work?

EEvans (WMF) (talkcontribs)

I hadn't considered the case where an attacker could forge sessions like this, but does that really render moot any concerns over iteration? Surely the discovery and exposure of current/active sessions could be leveraged for all sorts of nastiness too, no?

EEvans (WMF) (talkcontribs)

I wonder how likely it is that obtaining a reference to the client object would risk exposing other data in Redis (and what that data might be).

Tgr (WMF) (talkcontribs)

Any other data from MediaWiki would be fair game anyway - session data is special in that the key comes from a browser cookie, but for everything else someone with access to a MediaWiki execution context can easily reconstruct the key. I don't know if we store anything non-MediaWiki-based in Redis. (ORES uses it but I'm not sure if those are the same set of servers.)

Writing into Redis could also be problematic; it might be used to turn a reflected code execution vulnerability into a persistent one if, e.g., something stores the name and parameters of a callback function there. Although currently there are easier ways to do that as well.

Are user sessions the last remaining use of Redis in MediaWiki? If not, this service wouldn't change the fact that the Redis connection object is exposed.

Tgr (WMF) (talkcontribs)

For an active attack, I don't see what it could be used for that you couldn't do more directly. There is some value in being able to access sessions for snooping - depending on the details of session handling, you might be able to tell if a given user was recently active, which could be used to deanonymize readers. Or maybe find out their IP - we don't currently put that in the session, but it wouldn't be an unreasonable thing to do. So if the threat model is a repressive government trying to identify a user who has been active recently and still has a live session, but did not do any public action (in which case there are easier ways to identify them), it would reduce the threat surface. But for governments the network or the user's machine is a much easier target so making the strongest link even stronger might not have much impact.

Tgr (WMF) (talkcontribs)

Although we do store unencrypted passwords in the session, even if only for very short periods (the time between submitting the login form and submitting the 2FA form) so I guess an attacker could fish for those. If you just want to take over a Wikimedia account, with code execution access there are easier ways, but users tend to reuse passwords, so there's some value in stealing them for attacking other sites.

EEvans (WMF) (talkcontribs)

I can see that at a minimum, this section was poorly constructed if for no other reason than it puts a lot of stress on this one thing. The raison d'être for this project is to make session storage multi-master replicated. So that it is more available within a data-center (i.e. restarting won't cause the loss of a shard and log a bunch of users out), but most importantly so sessions work across data-centers. It's one of the pieces that moves us closer to an active-active configuration.

The most naive implementation would be to simply create a CassandraBagOStuff analogous to the RedisBagOStuff used now, but a convincing argument was made for building it as a service to create additional isolation: to expose a narrow interface supporting nothing more than the use case requires. The idea that if an attacker could obtain a reference to the Redis client object, they'd be able to do nasty stuff was cited as one possible example, but I believe the thinking also ran largely along the lines of making all of the things we hadn't imagined difficult or impossible too (see https://phabricator.wikimedia.org/T140813 for example). Again, this section was meant as justification for creating an abstraction (a REST interface), not as the primary justification for the project.

I'll restructure it to reflect this.

Tgr (WMF) (talkcontribs)

Yeah, I get that security is just one of the reasons for doing this. If you are planning to create a service wrapper anyway, for better monitoring or whatnot, exposing that as a narrow interface in PHP is a good idea IMO - BagOStuff is not a great interface, it's being used by too many different things which all look like simple key-value stores from a distance, but are used in slightly different ways. (The same could probably be achieved by adding an extra layer of abstraction on top of CassandraBagOStuff though.) But if the sole reason to put the service endpoint between Cassandra and MediaWiki is to add an extra layer of security, I think it is of dubious value. (T140813 has the same problem, actually: for most of the scenarios proposed there it would not thwart an attacker, just force him to do things in a slightly different way. Once you have a remote code execution vulnerability in the main application, the extent to which you can limit the attacker is going to be minimal. To change that, you'd have to move a significant fraction of the business logic into services, not just create thin services whose operation is still controlled by the main app. And that's an effort of an entirely different magnitude.)

EEvans (WMF) (talkcontribs)

Assuming I haven't misunderstood, your main concern here is that the cost:benefit of a service isn't there given the dubious benefits of isolation. I still believe the principle of least privilege has some merit here, that there is value in a narrower interface for the vectors we cannot yet imagine, but I do not find fault in any of your rationale; you make a compelling argument.

That said, I think there are other factors that contribute to the benefit side of the equation. Features like monitoring and rate limiting strike me as being more straightforward to implement in an external service. There is also some concern over the PHP driver for Cassandra which uses an extension linked to the C++ driver.

And session storage was only one of several use-cases being evaluated for such a multi-master key-value store. The software used here will be quite simple and general, and I suspect it will see plenty of re-use. Assuming this is the case, economies of scale will drive down the cost side of the equation as well.

GLavagetto (WMF) (talkcontribs)

In short:

- there is a significant cost involved in converting and packaging a C++ extension to work with HHVM; I'd say larger than developing a service from scratch. Unless we want to wait for HHVM's removal in ~6 months before thinking of improving how sessions work, we won't be able to use that PHP extension.

- there is again no doubt that from a security design point of view reducing the API interface is beneficial, and can allow us to refine the service in the future to avoid things like mass session invalidation and so on

So I think that an external service is really the best direction to go in.

Tgr (WMF) (talkcontribs)

To repeat what I said, I think this plan makes sense if you do not particularly care about security as a goal and are happy to just get some small and not very consequential side benefits from changes that you would do anyway. (I don't dispute that there are valid operational reasons like monitoring for preferring a simple service over a PHP library.) My concern at the IRC meeting was that people did seem to think security is an important goal, but were unwilling to invest significant time into doing proper security design (starting with threat analysis) because of time pressure from the multi-DC migration, despite the service not even being really necessary for that migration (although it seems I was wrong about that last part). Basically the argument seemed to be "let's set up this service now so that in the future we can have a probably entirely different service that is better for security", which is just extra work compared to doing that different service from the start.

(And again, mass session invalidation happens in MediaWiki; it does not touch Redis at all, mainly because that would be too slow for the cases where we actually need to do mass session invalidation. So we have user_token and $wgAuthenticationTokenVersion instead.)

EEvans (WMF) (talkcontribs)

What do you anticipate the interface of this BagOStuff alternative would look like? The way the RfC treats it now, the service very much is a simple key-value store, which just documents the contract as it applies to replication. I'd rather be explicit, but it's not clear to me how you'd change that to make it so (I mean, I could come up with something contrived, but I'm not sure how to do it in a meaningful way).

Tgr (WMF) (talkcontribs)

There's nothing wrong with a simple get/set/delete interface. The problem with BagOStuff is that it isn't that - it has everything from locking to counter support, and these features are usually poorly specced out, so it is unclear which backends support them well (e.g. changeTTL() might or might not be atomic and concurrency-safe, depending on the backend).

Edit: CAS support is always nice, if Cassandra can provide it. And you could go one level deeper in get/set - write individual session keys instead of the whole session data object of the user. I don't really have arguments for or against that.
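
As a rough illustration of the get/set/delete interface described above, here is a minimal Python sketch. The class name `InMemorySessionStore` and its method signatures are hypothetical (they are not taken from the RfC or from MediaWiki), and a plain dictionary stands in for Cassandra. The point is only that the surface is small and offers no way to list or iterate over keys.

```python
import threading
from typing import Dict, Optional


class InMemorySessionStore:
    """Hypothetical narrow key-value store: get, set, and delete only.

    There is deliberately no method for listing or iterating over keys.
    A dict stands in for the real backend (an assumption for this sketch).
    """

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> Optional[bytes]:
        """Return the stored value, or None if the key is absent."""
        with self._lock:
            return self._data.get(key)

    def set(self, key: str, value: bytes) -> None:
        """Store (or overwrite) the value for the key."""
        with self._lock:
            self._data[key] = value

    def delete(self, key: str) -> bool:
        """Remove the key; return True if it existed."""
        with self._lock:
            return self._data.pop(key, None) is not None
```

A caller that does not already know a session key has nothing to work with here, which is the isolation property being argued about; it does not stop a caller from writing new keys.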

Mobrovac-WMF (talkcontribs)

Note that exposing a narrow API as a service in this case also allows other non-MW parts of the system to directly access session info, in this way decentralising session management and confining it to a specialised sub-system.

Tgr (WMF) (talkcontribs)

Exposing sessions to other applications would be a lot more complicated. For one thing, if you just expose naked session data, you make the system less isolated, not more - there would have to be some access management so that each application can only access its own data plus data explicitly shared by other applications. And you'd have to handle session invalidations - if the user logs out in MediaWiki, other applications have to know the session is not valid, even though the session might not have been deleted (e.g. because it's for a different wiki from where the user clicked on logout). MediaWiki handles that by storing a user token in both the session and the database, updating the DB value on logout, and comparing the two on every request. The session service would have to do something similar.
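
The token comparison described above can be sketched roughly as follows. This is a simplified Python illustration, not MediaWiki's actual code: the dictionaries `user_tokens` and `sessions` and the function names are hypothetical stand-ins for the user table and the session backend.

```python
import hmac
import secrets
from typing import Dict

# Hypothetical stand-ins for the user table and the session backend.
user_tokens: Dict[str, str] = {}          # user name -> current token ("database")
sessions: Dict[str, Dict[str, str]] = {}  # session ID -> session data


def log_in(user: str) -> str:
    """Create a session that records the user's current token."""
    token = user_tokens.setdefault(user, secrets.token_hex(16))
    session_id = secrets.token_hex(16)
    sessions[session_id] = {"user": user, "token": token}
    return session_id


def log_out_everywhere(user: str) -> None:
    """Rotate the token in the "database"; no session is touched or deleted."""
    user_tokens[user] = secrets.token_hex(16)


def is_session_valid(session_id: str) -> bool:
    """Compare the token in the session with the one in the "database"."""
    session = sessions.get(session_id)
    if session is None:
        return False
    current = user_tokens.get(session["user"], "")
    return hmac.compare_digest(session["token"], current)
```

Note that after `log_out_everywhere()` the stale sessions still exist in the store; they are only rejected at validation time, which is why another application reading the raw session data could not tell that they are no longer valid.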

DKinzler (WMF) (talkcontribs)

In my mind, the security advantage of a separate service over accessing Cassandra directly from MediaWiki is as follows (I have only skimmed the conversation, so my apologies if this is redundant):

With direct access to Cassandra, an attacker that gains code execution in MediaWiki or on application servers could potentially look up sessions by user name, delete/reset user sessions (individually or in bulk), or gain access to other information stored in Cassandra. This may also be prevented by carefully designing the data model to resist such attacks, and by being restrictive about which application has access to what in the Cassandra instance. But it seems much simpler to just prevent all direct access to Cassandra from application servers, and only expose a minimum of functionality.

The concrete threat this protects against is finding session keys by enumerating user names (or by targeting specific users), which would allow impersonation, or could enable an attacker to log other users out of the system completely by repeatedly resetting all sessions, to prevent countermeasures.

Tgr (WMF) (talkcontribs)

Again, the problem is that logging a user in or out, given the username, are things MediaWiki legitimately needs to do; if an attacker takes over MediaWiki, you can't prevent him from simulating it. You can rate-limit (with or without a specially crafted service) and you can prevent a few special cases that require enumeration, but overall it won't change the attacker's capabilities too much.

DKinzler (WMF) (talkcontribs)

There is no legitimate use case for logging a user out based on the user name, without knowing the session ID, as far as I know. Logging in, yes, but that needs credentials as well.

Tgr (WMF) (talkcontribs)

Sessions are per-wiki; you only know the session ID for the current wiki but the user needs to be logged out from all of them. And (at least in the current CentralAuth implementation) logout terminates all your sessions, not just the current one. Plus you want to terminate sessions in certain events (password reset, email change etc). And we want to reserve the ability to log users out as an emergency action (which we did use a couple of times in the past).

For logging in you need credentials, but those are handled by MediaWiki so the session service has no means to verify whether a login attempt is legitimate.

Mobrovac-WMF (talkcontribs)

One can see this as a first step towards better isolation. Most of the concerns raised here apply to both auth(n|z) management as well as session management. I don't think it's conceivable to resolve auth(n|z) issues through session management. That should be (and hopefully will be) a second step.

ABaso (WMF) (talkcontribs)

To make sure I understand correctly, is the current state of this discussion suggesting that MediaWiki is the security gateway for accessing the session storage API, and then later on there's a possibility of there being a separate security gateway (I'm assuming that's what the term CAS means here?) for things like validation of the user's session ID, rate limiting, and other security/privacy things?

I get the value of multi-DC as the primary driver near term. Do I understand correctly that, for the near term, applications wanting to actually retrieve values for a given session will need the session ID in the key itself (as opposed to in a header or predefined part of the URL)?

As far as session enumeration, if I'm reading correctly, what it seems @Tgr (WMF) is saying is that for the present moment at least, the list of sessions is available to the application server, and therefore the application server (or an attacker with a foothold) will have the context needed to obtain the data associated with all sessions. Is that the correct way to interpret this?

Tgr (WMF) (talkcontribs)

CAS is for compare-and-set (sorry, using TLAs is always a bad idea), or more generally handling situations like two clients simultaneously trying to increment the same key. The BagOStuff interface in MediaWiki gives some assurances that this kind of thing can be done safely so any service that is integrated as a BagOStuff implementation should probably honor that.
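
To make the compare-and-set idea concrete, here is a small Python sketch of several clients incrementing the same key. `CasStore`, `increment`, and `run_clients` are hypothetical names for illustration only; this is not the BagOStuff API, and an in-memory dictionary stands in for the backend.

```python
import threading
from typing import Dict


class CasStore:
    """Hypothetical in-memory store offering only get and compare-and-set."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> int:
        """Return the current value; absent keys count as 0."""
        with self._lock:
            return self._data.get(key, 0)

    def compare_and_set(self, key: str, expected: int, new: int) -> bool:
        """Write `new` only if the current value still equals `expected`."""
        with self._lock:
            if self._data.get(key, 0) != expected:
                return False  # another client wrote in between
            self._data[key] = new
            return True


def increment(store: CasStore, key: str) -> int:
    """Retry the read-modify-write until it wins; return the new value."""
    while True:
        current = store.get(key)
        if store.compare_and_set(key, current, current + 1):
            return current + 1


def run_clients(store: CasStore, key: str, clients: int, per_client: int) -> int:
    """Run several concurrent clients and return the final counter value."""
    def worker() -> None:
        for _ in range(per_client):
            increment(store, key)

    threads = [threading.Thread(target=worker) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.get(key)
```

Because a write only succeeds when the value has not changed since it was read, no increment is lost even when clients interleave; a plain get followed by set would not give that guarantee.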

Accessing session data outside MediaWiki is complicated and should really be a separate discussion. There is no guarantee that MediaWiki deletes sessions when they become invalid, for example, so exposing the session backend as it is now to other applications is not really useful.

Redis has commands for getting all keys (KEYS, SCAN) so in theory an attacker can enumerate them (no idea how feasible that is in practice, given the large number of sessions). What I am mainly saying is that that is not a realistic attack scenario given that the attacker can always just create its own sessions (which is harder to prevent, since unlike enumeration that's a perfectly normal usage pattern) and the data in the session is not particularly sensitive (apart from maybe T209556 which needs to be fixed anyway and is a simple change).
