Talk:Requests for comment/SessionStorageAPI


Tgr (WMF) (talkcontribs)

I'm not sure I understand the threat model this change is supposed to protect against. Is it about an attacker enumerating Redis keys via a PHP remote code execution attack? I don't see how that would be improved by a separate service - as long as the attacker can write keys from PHP (which seems unavoidable), he can just create his own sessions for all targeted accounts which has the same impact.

Or is it about an attacker obtaining the information needed to connect to Redis directly (without a remote code execution vulnerability, which would make that unnecessary) and then enumerating sessions via a direct connection? How would that work?

EEvans (WMF) (talkcontribs)

I hadn't considered the case where an attacker could forge sessions like this, but does that really render moot any concerns over iteration? Surely the discovery and exposure of current/active sessions could be leveraged for all sorts of nastiness too, no?

EEvans (WMF) (talkcontribs)

I wonder how likely it is that obtaining a reference to the client object would risk exposing other data in Redis (and what that might be).

Tgr (WMF) (talkcontribs)

Any other data from MediaWiki would be fair game anyway - session data is special in that the key comes from a browser cookie, but for everything else someone with access to a MediaWiki execution context can easily reconstruct the key. I don't know if we store anything non-MediaWiki-based in Redis. (ORES uses it but I'm not sure if those are the same set of servers.)

Writing into Redis could also be potentially problematic, maybe it could be used to turn a reflected code execution vulnerability into a persistent one if e.g. something stores the name and parameters of a callback function there. Although currently there are easier ways for that as well.

Are user sessions the last remaining use of Redis in MediaWiki? If not, this service wouldn't change the fact that the Redis connection object is exposed.

Tgr (WMF) (talkcontribs)

For an active attack, I don't see what it could be used for that you couldn't do more directly. There is some value in being able to access sessions for snooping - depending on the details of session handling, you might be able to tell if a given user was recently active, which could be used to deanonymize readers. Or maybe find out their IP - we don't currently put that in the session, but it wouldn't be an unreasonable thing to do. So if the threat model is a repressive government trying to identify a user who has been active recently and still has a live session, but did not do any public action (in which case there are easier ways to identify them), it would reduce the threat surface. But for governments the network or the user's machine is a much easier target so making the strongest link even stronger might not have much impact.

Tgr (WMF) (talkcontribs)

Although we do store unencrypted passwords in the session, even if only for very short periods (the time between submitting the login form and submitting the 2FA form) so I guess an attacker could fish for those. If you just want to take over a Wikimedia account, with code execution access there are easier ways, but users tend to reuse passwords, so there's some value in stealing them for attacking other sites.

EEvans (WMF) (talkcontribs)

I can see that, at a minimum, this section was poorly constructed, if for no other reason than that it puts a lot of stress on this one thing. The raison d'être for this project is to make session storage multi-master replicated: so that it is more available within a data-center (i.e. restarting won't cause the loss of a shard and log a bunch of users out), but most importantly so that sessions work across data-centers. It's one of the pieces that moves us closer to an active-active configuration.

The most naive implementation would be to simply create a CassandraBagOStuff analogous to the RedisBagOStuff used now, but a convincing argument was made for building it as a service to create additional isolation: to expose a narrow interface supporting nothing more than the use case requires. The idea that if an attacker could obtain a reference to the Redis client object, they'd be able to do nasty stuff was cited as one possible example, but I believe the thinking also ran largely along the lines of making all of the things we hadn't imagined difficult or impossible too (see https://phabricator.wikimedia.org/T140813 for example). Again, this section was meant as justification for creating an abstraction (a REST interface), not as the primary justification for the project.

I'll restructure it to reflect this.

Tgr (WMF) (talkcontribs)

Yeah, I get that security is just one of the reasons for doing this. If you are planning to create a service wrapper anyway, for better monitoring or whatnot, exposing that as a narrow interface in PHP is a good idea IMO - BagOStuff is not a great interface, it's being used by too many different things which all look like simple key-value stores from a distance, but are used in slightly different ways. (The same could probably be achieved by adding an extra layer of abstraction on top of CassandraBagOStuff though.) But if the sole reason to put the service endpoint between Cassandra and MediaWiki is to add an extra layer of security, I think it is of dubious value. (T140813 has the same problem, actually: for most of the scenarios proposed there it would not thwart an attacker, just force him to do things in a slightly different way. Once you have a remote code execution vulnerability in the main application, the extent to which you can limit the attacker is going to be minimal. To change that, you'd have to move a significant fraction of the business logic into services, not just create thin services whose operation is still controlled by the main app. And that's an effort of an entirely different magnitude.)

EEvans (WMF) (talkcontribs)

Assuming I haven't misunderstood, your main concern here is that the cost:benefit of a service isn't there given the dubious benefits of isolation. I still believe the principle of least privilege has some merit here, that there is value in a narrower interface for the vectors we cannot yet imagine, but I do not find fault in any of your rationale; you make a compelling argument.

That said, I think there are other factors that contribute to the benefit side of the equation. Features like monitoring and rate limiting strike me as being more straightforward to implement in an external service. There is also some concern over the PHP driver for Cassandra which uses an extension linked to the C++ driver.

And session storage was only one of several use-cases being evaluated for such a multi-master key-value store. The software used here will be quite simple, and general, and I suspect will see plenty of re-use. Assuming this is the case, economies of scale will drive down the cost side of the equation as well.

GLavagetto (WMF) (talkcontribs)

In short:

- there is a significant cost involved in converting and packaging a C++ extension to work with HHVM; I'd say larger than developing a service from scratch. Unless we want to wait for HHVM's retirement in ~ 6 months before thinking of improving how sessions work, we won't be able to use that PHP extension.

- there is again no doubt that from a security design point of view reducing the API interface is beneficial, and would allow us to refine the service in the future to avoid things like mass session invalidation and so on

So I think that an external service is really the best direction to go to.

Tgr (WMF) (talkcontribs)

To repeat what I said, I think this plan makes sense if you do not particularly care about security as a goal and are happy to just get some small and not very consequential side benefits from changes that you would do anyway. (I don't dispute that there are valid operational reasons like monitoring for preferring a simple service over a PHP library.) My concern at the IRC meeting was that people did seem to think security is an important goal, but were unwilling to invest significant time into doing proper security design (starting with threat analysis) because of time pressure from the multi-DC migration, despite the service not even being really necessary for that migration (although it seems I was wrong about that last part). Basically the argument seemed to be "let's set up this service now so that in the future we can have a probably entirely different service that is better for security", which is just extra work compared to doing that different service from start.

(And again, mass session invalidation happens in MediaWiki, it does not touch Redis at all. Mainly because it would be too slow for the cases where we actually need to do mass session invalidation. So we have user_token and $wgAuthenticationTokenVersion instead.)

EEvans (WMF) (talkcontribs)

What do you anticipate the interface of this BagOStuff alternative would look like? The way the RfC treats it now, the service very much is a simple key-value store; it just documents the contract as it applies to replication. I'd rather be explicit, but it's not clear to me how you'd change that to make it so (I mean, I could come up with something contrived, but I'm not sure how to do it in a meaningful way).

Tgr (WMF) (talkcontribs)

There's nothing wrong with a simple get/set/delete interface. The problem with BagOStuff is that it isn't that - it has everything from locking to counter support, and usually poorly specced out and unclear which backends support it well (e.g. changeTTL() might or might not be atomic and concurrency-safe, depending on the backend).

Edit: CAS support is always nice, if Cassandra can provide it. And you could go one level deeper in get/set - write individual session keys instead of the whole session data object of the user. I don't really have arguments for or against that.

Mobrovac-WMF (talkcontribs)

Note that exposing a narrow API as a service in this case also allows other non-MW parts of the system to directly access session info, in this way decentralising session management and confining it to a specialised sub-system.

Tgr (WMF) (talkcontribs)

Exposing sessions to other applications would be a lot more complicated. For one thing, if you just expose naked session data, you make the system less isolated, not more - there would have to be some access management so that each application can only access its own data plus data explicitly shared by other applications. And you'd have to handle session invalidations - if the user logs out in MediaWiki, other applications have to know the session is not valid, even though the session might not have been deleted (e.g. because it's for a different wiki from where the user clicked on logout). MediaWiki handles that by storing a user token in both the session and the database, updating the DB value on logout, and comparing the two on every request. The session service would have to do something similar.
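The token-comparison mechanism described here could be sketched roughly like this (all names are illustrative; this is not the actual MediaWiki code, just the shape of the idea):

```python
# Sketch of MediaWiki-style session invalidation via a user token.
# On logout the DB-side token is rotated; existing sessions keep the stale
# token and fail the comparison on their next request, without any session
# ever being deleted from the store.

import secrets

db_tokens = {}      # user_id -> current token (stand-in for the user table)
session_store = {}  # session_id -> session data (stand-in for Redis/Cassandra)

def login(user_id):
    session_id = secrets.token_hex(16)
    token = db_tokens.setdefault(user_id, secrets.token_hex(16))
    session_store[session_id] = {"user_id": user_id, "user_token": token}
    return session_id

def logout_everywhere(user_id):
    # Rotating the DB token invalidates every session for this user,
    # across all wikis, without touching the session store at all.
    db_tokens[user_id] = secrets.token_hex(16)

def is_valid(session_id):
    session = session_store.get(session_id)
    return bool(session) and session["user_token"] == db_tokens.get(session["user_id"])
```

A session service exposing data to other applications would need an equivalent check on every read, since deletion from the store cannot be relied upon.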

DKinzler (WMF) (talkcontribs)

In my mind, the security advantage of a separate service over accessing Cassandra directly from MediaWiki is as follows (I have only skimmed the conversation, so my apologies if this is redundant):

With direct access to Cassandra, an attacker that gains code execution in MediaWiki or on application servers could potentially look up sessions by user name, delete/reset user sessions (individually or in bulk), or gain access to other information stored in Cassandra. This may also be prevented by carefully designing the data model to resist such attacks, and by being restrictive about which application has access to what in the Cassandra instance. But it seems much simpler to just prevent all direct access to Cassandra from application servers, and only expose a minimum of functionality.

The concrete threat this protects against is finding session keys by enumerating user names (or by targeting specific users), which would allow impersonation, or could enable an attacker to log other users out of the system completely by repeatedly resetting all sessions, to prevent counter-measures.

Tgr (WMF) (talkcontribs)

Again, the problem is that logging a user in or out, given the username, are things MediaWiki legitimately needs to do; if an attacker takes over MediaWiki, you can't prevent him from simulating it. You can rate-limit (with or without a specially crafted service) and you can prevent a few special cases that require enumeration, but overall it won't change the attacker's capabilities too much.

DKinzler (WMF) (talkcontribs)

There is no legitimate use case for logging a user out based on the user name, without knowing the session ID, as far as I know. Logging in, yes, but that needs credentials as well.

Tgr (WMF) (talkcontribs)

Sessions are per-wiki, you only know the session ID for the current wiki but the user needs to be logged out from all of them. And (at least in the current CentralAuth implementation) logout terminates all your sessions, not just the current one. Plus you want to terminate sessions in certain events (password reset, email change etc). And we want to reserve the ability to log users out as an emergency action (which we did use a couple of times in the past).

For logging in you need credentials, but those are handled by MediaWiki so the session service has no means to verify whether a login attempt is legitimate.

Mobrovac-WMF (talkcontribs)

One can see this as a first step towards better isolation. Most of the concerns raised here apply to both auth(n|z) management as well as session management. I don't think it's conceivable to resolve auth(n|z) issues through session management. That should (and hopefully will be) a second step.

ABaso (WMF) (talkcontribs)

To make sure I understand correctly, is the current state of this discussion suggesting that MediaWiki is the security gateway for accessing the session storage API, and then later on there's a possibility of there being a separate security gateway (I'm assuming that's what the term CAS means here?) for things like validation of the user's session ID, rate limiting, and other security/privacy things?

I get the value of multi-DC as the primary driver near term. Do I understand correctly that for the near term that applications wanting to actually retrieve values for a given session will need the session ID in the key itself (as opposed to in a header or predefined part of the URL)?

As far as session enumeration, if I'm reading correctly, what it seems @Tgr (WMF) is saying is that for the present moment at least, the list of sessions is available to the application server, and therefore the application server (or an attacker with a foothold) will have the context needed to obtain the data associated with all sessions. Is that the correct way to interpret this?

Tgr (WMF) (talkcontribs)

CAS is for compare-and-set (sorry, using TLAs is always a bad idea), or more generally handling situations like two clients simultaneously trying to increment the same key. The BagOStuff interface in MediaWiki gives some assurances that this kind of thing can be done safely so any service that is integrated as a BagOStuff implementation should probably honor that.
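A compare-and-set retry loop of the kind described here can be sketched against a hypothetical versioned store (illustrative names, not the BagOStuff API itself):

```python
# Sketch of compare-and-set (CAS): two clients incrementing the same key
# cannot clobber each other, because a write only succeeds if the version
# observed at read time is still current. Real backends expose the version
# as an ETag, a Cassandra LWT condition, etc.

import threading

class VersionedStore:
    def __init__(self):
        self._data = {}  # key -> (value, version)
        self._lock = threading.Lock()

    def get(self, key):
        return self._data.get(key, (0, 0))

    def cas(self, key, new_value, expected_version):
        # Atomically write only if nobody else wrote since our read.
        with self._lock:
            _, version = self._data.get(key, (0, 0))
            if version != expected_version:
                return False
            self._data[key] = (new_value, version + 1)
            return True

def incr(store, key):
    while True:  # retry until our write lands without a race
        value, version = store.get(key)
        if store.cas(key, value + 1, version):
            return value + 1
```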

Accessing session data outside MediaWiki is complicated and should really be a separate discussion. There is no guarantee that MediaWiki deletes sessions when they become invalid, for example, so exposing the session backend as it is now to other applications is not really useful.

Redis has commands for getting all keys (KEYS, SCAN) so in theory an attacker can enumerate them (no idea how feasible that is in practice, given the large number of sessions). What I am mainly saying is that that is not a realistic attack scenario given that the attacker can always just create its own sessions (which is harder to prevent, since unlike enumeration that's a perfectly normal usage pattern) and the data in the session is not particularly sensitive (apart from maybe T209556 which needs to be fixed anyway and is a simple change).

Reply to "Threat model"
Mobrovac-WMF (talkcontribs)

Session data is structured, so it is a bit unclear why value is a mandatory field. Furthermore, since different entities use sessions to store their data, I wonder if it makes sense to allow partial updates/stores, so as to limit the impact of one entity to mess with the data of another entity.

Pchelolo (talkcontribs)

From the standpoint of limiting the impact - I would agree this might make sense. However, from a performance-optimization perspective I really doubt it will do us any good - the values are supposed to be fairly small, so additional costs in serializing/deserializing/network transferring the whole blob should be negligible.

Wrapping the POST/PUT data into a 'value' property makes sense from an extensibility standpoint - what if we eventually want to add a configurable TTL to the requests, what if we want to add some additional metadata - it will be fairly natural to just add top-level properties in the payload for it.

An alternative would be to use headers for metadata; for the TTL example of additional functionality there's a perfect Cache-Control header. But even though from a REST API perspective I like it more than wrapping the payload into value, the latter might be more extensible. I do not have strong preferences.
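The two request shapes being weighed here might look something like this (field and header names are illustrative, not part of the RfC spec):

```python
import json

# Option 1: metadata wrapped alongside the value in the JSON payload.
wrapped = {
    "value": {"userId": 42, "loggedIn": True},  # the session blob itself
    "ttl": 3600,                                # hypothetical future metadata
}

# Option 2: opaque JSON body, metadata carried in headers instead.
body = json.dumps({"userId": 42, "loggedIn": True})
headers = {
    "Content-Type": "application/json",
    "Cache-Control": "max-age=3600",  # the re-purposed cache header as TTL
}
```

Option 1 keeps the payload self-describing and extensible; option 2 keeps the stored value byte-for-byte identical to what the client sent.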

Mobrovac-WMF (talkcontribs)

I prefer headers as well, and these may be passed both ways (from client to service and vice-versa). I don't believe TTLs and other instructions to the service should be in the payload.

EEvans (WMF) (talkcontribs)

If TTLs are something we impose service-wide, then re-purposing a cache header makes a kind of sense. However, once a client is able to assign them on a per entity basis, then I believe it should become a part of the representation. And, this is consistent with how it is stored and accessed (in Cassandra, you can request a TTL in the query projection, and return it as an attribute in its own right).

DKinzler (WMF) (talkcontribs)

So we'd use /v1/{key}/path to access parts of the session? Sounds nice; this would also remove the need to read/modify/write instead of just writing.

But... should this behave like a tree of paths, and should /v1/{key}/ with no path return all paths / a tree? Or is the session a flat map, so we'd use /v1/{key}/field? Would there be a way to get the entire map?

Mobrovac-WMF (talkcontribs)

In my view, the API could be /v1/{key}{/field} where field is optional. This means that if the client wants to read/write only a part of the session, it can, but it's not obligatory. If the field name is omitted then one should expect that the whole session data should be retrieved/stored.

Good point about depth. It could be done too, since if we allow the top-most field to be retrieved, the service has to have knowledge of the data's structure.
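The optional-field routing proposed here can be sketched over an in-memory flat map (routes and data are purely illustrative):

```python
# Sketch of /v1/{key}{/field}: with no field, the whole session blob is
# read or replaced; with a field, only that entry is touched.

sessions = {"abc123": {"userId": 42, "csrfToken": "tok"}}

def get(path):
    parts = path.strip("/").split("/")  # e.g. "v1/abc123/userId"
    assert parts[0] == "v1"
    session = sessions[parts[1]]
    if len(parts) == 2:
        return session        # whole session blob
    return session[parts[2]]  # a single field

def put(path, value):
    parts = path.strip("/").split("/")
    assert parts[0] == "v1"
    if len(parts) == 2:
        sessions[parts[1]] = value  # replace the whole session
    else:
        # partial write: other fields are untouched, limiting collisions
        sessions.setdefault(parts[1], {})[parts[2]] = value
```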

Pchelolo (talkcontribs)

Especially if we use Cassandra Map under the hood - it's a CRDT so we'd be able to avoid read/modify/write entirely. However, with a map we will not be able to build an entirely hierarchical structure, we will be limited to a flat map (or some fairly complex code to emulate hierarchical access)

Anomie (talkcontribs)

The problem with providing access like /v1/{key}/path is that sessions in MediaWiki, and in PHP more generally, don't work that way. They read the whole thing, manipulate it in-process, and write the whole thing back out when they're done.

I doubt changing that in MediaWiki would be worth the added complexity. While I haven't actually checked, I doubt much code blindly writes the session without reading from it. I also doubt that trying to read each field on demand would be worthwhile, any savings per GET would almost certainly be outweighed by having to make more GETs as soon as anything accesses more than one field. You'd be left with tracking which fields were written to do individual PUTs for just them, and even making that happen would probably require abandoning the use of BagOStuff as the interface between SessionManager and the storage backend.

Pchelolo (talkcontribs)

There's no consensus on whether this is needed, but there was some confusion about whether the current API prevents adding this later if we want. I believe, first of all, the confusion can be minimized by renaming the key in the API to session_id, because AFAIK we're talking about storing the whole session object here, not actually enumerating all the paths like WMSession:<session_id>:metadata:userName and storing it value-by-value.

So, since it's just a session id and it will be URI-encoded, nothing prevents us from adding an optional {/path} parameter to the URI without reserving any specific separators right now. For updating multiple paths at once we could support the HTTP PATCH verb, or just make a separate update endpoint. This also does not interfere with CAS semantics if implemented using If-Match.

To sum up: since there's no immediate need for this, there's no consensus on it, and the current API does not prevent us from adding it in the future when/if needed, I propose to table this discussion for later.

EEvans (WMF) (talkcontribs)

We started by defining this thing as a key-value store (opaque key, opaque value). What we're talking about now is a different thing altogether, let's call it a key-map store. I don't think this is the way to approach this. If we really do need a key-map store, then let's specify that (and a key-map would be a super-set of a key-value store; nothing would prevent you from storing a map with one k/v pair).

Pchelolo (talkcontribs)

We're talking about the API here. The API is that you PUT a JSON, you GET a JSON. How would the API change for a key-map store if we ever want to make it one? Or are you implying that the value in the payload must be a string? I thought it would be an arbitrary JSON object that will be stringified internally before storing and deserialized upon retrieval.

Mobrovac-WMF (talkcontribs)

The idea would be to allow for such access. Allowing clients to read/write sessions only partially does not imply they would have to do it that way. One could imagine that a more efficient and more consistent way to go about things is to retrieve all the session data, but write only fields that the code path wants/needs to modify, but it's not a requirement.

The point of allowing more fine-grained access is to minimise collisions on write (albeit, it wouldn't completely eliminate them).

EEvans (WMF) (talkcontribs)

I can't help but think we might be prematurely optimizing, ignoring YAGNI, violating KISS, or <insert tech industry jargon here>. We should be careful, because implementing a simple key-value store as discussed here, and then later implementing a key-map store, might very well end up being less work/time than we could spend spinning on this.

Reply to "GET / PUT values"

https://www.mediawiki.org/wiki/probs/<code> URLs

Legoktm (talkcontribs)

These URLs are a bit weird. I don't think "probs" is an appropriate page title, something like API:SessionStorage/Errors#<code> would be more natural IMO.

EEvans (WMF) (talkcontribs)

For what it's worth, that URL was meant more or less as a placeholder; we should definitely establish something more appropriate.

Mobrovac-WMF (talkcontribs)

The idea here is that the problems could be described uniformly across services, hence the generic name. That said, I agree "probs" is probably a bit arbitrary, but I wouldn't specifically create error pages just for this service.

EEvans (WMF) (talkcontribs)

Thinking out loud, but if we can describe them uniformly across all services, then I wonder what the point is? It would seem redundant to put up a page somewhere to describe what has been thoroughly described elsewhere, (a 404 Not Found, for example). It seems like this is only useful if we're able to expound on the meaning of an error in the context of a specific service.

Pchelolo (talkcontribs)

Currently there's no apparent standard between different services on how to report errors.

- Parsoid just returns an HTML with 'cannot GET' for a 404 for example. This should probably be fixed.

- RESTBase is returning a JSON error with a type property somewhat like https://mediawiki.org/wiki/HyperSwitch/errors/not_found

- event streams return a JSON with a type property that is just not_found

- ORES returns a fairly big HTML page if the route is not found, a 404 with a non-RFC 7807-compliant JSON response if the project is not found, and a 200 with a complex JSON structure where for every score it says 'RevisionNotFound'.

I have only looked into this subset of services, and I've only looked into 404 errors. The question of error format has been raised before; I've filed task T209164 in order to bring other node services to the standard, but after looking more deeply into this I believe the scope of the task has to be expanded.

Reply to "https://www.mediawiki.org/wiki/probs/<code> URLs"
Tgr (WMF) (talkcontribs)

Note that MediaWiki uses the term "global session" for sessions which are shared between all wikis (ie. CentralAuth sessions), not sessions which are shared between all servers. Maybe that should be qualified somehow to avoid confusion.

EEvans (WMF) (talkcontribs)

Yeah, that will be confusing. Any suggestions?

Tgr (WMF) (talkcontribs)

Cross-DC sessions maybe? Or wiki-global vs. server-global?

Mobrovac-WMF (talkcontribs)

I'm confused now. SUL users are unique across all wikis, and each of them can have sessions associated with them per-wiki. However, it is my understanding that this is due to practical implementation details (e.g. different domains).

Tgr (WMF) (talkcontribs)

There's a per-wiki session and a global session. Per-wiki session is what you want to use for most things (an extension writing something into the session on enwiki and the same extension on dewiki should not clobber each other); the global session is used to track global login status. (If you log in on enwiki, then visit dewiki, dewiki will not find a local session, so it will check your global session to make sure you are logged in. There's a CentralAuth cookie, separate from the normal session cookie, and set on the .wikipedia.org domain, that stores the key to that session.)

See SessionBackend::save() for the local session handling code and CentralAuthUtils::setCentralSession() for the global session handling code.

DKinzler (WMF) (talkcontribs)

I see the potential for confusion, but I don't quite understand it. Sessions by nature *have* to be the same across application servers and data centers, right? Anything else would just be broken. So there's no need for a special name for this "kind" of session. We should just state consistent storage across app servers and DCs as a requirement, and specify the limits (replication delay, etc).

The term "global sessions" should in my opinion be used only for cross-domain sessions. Though "cross-domain session" would probably be a better name :)

EEvans (WMF) (talkcontribs)

For what it's worth: I updated the text to refer to sessions in this context as "valid fleet-wide", in the hopes of avoiding confusion. I think this is only an issue now, in this context; as Daniel points out, there'd be no need to ever refer to them as anything other than sessions after-the-fact.

Pchelolo (talkcontribs)

During the RFC discussion User:Smalyshev (WMF) pointed out that the API specifies 401 responses while not providing any data regarding authn/authz. I think these parts of the spec should be removed - this is simply a storage of session data, it has no idea whether a particular operation with the session is authorized. I believe limiting access to MediaWiki would be done on the network/firewall level, so it's simply impossible for the service to return a 401. Authn/authz is beyond the scope.

EEvans (WMF) (talkcontribs)

Two things: First, I think there was some confusion during the IRC discussion with respect to matters of authn/authz. I know that at least some people were referring to this in the context of the service providing the means to authenticate and/or authorize MediaWiki users. So, to be clear (and for posterity's sake): a status 401 here refers to session storage clients (aka MediaWiki) being unauthorized to read and/or write to session storage, and nothing more.

With that said (and this gets into implementation details we promised to omit from the RfC, but oh well...), something such as this, with the security implications it has, should leverage defense in depth wherever possible. We definitely do want to limit by firewall, but I think we should also encrypt and establish some trust mechanism (which means it is possible to raise a 401, even if it would be an error in our config for that to happen).

Mobrovac-WMF (talkcontribs)

I think it's good to keep it, as one might imagine a future integration or interaction with auth(n|z) mechanisms, but such a thing is out of scope here.

Pchelolo (talkcontribs)

And then nothing will prevent us from adding it back in. It's not a backwards-incompatible change. Having it here now only creates confusion imho

Reply to "401 errors specification"
Pchelolo (talkcontribs)

I guess it would be worth mentioning in the spec that error media-type should be application/problem+json
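For reference, an RFC 7807 problem document for this service might look roughly like this (the type URL and detail text are placeholders, pending the probs/ naming discussion elsewhere on this page):

```python
import json

# Illustrative application/problem+json body per RFC 7807; not the
# service's actual error schema.
problem = {
    "type": "https://www.mediawiki.org/wiki/probs/not-found",
    "title": "Not Found",
    "status": 404,
    "detail": "No session exists for the given key.",
}
body = json.dumps(problem)
# Served with: Content-Type: application/problem+json
```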

EEvans (WMF) (talkcontribs)

Good idea; Yes Done

Anomie (talkcontribs)

SessionManager:

Except for the lack of a client-specified TTL, this proposal looks like it has functionality sufficient to meet the needs of SessionManager.

Lack of support for a client-specified TTL isn't critical here. The main problem would come if the service TTL is less than half of the client-specified TTL, as that would prevent SessionManager's refresh logic from working.

CentralAuth:

CentralAuth uses the session storage backend for its own session-like data. It seems to have similar requirements as core SessionManager.

OAuth:

OAuth uses the session storage backend for storing tokens for its authentication process. For the most part it looks like this proposal has functionality sufficient to meet its needs, with two caveats:

  • OAuth uses a "set only if unset" call in several places. While that could be implemented as a GET to check unsetness followed by a PUT, such a mechanism is open to races.
  • OAuth seems to rely on short timeouts for security-related features. The service not honoring client-specified timeouts might allow things to be used for much longer than intended (or prevent nonce reuse for much longer than intended).
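The race in the first caveat, and the conditional-write primitive that avoids it, can be sketched as follows (a dict standing in for the service; the precondition header follows RFC 7232, but the endpoint itself is hypothetical):

```python
# "Set only if unset" as an atomic conditional PUT: the store checks the
# precondition and writes under one lock, so no second client can slip in
# between the existence check and the write (unlike GET-then-PUT).

import threading

store = {}
_lock = threading.Lock()

def put(key, value, headers=None):
    """Returns an HTTP-ish status code; 412 = precondition failed."""
    headers = headers or {}
    with _lock:
        if headers.get("If-None-Match") == "*" and key in store:
            return 412  # key already set; refuse to overwrite
        store[key] = value
        return 200
```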

BagOStuff:

All of the above interact with the storage service via the BagOStuff PHP interface, and it's reasonable that people maintaining such code would expect the features of BagOStuff to generally work and might be surprised if the BagOStuff backed by your service doesn't.

Fortunately, the base BagOStuff class seems to make very few assumptions of the backend. The main shortcomings are already noted above: client-side TTLs and a non-racey implementation of add(). Implementing those two features should make almost everything else work reasonably well:

  • getWithSetCallback() doesn't seem to be documented as such, but I believe the intention is that the callback is supposed to be a lookup where any races would return the "same" data.
  • getMulti() and setMulti() don't seem to offer snapshot or atomicity guarantees, they're just sugar for a loop doing get() or set().
  • lock() is built on top of add() and short TTLs, and unlock() is just delete().
  • merge() by default uses lock(), although if you implement CAS it could use that instead.
  • incr() uses lock(), decr() is just a negated incr(), and incrWithInit() uses incr() and add().

The only other thing is changeTTL(), which by default is racey. I don't know why it doesn't use lock().
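The add()-based lock described in the list above can be sketched like so (simplified; the real BagOStuff code also handles blocking waits and expiry edge cases):

```python
# lock() built on an atomic "set only if unset" add() with a short TTL:
# whoever wins the add() holds the lock, unlock() is just delete(), and
# the TTL bounds how long a crashed holder can block everyone else.

import time

store = {}  # key -> (value, expiry); stand-in for the cache backend

def add(key, value, ttl):
    # "Set only if unset"; an expired entry counts as unset.
    now = time.monotonic()
    if key in store and store[key][1] > now:
        return False
    store[key] = (value, now + ttl)
    return True

def lock(key, ttl=6):
    return add("lock:" + key, "held", ttl)

def unlock(key):
    store.pop("lock:" + key, None)  # delete() releases the lock
```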

EEvans (WMF) (talkcontribs)

It has been suggested elsewhere that BagOStuff might be the wrong interface for sessions. To clarify: What is the intersection between the enumerated list above, and the features needed for session management?

Anomie (talkcontribs)

SessionManager uses get(), set(), and delete(). It uses client-specified TTLs, although as long as the server's TTL isn't too short compared to $wgObjectCacheSessionExpiry it shouldn't be much of a problem.

Keep in mind that moving SessionManager, CentralAuth, and OAuth all off of BagOStuff (or changing the latter two to a variable other than $wgSessionCacheType) could well be more work than just implementing TTLs and add(). Particularly when you remember that session management has to continue to work for third-party wikis that don't install your new service.

Mobrovac-WMF (talkcontribs)

Perhaps it would be worth investigating whether a narrower interface is more appropriate for session management. Given that it would be narrower by definition, it could be made to have a BagOStuff fallback, which alleviates the third-party concern.

Anomie (talkcontribs)

On the other hand, as I pointed out, that would require changing the existing code that configures and uses the BagOStuff. You'd probably wind up making the configuration more complex for all users to save yourself a tiny bit of complexity in implementing the interface to your service.

Pchelolo (talkcontribs)

OAuth uses a "set only if unset" call in several places. While that could be implemented as a GET to check unsetness followed by a PUT, such a mechanism is open to races.

This could be supported using Cassandra lightweight transactions. A fairly significant performance penalty will have to be paid, since, quote: "the latency of operations increases fourfold due to the round-trips necessary between the CAS coordinators".

In the RESTful API this feature could be exposed by supporting the If-None-Match: * header for PUT requests, since per RFC 7232, "If-None-Match can also be used with a value of "*" to prevent an unsafe request method (e.g., PUT) from inadvertently modifying an existing representation of the target resource when the client believes that the resource does not have a current representation (Section 4.2.1 of [RFC7231]). This is a variation on the "lost update" problem that might arise if more than one client attempts to create an initial representation for the target resource."
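A sketch of the server-side semantics this would imply (a hypothetical handler in Python standing in for the real service, with invented status-code choices for the unconditional case):

```python
def handle_put(store, key, value, headers):
    """Toy handling of PUT with If-None-Match: * (illustrative only).

    `store` is a plain dict standing in for the backend; `headers` is a
    dict of request headers. Returns an HTTP status code.
    """
    if headers.get("If-None-Match") == "*":
        # Create only if no current representation exists (RFC 7232 section 3.2).
        if key in store:
            return 412  # Precondition Failed: a representation already exists
        store[key] = value
        return 201  # Created
    # Unconditional PUT: create or replace.
    created = key not in store
    store[key] = value
    return 201 if created else 204
```

The conditional branch is exactly the "set only if unset" primitive OAuth needs, pushed down to where it can be made atomic (e.g. via a Cassandra lightweight transaction).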

Mobrovac-WMF (talkcontribs)

Alternatively, we could make the distinction clearer in the API: PUT does "set if unset", i.e. "if not exists", whereas POST can be used for creating a new record or updating an existing one (with CAS semantics). We could also have PATCH that only updates the record.

Reply to "An analysis of requirements"

Values persist until expiring, the result of a TTL configured on a site-wide basis.

Pchelolo (talkcontribs)

You write 'site-wide basis'. This is ambiguous: are we talking about 'site' as a wiki-specific TTL, or a global TTL for the whole 'site'? If it's the former, I don't see how that could possibly be implemented.

EEvans (WMF) (talkcontribs)

Site-wide is used here in the context of the session storage service, so it means that the service is configured to expire records after a TTL period, and everything stored uses that TTL. Is that behavior wrong? If not, and that was just unclear, do you have suggestions for how to make that more obvious?

EEvans (WMF) (talkcontribs)

To update this thread with discussion that has occurred elsewhere: a site-wide configuration (in the sense that everything is stored with a TTL specified in the service's configuration) might not be the way to do this. Though it is not apparent this is needed at the WMF, setting a TTL on the persistence is a part of the contract, vis-a-vis Manual:$wgObjectCacheSessionExpiry (or via Manual:Hooks/SessionMetadata).

It may be necessary that the service accept an attribute for the TTL on write.

Tgr (WMF) (talkcontribs)
or via Manual:Hooks/SessionMetadata

I misread that part of the code, sorry. The hook cannot change existing keys in the metadata array (including expiration time), it can only add new ones.

Pchelolo (talkcontribs)

It seems like $wgObjectCacheSessionExpiry is a no-op by now as well; I reopened task T158829 to figure out whether my conclusion is correct and to deprecate it.

Mobrovac-WMF (talkcontribs)

We have global user settings, so "site-wide" should mean "all SUL wikis" here. It doesn't make sense to have per-wiki TTLs, because we don't have per-wiki users.

DKinzler (WMF) (talkcontribs)

I don't think we have to support per-write TTLs for sessions. We shouldn't let the BagOStuff interface influence the design of the service interface.

Anyway, I think "site-wide" is confusing. "Per-instance" makes more sense in my mind. "Site" is not a concept that is well defined in the context of this service. An instance of the service is.

EEvans (WMF) (talkcontribs)

Anyway, I think "site-wide" is confusing...

I've clarified this in the text.

Anomie (talkcontribs)

In Topic:Uoge9eeznt3nntm0 I analyzed the requirements of things using the existing Redis session store, and it seems to me that at least OAuth probably requires client-specified TTLs.

Or rewriting to use a different store, or having SessionManager confusingly not use $wgSessionCacheType anymore.

Pchelolo (talkcontribs)

I believe there was a consensus at the RFC meeting that a client-supplied TTL feature is required.

Regarding the API for the feature, we could either require a Cache-Control: max-age=<TTL> header, or include the TTL as a required property in the payload. I slightly prefer the former, as I'm not yet convinced we need to wrap the actual data into a JSON object with a single value property.

Also, SMalyshev has raised the question of what happens if for some reason someone forgets about the data, or (if we require the Cache-Control header) sets a close-to-infinite TTL - then the data will be there forever. So, I believe that in addition to the client-supplied TTL we need to set a service-wide default TTL and cap the client-supplied TTL to it. Something significantly higher than what we use now (which is 3600s).
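The capping behaviour described above could look something like this (a Python sketch; the constants and the minimal header parsing are illustrative assumptions, not the service's actual values):

```python
import re

SERVICE_MAX_TTL = 86400   # hypothetical service-wide cap (one day)
DEFAULT_TTL = 3600        # hypothetical fallback when the client sends nothing

def effective_ttl(cache_control):
    """Derive the TTL to store with a record from a Cache-Control header.

    Honor a client-supplied max-age, but cap it at the service-wide
    maximum so forgotten or near-infinite TTLs cannot keep data forever.
    """
    if cache_control:
        m = re.search(r"max-age=(\d+)", cache_control)
        if m:
            return min(int(m.group(1)), SERVICE_MAX_TTL)
    return DEFAULT_TTL
```

So a client asking for ten minutes gets ten minutes, a client asking for a year gets the cap, and a client asking for nothing gets the default.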

Tgr (WMF) (talkcontribs)

At a glance, the longest expiry we currently use is one day, for CentralAuth global sessions.

Mobrovac-WMF (talkcontribs)

It was evoked in the IRC meeting that client-supplied TTLs might be needed, but I'm not really persuaded that is true. Even if we do allow per-request TTLs, these should be taken into account only if they are lower than the configured default TTL, IMHO.

Pchelolo (talkcontribs)

> It was evoked in the IRC meeting that client-supplied TTLs might be needed

To quote the RFC meeting,

22:08:23 <TimStarling> the only thing missing is client-side specification of the TTL

22:26:14 <TimStarling> also my strong preference is for there to be client-side TTL specification

22:28:35 <_joe_> SMalyshev: the proposed interface has a server-side TTL

22:35:09 <_joe_> can we say we have a consensus on client-set TTLs to be a desirable feature?

22:35:54 <SMalyshev> yes. q: is client TTL still subject to server one-fits-all ttl or replaces it?

22:36:20 <tgr> if the service is implemented as a new BagOStuff, which seems to be the easy way, it should honor the BagOStuff contract, and that includes TTL

22:36:21 <TimStarling> one reason I don't like server-side TTL is because configuring services is a PITA

22:36:25 <_joe_> mobrovac: but we want the clients to be able to set a LOWER ttl for a specifc session

22:36:51 <tgr> and adding TTL is easy, so really it seems less effort to support it than not

22:37:25 <_joe_> so we need client-side TTLs :)

Sounds like a consensus to me :)

Reply to "Values persist until expiring, the result of a TTL configured on a site-wide basis."
Tgr (WMF) (talkcontribs)

It would be nice to include some information on how the API inside MediaWiki is going to look. The two obvious options are

  1. add a new SessionService to the DI container with get/set/delete methods, have the default implementation fall back to the BagOStuff instance returned by ObjectCache::getInstance( $wgSessionCacheType ) and use $wgObjectCacheSessionExpiry for expiry, provide an alternative implementation that uses the new REST service (directly or via VirtualRESTService). This would require some (probably fairly trivial) refactoring in SessionManager/SessionBackend, CentralAuth and maybe OAuth.
  2. add a new BagOStuff implementation that talks to the service. As mentioned elsewhere, that would require client-side TTL support and some sort of CAS or transaction handling (the latter is probably a good thing to have anyway). It would also prevent exposing any functionality that is not supported by BagOStuff, although the current service doesn't include such functionality anyway.
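For option 1, the narrow interface might look roughly like this (a Python stand-in for what would be a PHP interface; all names are hypothetical, and the dict-backed class merely plays the role of the BagOStuff-backed default implementation):

```python
from abc import ABC, abstractmethod

class SessionStore(ABC):
    """Hypothetical narrow session interface: just get/set/delete."""

    @abstractmethod
    def get(self, key): ...

    @abstractmethod
    def set(self, key, value, ttl=None): ...

    @abstractmethod
    def delete(self, key): ...

class DictBackedSessionStore(SessionStore):
    """Stands in for the default BagOStuff-backed implementation."""

    def __init__(self, default_ttl=3600):
        # default_ttl plays the role of $wgObjectCacheSessionExpiry
        self.default_ttl = default_ttl
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, ttl=None):
        # TTL accepted but not enforced in this toy backend
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)
```

An alternative implementation of the same three methods would talk to the REST service instead, which is the whole point of keeping the interface this small.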
Reply to "MediaWiki API"
Pchelolo (talkcontribs)

Several points regarding API consistency with other services we have:

  1. In the error response JSON you have an instance property to indicate the requested URI. In other services we call the property uri
  2. We have a tendency to return the original request method together with the error information
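For reference, an RFC 7807-style body using instance (plus method as an extension member) might look like this (a Python sketch; everything beyond the RFC 7807 members is an assumption for illustration):

```python
import json

def error_response(status, title, detail, instance, method=None):
    """Build a hypothetical application/problem+json body.

    `instance` holds the request URI (the member other services have been
    calling `uri`); `method` is a non-standard extension member, which
    RFC 7807 section 3.2 explicitly permits.
    """
    body = {
        "type": "about:blank",
        "title": title,
        "status": status,
        "detail": detail,
        "instance": instance,
    }
    if method:
        body["method"] = method  # extension member
    return json.dumps(body)
```

So consistency with RFC 7807 and keeping the method information are not mutually exclusive: method can simply ride along as an extension member.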
EEvans (WMF) (talkcontribs)

So, instance comes from RFC7807, do you happen to know why we used uri instead (or why we added method, for that matter)?

Lacking a rationale for doing it one way or the other, I'm not sure I'd favor consistency with RESTBase over RFC7807 (at least not without an explanation of why RESTBase is the way it is).

Also: The RESTBase online docs are wrong. They indicate that clients should expect an RFC7807 error response, and we return...well, something else.

Pchelolo (talkcontribs)

The history of this decision has vanished; it was like that even before I joined. I don't think any application relies on this, so we could deprecate the uri property name and replace it with instance in all the other services.

Mobrovac-WMF (talkcontribs)

The uri part comes from the fact that we want(ed) to have a unique identifier for each resource, but yes, you are correct that it is not mentioned in RFC7807. In fact, we use uri as instance. I would be for correcting that mistake here and gradually introducing instance in other error responses we send out.

Pchelolo (talkcontribs)