Topic on Talk:Requests for comment/SessionStorageAPI

Mobrovac-WMF (talkcontribs)

Session data is structured, so it is a bit unclear why value is a mandatory field. Furthermore, since different entities use sessions to store their data, I wonder if it makes sense to allow partial updates/stores, so as to limit the impact of one entity to mess with the data of another entity.

Pchelolo (talkcontribs)

From the stand point of limiting the impact - I would agree this might make sense. However, from performance optimization perspective I really doubt it will make us any good - the values are supposed to be fairly small, so additional costs in serializing/deserializing/network transferring the whole blob should be negligible.

Wrapping the POST/PUT data into a 'value' property makes sense from extensibility standpoint - what if we eventually want to add configurable TTL to the requests, what if we want to add some additional metadata - it will be fairly natural to just add top-level properties in the payload for it.

An alternative would be to use headers for metadata, and, for the TTL example of additional functionality there's a perfect Cache-Controlheader, but even though from REST API perspective I like it more then wrapping payload into value, the latter might be more extensible. I do not have strong preferences.

Mobrovac-WMF (talkcontribs)

I prefer headers as well, and these may be passed both ways (from client to service and vice-versa). I don't believe TTLs and other instructions to the service should be in the payload.

EEvans (WMF) (talkcontribs)

If TTLs are something we impose service-wide, then re-purposing a cache header makes a kind of sense. However, once a client is able to assign them on a per entity basis, then I believe it should become a part of the representation. And, this is consistent with how it is stored and accessed (in Cassandra, you can request a TTL in the query projection, and return it as an attribute in its own right).

DKinzler (WMF) (talkcontribs)

So we'd use /v1/{key}/path to access parts of the session? Sounds nice, this would also remove the need to read/modify/write instead of just writing.

But... should this behave like a tree of paths, and should /v1/{key}/ with no path return all paths / a tree? Or is the session a flat map, so we'd use for /v1/{key}/field? Would there be a way to get the entire map?

Mobrovac-WMF (talkcontribs)

In my view, the API could be /v1/{key}{/field} where field is optional. This means that if the client wants to read/write only a part of the session, it can, but it's not obligatory. If the field name is omitted then one should expect that the whole session data should be retrieved/stored.

Good point about depth. It could be done too, since if we allow the top-most field to be retrieved, the service has to has knowledge about the data's structure.

Pchelolo (talkcontribs)

Especially if we use Cassandra Map under the hood - it's a CRDT so we'd be able to avoid read/modify/write entirely. However, with a map we will not be able to build an entirely hierarchical structure, we will be limited to a flat map (or some fairly complex code to emulate the hierarchal access)

Anomie (talkcontribs)

The problem with providing access like /v1/{key}/path is that sessions in MediaWiki, and in PHP more generally, don't work that way. They read the whole thing, manipulate it in-process, and write the whole thing back out when they're done.

I doubt changing that in MediaWiki would be worth the added complexity. While I haven't actually checked, I doubt much code blindly writes the session without reading from it. I also doubt that trying to read each field on demand would be worthwhile, any savings per GET would almost certainly be outweighed by having to make more GETs as soon as anything accesses more than one field. You'd be left with tracking which fields were written to do individual PUTs for just them, and even making that happen would probably require abandoning the use of BagOStuff as the interface between SessionManager and the storage backend.

Pchelolo (talkcontribs)

There's no consensus whether this is needed, but there was some confusion whether the current API is preventing from adding this later if we want. I believe, first of all, the confusion can be minimized by renaming the key in the API into session_id cause AFAIK we're talking about storing the whole session object here, not actually enumeration all the paths like WMSession:<session_id>:metadata:userName and storing it value-by value.

So, since it's just a session id and it will be URI-encoded, nothing prevents us from adding and optional {/path} parameter to the URI without reserving any specific separators right now. For updating multiple paths at once we could support HTTP PATCH verb, or just make a separate update endpoint. This also does not interfere with CAS semantics if implemented using If-Non-Modified

To sum up, I think since there's no immediate need for this, there's no consensus on this and the current API does not prevent us from adding it in the future when/if needed, I propose to table this discussion for later

EEvans (WMF) (talkcontribs)

We started by defining this thing as a key-value store (opaque key, opaque value). What we're talking about now is a different thing altogether, let's call it a key-map store. I don't think this is the way to approach this. If we really do need a key-map store, then let's specify that (and a key-map would be a super-set of a key-value store, nothing would prevent you from storing a map with one k/v pair).

Pchelolo (talkcontribs)

We're talking about the API here. The API is that you PUT a JSON, you GET and JSON. How would the API change for a key-map store if we ever want to make it a key-map store. Or are you implying that the value in the payload must be a string? I thought it would be an arbitrary JSON object that will be stringified internally before storing it and deserialized upon retrieval.

Mobrovac-WMF (talkcontribs)

The idea would be to allow for such access. Allowing clients to read/write sessions only partially does not imply they would have to do it that way. One could imagine that a more efficient and more consistent way to go about things is to retrieve all the session data, but write only fields that the code path wants/needs to modify, but it's not a requirement.

The point of allowing more fine-grained access is to minimise collisions on write (albeit, it wouldn't completely eliminate them).

EEvans (WMF) (talkcontribs)

I can't help but think we might be prematurely optimizing, succumbing to YAGNI, violating KISS, or <insert tech industry jargon here>. We should be careful, because implementing a simple key-value store as discussed here, and then later implementing a key-map store, might very well end up being less work/time then we could spend spinning on this.

Reply to "GET / PUT values"