Talk:Core Platform Team/Initiatives/API Gateway



On the use of a global pool for anonymous rate-limiting

6
EEvans (WMF) (talkcontribs)

User story 13, Anonymous rate limit, refers to the use of a "global pool". This has been clarified (elsewhere) to mean that requests from all anonymous agents will contribute to a single rate, with the limit to be applied against that rate. I believe this should be reconsidered.

Assuming some limit L and concurrency C (and assuming for the sake of argument that all agents make requests at the same rate), the effective rate becomes L/C. L is constant, but C is not, so the effective per-agent limit is unknowable (without visibility of C). How would we communicate this to users? How do we set expectations?
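To make the arithmetic concrete, here is a minimal sketch under the same simplifying assumption (all agents request at the same rate). The function name and the figure of 1000 requests per second are illustrative only, not proposed values.

```python
def effective_per_agent_limit(global_limit: float, concurrency: int) -> float:
    """Per-agent share of a global pool when all agents request at the same rate."""
    if concurrency < 1:
        raise ValueError("concurrency must be at least 1")
    return global_limit / concurrency


# Hypothetical global limit L; C is the number of concurrent anonymous agents.
L = 1000.0
for C in (1, 10, 100, 1000):
    share = effective_per_agent_limit(L, C)
    print(f"C={C:>4}: each agent gets about {share:g} req/s")
```

The same L yields very different per-agent rates depending on C, which an individual agent cannot observe.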

I believe the more conventional approach would be to apply limits on a per host/IP basis. These limits would be communicated directly to users; interpretation would be straightforward.
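As a rough illustration of per-IP limiting, here is a fixed-window sketch. It is not how the gateway does or would implement it; the class name, the injectable clock, and the in-memory dictionary are all assumptions made for the example.

```python
import time


class PerIPRateLimiter:
    """Fixed-window limiter: at most `limit` requests per `period` seconds per IP."""

    def __init__(self, limit: int, period: float, clock=time.monotonic):
        self.limit = limit
        self.period = period
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.windows = {}   # maps IP -> (window start time, requests counted so far)

    def allow(self, ip: str) -> bool:
        now = self.clock()
        start, count = self.windows.get(ip, (now, 0))
        if now - start >= self.period:
            start, count = now, 0  # the previous window expired; start a new one
        if count >= self.limit:
            self.windows[ip] = (start, count)
            return False
        self.windows[ip] = (start, count + 1)
        return True
```

Each address gets its own counter, so the limit a user sees does not depend on how many other anonymous agents are active.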

If the concern is that for any given limit, a sufficiently large concurrency will push the system over capacity, then limiting concurrency is probably the appropriate response (and a sufficiently large concurrency could take us down anyway, regardless of rate).

Finally, since (to the best of my knowledge) this requirement isn't about implementing certain semantics for the sake of the product (the API), but about safeguarding the infrastructure that hosts it, we should really wait for SRE to weigh in. I believe they already have ideas (plans?) with regard to this; it may not even be necessary for us to rate-limit anonymous users (if, for example, this is occurring upstream of the gateway).

EProdromou (WMF) (talkcontribs)

What I'm hearing is that instead of a global pool of API calls for anonymous users, we'd have a rate limit for anonymous users based on their IP address?

As long as we still have all the requirements for rate limits around API keys, and that the per-IP limit wouldn't apply to OAuth-authorised requests, I'm fine with that.

EProdromou (WMF) (talkcontribs)

One thing I want to make sure I understand w/r/t capacity planning.

I think if we have a fixed capacity X, and a fixed anonymous pool Y, and a per-client pool Z, and a known number of registered clients C, then as long as X > Y + (C * Z), we are under capacity.

If we have a per-IP limit B instead of a fixed anonymous pool, and A possible IP addresses, we can only be sure we're under capacity if X > (A * B) + (C * Z). A is really big, here. We could probably use D as "number of IP addresses we actually expect to connect", but that's still big and fuzzy.

I'm OK living with that ambiguity, but it seems to me that the fixed anonymous pool is easier to do capacity planning for, rather than the per-IP limits.
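The two inequalities above can be compared side by side with a small sketch. The function names and every number below are hypothetical placeholders chosen only to show the shape of the calculation.

```python
def under_capacity_with_pool(x: int, y: int, c: int, z: int) -> bool:
    """Fixed anonymous pool: worst-case load is Y + C * Z."""
    return x > y + c * z


def under_capacity_with_per_ip(x: int, a: int, b: int, c: int, z: int) -> bool:
    """Per-IP limit: worst-case load is A * B + C * Z."""
    return x > a * b + c * z


# Hypothetical figures, all in requests per second.
X = 100_000   # total capacity
Y = 10_000    # fixed anonymous pool
C = 500       # registered clients
Z = 100       # per-client pool
B = 10        # per-IP limit
D = 20_000    # IP addresses we actually expect to connect (stand-in for A)

print("fixed pool under capacity:", under_capacity_with_pool(X, Y, C, Z))
print("per-IP under capacity with D:", under_capacity_with_per_ip(X, D, B, C, Z))
print("per-IP under capacity with all IPv4:", under_capacity_with_per_ip(X, 2**32, B, C, Z))
```

With the fixed pool the bound depends only on numbers we control; with per-IP limits it depends on an estimate of how many addresses show up.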

EProdromou (WMF) (talkcontribs)

Another question is whether we would have limits for IPv6 addresses, too.

EEvans (WMF) (talkcontribs)

Yes; I cannot see any reason to treat IPv4 & IPv6 any differently.

EProdromou (WMF) (talkcontribs)

Also, I think having concurrency limits is an OK idea. There are some for public APIs that I've seen; usually something like N connections per [IP, client ID] pair where N is small, like 1 or 2 or 3.

So if an end user is using two different apps, it's OK to use two connections. And if two different users are using the same app on two different machines, that's OK, too.
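A concurrency limit keyed on the [IP, client ID] pair could look roughly like the sketch below. The class and method names are made up for illustration, and a real gateway would need thread safety and cleanup on dropped connections.

```python
class ConcurrencyLimiter:
    """Allow at most `max_connections` open connections per (IP, client ID) pair."""

    def __init__(self, max_connections: int):
        self.max_connections = max_connections
        self.open = {}  # maps (ip, client_id) -> number of open connections

    def acquire(self, ip: str, client_id: str) -> bool:
        key = (ip, client_id)
        current = self.open.get(key, 0)
        if current >= self.max_connections:
            return False
        self.open[key] = current + 1
        return True

    def release(self, ip: str, client_id: str) -> None:
        key = (ip, client_id)
        current = self.open.get(key, 0)
        if current <= 1:
            self.open.pop(key, None)  # drop the entry once nothing is open
        else:
            self.open[key] = current - 1
```

Because the key is the pair rather than the IP alone, one user running two apps and two users running the same app are counted separately, as described above.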

Reply to "On the use of a global pool for anonymous rate-limiting"

Feed content availability API

1
APaskulin (WMF) (talkcontribs)
Reply to "Feed content availability API"

Wikifeeds endpoint stability

1
APaskulin (WMF) (talkcontribs)

In the docs, the stability classes for the feed endpoints are "unstable" or "experimental", meaning that they aren't subject to versioning. To incorporate these endpoints into the curated API versioning scheme, we'll need to upgrade the class to "stable".

Reply to "Wikifeeds endpoint stability"

Support for Action API and non-OAuth2 auth methods

3
Pchelolo (talkcontribs)

I understand that this document outlines the initiative for the API Gateway, but IMHO it needs a section on how this fits into the broader picture.


I think we have to be honest with ourselves that non-OAuth2 auth methods and the Action API are not going anywhere in the foreseeable future (before we are all deceased from old age). This means that if the purpose of rate limiting here is to protect our infrastructure, we mustn't protect only one tiny corner while leaving 99% of it exposed, or we're building something like this


We should at least mention that the rate limiting and perhaps routing infrastructure is intended to be used for all API access eventually and not be tightly coupled with the new APIs only.

Anomie (talkcontribs)

+1.

We've had decent success to date in adding rate limits and concurrency limits into specific expensive endpoints when infrastructure actually needs protecting. We haven't needed OAuth or global limits to do that, and I've yet to see an explanation of how those would improve the situation without causing significant other problems.

KChapman (WMF) (talkcontribs)

I think this fits into a larger architecture picture, which we don't know yet. Yes, we will probably apply this in more than two places in the future, but we should first test this initial assumption and see how it goes before trying to plan for everything in the future.

Reply to "Support for Action API and non-OAuth2 auth methods"

Clarification requested

EEvans (WMF) (talkcontribs)

The term API key appears throughout this document; does API key refer to (in all cases) the OAuth client ID?

Assuming the above is true, does that mean that there are two categories of rate limits, one that corresponds to the client IDs of authenticated users (or classes of IDs, etc), and one that applies to everyone else (unauthenticated users)?

EvanProdromou (talkcontribs)

API Key = OAuth 2.0 client ID

Yes, there is a rate limit for requests that have an OAuth 2.0 client ID associated, and a different limit for all unauthenticated requests.

EEvans (WMF) (talkcontribs)

> API Key = OAuth 2.0 client ID

> Yes, there is a rate limit for requests that have an OAuth 2.0 client ID associated, and a different limit for all unauthenticated requests.

OK, and the requirements are to limit to N requests per Period, by client ID, and N requests per Period for all unauthenticated requests, correct? In other words: we would have N+1 limits defined, where N is the number of registered client IDs, and where each limit represents a pool of requests that could be made by every client included in that pool (i.e. all anon users, and all clients sharing a common client ID)? Is this correct?

EvanProdromou (talkcontribs)

I think the idea of having classes of client IDs (user story 8) would make it so we would only have X+1 rate limits, where X is the number of classes. Classes might be something like:

- "new" (just registered)

- "normal" (some period of time has passed since registration and/or manual review)

- "preferred" (a developer who's shown a need for more API calls and has gone through a request/review process)

- "paid" (a developer who's paid for more API calls. This is out of scope for this project, but it's how I'd imagine we'd implement the feature).

- "in-house" (One of our official front-end clients, or a batch client like ORES. As close to infinite as we could make it.)

I don't think assigning a rate limit per client ID is a reasonable use of our time. If we feel like we need to set a particular limit for a particular client, we could do that by adding a class with only one member, and setting a limit for that class. If we find ourselves doing that a lot, I'd say we could come back around and add a feature to define a limit per client.
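The X+1 idea can be sketched as a lookup from class to limit plus one anonymous entry. The class names come from the list above; the limits, the client IDs, and the choice to treat unknown client IDs as "new" are assumptions for illustration only.

```python
import math

ANONYMOUS = "anonymous"

# Hypothetical limits in requests per hour; the numbers are placeholders.
CLASS_LIMITS = {
    ANONYMOUS: 500,
    "new": 1_000,
    "normal": 5_000,
    "preferred": 50_000,
    "paid": 100_000,
    "in-house": math.inf,  # "as close to infinite as we could make it"
}

# Hypothetical registry mapping OAuth 2.0 client IDs to classes.
CLIENT_CLASSES = {
    "client-abc": "new",
    "client-xyz": "preferred",
    "client-ores": "in-house",
}


def limit_for(client_id):
    """Return the limit for a request; `None` means no client ID was presented."""
    if client_id is None:
        return CLASS_LIMITS[ANONYMOUS]
    # Assumption: a client ID with no recorded class is treated as "new".
    return CLASS_LIMITS[CLIENT_CLASSES.get(client_id, "new")]
```

A one-off limit for a particular client is then just another class with a single member, with no per-client feature needed.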

Reply to "Clarification requested"

Please get rid of "API server" terminology

2
Pchelolo (talkcontribs)

What you mean here is a "domain" or better "domain name".

EvanProdromou (talkcontribs)

Fixed, thanks.

An API gateway should only serve external users

3
GLavagetto (WMF) (talkcontribs)

I would suggest the removal of "and internal" from the description.

We've seen over and over with RestROUTER that sending all internal requests from every place through a gateway is a bad idea.

In the long term, I would like only external requests to flow through the API gateway, with internal calls between services going directly from one to the other.

EvanProdromou (talkcontribs)

By "internal" I mean internal to our organization, that is, WMF client developers -- our mobile apps, desktop and mobile web. I can see how this is easily confused with "internal to our network".

Is there a better term I can use there, instead of "internal"?

EEvans (WMF) (talkcontribs)

In-house?

Reply to "An API gateway should only serve external users"

All API calls count the same for rate limiting

3
Pchelolo (talkcontribs)

This will make this system entirely useless. If all the calls had roughly the same cost, this could work. However, in our system we have endpoints that return responses within a few milliseconds running alongside endpoints that are extremely costly and can take tens of seconds and loads of resources to compute. If the goal of the rate limiter project is to protect the infrastructure, the limit would need to be set to a number that protects it from the expensive calls, which would be way too low for inexpensive calls. And vice versa: if we set a high limit targeted at allowing use of cheap endpoints, a client can easily destroy the infrastructure just by using the whole pool to make expensive requests.


For example (THESE ARE NOT REAL NUMBERS I PROPOSE), for a cheap endpoint a reasonable limit could be 1000r/s, while for an expensive one, like parsing a huge page on Parsoid, 5r/s. These are so vastly different that it would be impossible to come up with a single number to cover both. We need to have per-endpoint limits at least.
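Using the same illustrative figures, a quick sketch shows why one shared number fails in both directions. The endpoint labels and function names are invented for the example.

```python
# Example figures from the paragraph above (explicitly not real proposals), in requests per second.
ENDPOINT_LIMITS = {
    "cheap": 1000,
    "expensive": 5,
}


def safe_single_limit(limits: dict) -> int:
    """A single shared limit is only safe if it is safe for the costliest endpoint."""
    return min(limits.values())


def overload_factor(shared_limit: int, endpoint_limit: int) -> float:
    """How far a client exceeds an endpoint's safe rate by spending the whole shared pool on it."""
    return shared_limit / endpoint_limit


single = safe_single_limit(ENDPOINT_LIMITS)
print("safe single limit:", single, "r/s")
print("share of cheap capacity usable:", single / ENDPOINT_LIMITS["cheap"])
print("overload if the shared limit is tuned for cheap calls:",
      overload_factor(ENDPOINT_LIMITS["cheap"], ENDPOINT_LIMITS["expensive"]))
```

Tuned for the expensive endpoint, the shared limit leaves almost all cheap capacity unused; tuned for the cheap endpoint, it lets a client exceed the expensive endpoint's safe rate many times over.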

If you think that it's more complicated, it's actually easier to use. As a developer making a large project using the API, maintaining local limits where you make the calls, slowing down if needed, and specifying concurrency for requests at the point where you make them is much easier than maintaining a global pool of API requests from all your systems towards us.

EvanProdromou (talkcontribs)

There's a big advantage in usability for client developers in having a simple policy, "X API calls in Y seconds".

How can we balance that against our resource-management requirements? I can see a few ways to do it. First, we could just cost it out based on our most expensive API calls. So if we can handle X very expensive calls per hour, that's the number we use.

Another is that we take a statistical approach: 75% of the calls by all clients are to API endpoint 1, 15% to API endpoint 2, and 10% to API endpoint 3, each with different resource costs. We could set the rate limit so on average clients can make this mix of calls, even if individual developers might make more or fewer expensive calls.
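Here is a rough sketch of that statistical approach. The 75/15/10 mix is from the paragraph above; the per-call costs, the per-client budget, and the function names are invented purely to show the calculation.

```python
# Share of calls per endpoint, in percent (from the paragraph above).
CALL_MIX_PERCENT = {"endpoint_1": 75, "endpoint_2": 15, "endpoint_3": 10}

# Hypothetical relative cost of one call to each endpoint, in arbitrary resource units.
COST_PER_CALL = {"endpoint_1": 1, "endpoint_2": 5, "endpoint_3": 20}


def average_cost(mix_percent: dict, cost: dict) -> float:
    """Expected cost of one call, weighted by how often each endpoint is called."""
    total = sum(mix_percent[name] * cost[name] for name in mix_percent)
    return total / sum(mix_percent.values())


def blended_limit(budget_units: float, mix_percent: dict, cost: dict) -> float:
    """Calls allowed per period so that an average client stays within the budget."""
    return budget_units / average_cost(mix_percent, cost)


# Hypothetical budget of 3500 resource units per client per hour.
print("average cost per call:", average_cost(CALL_MIX_PERCENT, COST_PER_CALL))
print("calls per hour:", blended_limit(3500, CALL_MIX_PERCENT, COST_PER_CALL))
```

The resulting limit holds on average only; a client whose mix differs from the assumed one consumes more or fewer resources for the same number of calls.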

I'm working on a review of the other major API platforms to get an idea of what their rate limit policies are. I want to fall squarely in the middle of the pack.

Pchelolo (talkcontribs)

> There's a big advantage in usability for client developers in having a simple policy, "X API calls in Y seconds".

I would argue with that. Getting rate-limited in one part of your system because something is misbehaving in an entirely different part of the system will be a nightmare to debug. Maintaining a global outgoing request rate limit is no fun in distributed systems. The only benefit I see in this simplicity would be that the docs are simpler.


> So if we can handle X very expensive calls per hour, that's the number we use.

That would make the system unusable for most GET requests.


> Another is that we take a statistical approach:

If the goal of the limit is to protect against malicious agents, this approach would clearly not work. A malicious agent can deliberately make requests only to the most expensive endpoint.


> I'm working on a review of the other major API platforms to get an idea of what their rate limit policies are. I want to fall squarely in the middle of the pack.

This would be an interesting data point in this discussion, eager to see it. But also let's not do cargo-cult programming either :)

Reply to "All API calls count the same for rate limiting"

Sharing a domain

Pchelolo (talkcontribs)

There's a very old Phabricator task https://phabricator.wikimedia.org/T95229 with all the discussion regarding why we have switched from using rest.wikimedia.org for RESTBase to using project domains with `/api/rest_v1`. We should evaluate whether the reasoning provided in the task and the comments is still valid to avoid repeating the mistakes we've already made.

GLavagetto (WMF) (talkcontribs)

I would very much avoid sharing a domain - what's the advantage, compared to the risk of information leaking between projects for things like authorization?

EvanProdromou (talkcontribs)

Sounds good.

AFAICT, the big downsides to using a specific API domain are the HTTPS initiation and DNS lookup costs. I think that's important if you're already connected to the project domain like en.wikipedia.org, for example, for an in-browser app like VE using the API.

I think it's less of a problem for an API-only client, like a third-party bot or tool (correct me if I'm wrong). I also think it makes things easier for API clients using multiple wikis; they only have to maintain a connection to api.wikimedia.org, not en.wp, en.wv, fr.wb, fr.wt, ...

Since we're not retiring the per-wiki endpoints right now, it seems OK to focus on an API domain; it won't interfere with the current in-browser apps.

Do we need to go further into this?

Reply to "Sharing a domain"

Target metrics

Pchelolo (talkcontribs)

> More evenly distributed API traffic

What does that mean? Distributed across what? Physical servers? Projects? Languages?

EvanProdromou (talkcontribs)

My intention was smoother distribution across time. My rough idea was "fewer surges of API traffic that cause problems for other API users". Maybe just reducing 503 errors is a good enough metric for this.

Reply to "Target metrics"