Topic on Project:Support desk

RSync Alarm wikipedia requests

99.102.84.25 (talkcontribs)
Bawolff (talkcontribs)

What is your actual concern?

That page is not really meant for public consumption.

99.102.84.25 (talkcontribs)

Since yesterday, we have been seeing a large increase in the number of our requests to the Wikipedia API timing out.

99.102.84.25 (talkcontribs)

This seems to align with when the two rsync alarms went into the alarm state.

Ciencia Al Poder (talkcontribs)
Malyacko (talkcontribs)

Also, what user agent do you use for your requests?

Bawolff (talkcontribs)

It is very unlikely that the rsync alarm has anything to do with that.

75.172.125.42 (talkcontribs)

Did any other change go live yesterday that might be causing the issue, such as an overall service connection problem?

Malyacko (talkcontribs)

Impossible to say without answers to all the currently unanswered questions in this thread.

75.172.125.42 (talkcontribs)

We are still checking our user agent and reviewing the API Etiquette. But our code base has been out for a few years, and the HTTP requests have suddenly been failing since May 03. Is there anything that might be causing the issue?

Malyacko (talkcontribs)

Yes, see above: Ignoring the rate limits, for example.

75.172.125.42 (talkcontribs)

Could you please give more information about "Ignoring the rate limits"?

Btw, after some investigation into the API Etiquette, here are some results:

  1. We usually query just 2 articles in the loop, and the titles are not piped; one of the titles may be a transformed version with the spaces removed.
  2. We do not send follow-up requests for results we get from another request, so the generator case does not apply to us.
  3. We are still investigating gzip. Did anything change recently with gzip requests? For example, did it become mandatory?

Also, we are only seeing the issue partially; not every request we make has the issue.
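For point 1 above, the API Etiquette suggestion is to ask for several titles in one request by joining them with `|` rather than sending one request per title. A minimal Python sketch of what that could look like; the `prop=info` choice, the example titles, and the helper name `build_batched_query` are illustrative assumptions, not something taken from the poster's code:

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_batched_query(titles, prop="info"):
    """Build one action=query URL that covers every title in `titles`.

    Hypothetical helper: the titles are joined with '|' so that a single
    request replaces a loop of per-title requests.
    """
    if not titles:
        raise ValueError("at least one title is required")
    params = {
        "action": "query",
        "format": "json",
        "prop": prop,
        "titles": "|".join(titles),
    }
    # urlencode percent-encodes the '|' separator and the spaces.
    return f"{API_URL}?{urlencode(params)}"


url = build_batched_query(["Albert Einstein", "AlbertEinstein"])
print(url)
```

With only two titles per loop iteration the saving is small, but it does halve the number of requests.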

Bawolff (talkcontribs)

Gzip is not mandatory.
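Since gzip is optional, a client that wants it has to ask for it and then decompress the body itself. A minimal Python sketch, assuming a plain `urllib` client; the helper names and the user agent value are made up for illustration:

```python
import gzip
import urllib.request


def gzip_request(url, user_agent):
    """Build (but do not send) a request that opts in to gzip responses."""
    headers = {"User-Agent": user_agent, "Accept-Encoding": "gzip"}
    return urllib.request.Request(url, headers=headers)


def decode_body(body, content_encoding):
    """Decompress `body` only if the server says it is gzip-encoded."""
    if content_encoding and content_encoding.lower() == "gzip":
        return gzip.decompress(body)
    # The server may ignore Accept-Encoding and send the body uncompressed.
    return body


request = gzip_request(
    "https://en.wikipedia.org/w/api.php?action=query&format=json&titles=Earth",
    "ExampleLookupTool/1.0 (ops@example.org)",  # placeholder identity
)
```

After sending the request, the raw bytes would go through `decode_body(raw, response.headers.get("Content-Encoding"))`.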

Which wiki are the requests being made to? What type of API requests (do you have an example)? What is the timestamp in UTC when you started noticing the issue? Are you logged in (if so, what username makes the requests; if not, what IP)? What precisely do you mean by "timeout": are you getting an HTTP response that is an error (if so, what error), is your connection just not connecting, do you just not receive an HTTP response after some time (how long), or something else?

Bawolff (talkcontribs)

There was database maintenance around this time. While it wasn't supposed to affect anything, there were some reports that it was causing temporary slowness. It may be related to your issue.

97.113.61.16 (talkcontribs)

Which wiki are the requests being made to?

What type of api requests (do you have an example)?

It is a GET request sent to en.wikipedia.org/w/api.php

What is the timestamp in utc when you started noticing the issue?

Between 9am and 10am UTC on 05/03/2022

Are you logged in (if so what username makes request? If not, what IP?)

Not logged in with a username; we are trying to get the IP.

What precisely do you mean by "timeout" (are you getting an HTTP response that is an error, if so what error, is your connection just not connecting)?

Sometimes, for some queries, the HTTP request does not succeed: no wiki response is returned, and we get a request-aborted exception. We are still checking the specific HTTP code.

Do you just not receive an HTTP response after some time (how long), or something else?

No response for 3s in the US

97.113.61.16 (talkcontribs)

Btw, is the database maintenance still going on? We are still seeing the issue on our side.

Bawolff (talkcontribs)

No.

Also, no response for 3 seconds sounds more like you should just increase your timeouts. A problem on the WMF end would look more like getting a 503 error. Most API endpoints in normal times should respond within 3 seconds, but that is not true of all of them.

97.113.61.16 (talkcontribs)

Actually, we have a retry. The first call has a 3s timeout while the second call has a 5s timeout.

For the user agent we found mostly they are "Java/1.8.0_211-ea", and several of them are "Java/phoneme_advanced-Core-1.3-b16 sjmc-b111".

97.113.61.16 (talkcontribs)

And with the 5s retry we are still seeing failures.

Bawolff (talkcontribs)

> For the user agent we found mostly they are "Java/1.8.0_211-ea", and several of them are "Java/phoneme_advanced-Core-1.3-b16 sjmc-b111".

Per WMF's user agent policy, this user agent isn't allowed and could potentially be blocked (you are probably not blocked, as you would get an error message). Your user agent must have a contact email address in it and should have a descriptive name of your tool.
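As a rough illustration of a user agent with a descriptive name and contact details, here is a minimal Python sketch that follows the `<client name>/<version> (<contact information>) <library>/<version>` layout described in the User-Agent policy. The tool name, version, and email address are placeholders, and `build_user_agent` is a hypothetical helper; the same header can be set from any HTTP client, including a Java one:

```python
import urllib.request


def build_user_agent(tool, version, contact, library=None):
    """Return a descriptive User-Agent string that includes contact info."""
    if "@" not in contact and not contact.startswith(("http://", "https://")):
        raise ValueError("contact should be an email address or a URL")
    user_agent = f"{tool}/{version} ({contact})"
    if library:
        # Optionally name the HTTP library after the contact details.
        user_agent += f" {library}"
    return user_agent


def make_request(url, user_agent):
    """Build (but do not send) a request that carries the User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Placeholder identity: replace with the real tool name and a monitored address.
ua = build_user_agent("ExampleLookupTool", "1.0", "ops@example.org", "Python-urllib/3")
req = make_request(
    "https://en.wikipedia.org/w/api.php?action=query&format=json&titles=Earth", ua
)
print(req.get_header("User-agent"))
```

A default such as `Java/1.8.0_211-ea` says nothing about who is sending the traffic, which is why the sketch refuses a contact value that is neither an email address nor a URL.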


Anyway, I would suggest a timeout of 60 seconds.

97.113.61.16 (talkcontribs)

Yes, we didn't see it completely blocked. Was this block put in place recently? Could you provide a link to the WMF policy or any related docs, if possible? Could you also provide some examples of an expected user agent?


Meanwhile we will look into the timeout change.

Bawolff (talkcontribs)
67.185.173.77 (talkcontribs)

Ah OK, that makes sense. The strange thing about this issue is that we have not changed the client code making these requests in several weeks, and we had not seen any of these failures before. Then, all of a sudden on May 3, we saw an immediate spike in timeout failures on up to 10% of our requests, and it has continued at this rate since then.

97.113.61.16 (talkcontribs)

Hi, after more investigation, we got the HTTP exception "Too many requests - for unthrottling, contact noc@wikimedia.org to discuss a less disruptive approach", with status code 429. Our current fix is to replace our user agents ("Java/1.8.0_211-ea" and "Java/phoneme_advanced-Core-1.3-b16 sjmc-b111") with another one ("Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US) AppleWebKit/533.3 (KHTML, like Gecko) Safari/533.3"), after which we stop receiving the throttle messages. It happens only with POST requests that parse content.
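For reference, a client that receives a 429 would normally slow down and retry later rather than retry immediately. The following is a minimal Python sketch of that pattern, not a description of the poster's client. It assumes the server may send a `Retry-After` header in seconds (not confirmed in this thread) and falls back to exponential backoff otherwise. The function names, the attempt count, and the 60 second timeout default are illustrative choices:

```python
import time
import urllib.error
import urllib.request


def retry_delay(retry_after, attempt, base=1.0, cap=60.0):
    """Seconds to wait before the next attempt.

    Uses Retry-After when it is a plain number of seconds; otherwise
    falls back to exponential backoff (base * 2**attempt). Both are capped.
    """
    if retry_after is not None and retry_after.strip().isdigit():
        return min(float(retry_after), cap)
    return min(base * (2 ** attempt), cap)


def fetch_with_backoff(url, user_agent, max_attempts=4, timeout=60,
                       opener=urllib.request.urlopen, sleep=time.sleep):
    """GET `url`, waiting and retrying when the server answers 429."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    for attempt in range(max_attempts):
        try:
            with opener(request, timeout=timeout) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Give up on other errors, or once the attempts are used up.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(retry_delay(err.headers.get("Retry-After"), attempt))
```

The `opener` and `sleep` parameters exist only so the function can be exercised without network access. Backing off this way complements a policy-compliant user agent; it does not replace one.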

Did anything happen on May 03 that could have suddenly caused the throttling on these user agents (or a specific set of user agents)? What would be a safe user agent for us to use in the future? Should we exactly follow the user agent policy you posted above? Is there a mailing list (noc@wikimedia.org) we could join, or any metrics we could follow, in case this happens again?

Ciencia Al Poder (talkcontribs)
Bawolff (talkcontribs)

You are being blocked for not following the rules. Follow the rules that were linked to you and you won't be blocked.

> "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US) AppleWebKit/533.3 (KHTML, like Gecko) Safari/533.3") 

Using this user agent is not allowed, and it makes it less likely for you to be unblocked, as explained in the pages you were linked. Unlike the Java user agent, which looks like you were accidentally breaking the rules, the browser user agent makes it look intentional.

Malyacko (talkcontribs)

I'm a bit puzzled by questions like "Should we exactly follow the user agent policy you posted above?" No because you are also free to ignore the rules and get blocked instead? :)

97.113.61.16 (talkcontribs)

We work on Kindle devices, and the requests that have issues are sent from Kindle devices. While we do some further investigation, is there a good/safe user agent we could use?

And for the current one we are using,

> "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US) AppleWebKit/533.3 (KHTML, like Gecko) Safari/533.3")

Do you know how much time we will have before this one gets blocked?

Do we get any notification before we get blocked?

And did anything happen on May 03 that could cause this issue? We have been using this one for years and have never seen this issue before.

Malyacko (talkcontribs)

> is there a good/safe user agent we could use?

Any specific one. See https://meta.wikimedia.org/wiki/User-Agent_policy

> Do you know how much time we will have before this one gets blocked?

No, you should change it now to follow the rules.

> Do we get any notification before we get blocked?

No. At any time you can avoid getting blocked by following the rules.

> And did anything happen on May 03 that could cause this issue?

Someone probably looked at logs and then took action.
