Topic on Project:Support desk

RSync Alarm wikipedia requests

29 comments • 22:41, 13 May 2022 1 year ago

99.102.84.25 (talkcontribs)

Hey, it seems the rsync alarm is active at https://grafana.wikimedia.org/alerting/list and not all of our requests are succeeding. Who can we reach out to about fixing this?

Reply 00:37, 4 May 2022 1 year ago

Bawolff (talkcontribs)

What is your actual concern?

That page is not really meant for public consumption.

Reply Edited 05:04, 4 May 2022 1 year ago

99.102.84.25 (talkcontribs)

Since yesterday, we have been seeing a large increase in the number of our requests to the Wikipedia API timing out.

Reply 10:58, 4 May 2022 1 year ago

99.102.84.25 (talkcontribs)

It seems to align with when the two rsync alarms went into the alarm state.

Reply 10:59, 4 May 2022 1 year ago

Ciencia Al Poder (talkcontribs)

Are you following API:Etiquette?

Reply 11:35, 4 May 2022 1 year ago

Malyacko (talkcontribs)

Also, what's your user agent used for your requests?

Reply 12:28, 4 May 2022 1 year ago

Bawolff (talkcontribs)

It is very unlikely that the rsync alarm has anything to do with that.

Reply 15:28, 4 May 2022 1 year ago

75.172.125.42 (talkcontribs)

Was there any other change that went live yesterday that might be causing the issue, such as an overall service connection problem?

Reply 20:06, 4 May 2022 1 year ago

Malyacko (talkcontribs)

Impossible to say without answers to all the currently unanswered questions in this thread.

Reply 21:11, 4 May 2022 1 year ago

75.172.125.42 (talkcontribs)

We are still checking our user agent and reviewing API:Etiquette. But our code base has been out for a few years, and the HTTP requests suddenly started failing on May 03. Is there anything that might be causing the issue?

Reply 17:22, 5 May 2022 1 year ago

Malyacko (talkcontribs)

Yes, see above: Ignoring the rate limits, for example.

Reply 17:42, 5 May 2022 1 year ago

75.172.125.42 (talkcontribs)

Could you please give more information about "Ignoring the rate limits"?

Btw, after some investigation into API:Etiquette, here are some results:

We usually query just 2 articles in the loop and the titles are not piped; one of the titles may be a transformation of the other with the spaces removed.
We do not send follow-up requests for results we get from another request, so the generator case does not apply to us.
We are still investigating gzip. Did anything change recently with gzip requests, such as gzip becoming mandatory?

Also, we are only seeing the issue partially: not every request we make has a problem.
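
For reference, the batching and gzip points from API:Etiquette can be sketched as follows. This is only a minimal Python sketch under stated assumptions, not the client code discussed in this thread: the helper name `build_query_request` and the example titles are made up for illustration, and the function builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_URL = "https://en.wikipedia.org/w/api.php"


def build_query_request(titles):
    """Build a single GET request that covers several titles at once.

    Hypothetical helper: joins the titles with a pipe so that one API
    call replaces a loop of one-title calls.
    """
    params = {
        "action": "query",
        "format": "json",
        "titles": "|".join(titles),  # pipe-separated list of page titles
    }
    request = Request(API_URL + "?" + urlencode(params))
    # Ask for a compressed response. urllib does not decompress
    # automatically, so the caller has to gunzip the body itself.
    request.add_header("Accept-Encoding", "gzip")
    return request


request = build_query_request(["Kindle", "E-reader"])
print(request.full_url)
```

With two titles per lookup, this would turn two requests into one, which halves the request rate seen by the server.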

Reply 18:51, 5 May 2022 1 year ago

Bawolff (talkcontribs)

Gzip is not mandatory.

Which wiki are the requests being made to? What type of API requests (do you have an example)? What is the timestamp in UTC when you started noticing the issue? Are you logged in (if so, what username makes the requests? If not, what IP?) What precisely do you mean by "timeout" (are you getting an HTTP response that is an error, and if so what error? Is your connection just not connecting? Do you just not receive an HTTP response after some time (how long)? Something else?)

Reply 05:36, 6 May 2022 1 year ago

Bawolff (talkcontribs)

There was database maintenance around this time. While it wasn't supposed to affect anything, there were some reports that it was causing temporary slowness. It may be related to your issue.

Reply 15:06, 6 May 2022 1 year ago

97.113.61.16 (talkcontribs)

> Which wiki are the requests being made to?

> What type of api requests (do you have an example)?

It is a GET request sent to en.wikipedia.org/w/api.php

> What is the timestamp in utc when you started noticing the issue?

Between 9am and 10am UTC on 05/03/2022

> Are you logged in (if so what username makes request? If not, what IP?)

Not logged in with a username; we are trying to get the IP.

> What precisely do you mean by "timeout" (are you getting an http response that is an error, if so what error, is your connection just not connecting?

For some queries the HTTP request sometimes does not succeed: no wiki response is returned, and we get a request-aborted exception. We are still checking for the specific HTTP code.

> Do you just not receive an http response after some time (how long), something else?

No response within 3 s, in the US

Reply 22:32, 9 May 2022 1 year ago

97.113.61.16 (talkcontribs)

Btw, is the database maintenance still going on? We are still seeing the issue on our side.

Reply 01:10, 11 May 2022 1 year ago

Bawolff (talkcontribs)

No.

Also, no response for 3 seconds sounds more like you should just increase your timeouts. A problem on WMF's end would look more like getting a 503 error. Most API endpoints should respond within 3 seconds in normal times, but that is not true of all of them.
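
To tell a client-side timeout apart from a server-side error such as a 503, the client can log which kind of failure it hit. The following is a minimal Python sketch, assuming the standard library `urllib` is the HTTP client; the function name `classify_failure` and its category labels are hypothetical.

```python
import socket
from urllib.error import HTTPError, URLError


def classify_failure(exc):
    """Map an exception from an HTTP call to a coarse failure category."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, HTTPError):
        return f"http {exc.code}"  # the server answered, e.g. 429 or 503
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"  # no answer within the client's own limit
    if isinstance(exc, URLError):
        # urllib wraps some timeouts in URLError; look at the reason.
        if isinstance(exc.reason, (TimeoutError, socket.timeout)):
            return "timeout"
        return "connection"  # DNS failure, refused connection, etc.
    return "other"


print(classify_failure(socket.timeout("timed out")))
```

Counting these categories per hour would show whether the failures are timeouts on the client's side or error responses from the server.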

Reply Edited 01:24, 11 May 2022 1 year ago

97.113.61.16 (talkcontribs)

Actually, we have a retry. The first call has a 3 s timeout while the second call has 5 s.

For the user agent we found mostly they are "Java/1.8.0_211-ea", and several of them are "Java/phoneme_advanced-Core-1.3-b16 sjmc-b111".

Reply 21:38, 11 May 2022 1 year ago

97.113.61.16 (talkcontribs)

And with the 5 s retry we are still seeing failures.

Reply 21:58, 11 May 2022 1 year ago

Bawolff (talkcontribs)

> For the user agent we found mostly they are "Java/1.8.0_211-ea", and several of them are "Java/phoneme_advanced-Core-1.3-b16 sjmc-b111".

Per WMF's user agent policy, this user agent isn't allowed and could potentially be blocked (you are probably not blocked, as you would get an error message). Your user agent must have a contact email address in it and should have a descriptive name of your tool.

Anyway, I would suggest a timeout of 60 seconds.
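
As an illustration of both points, here is a minimal Python sketch. The tool name `ExampleLookupTool`, the version, and the address `tools@example.com` are placeholders and not real values; the helper names are hypothetical, and the exact format to use should be taken from the user agent policy itself.

```python
from urllib.request import Request, urlopen


def build_user_agent(tool_name, version, contact):
    """Return a descriptive user agent string that includes a contact."""
    return f"{tool_name}/{version} ({contact})"


# Placeholder values: replace with the real tool name and contact address.
USER_AGENT = build_user_agent("ExampleLookupTool", "1.0", "tools@example.com")


def make_request(url):
    """Attach the descriptive user agent to every outgoing request."""
    return Request(url, headers={"User-Agent": USER_AGENT})


def fetch(url, timeout=60):
    """Send the request with a 60 second timeout instead of 3 seconds."""
    with urlopen(make_request(url), timeout=timeout) as response:
        return response.read()


print(USER_AGENT)
```

Only `fetch` touches the network; the other two helpers can be checked offline.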

Reply 00:51, 12 May 2022 1 year ago

97.113.61.16 (talkcontribs)

Yes, we didn't see it completely blocked. Was this block put in place recently? Could you provide a link to the WMF policy or any related documentation, if possible? Could you also provide an example of the user agent that is expected?

Meanwhile we will look into the timeout change.

Reply 18:01, 12 May 2022 1 year ago

Bawolff (talkcontribs)

You're not blocked (you would get an error message if you were). The WMF policy is just that that user agent can be blocked arbitrarily. See https://meta.wikimedia.org/wiki/User-Agent_policy

Reply Edited 18:06, 12 May 2022 1 year ago

67.185.173.77 (talkcontribs)

Ah ok, that makes sense. The strange thing about this issue is that we have not made any changes in several weeks to the client code that makes these requests, and we had not seen any of these failures before. Then all of a sudden, on May 3, we saw an immediate spike in timeout failures on up to 10% of our requests, and it has continued at this rate since then.

Reply 20:13, 12 May 2022 1 year ago

97.113.61.16 (talkcontribs)

Hi, after more investigation, we got the HTTP exception: Too many requests - for unthrottling, contact noc@wikimedia.org to discuss a less disruptive approach. The status code is 429. Our current fix is to replace our user agent ("Java/1.8.0_211-ea", and "Java/phoneme_advanced-Core-1.3-b16 sjmc-b111") with another one ("Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US) AppleWebKit/533.3 (KHTML, like Gecko) Safari/533.3"), after which we stop receiving the throttle messages. It happens only on POST requests that parse content.

Did anything happen on May 03 that could have suddenly caused throttling of these user agents (or a specific set of user agents)? What would be a safe user agent for us to use in future? Should we exactly follow the user agent policy you posted above? Is there any mailing list (such as noc@wikimedia.org) we could join, or any metrics we could follow, in case this happens again in the future?

Reply 17:00, 13 May 2022 1 year ago

Ciencia Al Poder (talkcontribs)

Pretty much what we said already about API:Etiquette.

Reply 19:15, 13 May 2022 1 year ago

Bawolff (talkcontribs)

You are being blocked for not following the rules. Follow the rules that were linked to you and you won't be blocked.

> "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US) AppleWebKit/533.3 (KHTML, like Gecko) Safari/533.3")

Using this user agent is not allowed, and makes it less likely for you to be unblocked as explained in the pages you were linked, because unlike the java user agent which looks like you were accidentally breaking the rules, the browser user agent makes it look intentional.
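
Separately from fixing the user agent, a client that receives a 429 response should slow down rather than retry at once. The following is a minimal Python sketch of one way to pick the wait time. `Retry-After` is a standard HTTP header, but whether it is present on these 429 responses is an assumption, so the sketch falls back to exponential backoff; the function name `retry_delay` and its default values are hypothetical.

```python
def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Return how many seconds to wait before retry number `attempt`.

    `retry_after` is the raw Retry-After header value, if the server
    sent one. Only the plain-seconds form is handled in this sketch.
    """
    if retry_after is not None:
        try:
            return min(float(retry_after), cap)
        except ValueError:
            pass  # HTTP-date form: fall through to exponential backoff
    # Exponential backoff: base, 2*base, 4*base, ... limited by cap.
    return min(base * (2 ** attempt), cap)


for attempt in range(4):
    print(attempt, retry_delay(attempt))
```

The cap keeps a long run of failures from producing waits longer than a minute.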

Reply 21:04, 13 May 2022 1 year ago

Malyacko (talkcontribs)

I'm a bit puzzled by questions like "Should we exactly follow the user agent policy you posted above?" No because you are also free to ignore the rules and get blocked instead? :)

Reply 22:13, 13 May 2022 1 year ago

97.113.61.16 (talkcontribs)

We work on Kindle devices, and the requests that have issues are sent from Kindle devices. While we do some further investigation, is there a good/safe user agent we could use?

And for the current one we are using,

> "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US) AppleWebKit/533.3 (KHTML, like Gecko) Safari/533.3")

Do you know how much time we will have before this one gets blocked?

Do we get any notification before we get blocked?

And did anything happen on May 03 that could have caused this issue? We have been using this one for years and have never seen this issue before.

Reply 22:18, 13 May 2022 1 year ago