Topic on Talk:Citoid

Bloomberg sites say "are you a robot?"

3 comments • 18:00, 6 June 2022


Summary by Mvolz (WMF)

Roy17 (talk · contribs)

I just tried to cite an article from https://www.bloomberg.com using Citoid, but Bloomberg apparently thinks Citoid is a bot and asks for a CAPTCHA test. Is there anything the Citoid developers can do, or is it up to Bloomberg's IT staff?

21:36, 9 February 2019

PerfektesChaos (talk · contribs)

AFAIK, when Citoid queries an external website it identifies itself as Citoid rather than as a regular browser user agent.

One should not try to cheat by pretending to be a browser, since that would easily be detected in the exchange that follows.

Yes, you are right that Bloomberg staff should quietly make an exception, but every other scraper could then exploit that exception.
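
To illustrate what "declaring itself" means in practice, here is a minimal sketch of a client that openly identifies itself through the `User-Agent` request header. The bot name and info URL are hypothetical placeholders, not Citoid's actual user-agent string, and the request is only built, never sent.

```python
from urllib.request import Request

# Hypothetical identifier: the real Citoid User-Agent string may differ.
BOT_USER_AGENT = "ExampleCitationBot/1.0 (https://example.org/bot-info)"


def build_honest_request(url: str) -> Request:
    """Build (but do not send) a request that openly identifies the bot."""
    return Request(url, headers={"User-Agent": BOT_USER_AGENT})


req = build_honest_request("https://www.bloomberg.com")
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

A site that sees such a header can choose to allow, throttle, or challenge the bot, which is the decision discussed above.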

08:11, 16 February 2019

Diegodlh (talk · contribs)

Wouldn't it be great if we could send something like an "Accept: application/ld+json" header, meaning that we only want the metadata and not the content? I guess webmasters may have an incentive to support that, since it could save them some of the bandwidth consumed by crawlers and bots like us. I wonder whether someone has made it a standard already; I haven't checked, to be honest.

Anyway, if most websites already fail to embed metadata appropriately, I imagine most wouldn't implement something like this either! :/
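
The idea of asking only for metadata could look roughly like the sketch below, which uses ordinary HTTP content negotiation. It is an assumption-laden illustration: the function name and bot identifier are hypothetical, the request is only constructed and not sent, and a server is free to ignore the `Accept` preference and return HTML (or a CAPTCHA page) anyway.

```python
from urllib.request import Request


def build_metadata_request(url: str) -> Request:
    """Build (but do not send) a request that prefers JSON-LD metadata."""
    headers = {
        # Prefer JSON-LD; accept HTML as a lower-priority fallback.
        "Accept": "application/ld+json, text/html;q=0.8",
        # Hypothetical bot identifier, not Citoid's real one.
        "User-Agent": "ExampleCitationBot/1.0",
    }
    return Request(url, headers=headers)


metadata_req = build_metadata_request("https://www.bloomberg.com")
print(metadata_req.get_header("Accept"))
```

A client built this way would still need to check the response's `Content-Type`, since only a cooperating server would actually answer with `application/ld+json`.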

18:00, 6 June 2022
