Topic on Talk:Citoid

Bloomberg sites say "are you a robot?"

Summary by Mvolz (WMF)

Tracked in task T210871

Roy17 (talk | contribs)

I just tried to cite an article from https://www.bloomberg.com using Citoid, but Bloomberg apparently thinks Citoid is a bot and asks for a CAPTCHA. Is there anything the Citoid developers can do, or is it up to Bloomberg's IT staff?

PerfektesChaos (talk | contribs)

AFAIK, when Citoid queries an external website, it identifies itself as Citoid rather than as a browser user agent.

One should not try to cheat by pretending to be a browser, since that would be easily detected in the exchange that follows.

Yes, you are right, Bloomberg staff should quietly make an exception, but every other scraper could then exploit it.
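As a rough illustration of what "declaring itself" means at the HTTP level, here is a minimal Python sketch of a request that openly identifies the client as a bot instead of imitating a browser. The user-agent string and the function name are hypothetical; the actual string Citoid sends is not given in this thread. The request is only built, not sent.

```python
from urllib.request import Request

# Hypothetical identifier (assumption): the real string Citoid sends
# is not stated in this thread.
BOT_USER_AGENT = "ExampleCitationBot/0.1 (+https://example.org/bot-info)"


def build_identified_request(url: str) -> Request:
    """Build (but do not send) a GET request that openly identifies the client."""
    return Request(url, headers={"User-Agent": BOT_USER_AGENT}, method="GET")


req = build_identified_request("https://www.bloomberg.com")
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

A site can match on such a string to allow or block the client, which is also why an exception granted to one identifier is only as trustworthy as the header itself.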

Diegodlh (talk | contribs)

Wouldn't it be great if we could send something like an "Accept: application/ld+json" header, meaning that we only want the metadata and not the content? I guess webmasters might have an incentive to support that, since it could save them bandwidth otherwise spent on crawlers and bots like us. I wonder whether someone has already made this a standard; I haven't checked, to be honest.
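To make the idea concrete, here is a small Python sketch under stated assumptions. It builds a request whose Accept header prefers JSON-LD, and it handles both possible outcomes: a server that answers with bare JSON-LD, and a server that ignores the preference and returns HTML with embedded JSON-LD script elements. Whether any site would honor the header this way is an assumption, and the function and class names are made up for illustration.

```python
import json
from html.parser import HTMLParser
from urllib.request import Request

JSON_LD = "application/ld+json"


def build_metadata_request(url: str) -> Request:
    """Build (but do not send) a request that states a preference for JSON-LD."""
    # HTML stays acceptable as a lower-priority fallback.
    return Request(url, headers={"Accept": f"{JSON_LD}, text/html;q=0.5"})


class _JsonLdScripts(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == JSON_LD:
            self._inside = True
            self.blocks.append("")

    def handle_data(self, data):
        # Script text may arrive in several chunks, so accumulate it.
        if self._inside:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False


def extract_metadata(content_type: str, body: str) -> list:
    """Return JSON-LD objects from a bare JSON-LD response or from HTML."""
    if content_type.split(";")[0].strip().lower() == JSON_LD:
        return [json.loads(body)]
    parser = _JsonLdScripts()
    parser.feed(body)
    parser.close()
    return [json.loads(block) for block in parser.blocks if block.strip()]
```

With the HTML fallback in place, a client loses nothing when a server ignores the header; it only saves bandwidth when a server cooperates.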

Anyway, if most websites already fail to embed metadata properly, I imagine most wouldn't implement something like this either! :/
