I just tried adding an article from https://www.bloomberg.com using Citoid, but Bloomberg apparently thinks Citoid is a bot and asks for a CAPTCHA. Is there anything the Citoid developers can do, or is it just up to Bloomberg's IT staff?
Topic on Talk:Citoid
AFAIK, when Citoid queries an external website, it identifies itself as Citoid in its User-Agent header rather than posing as a browser.
One should not try to cheat by pretending to be a browser, since that would easily be detected in the exchanges that follow.
Yes, you are right: Bloomberg's staff would have to make a silent exception, but then every other scraper could exploit it.
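To illustrate what "identifying itself" means in practice: a well-behaved bot names itself in the User-Agent request header instead of copying a browser's string. Below is a minimal Python sketch using the standard library's `urllib.request`; the bot name and contact URL are hypothetical placeholders (not Citoid's actual User-Agent string), and the request is only built, never sent.

```python
import urllib.request

# Hypothetical bot name and contact URL, for illustration only;
# this is NOT Citoid's actual User-Agent string.
BOT_USER_AGENT = "ExampleCitationBot/1.0 (+https://example.org/bot-info)"


def build_bot_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that openly identifies the client as a bot."""
    return urllib.request.Request(url, headers={"User-Agent": BOT_USER_AGENT})


req = build_bot_request("https://www.bloomberg.com/")
# urllib normalizes header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))
```

Because the header is honest, a site like Bloomberg can treat it however it likes, including serving a CAPTCHA; a per-bot exception would have to be configured on their side.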
Wouldn't it be great if we could send something like an "Accept: application/ld+json" header to say that we only want the metadata and not the content? I guess webmasters might have an incentive to support that, since it could save them some of the bandwidth spent on crawlers and bots like us. I wonder whether someone has already made it a standard; I haven't checked, to be honest.
Anyway, given that most websites already fail to embed metadata properly, I imagine most wouldn't implement something like this either! :/
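For what it's worth, the Accept header with q-values is ordinary HTTP content negotiation; whether a given site would actually answer with JSON-LD is another matter. Here is a rough Python sketch of both sides, assuming a hypothetical Accept value that prefers `application/ld+json` and falls back to HTML, plus a hypothetical server-side helper `choose_representation`. The parser is deliberately simplified (no `*/*` wildcards, no parameters other than `q`).

```python
import urllib.request
from typing import List, Optional

# Hypothetical Accept value: prefer JSON-LD metadata, accept HTML at lower priority.
ACCEPT_METADATA = "application/ld+json, text/html;q=0.5"


def build_metadata_request(url: str) -> urllib.request.Request:
    """Client side: build (but do not send) a request that asks for JSON-LD first."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT_METADATA})


def choose_representation(accept_header: str, available: List[str]) -> Optional[str]:
    """Server side (hypothetical helper): return the available media type with the
    highest q-value in the Accept header, or None if none is acceptable.

    Simplified: ignores wildcards such as */* and all parameters other than q;
    on ties, the type listed first in the header wins.
    """
    best: Optional[str] = None
    best_q = 0.0  # q=0 means "not acceptable", so it can never be chosen
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        q = 1.0  # default weight when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed weight as unacceptable
        if media_type in available and q > best_q:
            best, best_q = media_type, q
    return best


site_formats = ["text/html", "application/ld+json"]
print(choose_representation(ACCEPT_METADATA, site_formats))  # application/ld+json
print(choose_representation(ACCEPT_METADATA, ["text/html"]))  # text/html
```

The fallback to `text/html` is the point of the q-values: a site that ignores the header, or only has HTML, still returns a usable page, so sending it costs the client nothing.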