Topic on Project:Support desk

How to prevent unauthorized scraping?

Hanstiser (talkcontribs)

Given that disabling the API is deprecated and is going to be removed in a future release, what methods are available for preventing unauthorized scraping of our copyrighted content? Many of the unauthorized scrapers come in via the API and clone entire websites for their spam farms or unauthorized "backup" operations.

Short of firewalling off /api.php and DMCAing infringements, is there an easier way?
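
For anyone going the firewall route, here is a minimal sketch of how one might first find the clients that hit /api.php the most. It assumes the web server writes access logs in Common Log Format (client address first, request line in quotes). The function name `count_api_hits` and the sample lines are hypothetical and not part of MediaWiki.

```python
import re
from collections import Counter

# Assumed Common Log Format: the client address is the first field and
# the request line ("METHOD path PROTOCOL") is the first quoted field.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:\S+) (\S+)')


def count_api_hits(lines):
    """Count requests to /api.php per client address."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that do not fit the assumed format
        client, path = match.groups()
        # Ignore the query string; accept any script-path prefix such as /w/.
        if path.split("?", 1)[0].endswith("/api.php"):
            hits[client] += 1
    return hits


# Hypothetical sample lines (documentation-range addresses).
sample = [
    '203.0.113.5 - - [01/Jan/2024:00:00:01 +0000] "GET /api.php?action=query&list=allpages HTTP/1.1" 200 512',
    '203.0.113.5 - - [01/Jan/2024:00:00:02 +0000] "GET /w/api.php?action=parse&page=Main_Page HTTP/1.1" 200 2048',
    '198.51.100.7 - - [01/Jan/2024:00:00:03 +0000] "GET /index.php?title=Main_Page HTTP/1.1" 200 4096',
]

# List clients from most to fewest API requests.
for client, count in count_api_hits(sample).most_common():
    print(client, count)
```

The resulting counts only show who is using the API heavily. Whether a given client is an unauthorized scraper, and what threshold justifies a firewall rule, is still a judgment call.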

2001:16B8:10A6:ED00:BDCF:7E0F:E372:9C20 (talkcontribs)

Allowing access, not preventing it, is one of the main goals MediaWiki was written for. There are certainly some workarounds that try, with varying success, to hide content inside MediaWiki, but the "correct" answer to your question is this:

If you do not want your content to be publicly available, do not use MediaWiki, at least not in a place that is publicly accessible.

Johnywhy (talkcontribs)

I disagree. MediaWiki is a software platform, to be used however people want. That's what open source means. Once the software is in the wild, people can and should apply it to their own purposes.

For example, some companies use MediaWiki as an internal knowledge-base, for employees only.

A variety of extensions are available that provide various kinds of privacy or information hiding.

MediaWiki core also includes some information hiding; for example, certain admin pages are not accessible to non-admins.

AhmadF.Cheema (talkcontribs)

What was probably meant was only that MediaWiki, by default, is not designed for access control. Most, if not all, of the access control extensions here have some drawbacks.

If content privacy is really important, then either do not use MediaWiki or use a much more heavily customized version of it (which might not even exist yet). Enterprise MediaWiki solutions (such as BlueSpice Pro) probably include some robust access control options.

2001:16B8:10A4:2700:88F4:955F:8894:E7B9 (talkcontribs)

That is true. As I said, using MediaWiki in a public place <b>will</b> make the content within the installation public. Sure, using MediaWiki inside an internal network is possible; after all, it is not publicly accessible then.

137.147.0.130 (talkcontribs)

If your content is publicly accessible, it can easily be copied; there's no way to prevent this. What you want is fundamentally incompatible with the public web. The API doesn't even really make it easier, as scrapers would then need a functional MW install configured with the same extensions and templates in order to reuse your content, whereas plain HTML scraping will mostly work as-is on any server.

Johnywhy (talkcontribs)

Easier doesn't seem to matter to them, if the OP is correct.
