Project talk:WikiProject Bots

From MediaWiki.org

Bot frameworks launched by MediaWiki extensions

Might it be possible and desirable to create a bot framework that could be launched via MediaWiki extension? The bot frameworks that I know of use the API, but perhaps performance could be enhanced by using the database access functions (or some similar functions, albeit with more limited access) instead.

My thought is that there could be a special page extension that would allow an authorized user to select a bot to launch. The bot code could then be run as a daemon and control would be returned to the special page extension. This bot could have access to the MediaWiki globals and to MediaWiki functions and classes just like any other MediaWiki code. (This would, of course, be a PHP bot framework.)

A few issues: This code would be run on the server side rather than on the client side, so the wiki would, on some shared hosts, need to have a dedicated IP so as to run daemons, and the bot code would have to be highly trusted. Poorly-written bot code could use up a lot of CPU cycles, for instance. Surely, wikis would want to have a code review process before approving new bots, but an overlooked glitch or security breach could still lead to some major havoc being wreaked, and it wouldn't be as easy to trace as edits made through the API.

So, backpedaling a bit, I wonder if there is some way to limit the access that is provided to the bot code, without losing the advantages of running it from within MediaWiki. E.g., allow the bot to read whatever page it wants without having to query the API, but don't allow the bot to delete rows from the database directly; if it wants to delete a page, it has to use a wrapper function from that special page extension, that runs the usual page deletion code from ApiDelete.php or Article.php or whatever, and that causes the deletion to be undoable by others and traceable back to the deleting bot account. I'm not sure whether this will be easy or even possible to implement in PHP.

So, maybe the solution is not to run it from within MediaWiki at all, but just set up a standalone bot framework that, like MediaWiki, is run on the server and knows the database username, password, etc. and thus can directly access the database. To eliminate the need to manually launch bots from the shell, this framework could have some PHP code to implement a web-based interface that would allow authorized users to access it remotely and launch bots. The database access functions would be private functions, not accessible to the individual bots, which would be plugins to the larger bot framework. Those bots would access the database indirectly, through wrapper functions that would limit their access and make all of their changes traceable. Tisane 15:20, 12 June 2010 (UTC)
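
As a rough illustration of the plugin-and-wrapper arrangement proposed above, here is a minimal Python sketch (the proposal itself concerns PHP). Everything in it is hypothetical: the `BotFramework` and `BotHandle` classes, the `page` and `log` tables, and the in-memory SQLite database standing in for the wiki's real database. None of this is MediaWiki's schema or API. The bot only receives a handle with `read` and `delete`; deletion is a soft delete that is logged against the bot account and can be reversed. Note that Python, like PHP, does not truly enforce this boundary inside one process, so the sketch shows the shape of the idea rather than a security guarantee.

```python
import sqlite3


class BotFramework:
    """Hypothetical host framework: owns the database, exposes only wrappers."""

    def __init__(self):
        # In-memory SQLite stands in for the wiki database (an assumption).
        self._db = sqlite3.connect(":memory:")
        self._db.execute(
            "CREATE TABLE page (title TEXT PRIMARY KEY, body TEXT, "
            "deleted INTEGER NOT NULL DEFAULT 0)"
        )
        self._db.execute("CREATE TABLE log (actor TEXT, action TEXT, title TEXT)")

    def add_page(self, title, body):
        self._db.execute("INSERT INTO page (title, body) VALUES (?, ?)", (title, body))

    def read_page(self, title):
        # Reads go straight to the database; deleted pages are hidden.
        row = self._db.execute(
            "SELECT body FROM page WHERE title = ? AND deleted = 0", (title,)
        ).fetchone()
        return row[0] if row else None

    def delete_page(self, actor, title):
        # Soft delete: the row is flagged, never removed, so it can be restored.
        cur = self._db.execute(
            "UPDATE page SET deleted = 1 WHERE title = ? AND deleted = 0", (title,)
        )
        if cur.rowcount == 0:
            return False
        self._db.execute("INSERT INTO log VALUES (?, 'delete', ?)", (actor, title))
        return True

    def restore_page(self, actor, title):
        cur = self._db.execute(
            "UPDATE page SET deleted = 0 WHERE title = ? AND deleted = 1", (title,)
        )
        if cur.rowcount == 0:
            return False
        self._db.execute("INSERT INTO log VALUES (?, 'restore', ?)", (actor, title))
        return True

    def log_entries(self):
        return self._db.execute("SELECT actor, action, title FROM log").fetchall()

    def run_bot(self, bot, account):
        # The bot plugin is a callable that receives only the narrow handle.
        bot(BotHandle(self, account))


class BotHandle:
    """What a bot plugin sees: read and delete, with the account pre-bound."""

    def __init__(self, framework, account):
        self.read = framework.read_page
        self.delete = lambda title: framework.delete_page(account, title)
```

A bot here is just a function taking the handle, e.g. one that calls `wiki.read("Sandbox")` and then `wiki.delete("Sandbox")`; the framework is launched with `run_bot(bot, "CleanupBot")`, and every deletion then shows up in `log_entries()` under that account name.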

Both methods have some pretty significant drawbacks. I don't believe it's actually possible to run it from within MediaWiki without giving it the same access as any other PHP script, which would include direct database access. With the other method, you'd have to ensure that the wrapper functions work exactly the same way as the ones in MediaWiki, which means the framework could break, or worse, cause data corruption, if the wrong version of the framework is used with a specific version of MediaWiki; that would also make it difficult to run the trunk version of MediaWiki. And if it's running from the webserver, there's always the possibility of it just doing require_once('/path/to/wiki/LocalSettings.php'); to get the database password MediaWiki uses. Personally, I don't think the performance boost would be worth the extra difficulty. If a site wanted to do something like this, it would probably be easiest to just set up something like the Toolserver and give trusted people restricted shell accounts with read-only DB access; then they can rely on the OS and MySQL for security, rather than PHP. They'd still have to do edits via HTTP, but the reads would be a lot faster. Mr.Z-man 18:44, 12 June 2010 (UTC)
Good points. Since the database access is mostly just needed for reads rather than writes, the read-only toolserver idea is probably workable. Maybe something like Extension:Asksql could also be used, although bots would need an API-based interface. Tisane 19:27, 12 June 2010 (UTC)
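
To make the "rely on the database for security, rather than PHP" point concrete, here is a small Python sketch in which the database engine itself refuses writes on a read-only connection. SQLite is used only as a stand-in so the example is self-contained; on a Toolserver-style setup with MySQL, the equivalent would be an account granted only SELECT. The helper names (`make_demo_db`, `open_read_only`, `try_write`) and the `page` table are assumptions made for illustration.

```python
import pathlib
import sqlite3


def make_demo_db(path):
    """Create a tiny stand-in 'wiki' database with one page (hypothetical schema)."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE page (title TEXT PRIMARY KEY, body TEXT)")
    db.execute("INSERT INTO page VALUES ('Main Page', 'Welcome')")
    db.commit()
    db.close()


def open_read_only(path):
    """Open the database so that SQLite itself rejects any write."""
    uri = pathlib.Path(path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def try_write(conn):
    """Attempt a destructive statement; report whether the engine allowed it."""
    try:
        conn.execute("DELETE FROM page")
        return True
    except sqlite3.OperationalError:
        # The refusal comes from the database layer, not from bot-side checks.
        return False
```

A bot given only the connection from `open_read_only` can run any SELECT it likes, while `try_write` on that connection fails regardless of how the bot code is written; edits would still have to go through HTTP as described above.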