Manual:Creating a bot/ja

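MediaWiki robots, or simply bots, are automated processes that interact with Wikipedia (and other Wikimedia projects) as though they were human editors. This page attempts to explain how to develop bots for use on Wikimedia projects. Much of this content can be carried over to other MediaWiki-based wikis. It is mainly aimed at people who have some programming experience but are unsure how to apply that knowledge to creating a MediaWiki bot.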



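Why would I need to create a bot?
Bots can automate tasks and perform them much faster than humans. A simple task that has to be carried out many times (for example, adding a template to every page in a category of 1,000 pages) is better suited to a bot than to a person.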



Considerations before creating a bot


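Reuse existing bots
In many cases it is far easier to ask an existing bot to do the work. If you only have periodic requests, or are not comfortable with programming, this is usually the best solution. Some wikis have dedicated pages where such requests can be made. In addition, there are many tools that anyone can use. Most of these take the form of enhanced web browsers with MediaWiki-specific functionality. The most popular of these is AutoWikiBrowser (AWB), a browser specifically designed to assist with editing on Wikipedia and other Wikimedia projects. A more complete list of tools can be found at w:Wikipedia:Tools/Editing tools on the English Wikipedia. Tools such as AWB can often be operated with little or no understanding of programming.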

Reuse codebase
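If you decide that you need a bot of your own because of the frequency or novelty of your requirements, you do not need to write one from scratch. Many bots publish their source code, which can sometimes be reused with little additional development time. There are also a number of standard bot frameworks available for download. These frameworks make up the vast majority of a bot's code. Because these frameworks are in common use, and the complex coding has been done by others and heavily tested, it is far easier to get a bot approved if it is based on one of them. The most popular and common of these frameworks is Pywikibot (PWB), a bot framework written in Python. It is thoroughly documented and tested, and many standardized Pywikibot scripts (bot instructions) are already available. Other examples of bot frameworks can be found below. With some bot frameworks, such as PWB, a general familiarity with scripts is all that is needed to run the bot successfully, because the complex code that makes up the framework is written, tested and frequently updated by others (for these bots it is important to apply framework updates regularly).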

Important questions
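If you want to write a bot from scratch, be aware that it may require considerable programming ability, and that a completely new bot usually has to undergo substantial testing before it is approved for regular operation. Careful planning is important to arrive at a correct, efficient and effective program. The following initial questions are important: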


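 * Will the bot be manually assisted or fully automated?
 * Will you create the bot alone, or with the help of other programmers?
 * Which language will the bot be implemented in?
 * Will the bot's requests, edits, and other actions be logged? If so, will the logs be stored on local media or on wiki pages?
 * Will the bot run inside a web browser (for example, written in JavaScript), or will it be a standalone program?
 * If the bot is a standalone program, will it run on your local computer, or on a remote server such as Toolforge?
 * If the bot runs on a remote server, will other editors be able to operate the bot or start it running?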



Overview of operation
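Just like a human editor, a MediaWiki bot reads wiki pages and makes changes where it thinks changes are needed. The difference is that, although bots are faster and less prone to fatigue than humans, they are nowhere near as bright. Bots are good at repetitive tasks with easily defined patterns, where few decisions have to be made.

In the most typical case, a bot logs in to its own account and requests pages from the wiki in much the same way as a browser does (although it does not display the page on screen, but works on it in memory), then programmatically examines the page code to see whether any changes need to be made. It then makes and submits whatever edits it was designed to make, again in much the same way as a browser would.

Because bots access pages the same way people do, bots can experience the same kinds of difficulties that human users do. They can get caught in edit conflicts, have pages time out, or run into other unexpected complications while requesting pages or making edits. Because the volume of work done by a bot is larger than that done by a live person, the bot is more likely to encounter these issues. It is therefore important to consider these situations when writing a bot.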

APIs for bots
In order to make changes to wiki pages, a bot necessarily has to retrieve pages from the wiki and send edits back. There are several Application Programming Interfaces (APIs) available for that purpose.


 * MediaWiki Action API (api.php). This web service was specifically written to permit automated processes such as bots to make queries and post changes. Data is returned in JSON format (see output formats for more details).
 * Status: Built-in feature of MediaWiki, available on all Wikimedia servers. Other non-Wikimedia wikis may disable or restrict write access.
 * There is also an API sandbox for those wanting to test api.php's features.
 * Special:Export can be used to obtain bulk export of page content in XML form. See Manual:Parameters to Special:Export for arguments;
 * Status: Built-in feature of MediaWiki, available on all Wikimedia servers.
 * Raw (wikitext) page processing: sending an action=raw GET request to index.php returns the unprocessed wikitext source code of a page. An API query with action=query, prop=revisions and rvprop=content is roughly equivalent, and allows for retrieving additional information.
 * Status: Built-in feature of MediaWiki, available on all Wikimedia servers.
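The read interfaces listed above can be illustrated by building the request URLs. This is a minimal sketch using only the Python standard library; the host, the script paths and the helper names are assumptions chosen for illustration, and other wikis will use different endpoints.

```python
from urllib.parse import urlencode

# Assumed endpoints for illustration; other wikis use different hosts and paths.
API_URL = "https://en.wikipedia.org/w/api.php"
INDEX_URL = "https://en.wikipedia.org/w/index.php"


def raw_page_url(title):
    """URL that returns the unprocessed wikitext of a page via index.php."""
    return INDEX_URL + "?" + urlencode({"title": title, "action": "raw"})


def api_content_url(title):
    """Roughly equivalent Action API query for the latest revision's content."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "rvslots": "main",
        "titles": title,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


print(raw_page_url("Help:Category"))
print(api_content_url("Help:Category"))
```

The API variant returns JSON, so a bot can ask for extra fields (for example timestamps) in the same request, which the raw index.php request cannot provide.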

Some web servers are configured to grant requests for compressed (gzip) content. This can be done by including a line "Accept-Encoding: gzip" in the HTTP request header; if the HTTP reply header contains "Content-Encoding: gzip", the document is in gzip form, otherwise, it is in the regular uncompressed form. Note that this is specific to the web server and not to the MediaWiki software. Other sites employing MediaWiki may not have this feature. If you are using an existing bot framework, it should handle low-level operations like this.
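If you are not using a framework, the gzip negotiation described above could look roughly like the following. It is a sketch with hypothetical helper names, using only the Python standard library; the User-Agent string is supplied by the caller.

```python
import gzip
import urllib.request


def build_request(url, user_agent):
    """Create a request that tells the server gzip content is acceptable."""
    headers = {"Accept-Encoding": "gzip", "User-Agent": user_agent}
    return urllib.request.Request(url, headers=headers)


def decode_body(body, content_encoding):
    """Decompress the body only if the reply header says it is gzip."""
    if (content_encoding or "").lower() == "gzip":
        body = gzip.decompress(body)
    return body.decode("utf-8")


def fetch_text(url, user_agent):
    """Fetch a URL and return its text, handling both compressed and plain replies."""
    with urllib.request.urlopen(build_request(url, user_agent)) as resp:
        return decode_body(resp.read(), resp.headers.get("Content-Encoding"))
```

Checking the Content-Encoding reply header, rather than assuming compression, keeps the code working on servers that ignore the request header.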

Logging in
Approved bots need to be logged in to make edits. Although a bot can make read requests without logging in, bots that have completed testing should log in for all activities. Bots logged in from an account with the bot flag (see Bot Flag below) can obtain more results per query from the MediaWiki API (api.php). Most bot frameworks handle login and cookies automatically, but if you are not using an existing framework, you will need to follow these steps.

For security, login data must be passed using the HTTP POST method. Because parameters of HTTP GET requests are easily visible in the URL, logins via GET are disabled.

To log a bot in using the MediaWiki API, two requests are needed:

Request 1 – this is a GET request to obtain a login token
 * URL:

This will return a "logintoken" parameter in JSON form, as documented at API:Login. Other output formats are available. It will also return HTTP cookies as described below.

Request 2 – this is a POST to complete the login where TOKEN is the token from the previous result. The HTTP cookies from the previous request must also be passed with the second request.
 * URL:
 * POST parameters:

A successful login attempt will result in the Wikimedia server setting several HTTP cookies. The bot must save these cookies and send them back every time it makes a request (this is particularly crucial for editing). On the English Wikipedia, the following cookies should be used: enwikiUserID, enwikiToken, and enwikiUserName. The enwikisession cookie is required to actually send an edit or commit some change, otherwise the MediaWiki:Session fail preview error message will be returned.
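The two login requests and the cookie handling can be sketched with the Python standard library as follows. This is an illustration under stated assumptions, not a definitive implementation: the endpoint and the helper names are hypothetical, the parameter names (meta=tokens with type=login, then action=login with lgname, lgpassword and lgtoken) follow API:Login, and the credentials are assumed to be a bot password created at Special:BotPasswords.

```python
import json
import urllib.request
from http.cookiejar import CookieJar
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"  # assumed endpoint


def make_opener(user_agent):
    """Opener whose cookie jar stores cookies and resends them on every request."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def login_token_url():
    """Request 1: GET a login token."""
    query = {"action": "query", "meta": "tokens", "type": "login", "format": "json"}
    return API_URL + "?" + urlencode(query)


def extract_login_token(response):
    """Pull the token out of the decoded JSON reply to request 1."""
    return response["query"]["tokens"]["logintoken"]


def login_post_body(username, password, token):
    """Request 2: POST body that completes the login."""
    fields = {
        "action": "login",
        "lgname": username,
        "lgpassword": password,
        "lgtoken": token,
        "format": "json",
    }
    return urlencode(fields).encode("utf-8")


def log_in(opener, username, password):
    """Run both requests with the same opener so the cookies carry over."""
    with opener.open(login_token_url()) as resp:
        token = extract_login_token(json.load(resp))
    with opener.open(API_URL, data=login_post_body(username, password, token)) as resp:
        result = json.load(resp)
    return result["login"]["result"] == "Success"
```

Reusing one opener for every later request is what keeps the session cookies attached to edits.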

Main-account login via action=login is deprecated and may stop working without warning. To continue using bot code which logs in with action=login, see Special:BotPasswords.

Editing; edit tokens
MediaWiki uses a system of edit tokens for making edits to MediaWiki pages, as well as other operations that modify existing content such as rollback. The token looks like a long hexadecimal number followed by '+\', for example:


 * d41d8cd98f00b204e9800998ecf8427e+\

The role of edit tokens is to prevent "edit hijacking", where users are tricked into making an edit by clicking a single link.

The editing process involves two HTTP requests. First, a request for an edit token must be made. Then, a second HTTP request must be made that sends the new content of the page along with the edit token just obtained. It is not possible to make an edit in a single HTTP request. An edit token remains the same for the duration of a logged-in session, so the edit token needs to be retrieved only once and can be used for all subsequent edits.

To obtain an edit token, follow these steps:

 * MediaWiki API (api.php): make a request with the parameters action=query and meta=tokens (see API:Edit - Create&Edit pages). The token will be returned in the csrftoken attribute of the response.

If the edit token the bot receives does not contain the hexadecimal string (i.e., the edit token is just '+\'), then the bot most likely is not logged in. This might be due to a number of factors: failure in authentication with the server, a dropped connection, a timeout of some sort, or an error in storing or returning the correct cookies. If it is not because of a programming error, just log in again to refresh the login cookies. Bots must use assertion to make sure that they are logged in.
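A bot can detect the logged-out case before attempting an edit. The sketch below assumes the JSON reply shape of a meta=tokens query and uses the API's assert=user parameter for the assertion; the function names are hypothetical.

```python
ANONYMOUS_TOKEN = "+\\"  # the token value returned to clients that are not logged in


def csrf_token_params():
    """Query parameters for fetching an edit (CSRF) token while asserting login."""
    return {
        "action": "query",
        "meta": "tokens",
        "type": "csrf",
        "assert": "user",
        "format": "json",
    }


def is_anonymous_token(token):
    """True if the token has no hexadecimal part, i.e. the session is logged out."""
    return token == ANONYMOUS_TOKEN


def extract_csrf_token(response):
    """Return the edit token from a decoded reply, refusing an anonymous one."""
    token = response["query"]["tokens"]["csrftoken"]
    if is_anonymous_token(token):
        raise RuntimeError("edit token is anonymous; log in again before editing")
    return token
```

Raising an error here lets the calling code log in again instead of silently editing as an anonymous user.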

Edit conflicts
Edit conflicts occur when multiple, overlapping edit attempts are made on the same page. Almost every bot will eventually get caught in an edit conflict of one sort or another, and should include some mechanism to test for and accommodate these issues.

Bots that use the MediaWiki API (api.php) should retrieve the edit token, along with the starttimestamp and the last revision "base" timestamp, before loading the page text in preparation for the edit; prop=info|revisions can be combined with meta=tokens to retrieve the token, timestamps and page contents in one query. When submitting the edit, set the starttimestamp and basetimestamp attributes, and check the server responses for indications of errors. For more details, see API:Edit - Create&Edit pages.

Generally speaking, if an edit fails to complete the bot should check the page again before trying to make a new edit, to make sure the edit is still appropriate. Further, if a bot rechecks a page to resubmit a change, it should be careful to avoid any behavior that could lead to an infinite loop and any behavior that could even resemble edit warring.
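One way to structure conflict handling is to re-read the page on every attempt and to cap the number of attempts. The following is a minimal sketch: the parameter names basetimestamp and starttimestamp and the error code editconflict come from API:Edit, while the function names and the callback interface (fetch_page, compute_new_text, submit) are hypothetical stand-ins for your own network code.

```python
def build_edit_params(title, new_text, summary, token, base_ts, start_ts):
    """POST parameters for action=edit that let the server detect conflicts."""
    return {
        "action": "edit",
        "title": title,
        "text": new_text,
        "summary": summary,
        "basetimestamp": base_ts,    # timestamp of the revision the bot read
        "starttimestamp": start_ts,  # time at which the bot started editing
        "assert": "user",
        "format": "json",
        "token": token,
    }


def classify_edit_response(response):
    """Map a decoded API reply to 'ok', 'conflict' or 'error'."""
    if "error" in response:
        code = response["error"].get("code", "")
        return "conflict" if code == "editconflict" else "error"
    if response.get("edit", {}).get("result") == "Success":
        return "ok"
    return "error"


def save_with_retry(fetch_page, compute_new_text, submit, max_attempts=3):
    """Re-read the page after each conflict and stop after max_attempts.

    fetch_page() returns (text, base_ts, start_ts);
    submit(new_text, base_ts, start_ts) returns the decoded API reply.
    """
    for _ in range(max_attempts):
        text, base_ts, start_ts = fetch_page()
        new_text = compute_new_text(text)
        if new_text == text:
            return "unchanged"  # the edit is no longer needed
        outcome = classify_edit_response(submit(new_text, base_ts, start_ts))
        if outcome != "conflict":
            return outcome
    return "gave-up"
```

The attempt cap and the "unchanged" check are what prevent an infinite loop or repeated reverts of another editor's change.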

Overview of the process of developing a bot
Actually, coding or writing a bot is only one part of developing a bot.

The development cycle below is a recommendation from English Wikipedia.

On Wikimedia wikis, ensure that your bot follows any potential bot policies of the wiki.

Idea

 * The first task in creating a MediaWiki bot is extracting the requirements or coming up with an idea.
 * Make sure an existing bot isn't already doing what you think your bot should do.

Specification

 * Specification is the task of precisely describing the software to be written, possibly in a rigorous way. You should come up with a detailed proposal of what you want it to do. Try to discuss this proposal with some editors and refine it based on feedback. Even a great idea can be made better by incorporating ideas from other editors.
 * In the most basic form, your specified bot must meet the following criteria:
 * The bot is harmless (it must not make edits that could be considered disruptive to the smooth running of the wiki)
 * The bot is useful (it provides a useful service more effectively than a human editor could)
 * The bot does not waste server resources.

Software architecture

 * Think about how you might create it and which programming language(s) and tools you would use. Architecture is concerned with making sure the software system will meet the requirements of the product as well as ensuring that future requirements can be addressed. Certain programming languages are better suited to some tasks than others, for more details see the section on programming languages below.

Implementation
Implementation (or coding) involves turning design and planning into code. It may be the most obvious part of the software engineering job, but it is not necessarily the largest portion. In the implementation stage you should:


 * Create an account for your bot. Go to the sign up page when logged in to create the account, linking it to yours. (If you do not create the bot account while logged in, it might be blocked on some wikis according to their policies)
 * Create a user page for your bot. Your bot's edits must not be made under your own account. Your bot will need its own account with its own username and password.
 * Add the same information to the user page of the bot. It would be a good idea to add a link to the approval page (whether approved or not) for each function.
 * Code your bot in your chosen programming language.

Testing
A good way of testing your bot as you are developing is to have it show the changes (if any) it would have made to a page, rather than actually editing the live wiki. Some bot frameworks (such as pywikibot) have pre-coded methods for showing diffs.
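For bots written without a framework, a dry-run mode can be built on the standard library's difflib module. This sketch uses a hypothetical helper name and prints nothing itself; it returns the diff text so the caller can display or log it instead of saving the edit.

```python
import difflib


def preview_diff(title, old_text, new_text):
    """Return a unified diff of the change the bot would make (empty if none)."""
    lines = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=title + " (current)",
        tofile=title + " (proposed)",
        lineterm="",
    )
    return "\n".join(lines)


if __name__ == "__main__":
    print(preview_diff("Sandbox", "first line\nteh cat\n", "first line\nthe cat\n"))
```

An empty result means the bot would not change the page, which is also a cheap way to skip unnecessary write requests.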

Documentation
An important (and often overlooked) task is documenting the internal design of your bot for the purpose of future maintenance and enhancement. This is especially important if you are going to allow clones of your bot. Ideally, you should post the source code of your bot on its userpage or in a revision control system (see #Open-source bots) if you want others to be able to run clones of it. This code should be well documented (usually using comments) for ease of use.

Queries/Complaints
You should be ready to respond to queries about or objections to your bot on your user talk page, especially if it is operating in a potentially sensitive area.

Maintenance
Maintaining and enhancing your bot to cope with newly discovered bugs or new requirements can take far more time than the initial development of the software. Not only may it be necessary to add code that does not fit the original design, but just determining how software works at some point after it is completed may require significant effort (this is another reason to document your code as you go along).

General guidelines for running a bot
In addition to the official bot policy, which covers the main points to consider when developing your bot, there are a number of more general advisory points to keep in mind.

Bot best practices

 * Set a custom User-Agent header for your bot (per the Wikimedia User-Agent policy, if your bot will be operating on Wikimedia wikis). If you don't, your bot may encounter errors and may end up blocked at the server level.
 * Use the maxlag parameter with a maximum lag of 5 seconds. This will enable the bot to run quickly when server load is low, and throttle the bot when server load is high.
 * If writing a bot in a framework that does not support maxlag, limit the total requests (read and write requests together) to no more than 10/minute.
 * Use the MediaWiki API whenever possible, and set the query limits to the largest values that the server permits, to minimize the total number of requests that must be made.
 * Edit (write) requests are more expensive in server time than read requests. Be edit-light and design your code to keep edits to a minimum.
 * Try to consolidate edits. One single large edit is better than 10 smaller ones.
 * Enable HTTP persistent connections and compression in your HTTP client library, if possible.
 * Do not make multi-threaded requests. Wait for one server request to complete before beginning another.
 * Back off upon receiving errors from the server. Errors such as timeouts are often an indication of heavy server load. Use a sequence of increasingly longer delays between repeated requests.
 * Make use of assertion to ensure your bot is logged in.
 * Test your code thoroughly before making large automated runs. Individually examine all edits on trial runs to verify they are perfect.
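Several of the practices above (a custom User-Agent, the maxlag parameter, and backing off after errors) can be combined in a few helpers. This is a sketch under stated assumptions: the User-Agent string, the helper names and the delay values are illustrative placeholders, and requests are issued one at a time rather than in parallel.

```python
import time

# Hypothetical identification string; use your own bot name and contact details.
USER_AGENT = "ExampleBot/0.1 (https://example.org/examplebot; examplebot@example.org)"


def with_maxlag(params, maxlag=5):
    """Copy of the request parameters with the maxlag parameter added."""
    merged = dict(params)
    merged["maxlag"] = str(maxlag)
    return merged


def backoff_delays(base=5.0, factor=2.0, cap=300.0, attempts=5):
    """Yield increasingly longer delays, never exceeding cap."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def call_with_backoff(request, sleep=time.sleep, **delay_options):
    """Call request() sequentially, sleeping longer after each failure."""
    last_error = RuntimeError("no attempts were made")
    for delay in backoff_delays(**delay_options):
        try:
            return request()
        except OSError as err:  # urllib errors and timeouts derive from OSError
            last_error = err
            sleep(delay)
    raise last_error
```

Passing sleep as an argument makes the retry logic easy to test without actually waiting.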

Manual assistance
If your bot is doing anything that requires judgment or evaluation of context (e.g., correcting spelling) then you should consider making your bot manually-assisted, which means that a human verifies all edits before they are saved. This significantly reduces the bot's speed, but it also significantly reduces errors.

Disabling the bot
It is good bot policy to have a feature to disable the bot's operation if it is requested. Remember that if your bot goes bad, it is your responsibility to clean up after it! You could have the bot refuse to run if a message has been left on its talk page, on the assumption that the message may be a complaint against its activities; this can be checked using the API meta=userinfo query. Or you could have a page that will turn the bot off if text on the page is changed (e.g. require the page be empty, contain only the word "True", or something similar); this can be checked by loading the page contents before each edit.
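Both shut-off checks can be written as small pure functions that run before each edit. This is a sketch only: the function names are hypothetical, the switch page convention (the page must contain just "True") is one of the options described above, and the talk-page check assumes the reply shape of a meta=userinfo query made with uiprop=hasmsg, where a messages field signals unread talk page messages.

```python
def bot_enabled(switch_page_text, enabled_marker="True"):
    """The bot may run only while the switch page contains exactly the marker."""
    return switch_page_text.strip() == enabled_marker


def has_new_talk_messages(userinfo_response):
    """Interpret a decoded meta=userinfo reply requested with uiprop=hasmsg.

    Assumption: the 'messages' key is present (possibly as an empty string)
    when there are new messages, and absent or False otherwise.
    """
    value = userinfo_response["query"]["userinfo"].get("messages")
    return value is not None and value is not False


def may_edit(switch_page_text, userinfo_response):
    """Combine both checks into a single decision."""
    return bot_enabled(switch_page_text) and not has_new_talk_messages(userinfo_response)
```

Treating any unexpected switch page content as "disabled" errs on the side of stopping the bot.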

Signature
Just like a human, if your bot makes edits to a talk page in MediaWiki, it should sign its post with four tildes (~~~~). Signatures usually belong only on talk namespaces.

Bot Flag
A bot's edits will be visible at Special:RecentChanges unless the edits are flagged as bot edits. Once the bot has been approved and given the bot flag, add "bot=True" to the API call (see API:Edit) to hide the bot's edits in Special:RecentChanges. In Python, when using either mwclient or wikitools, adding bot=True to the edit/save command marks the edit as a bot edit, e.g. PageObject.edit(text=pagetext, bot=True, summary=pagesummary)
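When calling the API directly rather than through a library, the flag is just one more parameter on the edit request. The sketch below assumes a dictionary of action=edit parameters built elsewhere; the helper name is hypothetical.

```python
def mark_as_bot_edit(edit_params):
    """Return a copy of the action=edit parameters with the bot flag set.

    The API treats 'bot' as a boolean parameter, so its presence is what matters.
    The flag only takes effect if the account holds the bot right.
    """
    flagged = dict(edit_params)
    flagged["bot"] = "1"
    return flagged


params = mark_as_bot_edit({"action": "edit", "title": "Sandbox", "text": "Hello"})
print(sorted(params))
```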

Monitoring the bot status
If the bot is fully automated and performs regular edits, you should periodically check it runs as specified, and its behavior has not been altered by software changes.

Open-source bots
Many bot operators choose to make their code open source, and occasionally it may be required before approval for particularly complex bots. Making your code open source has several advantages:


 * It allows others to review your code for potential bugs. As with prose, it is often difficult for the author of code to adequately review it.
 * Others can use your code to build their own bots. A user new to bot writing may be able to use your code as an example or a template for their own bots.
 * It encourages good security practices, rather than security through obscurity.
 * If you leave the project, it allows other users to run your bot tasks without having to write new code.

Open-source code, while rarely required, is typically encouraged in keeping with the open and transparent nature of wikis, though there are some cases when code should not be made public. For example, the open proxy-finding code of ProcseeBot could be used for malicious purposes on other sites.

Making code open source can add some extra work to coding. One has to make sure that sensitive information such as passwords is separated into a file that isn't made public.
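One common way to keep passwords out of published code is to read them from a separate file that is excluded from the public repository. The following sketch uses the standard library's configparser; the file name, the section name login and the key names are assumptions made for this example.

```python
import configparser


def load_credentials(path):
    """Read the bot's username and password from a private INI file.

    Expected file contents (kept out of version control):

        [login]
        username = ExampleBot@task
        password = your-bot-password
    """
    # interpolation=None so that '%' characters in passwords are read literally
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path, encoding="utf-8"):
        raise FileNotFoundError(path)
    section = parser["login"]
    return section["username"], section["password"]
```

Listing the credentials file in the repository's ignore file (for example .gitignore when using Git) helps prevent it from being committed by accident.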

There are several options available for users wishing to make their code open. Some users choose to put the code in a subpage of the bot's userspace, although this can be a hassle to maintain if not automated and results in the code being multi-licensed under the wiki's licensing terms in addition to any other terms you may specify. Another solution is to use a revision control system such as SVN, Git, or Mercurial. Wikipedia has articles comparing the different software options and websites for code hosting, many of which have no cost. Wikimedia also offers Git code repository hosting for its users and running Wikimedia related software tools via Wikimedia Cloud Services.

Programming languages and libraries

 * See also: API:Client code

Bots can be written in almost any programming language. The choice of a language often depends on the experience of the bot writer (which languages are familiar) or on the availability of pre-developed libraries to perform the desired task. The following list includes some languages that have libraries to assist with bot tasks.

Awk

 * Framework and libraries: BotWikiAwk
 * Example bots are available in User:GreenC's GitHub account

Perl
If your Perl program is located on a web server, you can start it and interact with it from your browser while it is running, via the Common Gateway Interface. If your internet service provider provides you with web space, chances are good that you have access to a Perl build on the web server from which you can run your Perl programs.

Libraries:
 * MediaWiki::API – Basic interface to the API, allowing scripts to automate editing and extraction of data from MediaWiki driven sites.
 * MediaWiki::Bot – A fairly complete MediaWiki bot framework written in Perl. Provides a higher level of abstraction than MediaWiki::API. Plugins provide administrator and steward functionality. Currently unsupported.

PHP
PHP can also be used for programming bots. MediaWiki developers are already familiar with PHP, since that is the language MediaWiki and its extensions are written in. PHP is an especially good choice if you wish to provide a webform-based interface to your bot. For example, suppose you wanted to create a bot for renaming categories. You could create an HTML form into which you will type the current and desired names of a category. When the form is submitted, your bot could read these inputs, then edit all the articles in the current category and move them to the desired category. (Obviously, any bot with a form interface would need to be secured somehow from random web surfers.)

The PHP bot functions table on English Wikipedia may provide some insight into the capabilities of the major bot frameworks.

Python
Python is a popular interpreted language with object-oriented features.



Microsoft .NET
Microsoft .NET is a set of languages including C#, C++/CLI, Visual Basic .NET, J#, JScript .NET, IronPython, and Windows PowerShell. The Microsoft Visual Studio integrated development environment is often used, or the free Microsoft Visual Studio Express versions. Using Mono Project, .NET programs can also run on Linux, Unix, BSD, Solaris and Mac OS X as well as under Windows.

Libraries:
 * DotNetWikiBot Framework – a full-featured client API on .NET that makes it easy to build programs and web robots for managing information on MediaWiki-powered sites. Now translated to several languages. Detailed compiled documentation is available in English.
 * WikiFunctions .NET library – Bundled with AWB, is a library of stuff useful for bots, such as generating lists, loading/editing articles, connecting to the recent changes IRC channel and more.

Java
Java programs are generally developed with an IDE, such as Eclipse or NetBeans; development using a command line console (with the javac and java programs) is also an option.

Libraries:
 * Java Wiki Bot Framework – A Java wiki bot framework
 * wiki-java – A Java wiki bot framework that is only one file
 * WPCleaner – The library used by the WPCleaner tool
 * jwiki – A simple and easy-to-use Java wiki bot framework

JavaScript (Node.js)
JavaScript is a scripting language used mainly on web pages, such as for user scripts added to your vector.js or your monobook.js pages. Using Node.js it is possible to use JavaScript server-side, such as for developing bots.



Ruby
Ruby is a popular dynamic, object-oriented programming language.

Libraries:
 * MediaWiki::Butt - Ruby framework for the API in active development. Tested with versions as up-to-date as CurseGamepedia is.
 * mediawiki/ruby/api, Ruby API client library. Last updated December 2017, no longer maintained, but still works.
 * MediaWiki::Gateway – Ruby framework for the API. Last updated January 2016. No longer in active development, tested up to MediaWiki 1.22, compatible with Wikimedia wikis. Unknown if still works.
 * wikipedia-client - Ruby framework using the API. Last updated March 2018. Unknown if still works.

Common Lisp

 * CL-MediaWiki implements the MediaWiki API as a Common Lisp package. It is planned to use JSON as the query data format. Supports maxlag and assertion.

Haskell

 * https://hackage.haskell.org/package/mediawiki

VBScript
VBScript is a scripting language based on the Visual Basic programming language. There are no published bot frameworks for VBScript, but some examples of bots that use it can be seen below:

Examples:
 * w:User:Smallman12q/Scripts/cleanuplistingtowiki (2013) - Login and give preview of edit
 * w:User:Smallman12q/VBS/Savewatchlist (2012) - Login, get raw watchlist, save to file, logout, close IE
 * w:Commons:User:Smallbot - Several scripts showing the usage of VBScript (JavaScript, XMLHTTP, MSHTML, XMLDOM, COM) for batch uploads.

Bash
Bash is a Unix shell.
 * See API:Client_code/Bash. Requires cURL package.