Extension:SpamBlacklist/ja

This extension comes bundled with MediaWiki 1.21 and above, so you do not need to download it separately.
MediaWiki extensions manual (Manual:Extensions)

SpamBlacklist

Release status: stable

Implementation: Page action
Description: Provides a regex-based anti-spam filter
Author: Tim Starling (talk)
Latest version: continuous updates
MediaWiki: 1.21+
PHP: 5.3+
Database changes: No
License: Any OSI approved license
Download · README

Hooks used:
  • EditFilterMergedContent
  • APIEditBeforeSave
  • EditFilter
  • ArticleSaveComplete
  • UserCanSendEmail
  • AbortNewAccount

Translate the SpamBlacklist extension if it is available at translatewiki.net.

Check usage and version matrix.

Issues (Phabricator): Open tasks · Report a bug

The SpamBlacklist extension prevents edits that contain URL hosts matching regular expressions defined in specified files or wiki pages, and it can also block account registration by users with specified email addresses.

When someone tries to save a page, the text is checked against a potentially very large list of forbidden hostnames. If there is a match, the extension displays an error message to the user and refuses to save the page.

Installation and setup

Installation

  • Download and place the file(s) in a directory called SpamBlacklist in your extensions/ folder.
  • Add the following code at the bottom of your LocalSettings.php (for MediaWiki 1.25 and later, see the note after this list):
    require_once "$IP/extensions/SpamBlacklist/SpamBlacklist.php";

  • Configure the blacklist at your convenience.
  • Done: Navigate to Special:Version on your wiki to verify that the extension is successfully installed.
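
If your wiki runs MediaWiki 1.25 or later, the bundled extension can instead be loaded through extension registration; a minimal sketch for LocalSettings.php (used in place of the require_once line above, not in addition to it):

    // MediaWiki 1.25+ only: load the bundled extension via extension registration
    wfLoadExtension( 'SpamBlacklist' );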

Setting the blacklist

The local pages MediaWiki:Spam-blacklist, MediaWiki:Spam-whitelist, MediaWiki:Email-blacklist and MediaWiki:Email-whitelist are always used, whatever additional sources are listed.

The default additional source for SpamBlacklist's list of forbidden URLs is the Wikimedia spam blacklist on Meta-Wiki, at m:Spam blacklist. By default, the extension uses this list and reloads it once every 10-15 minutes. For many wikis, using this list will be enough to block most spamming attempts. However, since the Wikimedia blacklist is used by a diverse group of large wikis with hundreds of thousands of external links, it is comparatively conservative in the links it blocks.

The Wikimedia spam blacklist can only be edited by administrators; but you can suggest modifications to the blacklist at m:Talk:Spam blacklist.

You can add other bad URLs on your own wiki. List them in the global variable $wgSpamBlacklistFiles in LocalSettings.php, AFTER the require_once "$IP/extensions/SpamBlacklist/SpamBlacklist.php"; see examples below.

$wgSpamBlacklistFiles is an array, with each value containing either a URL, a filename or a database location.
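
For orientation, here is a minimal sketch showing the three kinds of entries side by side (the local file path and the page title are hypothetical; fuller examples follow below):

$wgSpamBlacklistFiles = array(
   // a URL (the raw text of a blacklist page, fetched over HTTP/HTTPS)
   "https://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1",
   // a local file (hypothetical path)
   "$IP/extensions/SpamBlacklist/my_local_blacklist",
   // a page in this wiki's database, in the form "DB: [db name] [title]" (hypothetical title)
   "DB: wikidb My_spam_blacklist",
);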

If you use $wgSpamBlacklistFiles in LocalSettings.php, the default value of "[[m:Spam blacklist]]" will no longer be used; if you want that blacklist to be accessed, you will have to add it back manually, see examples below.

Specifying a database location allows you to draw the blacklist from a page on your wiki.

The format of the database location specifier is "DB: [db name] [title]". [db name] should exactly match the value of $wgDBname in LocalSettings.php. You should create the required page name [title] in the default namespace of your wiki. If you do this, it is strongly recommended that you protect the page from general editing. Besides the obvious danger that someone may add a regex that matches everything, please note that an attacker with the ability to input arbitrary regular expressions may be able to generate segfaults in the PCRE library.

Examples

If you want to, for instance, use the English-language Wikipedia's spam blacklist in addition to the standard Meta-Wiki one, you could add the following to LocalSettings.php, AFTER the require_once "$IP/extensions/SpamBlacklist/SpamBlacklist.php" line:

$wgSpamBlacklistFiles = array(
   "[[m:Spam blacklist]]",
   "https://en.wikipedia.org/wiki/MediaWiki:Spam-blacklist"
);

...or this, which functions the same:

$wgSpamBlacklistFiles = array(
   "https://meta.wikimedia.org/w/index.php?title=Spam_blacklist&action=raw&sb_ver=1",
   "https://en.wikipedia.org/w/index.php?title=MediaWiki:Spam-blacklist&action=raw&sb_ver=1"
);

Here's an example of an entirely local set of blacklists: the administrator is using the update script to generate a local file called "wikimedia_blacklist" that holds a copy of the Meta-Wiki blacklist, and has an additional blacklist on the wiki page "My spam blacklist":

require_once "$IP/extensions/SpamBlacklist/SpamBlacklist.php";
$wgSpamBlacklistFiles = array(
   "$IP/extensions/SpamBlacklist/wikimedia_blacklist", // Wikimediaのリスト
   //  データベース タイトル
   "DB: wikidb My_spam_blacklist",    
);

Problems

Because the blacklist may be long, the following line may need to be added to LocalSettings.php, probably BEFORE the require_once "$IP/extensions/SpamBlacklist/SpamBlacklist.php" line:

// Bump the Perl Compatible Regular Expressions backtrack memory limit
// (PHP 5.2.x default, 100K, is too low for SpamBlacklist)
ini_set( 'pcre.backtrack_limit', '8M' );

Whitelist

A corresponding whitelist can be maintained by editing the MediaWiki:Spam-whitelist message. This is useful when you want to override selected entries of another wiki's blacklist that you are using. Wikimedia wikis, for instance, sometimes use the spam blacklist for purposes other than combating spam.
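
The whitelist is maintained in the same regex-fragment format as the blacklist (see "Blacklist syntax" below). For example, to re-allow links to a single host blocked by a remote blacklist you are using, a line such as the following (hypothetical host) could be added to MediaWiki:Spam-whitelist:

goodsite\.example\.org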

It is questionable how effective the Wikimedia spam blacklists are at keeping spam off third-party wikis. Some spam might be targeted only at Wikimedia wikis, or only at third-party wikis, which would make Wikimedia's blacklist of little help to those third-party wikis in such cases. Also, some third-party wikis might prefer that users be allowed to cite sources that are not considered reliable on Wikipedia, or that Wikipedia has deemed ideologically offensive enough to warrant blacklisting. What one wiki considers useless spam, another wiki might consider useful.

Users may not always realize that, when a link is rejected as spammy, it does not necessarily mean that the individual wiki they are editing has specifically chosen to ban that URL. Therefore, wiki system administrators may want to edit the system messages at MediaWiki:Spamprotectiontext and/or MediaWiki:Spamprotectionmatch on their wiki to invite users to make suggestions at MediaWiki talk:Spam-whitelist for pages that a sysop should add to the whitelist. For example, you could put the following in MediaWiki:Spamprotectiontext:

The text you wanted to save was blocked by the spam filter. This is probably caused by a link to a blacklisted external site. {{SITENAME}} maintains [[MediaWiki:Spam-blacklist|its own blacklist]]; however, most blacklisting is done by means of [[metawikipedia:Spam-blacklist|Meta-Wiki's blacklist]], so this block should not necessarily be construed as an indication that {{SITENAME}} made a decision to block this particular text (or URL). If you would like this text (or URL) to be added to [[MediaWiki:Spam-whitelist|the local spam whitelist]], so that {{SITENAME}} users will not be blocked from adding it to pages, please make a request at [[MediaWiki talk:Spam-whitelist]]. A [[Project:Sysops|sysop]] will then respond on that page with a decision as to whether it should be whitelisted.

Authors and license

SpamBlacklist was written by Tim Starling and is (deliberately) ambiguously licensed.

Notes

  • This extension examines only new external links added by wiki editors. To check user agents, add Bad Behaviour or Akismet, and to check an editor's IP address against lists of known spambots, supplement this with Check Spambots. As the various tools for combating spam on MediaWiki use different methods to spot abuse, the safeguards are best used in combination.
  • Extension:SpamBlacklist/update script is a cron script that can automatically update from shared blacklists. If you are using memcached, you must also delete the spam_blacklist_regexes key (for example, by using maintenance/mcc.php).
  • There is no way to let certain users override the spam blacklist. See bugzilla:34928.

Usage

Blacklist syntax

If you would like to create a blacklist of your own, or modify an existing one, here is the syntax:

Everything on a line after a '#' character is ignored (this can be used for comments). Every other string is a regex fragment that will only match inside URLs.

Notes:

  • "http://"もしくは"www."を追加しません; 正規表現はURL内部でどのサブドメインにマッチするので必要ありません。
  • URLの前で終わらないパターンは使わないで下さい(例えば、'.*'を含むもの)。
  • '^'と'$'アンカーはURLの始めと終わりではなくページの始めと終わりでにマッチします。
  • Slashes don't need to be escaped by Backslashes. This will be done automatically by the script.

The following line will block all URLs that contain the string "example.com", except where it is immediately preceded or followed by another word character (a letter, digit, or underscore).

\bexample\.com\b

These are blocked:

  • http://www.example.com
  • http://www.this-example.com
  • http://www.google.de/search?q=example.com

These are not blocked:

  • http://www.goodexample.com
  • http://www.google.de/search?q=example.commodity

Performance

The extension creates a single regex statement that looks like !http://[a-z0-9\-.]*(line 1|line 2|line 3|....)!Si. To avoid loading all of that code on every page view, this regex is saved into a "loader" file. Using a bytecode cache for your MediaWiki installation is strongly recommended, but page view performance will not be affected even if you do not use one.
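
As a standalone illustration (not the extension's actual code), here is a hedged PHP sketch that builds one batched pattern in the form shown above from a small list of fragments and tests the example URLs from the previous section against it:

// Illustration only: combine blacklist fragments into a single batched
// regex of the form described above and test a few URLs against it.
$fragments = array( '\bexample\.com\b' );
$batchedRegex = '!http://[a-z0-9\-.]*(' . implode( '|', $fragments ) . ')!Si';

$urls = array(
    'http://www.example.com',       // blocked
    'http://www.this-example.com',  // blocked
    'http://www.goodexample.com',   // not blocked
);
foreach ( $urls as $url ) {
    echo $url . ' => ' . ( preg_match( $batchedRegex, $url ) ? 'blocked' : 'allowed' ) . "\n";
}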

The regex match itself generally adds only a small overhead to page saves (roughly 100 ms in our experience). However, loading the spam filter from disk or the database and constructing the regex can take a significant amount of time, depending on your hardware. If you find that enabling this extension slows down saves excessively, try installing a supported bytecode cache; the SpamBlacklist extension will cache the constructed regex if such a system is present.

If you're sharing a server and cache with several wikis, you may improve your cache performance by modifying getSharedBlacklists and clearCache in SpamBlacklist_body.php to use $wgSharedUploadDBname (or a specific DB if you do not have a shared upload DB) rather than $wgDBname. Be sure to get all references! The regexes from the separate MediaWiki:Spam-blacklist and MediaWiki:Spam-whitelist pages on each wiki will still be applied.

External blacklist servers (RBL)

In its standard form, this extension requires that the blacklist be constructed manually. While regular expression wildcards are permitted, and a blacklist originated on one wiki may be re-used by many others, there is still some effort required to add new patterns in response to spam or remove patterns which generate false-positives.

Much of this effort may be reduced by supplementing the spam regex with lists of known domains advertised in spam e-mail. The regex will catch common patterns (like "casino-" or "-viagra") while the external blacklist server will automatically update with names of specific sites being promoted through spam.

In the filter() function in SpamBlacklist_body.php, approximately halfway through the file, are the following lines:

       # Do the match
       wfDebugLog( 'SpamBlacklist', "Checking text against " . count( $blacklists ) .
           " regexes: " . implode( ', ', $blacklists ) . "\n" );

Directly above this section (which does the actual regex test on the extracted links), one could add additional code to check the external RBL servers:

        # Do RBL checks
        $retVal = false;
        $wgAreBelongToUs = array( 'l1.apews.org.', 'multi.surbl.org.', 'multi.uribl.com.' );
        foreach ( $addedLinks as $link ) {
            $link_url = parse_url( $link );
            $link_url = $link_url['host'];
            if ( $link_url ) {
                foreach ( $wgAreBelongToUs as $base ) {
                    $host = "$link_url.$base";
                    $ipList = gethostbynamel( $host );
                    if ( $ipList ) {
                        wfDebug( "RBL match: Hostname $host is {$ipList[0]}, it's spam says $base!\n" );
                        $ip = wfGetIP();
                        wfDebugLog( 'SpamBlacklistHit', "$ip caught submitting spam: {$link_url} per RBL {$base}\n" );
                        $retVal = $link_url . ' (blacklisted by ' . $base . ')';
                        wfProfileOut( $fname );
                        return $retVal;
                    }
                }
            }
        }

        # if no match found on RBL server, continue normally with regex tests...

This ensures that, if an edit contains URLs from already-blacklisted spam domains, an error is returned to the user indicating which link cannot be saved due to its appearance on an external spam blacklist. If nothing is found, the remaining regex tests are allowed to run normally, so that any manually-specified 'suspicious pattern' in the URL may be identified and blocked.

Note that the RBL servers list just the base domain names - not the full URL path - so http://example.com/casino-viagra-lottery.html will trigger RBL only if "example.com" itself were blacklisted by name by the external server. The regex, however, would be able to block on any of the text in the URL and path, from "example" to "lottery" and everything in between. Both approaches carry some risk of false-positives - the regex because of the use of wildcard expressions, and the external RBL as these servers are often created for other purposes - such as control of abusive spam e-mail - and may include domains which are not engaged in forum, wiki, blog or guestbook comment spam per se.

Other anti-spam tools

There are various helpful manuals on mediawiki.org on combating spam and other vandalism, as well as a number of other anti-spam and anti-vandalism extensions.

See also

  • Compatible blacklists (this is just a tiny sampling; there are many more)

