Manual:$wgSpamRegex/fr

Any text added to a wiki page that matches this regular expression (regex) is treated as wiki spam, and the edit is blocked. $wgSpamRegex applies to all user groups; even members of the administrator (sysop) and bureaucrat groups cannot save their text if it matches $wgSpamRegex. Filtering by user group requires a separate tool that lets you define rules per group. $wgSpamRegex is one of the most effective anti-spam features built into MediaWiki. It does not stop all vandalism, but it can reduce it drastically with almost no negative impact on legitimate users. The $wgSpamRegex setting controls how MediaWiki analyzes the text of contributions and decides whether they are spam.

Full example
The following example is a good setting to use on your wiki if it is small or medium-sized and is under attack from spam. Insert the following into your LocalSettings.php file:
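The original listing is not reproduced here. As a rough sketch only, a multi-line setting of this kind might look like the following; the keyword list and the two CSS patterns are illustrative assumptions rather than a recommended blacklist, and should be tuned for your own wiki:

```php
// Hypothetical keyword list for illustration; adapt it to the spam you actually see.
$wgSpamRegex = '/' .                              // opening wrapper
    'online-casino|casino-online|buy-viagra|' .   // common spam keywords
    'cheap-pills|adult-webcam|' .
    'overflow\s*:\s*auto|' .                      // CSS hidden spam
    'display\s*:\s*none' .                        // last alternative: no trailing "|"
    '/i';                                         // closing wrapper and "i" switch
```

Single-quoted PHP strings are used so that sequences such as `\s` reach the regex engine unchanged.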

Note that the second-to-last line does not have the "|" at the end of the string. This is because the next line ends the regular expression with the closing wrapper / followed by the "i" switch.

This example includes some common spam keywords (some of them taken from the Meta-Wiki spam blacklist) as well as techniques for blocking spam hidden with CSS.

Using regular expressions to block spam
Here is a tutorial on regular expressions. Experiment with the $wgSpamRegex setting and test some edits on your sandbox page to see what gets blocked. But beware: take care to avoid false positives, i.e. incorrectly matching legitimate edits; see "Avoid false positives!" below.

The value you assign to $wgSpamRegex is a regular expression (see Wikipedia's article and PHP's manual on regular expressions). The above example shows a regular expression being built up over several lines, using PHP's dot syntax to concatenate strings. This keeps the long regular expression readable, but also makes it look a bit more complicated.

If you write your own regular expressions, you will probably want to try them out separately in the PCRE regular expression tester (click the PCRE tab on that page).

Simple example
Here is a simpler example:
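As a minimal sketch (the keyword is a hypothetical choice for illustration, not taken from the original listing), a single-keyword setting in LocalSettings.php could look like this:

```php
// Blocks any edit whose text contains the literal string "online-casino".
$wgSpamRegex = '/online-casino/';
```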

Remember, the idea is to decide: is this spam, yes or no? With this example, any contribution text containing the string between the '/' symbols will match as spam. The '/' symbols at the beginning and end are part of the regular expression syntax.

Blocking several different words or domains
Let's extend our example to catch more kinds of spam:
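A sketch of such an extended setting, assuming a few made-up keywords and placeholder domain names under the reserved `.example` top-level domain:

```php
// Each alternative is separated by "|"; dots in domain names are escaped
// so that they match a literal "." rather than any character.
$wgSpamRegex = '/online-casino|buy-viagra|cheap-pills|spam-domain\.example|another-spam\.example/';
```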

Using a '|' symbol between words, the above example will block several different spammy words, and also some domain names which are promoted by spammers.

$wgSpamRegex is applied to all contribution text, including the URLs of spam links. Blocking domain names can therefore be a very effective way to get rid of a particular spammer.

Avoid false positives!
The real challenge here is avoiding false positives, which is best illustrated with a bad example:
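To make the pitfall concrete, here is a deliberately bad sketch. The drug name is an assumed example chosen because it happens to be embedded in an ordinary English word:

```php
// BAD: also matches inside "specialist" (spe-cialis-t),
// so legitimate edits mentioning a specialist would be rejected.
$wgSpamRegex = '/cialis/';
```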

Lots of spammers like to talk about a particular drug, so you might be tempted to match its name as spam. But a bare pattern also matches that name wherever it appears inside a longer word, which prevents users from mentioning a perfectly legitimate word. It is very easy to make this kind of mistake, so be careful with your regular expression setting: you want to stop spammers without inconveniencing your users. In many cases the problem can be overcome by adding the "\b" word boundary pattern before and after any word that might be contained in a larger word, e.g.:
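A sketch of the word-boundary fix, again using an assumed drug name for illustration:

```php
// "\b" anchors the keyword at word boundaries: "buy cialis now" matches,
// while "specialist" no longer does because the keyword is inside a longer word.
$wgSpamRegex = '/\bcialis\b/';
```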

More regular expression tips
Regular expressions are very powerful. $wgSpamRegex matching is applied to all text of the page or section being edited, not just URLs. This gives you the power to block anything you don't like, provided you can work out a good regular expression to match it (be as specific as possible to avoid false positives). The section on CSS hidden spam below makes use of this.

Spam detection message
Usually, when the $wgSpamRegex setting detects spam, the following message is displayed:


 * The page you wanted to save was blocked by the spam filter. This is probably caused by a link to a blacklisted external site.


 * The following text is what triggered our spam filter: [word/domain name which was blocked]

This text can be changed; it is stored on two editable wiki pages in the MediaWiki namespace. Click 'Special Pages' -> 'Wiki data and tools: System Messages', type 'spampro' into the 'Filter by prefix:' field and click 'Go'. If the top tab reads 'View Source' instead of 'Edit', you do not have permission to edit; log in as a sysop user (or as the WikiSysop user you configured during installation).

'$1' in MediaWiki:Spamprotectionmatch displays the failed edit's regex match that tripped the spam filter. Delete '$1' if you want it hidden.

Showing or hiding the matched text
If you've made a regex which is too restrictive, or you have made some other mistake in the setting, then you may get false positives. Indeed the full example above might match legitimate text in some rare circumstances (maybe your users really do want to talk about buying Viagra).

By displaying the text which matched, the MediaWiki:Spamprotectionmatch message helps to reduce problems caused by false positives.

It allows your users to accurately report problems to you, about your $wgSpamRegex setting.

It also allows them to figure out a workaround, so they can continue with their wiki editing.

Unfortunately, it is also a very useful piece of information for spammers visiting your site. Some spammers are automated bots and will never see this information, but many spammers (believe it or not) are humans. They could go to the trouble of looking at the matching information and devising a workaround (e.g. leaving out the domain name that you have blocked and linking to various other domains instead). It is difficult to know how prevalent this kind of behavior is, but if you want to make life more difficult for them, you can hide the spam matching information by setting your MediaWiki:Spamprotectionmatch message to empty. Only do this if you are well aware of the above points about false positives and have carefully designed your regex to avoid them.

CSS hidden spam
MediaWiki is fairly permissive about HTML tags and CSS style definitions (see Help:HTML in wikitext on Meta-Wiki).

This has given spammers the opportunity to invent a sneaky trick to hide their spam from view. It doesn't show up on your pages, but it does show up in your edit boxes, and the changes show up in your 'recent changes' display. As such it causes confusion to your legitimate users, and that's before you consider the effects of helping a spammer by hosting their links. Generally 'CSS Hidden Spam' is all bad. Just because you can't see it (easily), doesn't mean you can ignore it.

The problem was identified in 2005 but got a lot worse in 2006, to the point where most MediaWiki spammers seemed to be using this trick.

We can use a regular expression to prevent the CSS tricks which they are using. Two of these are incorporated in the full example above (combined using the '|' symbol):

To prevent CSS hidden spam of the form :

To prevent CSS hidden spam of the form :
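The specific forms and patterns are not reproduced above. As a sketch only, assuming the hidden spam is wrapped in an element styled with either `overflow:auto` plus a tiny height or `display:none` (both forms are assumptions, not the original listing), the two checks could be combined like this:

```php
// Assumed hidden-spam forms:
//   <div style="overflow:auto; height:0px;"> ... </div>
//   <div style="display:none;"> ... </div>
// "\s*" tolerates optional whitespace around the colon; "i" ignores case.
$wgSpamRegex = '/overflow\s*:\s*auto|display\s*:\s*none/i';
```

Note that these patterns also reject legitimate wikitext that uses the same CSS declarations.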

For a slightly more strict setting you might prefer to disallow various attributes of the style tag altogether:
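One possible reading of this stricter setting, sketched under the assumption that the goal is to reject any `style` attribute that sets one of a handful of layout-related properties (the property list is hypothetical):

```php
// Rejects e.g. style="position:absolute" or style="visibility:hidden",
// whatever value is assigned. "[^>]*" keeps the match inside a single tag.
$wgSpamRegex = '/\bstyle\s*=[^>]*\b(?:display|position|overflow|visibility)\s*:/i';
```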

...but you may find this starts to restrict your users more than you would like.

Blocking ALL external links
You can block all external links with the following regular expression:
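A minimal sketch, assuming that "external link" means any text containing an `http://` or `https://` URL (other schemes such as `ftp://` are not covered by this pattern):

```php
// Matches as soon as the text contains "http://" or "https://" anywhere.
$wgSpamRegex = '/https?:\/\//i';
```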

This is extremely restrictive to the wiki's legitimate users, as they cannot link to any external site anymore. It is a poor solution to the spam problem, although it is marginally better than a complete lock down.

If you are going to use this, make sure your 'MediaWiki:Spamprotectiontext' page has an explanation of what you have done.

Limiting external links to 100
You can limit the total number of external links allowed per page to, say, 100 with this:
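The following is a sketch rather than the original listing. It assumes links are counted by occurrences of `http://` or `https://`, and it uses an atomic group `(?>...)` so that a page with fewer links fails quickly instead of triggering heavy backtracking:

```php
// Matches only when "http://" or "https://" occurs at least 101 times,
// so pages with up to 100 such links can still be saved.
// "\A" anchors at the start of the text; "s" lets "." match newlines.
$wgSpamRegex = '/\A(?>.*?https?:\/\/){101}/is';
```

Changing `{101}` to another count adjusts the limit; the number in braces is always one more than the number of links you want to allow.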

If you do this, make sure your 'MediaWiki:Spamprotectiontext' page has an explanation of what you've done.

pcre.backtrack_limit
Since PHP 5.3.7, pcre.backtrack_limit defaults to 1000000 (1M). This can still be too low for some patterns. Try adding the following line to your "LocalSettings.php" file:
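The exact value from the original is not shown here; as a sketch with an arbitrarily chosen, illustrative limit, the line could be:

```php
// Illustrative value only: raises PCRE's backtracking limit above the 1M default.
ini_set( 'pcre.backtrack_limit', '8000000' );
```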

If this is still not enough, you may gradually increase the limit until it fits your wiki's actual requirements.