Extension talk:AbuseFilter/Archive 1

Syntax error
I have installed this extension on a localhost wiki, but for some reason it rejects every filter I add with the error message: "There is a syntax error in the filter you specified. The output from the parser was:", followed by no parser output at all. I know the syntax is correct because even simple conditions like "testing in SUMMARY" are rejected. Is this a bug, or have I done something wrong? 125.238.97.240 11:11, 14 September 2008 (UTC)

Throttling
What does "# editcount — Edit count — hack so that you can detect distinct users." mean? -- Mike.lifeguard | @meta 00:04, 20 February 2009 (UTC)

Some problems
There are a few problems with this extension:
 * Actions only visible to the editor should not be publicly logged : Logging is not a one-size-fits-all matter, especially when it comes to public verbal attacks on living persons. As a first approach, access to log entries could be granted per action, but this will probably also lead to a lot of problems. At the very least, it should be possible to turn off logging for trivial actions that never have any public effect. For example, if a warning is given and the editor then does not save the edit, nothing should be logged. If he goes ahead and saves the edit, the warning can be logged. Note, however, that the log entry itself can disclose information about the topic of the edit.
 * Public rules could themselves violate privacy for specific articles : Imagine someone obtains information that s/he assumes could lead to legal action if published. This information is then used to write one or more rules that prevent it from reaching a wiki. If the rule is written in clear text, it can easily be read back to recover the very information it is meant to exclude. This happens most typically when the article in question is a biography and the rule concerns some rumor or fact, especially when such rumors or facts are illegal to publish. One solution is to store digests (hashes) of the sensitive terms in the rules instead of clear text.
 * Regex patterns are overly simplistic for analyzing real-life vandalism : As long as the vandalism can be described by simple patterns matching single markers, things work out pretty well. If the vandalism consists of phrases rather than single words, it becomes much harder to detect. Some previous work indicates that whole phrases should be analyzed anyhow, and that an approach based on single words is too simplistic and will lead to far too many false positives.
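A minimal sketch of the digest idea from the second point, assuming a rule stores SHA-256 digests of lower-cased single words and that an edit is split into words with a simple regex. The names `word_digest`, `tokenize`, `rule_matches` and `HIDDEN_RULE` are hypothetical illustrations, not part of AbuseFilter.

```python
import hashlib
import re


def word_digest(word: str) -> str:
    """SHA-256 hex digest of a lower-cased word."""
    return hashlib.sha256(word.lower().encode("utf-8")).hexdigest()


def tokenize(text: str) -> list:
    """Split text into lower-cased words (hypothetical normalization)."""
    return re.findall(r"\w+", text.lower())


def rule_matches(text: str, digests: set) -> bool:
    """True if any word in the text hashes to one of the rule's digests."""
    return any(word_digest(tok) in digests for tok in tokenize(text))


# "example" stands in for a sensitive term. In practice the digest would be
# computed offline so that only the hex string is stored in the rule; it is
# computed inline here for brevity.
HIDDEN_RULE = {word_digest("example")}

print(rule_matches("This is an Example edit.", HIDDEN_RULE))  # True
print(rule_matches("Nothing to see here.", HIDDEN_RULE))      # False
```

An unsalted digest of a single dictionary word can still be recovered by hashing a word list, so this hides the term from casual reading rather than from a determined attacker.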

There are probably other problems, but these three are very close to being showstoppers. Jeblad 20:26, 22 February 2009 (UTC)

You have obviously not examined the extension in detail. The last two are obvious non-issues: the second because rules can be hidden from the general public, and the third because the whole point of the extension is that it goes beyond regexes to more complex boolean logic based on far more context than a single regex match.

The first point needs to be debated; it may be worth offering this as a configuration option. Andrew Garrett 03:32, 23 February 2009 (UTC)


 * Limiting disclosure to a smaller, unidentified group is not good enough: you cannot guarantee that the information will not be disclosed by someone, because you cannot identify the people who have the opportunity to inspect it (in Norway, this is the «behandlingsansvarlig», the data controller, for such information). It seems to me that the only viable solution is either to create a limited group of identified persons, as with the OTRS system on Wikimedia, or to make encrypted rules that are identifiable but cannot be read in clear text. This may be acceptable in some countries, but in countries where you must identify who has access, it can create real problems. The same goes for logging, as logging and rules are symmetric in this respect.
 * A similar problem is protecting the rules themselves from probing. Imagine a rule guarding an article against the inclusion of a rumor that someone was raped, and a reader who wants to verify the rumor inserts some text containing the word "rape". If he is blocked, or even just given a warning, he knows that the rumor is known, and the action itself has leaked the information. The system must have some capability to obfuscate the actual action and the reason behind it. One such method is to trigger not only on clear-text patterns but also on digests of words, where the digest algorithm has low entropy (many different words produce the same digest). This makes it difficult to reverse-engineer the words, at the cost of more false positives. If such a rule only blocks saving edits to a single article, this would not pose a problem, as long as it does not involve blocking the editor himself.
 * I know that there is a system for boolean logic, but it still does not solve the complexity of parsing natural language. Without such capabilities the error rate will be far too high. Compare the two statements "He is a monkey" and "It is a monkey". Jeblad 11:41, 23 February 2009 (UTC)
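The low-entropy digest from the second point could look roughly like this, assuming the digest is simply the top few bits of SHA-256. Because many words share each bucket, hashing a word list returns many candidates instead of one, and the rule also fires on unrelated words that land in the same bucket. `BITS`, `short_digest`, `rule_fires` and `candidates` are hypothetical names, and the bit count is an arbitrary choice that trades obscurity against false positives.

```python
import hashlib
import re

BITS = 12  # hypothetical: keep only 12 bits of the hash, i.e. 4096 buckets


def short_digest(word: str, bits: int = BITS) -> int:
    """Lossy digest: the top `bits` bits of SHA-256 of the lower-cased word."""
    h = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return int.from_bytes(h, "big") >> (256 - bits)


def rule_fires(text: str, buckets: set, bits: int = BITS) -> bool:
    """True if any word in the text falls into one of the rule's buckets."""
    return any(short_digest(w, bits) in buckets
               for w in re.findall(r"\w+", text.lower()))


def candidates(bucket: int, vocabulary: list, bits: int = BITS) -> list:
    """What someone probing with a word list recovers: every word in the bucket."""
    return [w for w in vocabulary if short_digest(w, bits) == bucket]


# With only 4 bits (16 buckets), a list of about 1000 words yields on the
# order of 1000/16 candidates for the bucket, so the guarded term cannot
# be singled out.
vocab = ["example"] + [f"word{i}" for i in range(1000)]
bucket = short_digest("example", 4)
print(len(candidates(bucket, vocab, 4)))
```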

Proximity operators
If the solution is to be used for text analysis, then proximity operators should be added. Such operators typically match terms that occur within the same logical text unit (a paragraph, a sentence, or similar) or within a given number of words of each other. Jeblad 20:56, 23 February 2009 (UTC)
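A rough sketch of two such operators, one word-window based and one sentence based, assuming words are matched case-insensitively and sentences are split naively on ".", "!" and "?" (which will mis-split abbreviations). `near` and `same_sentence` are hypothetical names, not existing AbuseFilter operators.

```python
import re


def words(text: str) -> list:
    """Lower-cased words of a text."""
    return re.findall(r"\w+", text.lower())


def near(text: str, a: str, b: str, max_distance: int) -> bool:
    """True if words a and b occur within max_distance words of each other."""
    toks = words(text)
    pos_a = [i for i, t in enumerate(toks) if t == a.lower()]
    pos_b = [i for i, t in enumerate(toks) if t == b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


def same_sentence(text: str, a: str, b: str) -> bool:
    """True if a and b occur in the same sentence (naive split on . ! ?)."""
    for sentence in re.split(r"[.!?]+", text):
        toks = words(sentence)
        if a.lower() in toks and b.lower() in toks:
            return True
    return False


print(near("He is a monkey", "he", "monkey", 3))  # True: 3 words apart
print(near("He is a monkey", "he", "monkey", 2))  # False
print(same_sentence("He is tall. It is a monkey.", "he", "monkey"))  # False
```

A word window is cheap and language-independent; a sentence window follows the text's structure more closely but depends on how well sentences can be detected.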