Thread:Project:Support desk/Improving the effectiveness of regular expressions against vandalism/reply (10)

I guess I'm late to the party here, but...

In your original post, I assumed you meant that any edit containing a non-allowed word would be rejected outright. In such a system the room for error is pretty close to zero, as the cost of a false positive is extremely high.

As for the number of bad words being larger than the number of good words: that is almost certainly true. There is a countably infinite number of bad words, and I believe only a finite number of good words. However, just because the cardinality is higher on the bad-word side does not necessarily make it doable to approach the problem from the other direction, since a number smaller than infinity can still be much too large to manage.

However, later on you talk about using this as a system to trigger extra review, or tag an edit for further review, etc. That's much more likely to be workable imo, although there would still be many questions to work out about how such a system would actually work, and what the review process would be like.
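
To make the reject-versus-tag distinction a bit more concrete, here's a very rough sketch. Everything in it is made up for illustration (the tiny word list, the `review_tags` function, the `needs-review` tag name); it isn't how MediaWiki or any existing extension does it. The point is only that an unknown word adds a tag for a human to look at, and never blocks the save.

```python
import re

# Hypothetical allow-list for illustration; a real one would be far larger.
KNOWN_WORDS = {"the", "cat", "sat", "on", "mat"}

def review_tags(added_text, known_words=KNOWN_WORDS):
    """Return tags for an edit instead of rejecting it outright."""
    # Split the added text into lowercase word-like tokens.
    words = re.findall(r"[a-z']+", added_text.lower())
    unknown = [w for w in words if w not in known_words]
    # The edit is saved either way; unknown words only queue it for review.
    return ["needs-review"] if unknown else []
```

With this, a false positive costs a reviewer a few seconds rather than costing the editor their edit, which is why the tolerance for error is so much higher.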

But if you do go in that direction, it might be better to take a machine learning approach rather than a spelling-based one (lest the vandals and spammers learn to spell; obligatory xkcd). You may be interested in reading up on w:User:ClueBot_NG.

Happy holidays,