Thread:Project:Support desk/Improving the effectiveness of regular expressions against vandalism/reply (3)

But this doesn't make sense. Do you really think the number of arbitrary words (illegal variations) is smaller than the number of known words we could consult in a simple dictionary? No, it is far larger, 1000x larger or more. Most people use few words. Do the research and publish the word frequencies used in the articles, and you will see that the words actually used are very few compared to the whole lexicon that could be used.
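To illustrate the frequency point, here is a toy sketch (the sample text and the numbers are invented for illustration; on a real article corpus you would run this over the whole dump):

```python
# Sketch of the frequency argument: count distinct words in a sample
# text and see how few distinct words cover most of the usage.
from collections import Counter
import re

sample = "the cat sat on the mat and the cat slept on the mat"
words = re.findall(r"[a-z]+", sample.lower())
freq = Counter(words)

# Fraction of all word occurrences covered by the 3 most common words.
coverage = sum(count for _, count in freq.most_common(3)) / len(words)
```

Even in this tiny sample, a handful of distinct words accounts for most of the text, which is the pattern real article corpora show as well.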

The whitelist doesn't need to be perfect from the start; words can be added as their use starts appearing. Articles can be flagged to be published later, after someone personally checks them and possibly adds the new word to the whitelist database (or creates an "inflector" for a new variation).
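The mechanism described above could look something like this minimal sketch (the whitelist here is a toy set; a real one would come from a dictionary plus the inflector for accepted variations):

```python
# Minimal sketch of the whitelist idea: split an edit into words and
# flag any word the known-word set does not contain, so a human can
# review it and possibly add the word to the whitelist database.
import re

WHITELIST = {"the", "a", "on", "cat", "sat", "mat"}  # toy dictionary

def unknown_words(text):
    """Return the words in `text` that the whitelist does not know."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in WHITELIST]

def review_needed(text):
    """An edit is held for review if it contains any unknown word."""
    return len(unknown_words(text)) > 0
```

Edits made only of known words pass through immediately; anything containing an unknown word goes into the review queue instead of being published.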

The way vandalism is currently identified is much more "ridiculous" than trying something better. Man, I have seen funny regular expressions trying to catch a single (illegal) variation of a word, and they failed, simply because it is easy to bypass a system that accepts arbitrary combinations of characters.
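Here is a sketch of that failure mode ("badword" stands in for a real offensive term; the pattern is invented, but it is the typical shape of such blacklist regexes):

```python
# Why blacklist regexes are easy to bypass: the pattern tries to
# anticipate character substitutions, but any substitution it did not
# anticipate slips straight through.
import re

# A typical "clever" blacklist pattern covering a few known variations.
pattern = re.compile(r"b[a@]dw[o0]rd", re.IGNORECASE)

caught = pattern.search("check this b@dw0rd")   # anticipated variation: caught
missed = pattern.search("check this b.a.d.word")  # new variation: slips through
```

A whitelist turns the problem around: "b@dw0rd" and "b.a.d.word" are equally unknown words, so both get flagged without anyone having to predict the variation in advance.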

Foreign languages are usually present only as foreign terms in some articles, and we are not allowed to write an entire article in a foreign language. That is another point in favor of per-category whitelists: if an article presents a term that has nothing to do with its subject, we know it is vandalism, or at least possible vandalism that can be flagged for review.
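The per-category variant is a small extension of the same idea (category names and word sets below are invented for illustration):

```python
# Sketch of per-category whitelists: each article category gets its own
# accepted vocabulary, so a term that is off-topic for the category is
# flagged even if it is a legitimate word elsewhere.
CATEGORY_WHITELISTS = {
    "astronomy": {"the", "a", "star", "orbit", "planet"},
    "cooking":   {"the", "a", "flour", "oven", "bake"},
}

def off_topic_words(category, words):
    """Return the words that the category's whitelist does not accept."""
    allowed = CATEGORY_WHITELISTS[category]
    return [w for w in words if w not in allowed]
```

So "flour" is fine in a cooking article but gets flagged in an astronomy one, which is exactly the "term that has nothing to do with the subject" signal described above.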

I seriously believe that things won't get better with the current mechanism. If you believe otherwise, please give me technical reasons. Maybe in 20 or more years things will still be the same if they don't change the way vandalism is caught.