Thread:Project:Support desk/Improving the effectiveness of regular expressions against vandalism/reply (6)

> Did you known what are arbitrary characters combination?

Skizzerz is right. Please stop being offensive and stay on topic!

It is true that trusted editors could flag all new revisions in every article. The article would then only be displayed after this check. This concept has in fact been introduced, e.g. in the German Wikipedia, and has been in successful use there for years. However, this has nothing to do with an automated check against a whitelist in which every word of the article has to be included, as you want it. While they aim at the same problem, flagged revisions are technically a completely different kettle of fish.

I also see the point that one and the same illegal word can be written in many different spellings. For the small blue pills, for example, you can use two slashes instead of the "V", a lowercase "L" or a "1" instead of the "i", or some Unicode character that looks like one of these letters but is not one of them. You will quickly end up with several thousand variations of the same illegal word. However, compared to the few illegal words, the number of legal words is far larger: the Oxford English Dictionary lists around 170,000 words in current use, around 50,000 obsolete words and nearly 10,000 derivative words. And this includes neither technical terms nor inflections nor compounds. There is a vast number of legal words, and it is simply untrue to say they would not be used. Even if a single user only uses "a few" (however many that may be) different words a day, the sum of users, the sum of languages and the sum of all their different areas of expertise will produce an immensely huge number of legal words.

And what should happen when a user adds a word that is not in that list? Should we then disallow saving the whole edit? Articles so often contain typing errors, or maybe just a word that you do not have among your billions of words. Imagine you wrote your first article and simply could not save it. How annoying! The number of editors is already stagnating today, so what would happen with that kind of patronizing? Many more would turn away and never come back. It would do far more harm than good.
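To give a feeling for how quickly the spellings of one illegal word multiply, here is a rough sketch in Python. The substitution table is purely my own assumption for illustration (it is not taken from any real filter), and it only covers a handful of lookalike characters per letter.

```python
from itertools import product

# Hypothetical substitution table, for illustration only.
# Each letter maps to the spellings a spammer might use for it.
SUBSTITUTIONS = {
    "v": ["v", "V", "\\/"],               # backslash plus slash looks like a "V"
    "i": ["i", "I", "l", "1", "\u0456"],  # U+0456 is the Cyrillic lookalike of "i"
    "a": ["a", "A", "@", "\u0430"],       # U+0430 is the Cyrillic lookalike of "a"
    "g": ["g", "G", "9"],
    "r": ["r", "R"],
}


def variants(word):
    """Return every spelling of `word` that the table above can produce."""
    choices = [SUBSTITUTIONS.get(ch, [ch]) for ch in word.lower()]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    spellings = variants("viagra")
    print(len(spellings))   # product of the number of choices per letter
    print(spellings[:5])
```

Even this tiny table yields well over a thousand spellings for a single six-letter word, and it does not yet cover inserted spaces, dots or further Unicode lookalikes.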

And I will even go one step further and dare to say: not only is it impossible in terms of manpower; it is also technically impossible to check every single edit and every single word against a dictionary with these billions of words, inflections and derivations. Although hardware and CPU cycles are becoming cheaper and cheaper, you would still have to throw an immense amount of money at this problem if you really wanted to check every single edit and every single word against such a list. And all this does not even take into account that languages are constantly changing: new words are created and the meanings of old words change. What was accepted yesterday may have become an insult today. You would have to constantly update your pile of words to stay up to date. Basically like a dictionary publisher, but even more extreme, as they can leave words out (e.g. because they do not consider them established enough), while with a whitelist you cannot do so.
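As a rough sketch of the maintenance problem, assume a whitelist check that runs on every save. The word list, the function names and the sample sentences below are all hypothetical and only meant to show the behaviour, not how any existing extension works.

```python
import re

# Tiny hypothetical whitelist; a real one would need every legal word,
# inflection and compound in every language.
WHITELIST = {"she", "took", "a", "photo", "of", "the", "brown", "fox"}


def unknown_words(text, whitelist):
    """Return the words in `text` that are missing from `whitelist`."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    return sorted(set(words) - whitelist)


def may_save(text, whitelist):
    """An edit may only be saved if every word is on the whitelist."""
    return not unknown_words(text, whitelist)


if __name__ == "__main__":
    print(may_save("She took a photo of the fox", WHITELIST))
    # A typo or a newly coined word blocks the whole edit:
    print(unknown_words("She took a selfie of the brwon fox", WHITELIST))
```

Here both the typo and the new word block the edit until somebody extends the list by hand, and that is exactly the work which never ends.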

I am not being pessimistic, just realistic, when I say:

What you want is simply impossible to accomplish, both technically and in terms of manpower.