ORES/BWDS review

How does BWDS work?
BWDS scans the history of a wiki for words that are commonly added in edits that were later reverted but are uncommon in edits that were not reverted. As a result, the system tends to pick up curse words and informal words that don't belong in articles ("hello", "woohoo", "yolo", etc.). The system also outputs words that are common to all edits, as these can be used as stopwords when processing text.
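The idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual BWDS implementation: the function names (`doc_frequency`, `detect_words`), the thresholds (`min_ratio`, `stop_share`), the whitespace tokenization, and the add-one smoothing are all hypothetical choices made for the example. Each edit is represented simply as the text it added.

```python
from collections import Counter


def doc_frequency(edits):
    """Count in how many edits each lowercased word appears at least once."""
    counts = Counter()
    for added_text in edits:
        counts.update(set(added_text.lower().split()))
    return counts


def detect_words(reverted, kept, min_ratio=2.0, stop_share=0.5):
    """Return (generated, stop) word lists.

    generated: words much more frequent in reverted edits than in kept edits.
    stop:      words frequent in both reverted and kept edits.
    Thresholds are illustrative assumptions, not values used by BWDS.
    """
    rev_df = doc_frequency(reverted)
    kept_df = doc_frequency(kept)
    n_rev = max(len(reverted), 1)
    n_kept = max(len(kept), 1)

    generated, stop = [], []
    for word in set(rev_df) | set(kept_df):
        # Share of edits in each group that contain the word.
        rev_share = rev_df[word] / n_rev
        kept_share = kept_df[word] / n_kept
        if min(rev_share, kept_share) >= stop_share:
            stop.append(word)
            continue
        # Add-one smoothing avoids dividing by zero for words
        # that never appear in kept edits.
        rev_rate = (rev_df[word] + 1) / (n_rev + 1)
        kept_rate = (kept_df[word] + 1) / (n_kept + 1)
        if rev_rate / kept_rate >= min_ratio:
            generated.append(word)
    return sorted(generated), sorted(stop)


reverted = ["hello yolo the", "yolo woohoo the", "the yolo"]
kept = ["the capital of hungary", "the river danube", "population of the city"]
generated, stop = detect_words(reverted, kept)
```

With this toy input, the informal words end up in `generated` and "the" ends up in `stop`. A real run would also need proper tokenization for the target language and far more edits than this.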

How to sort
BWDS produces a wiki page containing automatically generated word lists. See the research wordlist for the Hungarian (magyar) language for an example. We need native speakers of the target language to help us sort these lists and remove words that were picked up by mistake.


 * list-generated
 * This list includes words that are added in reverted edits. It needs to be sorted into badwords and informals.


 * list-stop
 * This list includes the words that are most common across all edits. It doesn't need human review and can be ignored.


 * badwords
 * This list should include all of the words from list-generated that are unwelcome on any page. This would include curse words, spam and other content that would be reverted regardless of where it is inserted. Please feel free to supplement this list with additional badwords that were not detected by BWDS.


 * informals
 * This list should include all words that are unwelcome in the article namespace but would be acceptable on talk pages. This would include words such as "hello" or "hahaha", which would be fine in discussions but not in articles. Please feel free to supplement this list with additional informal words that were not detected by BWDS.
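
Once the lists have been sorted, a quick consistency check can help catch slips. The sketch below is a hypothetical helper, not part of BWDS: the function name `check_sorting` and the report keys are assumptions made for illustration. It reports words placed in both badwords and informals, words from list-generated that were left out of both (discarded as picked up by mistake), and words that reviewers added on their own.

```python
def check_sorting(list_generated, badwords, informals):
    """Summarize how a reviewer sorted list-generated (hypothetical helper)."""
    generated = {w.lower() for w in list_generated}
    bad = {w.lower() for w in badwords}
    informal = {w.lower() for w in informals}
    return {
        # A word should normally end up in only one of the two lists.
        "in_both": sorted(bad & informal),
        # Generated words left out of both lists (removed as mistakes).
        "discarded": sorted(generated - bad - informal),
        # Words the reviewer added that BWDS did not detect.
        "supplemented": sorted((bad | informal) - generated),
    }


report = check_sorting(
    list_generated=["hello", "yolo", "budapest", "damn"],
    badwords=["damn", "spamword"],
    informals=["hello", "yolo", "hahaha"],
)
```

Here "budapest" would be listed as discarded, while "hahaha" and "spamword" would be listed as supplemented. The words themselves are made-up examples.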