ORES/BWDS review

This page gives a brief overview of how to sort BWDS-generated word lists.

How does BWDS work?
[https://github.com/wiki-ai/bwds BWDS] scans the history of a wiki for words that are commonly added in edits that have been reverted, but are uncommon in edits that have not been reverted. This means the system tends to pick up curse words and informal words that don't belong in articles ("hello", "woohoo", "yolo", etc.). The system also outputs words that are common to all edits, as these can be used as stopwords when processing text.
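The idea described above can be illustrated with a small sketch. This is not the BWDS implementation: the function names, the thresholds (<code>min_count</code>, <code>min_ratio</code>, <code>min_share</code>) and the add-one smoothing are assumptions chosen for illustration, and each edit is represented simply as a list of the words it added.

```python
from collections import Counter

def candidate_words(reverted_edits, kept_edits, min_count=2, min_ratio=2.0):
    """Return words that are much more common in reverted edits than in kept ones.

    Hypothetical helper: each edit is a list of the words it added.
    """
    # Count the number of edits in which each word appears at least once.
    reverted = Counter(w for edit in reverted_edits for w in set(edit))
    kept = Counter(w for edit in kept_edits for w in set(edit))
    scores = {}
    for word, n_rev in reverted.items():
        if n_rev < min_count:
            continue  # too rare to judge
        rev_rate = n_rev / len(reverted_edits)
        # Add-one smoothing so words never seen in kept edits do not divide by zero.
        kept_rate = (kept[word] + 1) / (len(kept_edits) + 1)
        ratio = rev_rate / kept_rate
        if ratio >= min_ratio:
            scores[word] = ratio
    return sorted(scores, key=scores.get, reverse=True)

def stop_words(reverted_edits, kept_edits, min_share=0.5):
    """Return words that appear in a large share of all edits, reverted or not."""
    all_edits = reverted_edits + kept_edits
    counts = Counter(w for edit in all_edits for w in set(edit))
    return sorted(w for w, n in counts.items() if n / len(all_edits) >= min_share)

# Toy data, invented for this example.
reverted_edits = [["yolo", "the"], ["yolo", "hello", "the"], ["woohoo", "the"]]
kept_edits = [["the", "river"], ["the", "city", "river"], ["the", "population"]]

print(candidate_words(reverted_edits, kept_edits))
print(stop_words(reverted_edits, kept_edits))
```

With this toy data, "yolo" is the only candidate because it appears in two reverted edits and in no kept edit, while "the" appears in every edit and so lands in the stopword list. A real run over a wiki's history would still pick up some words by mistake, which is why the generated list needs human review.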

How to sort
BWDS produces a wiki page containing several automatically generated word lists. See [[M:Research:Revision scoring as a service/Word lists/hu|the research word list for Hungarian]] for an example. We need native speakers of the target language to help us sort these lists and remove words that were picked up by mistake.


 * list-generated
 * This list includes words that are added in reverted edits. This list needs to be sorted into badwords and informals.


 * list-stop
 * This list includes words that are most common to all edits. This list doesn't need human review and can be ignored.


 * badwords
 * This list should include all of the words from list-generated that are unwelcome on any page. This would include curse words, spam and other content that would be reverted regardless of where it is inserted. Please feel free to supplement this list with additional badwords that were not detected by BWDS.


 * informals
 * This list should include all words that are unwelcome in the article namespace but would be acceptable on talk pages. This would include words such as "hello" or "hahaha", which would be fine in discussions but not in articles. Please feel free to supplement this list with additional informal words that were not detected by BWDS.
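
While sorting, it can help to check which words from list-generated have not yet been placed anywhere. The sketch below is a hypothetical helper and not part of BWDS; it assumes the lists have been copied out of the wiki page into plain Python lists, and the <code>discarded</code> argument stands for words the reviewer decided were picked up by mistake.

```python
def unsorted_words(generated, badwords, informals, discarded=()):
    """Return words from list-generated that are in none of the sorted lists."""
    placed = {w.lower() for w in (*badwords, *informals, *discarded)}
    return [w for w in generated if w.lower() not in placed]

def double_listed(badwords, informals):
    """Return words that were placed in both badwords and informals."""
    return sorted({w.lower() for w in badwords} & {w.lower() for w in informals})

# Toy lists, invented for this example.
generated = ["yolo", "hello", "spamlink", "river"]
badwords = ["spamlink"]
informals = ["hello"]
discarded = ["river"]  # picked up by mistake, so removed

print(unsorted_words(generated, badwords, informals, discarded))
print(double_listed(badwords, informals))
```

Here "yolo" is still waiting to be sorted, and no word has been put in both lists.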

Where do I find my BWDS list?
We have pre-generated lists for many of the larger wikis. Review [[M:Research:Revision scoring as a service/Word lists|our word lists]] to see if a list has already been generated for your wiki. If it hasn't, use the button below to request that lists be generated. See also [[ORES/Get support|how to get support]].

([//phabricator.wikimedia.org/T131450 example])