Topic on Talk:Wikimedia Security Team/Password strengthening 2019

Is the top-100K password list really universal?

NickK (talkcontribs)

Having looked through the list, it looks very Latin and probably American.

It seems impossible that the most popular Cyrillic passwords, such as "пароль", would be missing from a universal list: "пароль" is the equivalent of "password" (2nd place), and even its keyboard-layout variant ("пароль" typed in the Latin keyboard layout, i.e. "gfhjkm") is there at 111th place. Layout variants of other popular passwords are also there: "lhfrjy" ("dragon", 10th place, translates to "дракон", which typed in the Latin layout gives "lhfrjy", 9089th place) and "vfcnth" ("master", 19th place, translates to "мастер", which typed in the Latin layout gives "vfcnth", 11875th place).
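The layout substitution described above can be sketched in Python. This is only an illustration, assuming the standard Russian ЙЦУКЕН and US QWERTY layouts; the names `CYRILLIC`, `LATIN` and `to_latin_layout` are made up for this sketch and are not part of any tool used to build the list.

```python
# Russian ЙЦУКЕН keys and the US QWERTY keys in the same physical positions.
# Assumption: lowercase letters only; "ё" and shifted symbols are left out.
CYRILLIC = "йцукенгшщзхъфывапролджэячсмитьбю"
LATIN = "qwertyuiop[]asdfghjkl;'zxcvbnm,."

# One-to-one character table from Cyrillic keys to Latin keys.
LAYOUT_TABLE = str.maketrans(CYRILLIC, LATIN)


def to_latin_layout(word: str) -> str:
    """Return what you get when typing a Cyrillic word with the Latin layout active."""
    return word.lower().translate(LAYOUT_TABLE)


for word in ["пароль", "дракон", "мастер"]:
    print(word, "->", to_latin_layout(word))
```

Running every Cyrillic candidate through such a mapping would give both spellings to check against the list.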

I would thus bet that there should be dozens of Cyrillic passwords (and probably passwords in other scripts such as Arabic or Chinese) in this list. Notably, the two most popular (and ridiculous) Cyrillic passwords, "пароль" ("password") and "йцукен" (the equivalent of "qwerty", 4th place), should clearly be in the top 100. It would be strange to ban "gfhjkm" (where the person made at least a minor security effort) and not ban "пароль" (an immediate guess for a Cyrillic user).

Could you please check that you picked a truly universal list? Thanks.

CKoerner (WMF) (talkcontribs)

The list of passwords is based on the Weakpass project's best wordlists, which draw on real passwords used by people, collected from various sources across the internet. Given the internet's general bias toward English, it's understandable that most of the included passwords would be Latin. They do offer lists for other languages. I'll bring your recommendation up with the security team.

NickK (talkcontribs)

@CKoerner (WMF): I agree that this is a real list of passwords used by real people. I just believe that, for some reason, this list includes only passwords written in the standard Latin alphabet.

This list may make sense for websites that impose such a constraint on passwords, but WMF wikis accept passwords with any characters. Thus we miss:

  • non-Latin alphabets (as mentioned above)
  • commas. For instance, the list contains "k.lvbkf" ("людмила" typed in the Latin layout; Lyudmila is a rather common Cyrillic name), but it does not contain "aen,jk" ("football" translated to "футбол" and typed in the Latin layout, although "football" is #14 and is clearly popular in countries where the Cyrillic alphabet is used) or ",julfy" ("богдан" typed in the Latin layout; Bogdan is a common Cyrillic name and is #3177).
  • extended Latin. It is quite surprising to see "jurgen" and "juergen" but not "jürgen" (Jürgen is one of the most common German first names).

... probably something else.
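One rough way to test for such constraints is to count which character classes appear in the list at all. Here is a minimal sketch, assuming the list is available as an iterable of password strings; `charset_report` is a hypothetical helper, and the sample entries are taken from this thread rather than read from the actual file.

```python
def charset_report(passwords):
    """Count entries that contain non-ASCII characters or commas."""
    report = {"total": 0, "non_ascii": 0, "comma": 0}
    for pw in passwords:
        report["total"] += 1
        # str.isascii() is False as soon as one character is outside ASCII,
        # which covers Cyrillic as well as extended Latin such as "ü".
        if not pw.isascii():
            report["non_ascii"] += 1
        if "," in pw:
            report["comma"] += 1
    return report


# Entries mentioned in this thread as present in the list.
sample = ["password", "gfhjkm", "k.lvbkf", "jurgen", "juergen"]
print(charset_report(sample))
```

If both counts came out as zero over the full 100K entries, that would point to a filter applied when the list was built rather than to a language bias in the underlying data.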

Thus this is not really a language list issue but probably some constraint imposed on this list. I think the best idea is to look for a list of the most common passwords without any constraints on what a password can contain. I am not sure dictionaries by language should really be used: the most popular one contains some very uncommon words that could be reasonably secure passwords (like "Tuschzeichnungen" or "lawyeresses"). From this point of view, the top-100K list is really good, as it does contain the things people really use as passwords (first names, names of sports teams, places, birthdays, etc.); it just should be extended beyond those constraints.
