Architecture meetings/RFC review 2014-02-05
Requests for Comment to review
No RFCs were proposed for the agenda in advance of this meeting.
Summary and logs
- DataStore -> accepted, Max is tweaking
- REST virtual service -> accepted interface with a wrapper for DataStore; needs updating on RFC from notes; Aaron has implemented most of the interface
- updating min length in DefaultSettings is pretty likely but needs a couple tweaks per RFC to avoid locking people out
- do we have a good rationale for forcing it, other than 'everyone else does'?
- is length of 6 enough? should we do some measuring & estimating of what entropy we require and determine an ideal min length?
- http://pecl.php.net/package/crack <- should be considered for helping this research
- we may need something separate that we can run client-side for a strength meter, though (deliver a small dictionary in JS)
- note: due to per-user salting, we can't easily check for duplicate passwords between users
- note: if using a client-side check with a dictionary, roll our own compression. Not only does this shrink the dictionary, it can also help avoid keyword blocking on "naughty words"
- Overthrow Bugzilla RFC
- lots of discussion
- maybe we don't need to
- no rush?
- talk about phabricator at zurich though; prep an rfc or other page for more discussion
- test at https://fab.wmflabs.org
- Config db
- https://gerrit.wikimedia.org/r/#/c/109850/ in progress, people need to discuss approach
- maybe consolidate the 3 potential RFCs into 1, maybe with 3 sections -- interface, backend, frontend
- Next time:
- HTML templating still needs focus, talk about this and narrow it down on lists
- TitleValue -- get DanielK to poke at this next week
- Deprecating inline styles -- brion interested in a quick checkin on this maybe, will make some notes
- #action csteipp will research and update the rfc with estimate for online attacks to compromise accounts to get autoconfirmed access.
- and this'll inform how to create a password strength meter
- #info strength meter can likely be comparing against a list of popular passwords
- #info Tim recommends DIY compression for client side dictionary
- #info let's talk about phabricator vs bugzilla in zurich, there's some interest.
- #action ^d put together some notes on that
- #action ^d (& legoktm) will tidy up the RFC status for configuration: backend, frontend bits
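The strength-meter idea in the notes above (compare a candidate password against the X most popular passwords, where X is the number of guesses an online attacker can make) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the guess rate, and the tiny ranked list are all hypothetical, and the real numbers are exactly what csteipp's action item is meant to estimate.

```python
"""Hypothetical sketch: flag passwords an online attacker would try first."""


def guess_budget(guesses_per_day: int, days: int) -> int:
    """Total online guesses against one account (both inputs are assumed values)."""
    return guesses_per_day * days


def is_weak(password: str, ranked_passwords: list[str], budget: int,
            min_length: int = 6) -> bool:
    """Return True if the password is too short or within the attacker's budget.

    ranked_passwords is assumed to be ordered from most to least popular.
    """
    if len(password) < min_length:
        return True
    # Only the first `budget` entries are reachable by the assumed attacker.
    reachable = {p.casefold() for p in ranked_passwords[:budget]}
    return password.casefold() in reachable


if __name__ == "__main__":
    # Made-up ranking, for illustration only.
    ranked = ["123456", "password", "troll", "qwerty"]
    budget = guess_budget(guesses_per_day=1, days=3)
    for candidate in ("123456", "Password", "qwerty", "abc"):
        print(candidate, is_weak(candidate, ranked, budget))
```

Case-insensitive matching is a design choice made here, not something the meeting decided. A client-side meter would need the same list shipped in a compact (and, per Tim's note, lightly obscured) form.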
* brion waves at the early birds
<sumanah> :)
<brion> if nobody proposes anything specific we can grab a couple of the things currently in https://www.mediawiki.org/wiki/RFC#In_discussion and see if we want to add notes
<sumanah> brion: wanna make yourself op so you can #startmeeting ?
<TimStarling> I don't think you need to be an op
<sumanah> ah
<TimStarling> #startmeeting
<brion> \o/
<TimStarling> oh, there is no actual meetbot in here
<brion> hah
<brion> woops
<TimStarling> #treefallsinaforest
* Nemo_bis is crashed
<brion> we may have to take notes like animals, by cut-n-pasting
<brion> anyone know how to start it back up?
<TimStarling> I think it was on labs, so I can probably work it out in half an hour or so
<TimStarling> or we can just start the meeting
<brion> let's just go and we'll copy notes by hand later
<sumanah> yeah
<sumanah> https://www.mediawiki.org/wiki/Architecture_meetings/RFC_review_2014-02-05
<brion> so the agenda's empty so far. anything high on peoples' interest to bring up?
<sumanah> I see https://www.mediawiki.org/wiki/Architecture_meetings mentions TitleValue, Config database, Deprecating inline styles, and password requirements
<TimStarling> anyone want to give an update on RFC work done since the summit? e.g. gwicke MaxSem
<TimStarling> bd808
<MaxSem> been working on DataStore, will commit some time soonish
<gwicke> Aaron has implemented much of the REST interface
<gwicke> I have been packaging Parsoid
<gwicke> also implemented a JSON intermediate representation for templates
* jorm (~bharris@wikimedia/jorm) has joined #wikimedia-meetbot
<bd808> I have started work on a POC for structured logging. No code committed yet but some good initial progress. I hope to have something for folks to look at in a week
<gwicke> with a Knockout compiler front-end
<bd808> I also expect to have my POC torn apart in review :)
<brion> bd808: put me in as a reviewer on that when you have it, i'm interested in taking a peek at that
<brion> :D
<TimStarling> ok, so DataStore and the REST thing were accepted RFCs, right?
<gwicke> https://github.com/gwicke/TemplatePerf/tree/master/QuickTemplate
<gwicke> yeah
<TimStarling> the RFC pages have still not been updated since the summit
<gwicke> they overlap
<TimStarling> just looking at https://www.mediawiki.org/wiki/Special:RecentChangesLinked/Requests_for_comment
<gwicke> TimStarling, everybody is waiting for you to do so ;)
<sumanah> TimStarling: shall I move PHP Virtual REST Service and DataStore to Accepted?
<TimStarling> I see that "Passwords" has had a lot of edits
<TimStarling> certainly DataStore was accepted
<TimStarling> I would have to check the notes about the REST thing I think
<sumanah> okay
<gwicke> TimStarling, we agreed that we want batch support
<brion> so on passwords, i'm happy to bump our default length limit up to 6. i also think a password strength meter on creation/pass change is a nice idea, though they can be ...... shaky
<gwicke> so basically the REST interface, with the DataStore key-value implementation as a simple backend
<Scott_WUaS> Hello
<sumanah> hello Scott_WUaS - we are in the middle of https://www.mediawiki.org/wiki/Architecture_meetings/RFC_review_2014-02-05
<Scott_WUaS> thanks, Sumanah ... saw your announcement to email just now
<sumanah> :)
<TimStarling> the proposal is to change it in DefaultSettings.php?
<brion> TimStarling: yes, but that requires https://gerrit.wikimedia.org/r/#/c/77645/ to avoid locking out people with shorter old passwords
<TimStarling> sure...
<brion> that part should be easyish and i don't think too controversial
<sumanah> ok, https://www.mediawiki.org/wiki/Requests_for_comment#Accepted and https://www.mediawiki.org/wiki/Requests_for_comment/DataStore now show that DataStore is accepted
<csteipp> Well, not just that change, but another patch too to make it happen
<brion> btw i'm saving some skeleton notes on https://www.mediawiki.org/wiki/Architecture_meetings/RFC_review_2014-02-05 in case we forget to save the log ;)
<sumanah> thanks brion :D
<TimStarling> I haven't really seen a proper rationale or discussion of password length
<gwicke> I moved the REST interface to 'accepted' too
<jorm> There's been some chatter on the wikipage but nothing resolute.
<TimStarling> "we should increase minimum password length because everyone else on the web has a larger password length" is not really a reason
<jorm> general consensus (to me) seemed to be "increase default password length but let it be configurable"
<TimStarling> since that was true at the outset
<brion> TimStarling: you could always write a bot to brute-force all the short passwords
<brion> it wouldn't be hard i suspect ;P
<jorm> i'd argue that default passwords don't need strength but priviliged ones do.
<TimStarling> well, that's why it we increased it from 0 to 1, if you remember
<sumanah> gwicke: hm, in https://etherpad.wikimedia.org/p/storage_services I didn't see a solid APPROVE on the REST thing, did I miss something?
<TimStarling> someone wrote a script to scan for blank passwords and use those accounts to bypass the autoconfirmed restrictions
<brion> jorm: so that'd be covered under https://www.mediawiki.org/wiki/Requests_for_comment/Passwords#Create_new_password_requirements_for_accounts_with_advanced_user_rights which currently says 'let's talk about that later'
<TimStarling> anyway, I'm just saying, it would be nice to have some sort of discussion of pros and cons
<gwicke> sumanah, Tim agreed that he'd like to support batching
<^d> #mediawiki-meetbot != #wikimedia-meetbot. Wondering why I was all alone.
<gwicke> but also didn't want to reject one of the RFCs, which is fine
<brion> pro: less likely to have insanely stupid breakable passwords
<csteipp> TimStarling: Just about raising the limit? Or about the actual lenght?
<brion> con: may annoy some people who use short passwords
<TimStarling> well, how do you choose what to set the limit to?
<TimStarling> is 6 enough?
<TimStarling> how can you tell if you have no criteria?
<jorm> con: longer passwords don't equate to better password security
<gwicke> sumanah, I asked exactly that but it didn't make it into the notes
<jorm> con: complex passwords, too. people just write them down somewhere.
<brion> "aaaaaaaaaaaaaaaaaaaaaaaa" ain't secure
<TimStarling> "password" isn't secure either, and that's 8
<gwicke> maybe TimStarling and brion can clarify
<TimStarling> I think "123456" came up as one of the most popular in a recent compromise?
<brion> gwicke: on the REST thing, I'm pretty sure we agreed to approve the interface, with an initial implementation using DataStore key-value as a backend
<gwicke> *nod*, that's my recollection too
<brion> it may not have made it to the notes but this is what we believe in our shared consensus reality :)
<sumanah> ok, brion, when I wikify the etherpad notes (next), I'll make sure to indicate that :)
<brion> thx :D
<TimStarling> batching good, slowness bad
<sumanah> sorry
<brion> so based on some of the discussion in https://www.mediawiki.org/wiki/Requests_for_comment/Passwords#Discussion i see, there's some interest in not enforcing specific lengths, but just recommending stronger passwords with a meter
<sumanah> brion: I agree with that assessment
<TimStarling> well, say if we specifically want to protect against automated user account compromise for the purpose of autoconfirmed access etc.
<TimStarling> we could calculate what password entropy we need for that, based on plausible attacks
<TimStarling> and then maybe come up with a password length from that
<TimStarling> although the correlation between entropy and length is pretty weak
<mwalker> that also sounds very complex; we'd have to maintain a large list of dictionaries
<brion> yeah i'm not sure how easy this is to do. TimStarling are you interested in doing that research? or should we roll some dice :)
<TimStarling> well, obviously it's difficult to precisely measure password entropy
<TimStarling> there's the "crack" PECL extension, has that been considered?
<csteipp> I can work up some numbers from that perspective. At least to give a reasonable estimate for what is available to our online attackers.
<csteipp> I think that one just wraps libcrack?
<sumanah> csteipp: to me that sounds like a reasonable next step/TODO/"action item"/whatever :)
<csteipp> Which would be reasonable. If it's good enough for linux, it's probably more than enough for us.
<brion> :)
<brion> would that also be suitable for use in a password strength meter?
<brion> or would we need something client-side
<csteipp> Yeah, we would want it client side. Otherwise you leak password length in the number of web requests, if you check as they type..
<TimStarling> you can deliver a smallish dictionary to the client side, I did it for that captcha response spell checker
<brion> *nod*
<jorm> this is a dumb idea, but i'm throwing it out:
<csteipp> #action csteipp will research and update the rfc with estimate for online attacks to compromise accounts to get autoconfirmed access.
<jorm> could we run known cracks against new passwords and say "busted" if it succeeds?
<brion> jorm: that's essentially what a strength meter would do
<brion> if the tester code can crack you with a dictionary attack etc then we can prevent that pass from being used
<brion> (or at least, strongly recommend against using it)
<TimStarling> it's what cracklib does
<^d> Possibly crazy idea too:
<^d> Compare the password hash against others in the database to say "this is a really common password"
<TimStarling> "The idea is simple: try to prevent users from choosing passwords that
<TimStarling> could be guessed by "Crack" by filtering them out, at source.
<TimStarling> CrackLib is an offshoot of the the version 5 "Crack" software, and
<TimStarling> contains a considerable number of ideas nicked from the new software."
<^d> So even if it's "secure", we can avoid people reusing stuff tooooo much.
<brion> ^d: that doesn't really work well with per-user salting
<brion> which we want :)
<^d> Hmm, yeah
<gwicke> ^d, <kidding>could try to log in on a few social networking sites using the same credentials</kidding
<TimStarling> "
<TimStarling> The upshot of all this is that CrackLib can do indexed, binary searches
<TimStarling> in a 1.4 million word dictionary (raw size ~ 15Mb), but the CrackLib
<TimStarling> files (data+index+watermarks) occupy only ~ 7Mb. (45% original size)
<TimStarling> It's even efficient over NFS !
<TimStarling> "
<^d> Nevermind then
<brion> back in the day i think we didn't have salt and we actually could do those matches. that was poor practice
<jorm> ^d: so something like SELECT COUNT(hash) FROM users WHERE hash = '$newpw' ?
<jorm> and if > X, warn?
<^d> Naively, I was thinking something like that.
<TimStarling> e.g. md5('troll')
<brion> so is anyone interested in following up with actually making a strength checker meter bar? or should we wait until we have a clear idea what to check for
* TimStarling facepalm
<^d> But brion points out it wouldn't work well with per-user salts.
<csteipp> Yeah, if we didn't have salts we could...
<jorm> "there sure are a lot of people who use '3y3H8w!k!P3d!A' as a password!"
<brion> :)
<^d> Hehe, that too.
* duh (99127724@wikipedia/Legoktm) has joined #wikimedia-meetbot
<TimStarling> there was a case where I was trying to find sockpuppets of Lir, before we had salting
<jorm> security is hard. let's go shopping.
* gwicke remembers TimStarling doing stats before salts were added
<TimStarling> and I thought it would be a great idea to search for people with the same password hash
<jorm> i used to nail cheaters in nexuswar using a similar thing. no per-user salts.
<TimStarling> and there were several accounts, mostly acting like Lir, so I published the list
<jorm> find accounts with the same password and similar ip blocks (plus a bunch of other stuff) = multi
<TimStarling> and it turned out he used "troll" as his password, and it was really just a collection of trolls
<TimStarling> and then it got slashdotted and everyone on the internet hated me
<gwicke> ;)
<Isarra> Snrk.
<brion> "and that's why we have per-user password salt now"
<csteipp> So... I'm assuming we don't want to do comparison to existing passwords, otherwise we don't want to implement a strong hash function, which I think we do. Right?
<csteipp> (Just want to verify my assumption)
<brion> right, that's a non-starter
<csteipp> Cool
<brion> basically all we can do is dictionary etc 'attacks' as you're entering your password to see if it seems weak, so we can warn you
<Isarra> Wouldn't this be client-side? How much of that can we reasonably do?
<csteipp> But yeah, I'll come up with a realistic attack scenario, then the strength meter can basically be compare against a list of of X most popular passwords where X is the number of password tries for an attacker per day/month/year.
<sumanah> I think that sounds like a reasonable place to draw today's pw discussion to a close and move on to maybe the config db RFC(s)
<brion> Isarra: we can only send a smallish dictionary to do client-side probably. but we may be able to do server-side as long as we're careful (but beware of leaking length etc)
<sumanah> (imo)
<brion> yep
<brion> i'm good with that
<Isarra> Ah.
<TimStarling> btw, an implementation note on client-side dictionaries
<TimStarling> some sort of DIY compression is probably a good idea
<brion> dictionaries compress *very* well with proper encoding yes
<ori> heh
<TimStarling> ideally something that obscures the original text slightly
<gwicke> tries ftw
<ori> Tim is not taking chances after ULS
<brion> heh
<TimStarling> because sometimes dictionaries have bad words in them that get blocked
<brion> lol
<brion> oh true
<TimStarling> e.g. by parental filters
<csteipp> #info strength meter can likely be comparing against a list of popular passwords
<jorm> our dictionaries are also befuct due to languages.
<jorm> are there any dictionaries we can use in Oriya?
<csteipp> #info Tim recommends DIY compression for client side dictionary
<gwicke> I wonder if trie + gzip is better than just gzip
<ori> i thought this was interesting, btw: <http://insideofthebox.tumblr.com/post/75234834370/late-meditations-on-xkcd-936>: "Let’s say we have a dictionary with 2 ^ 11 (2048) entries. We pick four words, each one at random. A combination of those words would have 2 ^ 44 bits of entropy. Here is an interesting part: a permutation would be 2 ^ 39. That’s a significant hit to security, but it’s still way better than what semi-gibberish password gave us. This means it is possible to create a moderately secure password scheme where users wouldn’t even have to remember the word order!"
<TimStarling> when I tried to submit my greasemonkey captcha spell checker, it was silently rejected for this reason
<csteipp> We could give back hashes of the passwords... not sure if that would compress well though
<ori> could be interesting to just do away with the notion of free-form text input for passwords and just try to devise an implementation that generates passwords that are both secure and memorable
<gwicke> csteipp, that compresses much worse than the words themselves
<Isarra> Looking at it linguistically, what if you don't even use real words, but word structures? Or would that even work?
<brion> ori: that'd be very interesting research for another time, probably beyond our scope just now :)
<ori> brion: yeah, i deliberately waited for us to be moving on; i don't propose we start discussing that now
<ori> just a provocative thought
<^d> Use dna for identify verification
* brion hides
<Isarra> And transmit it using magic so it cannot be intercepted.
<ori> ok, configs?
<brion> *quantum* magic.
<brion> ok moving on :)
<sumanah> ^d: whatcha got :)
<duh> https://gerrit.wikimedia.org/r/#/c/109850/
<jorm> let's just switch to facebook login.
<TimStarling> there's one other RFC which I see has had edits in the last 2 weeks
<duh> that's what we're mainly discussing right now for config
<sumanah> https://etherpad.wikimedia.org/p/configuration
<TimStarling> that is "Overthrow Bugzilla"
<^d> Yeah, we're still hashing out the high level stuff on 109850.
<Isarra> I don't think we need to overthrow bugzilla anymore.
* ori takes a screenshot.
<TimStarling> looks like just discussion though
<TimStarling> nothing substantive
<brion> honestly i'd love to kill bugzilla and replace it with something in-house that integrates with our accounts, our wikis, our chat system, etc. but that's a big project :)
<^d> I'm not seeing much discussion since October or so.
<^d> Soooo, we might have an option there, if people are willing to Break Everything.
<^d> I've started playing around with Phabricator.
<^d> Which does a ton of this bug / code / project management stuff.
<^d> It's come a *long* way since we talked about it 2 years ago.
<Isarra> brion: That. But that's not really specific to bugzilla.
<sumanah> (I see https://www.mediawiki.org/wiki/Architecture_meetings mentions Deprecating inline styles in case we want to talk about next steps on that in this meeting)
<sumanah> ^d: I heard that from someone at Juniper, that they are enjoying Phabricator
<^d> Lots of people like it now.
<Isarra> Inline styles? We use those?
<ori> I liked it before it was cool!
<Isarra> Oh, right, we do.
<jorm> it might be easier to modify bugzilla to use our accounts than build a new thing.
<ori> architects, can you identify explicitly what it is that we're discussing?
<brion> i don't think we need to continue with the bugzilla discussion just now
* sumanah agrees with Brion
<^d> Can we have the discussion in Zurich or London maybe?
<gwicke> re inline styles, https://www.mediawiki.org/wiki/Talk:Requests_for_comment/Allow_styling_in_templates#Class-triggered_CSS_includes might be relevant
<brion> someone was interested in configuration db, and someone recommended the inline styles thing which is of interest
<ori> ^d: +1!
<^d> I'd like us to have the discussion.
<brion> ^d: yes please, that'd be a good time for it
<sumanah> a "let's switch to Phabricator" RFC would probably be good to prepare before that meeting so we are all on the same pg
<brion> #info let's talk about phabricator vs bugzilla in zurich, there's some interest.
<sumanah> imo
<brion> #action ^d put together some notes on that
<ori> that's a good resolution
<^d> If anyone's interested in playing with it in the meantime: fab.wmflabs.org
<jorm> it would be as difficult if not moreso than the switch to gerrit.
<jorm> with much wailing and gnashing of teeth as we cull dead bugs.
<sumanah> jorm: even though we would not be switching version control systems at the same time? I'm not sure
<Isarra> Cleanup is good.
<jorm> trust me. i've done this EXACT THING before.
<^d> Anyway, let's move back to Config.
<brion> ok config then?
<sumanah> anything you need ^d to move forward?
<^d> So, Config is kind of moving along at a high level on https://gerrit.wikimedia.org/r/#/c/109850/
<^d> Interested parties please bikeshed.
<^d> I don't think anyone's really thought much about the backend parts since the summit, but that's fine.
<^d> (At some point, it'd be nice to start making decisions and consolidate the 3 RFCs)
<duh> yeah, I think we should focus on getting the interface part done first, and then move on to the backend
<ori> I really don't want to work on that. I submitted the patch because there were a few design decisions with the initial patch that seemed like clear-cut errors of judgment to me and I wanted to fix them before we started building things on top
<^d> Yeah. And thanks.
<^d> Hopefully we can stop bikeshedding on 190850 soon.
<ori> if someone wants to take over that patch, I'd be delighted. If not, I'll try to identify where consensus is at at the moment and update the patch to reflect it
<sumanah> duh: you mean the wrapper around globals, or something else?
<duh> sumanah: yeah, that basically.
<^d> ori: I basically rewrote it.
<ori> ^d: great; can we consider it yours?
<sumanah> so in https://etherpad.wikimedia.org/p/configuration we evidently (according to RobLa) agreed that there'd be 3 RFCs - ^d you'd rather it be 1?
<^d> Well, there's 3 rfcs trying to solve the whole thing.
<duh> I think ^d wants to consolidate the three existing RfCs into one
<^d> Yeah, and then write RFCs for the second and third parts.
<duh> At the summit we split it into three parts, interface, backend, and frontend
<^d> Or expand it into 3 parts.
<^d> Who knows.
<^d> Yeah
<^d> duh's got it
<duh> I'm not sure we need an explicit RfC for the interface since we're mainly hashing it out in gerrit right now
<brion> sounds sensible
<brion> anything else we need to hash out here?
<^d> Yeah, it's mainly the backend and frontend that needs RFCs.
<^d> Nope, I don't think so.
<sumanah> re the next meeting - would anyone particularly mind if we did this again next week, with a focus on password mgmt and the config db and bd808's work, + I will try to get Daniel Kinzler in to push TitleValue forward, and the HTML Formatting crowd?
<^d> brion: Maybe if you could drop a few comments on that gerrit change so to sanity check we're going the right way?
<sumanah> I can find a time that works better for different people
<brion> ^d, duh, ori : who's volunteering to work on said docs ? :)
* bd808 feels peer pressure
<gwicke> sumanah, what's HTML formatting?
<^d> brion: I'm going to figure out the RFC status.
<sumanah> https://etherpad.wikimedia.org/p/html_templating and https://www.mediawiki.org/wiki/Architecture_Summit_2014/HTML_templating
<sumanah> sorry, I meant templating
<sumanah> we gotta define requirements on that
<brion> #action ^d will tidy up the RFC status for configuration: backend, frontend bits
<duh> brion: I will help out ^d :P
* ori nods
<brion> ok
<gwicke> sumanah, those are too many subjects
<ori> i'm...going to look busy
<sumanah> gwicke: you're probably right. we could take out the least urgent 1 or 2
<bd808> I don't think I can commit to being ready for discussion by next week
<gwicke> more than two subjects is unlikely to result in any depth
<sumanah> ok. how about titlevalue + HTML Formatting for in-depth discussion, and hopefully quick checkins on progress in other RFCs just to break blockers/get reviewers/etc
<brion> works for me
* sumanah defers to others of course, just throwing it out there
<brion> i'd like a quick checkin on deprecating inline styles
<brion> if jon's not intereted i'll take that over
<sumanah> TimStarling? agenda sound good for next week?
<brion> (next week not now)
<^d> duh: I'm working on amending the patch again.
<TimStarling> well, if we want to talk about TitleValue, it'll have to be in a timezone suitable for europe, right?
<sumanah> sure, we can switch the time around, I can run a Doodle or similar to get the Germans in
<gwicke> sumanah, I think HTML templating is a bit early
<brion> html tempting may still be too ill defined in focus
<brion> *templating
<duh> ^d: sweet
<sumanah> gwicke: ok, let's talk on wikitech-l or similar; sounds like the HTML templating group does need to identify the major questions that still need resolution
<Isarra> Oh, random question about phabricator - is there any way for volunteers to investigate it?
<brion> but if we think we have a good handle on narrowing it down that'd be great
<sumanah> (as RobLa assesses)
<csteipp> html templating needs to get a few things bashed out first I think... I think a few more weeks on that front
<brion> Isarra: fab.wmflabs.org and bug ^d about any details :D
<gwicke> re templating: the basics are available and prototyped, but we need to think about longer-term stuff, in particular for messages and content
<^d> Isarra: Yeah, just sign up any any admin will approve your account.
<^d> Usually takes just a few mins.
<gwicke> how to incorporate data pull in particular
<sumanah> ok, it's almost been an hour, sounds like we have action items for various folks
<sumanah> I'll reach out to Nik + Kinzler + Aude et alia and ask them to prep TitleValue stuff so we can chat about it next week, and get a good time for next week
<sumanah> if that makes sense for people
<brion> ok folks please feel free to append or modify https://www.mediawiki.org/wiki/Architecture_meetings/RFC_review_2014-02-05#Summary_and_logs if you see anything important missing
<Isarra> ^d: Email address must be at one of: wikimedia.org, wikimedia.de
<^d> Gah.
<^d> That didn't work like I expected.
<sumanah> #endmeeting
<^d> Lemme fix.
<sumanah> :)
<^d> Isarra: Try again
<Isarra> Thanks. >.<
* TimStarling sends a carrier pigeon to wm-meetbot with text "#endmeeting"
<brion> \o/
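A point from the log that may be worth illustrating: the idea of counting identical password hashes across users only works when hashes are unsalted (Tim's md5('troll') case), and stops working once each user has their own salt. The sketch below is a minimal demonstration of that effect, not MediaWiki's actual password scheme; PBKDF2-SHA256, the iteration count, and the function names are assumptions chosen for illustration.

```python
"""Hypothetical demo: per-user salts hide duplicate passwords between users."""
import hashlib
import os


def unsalted_md5(password: str) -> str:
    """Pre-salt style hash: identical passwords give identical values."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


def hash_password(password: str, salt: bytes, iterations: int = 10_000) -> bytes:
    """Salted hash; the algorithm and iteration count are illustrative assumptions."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


if __name__ == "__main__":
    # Two users pick the same password but get independent random salts.
    salt_a, salt_b = os.urandom(16), os.urandom(16)
    hash_a = hash_password("troll", salt_a)
    hash_b = hash_password("troll", salt_b)

    # Without salts, a COUNT(hash) style query would match both accounts.
    print(unsalted_md5("troll") == unsalted_md5("troll"))
    # With per-user salts, the stored values differ, so the query finds nothing.
    print(hash_a == hash_b)
```

This is why the notes treat duplicate detection as a non-starter and fall back to checking candidates against a dictionary or popular-password list at entry time.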