Architecture meetings/RFC review 2014-02-05

Wednesday, February 5, 2014 at 10:00 PM UTC.

Requests for Comment to review
No RFCs were proposed for the agenda in advance of this meeting.

Summary and logs

 * DataStore -> accepted, Max is tweaking
 * REST virtual service -> accepted interface with a wrapper for DataStore; needs updating on RFC from notes; Aaron has implemented most of the interface
 * Passwords
   * updating min length in DefaultSettings is pretty likely but needs a couple tweaks per RFC to avoid locking people out
   * do we have a good rationale for forcing it, other than 'everyone else does'?
   * is length of 6 enough? should we do some measuring & estimating of what entropy we require and determine an ideal min length?
   * http://pecl.php.net/package/crack <- should be considered for helping this research
   * we may need something separate that we can do in client-side for a strength meter though (deliver a small dictionary in JS)
   * note due to salting we can't check for duplicate passwords between users easily
   * note if using client-side check with a dictionary, roll own compression. not only does this help with dictionary style, but it can help avoid keyword blocking on "naughty words"
 * Requests_for_comment/Overthrow_Bugzilla
   * lots of discussion
   * maybe we don't need to
   * no rush?
   * talk about phabricator at zurich though; prep an rfc or other page for more discussion
   * test at https://fab.wmflabs.org
 * Config db
   * https://gerrit.wikimedia.org/r/#/c/109850/ in progress, people need to discuss approach
   * maybe consolidate the 3 potential RFCs into 1, maybe with 3 sections -- interface, backend, frontend
 * Next time:
   * HTML templating still needs focus, talk about this and narrow it down on lists
   * TitleValue -- get DanielK to poke at this next week
   * Deprecating inline styles -- brion interested in a quick checkin on this maybe, will make some notes
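
The "is length of 6 enough?" question in the password notes above can be framed as a rough calculation: pick an online guess budget for an attacker, then find the shortest length whose keyspace keeps the chance of a hit below some threshold. The sketch below is illustrative only. The guess rate, the 30-day window, the 26-letter alphabet and the 0.1% threshold are all assumed numbers, not figures from the meeting, and the function names are hypothetical. It also assumes passwords are chosen uniformly at random, which real users do not do, so the result is a best-case lower bound rather than a recommendation.

```python
import math


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of a password drawn uniformly at random."""
    return length * math.log2(alphabet_size)


def guess_budget(guesses_per_hour: float, hours: float) -> int:
    """Total online guesses an attacker can make against one account."""
    return int(guesses_per_hour * hours)


def min_length(budget: int, alphabet_size: int, max_success: float) -> int:
    """Shortest length for which budget / keyspace stays at or below max_success."""
    length = 1
    while budget / alphabet_size ** length > max_success:
        length += 1
    return length


# Assumed scenario: 100 guesses per hour for 30 days, lowercase letters only.
budget = guess_budget(guesses_per_hour=100, hours=24 * 30)
length = min_length(budget, alphabet_size=26, max_success=0.001)
print(budget, length, round(entropy_bits(26, length), 1))
```

Because popular passwords are far from uniform, a real estimate would weight guesses by how common each password is, which is what the research action item below is meant to cover.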


 * #action csteipp will research and update the rfc with estimate for online attacks to compromise accounts to get autoconfirmed access.
 * and this'll inform how to create a password strength meter
 * #info strength meter can likely be comparing against a list of popular passwords
 * #info Tim recommends DIY compression for client side dictionary
 * #info let's talk about phabricator vs bugzilla in zurich, there's some interest.
 * #action ^d put together some notes on that
 * #action ^d (& legoktm) will tidy up the RFC status for configuration: backend, frontend bits
 * #endmeeting
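
The two #info items on the strength meter and on DIY compression can be combined in a small sketch. Front coding (storing each word as "characters shared with the previous word" plus the remaining suffix) is used here as a stand-in: the meeting only asked for "some sort of DIY compression" and mentioned tries, so this specific encoding is an assumption, chosen because it is short and also leaves most words not appearing verbatim in the payload. The function names and the sample list are hypothetical, and a real client-side check would be written in JavaScript rather than Python.

```python
from os.path import commonprefix  # character-wise prefix of a list of strings


def front_encode(words):
    """Encode a word list as (shared_prefix_length, suffix) pairs in sorted order."""
    encoded = []
    previous = ""
    for word in sorted(set(words)):
        shared = len(commonprefix([previous, word]))
        encoded.append((shared, word[shared:]))
        previous = word
    return encoded


def front_decode(encoded):
    """Rebuild the sorted word list from (shared_prefix_length, suffix) pairs."""
    words = []
    previous = ""
    for shared, suffix in encoded:
        word = previous[:shared] + suffix
        words.append(word)
        previous = word
    return words


def build_blocklist(ranked_passwords, top_x):
    """Keep the top_x most popular passwords (input is ranked) and encode them."""
    return front_encode(ranked_passwords[:top_x])


def is_listed(password, encoded):
    """True if the password appears in the encoded popular-password list."""
    return password in set(front_decode(encoded))


# Hypothetical ranked sample; a real list would come from the research action item.
ranked = ["123456", "password", "12345678", "qwerty", "passw0rd"]
blocklist = build_blocklist(ranked, top_x=3)
print(blocklist)
print(is_listed("password", blocklist), is_listed("qwerty", blocklist))
```

Here `top_x` plays the role of X in the discussion: the number of guesses an attacker is estimated to get per day, month or year.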

Full log
* brion waves at the early birds :) if nobody proposes anything specific we can grab a couple of the things currently in https://www.mediawiki.org/wiki/RFC#In_discussion and see if we want to add notes brion: wanna make yourself op so you can #startmeeting ?  I don't think you need to be an op ah  #startmeeting \o/  oh, there is no actual meetbot in here hah woops  #treefallsinaforest we may have to take notes like animals, by cut-n-pasting anyone know how to start it back up?  I think it was on labs, so I can probably work it out in half an hour or so  or we can just start the meeting let's just go and we'll copy notes by hand later yeah https://www.mediawiki.org/wiki/Architecture_meetings/RFC_review_2014-02-05 so the agenda's empty so far. anything high on peoples' interest to bring up? I see https://www.mediawiki.org/wiki/Architecture_meetings mentions TitleValue, Config database, Deprecating inline styles, and password requirements  anyone want to give an update on RFC work done since the summit? e.g. gwicke MaxSem  bd808  been working on DataStore, will commit some time soonish Aaron has implemented much of the REST interface I have been packaging Parsoid also implemented a JSON intermediate representation for templates I have started work on a POC for structured logging. No code committed yet but some good initial progress. I hope to have something for folks to look at in a week with a Knockout compiler front-end I also expect to have my POC torn apart in review :) bd808: put me in as a reviewer on that when you have it, i'm interested in taking a peek at that :D  ok, so DataStore and the REST thing were accepted RFCs, right? 
https://github.com/gwicke/TemplatePerf/tree/master/QuickTemplate yeah  the RFC pages have still not been updated since the summit they overlap  just looking at https://www.mediawiki.org/wiki/Special:RecentChangesLinked/Requests_for_comment TimStarling, everybody is waiting for you to do so ;) TimStarling: shall I move PHP Virtual REST Service and DataStore to Accepted?  I see that "Passwords" has had a lot of edits  certainly DataStore was accepted  I would have to check the notes about the REST thing I think okay TimStarling, we agreed that we want batch support so on passwords, i'm happy to bump our default length limit up to 6. i also think a password strength meter on creation/pass change is a nice idea, though they can be ...... shaky so basically the REST interface, with the DataStore key-value implementation as a simple backend  Hello hello Scott_WUaS - we are in the middle of https://www.mediawiki.org/wiki/Architecture_meetings/RFC_review_2014-02-05 <Scott_WUaS> thanks, Sumanah ... saw your announcement to email just now :) <TimStarling> the proposal is to change it in DefaultSettings.php? TimStarling: yes, but that requires https://gerrit.wikimedia.org/r/#/c/77645/ to avoid locking out people with shorter old passwords <TimStarling> sure... that part should be easyish and i don't think too controversial ok, https://www.mediawiki.org/wiki/Requests_for_comment#Accepted and https://www.mediawiki.org/wiki/Requests_for_comment/DataStore now show that DataStore is accepted Well, not just that change, but another patch too to make it happen btw i'm saving some skeleton notes on https://www.mediawiki.org/wiki/Architecture_meetings/RFC_review_2014-02-05 in case we forget to save the log ;) thanks brion :D <TimStarling> I haven't really seen a proper rationale or discussion of password length I moved the REST interface to 'accepted' too There's been some chatter on the wikipage but nothing resolute. 
<TimStarling> "we should increase minimum password length because everyone else on the web has a larger password length" is not really a reason general consensus (to me) seemed to be "increase default password length but let it be configurable" <TimStarling> since that was true at the outset TimStarling: you could always write a bot to brute-force all the short passwords it wouldn't be hard i suspect ;P i'd argue that default passwords don't need strength but priviliged ones do. <TimStarling> well, that's why it we increased it from 0 to 1, if you remember gwicke: hm, in https://etherpad.wikimedia.org/p/storage_services I didn't see a solid APPROVE on the REST thing, did I miss something? <TimStarling> someone wrote a script to scan for blank passwords and use those accounts to bypass the autoconfirmed restrictions jorm: so that'd be covered under https://www.mediawiki.org/wiki/Requests_for_comment/Passwords#Create_new_password_requirements_for_accounts_with_advanced_user_rights which currently says 'let's talk about that later' <TimStarling> anyway, I'm just saying, it would be nice to have some sort of discussion of pros and cons sumanah, Tim agreed that he'd like to support batching <^d> #mediawiki-meetbot != #wikimedia-meetbot. Wondering why I was all alone. but also didn't want to reject one of the RFCs, which is fine pro: less likely to have insanely stupid breakable passwords TimStarling: Just about raising the limit? Or about the actual lenght? con: may annoy some people who use short passwords <TimStarling> well, how do you choose what to set the limit to? <TimStarling> is 6 enough? <TimStarling> how can you tell if you have no criteria? con: longer passwords don't equate to better password security sumanah, I asked exactly that but it didn't make it into the notes con: complex passwords, too. people just write them down somewhere. 
"aaaaaaaaaaaaaaaaaaaaaaaa" ain't secure <TimStarling> "password" isn't secure either, and that's 8 maybe TimStarling and brion can clarify <TimStarling> I think "123456" came up as one of the most popular in a recent compromise? gwicke: on the REST thing, I'm pretty sure we agreed to approve the interface, with an initial implementation using DataStore key-value as a backend *nod*, that's my recollection too it may not have made it to the notes but this is what we believe in our shared consensus reality :) ok, brion, when I wikify the etherpad notes (next), I'll make sure to indicate that :) thx :D <TimStarling> batching good, slowness bad sorry so based on some of the discussion in https://www.mediawiki.org/wiki/Requests_for_comment/Passwords#Discussion i see, there's some interest in not enforcing specific lengths, but just recommending stronger passwords with a meter brion: I agree with that assessment <TimStarling> well, say if we specifically want to protect against automated user account compromise for the purpose of autoconfirmed access etc. <TimStarling> we could calculate what password entropy we need for that, based on plausible attacks <TimStarling> and then maybe come up with a password length from that <TimStarling> although the correlation between entropy and length is pretty weak that also sounds very complex; we'd have to maintain a large list of dictionaries yeah i'm not sure how easy this is to do. TimStarling are you interested in doing that research? or should we roll some dice :) <TimStarling> well, obviously it's difficult to precisely measure password entropy <TimStarling> there's the "crack" PECL extension, has that been considered? I can work up some numbers from that perspective. At least to give a reasonable estimate for what is available to our online attackers. I think that one just wraps libcrack? csteipp: to me that sounds like a reasonable next step/TODO/"action item"/whatever :) Which would be reasonable. 
If it's good enough for linux, it's probably more than enough for us. :) would that also be suitable for use in a password strength meter? or would we need something client-side Yeah, we would want it client side. Otherwise you leak password length in the number of web requests, if you check as they type.. <TimStarling> you can deliver a smallish dictionary to the client side, I did it for that captcha response spell checker *nod* this is a dumb idea, but i'm throwing it out: could we run known cracks against new passwords and say "busted" if it succeeds? jorm: that's essentially what a strength meter would do if the tester code can crack you with a dictionary attack etc then we can prevent that pass from being used (or at least, strongly recommend against using it) <TimStarling> it's what cracklib does <^d> Possibly crazy idea too: <^d> Compare the password hash against others in the database to say "this is a really common password" <TimStarling> "The idea is simple: try to prevent users from choosing passwords that <TimStarling> could be guessed by "Crack" by filtering them out, at source. <TimStarling> CrackLib is an offshoot of the the version 5 "Crack" software, and <TimStarling> contains a considerable number of ideas nicked from the new software." <^d> So even if it's "secure", we can avoid people reusing stuff tooooo much. ^d: that doesn't really work well with per-user salting which we want :) <^d> Hmm, yeah ^d, could try to log in on a few social networking sites using the same credentials</kidding <TimStarling> " <TimStarling> The upshot of all this is that CrackLib can do indexed, binary searches <TimStarling> in a 1.4 million word dictionary (raw size ~ 15Mb), but the CrackLib <TimStarling> files (data+index+watermarks) occupy only ~ 7Mb. (45% original size) <TimStarling> It's even efficient over NFS ! <TimStarling> " <^d> Nevermind then back in the day i think we didn't have salt and we actually could do those matches. 
that was poor practice ^d: so something like SELECT COUNT(hash) FROM users WHERE hash = '$newpw' ? and if > X, warn? <^d> Naively, I was thinking something like that. <TimStarling> e.g. md5('troll') so is anyone interested in following up with actually making a strength checker meter bar? or should we wait until we have a clear idea what to check for <^d> But brion points out it wouldn't work well with per-user salts. Yeah, if we didn't have salts we could... "there sure are a lot of people who use '3y3H8w!k!P3d!A' as a password!" :) <^d> Hehe, that too. <TimStarling> there was a case where I was trying to find sockpuppets of Lir, before we had salting security is hard. let's go shopping. <TimStarling> and I thought it would be a great idea to search for people with the same password hash i used to nail cheaters in nexuswar using a similar thing. no per-user salts. <TimStarling> and there were several accounts, mostly acting like Lir, so I published the list find accounts with the same password and similar ip blocks (plus a bunch of other stuff) = multi <TimStarling> and it turned out he used "troll" as his password, and it was really just a collection of trolls <TimStarling> and then it got slashdotted and everyone on the internet hated me ;) <Isarra> Snrk. "and that's why we have per-user password salt now" So... I'm assuming we don't want to do comparison to existing passwords, otherwise we don't want to implement a strong hash function, which I think we do. Right? (Just want to verify my assumption) right, that's a non-starter Cool basically all we can do is dictionary etc 'attacks' as you're entering your password to see if it seems weak, so we can warn you <Isarra> Wouldn't this be client-side? How much of that can we reasonably do? But yeah, I'll come up with a realistic attack scenario, then the strength meter can basically be compare against a list of of X most popular passwords where X is the number of password tries for an attacker per day/month/year. 
I think that sounds like a reasonable place to draw today's pw discussion to a close and move on to maybe the config db RFC(s) Isarra: we can only send a smallish dictionary to do client-side probably. but we may be able to do server-side as long as we're careful (but beware of leaking length etc) (imo) yep i'm good with that <Isarra> Ah. <TimStarling> btw, an implementation note on client-side dictionaries <TimStarling> some sort of DIY compression is probably a good idea dictionaries compress *very* well with proper encoding yes heh <TimStarling> ideally something that obscures the original text slightly tries ftw Tim is not taking chances after ULS heh <TimStarling> because sometimes dictionaries have bad words in them that get blocked lol oh true <TimStarling> e.g. by parental filters our dictionaries are also befuct due to languages. are there any dictionaries we can use in Oriya? I wonder if trie + gzip is better than just gzip i thought this was interesting, btw: <http://insideofthebox.tumblr.com/post/75234834370/late-meditations-on-xkcd-936>: "Let’s say we have a dictionary with 2 ^ 11 (2048) entries. We pick four words, each one at random. A combination of those words would have 2 ^ 44 bits of entropy. Here is an interesting part: a permutation would be 2 ^ 39. That’s a significant hit to security, but it’s still way better than what semi-gibberish password gave us. This means it is possible to create a moderately secure password scheme where users wouldn’t even have to remember the word order!" <TimStarling> when I tried to submit my greasemonkey captcha spell checker, it was silently rejected for this reason We could give back hashes of the passwords... 
not sure if that would compress well though could be interesting to just do away with the notion of free-form text input for passwords and just try to devise an implementation that generates passwords that are both secure and memorable csteipp, that compresses much worse than the words themselves <Isarra> Looking at it linguistically, what if you don't even use real words, but word structures? Or would that even work? ori: that'd be very interesting research for another time, probably beyond our scope just now :) brion: yeah, i deliberately waited for us to be moving on; i don't propose we start discussing that now just a provocative thought <^d> Use dna for identify verification <Isarra> And transmit it using magic so it cannot be intercepted. ok, configs? *quantum* magic. ok moving on :) ^d: whatcha got :) https://gerrit.wikimedia.org/r/#/c/109850/ let's just switch to facebook login. <TimStarling> there's one other RFC which I see has had edits in the last 2 weeks that's what we're mainly discussing right now for config https://etherpad.wikimedia.org/p/configuration <TimStarling> that is "Overthrow Bugzilla" <^d> Yeah, we're still hashing out the high level stuff on 109850. <Isarra> I don't think we need to overthrow bugzilla anymore. <TimStarling> looks like just discussion though <TimStarling> nothing substantive honestly i'd love to kill bugzilla and replace it with something in-house that integrates with our accounts, our wikis, our chat system, etc. but that's a big project :) <^d> I'm not seeing much discussion since October or so. <^d> Soooo, we might have an option there, if people are willing to Break Everything. <^d> I've started playing around with Phabricator. <^d> Which does a ton of this bug / code / project management stuff. <^d> It's come a *long* way since we talked about it 2 years ago. <Isarra> brion: That. But that's not really specific to bugzilla. 
(I see https://www.mediawiki.org/wiki/Architecture_meetings mentions Deprecating inline styles in case we want to talk about next steps on that in this meeting) ^d: I heard that from someone at Juniper, that they are enjoying Phabricator <^d> Lots of people like it now. <Isarra> Inline styles? We use those? I liked it before it was cool! <Isarra> Oh, right, we do. it might be easier to modify bugzilla to use our accounts than build a new thing. architects, can you identify explicitly what it is that we're discussing? i don't think we need to continue with the bugzilla discussion just now <^d> Can we have the discussion in Zurich or London maybe? re inline styles, https://www.mediawiki.org/wiki/Talk:Requests_for_comment/Allow_styling_in_templates#Class-triggered_CSS_includes might be relevant someone was interested in configuration db, and someone recommended the inline styles thing which is of interest ^d: +1! <^d> I'd like us to have the discussion. ^d: yes please, that'd be a good time for it a "let's switch to Phabricator" RFC would probably be good to prepare before that meeting so we are all on the same pg imo that's a good resolution <^d> If anyone's interested in playing with it in the meantime: fab.wmflabs.org it would be as difficult if not moreso than the switch to gerrit. with much wailing and gnashing of teeth as we cull dead bugs. jorm: even though we would not be switching version control systems at the same time? I'm not sure <Isarra> Cleanup is good. trust me. i've done this EXACT THING before. <^d> Anyway, let's move back to Config. ok config then? anything you need ^d to move forward? <^d> So, Config is kind of moving along at a high level on https://gerrit.wikimedia.org/r/#/c/109850/ <^d> Interested parties please bikeshed. <^d> I don't think anyone's really thought much about the backend parts since the summit, but that's fine. 
<^d> (At some point, it'd be nice to start making decisions and consolidate the 3 RFCs) yeah, I think we should focus on getting the interface part done first, and then move on to the backend I really don't want to work on that. I submitted the patch because there were a few design decisions with the initial patch that seemed like clear-cut errors of judgment to me and I wanted to fix them before we started building things on top <^d> Yeah. And thanks. <^d> Hopefully we can stop bikeshedding on 190850 soon. if someone wants to take over that patch, I'd be delighted. If not, I'll try to identify where consensus is at at the moment and update the patch to reflect it duh: you mean the wrapper around globals, or something else? sumanah: yeah, that basically. <^d> ori: I basically rewrote it. ^d: great; can we consider it yours? so in https://etherpad.wikimedia.org/p/configuration we evidently (according to RobLa) agreed that there'd be 3 RFCs - ^d you'd rather it be 1? <^d> Well, there's 3 rfcs trying to solve the whole thing. I think ^d wants to consolidate the three existing RfCs into one <^d> Yeah, and then write RFCs for the second and third parts. At the summit we split it into three parts, interface, backend, and frontend <^d> Or expand it into 3 parts. <^d> Who knows. <^d> Yeah <^d> duh's got it I'm not sure we need an explicit RfC for the interface since we're mainly hashing it out in gerrit right now sounds sensible anything else we need to hash out here? <^d> Yeah, it's mainly the backend and frontend that needs RFCs. <^d> Nope, I don't think so. re the next meeting - would anyone particularly mind if we did this again next week, with a focus on password mgmt and the config db and bd808's work, + I will try to get Daniel Kinzler in to push TitleValue forward, and the HTML Formatting crowd? <^d> brion: Maybe if you could drop a few comments on that gerrit change so to sanity check we're going the right way? 
I can find a time that works better for different people ^d, duh, ori : who's volunteering to work on said docs ? :) sumanah, what's HTML formatting? <^d> brion: I'm going to figure out the RFC status. https://etherpad.wikimedia.org/p/html_templating and https://www.mediawiki.org/wiki/Architecture_Summit_2014/HTML_templating sorry, I meant templating we gotta define requirements on that brion: I will help out ^d :P ok sumanah, those are too many subjects i'm...going to look busy gwicke: you're probably right. we could take out the least urgent 1 or 2 I don't think I can commit to being ready for discussion by next week more than two subjects is unlikely to result in any depth ok. how about titlevalue + HTML Formatting for in-depth discussion, and hopefully quick checkins on progress in other RFCs just to break blockers/get reviewers/etc works for me i'd like a quick checkin on deprecating inline styles if jon's not intereted i'll take that over TimStarling? agenda sound good for next week? (next week not now) <^d> duh: I'm working on amending the patch again. <TimStarling> well, if we want to talk about TitleValue, it'll have to be in a timezone suitable for europe, right? sure, we can switch the time around, I can run a Doodle or similar to get the Germans in sumanah, I think HTML templating is a bit early html tempting may still be too ill defined in focus *templating ^d: sweet gwicke: ok, let's talk on wikitech-l or similar; sounds like the HTML templating group does need to identify the major questions that still need resolution <Isarra> Oh, random question about phabricator - is there any way for volunteers to investigate it? but if we think we have a good handle on narrowing it down that'd be great (as RobLa assesses) html templating needs to get a few things bashed out first I think... 
I think a few more weeks on that front Isarra: fab.wmflabs.org and bug ^d about any details :D re templating: the basics are available and prototyped, but we need to think about longer-term stuff, in particular for messages and content <^d> Isarra: Yeah, just sign up any any admin will approve your account. <^d> Usually takes just a few mins. how to incorporate data pull in particular ok, it's almost been an hour, sounds like we have action items for various folks I'll reach out to Nik + Kinzler + Aude et alia and ask them to prep TitleValue stuff so we can chat about it next week, and get a good time for next week if that makes sense for people ok folks please feel free to append or modify https://www.mediawiki.org/wiki/Architecture_meetings/RFC_review_2014-02-05#Summary_and_logs if you see anything important missing <Isarra> ^d: Email address must be at one of: wikimedia.org, wikimedia.de <^d> Gah. <^d> That didn't work like I expected. <^d> Lemme fix. :) <^d> Isarra: Try again <Isarra> Thanks. >.< \o/
 * Nemo_bis is crashed
 * jorm (~bharris@wikimedia/jorm) has joined #wikimedia-meetbot
 * #action csteipp will research and update the rfc with estimate for online attacks to compromise accounts to get autoconfirmed access.
 * TimStarling facepalm
 * duh (99127724@wikipedia/Legoktm) has joined #wikimedia-meetbot
 * gwicke remembers TimStarling doing stats before salts were added
 * #info strength meter can likely be comparing against a list of popular passwords
 * #info Tim recommends DIY compression for client side dictionary
 * brion hides
 * ori takes a screenshot.
 * sumanah agrees with Brion
 * #info let's talk about phabricator vs bugzilla in zurich, there's some interest.
 * #action ^d put together some notes on that
 * bd808 feels peer pressure
 * #action ^d will tidy up the RFC status for configuration: backend, frontend bits
 * ori nods
 * sumanah defers to others of course, just throwing it out there
 * #endmeeting
 * TimStarling sends a carrier pigeon to wm-meetbot with text "#endmeeting"