Architecture meetings/RFC review 2014-02-05


Wednesday, February 5, 2014 at 10:00 PM UTC at #wikimedia-meetbot.

Requests for Comment to review

No RFCs were proposed for the agenda in advance of this meeting.

Summary and logs

  • DataStore -> accepted, Max is tweaking
  • REST virtual service -> accepted interface with a wrapper for DataStore; needs updating on RFC from notes; Aaron has implemented most of the interface
  • Passwords
    • updating min length in DefaultSettings is pretty likely but needs a couple tweaks per RFC to avoid locking people out
      • do we have a good rationale for forcing it, other than 'everyone else does'?
      • is length of 6 enough? should we do some measuring & estimating of what entropy we require and determine an ideal min length?
        • <- should be considered for helping this research
        • we may need something separate that we can do in client-side for a strength meter though (deliver a small dictionary in JS)
      • note due to salting we can't check for duplicate passwords between users easily
      • note if using client-side check with a dictionary, roll own compression. not only does this help with dictionary size, but it can help avoid keyword blocking on "naughty words"
  • Requests_for_comment/Overthrow_Bugzilla
    • lots of discussion
    • maybe we don't need to
    • no rush?
    • talk about phabricator at zurich though; prep an rfc or other page for more discussion
  • Config db
  • Next time:
    • HTML templating still needs focus, talk about this and narrow it down on lists
    • TitleValue -- get DanielK to poke at this next week
    • Deprecating inline styles -- brion interested in a quick checkin on this maybe, will make some notes
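
The open question in the Passwords notes (how much entropy is required, and what minimum length follows from it) can be worked through as a rough sketch. Everything numeric here is an assumption for illustration: the guess budget of 2^20 and the 26-letter alphabet are not from the meeting. The length figure only holds for uniformly random characters, which human-chosen passwords are not; as noted in the log, the correlation between entropy and length is weak. The last two lines redo the passphrase arithmetic ori quotes in the full log (four words from a 2048-word dictionary).

```python
import math

def bits_for_guesses(guesses: int) -> float:
    """Bits of entropy needed so that `guesses` tries would cover the whole space."""
    return math.log2(guesses)

def min_length(bits: float, alphabet_size: int) -> int:
    """Shortest length of uniformly random characters that reaches `bits` of entropy.

    Idealized lower bound only: real passwords are not chosen uniformly at random.
    """
    return math.ceil(bits / math.log2(alphabet_size))

# Assumed attacker budget (hypothetical): about a million online guesses.
required_bits = bits_for_guesses(2 ** 20)
length_lowercase = min_length(required_bits, alphabet_size=26)

# Passphrase figures from the quote in the log: 4 words out of 2048.
ordered_bits = 4 * math.log2(2048)                              # word order matters
unordered_bits = ordered_bits - math.log2(math.factorial(4))    # word order ignored
```

Under these assumptions the ordered passphrase carries 44 bits, and ignoring word order costs log2(4!) bits, a little under 5, which matches the "2 ^ 39" figure in the quote.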

  • #action csteipp will research and update the rfc with estimate for online attacks to compromise accounts to get autoconfirmed access.
    • and this'll inform how to create a password strength meter
  • #info strength meter can likely be comparing against a list of popular passwords
  • #info Tim recommends DIY compression for client side dictionary
  • #info let's talk about phabricator vs bugzilla in zurich, there's some interest.
  • #action ^d put together some notes on that
  • #action ^d (& legoktm) will tidy up the RFC status for configuration: backend, frontend bits
  • #endmeeting
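
The two #info items on the strength meter can be sketched together: keep only the X most popular passwords, where X is the number of guesses an attacker gets, and ship that list in a home-made compressed form. This is a minimal sketch, not the implementation discussed in the meeting. The ranked list, the rate-limit numbers and all function names are hypothetical; front coding is just one possible "DIY compression", chosen here because it is short. It only partly obscures the words (the first word of each prefix group stays readable), so an extra reversible transform would likely be needed to get past keyword filters. Python is used for brevity; a client-side version would be written in JavaScript.

```python
# Hypothetical popularity-ranked list, most common first. "123456", "password"
# and "troll" come up in the log; the other entries are made up for illustration.
RANKED = ["123456", "password", "qwerty", "password1", "passw0rd", "troll"]

def attack_budget(guesses_per_day: int, days: int) -> int:
    """Number of guesses X an online attacker gets against one account."""
    return guesses_per_day * days

def front_encode(words):
    """Front-code a word list: store (shared prefix length, remaining suffix)."""
    encoded = []
    prev = ""
    for word in sorted(set(words)):
        shared = 0
        limit = min(len(prev), len(word))
        while shared < limit and prev[shared] == word[shared]:
            shared += 1
        encoded.append((shared, word[shared:]))
        prev = word
    return encoded

def front_decode(encoded):
    """Rebuild the sorted word list from its front-coded form."""
    words = []
    prev = ""
    for shared, suffix in encoded:
        prev = prev[:shared] + suffix
        words.append(prev)
    return words

def is_weak(password: str, encoded) -> bool:
    """True if the password is in the shipped list (case-insensitive by assumption)."""
    return password.lower() in set(front_decode(encoded))

# Assumed rate limit: 2 guesses per day over 2 days, so X = 4.
budget = attack_budget(guesses_per_day=2, days=2)
encoded = front_encode(RANKED[:budget])
```

With these sample values, `is_weak("Password1", encoded)` is true, while `is_weak("troll", encoded)` is false because "troll" ranks below the assumed budget of four guesses.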

Meeting summary

Full log

* brion waves at the early birds
<sumanah> :)
<brion> if nobody proposes anything specific we can grab a couple of the things currently in and see if we want to add notes
<sumanah> brion: wanna make yourself op so you can #startmeeting ?
<TimStarling> I don't think you need to be an op
<sumanah> ah
<TimStarling> #startmeeting
<brion> \o/
<TimStarling> oh, there is no actual meetbot in here
<brion> hah
<brion> woops
<TimStarling> #treefallsinaforest
* Nemo_bis is crashed
<brion> we may have to take notes like animals, by cut-n-pasting
<brion> anyone know how to start it back up?
<TimStarling> I think it was on labs, so I can probably work it out in half an hour or so
<TimStarling> or we can just start the meeting
<brion> let's just go and we'll copy notes by hand later
<sumanah> yeah
<brion> so the agenda's empty so far. anything high on peoples' interest to bring up?
<sumanah> I see mentions TitleValue, Config database, Deprecating inline styles, and password requirements
<TimStarling> anyone want to give an update on RFC work done since the summit? e.g. gwicke MaxSem
<TimStarling> bd808
<MaxSem> been working on DataStore, will commit some time soonish
<gwicke> Aaron has implemented much of the REST interface
<gwicke> I have been packaging Parsoid
<gwicke> also implemented a JSON intermediate representation for templates
* jorm (~bharris@wikimedia/jorm) has joined #wikimedia-meetbot
<bd808> I have started work on a POC for structured logging. No code committed yet but some good initial progress. I hope to have something for folks to look at in a week
<gwicke> with a Knockout compiler front-end
<bd808> I also expect to have my POC torn apart in review :)
<brion> bd808: put me in as a reviewer on that when you have it, i'm interested in taking a peek at that
<brion> :D
<TimStarling> ok, so DataStore and the REST thing were accepted RFCs, right?
<gwicke> yeah
<TimStarling> the RFC pages have still not been updated since the summit
<gwicke> they overlap
<TimStarling> just looking at
<gwicke> TimStarling, everybody is waiting for you to do so ;)
<sumanah> TimStarling: shall I move PHP Virtual REST Service and DataStore to Accepted?
<TimStarling> I see that "Passwords" has had a lot of edits
<TimStarling> certainly DataStore was accepted
<TimStarling> I would have to check the notes about the REST thing I think
<sumanah> okay
<gwicke> TimStarling, we agreed that we want batch support
<brion> so on passwords, i'm happy to bump our default length limit up to 6. i also think a password strength meter on creation/pass change is a nice idea, though they can be ...... shaky
<gwicke> so basically the REST interface, with the DataStore key-value implementation as a simple backend
<Scott_WUaS> Hello
<sumanah> hello Scott_WUaS - we are in the middle of
<Scott_WUaS> thanks, Sumanah ... saw your announcement to email just now
<sumanah> :)
<TimStarling> the proposal is to change it in DefaultSettings.php?
<brion> TimStarling: yes, but that requires to avoid locking out people with shorter old passwords
<TimStarling> sure...
<brion> that part should be easyish and i don't think too controversial
<sumanah> ok, and now show that DataStore is accepted
<csteipp> Well, not just that change, but another patch too to make it happen
<brion> btw i'm saving some skeleton notes on in case we forget to save the log ;)
<sumanah> thanks brion :D
<TimStarling> I haven't really seen a proper rationale or discussion of password length
<gwicke> I moved the REST interface to 'accepted' too
<jorm> There's been some chatter on the wikipage but nothing resolute.
<TimStarling> "we should increase minimum password length because everyone else on the web has a larger password length" is not really a reason
<jorm> general consensus (to me) seemed to be "increase default password length but let it be configurable"
<TimStarling> since that was true at the outset
<brion> TimStarling: you could always write a bot to brute-force all the short passwords
<brion> it wouldn't be hard i suspect ;P
<jorm> i'd argue that default passwords don't need strength but privileged ones do.
<TimStarling> well, that's why we increased it from 0 to 1, if you remember
<sumanah> gwicke: hm, in I didn't see a solid APPROVE on the REST thing, did I miss something?
<TimStarling> someone wrote a script to scan for blank passwords and use those accounts to bypass the autoconfirmed restrictions
<brion> jorm: so that'd be covered under which currently says 'let's talk about that later'
<TimStarling> anyway, I'm just saying, it would be nice to have some sort of discussion of pros and cons
<gwicke> sumanah, Tim agreed that he'd like to support batching
<^d> #mediawiki-meetbot != #wikimedia-meetbot. Wondering why I was all alone.
<gwicke> but also didn't want to reject one of the RFCs, which is fine
<brion> pro: less likely to have insanely stupid breakable passwords
<csteipp> TimStarling: Just about raising the limit? Or about the actual length?
<brion> con: may annoy some people who use short passwords
<TimStarling> well, how do you choose what to set the limit to?
<TimStarling> is 6 enough?
<TimStarling> how can you tell if you have no criteria?
<jorm> con: longer passwords don't equate to better password security
<gwicke> sumanah, I asked exactly that but it didn't make it into the notes
<jorm> con: complex passwords, too. people just write them down somewhere.
<brion> "aaaaaaaaaaaaaaaaaaaaaaaa" ain't secure
<TimStarling> "password" isn't secure either, and that's 8
<gwicke> maybe TimStarling and brion can clarify
<TimStarling> I think "123456" came up as one of the most popular in a recent compromise?
<brion> gwicke: on the REST thing, I'm pretty sure we agreed to approve the interface, with an initial implementation using DataStore key-value as a backend
<gwicke> *nod*, that's my recollection too
<brion> it may not have made it to the notes but this is what we believe in our shared consensus reality :)
<sumanah> ok, brion, when I wikify the etherpad notes (next), I'll make sure to indicate that :)
<brion> thx :D
<TimStarling> batching good, slowness bad
<sumanah> sorry
<brion> so based on some of the discussion in i see, there's some interest in not enforcing specific lengths, but just recommending stronger passwords with a meter
<sumanah> brion: I agree with that assessment
<TimStarling> well, say if we specifically want to protect against automated user account compromise for the purpose of autoconfirmed access etc.
<TimStarling> we could calculate what password entropy we need for that, based on plausible attacks
<TimStarling> and then maybe come up with a password length from that
<TimStarling> although the correlation between entropy and length is pretty weak
<mwalker> that also sounds very complex; we'd have to maintain a large list of dictionaries
<brion> yeah i'm not sure how easy this is to do. TimStarling are you interested in doing that research? or should we roll some dice :)
<TimStarling> well, obviously it's difficult to precisely measure password entropy
<TimStarling> there's the "crack" PECL extension, has that been considered?
<csteipp> I can work up some numbers from that perspective. At least to give a reasonable estimate for what is available to our online attackers.
<csteipp> I think that one just wraps libcrack?
<sumanah> csteipp: to me that sounds like a reasonable next step/TODO/"action item"/whatever :)
<csteipp> Which would be reasonable. If it's good enough for linux, it's probably more than enough for us.
<brion> :)
<brion> would that also be suitable for use in a password strength meter?
<brion> or would we need something client-side
<csteipp> Yeah, we would want it client side. Otherwise you leak password length in the number of web requests, if you check as they type..
<TimStarling> you can deliver a smallish dictionary to the client side, I did it for that captcha response spell checker
<brion> *nod*
<jorm> this is a dumb idea, but i'm throwing it out:
<csteipp> #action csteipp will research and update the rfc with estimate for online attacks to compromise accounts to get autoconfirmed access.
<jorm> could we run known cracks against new passwords and say "busted" if it succeeds?
<brion> jorm: that's essentially what a strength meter would do
<brion> if the tester code can crack you with a dictionary attack etc then we can prevent that pass from being used
<brion> (or at least, strongly recommend against using it)
<TimStarling> it's what cracklib does
<^d> Possibly crazy idea too:
<^d> Compare the password hash against others in the database to say "this is a really common password"
<TimStarling> "The idea is simple: try to prevent users from choosing passwords that
<TimStarling> could be guessed by "Crack" by filtering them out, at source.
<TimStarling> CrackLib is an offshoot of the the version 5 "Crack" software, and
<TimStarling> contains a considerable number of ideas nicked from the new software."
<^d> So even if it's "secure", we can avoid people reusing stuff tooooo much.
<brion> ^d: that doesn't really work well with per-user salting
<brion> which we want :)
<^d> Hmm, yeah
<gwicke> ^d, <kidding>could try to log in on a few social networking sites using the same credentials</kidding>
<TimStarling> "
<TimStarling> The upshot of all this is that CrackLib can do indexed, binary searches
<TimStarling> in a 1.4 million word dictionary (raw size ~ 15Mb), but the CrackLib
<TimStarling> files (data+index+watermarks) occupy only ~ 7Mb.  (45% original size)
<TimStarling> It's even efficient over NFS !
<TimStarling> "
<^d> Nevermind then
<brion> back in the day i think we didn't have salt and we actually could do those matches. that was poor practice
<jorm> ^d: so something like SELECT COUNT(hash) FROM users WHERE hash = '$newpw' ?
<jorm> and if > X, warn?
<^d> Naively, I was thinking something like that.
<TimStarling> e.g. md5('troll')
<brion> so is anyone interested in following up with actually making a strength checker meter bar? or should we wait until we have a clear idea what to check for
* TimStarling facepalm
<^d> But brion points out it wouldn't work well with per-user salts.
<csteipp> Yeah, if we didn't have salts we could...
<jorm> "there sure are a lot of people who use '3y3H8w!k!P3d!A' as a password!"
<brion> :)
<^d> Hehe, that too.
* duh (99127724@wikipedia/Legoktm) has joined #wikimedia-meetbot
<TimStarling> there was a case where I was trying to find sockpuppets of Lir, before we had salting
<jorm> security is hard. let's go shopping.
* gwicke remembers TimStarling doing stats before salts were added
<TimStarling> and I thought it would be a great idea to search for people with the same password hash
<jorm> i used to nail cheaters in nexuswar using a similar thing. no per-user salts.
<TimStarling> and there were several accounts, mostly acting like Lir, so I published the list
<jorm> find accounts with the same password and similar ip blocks (plus a bunch of other stuff) = multi
<TimStarling> and it turned out he used "troll" as his password, and it was really just a collection of trolls
<TimStarling> and then it got slashdotted and everyone on the internet hated me
<gwicke> ;)
<Isarra> Snrk.
<brion> "and that's why we have per-user password salt now"
<csteipp> So... I'm assuming we don't want to do comparison to existing passwords, otherwise we don't want to implement a strong hash function, which I think we do. Right?
<csteipp> (Just want to verify my assumption)
<brion> right, that's a non-starter
<csteipp> Cool
<brion> basically all we can do is dictionary etc 'attacks' as you're entering your password to see if it seems weak, so we can warn you
<Isarra> Wouldn't this be client-side? How much of that can we reasonably do?
<csteipp> But yeah, I'll come up with a realistic attack scenario, then the strength meter can basically be a comparison against a list of X most popular passwords where X is the number of password tries for an attacker per day/month/year.
<sumanah> I think that sounds like a reasonable place to draw today's pw discussion to a close and move on to maybe the config db RFC(s)
<brion> Isarra: we can only send a smallish dictionary to do client-side probably. but we may be able to do server-side as long as we're careful (but beware of leaking length etc)
<sumanah> (imo)
<brion> yep
<brion> i'm good with that
<Isarra> Ah.
<TimStarling> btw, an implementation note on client-side dictionaries
<TimStarling> some sort of DIY compression is probably a good idea
<brion> dictionaries compress *very* well with proper encoding yes
<ori> heh
<TimStarling> ideally something that obscures the original text slightly
<gwicke> tries ftw
<ori> Tim is not taking chances after ULS
<brion> heh
<TimStarling> because sometimes dictionaries have bad words in them that get blocked
<brion> lol
<brion> oh true
<TimStarling> e.g. by parental filters
<csteipp> #info strength meter can likely be comparing against a list of popular passwords
<jorm> our dictionaries are also befuct due to languages.
<jorm> are there any dictionaries we can use in Oriya?
<csteipp> #info Tim recommends DIY compression for client side dictionary
<gwicke> I wonder if trie + gzip is better than just gzip
<ori> i thought this was interesting, btw: <>: "Let’s say we have a dictionary  with 2 ^ 11 (2048) entries. We pick four words, each one at random. A combination of those words would have 2 ^ 44 bits of entropy. Here is an interesting part: a permutation would be 2 ^ 39. That’s a significant hit to security, but it’s still way better than what semi-gibber
<ori> ish password gave us. This means it is possible to create a moderately secure password scheme where users wouldn’t even have to remember the word order!"
<TimStarling> when I tried to submit my greasemonkey captcha spell checker, it was silently rejected for this reason
<csteipp> We could give back hashes of the passwords... not sure if that would compress well though
<ori> could be interesting to just do away with the notion of free-form text input for passwords and just try to devise an implementation that generates passwords that are both secure and memorable
<gwicke> csteipp, that compresses much worse than the words themselves
<Isarra> Looking at it linguistically, what if you don't even use real words, but word structures? Or would that even work?
<brion> ori: that'd be very interesting research for another time, probably beyond our scope just now :)
<ori> brion: yeah, i deliberately waited for us to be moving on; i don't propose we start discussing that now
<ori> just a provocative thought
<^d> Use dna for identity verification
* brion hides
<Isarra> And transmit it using magic so it cannot be intercepted.
<ori> ok, configs?
<brion> *quantum* magic.
<brion> ok moving on :)
<sumanah> ^d: whatcha got :)
<jorm> let's just switch to facebook login.
<TimStarling> there's one other RFC which I see has had edits in the last 2 weeks
<duh> that's what we're mainly discussing right now for config
<TimStarling> that is "Overthrow Bugzilla"
<^d> Yeah, we're still hashing out the high level stuff on 109850.
<Isarra> I don't think we need to overthrow bugzilla anymore.
* ori takes a screenshot.
<TimStarling> looks like just discussion though
<TimStarling> nothing substantive
<brion> honestly i'd love to kill bugzilla and replace it with something in-house that integrates with our accounts, our wikis, our chat system, etc. but that's a big project :)
<^d> I'm not seeing much discussion since October or so.
<^d> Soooo, we might have an option there, if people are willing to Break Everything.
<^d> I've started playing around with Phabricator.
<^d> Which does a ton of this bug / code / project management stuff.
<^d> It's come a *long* way since we talked about it 2 years ago.
<Isarra> brion: That. But that's not really specific to bugzilla.
<sumanah> (I see mentions  Deprecating inline styles in case we want to talk about next steps on that in this meeting)
<sumanah> ^d: I heard that from someone at Juniper, that they are enjoying Phabricator
<^d> Lots of people like it now.
<Isarra> Inline styles? We use those?
<ori> I liked it before it was cool!
<Isarra> Oh, right, we do.
<jorm> it might be easier to modify bugzilla to use our accounts than build a new thing.
<ori> architects, can you identify explicitly what it is that we're discussing?
<brion> i don't think we need to continue with the bugzilla discussion just now
* sumanah agrees with Brion
<^d> Can we have the discussion in Zurich or London maybe?
<gwicke> re inline styles, might be relevant
<brion> someone was interested in configuration db, and someone recommended the inline styles thing which is of interest
<ori> ^d: +1!
<^d> I'd like us to have the discussion.
<brion> ^d: yes please, that'd be a good time for it
<sumanah> a "let's switch to Phabricator" RFC would probably be good to prepare before that meeting so we are all on the same pg
<brion> #info let's talk about phabricator vs bugzilla in zurich, there's some interest.
<sumanah> imo
<brion> #action ^d put together some notes on that
<ori> that's a good resolution
<^d> If anyone's interested in playing with it in the meantime:
<jorm> it would be as difficult if not moreso than the switch to gerrit.
<jorm> with much wailing and gnashing of teeth as we cull dead bugs.
<sumanah> jorm: even though we would not be switching version control systems at the same time? I'm not sure
<Isarra> Cleanup is good.
<jorm> trust me.  i've done this EXACT THING before.
<^d> Anyway, let's move back to Config.
<brion> ok config then?
<sumanah> anything you need ^d to move forward?
<^d> So, Config is kind of moving along at a high level on
<^d> Interested parties please bikeshed.
<^d> I don't think anyone's really thought much about the backend parts since the summit, but that's fine.
<^d> (At some point, it'd be nice to start making decisions and consolidate the 3 RFCs)
<duh> yeah, I think we should focus on getting the interface part done first, and then move on to the backend
<ori> I really don't want to work on that. I submitted the patch because there were a few design decisions with the initial patch that seemed like clear-cut errors of judgment to me and I wanted to fix them before we started building things on top
<^d> Yeah. And thanks.
<^d> Hopefully we can stop bikeshedding on 190850 soon.
<ori> if someone wants to take over that patch, I'd be delighted. If not, I'll try to identify where consensus is at at the moment and update the patch to reflect it
<sumanah> duh:  you mean the wrapper around globals, or something else?
<duh> sumanah: yeah, that basically.
<^d> ori: I basically rewrote it.
<ori> ^d: great; can we consider it yours?
<sumanah> so in we evidently (according to RobLa) agreed that there'd be 3 RFCs - ^d you'd rather it be 1?
<^d> Well, there's 3 rfcs trying to solve the whole thing.
<duh> I think ^d wants to consolidate the three existing RfCs into one
<^d> Yeah, and then write RFCs for the second and third parts.
<duh> At the summit we split it into three parts, interface, backend, and frontend
<^d> Or expand it into 3 parts.
<^d> Who knows.
<^d> Yeah
<^d> duh's got it
<duh> I'm not sure we need an explicit RfC for the interface since we're mainly hashing it out in gerrit right now
<brion> sounds sensible
<brion> anything else we need to hash out here?
<^d> Yeah, it's mainly the backend and frontend that needs RFCs.
<^d> Nope, I don't think so.
<sumanah> re the next meeting - would anyone particularly mind if we did this again next week, with a focus on password mgmt and the config db and bd808's work, + I will try to get Daniel Kinzler in to push TitleValue forward, and the HTML Formatting crowd?
<^d> brion: Maybe if you could drop a few comments on that gerrit change so to sanity check we're going the right way?
<sumanah> I can find a time that works better for different people
<brion> ^d, duh, ori : who's volunteering to work on said docs ? :)
* bd808 feels peer pressure
<gwicke> sumanah, what's HTML formatting?
<^d> brion: I'm going to figure out the RFC status.
<sumanah> and
<sumanah> sorry, I meant templating
<sumanah> we gotta define requirements on that
<brion> #action ^d will tidy up the RFC status for configuration: backend, frontend bits
<duh> brion: I will help out ^d :P
* ori nods
<brion> ok
<gwicke> sumanah, those are too many subjects
<ori> i'm...going to look busy
<sumanah> gwicke: you're probably right. we could take out the least urgent 1 or 2
<bd808> I don't think I can commit to being ready for discussion by next week
<gwicke> more than two subjects is unlikely to result in any depth
<sumanah> ok. how about titlevalue + HTML Formatting for in-depth discussion, and hopefully quick checkins on progress in other RFCs just to break blockers/get reviewers/etc
<brion> works for me
* sumanah defers to others of course, just throwing it out there
<brion> i'd like a quick checkin on deprecating inline styles
<brion> if jon's not interested i'll take that over
<sumanah> TimStarling? agenda sound good for next week?
<brion> (next week not now)
<^d> duh: I'm working on amending the patch again.
<TimStarling> well, if we want to talk about TitleValue, it'll have to be in a timezone suitable for europe, right?
<sumanah> sure, we can switch the time around, I can run a Doodle or similar to get the Germans in
<gwicke> sumanah, I think HTML templating is a bit early
<brion> html tempting may still be too ill defined in focus
<brion> *templating
<duh> ^d: sweet
<sumanah> gwicke: ok, let's talk on wikitech-l or similar; sounds like the HTML templating group does need to identify the major questions that still need resolution
<Isarra> Oh, random question about phabricator - is there any way for volunteers to investigate it?
<brion> but if we think we have a good handle on narrowing it down that'd be great
<sumanah> (as RobLa assesses)
<csteipp> html templating needs to get a few things bashed out first I think... I think a few more weeks on that front
<brion> Isarra: and bug ^d about any details :D
<gwicke> re templating: the basics are available and prototyped, but we need to think about longer-term stuff, in particular for messages and content
<^d> Isarra: Yeah, just sign up and any admin will approve your account.
<^d> Usually takes just a few mins.
<gwicke> how to incorporate data pull in particular
<sumanah> ok, it's almost been an hour, sounds like we have action items for various folks
<sumanah> I'll reach out to Nik + Kinzler + Aude et alia and ask them to prep TitleValue stuff so we can chat about it next week, and get a good time for next week
<sumanah> if that makes sense for people
<brion> ok folks please feel free to append or modify if you see anything important missing
<Isarra> ^d: Email address must be at one of:,
<^d> Gah.
<^d> That didn't work like I expected.
<sumanah> #endmeeting
<^d> Lemme fix.
<sumanah> :)
<^d> Isarra: Try again
<Isarra> Thanks. >.<
* TimStarling sends a carrier pigeon to wm-meetbot with text "#endmeeting"
<brion> \o/
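
The exchange in the log about comparing password hashes across accounts (the `SELECT COUNT(hash)` idea, and the objection that it doesn't work with per-user salting) can be illustrated with a short sketch. It is not MediaWiki's code: PBKDF2-HMAC-SHA256, the iteration count and the function names are stand-ins chosen for the example, since the log does not say which hash is in use.

```python
import hashlib
import os

def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 is only a stand-in; the log does not name the hash in use.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 10_000)

def new_account(password: str) -> tuple[bytes, bytes]:
    """Return (salt, hash) for a new account with its own random salt."""
    salt = os.urandom(16)
    return salt, hash_password(password, salt)

# Two hypothetical users pick the same password.
salt_a, hash_a = new_account("troll")
salt_b, hash_b = new_account("troll")

# With per-user salts the stored hashes differ, so counting equal hashes finds nothing.
salted_match = hash_a == hash_b

# Each user's own login still verifies against their own salt.
login_ok = hash_password("troll", salt_a) == hash_a

# Without a salt, identical passwords produce identical hashes, which is what
# made the old cross-account matching described in the log possible.
unsalted_match = hashlib.md5(b"troll").hexdigest() == hashlib.md5(b"troll").hexdigest()
```

Detecting reuse under this scheme would mean re-hashing the candidate password once per existing salt, which is why the summary notes that duplicates can't be checked easily.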