Talk:Requests for comment/Retained account data self-discovery

IRC meeting 2013-09-24
<TimStarling>	yeah, I'm not sure if we should do this one
* AaronSchulz feels the same
<bawolff>	Seems more a political issue than a technical issue
<parent5446>	I think there's a big difference between "let's look at my recent session information" and "let's look at all IP addresses I've ever used".
<TimStarling>	I'm worried that it would increase the impact of account compromise
<Elsie>	How so?
<brion>	bawolff: political issues drive technical decisions ...
<Elsie>	parent5446: Not all IP addresses ever. Just the ones being hoarded.
<AaronSchulz>	TimStarling: good point
<TimStarling>	if someone compromises a user account, with this feature, they would have access to past IP addresses of the user
<bawolff>	brion: True, but political issues probably shouldn't be decided in the RFC process
<Elsie>	And?
<^d>	+1 to Tim
<parent5446>	Elsie: hoarded? What do you mean?
<brion>	bawolff: true dat
<legoktm>	parent5446: whatever is stored in the checkuser table at the time
<Elsie>	^
<parent5446>	Ah OK.
<Elsie>	It's purged after 90 days.
<Elsie>	We think.
<sumanah>	cscott: the author is https://www.mediawiki.org/wiki/User:Blackjack48 https://www.mediawiki.org/wiki/Summer_of_Code_Past_Projects#Watchlist_grouping_and_workflow_improvements
<parent5446>	Well I can understand the security reasoning behind it, i.e., wanting to automatically log out suspicious sessions.
<ori-l>	the only explanation given for why this would be useful is "in the interest of freedom of information and enhancing account security"
<cscott>	the rfc says, "it should be possible for users to see the private data stored about themselves at any time." but i find that phrase very problematic. is that "all information from same IP address"? "all information stored with the current logged-in user"? "all information associated with current user-agent"? what?
<RoanKattouw>	Eek
<manybubbles>	how does google do it?
<manybubbles>	not like they are privacy kings, but they have a similar self reporting feature and they've probably thought through the account compromise issue.
<ori-l>	I think that rather than guess what exactly the author had in mind, we should request that it be expanded, and detailed use-cases provided
<Elsie>	cscott: Option 2, of course.
<manybubbles>	not that we should copy them, but they are a point of view
<AaronSchulz>	I'd assume all rows for that user ID
<RoanKattouw>	That does sound a bit potentially creepy
<parent5446>	GMail only shows active sessions.
<gwicke>	the google self reporting is a joke really
<AaronSchulz>	but not all info for those IPs
<Elsie>	manybubbles: Gmail is mentioned in the RFC.
<gwicke>	mostly a PR gag
<parent5446>	It shows the client, the IP address, and the last time of activity.
<parent5446>	And it lets you log them all out.
<ori-l>	gwicke: yeah.
<cscott>	Elsie, AaronSchulz: i think it should be specified in the RFC.
<parent5446>	This functionality exists in Extension:SecureSessions.
<TimStarling>	this does not sound like the proposed feature
<Elsie>	cscott: It?
<AaronSchulz>	sure
<ori-l>	I think we're projecting applications for this RFC that the author may or may not have intended
<TimStarling>	the proposed feature is a tab in Special:Preferences
<ori-l>	it needs to be spelled out
<cscott>	Elsie: the scope of the data which is shown. esp for logged-out users, etc.
<TimStarling>	it's not really presented as a session security feature
<Elsie>	Is it appropriate for Wikimedia to keep secret data about users?
<RoanKattouw>	Google tells me when I logged in using what browser, and from where geographically
<RoanKattouw>	I don't see it giving me IP or UA information directly
<manybubbles>	Elsie: I mean the account dashboard.
<parent5446>	TimStarling: Agreed, which is why I don't exactly support this RFC.
<ori-l>	oh, it's Elsie.
<ori-l>	I didn't realize you were the author.
<Elsie>	RoanKattouw: It does, but it's a bit hidden. At least it does for Gmail.
<TimStarling>	yes, Elsie is the author and requested this review
<brion>	so as a matter of technical implementation, this is totally doable. whether it's a good idea is a bigger question
<parent5446>	RoanKattouw: That's weird. I can see it in my Gmail.
<AaronSchulz>	just showing the current info in prefs at least doesn't have the account compromise issue
<Elsie>	It's fairly trivial to do, given that the lookup feature already exists.
<TimStarling>	I'm not really seeing the ethical advantages, if it's not a security feature
<parent5446>	Agreed, outside of session security, I don't see the purpose of having this information.
<Elsie>	You think Wikimedia should retain secret data about you?
<cscott>	why isn't there a patch associated with this?
<TimStarling>	maybe it is a principle that people should be given access to information about them
<Elsie>	cscott: You haven't written one yet.
<TimStarling>	but it's difficult to prove that the information being requested is about the requestor
<Elsie>	A fairly accepted principle in Western societies, I think.
<TimStarling>	in fact, the motivating application is one where it is not, correct?
<brion>	Elsie: so technically speaking you can collect that data yourself, as it comes from your computer
<cscott>	i think it's reasonable to ask for a patch as a next step. that doesn't imply that the patch would be accepted.
<ori-l>	Elsie: I agree with you, but I don't think that it's enough to go on, in terms of actually articulating an implementation
<TimStarling>	if someone else has compromised your session and is acting as you, that is no longer information about you
<^d>	cscott: I disagree.
<Elsie>	brion: Sure.
<^d>	I think there's fundamental questions that need working out before anyone wastes time on code.
<gwicke>	it is not reasonable to provide all data that exists somewhere in the system (including logs etc) about a user
<Elsie>	gwicke: Why's that?
<StevenW>	what ^d said
<cscott>	^d: sure, but implementation questions are best answered with an implementation.
<parent5446>	I agree with ^d. The purpose of an RFC is to get input *before* submitting a patchset.
<Elsie>	One of Wikimedia's core principles is transparency.
<gwicke>	Elsie: producing all that info would be very expensive
<bawolff>	There is a difference between all information potentially in the system, and all info accessible by a checkuser
<Elsie>	gwicke: Like... Special:Log? And Special:Contributions?
<brion>	ok, so I don't think we can make the call here, other than a technical recommendation
<gwicke>	you'd have to correlate all logs with user IPs etc
<Elsie>	It's an indexed data set.
<parent5446>	Playing devil's advocate here: we could always require a password to view the data. Makes the situation a bit better.
<kylu>	Maybe it would be more reasonable to provide "Information about your current session" and show useragent, ip, xff, etc... for just the requesting session.
<brion>	but i've got no idea who to kick a decision to :)
<^d>	Ok, so can I agree with everyone here?
<Elsie>	So is the CheckUser table, for that matter.
<TimStarling>	Elsie, I think you need to be able to specify what exactly are the ethical advantages of this
<^d>	In general, I like the idea of letting people have info about themselves.
<manybubbles>	would it make it better if we didn't return the ip address?
<gwicke>	Elsie: not all logs are in the DB
<bawolff>	brion: I'd say the folks who hang out at meta or foundation-l
<cscott>	again, people who are worried about efficiency would be answered if we were looking at a concrete implementation, instead of speculating about what information was going to be returned
<^d>	But I think Tim's right, we need a better clarification of exactly what info, what problem it solves, and how we can make sure the info remains secure.
<ori-l>	in general, legitimate & verifiable requests by users for data that pertains to them should be gratified, but articulating how processes and tools for data disclosure should behave requires concrete use-cases
<Elsie>	The account security thing is a red herring.
<TimStarling>	I mean in terms of practical benefits, rather than with reference to principles of FOI
<Elsie>	Tim is arguing that a compromised account would leak data, but that's true of any auth system ever.
<RoanKattouw>	cscott: That can also be done by adding a couple sentences to the RFC along the lines of "we don't propose to collect any new data, just what's already in the checkuser table", or whatever
<gwicke>	yeah, some data might have a use case, but *all data* is just not possible
<Elsie>	All logged data, then?
<Elsie>	All indexed and logged data?
<csteipp>	There's also the issue of login csrf.. which could decloak the target. Which I don't like.
<parent5446>	I should note that this RFC is also proposing allowing the user to see whenever they are looked up.
<manybubbles>	all data that we can grab quickly?
<cscott>	RoanKattouw: sure, but i also have some questions about how the data is presented, how corner cases are handled, etc.
<Elsie>	parent5446: Is it?
<TimStarling>	well, I am leaning towards just closing this, if Elsie has no more arguments
<bawolff>	Non-technical users might experience an increase in trust if they can see the information instead of just knowing it's some technical mumble jumble they could theoretically collect themselves if they knew what the words "IP address" meant
<cscott>	i think we're spending excessive amounts of time on this because the RFC is not concrete enough.
<TimStarling>	was anyone in favour other than Elsie?
<parent5446>	Elsie: In ===Special:Preference section===
<manybubbles>	csteipp: it'd certainly have to be protected. something like forcing a password verification on a form with csrf protection field
<^d>	TimStarling: Given appropriate clarifications and refinement in scope, I could agree to it.
<^d>	But as is, no
<Elsie>	parent5446: Perhaps just poorly worded. It's not talking about the CU log.
<RoanKattouw>	Let's refine the RFC
<brion>	TimStarling: i like the general idea, in theory. but i don't think it's our call either
<TimStarling>	ok, let's leave it open then
<csteipp>	manybubbles: no, if I log you in as me, then I know my password, and I can see your ip :)
<Elsie>	It's marked as being in draft.
<ori-l>	agree with ^d
<MatmaRex>	TimStarling: i don't think anybody is very much opposed to it, either
<parent5446>	Elsie: "This log tracks queries of the database when a user checks another user (or themselves),"
<MatmaRex>	it just needs clarification
<bawolff>	csteipp: Maybe even go further, require the user to click on link via email to "unlock" info
<cscott>	agree with ^d
<parent5446>	Maybe worded badly.
<manybubbles>	csteipp: that is why I advocate not sending back ips - I said it earlier, but it got lost :)
<MatmaRex>	(or concretization, rather)
<Elsie>	parent5446: Right. It's discussing the private CU log, but then it transitions.
<csteipp>	manybubbles: Ah, yep, I think me too
<parent5446>	Oh I see.
<^d>	Well now we're at an hour.
<manybubbles>	csteipp: the problem is just how much are we willing to leak to people that get your password. if the answer is "nothing" then we should close the rfc as too dangerous
<TimStarling>	on brion's point of procedure
<brion>	manybubbles: two-factor auth?
<TimStarling>	we could ask for legal department review
<Elsie>	brion: Who's in a position to decide?
<TimStarling>	since they are most familiar with FOI, data protection, etc.
<brion>	mmmm could do
<cscott>	i think the RFC still needs to be more specific about what data is presented
<Elsie>	The legal department can certainly review, but I'm not sure they can decide.
<Elsie>	It's not a legal issue to show a user his or her own data, surely.
<TimStarling>	Luis Villa would probably be interested
<sumanah>	TimStarling: our chart says Michelle Paulson as well
<aude>	oh no, missed the meeting....
<aude>	almost
<brion>	*nod* advice from legal would be great, but they're not meant to make decisions on what we should do
<Elsie>	https://bugzilla.wikimedia.org/show_bug.cgi?id=27242 is the relevant bug.
<MaxSem>	Elsie, ...or someone seeing the data stolen from them
<Elsie>	MaxSem: Someone compromises an account and then what, exactly?
<Elsie>	Let's assume it's a privileged account, even though like 99.99% of accounts aren't.
<MaxSem>	and then they see original owner's IP
<TimStarling>	ok, so the decision on this is: clarify and expand RFC, request legal opinion
<MaxSem>	yup
<bawolff>	Finds my IP address, locates me, and coerces me into abusing my privileges (worst case)
<parent5446>	Agreed
<TimStarling>	that's one hour up
<brion>	agreed
<cscott>	TimStarling: +1
<Krenair>	yes
<Elsie>	bawolff: They've already compromised your privileged account, presumably.
<manybubbles>	Elsie: gets the user's ip and list of edits.
<manybubbles>	they decide that they are editing seditious material. they get their internet provider to tell them where they are using the ip and the edit date..... worst case scenario.
<bawolff>	Elsie: yeah, I suppose
<bawolff>	Someone who wants to do me harm finding out my location is worst case I suppose
<Elsie>	bawolff: They would also set up their own Web server and send you a link.
<Elsie>	Which costs almost nothing and would be dramatically easier.

Legal Comment
Tim emailed to say hi :) Legal perspective, in a nutshell:

 * 1) It's a generally accepted norm among privacy people (and sometimes, though not commonly yet, required by law) that users should have the ability to see what information is held about them. This RFC seems to fit pretty well in that vein, so we're generally pretty positive about this.
 * 2) While I think it is overall a positive, it'd still be useful to drill down more on motivations here. Knowing that would help frame how the information is presented; help limit exactly what information is shown; etc. For example, if the answer is just generally "because we should tell people what we know about them", then the energy might be better spent considering a more broad-based framework; if the answer is "this is about login security", then it's probably more useful to show a filtered view (like the gmail geographic information) than the raw checkuser information; etc.
 * 3) Like some of the folks in the IRC chat, I think it'd be good to be very precise about exactly what we're exposing here - I have what I think is a common-sense interpretation of what the RFC means, but there is lots of devil in the details, so we can't really give a blessing yet.

Hope that helps. -LVilla (WMF) (talk) 18:38, 25 September 2013 (UTC)

Next steps?
Hi MZMcBride,

Tim, Luis and some other engineers have expressed doubts about the current RFC. Are you planning to incorporate their feedback or will you withdraw this RFC? Drdee (talk) 21:14, 17 December 2013 (UTC)
 * Hi. I think it's accurate to say that Tim expressed doubt. Legal and others (including Brion) have seemed fairly positive on the idea. Some now consider it expected behavior given what sites such as Gmail do (cf. AGK's changing attitude on the relevant bug report). I have no intention of withdrawing this RFC (whatever that means). This is actually fairly easy to implement, I believe, but as Luis notes above, it's just a matter of the details. If anyone else wants to help out, that'd be most welcome, of course. I'm happy to offer guidance where I can. The design also needs thought. Perhaps a dedicated Special page linked from user preferences (Special:Preferences). Or maybe as part of the info action. Perhaps Jared or Brandon will be able to weigh in. --MZMcBride (talk) 13:56, 18 December 2013 (UTC)

 * I'm a bit worried about this. Under the Swedish sv:personuppgiftslagen, you have the right to get a compilation of all data stored about you once per year, and I believe that this originates from an EU directive, so I'd assume that this also is the case in other EU countries. If members of EU-hosted wiki projects can find the data this way instead of sending an application to the host, then this may mean fewer applications for the host to process, simplifying things for the host. On the other hand, it gives hackers a method to find information about users they don't like. For example, if I wish to sue someone for violating my rights, or if I am the dictator of some country and wish to send people who post inappropriate content to a wiki project to forced labour camps, then I could try hacking into the account and take a look at the most recent web browsers and IP addresses. Hopefully, this will help me identify the user, and then I can sue the user and send him to the forced labour camp. For the moment, all I can find that way is the e-mail address, which might just be an anonymous webmail account. So: This feature should probably be offered (to simplify things for EU-hosted wiki projects), but it should probably not be enabled in MediaWiki installations by default. I'm not sure if it is a good idea to enable this on Wikimedia's wiki projects. --Stefan2 (talk) 00:57, 20 February 2014 (UTC)
 * Hi Stefan2. Regarding hackers, wouldn't it be vastly easier to simply trick a user into clicking a link to get his or her IP info and User-Agent? For example, I could easily drop a Labs (née Toolserver) link in a reply to you and simply wait for you to click (or send the link to you privately via e-mail to reduce noise in the server access logs). If you're a dictator, I think you'll spend less time hacking into people's wiki accounts and more time setting up spying software in your government-run Internet infrastructure. We may be getting a bit off-track here, though. The general idea is to expose to a user his or her retained account info. I think there's general agreement that if that's possible, we should. Legoktm and Prtksxna have been working on Extension:AccountInfo. Exciting times ahead! :-) --MZMcBride (talk) 03:58, 2 March 2014 (UTC)
 * MZ, I sympathize with the motivations behind the proposal (as far as I understand them), but I think this misses the point on the new threat models that it could open. (BTW, IIRC Tool Labs doesn't give you access to the web access logs for your tool.) Some remarks below. Regards, Tbayer (WMF) (talk) 07:27, 5 April 2015 (UTC)

Opt-out/opt-in
Apart from the hacking scenario discussed above, it seems to me that the discussion hasn't yet considered a different threat model, namely that the user could be pressured by others into accessing his or her data and providing it to them. E.g.:
 * A politically active Wikipedian in an oppressive country stays with various friends/family members/fellow activists over the course of several months while editing Wikipedia. After being arrested, the authorities force the Wikipedian to log into their account and hand over the IPs of all these contact persons.
 * A suspicious spouse/partner/parent asks a highly active Wikipedian to provide evidence that they have only been in certain locations while editing.
 * A company whose workplace policy forbids editing Wikipedia from work computers demands proof from an employee that they haven't done so in the last three months.
 * A local Wikimedia community that considers the Checkuser policy a bit too strict starts encouraging certain users (e.g. those suspected of sockpuppeting, or candidates for higher functions) to "voluntarily" submit a copy of their data to some trusted local users.

I'm not saying that these potential downsides should prevent the new feature, but I think they make a strong argument for providing an opt-out (you tick a box and from then on can't access the data until the box is unticked again, after which you can only access data collected since the box was unticked).

Going further, if the main purpose of the new feature is to enable those who want to keep checks on the Foundation's data collection practices, wouldn't it be sufficient for this if it were opt-in? I.e. you tick a box in the preferences and from then on can access all the data collected after ticking the box. This would also make the below concerns about password security obsolete, because we can then warn the user about this when they opt in.

Regards, Tbayer (WMF) (talk) 07:27, 5 April 2015 (UTC)
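To make the difference between the two variants concrete, here is a minimal Python sketch of which retained rows each would expose. It assumes that every retained row carries a timestamp and that the preference stores when the box was last ticked or unticked; Row, AccessSetting, access_window and visible_rows are hypothetical names, not part of CheckUser or MediaWiki.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Row:
    """One retained record (hypothetical shape, not the real CheckUser schema)."""
    ip: str
    user_agent: str
    timestamp: datetime


@dataclass
class AccessSetting:
    """Hypothetical per-user preference backing the proposed checkbox."""
    mode: str                         # "opt-out" or "opt-in"
    box_ticked: bool
    last_changed: Optional[datetime]  # when the box was last (un)ticked; None if never


def access_window(s: AccessSetting) -> Tuple[bool, Optional[datetime]]:
    """Return (allowed, since); since=None means no lower bound on row age."""
    if s.mode == "opt-out":
        if s.box_ticked:
            return False, None            # opted out: nothing is shown
        return True, s.last_changed       # never ticked: everything; unticked at T: rows from T on
    if s.mode == "opt-in":
        if s.box_ticked:
            return True, s.last_changed   # only rows collected after opting in
        return False, None                # not opted in: nothing is shown
    raise ValueError(f"unknown mode: {s.mode!r}")


def visible_rows(rows: List[Row], s: AccessSetting) -> List[Row]:
    """Filter retained rows down to those the current setting would expose."""
    allowed, since = access_window(s)
    if not allowed:
        return []
    return [r for r in rows if since is None or r.timestamp >= since]
```

With the opt-in variant, nothing collected before the box was ticked ever becomes visible, which is what would allow the user to be warned at the moment of opting in.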

Existing examples or best practices?
Are there any other websites (perhaps in Europe) which already offer such a feature? It would be useful to take a look at best practices. E.g., do they give access to this data based on the existing password only, or do they require separate identification?

(As others have remarked, the Gmail example seems quite different from what is being proposed here: on the one hand, because it's designed to detect unauthorized usage - which, in the case of unauthorized edits, is rather trivial anyway - and on the other hand, because it is nowhere near providing access to all the logged data.)

Regards, Tbayer (WMF) (talk) 07:27, 5 April 2015 (UTC)

This changes requirements for password security
Last year, the initiator of this RfC argued repeatedly in a different context that generally, "Wikimedia wiki accounts are nearly valueless" and not worth protecting with minimum password strength requirements. Not everyone agreed (I didn't), but I think this position did have considerable influence on the outcome of Requests for comment/Passwords.

Given that the proposed new feature will suddenly give the passwords of hundreds of thousands of users (those who made an edit in the last 3 months) an entirely new capability (probably without most of them being aware of it), and that the checkuser data it gives access to is regarded as highly sensitive by many users, I think it might make sense to revisit the password strength requirements before the new feature is deployed.

Regards, Tbayer (WMF) (talk) 07:27, 5 April 2015 (UTC)
 * I completely agree with this. Getting access to the account of a random user will most likely be valueless (at best you will learn their email address, which you can also get otherwise). Getting access to the list of IPs and UAs of a random user can be extremely valuable: you can get information about their residence (if they edit from home and their IP is static), their work (if they edit from work and the whois record provides the name of the company), places they visit (if they edited from any public computers or wireless networks), etc. The most obvious use of this is to establish a user's real-life identity, and in the best case (for the attacker) you can get their home address, the company they work for, places they visit, etc. I think that getting unauthorised access to such information can be extremely dangerous for the users concerned — NickK (talk) 17:03, 5 April 2015 (UTC)
 * And by the way, password security alone cannot be the solution in such a case; two-step verification will be needed. The simplest scenario is that an employer installs a keylogger to make sure their employees do not use Wikipedia. This is enough to capture even the most complicated password. The only way to protect an account from this will be two-step verification, either with a code sent by email, by SMS or through a special app. Perhaps two-step verification could be required only to access this feature, but for security reasons the second verification step should be specified upon registration, not immediately before accessing this feature (otherwise the employer will obviously use his own cell phone number to access this feature) — NickK (talk) 17:10, 5 April 2015 (UTC)

Is it technically necessary to include UA information?
I'm not too happy about this as it seems to me as though it would make it easier to create socks, or for an editor to see why they were caught when they created sockpuppets. Dougweller (talk) 08:35, 5 April 2015 (UTC)


 * I agree with Dougweller in this point.--Elph (talk) 09:55, 5 April 2015 (UTC)
 * Could the UA be simplified to make it user-friendly? For instance, OS (e.g. Windows XP rather than Windows NT 5.1) and major browser details (e.g. Firefox v37 rather than the exact version) should suffice. - Mailer diablo (talk) 10:17, 5 April 2015 (UTC)
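Mailer diablo's suggestion could be prototyped roughly as follows: map the Windows NT version token to a product name and keep only the browser's major version. This is a minimal Python sketch under loose assumptions: it recognises only a handful of OS tokens and only Firefox and Chrome, and simplify_user_agent is a hypothetical helper, not an existing MediaWiki or CheckUser function. Real UA strings are much messier, so anything deployed would need a maintained parser.

```python
import re

# Windows NT version tokens as they appear in UA strings, mapped to product names.
WINDOWS_NAMES = {
    "5.1": "Windows XP",
    "6.0": "Windows Vista",
    "6.1": "Windows 7",
    "6.2": "Windows 8",
    "6.3": "Windows 8.1",
}


def simplify_user_agent(ua: str) -> str:
    """Reduce a raw UA string to 'OS, Browser vMAJOR' (illustrative heuristic only)."""
    m = re.search(r"Windows NT (\d+\.\d+)", ua)
    if m:
        os_name = WINDOWS_NAMES.get(m.group(1), f"Windows NT {m.group(1)}")
    elif "Android" in ua:                  # checked before "Linux": Android UAs contain both
        os_name = "Android"
    elif "iPhone" in ua or "iPad" in ua:   # checked before "Mac OS X": iOS UAs say "like Mac OS X"
        os_name = "iOS"
    elif "Mac OS X" in ua:
        os_name = "Mac OS X"
    elif "Linux" in ua:
        os_name = "Linux"
    else:
        os_name = "Unknown OS"

    # Keep only the major version number of the two browsers handled here.
    b = re.search(r"(Firefox|Chrome)/(\d+)", ua)
    browser = f"{b.group(1)} v{b.group(2)}" if b else "Unknown browser"
    return f"{os_name}, {browser}"
```

For example, the UA string "Mozilla/5.0 (Windows NT 5.1; rv:37.0) Gecko/20100101 Firefox/37.0" is reduced to "Windows XP, Firefox v37".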


 * I think that including UA information would be quite harmful for the following reasons:
 * about 95% of editors will not understand what it is about. Even if some people can look at Google data to make sure everything is fine, they will most likely understand nothing in their CU data (it's not very readable)
 * some 3% of editors will understand what it is about and make sure everything is fine, but perhaps make no use of it
 * some 1% of editors may use this data for hiding their abuse (e.g. to better mask spambots, vandal accounts, sockpuppets etc.)
 * some 1% of editors may have this data used against them (the cases Tbayer mentioned above — like checks by an employer, a family member etc.)
 * In total, I think that it is more harmful to include this kind of data than not to include it. Of course, users very concerned about the privacy of their account could benefit from it, but I don't think that this level of detail is really needed. Perhaps the best level of detail we can provide would look like "192.168.0.0 Mobile, last action: 25 March 2015; 127.0.0.0 Desktop, last action: 31 March 2015"; all the remaining information will hardly be useful — NickK (talk) 16:54, 5 April 2015 (UTC)
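NickK's coarse format could be produced by collapsing the raw rows per IP and device class and keeping only the most recent timestamp. A minimal Python sketch, assuming rows arrive as (ip, user_agent, timestamp) tuples; device_class and summarize are hypothetical helpers, and the "Mobi" substring test is a common but imperfect heuristic for spotting mobile browsers, not something MediaWiki is known to do.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def device_class(user_agent: str) -> str:
    # Rough heuristic: most mobile browsers include "Mobi" in their UA string.
    return "Mobile" if "Mobi" in user_agent else "Desktop"


def summarize(rows: Iterable[Tuple[str, str, datetime]]) -> List[str]:
    """Collapse raw (ip, user_agent, timestamp) rows into one line per (ip, device class)."""
    latest: Dict[Tuple[str, str], datetime] = {}
    for ip, ua, ts in rows:
        key = (ip, device_class(ua))
        if key not in latest or ts > latest[key]:
            latest[key] = ts  # keep only the most recent action for this pair

    lines = []
    # Oldest "last action" first; the raw UA string itself is never shown.
    for (ip, kind), ts in sorted(latest.items(), key=lambda item: item[1]):
        lines.append(f"{ip} {kind}, last action: {ts.day} {ts.strftime('%B %Y')}")
    return lines
```

Joining the returned list with "; " gives a single line in the style of the example above, e.g. "198.51.100.7 Mobile, last action: 25 March 2015; 203.0.113.9 Desktop, last action: 31 March 2015" for two mobile rows from the first address and one desktop row from the second.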