Talk:Requests for comment/Retained account data self-discovery

IRC meeting 2013-09-24

<TimStarling> yeah, I'm not sure if we should do this one
* AaronSchulz feels the same
<bawolff> Seems more a political issue than a technical issue
<parent5446> I think there's a big difference between "let's look at my recent session information" and "let's look at all IP addresses I've ever used".
<TimStarling> I'm worried that it would increase the impact of account compromise
<Elsie> How so?
<brion> bawolff: political issues drive technical decisions ...
<Elsie> parent5446: Not all IP addresses ever. Just the ones being hoarded.
<AaronSchulz> TimStarling: good point
<TimStarling> if someone compromises a user account, with this feature, they would have access to past IP addresses of the user
<bawolff> brion: True, but political issues probably shouldn't be decided in the RFC process
<Elsie> And?
<^d> +1 to Tim
<parent5446> Elsie: hoarded? What do you mean?
<brion> bawolff: true dat
<legoktm> parent5446: whatever is stored in the checkuser table at the time
<Elsie> ^
<parent5446> Ah OK.
<Elsie> It's purged after 90 days.
<Elsie> We think.
<sumanah> cscott: the author is https://www.mediawiki.org/wiki/User:Blackjack48 https://www.mediawiki.org/wiki/Summer_of_Code_Past_Projects#Watchlist_grouping_and_workflow_improvements
<parent5446> Well I can understand the security reasoning behind it, i.e., wanting to automatically log out suspicious sessions.
<ori-l> the only explanation given for why this would be useful is "in the interest of freedom of information and enhancing account security"
<cscott> the rfc says, "it should be possible for users to see the private data stored about themselves at any time." but i find that phrase very problematic. is that "all information from same IP address"? "all information stored with the current logged-in user"? "all information associated with current user-agent"? what?
<RoanKattouw> Eek
<manybubbles> how does google do it? not like they are privacy kings, but they have a similar self reporting feature and they've probably thought through the account compromise issue.
<ori-l> I think that rather than guess what the exactly the author had in mind, we should request that it be expanded, and detailed use-cases provided
<Elsie> cscott: Option 2, of course.
<manybubbles> not that we should copy them, but they are a point of view
<AaronSchulz> I'd assume all rows for that user ID
<RoanKattouw> That does sound a bit potentially creepy
<parent5446> GMail only shows active sessions.
<gwicke> the google self reporting is a joke really
<AaronSchulz> but not all info for those IPs
<Elsie> manybubbles: Gmail is mentioned in the RFC.
<gwicke> mostly a PR gag
<parent5446> It shows the client, the IP address, and the last time of activity.
<parent5446> And it lets you log them all out.
<ori-l> gwicke: yeah.
<cscott> Elsie, AaronSchulz: i think it should be specified in the RFC.
<parent5446> This functionality exists in Extension:SecureSessions.
<TimStarling> this does not sound like the proposed feature
<Elsie> cscott: It?
<AaronSchulz> sure
<ori-l> I think we're projecting applications for this RFC that the author may or may not have intended
<TimStarling> the proposed feature is a tab in Special:Preferences
<ori-l> it needs to be spelled out
<cscott> Elsie: the scope of the data which is shown. esp for logged-out users, etc.
<TimStarling> it's not really presented as a session security feature
<Elsie> Is it appropriate for Wikimedia to keep secret data about users?
<RoanKattouw> Google tells me when I logged in using what browser, and from where geographically
<RoanKattouw> I don't see it giving me IP or UA information directly
<manybubbles> Elsie: I mean the account dashboard.
<parent5446> TimStarling: Agreed, which is why I don't exactly support this RFC.
<ori-l> oh, it's Elsie. I didn't realize you were the author.
<Elsie> RoanKattouw: It does, but it's a bit hidden. At least it does for Gmail.
<TimStarling> yes, Elsie is the author and requested this review
<brion> so as a matter of technical implementation, this is totally doable. whether it's a good idea is a bigger question
<parent5446> RoanKattouw: That's weird. I can see it in my Gmail.
<AaronSchulz> just showing the current info in prefs at least doesn't have the account compromise issue
<Elsie> It's fairly trivial to do, given that the lookup feature already exists.
<TimStarling> I'm not really seeing the ethical advantages, if it's not a security feature
<parent5446> Agreed, outside of session security, I don't see the purpose of having this information.
<Elsie> You think Wikimedia should retain secret data about you?
<cscott> why isn't there a patch associated with this?
<TimStarling> maybe it is a principle that people should be given access to information about them
<Elsie> cscott: You haven't written one yet.
<TimStarling> but it's difficult to prove that the information being requested is about the requestor
<Elsie> A fairly accepted principle in Western societies, I think.
<TimStarling> in fact, the motivating application is one where it is not, correct?
<brion> Elsie: so technically speaking you can collect that data yourself, as it comes from your computer
<cscott> i think it's reasonable to ask for a patch as a next step. that doesn't imply that the patch would be accepted.
<ori-l> Elsie: I agree with you, but I don't think that it's enough to go on, in terms of actually articulating an implementation
<TimStarling> if someone else has compromised your session and is acting as you, that is no longer information about you
<^d> cscott: I disagree.
<Elsie> brion: Sure.
<^d> I think there's fundamental questions that need working out before anyone wastes time on code.
<gwicke> it is not reasonable to provide all data that exists somewhere in the system (including logs etc) about a user
<Elsie> gwicke: Why's that?
<StevenW> what ^d said
<cscott> ^d: sure, but implementation questions are best answered with an implementation.
<parent5446> I agree with ^d. The purpose of an RFC is to get input *before* submitting a patchset.
<Elsie> One of Wikimedia's core principles is transparency.
<gwicke> Elsie: producing all that info would be very expensive
<bawolff> There is a difference between all information potentially in the system, and all info accessible by a checkuser
<Elsie> gwicke: Like... Special:Log? And Special:Contributions?
<brion> ok, so I don't think we can make the call here, other than a technical recommendation
<gwicke> you'd have to correlate all logs with user IPs etc
<Elsie> It's an indexed data set.
<parent5446> Playing devil's advocate here: we could always require a password to view the data. Makes the situation a bit better.
<kylu> Maybe it would be more reasonable to provide "Information about your current session" and show useragent, ip, xff, etc... for just the requesting session.
<brion> but i've got no idea who to kick a decision to :)
<^d> Ok, so can I agree with everyone here?
<Elsie> So is the CheckUser table, for that matter.
<TimStarling> Elsie, I think you need to be able to specify what exactly are the ethical advantages of this
<^d> In general, I like the idea of letting people have info about themselves.
<manybubbles> would it make it better if we didn't return the ip address?
<gwicke> Elsie: not all logs are in the DB
<bawolff> brion: I'd say the folks who hang out at meta or foundation-l
<cscott> again, people who are worried about efficiency would be answered if we were looking at a concrete implementation, instead of speculating about what information was going to be returned
<^d> But I think Tim's right, we need a better clarification of exactly what info, what problem it solves, and how we can make sure the info remains secure.
<ori-l> in general, legitimate & verifiable requests by users for data that pertains to them should be gratified, but articulating how processes and tools for data disclosure should behave requires concrete use-cases
<Elsie> The account security thing is a red herring.
<TimStarling> I mean in terms of practical benefits, rather than with reference to principles of FOI
<Elsie> Tim is arguing that a compromised account would leak data, but that's true of any auth system ever.
<RoanKattouw> cscott: That can also be done by adding a couple sentences to the RFC along the lines of "we don't propose to collect any new data, just what's already in the checkuser table", or whatever
<gwicke> yeah, some data might have a use case, but *all data* is just not possible
<Elsie> All logged data, then?
<Elsie> All indexed and logged data?
<csteipp> There's also the issue of login csrf.. which could decloak the target. Which I don't like.
<parent5446> I should note that this RFC is also proposing allowing the user to see whenever they are looked up.
<manybubbles> all data that we can grab quickly?
<cscott> RoanKattouw: sure, but i also have some questions about how the data is presented, how corner cases are handled, etc.
<Elsie> parent5446: Is it?
<TimStarling> well, I am leaning towards just closing this, if Elsie has no more arguments
<bawolff> Non-technical users might experience an increase in trust if they can see the information instead of just knowing it's some technical mumbo jumbo they could theoretically collect themselves if they knew what the words "IP address" meant
<cscott> i think we're spending excessive amounts of time on this because the RFC is not concrete enough.
<TimStarling> was anyone in favour other than Elsie?
<parent5446> Elsie: In ===Special:Preference section===
<manybubbles> csteipp: it'd certainly have to be protected. something like forcing a password verification on a form with csrf protection field
<^d> TimStarling: Given appropriate clarifications and refinement in scope, I could agree to it.
<^d> But as is, no
<Elsie> parent5446: Perhaps just poorly worded. It's not talking about the CU log.
<RoanKattouw> Let's refine the RFC
<brion> TimStarling: i like the general idea, in theory. but i don't think it's our call either
<TimStarling> ok, let's leave it open then
<csteipp> manybubbles: no, if I log you in as me, then I know my password, and I can see your ip :)
<Elsie> It's marked as being in draft.
<ori-l> agree with ^d
<MatmaRex> TimStarling: i don't think anybody is very much opposed to it, either
<parent5446> Elsie: "This log tracks queries of the database when a user checks another user (or themselves),"
<MatmaRex> it just needs clarification
<bawolff> csteipp: Maybe even go further, require the user to click on a link via email to "unlock" info
<cscott> agree with ^d
<parent5446> Maybe worded badly.
<manybubbles> csteipp: that is why I advocate not sending back ips - I said it earlier, but it got lost :)
<MatmaRex> (or concretization, rather)
<Elsie> parent5446: Right. It's discussing the private CU log, but then it transitions.
<csteipp> manybubbles: Ah, yep, I think me too
<parent5446> Oh I see.
<^d> Well now we're at an hour.
<manybubbles> csteipp: the problem is just how much are we willing to leak to people that get your password. if the answer is "nothing" then we should close the rfc as too dangerous
<TimStarling> on brion's point of procedure
<brion> manybubbles: two-factor auth?
<TimStarling> we could ask for legal department review
<Elsie> brion: Who's in a position to decide?
<TimStarling> since they are most familiar with FOI, data protection, etc.
<brion> mmmm could do
<cscott> i think the RFC still needs to be more specific about what data is presented
<Elsie> The legal department can certainly review, but I'm not sure they can decide.
<Elsie> It's not a legal issue to show a user his or her own data, surely.
<TimStarling> Luis Villa would probably be interested
<sumanah> TimStarling: our chart says Michelle Paulson as well
<aude> oh no, missed the meeting....
<aude> almost
<brion> *nod* advice from legal would be great, but they're not meant to make decisions on what we should do
<Elsie> https://bugzilla.wikimedia.org/show_bug.cgi?id=27242 is the relevant bug.
<MaxSem> Elsie, ...or someone seeing the data stolen from them
<Elsie> MaxSem: Someone compromises an account and then what, exactly?
<Elsie> Let's assume it's a privileged account, even though like 99.99% of accounts aren't.
<MaxSem> and then they see original owner's IP
<TimStarling> ok, so the decision on this is: clarify and expand RFC, request legal opinion
<MaxSem> yup
<bawolff> Finds my IP address, locates me, and coerces me into abusing my privileges (worst case)
<parent5446> Agreed
<TimStarling> that's one hour up
<brion> agreed
<cscott> TimStarling: +1
<Krenair> yes
<Elsie> bawolff: They've already compromised your privileged account, presumably.
<manybubbles> Elsie: gets the user's ip and list of edits. they decide that they are editing seditious material. they get their internet provider to tell them where they are using the ip and the edit date..... worst case scenario.
<bawolff> Elsie: yeah, I suppose
<bawolff> Someone who wants to do me harm finding out my location is worst case I suppose
<Elsie> bawolff: They would also set up their own Web server and send you a link.
<Elsie> Which costs almost nothing and would be dramatically easier.
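
To make the implementation discussed above concrete, here is a minimal sketch of the lookup idea from the log: show a logged-in user only the CheckUser rows that match their own user ID ("all rows for that user ID, but not all info for those IPs"). The column names follow the CheckUser extension's cu_changes schema as it existed at the time; the use of Python with pymysql, the connection details, and the function name are illustrative assumptions, not the actual implementation proposed in the RFC or in Extension:AccountInfo.

```python
# Sketch only: expose to a logged-in user the retained CheckUser rows that
# match their own user ID, and nothing recorded for other users or IPs.
# Column names follow the CheckUser extension's cu_changes table
# (cuc_user, cuc_timestamp, cuc_ip, cuc_xff, cuc_agent); connection
# parameters and the function name are placeholders.
import pymysql

def retained_data_for_user(conn, user_id, limit=500):
    """Return the retained (timestamp, IP, XFF, User-Agent) rows for one user."""
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT cuc_timestamp, cuc_ip, cuc_xff, cuc_agent
            FROM cu_changes
            WHERE cuc_user = %s
            ORDER BY cuc_timestamp DESC
            LIMIT %s
            """,
            (user_id, limit),
        )
        return cur.fetchall()

if __name__ == "__main__":
    # Hypothetical connection details for illustration only.
    conn = pymysql.connect(host="localhost", user="wiki", password="secret",
                           database="wikidb", charset="utf8mb4")
    for ts, ip, xff, agent in retained_data_for_user(conn, user_id=12345):
        print(ts, ip, xff, agent)
```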

Legal Comment

Tim emailed to say hi :) Legal perspective, in a nutshell:

  1. It's a generally accepted norm among privacy people (and sometimes, though not commonly yet, required by law) that users should have the ability to see what information is held about them. This RFC seems to fit pretty well in that vein, so we're generally pretty positive about this.
  2. While I think it is overall a positive, it'd still be useful to drill down more on motivations here. Knowing that would help frame how the information is presented; help limit exactly what information is shown; etc. For example, if the answer is just generally "because we should tell people what we know about them", then the energy might be better spent considering a more broad-based framework; if the answer is "this is about login security", then it's probably more useful to show a filtered view (like the gmail geographic information) than the raw checkuser information; etc.
  3. Like some of the folks in the IRC chat, I think it'd be good to be very precise about exactly what we're exposing here - I have what I think is a common-sense interpretation of what the RFC means, but there is lots of devil in the details, so we can't really give a blessing yet.

Hope that helps. -LVilla (WMF) (talk) 18:38, 25 September 2013 (UTC)

Next steps?

Hi MZMcBride,

Tim, Luis and some other engineers have expressed doubts about the current RFC. Are you planning to incorporate their feedback or will you withdraw this RFC? Drdee (talk) 21:14, 17 December 2013 (UTC)

Hi. I think it's accurate to say that Tim expressed doubt. Legal and others (including Brion) have seemed fairly positive on the idea. Some now consider it expected behavior given what sites such as Gmail do (cf. AGK's changing attitude on the relevant bug report). I have no intention of withdrawing this RFC (whatever that means). This is actually fairly easy to implement, I believe, but as Luis notes above, it's just a matter of the details. If anyone else wants to help out, that'd be most welcome, of course. I'm happy to offer guidance where I can. The design also needs thought. Perhaps a dedicated Special page exposed via a link at user preferences (Special:Preferences). Or maybe as part of the info action. Perhaps Jared or Brandon will be able to weigh in. --MZMcBride (talk) 13:56, 18 December 2013 (UTC)
  • I'm a bit worried about this. Under the Swedish sv:personuppgiftslagen, you have the right to get a compilation of all data stored about you once per year, and I believe that this originates from an EU directive, so I'd assume that this also is the case in other EU countries. If members of EU-hosted wiki projects can find the data this way instead of sending an application to the host, then this may mean fewer applications for the host to process, simplifying things for the host. On the other hand, it gives hackers a method to find information about users they don't like. For example, if I wish to sue someone for violating my rights, or if I am the dictator of some country and wish to send people who post inappropriate content to a wiki project to forced labour camps, then I could try hacking into the account and take a look at the most recent web browsers and IP addresses. Hopefully, this will help me identify the user, and then I can sue the user and send him to the forced labour camp. For the moment, all I can find that way is the e-mail address, which might just be an anonymous webmail account. So: This feature should probably be offered (to simplify things for EU-hosted wiki projects), but it should probably not be enabled in MediaWiki installations by default. I'm not sure if it is a good idea to enable this on Wikimedia's wiki projects. --Stefan2 (talk) 00:57, 20 February 2014 (UTC)
    • Hi Stefan2.
      Regarding hackers, wouldn't it be vastly easier to simply trick a user into clicking a link to get his or her IP info and User-Agent? For example, I could easily drop a Labs (née Toolserver) link in a reply to you and simply wait for you to click (or send the link to you privately via e-mail to reduce noise in the server access logs). If you're a dictator, I think you'll spend less time hacking into people's wiki accounts and more time setting up spying software in your government-run Internet infrastructure.
      We may be getting a bit off-track here, though. The general idea is to expose to a user his or her retained account info. I think there's general agreement that if that's possible, we should. Legoktm and Prtksxna have been working on Extension:AccountInfo. Exciting times ahead! :-) --MZMcBride (talk) 03:58, 2 March 2014 (UTC)
MZ, I sympathize with the motivations behind the proposal (as far as I understand them), but I think this misses the point on the new threat models that it could open. (BTW, IIRC Tool Labs doesn't give you access to the web access logs for your tool.) Some remarks below. Regards, Tbayer (WMF) (talk) 07:27, 5 April 2015 (UTC)
In reply to MZMcBride, I would note that, firstly, it's harder to fool someone with a fake link. Many computer users have had it repeatedly drilled into their heads to never click a link in an email unless they're sure it's from a trusted sender, and while that's not 100%, the success rate of such phishing attacks is far lower than it was years ago. If you're targeting a specific person that way, and they notice, then not only have you not gotten your information, but you may have just alerted them that someone's trying. And even though the probability of this occurring in any individual case is low, the potential for damage is very high. If you quietly compromise their account, and they're not in the habit of checking their "self-checkuser" information (most people won't, or wouldn't know how to spot suspicious patterns even if they did), you could compose a record of their activities and locations over a span of months, and possibly have the opportunity to compromise the machine(s) they use. Not information you want a cracker or a stalker in possession of. The risk of this is therefore very great. The "reward" is...what? People can see we store their IP? That's not secret and happens anywhere you visit. I'd prefer to show only the times of recent logins, while not revealing exactly how far back data is available. That would still allow people to monitor for suspicious activity, while not enabling abuse or placing users at risk. Seraphimblade (talk) 02:32, 6 April 2015 (UTC)
Hi Seraphimblade. What do you think about the comparisons people have made to Gmail? I think other sites have similar session manager-type interfaces as well. And what do you think about Extension:AccountInfo? --MZMcBride (talk) 03:32, 6 April 2015 (UTC)
MZMcBride, I thought the above was clear, but I think it should not be implemented. Letting people see when checkusers were run on them is a tremendously bad idea for combating abuse; letting them see exactly what information is still stored is only slightly less so. And it is an exceptional risk if compromised. So far as improving security, Google has taken a far greater step toward that with password security requirements and two-factor authentication, and we should do that if we wish to provide greater security for people's accounts. Google, as a rule, doesn't have to combat skilled sockpuppeteers. We do. Seraphimblade (talk) 05:01, 6 April 2015 (UTC)
Extension:AccountInfo doesn't expose when CheckUsers were run. I think you may be conflating multiple ideas.
I think your view here is fairly representative of the established view. However, my sense is that the tide is turning. To me, it seems like people are more and more: (a) interested in what sites are privately storing about them; (b) interested in why their IP address seemingly must be exposed if they edit while logged out (Exposure of user IP addresses (talk)); and (c) interested in account session management (seeing which sessions are currently active for their account).
For what it's worth, I think you're dramatically over-simplifying Google's identity challenges, but that's pretty tangential. --MZMcBride (talk) 14:57, 6 April 2015 (UTC)

Opt-out/opt-in

Apart from the hacking scenario discussed above, it seems to me that the discussion hasn't yet considered a different threat model, namely that the user could be pressured by others into accessing his or her data and providing it to them. E.g.:

  • A politically active Wikipedian in an oppressive country stays with various friends/family members/fellow activists over the course of several months while editing Wikipedia. After being arrested, the authorities force the Wikipedian to log into their account and hand over the IPs of all these contact persons.
  • A suspicious spouse/partner/parent asks a highly active Wikipedian to provide evidence that they have only been in certain locations while editing.
  • A company whose workplace policy forbids editing Wikipedia from work computers demands proof from an employee that they haven't done so in the last three months.
  • A local Wikimedia community that considers the Checkuser policy a bit too strict starts encouraging certain users (e.g. those suspected of sockpuppeting, or candidates for higher functions) to "voluntarily" submit a copy of their data to some trusted local users.

I'm not saying that these potential downsides should prevent the new feature, but I think they make a strong argument for providing an opt-out (you tick a box and from then on can't access the data until the box is unticked again, after which you can only access data collected since the box was unticked).

Going further, if the main purpose of the new feature is to enable those who want to keep a check on the Foundation's data collection practices, wouldn't it be sufficient for this purpose if it were opt-in? I.e. you tick a box in the preferences and from then on can access all the data collected after ticking the box. This would also make the concerns below about password security obsolete, because we could then warn the user about this when they opt in.

Regards, Tbayer (WMF) (talk) 07:27, 5 April 2015 (UTC)

Existing examples or best practices?

Are there any other websites (perhaps in Europe) which already offer such a feature? It would be useful to take a look at best practices. E.g., do they give access to this data based on the existing password only, or do they require separate identification?

(As others have remarked, the Gmail example seems quite different to what is being proposed here. On the one hand, because it's designed to detect unauthorized usage - which is rather trivial in the case of unauthorized edits - and on the other hand because it is nowhere near providing access to all the logged data.)

Regards, Tbayer (WMF) (talk) 07:27, 5 April 2015 (UTC)

Such a feature exists in France: under French law (loi informatique et libertés), any website must provide you with all the information it stores about you. However, in most cases they ask you to send them a request (by post, not by email) to their registered address (see, for example, the Air France website), which keeps the use of this feature close to zero — NickK (talk) 19:21, 5 April 2015 (UTC)

This changes requirements for password security

Last year, the initiator of this RfC argued repeatedly [1][2] in a different context that generally, "Wikimedia wiki accounts are nearly valueless" and not worth protecting with minimum password strength requirements. Not everyone agreed (I didn't), but I think this position did have considerable influence on the outcome of Requests for comment/Passwords.

Given that the proposed new feature will suddenly charge the passwords of hundreds of thousands of users (those who made an edit in the last 3 months) with an entirely new capability (probably without most of them being aware of it), and that the checkuser data it gives access to is regarded as highly sensitive by many users, I think it might make sense to revisit the password strength requirements before the new feature is deployed.

Regards, Tbayer (WMF) (talk) 07:27, 5 April 2015 (UTC)

I completely agree with this. Getting access to the account of a random user will most likely be valueless (in the best case you will know their email, but you can also get it otherwise). Getting access to the list of IPs and UAs of a random user can be extremely valuable: you can get information about their residence (if they edit from home and their IP is static), their work (if they edit from work and the whois record provides the name of the company), places they visit (if they edited from any public computers or wireless networks), etc. The most obvious use of this is to identify a user's real-life identity, and in the best case you can get their home address, the company they work for, and some information about the places they visit. I think that getting unauthorised access to such information can be extremely dangerous for the concerned users — NickK (talk) 17:03, 5 April 2015 (UTC)
And by the way, password security cannot be the only solution in such a case; two-step verification will be needed. The simplest scenario is that an employer installs a keylogger to make sure their employees do not use Wikipedia. This is enough to capture even the most complicated password. The only way to protect an account from this will be two-step verification, either with a code sent by email, by SMS or using a special app. Perhaps two-step verification can be needed only to access this feature, but for security reasons the second-step verification method should be specified upon registration, not immediately before accessing this feature (otherwise this employer will obviously use their own cell phone number to access this feature) — NickK (talk) 17:10, 5 April 2015 (UTC)
If someone wanted to collect random users' IP addresses, wouldn't it be trivial to just set up a Web server? I don't really see how passwords are relevant to this proposal.
Out of curiosity: how many accounts would actually be affected by this proposed scenario in which the account has recently edited (within the past 90 days) and uses a very weak password that would be a target for attack? Out of X million user accounts, how many are we actually talking about? Less than 1%? The concern over weak passwords is likely overblown given what percent of user accounts would actually be affected, but harder numbers would be great to have here.
Broadly, if you (Tilman) and others are treating IP addresses as private, it raises the larger question of why store and use them at all, I think. --MZMcBride (talk) 03:28, 6 April 2015 (UTC)
Are there many tools that allow you to find the list of all IP addresses you used over the last 3 months? Only Google offers anything similar, and still it doesn't display the entire 3 months.
I think that 1% is a fair estimate of the accounts that could be at risk, as this will concern not just weak passwords, but also passwords captured e.g. using a keylogger.
The fact that IP addresses are private is reflected by the fact that: 1) we have a privacy policy in place, 2) people with access to this data should identify themselves to the Foundation, 3) ombudsmen are here to detect abuse. That's a very good policy compared to other websites, I think — NickK (talk) 16:23, 6 April 2015 (UTC)

Is it technically necessary to include UA information?

I'm not too happy about this as it seems to me as though it would make it easier to create socks, or for an editor to see why they were caught when they created sockpuppets. Dougweller (talk) 08:35, 5 April 2015 (UTC)

I agree with Dougweller on this point.--Elph (talk) 09:55, 5 April 2015 (UTC)
  • Could the UA be simplified to make it user-friendly? For instance, OS (e.g. Windows XP rather than Windows 5.1) and major browser details (e.g. Firefox v37 rather than the exact version) should suffice. - Mailer diablo (talk) 10:17, 5 April 2015 (UTC)
  • I think that including UA information would be quite harmful for the following reasons:
    • about 95% of editors will not understand what it is about. Even if some people can look at Google data to make sure everything is fine, they will most likely understand nothing in their CU data (it's not quite readable)
    • some 3% of editors will understand what it is about and make sure everything is fine, but perhaps make no use of it
    • some 1% of editors may use this data for hiding their abuse (e.g. to better mask spambots, vandal accounts, sockpuppets etc.)
    • some 1% of editors may have this data used against them (the cases Tbayer mentioned above — like checks by an employer, a family member etc.)
    In total, I think that it is more harmful to include this kind of data than not to include it. Of course, users very concerned about the privacy of their account can get some benefit from it, but I don't think that this level of detail is really needed. Perhaps the best level of detail we can provide would look like "192.168.0.0 Mobile, last action: 25 March 2015; 127.0.0.0 Desktop, last action: 31 March 2015"; all the remaining information will hardly be useful — NickK (talk) 16:54, 5 April 2015 (UTC)
Dear NickK, why do we need to publish data that about 95% of editors will not understand? --Elph (talk) 12:50, 6 April 2015 (UTC)
That was not my idea. My point is that, as 95% (well, that's my estimation, but I think it is +/- 5% correct) of users do not need their UAs, we shouldn't disclose them — NickK (talk) 16:17, 6 April 2015 (UTC)
I'm in agreement with Dougweller on this issue and am concerned about security issues. Tiptoety (talk) 16:48, 6 April 2015 (UTC)
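
As a toy illustration of the suggestion above to simplify the User-Agent before showing it (e.g. "Windows XP" rather than "Windows NT 5.1", "Firefox 37" rather than the exact version), the following sketch maps a raw UA string to a coarse OS family and major browser version. The token table, regular expression, and function name are illustrative assumptions; a real implementation would rely on a maintained UA parser.

```python
# Toy sketch: reduce a raw User-Agent string to a coarse OS family and
# major browser version before display. Real UA parsing is much messier;
# all mappings here are illustrative only.
import re

OS_FAMILIES = {
    "Windows NT 5.1": "Windows XP",
    "Windows NT 6.1": "Windows 7",
    "Windows NT 10.0": "Windows 10",
    "Mac OS X": "macOS",
    "Android": "Android",
    "Linux": "Linux",
}

def simplify_user_agent(ua: str) -> str:
    os_name = next((label for token, label in OS_FAMILIES.items() if token in ua),
                   "Unknown OS")
    m = re.search(r"(Firefox|Chrome|Safari|Edge)/(\d+)", ua)
    browser = f"{m.group(1)} {m.group(2)}" if m else "Unknown browser"
    return f"{os_name}, {browser}"

print(simplify_user_agent(
    "Mozilla/5.0 (Windows NT 5.1; rv:37.0) Gecko/20100101 Firefox/37.0"))
# -> "Windows XP, Firefox 37"
```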

About goals: let's be more specific

I think we deserve a bit more than just one sentence of explanation. It looks like a "because we can"-type project. Can you elaborate a bit more about your goals?

(1) Typical privacy-protecting legislation also includes a goal of transparency, understood as a means of accountability. Just listing "security" is too thin - how actionable should this information be? In other words: if the user discovers something in their data they do not like, should they be able to:

  1. request to correct them?
  2. request to remove them? ("It wasn't me")
  3. raise some concerns regarding those data? to whom?

(2) Why is it in the interest of security and freedom of information to publish data in the cu_changes table but not in the cu_log table?

(3) What about secondary storage of users' data - say, private exchanges via e-mail between checkusers, for example? How shall we handle this?

(4) Why shouldn't we, in the interest of security and freedom of information, make access logs available to the user, rather than only edits?

Disclosure: I am currently performing checkuser on Polish Wikipedia.

 « Saper // talk »  22:02, 5 April 2015 (UTC)

"Can you elaborate a bit more about your goals?" Yes, please. As I said in the Phabricator ticket, it sounds to me like solving a non existent problem. Plus, what I also wonder: the RfC says "Instead, there could be an "Retained account data" tab in Special:Preferences that shows the information for any rows in the CheckUser tables matching the currently logged in account." Does that mean that every user could then see whether a CU was performed on their account? (I was bugged by someone for over a year for that information and I refused it for a good reason.... so I'm obviously strongly against that) If not, then what will the user see on the userinfo page? Only their current settings? Or their settings from the past three months? Or the logs related to their account? Or all of this? Trijnstel (talk) 22:20, 5 April 2015 (UTC)Reply
"... whether a CU was performed on their account?" --> no. This is talking about reflecting back stored information about a particular user to that same user. --MZMcBride (talk) 03:11, 6 April 2015 (UTC)Reply
Trijnstel: Extension:AccountInfo is a manifestation of this RFC, if you're interested. --MZMcBride (talk) 03:21, 6 April 2015 (UTC)
Saper: Yes, hello. Good to see you. Have you read the related bug report (bugzilla:27242)? It may provide some additional context. Regarding a "one-sentence explanation," this page is a draft. Be bold! :-)
You raise good points, but I'm not sure they respond to this RFC exactly; they seem aimed more at the general surrounding issues. For example, User-Agent strings are provided by the user. In this RFC, we're proposing to reflect back the stored information to the user. This wouldn't really seem to open the door to requests for correcting or removing the information, would it?
cu_log is a related point. What are your thoughts? I imagine it's politically untenable to reveal to users that a CheckUser has been run on their account (as Trijnstel's reply indicates), but perhaps attitudes have shifted enough that it's no longer an unthinkable proposition. If the Wikimedia Foundation were issued a subpoena for my user records, I'd want to know that as a user. Why wouldn't a CheckUser action be similar?
MediaWiki has no control over private (secondary) storage of data. That's outside the scope of MediaWiki development, of course.
Access logs are also mostly outside the scope of this RFC, though Gabriel mentioned a very similar point in an IRC discussion about this topic. If we ultimately index access logs in a non-anonymized way (which seems unlikely...), I would have no issue reflecting back our stored information to the same user. Why not?
I hope these answers help or clarify a bit. Please, keep asking questions if you have more. I'm excited that this proposal is receiving some thought and discussion. --MZMcBride (talk) 03:20, 6 April 2015 (UTC)
I am sorry to say so, but your answers are not helpful at all. Your answer is mostly "out of scope of the original project" or turning the tables on me to provide input. I am frankly very disappointed about your approach to this proposal. And I was inclined to take your proposal seriously, because - for myself - seeing the w:IMEI of my mobile phone on the Google page was very enlightening to me, and led me to giving up their services for the most part. You are using expressions like to reflect back which lead me to believe that you have just seen low-hanging fruit ("hey, we have CheckUser database tables!"), but you are very short on ideas on how a possibly better transparency (towards the user) could have been implemented or how it would actually work. Let me tell you bluntly: "to reflect back" in the scope you have given in this RfC is a meaningless expression. I am not sure I could even translate this or explain that goal in my native language. And this should not be a toy coding project (yes, I have actually deployed and tested the proposed extension). Why don't we just, in the interest of ultimate transparency, publish all the logs on https://dumps.wikimedia.org? Let us not pretend we don't know why we shouldn't probably be doing this. This community has discussed opening cu_log on a few occasions (at least on my home project), and found it too sensitive to disclose. I am very open to ideas that might lead us to reducing the need for checkuser or giving it up altogether; I am very supportive of opening up gray areas of the community towards better transparency; but please do review the questions asked in a fair manner - because depending on the answers there are important follow-up issues to be resolved. Like, for example, who should be the contact person for users if they have questions regarding their data. Even if they just need an explanation of what they see.  « Saper // talk »  23:03, 6 April 2015 (UTC)

Alternate approach: stop storing IP addresses

If we stopped storing IP addresses and User-Agent strings, we wouldn't have anything to reflect back to users. Further discussion at Exposure of user IP addresses (talk). --MZMcBride (talk) 03:35, 6 April 2015 (UTC)

Oppose. Apart from the fact that CU would be practically useless without IP addresses, this suggestion is also impossible. The CC-BY-SA license requires that every edit and upload be attributed. Without an IP address, what should be added? @MZMcBride: so what exactly are you trying to solve here? As we asked on phab:T387: "What problem are we solving?" I get the feeling - correct me if I'm wrong - that you don't trust the CUs with the private information they might get. Trijnstel (talk) 10:13, 6 April 2015 (UTC)
Hi Trijnstel. I'm not sure you're in a position to evaluate the legal requirements of the CC-BY-SA license? Are you a lawyer? Attribution can trivially be provided without using an IP address. For example, we could use a random string that's based on the IP address.
Exposure of user IP addresses (talk) isn't really about trusting CheckUsers or not, it's about documenting the current use of IP addresses and evaluating whether the use of IP addresses as identifiers for logged-out users continues to make sense. Many people feel that exposing IP addresses as casually as we do currently in MediaWiki is an unusual practice. (Can you think of another site or piece of software that exposes user IP addresses in the same way?) --MZMcBride (talk) 14:12, 6 April 2015 (UTC)
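
A minimal sketch of the "random string that's based on the IP address" idea mentioned above: derive a stable pseudonymous label for a logged-out editor by keyed-hashing the IP, so edits from the same address still group together for attribution without exposing the address itself. The key handling, label format, and function name are hypothetical and not part of the RFC or any concrete proposal.

```python
# Sketch only: pseudonymous attribution label derived from an IP address.
# The secret key, its rotation policy, and the "Anon-" label format are
# made up for illustration.
import hashlib
import hmac

SECRET_KEY = b"site-wide secret, rotated on a fixed schedule"  # placeholder

def pseudonym_for_ip(ip: str) -> str:
    digest = hmac.new(SECRET_KEY, ip.encode("utf-8"), hashlib.sha256).hexdigest()
    return "Anon-" + digest[:10]

print(pseudonym_for_ip("192.0.2.15"))  # e.g. "Anon-3f9c0a1b2e" (value depends on the key)
```
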
I partly fail to understand how a similar tool could really help users with security problems, since a Wikimedia account isn't an email account or similar. There are Special:Contributions and Special:Log, and it's possible to ask for a CU on your account if you suspect a breach, which is a marginal issue after all.
Your statements confirm it's a matter of principles. As the alternate approach you suggest is practically tantamount to "stop enforcing https://en.wikipedia.org/wiki/Wikipedia:Open_proxies" IIUC, it should be evident there's something wrong with it. Conversely, avoiding IP exposure is a completely separate (and interesting) issue. So the original question remains unaddressed: "what problem are we solving?".--Shivanarayana (talk) 15:11, 6 April 2015 (UTC)
Is an account breach a marginal issue?
If private information is being recorded about me as a user, shouldn't I have the ability to see such stored private information? --MZMcBride (talk) 16:02, 6 April 2015 (UTC)
(out of history) This feature doesn't prevent an account breach and yes, I have to guess it's a marginal issue, otherwise we would need to intervene heavily on user authentication. As for your IP and UA, they're already available to you through various services, so, again, what is the point? --Shivanarayana (talk) 17:02, 6 April 2015 (UTC)
I think that you should also opt for banning anonymous editing (as we will be unable to ban individual IPs, we need to prevent them from editing) and invitation-only registration (as we will be unable to detect spammers and vandals, we need a filter to prevent them from editing) — NickK (talk) 16:16, 6 April 2015 (UTC)
In short: no, you shouldn't [have ability to see the private information we collect]. Sometimes, maybe. For the most part, not.  « Saper // talk »  23:39, 6 April 2015 (UTC)

Are we going back to the times before CheckUser got implemented? Would we need to staff the Wikimedia abuse@ team with WMF employees looking at the access_log? Somehow I'd rather stay with checkuser as it is now, while far from perfect. My unsolicited advice: before tackling this problem, let's allow Tor users to edit Wikipedia freely. Some actual solutions are in sight. And frankly I like the collateral token idea much better than the trust metric. If we solve the Tor problem in an acceptable and practical manner, we can think about rolling out something like a blind token to the masses.  « Saper // talk »  23:39, 6 April 2015 (UTC)

On shooting ourselves in the foot

I've heard a lot of grumbles opposed to this RfC (though not voiced here?) that mainly revolve around the concept of handing people tools that will expedite or facilitate abuse. While I appreciate, respect, and understand where these viewpoints are coming from, I vehemently disagree.

The argument is that exposing what CheckUsers see will teach sockpuppeteers how to evade scrutiny. I don't think this is what would teach them that. Speaking from the perspective of the English Wikipedia, all you have to do is read w:Wikipedia:CheckUser and then visit Google to learn how to evade CU. You don't have to see your user agent to learn how to change it. The mantra is that "CheckUser is not magic pixie dust" because it's not, and I don't think or expect that exposing IP/UA details to the end user enables anything special. In the long run almost all serial abusers of Wikimedia projects are identified through their contributions, not through CU data, and nothing that is done here will truly exacerbate socking or make it harder to find. IP/UA abuse already exists and will continue to. I'm not comfortable couching that as an excuse not to do this.

I think the provable benefits of account security and the sense of ownership that other sites provide, and that we are perfectly capable of providing, are worth the perceived risk. I find the rewards nothing but tangible and the arguments against largely theoretical. Keegan (talk) 04:46, 8 April 2015 (UTC)