Talk:User account types

From mediawiki.org
Latest comment: 10 months ago by Neil Shah-Quinn (WMF) in topic Are temp users unregistered/anonymous?

Are temp users unregistered/anonymous?

@Matma Rex, @Whatamidoing (WMF), @RHo (WMF): I've been looking into this terminology since Data Engineering is discussing how to capture the new account type in their schemas and specifically, whether a field named user_is_anonymous should be true or false for temp users. This is in a table that is meant to be made public on Cloud Services in the future, so it seems important to have the most consistent terminology possible.

Initially, I thought that it should be false, since to me "anonymous" is a synonym of "unregistered" and, from a MediaWiki-internal point of view, temp accounts will be registered in the user table. This table supports that view.

However, after looking at how people are using the terms now, I think the vast majority of folks would choose to define "anonymous" (and "unregistered" and "logged-out") to mean either temp or IP. Here are some points:

  • The main Meta page uses both "unregistered" and "non-logged-in" to mean temp or IP ("IP masking hides the IP addresses of unregistered editors", "In the future, before a non-logged-in user completes an edit, they will be informed that their edits will be attributed to a temporary account.", etc.)
  • When I look at the discussion page on Meta, all the people using "anonymous" use it in a way that includes temp accounts ("Should IP Masking [be] a veil for anonymous users, but still transparent for patrollors?", "anonymous users using temporary accounts can never use the very same temporary account when on multiple devices").
  • Comparing the temp and IP columns in your very helpful table, their capabilities are very similar to each other and very distinct from registered users, so they naturally form a group.

I realize this does not totally mesh with the in-code terminology. For example, from task T332205:

  • User::isRegistered() will return true for all registered accounts, including temporary accounts.
  • User::isAnon() will return false for such temporary accounts.
  • User::isTemp() will return true exclusively for temporary accounts.
  • User::isNamed() will return true exclusively for registered accounts that are not temporary accounts.
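To make the four definitions above concrete, here is a minimal Python sketch. It is not MediaWiki's PHP User class: the Account class, its user_id and temp_flag fields, and the sample IDs are hypothetical, and the sketch assumes an account is fully described by whether it has a user-table row and whether it is temporary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    """Hypothetical stand-in for a MediaWiki user; not the real PHP User class."""
    user_id: int             # 0 for IP editors, > 0 when there is a row in the user table
    temp_flag: bool = False  # assumed to mirror the user_is_temp column

    def is_registered(self) -> bool:
        # True for every account with a user-table row, temporary accounts included.
        return self.user_id > 0

    def is_anon(self) -> bool:
        # The negation of is_registered(), so only IP editors qualify.
        return not self.is_registered()

    def is_temp(self) -> bool:
        # True exclusively for temporary accounts.
        return self.is_registered() and self.temp_flag

    def is_named(self) -> bool:
        # True exclusively for registered accounts that are not temporary.
        return self.is_registered() and not self.temp_flag


ACCOUNTS = {
    "IP": Account(user_id=0),
    "temp": Account(user_id=101, temp_flag=True),
    "registered": Account(user_id=42),
}

for label, acct in ACCOUNTS.items():
    print(label, acct.is_registered(), acct.is_anon(), acct.is_temp(), acct.is_named())
```

Run as-is, it prints one line per account type; only the temp row has both is_registered() and is_temp() true, which is the overlap the rest of this discussion is about.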

According to this, only IP accounts are anonymous. However, we are also deviating from the in-code terminology when we say that temp users are not registered, so I think we have already lost perfection in terminology.

So, based on this, I'm inclined to update the table to make "anonymous", "unregistered", and "logged-out" an umbrella term for IP and temp users and then add information about the cases where this doesn't match the User methods. What do y'all think?

cc @MKampurath (WMF), @Milimetric (WMF). Neil Shah-Quinn (WMF) (talk) 01:27, 18 May 2023 (UTC)

I agree – I think the term "unregistered" applies to both IP and temp users. I wrote this page before people started really talking about them much, so it didn't match real world usage.
We should document it that way despite the problem you noted that in the code, we treat temp users as a "subtype" of registered users; that's just a quirk for reasons of backwards-compatibility.
I would avoid the term "anonymous" when possible. It was widely used to describe IP users in the past, even though they were less anonymous than the registered users in some ways. Temp users are more anonymous now, but you try explaining that. I expect that folks will use it for both IP and temp users despite my advice ;), so we should document it on this page, but I would rather not use it in any APIs, databases, etc.
The term "logged-out" is a bit weird, since you can log out of your temporary account. But it's a synonym of "unregistered" in practice, and there are people who have registered, but just edit logged-out, and I don't think we have a better term for this. Matma Rex (talk) 12:37, 18 May 2023 (UTC)
@Neil Shah-Quinn (WMF), by, say, February 2024, I hope that all three account types will be in use on WMF wikis. Is there a chance that you will want to use this to compare and contrast all three? If so, we could consider replacing user_is_anonymous entirely, with user_is_IP and user_is_temp. Using "anonymous" to describe an editor whose real-world location may be readily discernible has long been disputed by English-speaking volunteers. Perhaps we should retire that imprecise and misleading language entirely.
(@Danilo.mac, it's possible that this discussion would be interesting to you, since it touches on how some account data is stored.) Whatamidoing (WMF) (talk) 19:55, 19 May 2023 (UTC)
In my point of view (as a volunteer), when a user uses a username that they chose during registration, they have a name that we can trust they will always use if they are a good-faith contributor. And when a user is identified by an IP or by a number they didn't choose (a temp user), they don't have a name; they are anonymous. Even if they are a good-faith contributor, we cannot trust that all of that user's contributions are aggregated on the same contributions page, or that all the warnings the user received are on the same talk page. We cannot be sure of everything that user did, because they don't have a fixed name; that is what I think the term anonymous means. For me, it would be confusing not to call a temp user anonymous. I didn't know other people had other meanings for that term. Danilo.mac (talk) 23:15, 19 May 2023 (UTC)
Complementing my opinion about how the data is stored: I didn't like the idea of storing temp users in the user table. I have tools that use WHERE actor_user IS NOT NULL to get the registered users (users who filled in the registration form and have fixed usernames). It would make more sense to me if actor_user were NULL for temp users, so that we could use WHERE actor_user IS NULL AND actor_name LIKE '?%' to get only temp users, with a separate table to store temp user data. But that is the opinion of someone who only develops tools in Toolforge and doesn't contribute directly to the MediaWiki code, and I didn't read the Phabricator tasks, so I may be missing some details that make that idea impossible. Danilo.mac (talk) 01:04, 20 May 2023 (UTC)
In case this helps - we are adding a user_is_temp column to the user table which you could use to differentiate between temp users and other (registered) users. Here's a ticket that explains this work: https://phabricator.wikimedia.org/T333223
Would that help with your tools? -- NKohli (WMF) (talk) 13:05, 21 May 2023 (UTC)
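As a rough illustration of how a tool's query might change, here is a Python sketch using an in-memory SQLite database. The tables are a heavily simplified, hypothetical subset of the actor and user tables (column names follow this discussion, not the full MediaWiki schema), and '~temp-1' is a made-up temp account name.

```python
import sqlite3

# Simplified, hypothetical subset of the MediaWiki actor and user tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (
        user_id INTEGER PRIMARY KEY,
        user_name TEXT NOT NULL,
        user_is_temp INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE actor (
        actor_id INTEGER PRIMARY KEY,
        actor_user INTEGER,          -- NULL for IP editors
        actor_name TEXT NOT NULL
    );
    INSERT INTO user VALUES (1, 'ExampleUser', 0), (2, '~temp-1', 1);
    INSERT INTO actor VALUES
        (10, 1, 'ExampleUser'),
        (11, 2, '~temp-1'),
        (12, NULL, '192.0.2.7');
""")


def names(sql: str) -> list[str]:
    """Run a query that selects actor_name and return the names sorted."""
    return sorted(row[0] for row in conn.execute(sql))


# Existing filter: every actor with a user row, which now includes temp accounts.
with_user_row = names("SELECT actor_name FROM actor WHERE actor_user IS NOT NULL")

# Joining on user_is_temp separates named accounts from temp accounts.
named_only = names("""
    SELECT actor_name FROM actor
    JOIN user ON user.user_id = actor.actor_user
    WHERE user.user_is_temp = 0
""")
temp_only = names("""
    SELECT actor_name FROM actor
    JOIN user ON user.user_id = actor.actor_user
    WHERE user.user_is_temp = 1
""")

print(with_user_row, named_only, temp_only)
```

With temp accounts stored in the user table, the old actor_user IS NOT NULL filter returns both ExampleUser and the temp account, while the join on user_is_temp separates them.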
+1 to what @Matma Rex and @Whatamidoing (WMF) said. I have no attachments to "non-logged-in" terminology - I think we could replace it by "unregistered" on the project pages. And I would like us to refrain from using "anonymous" for the reasons MatmaRex mentioned. I like the suggestion to replace user_is_anonymous with user_is_temp and user_is_IP. NKohli (WMF) (talk) 12:59, 21 May 2023 (UTC)
Thanks everyone for the input! It sounds like everyone agrees that temp users should be considered a type of unregistered/logged-out/anonymous users. I've updated the table to reflect this and to hopefully make it a bit easier to read.
I get what y'all are saying about the problems with the term "anonymous". I have heard about these problems before and, if I had been in charge of naming, I probably would have chosen user_is_unregistered for that reason. I'm not sure if it's worth deprecating the existing field, but definitely when we add fields to make sure we can distinguish all three types of users, we will use the preferred terminology (unregistered, IP, temp, registered). This page makes it a lot easier to do that! Neil Shah-Quinn (WMF) (talk) 20:01, 22 May 2023 (UTC)
I've also just updated the table to document the behavior of the four User functions from MediaWiki. Neil Shah-Quinn (WMF) (talk) 20:09, 22 May 2023 (UTC)
Seeing as temp users are registered users, I think it would be incorrect to call them 'unregistered'. However, perhaps changing the definition of 'anonymous' could be worthwhile. Anonymous could include both 'unregistered' (IP) users and temp users.
This would be simple from a data perspective, since there is no concept of 'anonymous' in MW data models, just 'registered' and 'not registered'. Historically, anonymous == not registered, but perhaps the code should be changed so that User::isAnon() returns true for temp accounts. Ottomata (talk) 21:08, 22 May 2023 (UTC)
See also T336176 - MediaWiki user types Ottomata (talk) 21:10, 22 May 2023 (UTC)
@Ottomata as I mentioned in my first post, I came into this discussion with the same idea, that temp users should be considered registered because they have rows in the user table and because User::isRegistered() returns true for them.
But @Milimetric (WMF) and @MKampurath (WMF) had a different idea: that temp users should be considered unregistered, because from a user point of view they have almost everything in common with IP users and almost nothing in common with traditional registered users (as @Danilo.mac said above). Also, from a semantic point of view, they have not, in fact, gone through the registration process!
Eventually, I came around to their point of view and came here to see if others agreed as well. @NKohli (WMF) and @Matma Rex did, which is why I changed the page to reflect it.
From this perspective, the rows in the user table are an irrelevant implementation detail and User::isRegistered() returning true is just a "quirk for reasons of backward compatibility", as Matma Rex said.
Overall, I think it makes sense to declare that temp users are unregistered; to make code, documentation, and data reflect this wherever possible; and to document it as a historical quirk whenever that isn't possible. Neil Shah-Quinn (WMF) (talk) 00:32, 23 May 2023 (UTC)
Aye, I think your conclusion of what the term should mean makes sense. But, having the same term mean opposite things will end up confusing a lot of people. It might be worth gathering support from more MediaWiki core devs, and deciding and documenting this decision there, rather than just making it externally. T336176 - MediaWiki user types might be a good place to do that.
We just finalized the page_change schema and stream, which will have is_registered: true and is_temp: true for Temp users, in data that we use to generate new dumps, generate search indexes, and in event streams we will publish publicly. Ottomata (talk) 00:43, 23 May 2023 (UTC)
@Ottomata I see what you're saying and I agree that it's important that everyone be on the same page here.
The user type ticket has a lot of other issues mixed in, so I wouldn't go there. It seems like the discussion is already flowing here, so how about we attract some more participation here? Post on the Phab tickets you mentioned, ping the relevant Foundation teams, maybe even post on the wikitech list? Neil Shah-Quinn (WMF) (talk) 00:51, 23 May 2023 (UTC)
Accountless? :) Elitre (WMF) (talk) 09:40, 23 May 2023 (UTC)
+1 2603:7000:8B07:3111:44A9:9100:9E1C:6B7B 10:29, 23 May 2023 (UTC)
Oops, hah that was me replying 'anonymously' from my phone. :p Ottomata (talk) 11:53, 23 May 2023 (UTC)
FYI, we are going to remove is_registered from the page change event schema. Ottomata (talk) 16:08, 24 May 2023 (UTC)
If User::isRegistered() and User::isAnon() cannot be changed for backward-compatibility reasons, what about deprecating them? A deprecation process would also eventually break backward-compatibility, but this breakage would result in PHP errors instead of silently, subtly different behavior, and people using them would be warned now (if they use a capable source code editor) but hit by the breakage only later. —Tacsipacsi (talk) 13:33, 23 May 2023 (UTC)
> what about deprecating them?
I think this is a great idea. This should probably be something that the IP Masking project drives. @NKohli (WMF)?
I was just considering removing the is_registered boolean from the page change event altogether. One could still check for user_id > 0 to get the same semantics. Ottomata (talk) 14:03, 23 May 2023 (UTC)
I agree with deprecating if possible. My suggestion would be to deprecate and keep:
  • User::isIp() would be true only when the IP of the editor is revealed
  • User::isTemp() would be true only when the user account is temporary (the user_is_temp flag is set in the user table)
With these two methods we could look at any revision past, present, or future, and have a very clear way to tell what type of user is involved. (I'm ignoring other stuff like isSystem for the purpose of this conversation) Milimetric (WMF) (talk) 15:20, 23 May 2023 (UTC)
"Revealing" an editor's IP is new jargon for a non-CheckUser finding out what IPs have been used by a given temp account, so please reword that first bullet point before it gets recorded anywhere official/permanent. Whatamidoing (WMF) (talk) 18:12, 23 May 2023 (UTC)
@Whatamidoing (WMF): When a user is editing without logging in, currently, their IP is recorded in the revision history and is visible on the "View history" tab. Is there better terminology to refer to that specifically? Milimetric (WMF) (talk) 18:46, 1 June 2023 (UTC)
Perhaps "is displayed" or "is shown" or "is recorded in"? At least for the View history tab, I think that's what we would say, e.g., "The username of the editor who made that edit is shown on the history page". Whatamidoing (WMF) (talk) 02:43, 6 June 2023 (UTC)
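A minimal Python sketch of the two-predicate proposal above, assuming each revision exposes two booleans: whether the editor's IP is shown in the history, and whether the edit was made with a temporary account. EditorType and classify are hypothetical names, and system users are ignored as in the proposal.

```python
from enum import Enum


class EditorType(Enum):
    IP = "IP"
    TEMP = "temp"
    NAMED = "named"


def classify(is_ip: bool, is_temp: bool) -> EditorType:
    """Classify an edit's author from the two proposed predicates (hypothetical helper).

    System users and other special cases are ignored, as in the proposal.
    """
    if is_ip and is_temp:
        raise ValueError("an edit cannot be attributed to both an IP and a temp account")
    if is_ip:
        return EditorType.IP
    if is_temp:
        return EditorType.TEMP
    return EditorType.NAMED


# Hypothetical revisions: (revision id, IP shown in history?, temp account?)
revisions = [(1, True, False), (2, False, True), (3, False, False)]
for rev_id, is_ip, is_temp in revisions:
    print(rev_id, classify(is_ip, is_temp).value)
```

Two booleans give four combinations; one is impossible, and the other three map one-to-one onto IP, temp, and named editors, which is why the proposal needs nothing else.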
Mentioned this to @Daniel Kinzler (WMDE) and he reminded me that MediaWiki is not getting rid of IP users, only WMF is. We can't deprecate or change the semantics of User::isRegistered, as it will continue to be used outside of WMF. Ottomata (talk) 19:04, 23 May 2023 (UTC)
Why couldn’t we? According to the table, User::isRegistered() === (User::isNamed() || User::isTemp()), so wherever currently User::isRegistered() is used, User::isNamed() || User::isTemp() can be used instead. I assume User::isNamed() and User::isTemp() will be available on any MediaWiki install, regardless of whether IP masking is enabled or not, simply User::isTemp() will always be false if it’s not enabled. —Tacsipacsi (talk) 19:46, 23 May 2023 (UTC)
Indeed. Another alternative for User::isRegistered() is User::getId() !== 0, which would probably be the best in code that wants to check "this user has a row in the user table" (which I'd argue is the real meaning of isRegistered). Matma Rex (talk) 12:36, 24 May 2023 (UTC)
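The two replacements above can be checked exhaustively in a small model. This Python sketch assumes a hypothetical account model in which an IP editor has user ID 0 and a temp flag marks temporary accounts; the functions only imitate the semantics described in this thread and are not MediaWiki's API.

```python
from typing import NamedTuple


class Acct(NamedTuple):
    """Hypothetical account model; not MediaWiki's User class."""
    user_id: int  # 0 for IP editors
    temp: bool    # mirrors a user_is_temp-style flag


def get_id(a: Acct) -> int:
    return a.user_id


def is_registered(a: Acct) -> bool:
    # Current meaning: the account has a row in the user table.
    return a.user_id != 0


def is_temp(a: Acct) -> bool:
    return a.user_id != 0 and a.temp


def is_named(a: Acct) -> bool:
    return a.user_id != 0 and not a.temp


def replacements_agree(a: Acct) -> bool:
    """True when both proposed replacements give the same answer as is_registered()."""
    return is_registered(a) == (is_named(a) or is_temp(a)) == (get_id(a) != 0)


SAMPLES = {"IP": Acct(0, False), "temp": Acct(7, True), "named": Acct(42, False)}
print(all(replacements_agree(a) for a in SAMPLES.values()))
```

On a wiki that never creates temp accounts, every temp flag is False, so is_temp() is always False and both replacements reduce to the old check.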
I agree that third party use of User::isRegistered shouldn't affect deprecation plans. There's a proper replacement that makes dealing with the new concepts (temp users) easier. If your MW instance doesn't intend on enabling temp users, the new abstractions still help you as you install extensions and operate in the ecosystem. Milimetric (WMF) (talk) 18:52, 1 June 2023 (UTC)
Instead of deprecating, could we just change the semantics of User::isRegistered? If a MediaWiki install does not use temp users, nothing will change. But if temp users are enabled, then User::isRegistered == user_id > 0 && !user_is_temp? Seems like that would be backwards compatible? @NKohli (WMF)? Ottomata (talk) 19:07, 23 May 2023 (UTC)
No, changing the semantics is exactly the subtle yet backward-incompatible change we want to avoid. Code may make all sorts of assumptions about User::isAnon() and !User::isRegistered() users that no longer hold, for example that these users have no entries in the user table, or that their names are parseable as IPv4 or IPv6 addresses. —Tacsipacsi (talk) 19:46, 23 May 2023 (UTC)
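A hypothetical example of the kind of silent assumption described above, sketched in Python: legacy code that treats every non-registered user name as an IP address works under the current semantics but fails once temp accounts stop counting as registered. The helper names and the '~temp-1' account name are made up for illustration.

```python
import ipaddress


def legacy_ip_of(name: str, registered: bool):
    """Hypothetical legacy helper: assumes any non-registered name is an IP address."""
    if registered:
        return None
    return ipaddress.ip_address(name)  # raises ValueError for names that are not IPs


def registered_current(user_id: int, temp: bool) -> bool:
    # Current semantics: any row in the user table counts as registered (temp is ignored).
    return user_id > 0


def registered_proposed(user_id: int, temp: bool) -> bool:
    # Proposed semantics: user_id > 0 && !user_is_temp.
    return user_id > 0 and not temp


# (user_id, temp flag, name); the temp account name is made up for illustration.
ip_editor = (0, False, "192.0.2.7")
temp_editor = (101, True, "~temp-1")

# With the current semantics the helper only ever sees real IP addresses.
print(legacy_ip_of(ip_editor[2], registered_current(*ip_editor[:2])))
print(legacy_ip_of(temp_editor[2], registered_current(*temp_editor[:2])))  # None

# With the proposed semantics the temp name reaches the IP parser and it fails.
try:
    legacy_ip_of(temp_editor[2], registered_proposed(*temp_editor[:2]))
except ValueError as err:
    print("breaks:", err)
```

Nothing in the legacy helper changed; only the meaning of "registered" did, which is the silent breakage that deprecation would turn into a visible warning.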
Yeah you are right. I think this is why it might be hard to deprecate too though. Of course it is possible, but my understanding from Daniel is that IP masking explicitly needed temp users to be 'registered' users, so that a bunch of existing code and tools (Talk pages, others?) didn't have to be changed.
But, you make a good point. @Neil Shah-Quinn (WMF) What about just using the existing terms isNamed vs isTemp, and avoiding all references to isRegistered/Anonymous? Ottomata (talk) 12:24, 24 May 2023 (UTC)
Honestly, I don't think that would be a great outcome. I'm really interested in making the general terminology (as opposed to the very granular in-code terminology) here as simple and accessible as possible, and adding a totally new term goes against that. As much as possible, I want these terms to be used consistently in high-level metrics, datasets, documentation, the interface, user conversations, and so on, and it's much, much easier to have a few developers change their definition of "registered" than to have a huge group of users replace a familiar term ("registered") with an unfamiliar one ("named").
I know you're under time pressure here because you're working on mediawiki/page/change, but I'm willing to try to drive this conversation to a broader consensus, and I hope that can be done in no more than a week. Neil Shah-Quinn (WMF) (talk) 19:53, 24 May 2023 (UTC)
FWIW, we are going to remove is_registered from mediawiki/page/change to avoid this issue for now, so from my perspective we have more time. Ottomata (talk) 20:15, 24 May 2023 (UTC)
All makes sense. IMO a worse outcome is different conceptions of a singular concept.
You might try to keep mediawiki data out of high level metrics and documentation, but if dataset/metric pipeline developers have to add code that does metric.user.is_registered = mediawiki.user.is_registered && !mediawiki.user.is_temp, I think you are going to have a hard time ensuring that this is true. Ottomata (talk) 20:18, 24 May 2023 (UTC)
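A small Python sketch of that risk, assuming hypothetical event records that carry MediaWiki-style is_registered and is_temp fields (not the actual mediawiki/page/change schema): the same word "registered" yields two different counts depending on which definition a pipeline applies.

```python
# Hypothetical events; field names mirror the is_registered/is_temp fields discussed above.
events = [
    {"user_id": 0, "is_registered": False, "is_temp": False},   # IP editor
    {"user_id": 101, "is_registered": True, "is_temp": True},   # temp account
    {"user_id": 42, "is_registered": True, "is_temp": False},   # named account
]


def metric_is_registered(mw_user: dict) -> bool:
    """Metric-level definition in which temp accounts are not 'registered'."""
    return mw_user["is_registered"] and not mw_user["is_temp"]


mediawiki_registered = sum(1 for u in events if u["is_registered"])
metric_registered = sum(1 for u in events if metric_is_registered(u))
print(mediawiki_registered, metric_registered)  # 2 1
```

Every pipeline that forgets the extra "and not is_temp" step silently reports the larger number, which is the inconsistency described above.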
Changing the semantics of a method like User::isRegistered is really problematic; if anything, we should deprecate the method and replace it with something that has a better name and clearer semantics. However, it should be noted that this method is used a **lot**: codesearch finds 390 files that contain references to it in production across core and various extensions. This is work that will have to be explicitly resourced.
That said, I think "registered" is the smaller problem. The use of "anon(ymous)" is much harder. The User::isAnon method isn't used as much as User::isRegistered, but the term anon is used in public APIs. Changing public APIs is much harder and slower than deprecating PHP methods.
Beyond the names, there are features that currently rely on making distinctions based on whether the user ID is 0 or not - e.g. the "registered user" filter on Special:RecentChanges. This would have to be re-implemented to work with the new logic (which may require changes to the database structure to allow for efficient filtering). Do we somewhere have a list of features that would need to be re-implemented to account for the changed meaning of registered/anonymous? DKinzler (WMF) (talk) 13:30, 26 May 2023 (UTC)
I agree. And changing the API can result in malfunctions in bots and external tools. I don't know how far the removal of the "anon" terminology would go, but we have many uses beyond User::isAnon() and the API. The database has the ipblock.ipb_anon_only column, where a temp user would be anon. JavaScript has mw.user.isAnon(), which is used in many gadgets and scripts. There is the MediaWiki configuration variable $wgDisableAnonTalk, and extension configuration variables like $wgVisualEditorDisableForAnons and $wgAbuseFilterAnonBlockDuration. And many documentation pages say "anonymous users" meaning those users who didn't fill in the registration form. Danilo.mac (talk) 14:19, 28 May 2023 (UTC)
The big question is: what will break more things, treating "temp" users as "anon" in the API, or not treating "temp" users as "anon" in the API? We will have to pick one, but neither option seems particularly appealing. DKinzler (WMF) (talk) 18:27, 28 May 2023 (UTC)
The tools that use the API to get "anon" users usually want to get the logged-out users, so they probably will not break if temp == anon. And the same with js gadgets and scripts that use mw.user.isAnon(). Danilo.mac (talk) 13:28, 30 May 2023 (UTC)
Yeah, similar to API users, other data folks and I are guessing that downstream datasets will want to keep the umbrella "anonymous" abstraction over both IP and temp users, while internally we distinguish temp users with the new user_is_temp column (thank you for that). That's because, in the beginning, users of Wikistats or other dashboards won't have any use for the distinction. Then we'll study the new data coming in, and if it makes sense to pass on the distinction, we'll update all downstream pipelines. Milimetric (WMF) (talk) 19:09, 1 June 2023 (UTC)
(I am really curious what will happen to the number of unique editors each month. Will it go down, because IP addresses are variable? Or up, because cookies are ephemeral?) Whatamidoing (WMF) (talk) 02:48, 6 June 2023 (UTC)

FYI: Discussion about this has moved to phab:T337103. The leading proposal is now that we should treat temp users as an entirely new category which is neither unregistered/anonymous nor registered. More input is of course welcome. Neil Shah-Quinn (WMF) (talk) 20:00, 27 June 2023 (UTC)