Extension:CentralAuth/Shadow users

From mediawiki.org

A longstanding issue with Extension:CentralAuth is the automatic creation of local user accounts when visiting a wiki for the first time with a global session open.

The automatic logging of these account creations raises privacy issues (as does exposure of that info through other means), hence the long drama on bugzilla:19161. The split-out bugzilla:28227 covers the opposite concern: without the logging, local administrators on the watch for evil usernames may not see newly created accounts.

Most of these issues could go away if we removed the requirement to create a local 'user' table entry, or at least delayed it until some action required it.

User objects can already exist and do active stuff without an assigned id; we use them for anonymous users, where they carry an IPv4 or IPv6 address as the username but have no id.

Things that may need changing:

  • Conflation of having an id with being 'logged in': $user->isLoggedIn() and $user->isAnon() just check the id; other callers may be doing the same check explicitly and would need to be updated.
  • Auth code might be making id-based assumptions; check whether CentralAuth requires live IDs for attachment info or whether we can get away with an alternative.
  • We may need an on-demand way of saving a user to the local db when it does become time to use it. Saving whenever the ID number is requested would seem reasonable, but there are likely many cases where the ID is requested during a plain page view. Do we need an explicit save ping? If so, core and extensions would need updating to use it.
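The points above can be sketched as a toy model. This is not MediaWiki code: it is written in Python for brevity, and every name in it (ShadowUser, LocalUserTable, peek_id, ensure_local_id) is hypothetical. It assumes the "save ping" is an explicit call made only by write paths, while read paths use a getter that never writes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class LocalUserTable:
    """In-memory stand-in for the local 'user' table (hypothetical)."""

    def __init__(self) -> None:
        self._rows: Dict[int, str] = {}
        self._next_id = 1

    def insert(self, name: str) -> int:
        # Allocate the next id and store the row, like an auto-increment insert.
        user_id = self._next_id
        self._next_id += 1
        self._rows[user_id] = name
        return user_id

    def __len__(self) -> int:
        return len(self._rows)


@dataclass
class ShadowUser:
    """A user who may have a global session but no local row yet."""

    name: str
    has_global_session: bool
    local_id: Optional[int] = None

    def is_logged_in(self) -> bool:
        # Decoupled from the id check: a global session is enough.
        return self.has_global_session or self.local_id is not None

    def peek_id(self) -> int:
        # For read-only paths such as page views; never writes.
        # Returns 0 when no local row exists.
        return self.local_id or 0

    def ensure_local_id(self, table: LocalUserTable) -> int:
        # The explicit "save ping": write paths call this before
        # storing a reference to the user.
        if self.local_id is None:
            self.local_id = table.insert(self.name)
        return self.local_id
```

With this split, merely visiting a wiki calls only is_logged_in() and peek_id(), so no local row is created; the row appears the first time something calls ensure_local_id(), and repeated calls return the same id.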

Alternates

Global user ID storage

What if we break the assumptions of how we store user id references to begin with?

Currently, user references are stored in databases as an (id number, name) pair, where the id number is canonical for live accounts, and the name is canonical for non-accounts (e.g., anonymous IP addresses):

rev_user=1234 rev_user_text='Jimmy Wales'
rev_user=0 rev_user_text='192.168.12.34'

There are two major problems with this: we need a local user record before we can store the reference, and a rename requires us to go update the text portion of every record that mentions the user.

Instead of storing local integer user IDs, we could store a GUID-style marker that ties into CentralAuth:

rev_user = -1
rev_user_text = 'centralauth:12179caec26a089cabcbb75c4dbe0bdfe60951f7@wikimedia.org'
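A minimal sketch of how a reader of such rows might tell the three kinds of reference apart. It is written in Python for illustration only and is not MediaWiki code. The marker layout `centralauth:<id>@<realm>` is inferred from the single example above, the -1 sentinel is taken from that example, and the names UserRef and parse_user_ref are hypothetical.

```python
from typing import NamedTuple, Optional

GLOBAL_SENTINEL = -1           # rev_user value used in the example above
GLOBAL_PREFIX = "centralauth:"  # prefix assumed from the example above


class UserRef(NamedTuple):
    kind: str                  # "local", "anon" or "global"
    local_id: Optional[int]    # set only for local accounts
    name: Optional[str]        # stored name or IP; None for global refs
    global_id: Optional[str]   # set only for global refs
    realm: Optional[str]       # set only for global refs


def parse_user_ref(rev_user: int, rev_user_text: str) -> UserRef:
    """Classify a (rev_user, rev_user_text) pair."""
    if rev_user > 0:
        # Live local account: the id is canonical.
        return UserRef("local", rev_user, rev_user_text, None, None)
    if rev_user == 0:
        # Non-account (e.g. an anonymous IP): the text is canonical.
        return UserRef("anon", None, rev_user_text, None, None)
    if rev_user == GLOBAL_SENTINEL and rev_user_text.startswith(GLOBAL_PREFIX):
        marker = rev_user_text[len(GLOBAL_PREFIX):]
        global_id, sep, realm = marker.partition("@")
        if global_id and sep and realm:
            # The display name is deliberately absent: it has to be
            # looked up, so a rename never touches this row.
            return UserRef("global", None, None, global_id, realm)
    raise ValueError(f"unrecognised user reference: {rev_user!r}, {rev_user_text!r}")
```

Because a global reference carries no name, every display of it goes through a lookup, which is the consistency trade-off discussed in the lists that follow.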

Good:

  • This forces us to be more consistent about actually looking up the references, which lets us abandon the old slow parts of user renaming.
  • It gives us independence from the local user table for a lot of stuff, without having to alter the revision table structure.

Problems:

  • It doesn't handle other cases, like the watchlist table, that store only a user id.
  • The GUID string is likely longer than the current name, eating more space/memory.
  • Consistency in lookups?
  • API code may still make a lot of id assumptions and could need new considerations.