Requests for comment/CheckUser requirements

In T139810 we concluded that the best way to revamp the CheckUser extension is to draw up a list of desired features and then decide how to implement them (rewrite the code, patch parts of it, etc.). This RFC lets us reach consensus on, dare we say, CU 2.0 (the future CheckUser extension).

Background
Designed c. 2005, the extension is one of the critical tools for dealing with problematic cases of abuse such as sock puppetry, vandalism and spam (see history here). Over time the needs of the projects have grown and, as the CheckUser workboard shows, bugs accumulate without being resolved, primarily because the code is old and hard for developers to read and work on (refer to T132892: CheckUser UI revamp and its related tasks, as well as the workboard linked above).

Problem
The lack of an active maintainer slows development in this area (i.e., bug resolution, testing and extension development ex Phabricator). One developer describes the CheckUser extension as follows: "the backend design is dubious, and the frontend is pretty archaic."

Proposal
To resolve these issues, we propose overhauling the CheckUser extension. The overhaul is also an opportunity to make the extension work with the features and code that MediaWiki has in its current state, and to gather opinions from CheckUsers on which new functions the new CheckUser extension should have.

Features needed:


 * Sorting the results by IP (to make it easier to find common IP "ranges") as well as by time (currently, only the latter is supported).
 * Getting a list of distinct user agents (UAs) used by a user.
 * In fact, the "get IPs" function should be replaced by a "get summary" page that shows distinct IPs (sortable by time or by IP), distinct UAs (sortable by time or by UA), and distinct IP-UA combinations (sortable by time or by IP).
 * Providing additional information about UAs. Something like http://useragentstring.com/ or similar websites would be the minimum. In an ideal world, the CU tool would analyze the UA and show detailed information such as:
 ** What browser, OS, etc. does the UA represent?
 ** In what year and month was that particular browser version released, and in what year and month was the next version released? (I find the date at which a user upgrades their browser a good clue for matching accounts or rejecting their similarity.)
 ** Does it look like a valid or a forged UA?
 * Providing additional information about IPs, for example an IP's ISP, its country, and the ISP's range. This information changes over time and is not publicly and freely available. However, the CU code should be made extensible so that a CSV file containing IP-to-country and/or IP-to-ISP mappings can be supplied to it. That way, we don't need to publish such a mapping as part of the code, but major users of the CU tool such as WMF can pay for proprietary mappings and use them for their wikis.
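
To make the "get summary" and CSV-mapping ideas above more concrete, here is a rough Python sketch. It is only an illustration of the requested behaviour, not a proposal for the actual implementation (the extension itself is PHP). The `LogEntry` record, the function names, and the CSV columns `cidr`, `country` and `isp` are all assumptions made up for this example; the addresses come from the reserved documentation ranges.

```python
import csv
import io
import ipaddress
from collections import namedtuple

# Hypothetical shape of one logged action; not the real CheckUser schema.
LogEntry = namedtuple("LogEntry", ["timestamp", "ip", "user_agent"])


def summarize(entries):
    """Return three dicts mapping each distinct IP, UA, and (IP, UA) pair
    to the timestamp at which it was first seen."""
    first_ip, first_ua, first_pair = {}, {}, {}
    for e in sorted(entries, key=lambda e: e.timestamp):
        first_ip.setdefault(e.ip, e.timestamp)
        first_ua.setdefault(e.user_agent, e.timestamp)
        first_pair.setdefault((e.ip, e.user_agent), e.timestamp)
    return first_ip, first_ua, first_pair


def sort_by_ip(ips):
    """Sort addresses numerically (IPv4 before IPv6) so that neighbouring
    addresses, and therefore likely ranges, end up next to each other."""
    def key(text):
        addr = ipaddress.ip_address(text)
        return (addr.version, int(addr))
    return sorted(ips, key=key)


def load_ranges(csv_text):
    """Parse an operator-supplied CSV with the assumed columns
    cidr,country,isp into a list of (network, country, isp) tuples."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(ipaddress.ip_network(row["cidr"]), row["country"], row["isp"])
            for row in reader]


def lookup(ip, ranges):
    """Return the most specific (network, country, isp) match, or None."""
    addr = ipaddress.ip_address(ip)
    matches = [r for r in ranges
               if r[0].version == addr.version and addr in r[0]]
    if not matches:
        return None
    return max(matches, key=lambda r: r[0].prefixlen)


if __name__ == "__main__":
    log = [
        LogEntry(3, "192.0.2.10", "UA-A"),
        LogEntry(1, "192.0.2.9", "UA-B"),
        LogEntry(2, "192.0.2.10", "UA-A"),
    ]
    ips, uas, pairs = summarize(log)
    print(sort_by_ip(ips))
    ranges = load_ranges("cidr,country,isp\n192.0.2.0/24,ZZ,Example ISP\n")
    print(lookup("192.0.2.10", ranges))
```

Sorting on the numeric value rather than on the text is what keeps `192.0.2.9` ahead of `192.0.2.10`. The linear scan in `lookup` is fine for a sketch; a real mapping file with many ranges would probably need an indexed structure instead.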