Talk:XTools


About this board

This page is a feedback forum for XTools. For reporting bugs, it's preferred that you use Phabricator.

If the issue is urgent and you're unable to use Phabricator, feel free to ping one of the active maintainers.

Erich Brauer

Davidbena (talkcontribs)

I am the creator of the page "Erich Brauer", but when anyone checks the edit history and goes back to the earliest date, the article is listed as having been created by User:Mag2k. The reason for this discrepancy is that, before I created the article, User:Mag2k had already made a redirect to a different article, "Arik Brauer", but he had used the name "Erich" for his redirect. How can I fix this problem and have the article "Erich Brauer" shown in my own list of articles created?

MusikAnimal (talkcontribs)

This is phab:T182183. Unfortunately it is a difficult problem to solve. MediaWiki has no formal log of when a redirect became an article. As you can see at https://en.wikipedia.org/w/index.php?title=Erich_Brauer&action=info, MediaWiki also claims Mag2k as the page creator. It's simply looking for the oldest revision. XTools works in the same way. There's a proposed hacky workaround at phab:T190065, but I can't make any promises. The issue is really with MediaWiki, not XTools. Sorry!
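The workaround floated in phab:T190065 could look roughly like the following Python sketch. It assumes revisions are available together with their MediaWiki change tags, and that a redirect-to-article conversion carries the software-defined mw-removed-redirect tag; Revision and page_creator are hypothetical names, not XTools code. Pages whose conversion predates such tags would still fall back to the oldest-revision rule.

```python
from dataclasses import dataclass, field

# Assumption: redirect-to-article conversions carry MediaWiki's
# software-defined "mw-removed-redirect" change tag.
REMOVED_REDIRECT = "mw-removed-redirect"


@dataclass
class Revision:  # hypothetical, minimal view of one page revision
    rev_id: int
    user: str
    tags: set[str] = field(default_factory=set)


def page_creator(revisions: list[Revision]) -> str:
    """Guess who should be credited with creating the article.

    Default (what MediaWiki and XTools do today): the author of the oldest
    revision. Hypothetical tweak: if a later revision turned a redirect into
    an article, credit the author of the most recent such revision instead.
    """
    if not revisions:
        raise ValueError("page has no revisions")
    ordered = sorted(revisions, key=lambda rev: rev.rev_id)  # rev_ids grow over time
    creator = ordered[0].user
    for rev in ordered[1:]:
        if REMOVED_REDIRECT in rev.tags:
            creator = rev.user
    return creator


# Invented revision IDs, mirroring the "Erich Brauer" history described above.
history = [
    Revision(1, "Mag2k", {"mw-new-redirect"}),
    Revision(2, "Davidbena", {REMOVED_REDIRECT}),
]
print(page_creator(history))  # Davidbena
```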


TopEdits tool not working correctly?

Summary by MusikAnimal

Resolved

Ceyockey (talkcontribs)

I was looking for the specific edits by Blueboar and Boracay Bill on the page https://en.wikipedia.org/wiki/Wikipedia_talk:Reliable_sources/Archive_21 and both queries returned 0 edits to that page, which is patently wrong. See this section, where each of them has edited at least once: https://en.wikipedia.org/wiki/Wikipedia_talk:Reliable_sources/Archive_21#Overuse_of_%22third-party%22_in_nutsell_and_intro_paragraphs_causing_problems . Maybe "archive" pages are excluded from the index? Thanks for any input.

Ceyockey (talkcontribs)

OK, I found the problem. You need to search the ORIGINAL page from which the archives are produced in order to find edits by users which are shown in ARCHIVES.

MusikAnimal (talkcontribs)

Correct, the edits themselves were made to the original page, not the archive page. The archive pages are merely a copy/paste of the original text. See the revision history.

Make pie chart consistent with table (Page History: Authorship)

Summary by MusikAnimal

Deployed

Minderbinder (talkcontribs)

First of all, thank you for providing such a versatile tool. The German language Wikipedia community are discussing the provision of a deep link to the Authorship section of the Page History tool right now, to be displayed alongside every article. Should the vote come to pass, I would expect to see an increased load on the Page History tool in a few weeks' time. So I hope that there is some caching mechanism.

I would like to ask for a change in the Authorship section. Right now, the rendering of the pie chart takes only the percentages of the first ten contributors into account. If an article has a long-tail distribution of contributors, this gives the wrong impression. The top ten contributors to the article Angela Merkel have contributed less than 40% of the current total to the article. Yet the pie chart makes it look as if the #1 contributor has contributed more than a quarter of the article, not 11.7 %. Could you please change the rendering of the pie chart so that the remaining contributors (ranked lower than #10) get one collective slice of the pie, as large as their combined total? This slice could be gray, as this connotes lack of detail. In the Angela Merkel example, this slice of the pie would be 61.5 %, or nearly two thirds. The ten named top contributors would get proportionally smaller slices.

Thank you! (PS: I am not sure whether I will get feedback or a ping through this site, so if you want to contact me, better try my de:WP talk page.)

MusikAnimal (talkcontribs)

Hey! There is a dedicated page you could use to show authorship information, e.g. https://xtools.wmflabs.org/articleinfo-authorship/de.wikipedia.org/Neaira%20%28Hetäre%29 . This will show all contributors; however, there are caveats: (a) it is limited to 10 colours, which repeat; (b) I will soon limit it to the top 500 editors or so, because if there are more it sometimes fails to load. I assume this is not a problem for you.

At any rate, yes for the main Article Info page where we only show the top 10 contributors, we can add a slice for the remaining contributors, as you suggest. I'll look into implementing this soon.

Thanks for the suggestions!

MusikAnimal (talkcontribs)
Minderbinder (talkcontribs)

Hello MusikAnimal, a big thank you for the change and incredibly fast deployment. The changed graphic is exactly as I had hoped for. Your work has been well received by the authors in de:WP discussing this topic.

MusikAnimal (talkcontribs)

@Minderbinder My great pleasure :) Regarding caching: unlike the rest of XTools, the Page History tool actually doesn't cache most data (phab:T208543). This is a caveat of its implementation. However, we can easily cache the authorship stats. So my question for you is whether it would suffice to link only to the dedicated authorship page, e.g. https://xtools.wmflabs.org/articleinfo-authorship/de.wikipedia.org/Angela%20Merkel, and not the full results? This would be faster for you, and less strain on the XTools servers. If the community wants the full Page History results, that is okay too :) Most of the time it will be no problem, but any high-traffic page such as your Village Pump may be very slow to process or fail entirely.

Another thing I wanted to mention: While I greatly appreciate the praise, the authorship stats you see are fetched from a third-party service called WikiWho. Their superb algorithm provides around 95% accuracy. They should get full credit for this :)

Finally, take note of the path-style URL format that XTools uses. Your link should not replace spaces with + signs ("Foo+bar"); instead, use normal percent-encoding like "Foo%20bar". If you are using the {{urlencode:Foo bar}} parser function, just use {{urlencode:Foo bar|PATH}}. The other option is to pass in the page title via the query string, e.g. https://xtools.wmflabs.org/articleinfo-authorship/de.wikipedia.org?page=Foo+bar.
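To illustrate the difference, here is a Python sketch that builds both URL styles. urllib.parse.quote with safe="" percent-encodes spaces as %20 and also encodes "/" and parentheses, roughly mirroring {{urlencode:...|PATH}}, while quote_plus produces the "+" form that only belongs in a query string. The helper names and the BASE constant are assumptions made for this example.

```python
from urllib.parse import quote, quote_plus

# Hypothetical base for the German Wikipedia authorship page.
BASE = "https://xtools.wmflabs.org/articleinfo-authorship/de.wikipedia.org"


def authorship_path_url(title: str) -> str:
    # Path style: spaces become %20; a literal "+" in a path is NOT a space.
    return f"{BASE}/{quote(title, safe='')}"


def authorship_query_url(title: str) -> str:
    # Query-string style: "+" for spaces is fine here.
    return f"{BASE}?page={quote_plus(title)}"


print(authorship_path_url("Angela Merkel"))
# https://xtools.wmflabs.org/articleinfo-authorship/de.wikipedia.org/Angela%20Merkel
print(authorship_query_url("Foo bar"))
# https://xtools.wmflabs.org/articleinfo-authorship/de.wikipedia.org?page=Foo+bar
```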

Minderbinder (talkcontribs)

@MusikAnimal Thank you for your helpful implementation hints. I will not be changing the GUI to include the link myself, that is left to a group of interface-admins. The formal vote on this change runs until May 8. Though there is currently a 3:1 majority for providing the deep-link to your statistics, it would be premature to discuss implementation details right now. I will point the interface-admins to this discussion after the vote has been tallied.

I like the idea of a dedicated authorship page, both to enable caching and to avoid information overload. Can I make a suggestion, though? In the non-dedicated section (i.e. https://xtools.wmflabs.org/articleinfo/de.wikipedia.org/Angela%20Merkel#authorship) the table is cut off after the tenth contributor, in line with the pie chart. The contributions from rank 11 on are summarized in one line, giving the number of remaining contributors and their total contribution in terms of characters and percentage. That is not the case for the dedicated page (i.e. https://xtools.wmflabs.org/articleinfo-authorship/de.wikipedia.org/Angela%20Merkel ), which renders the table with a different cut-off at rank #500. That makes for a very long page, and effectively prevents viewing the pie chart on smaller (mobile) screens. Who is going to scroll down 500 lines? Besides, for most articles the lower-ranked contributions can be something as mundane as inserting a wiki link. So I would suggest bringing the cut-off of the dedicated page in line with the section of the main page. The last line, with contributors from rank 11 on, should be expandable, so that clicking on it renders the full table. I imagine this would also help with caching: each authorship stats page would only have to hold about 24 data items.
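The proposed cut-off could work like the following Python sketch: the top ten rows are kept, and everyone below is folded into one summary row with a count, a character total, and a percentage. The input format (editor mapped to characters of current text) and the function name are assumptions, not how XTools stores the data.

```python
def authorship_table(chars_by_user: dict[str, int], cutoff: int = 10):
    """Split authorship data into the visible top rows and one summary row.

    Returns (rows, summary):
      rows    -> (rank, user, characters, percent) for the top `cutoff` users
      summary -> (remaining_users, remaining_characters, remaining_percent),
                 or None when nobody is ranked below the cut-off
    """
    total = sum(chars_by_user.values())
    if total == 0:
        return [], None
    ranked = sorted(chars_by_user.items(), key=lambda kv: kv[1], reverse=True)
    rows = [
        (rank, user, chars, round(100 * chars / total, 1))
        for rank, (user, chars) in enumerate(ranked[:cutoff], start=1)
    ]
    rest = ranked[cutoff:]
    if not rest:
        return rows, None
    rest_chars = sum(chars for _, chars in rest)
    return rows, (len(rest), rest_chars, round(100 * rest_chars / total, 1))


# Hypothetical data with a cut-off of 2 instead of 10, to keep it short.
rows, summary = authorship_table({"A": 500, "B": 300, "C": 150, "D": 50}, cutoff=2)
print(rows)     # [(1, 'A', 500, 50.0), (2, 'B', 300, 30.0)]
print(summary)  # (2, 200, 20.0)
```

Expanding the summary row would then only require rendering the remaining ranks on demand, so the default page stays short.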

On WikiWho: I am all for giving credit where credit is due, so I will look into contacting them.

CheckUsers in Admin Stats

Summary by MusikAnimal

This should be fixed now

Mz7 (talkcontribs)

I noticed that the CheckUser group isn't shown in the "User groups" column of admin stats. I was wondering whether there was a reason for this. Seems relevant to include, especially if we're looking for active checkusers on other projects, for example.

MusikAnimal (talkcontribs)

It's supposed to show it, but this must have broken when we reworked that tool a while back. Filed a task at phab:T213119.

Error querying Wikiwho API: Unknown

Summary by MusikAnimal

Fix has been deployed

MisterSynergy (talkcontribs)

As User:Minderbinder already reported earlier on this page, the German Wikipedia community held a vote on whether the WikiWho tool in XTools should be linked from each article, in order to make article authorship prominently visible to readers. The vote ended five days ago, and the link was subsequently added to the desktop UI page footer via de:MediaWiki:Wikimedia-copyright. Almost immediately after the link was added, the tool started failing to show authorship data; instead, it displays the error message "Error querying Wikiwho API: Unknown" for most requests.

There are several users on German Wikipedia complaining about this problem, and there is some speculation that there may be simply too many requests so that a request quota to the WikiWho server might be exceeded most of the time.

Can you please give some insight into the problem? What can be done to fix this situation?

(I have no idea whether someone else has already contacted you; if that is the case, please link to related discussions.)

Magiers (talkcontribs)

Hello, to add my own observations: "Authorship" always seems to have problems during the daytime (in Germany), while it starts working in the evening and at night. So it does not seem to be broken completely, just temporarily every day. I would also be pleased if someone could give some insight, or maybe even has a solution to the problem. Thanks!

Count Count (talkcontribs)

According to api.wikiwho.net (not linked due to spam filter) there are API limits in place:

"Currently, there is a limit of 2000 requests/day for unregistered users, and also a 60 requests/minute limit for all users."

It is possible that we are now running into either one of those.
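If the quoted limits are the cause, a client would need to stay under both caps at once. Below is a minimal Python sketch of a sliding-window guard, assuming a single process makes all the requests; DualRateLimiter is a hypothetical class, and it says nothing about how WikiWho itself counts requests.

```python
import time
from collections import deque


class DualRateLimiter:
    """Allow a request only if it fits under both a per-minute and a per-day cap.

    Each window keeps the timestamps of recent requests (a sliding window log).
    """

    def __init__(self, per_minute=60, per_day=2000, clock=time.monotonic):
        self.clock = clock
        # (window length in seconds, max requests, timestamps inside the window)
        self.windows = [(60.0, per_minute, deque()), (86400.0, per_day, deque())]

    def try_acquire(self) -> bool:
        now = self.clock()
        for length, limit, stamps in self.windows:
            while stamps and now - stamps[0] >= length:
                stamps.popleft()  # forget requests that fell out of the window
            if len(stamps) >= limit:
                return False  # this quota is exhausted; caller should back off
        for _, _, stamps in self.windows:
            stamps.append(now)
        return True
```

A caller would check try_acquire() before each API request and show a "try again later" message (or cached data) when it returns False.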

MusikAnimal (talkcontribs)

Hey, sorry for the late reply. I think your assessments are probably right; regardless, I will contact the WikiWho maintainers and get this sorted out. I am pretty sure they support this initiative. I am on holiday right now, so I may not get back to you for at least a few more days. As far as I can tell XTools is not suffering in any way, so it's up to you whether you want to disable the link in the meantime. Apologies for the disruption!

ToBeFree (talkcontribs)
Magiers (talkcontribs)

Thank you MusikAnimal for looking into this topic. I am sure that after living so long without crediting the authors, de-wp can live a few days longer without a working tool. So first enjoy your holiday. But XTools is affected too: when the API is not working, the "authorship" section in XTools is empty as well (and the Top Editors are not a substitute, because they don't provide useful metrics about authorship). It would be great if the API limits could be lifted, or maybe even, as suggested, if the tool could be transferred to our own responsibility. Greetings.

MusikAnimal (talkcontribs)

This should be fixed now! \o/ Apologies for the long wait. All the best,

help! Articles incorrectly counted as deleted

Summary by ToBeFree

Technical deletions (en:WP:G6) like "G6: Deleted to make way for move" are not treated differently than any other kind of deletion by XTools. The reason for deletion is unimportant; the tool treats all deletions equally. An editor has expressed concern that this lack of distinction may make them appear in a bad light.

Technically, the behavior is not a bug; the tool works as expected by its developers.

DrVogel (talkcontribs)

Hi, would you please be able to help? A couple of my articles have been moved, and now the tool is incorrectly showing them as deleted. This makes it look like I have 2 deleted articles: https://xtools.wmflabs.org/pages/en.wikipedia.org/DrVogel. This is clearly a bug in the tool, and it's also really unfair because I'm always extremely careful :( Thanks for your help,

MusikAnimal (talkcontribs)

It is not a bug. A page you created was deleted, so XTools is reporting it. It doesn't know if it was for a good reason or a bad one. If you hover over the "deleted" text it will show the deletion reason, which in your case is "G6: Deleted to make way for move". For both of those articles, it also says "recreated", and there is a message at the top: "Some deleted pages may have been recreated by this user or a different user." This unfortunately is the best we can do. We want to report all deletions, regardless of why they were deleted. Another example: someone created an article and it was deleted for copyright violation (bad), then they recreated it and the new version was kept. We still want to know about the old version that was deleted. Hopefully that makes sense. In short, "deleted" is not necessarily a bad thing :)

DrVogel (talkcontribs)

Hi, thanks very much for your reply. This was not a deletion, it was a move. And it was not recreated, because it was never deleted. The reason it looks like it was deleted is because there was a redirect at the target address, and because it's impossible to simply swap an article and a redirect, the redirect (not the article) was deleted, then the article moved with a new redirect created. So it is not a deletion at all. So what it shows is incorrect. There was no article deleted. Please help to put this right. Thank you.

MusikAnimal (talkcontribs)

There is already some logic to exclude pages that were moved over redirect (and hence caused a deletion), but that is not what happened here. Here the redirect you created was deleted directly (and for good reason :).

You are correct, however, that these were redirects that got deleted, not full articles. Unfortunately it is not easy (or at least not performant) to detect whether deleted revisions were redirects. It may not even be possible, I'm not sure... The issue is that there is no formal log of when redirects are created. There are some tags, so we might be able to go off of those... maybe. I've created phab:T190065 for this, feel free to follow it for updates. However, I can make no promises that it will get resolved :/

Overall I don't see a big issue here, because one can simply hover over the "deleted" text to see why it was deleted, and in this case would immediately be able to tell it was a deletion to make way for a move.

DrVogel (talkcontribs)

It '''is''' a big issue, because it puts people who are very careful and create valuable content on a par with people who add rubbish and spam to Wikipedia. That's pretty terrible.

The only reason there was a redirect that got deleted was that, because I'm so careful when I create articles, I always make sure that all possible spellings etc. are also created as redirects. Another user then decided to move one of my articles (over a redirect for a name that I had foreseen, hence the existence of the redirect), and that is why it shows (incorrectly) as a deletion.

Do you realise that this basically punishes me for doing the right thing (creating redirects for all possible alternative spellings), when I would have been better off being lazy and just creating the article?

How can it be a good idea for the project to make people be better off being lazy?

Come on, you have the power to put this right, please use it. Discouraging people who create valuable content can't possibly be a good idea.

MusikAnimal (talkcontribs)

No, I think you've got it all wrong. Deleted != bad, period :) What if you G7'd some of your own articles? Or G2 (test)? Those are innocent, but should still show up, no?

XTools just reports what's there. It is not any sort of an authority, nor is it a way to definitively measure an editor's success. It just gives the data. It's up to the user to decide how to interpret it. For instance, if you were applying for autopatrolled privileges, any admin will know to check the deletion log. We've tried to make this easier by providing the deletion summary when hovering over "deleted".

As I said I can look into phab:T190065, but it will be very challenging, and only handle your edge case. I'm still not sure it's even possible (a lot of data around deleted revisions isn't present on the Toolforge database). Even if we did manage to do this, it wouldn't work for historical data, because the redirect tag system was only recently introduced (roughly December 2017). Otherwise we'd have to go off of the content of the deleted revisions, which is fragile and moreover only accessible to admins, so XTools can't do it :(

There are certainly some improvements that can be made, and we'll do our best to get those done. At the very least, I hope I've convinced you that we're not trying to punish or discourage valuable editors such as yourself. That's downright silly!

Thanks for your understanding :)

DrVogel (talkcontribs)

Clearly, if the tool is counting a name swap between an article and a redirect as a deletion, it is wrong.

Articles edited (total)

Seppi333 (talkcontribs)

I'm interested in knowing how many pages I've edited in the article namespace. I realize the Top Edits tool could be used to determine this (for most editors) by simply counting the entries; however, since I've edited more than 1000 articles, this list is truncated. I noticed that the Edit Counter tool displays the total number of pages edited, as "Pages edited (total)" under the "General statistics" heading, but not the number for specific namespaces.

Would it be possible/feasible to add a count of pages edited by namespace to the Edit Counter tool and/or increase the maximum number of pages returned in the Top Edits tool beyond 1000? I imagine the latter would be easier to implement and prefer it over the former since I'm also interested in looking at the frequency distribution of my edits in the article namespace.

Thanks for your consideration.

Seppi333 (talkcontribs)

Also, I realize some editors have edited tens of thousands of pages in the article namespace, so some truncation limit is necessary in the Top Edits tool. It would probably be useful to provide those editors with a summary of the truncated data (this is a third alternative for implementing what I've requested above); i.e., append a statement like either of the following:

  • Limited to the first 1000 entries. Username performed X more edits across N more pages in this namespace.
  • Limited to the first 1000 entries. Username edited N more pages in this namespace.

Summarizing the X edits across those N remaining pages isn't strictly necessary, since one could determine that manually (i.e., subtract the sum of the edits in the first 1000 entries from the total edits in the namespace), but it would probably be helpful for some.
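The X and N in these messages follow directly from the per-page edit counts. A Python sketch, assuming the counts for one namespace are available as a mapping from page title to edit count; both helper names are hypothetical.

```python
def truncation_summary(edits_per_page: dict[str, int], limit: int = 1000) -> tuple[int, int]:
    """Return (X, N): edits and pages hidden beyond the first `limit` entries.

    edits_per_page maps each page in one namespace to the user's edit count;
    the visible list is assumed to be sorted by edit count, descending.
    """
    counts = sorted(edits_per_page.values(), reverse=True)
    hidden = counts[limit:]
    # sum(hidden) equals the namespace total minus the edits in the shown entries.
    return sum(hidden), len(hidden)


def truncation_note(user: str, hidden_edits: int, hidden_pages: int, limit: int = 1000) -> str:
    return (f"Limited to the first {limit} entries. {user} performed {hidden_edits} "
            f"more edits across {hidden_pages} more pages in this namespace.")


# Hypothetical counts with a limit of 2 instead of 1000.
x, n = truncation_summary({"a": 5, "b": 3, "c": 2, "d": 1}, limit=2)
print(truncation_note("Example", x, n, limit=2))
```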

MusikAnimal (talkcontribs)

Thanks for the suggestions! If Top Edits is the preferred tool, I think the ideal solution would be to paginate the results if there are more than 1,000. This is how the Pages Created tool works. Adding a count of the unique pages edited in a namespace is something a bit easier to do, and I don't think it will be really slow. I have created a ticket for these features at phab:T218531. We also plan to add date range filtering to Top Edits, along with the Edit Counter, which may be helpful in your case. That is tracked at phab:T202552. Regards.
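Both ideas are simple in principle. In XTools they would presumably be database queries (a COUNT(DISTINCT ...) per namespace, and LIMIT/OFFSET for each page of results); the Python sketch below only shows the logic in memory, with hypothetical function names and input given as (namespace ID, page title) pairs, one per edit.

```python
from collections import defaultdict


def unique_pages_by_namespace(edits):
    """Count distinct pages per namespace from (namespace_id, title) edit rows."""
    pages = defaultdict(set)
    for ns, title in edits:
        pages[ns].add(title)  # a set ignores repeat edits to the same page
    return {ns: len(titles) for ns, titles in pages.items()}


def paginate(rows, page: int, per_page: int = 1000):
    """Return the rows for 1-based page number `page` (empty past the end)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * per_page
    return rows[start:start + per_page]


print(unique_pages_by_namespace([(0, "A"), (0, "A"), (0, "B"), (1, "A")]))  # {0: 2, 1: 1}
```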

Seppi333 (talkcontribs)

Thanks! I really appreciate it. Filtering the output by date is a novel solution to the truncation problem. Edit: I agree, pagination would be the best way of addressing this.

Anyway, kudos to you guys for creating a really useful set of edit analysis tools.

Average edit size

Evolution and evolvability (talkcontribs)

Would it be possible to calculate this as a root mean square, to measure absolute differences?

I.e. net change average of +3, -4 and -5 = ±4

Otherwise stats like average edit size can be skewed by deletions and lead to misleadingly low (or even negative) changes.

Alternatively ignore all -ve values when calculating and report average addition size and average reduction size or equivalent.

MusikAnimal (talkcontribs)

Maybe you're talking about variance? If so then yes, this should be an easy change thanks to MySQL built-in functions. They also have a function for standard deviation.
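For the +3, -4, -5 example, the signed mean is -2, while the root mean square, which ignores sign, comes out at about 4.08 (the plain mean of the absolute values is exactly 4, matching the ±4 figure). The Python sketch below contrasts these with the population variance and standard deviation; note that variance measures spread around the mean rather than typical edit size. In SQL, a root mean square could presumably be written with standard aggregates as SQRT(AVG(diff * diff)), though whether that is fast enough for XTools is untested.

```python
import math
from statistics import fmean, pstdev, pvariance

diffs = [3, -4, -5]  # byte changes from the example above

net_mean = fmean(diffs)                       # -2.0: additions and removals cancel
rms = math.sqrt(fmean(d * d for d in diffs))  # about 4.08: size regardless of sign
variance = pvariance(diffs)                   # about 12.67: spread around the -2.0 mean
std_dev = pstdev(diffs)                       # about 3.56
```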

MusikAnimal (talkcontribs)
Evolution and evolvability (talkcontribs)

Although variance is the most mathematically reasonable, it is possibly a little technical for many.

I'd actually intended a simpler-to-interpret measure of average byte change, ignoring whether it's +ve or -ve. For example:

  • edit A, +2 bytes
  • edit B, +3 bytes
  • edit C, -4 bytes
  • edit D, -5 bytes

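  • Average: (2 + 3 - 4 - 5) / 4 = -1
  • Average change, ignoring sign: (2 + 3 + 4 + 5) / 4 = 3.5
  • Average addition size, accounting only for addition edits A and B: (2 + 3) / 2 = 2.5
  • Average reduction size, accounting only for deletion edits C and D: (-4 - 5) / 2 = -4.5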

For the four edits above, the average is -1, which is tricky to interpret. Do they just make small deletions? Do they make large deletions and additions that happen to almost cancel out? By splitting the additions and reductions, it becomes possible to see whether they, e.g., mostly make lots of small additions with the occasional massive deletion.
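The breakdown for edits A to D can be computed in a few lines. This Python sketch, with a hypothetical function name, reproduces the four figures; it assumes the list of edits is non-empty and that zero-byte edits belong to neither the addition nor the reduction average.

```python
from statistics import fmean


def edit_size_breakdown(diffs):
    """Signed average, unsigned average, and separate addition/reduction averages.

    diffs: non-empty list of byte changes. Zero-byte edits count toward the first
    two figures but toward neither of the split averages.
    """
    additions = [d for d in diffs if d > 0]
    reductions = [d for d in diffs if d < 0]
    return {
        "average": fmean(diffs),
        "average_change": fmean(abs(d) for d in diffs),
        "average_addition": fmean(additions) if additions else None,
        "average_reduction": fmean(reductions) if reductions else None,
    }


print(edit_size_breakdown([2, 3, -4, -5]))
# {'average': -1.0, 'average_change': 3.5, 'average_addition': 2.5, 'average_reduction': -4.5}
```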

MusikAnimal (talkcontribs)

So much math! =P The issue here is that we are bound by what MySQL can do for us. Otherwise we have to pull in the size of every edit and run our own calculations, which would consume too much memory for some users. Of the options you've laid out, I can say the average addition and reduction sizes should be doable, assuming the query is still fast enough. Note that we do show the number of small edits (< 20 bytes) vs. large edits (> 1000 bytes), so I hope that also gives some idea of the size of edits a user typically makes.

Seppi333 (talkcontribs)

I think the average addition size and average reduction size would be interesting statistics to know, but being a statistician makes me more of a dork than the typical editor. The edit size distribution for most editors is probably highly positively skewed, so it might be worthwhile to report the median addition size and median reduction size (either as an alternative to, or in addition to, those averages).
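To see why medians help with skewed data, the sketch below uses a made-up edit history where one very large addition drags the mean far above a typical addition, while the median stays put. The helper is hypothetical and uses Python's statistics module.

```python
from statistics import mean, median


def median_edit_sizes(diffs):
    """Median addition and median reduction; None when a side has no edits."""
    additions = [d for d in diffs if d > 0]
    reductions = [d for d in diffs if d < 0]
    return (
        median(additions) if additions else None,
        median(reductions) if reductions else None,
    )


# Hypothetical, skewed history: four small additions and one huge one.
diffs = [5, 8, 10, 12, 5000, -3, -7, -2000]
print(mean([d for d in diffs if d > 0]))  # 1007
print(median_edit_sizes(diffs))           # (10, -7)
```

Here the mean addition (1007 bytes) describes none of the actual additions, while the median (10 bytes) reflects the typical one.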


Pages Created not seeing assessment for one page

Kenirwin (talkcontribs)
MusikAnimal (talkcontribs)

It's probably pulling from WikiProject United States because it's the first assessment record in the database, and that one doesn't have a specified "class" ranking. I think we implemented it in this way because we tried to get pages created + assessments all in the same query. On second thought, it shouldn't slow things down much to make this into two different queries, such that we can get the first non-null result for class assessments. I'll look into it!
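The second query boils down to picking the first assessment row that actually has a class. A Python sketch, assuming the rows come back in database order as (WikiProject, class) pairs with None for a missing class; the function name and the example projects and classes are invented.

```python
def first_assessed_class(assessments):
    """Return (project, class) for the first row that actually has a class.

    assessments: (wikiproject, class_or_None) pairs in database order.
    """
    for project, quality in assessments:
        if quality:  # skips both None and empty strings
            return project, quality
    return None


rows = [("United States", None), ("Biography", "B"), ("Ohio", "C")]
print(first_assessed_class(rows))  # ('Biography', 'B')
```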

Kenirwin (talkcontribs)

Thanks!

Ken

Summary by MusikAnimal

This should be fixed.

Manvydasz (talkcontribs)

Why, in the Admin Score section, are the counts for "edit summaries used in the article space" and "edits to the article space" the same for all users?

MusikAnimal (talkcontribs)

Edit summaries are measured only in the mainspace because that's a strictly collaborative area. For instance you generally aren't expected to use edit summaries in your userspace.
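For reference, the two Admin Score numbers could be derived as in this Python sketch, which treats namespace 0 as the article space and (as an assumption) counts whitespace-only summaries as missing; the function name and input format are hypothetical. Since one number filters on a non-empty summary and the other does not, they should normally differ, so identical values for every user would point to a bug.

```python
def mainspace_summary_counts(edits):
    """Return (edits with a summary, total edits), both for the article space only.

    edits: (namespace_id, summary) pairs; namespace 0 is the main/article space.
    """
    mainspace = [summary for ns, summary in edits if ns == 0]
    with_summary = sum(1 for summary in mainspace if summary.strip())
    return with_summary, len(mainspace)


edits = [(0, "fix typo"), (0, ""), (2, ""), (0, "  "), (0, "ce")]
print(mainspace_summary_counts(edits))  # (2, 4)
```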

Your second point sounds like a bug. This tool admittedly has not been given much attention. I'll try to look into it soon.