Topic on Talk:Page Curation

Can we apply a similar anti-vandalism approach to NPP?

15 comments • 14:06, 5 August 2017


Σ

Wikipedia faces the problem of vandalism via edits, which we cannot solve by restricting editing to registered users. In any case, that problem has largely been solved by ClueBot NG. Similarly, now that the autoconfirmed article creation trial (ACTRIAL) has been declined, could we instead have a ClueBot equivalent that patrols new pages for unsuitable articles and applies the simple-to-judge CSD tags (I believe these are A1, A3, G1, G2, G3, and G10)? Σ 03:16, 21 September 2011 (UTC)

03:16, 21 September 2011

MER-C

See here.

05:38, 21 September 2011

Σ

That does not address my concern. What I am worried about is a vandal slapping a crap reference on an attack page or vandalism page just to get past the checks. Because the bot would look at the content and nothing more, it would be ideal for dealing with clear cases under the criteria above, since those are the criteria where borderline cases are least common.

06:59, 21 September 2011

Snottywong

I think the appropriate place to request that would be w:WP:BOTREQ, and I think you'd find a ton of editors who would oppose a bot that goes around tagging articles for speedy deletion. It's just too easy for an automated script to make a mistake, and this is a task where mistakes can have big consequences.

14:18, 21 September 2011

Raindrift

You have a good point: fully automated checks have some serious pitfalls. However, content-aware bots could be handy too. Once we have a data model in place for article tagging (for tracking things like "Which patrol checks have been completed?"), bots could use an API to attach tags such as "probable-vandalism" and "probable-spam", perhaps even with a score parameter. It would then be possible to search for those articles in the NPP interface, or to configure the zoom interface to show them at the top.

Since those are usually clear-cut cases (vandalism is generally obvious, and spotting it requires learning only a small amount of policy), they could be a good place for new NPPers to start.

In some future where we have lots of patrollers and can require multiple people to agree on a decision before templates are applied (there's got to be a short name for that), the bot's vote could count as a couple of human reviewers, speeding up the process.
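The tagging data model described in this comment can be sketched roughly as follows. This is a minimal illustration, assuming a hypothetical in-memory record per new page; the names `PageRecord`, `attach_tag`, and `queue_for_tag` are invented for the sketch and are not part of the Page Curation extension or any MediaWiki API.

```python
from dataclasses import dataclass, field


@dataclass
class PageRecord:
    """One new page in the patrol queue (hypothetical data model)."""
    title: str
    # Tag name -> bot-assigned score in [0, 1], e.g. {"probable-vandalism": 0.93}
    tags: dict[str, float] = field(default_factory=dict)
    # Patrol checks a human has completed, e.g. {"copyvio-checked"}
    completed_checks: set[str] = field(default_factory=set)


def attach_tag(page: PageRecord, tag: str, score: float) -> None:
    """What a bot might call through a (hypothetical) tagging API."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    page.tags[tag] = score


def queue_for_tag(pages: list[PageRecord], tag: str,
                  min_score: float = 0.0) -> list[PageRecord]:
    """Pages carrying `tag` at or above `min_score`, highest score first,
    so a patrol view could show the likeliest cases at the top."""
    hits = [p for p in pages if tag in p.tags and p.tags[tag] >= min_score]
    return sorted(hits, key=lambda p: p.tags[tag], reverse=True)
```

Keeping the score separate from the tag name would let the interface choose its own threshold, rather than having the bot decide on its own what counts as "probable".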

17:40, 21 September 2011

Steven (WMF)

Tagging probable vandalism and spam sounds a lot like what AbuseFilter already does.

19:05, 21 September 2011

Raindrift

Good point. :)

20:09, 21 September 2011

Σ

Fair point, but according to 28bytes, the filters use more resources.

02:24, 22 September 2011

WereSpielChequers

In my experience, A1 and A3 are the tags most likely to be applied incorrectly, and the articles they are applied to are the easiest to rescue. I suspect we could get an abuse filter or admin bot that could handle certain types of G10, such as articles that contain certain phrases and whose author's previous contribution was deleted under G10.
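The two-signal G10 rule suggested here (a phrase match combined with the author's previous contribution having been deleted under G10) might look something like the sketch below. The phrase patterns are illustrative placeholders, and the function name `flag_probable_g10` is hypothetical; this is not an actual AbuseFilter rule.

```python
import re
from typing import Optional

# Illustrative placeholder phrases only; a real filter would need a curated,
# community-reviewed list and careful testing against false positives.
ATTACK_PATTERNS = [
    re.compile(r"\bis (?:a|an) (?:idiot|loser)\b", re.IGNORECASE),
    re.compile(r"\bis the worst person\b", re.IGNORECASE),
]


def flag_probable_g10(text: str, previous_deletion_criterion: Optional[str]) -> bool:
    """Return True only when BOTH signals are present.

    text: wikitext of the new page.
    previous_deletion_criterion: speedy-deletion criterion under which the
        author's previous contribution was deleted, or None if it was not deleted.
    """
    matches_phrase = any(p.search(text) for p in ATTACK_PATTERNS)
    # Requiring the second signal trades recall for fewer false positives.
    return matches_phrase and previous_deletion_criterion == "G10"
```

Requiring both conditions means a first-time attack page slips through to human patrollers, which is the intended trade-off: the rule only fires where the evidence is strong.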

19:37, 22 September 2011

Σ

Of course there will be false positives; even the almighty ClueBot NG has them.

06:41, 24 September 2011

WereSpielChequers

Yes, but the rate of false positives needs to be very low for community acceptance. A1 and A3 are very difficult for a bot to get right, not least because most of the pages where those tags are ultimately valid will have been tagged manually before a bot could judge the tags to be legitimate. It is also not easy for a bot to check "What links here" and do the other basic checks you need to do before you can decide that an A1 or A3 article can't be salvaged through ordinary editing and really does need a deletion tag.

G3 and G10, however, I could see a bot being coded for. "mmmmm is Gay" and so forth are fairly easy tests that could identify quite a few articles that merit summary deletion.

However, this isn't really the place for bot requests. If someone can design a bot that reliably spots G3 or G10 articles, then I'm sure there would be interest at the bot request page. The difficult bit is coming up with a design that won't produce an unacceptable number of false positives. WereSpielChequers 18:01, 27 September 2011 (UTC)

18:01, 27 September 2011

Kudpung

I don't believe the control of A1 and A3 can sensibly and usefully be dehumanised. There is almost always something on the page, and in my own experience it is fairly rare for such pages to turn out to be the beginnings of a serious new article. Whether they are or not, a bot can't replace the cognition of an experienced patroller or the deleting admin. Even with very little content, A1 and A3 pages, apart from sometimes being random hits on a keyboard, are often attack, linkspam, or test pages that should be summarily deleted, and fast. I caution against looking for electronic solutions to everything just because Wikipedia is an electronic encyclopedia.

05:09, 29 September 2011

Kudpung

WSC, I don't know much at all about filters and how they work, but as they obviously function on a vast catalogue of keywords, perhaps (if I've understood the ideas you explained to me) they could indeed be used to red-flag certain new pages at the moment of creation (most particularly A1, A2, A3, A10, G1, G2, G3, G4, G10, and, if CorenBot is someday up and running again, G12 (copyvio)), and to effect some automated functions such as, just for example:

Deactivate them from live mainspace and put them in a holding bay
Prevent them being indexed by Google
Add them to a category: Pages awaiting admin approval
Signal an alert to the admin dashboard
Leave a message on the creator's talk page something like:

Welcome to Wikipedia. The page you just created may not be suitable for immediate inclusion in the encyclopedia and has been temporarily put on hold. Please contact an administrator from this list, who will review your article in a few moments. Thank you.
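The hold-and-review steps listed above could be expressed as a simple decision function, sketched below. This assumes, as the Q&A in this comment suggests, that only pages by non-autoconfirmed creators are affected. `NewPage`, `hold_actions`, and the action labels are hypothetical names standing in for the proposed steps; they are not real MediaWiki API modules.

```python
from dataclasses import dataclass
from typing import Optional

# Criteria named in the proposal as candidates for red-flagging at creation.
FLAGGABLE = {"A1", "A2", "A3", "A10", "G1", "G2", "G3", "G4", "G10", "G12"}


@dataclass
class NewPage:
    title: str
    creator_autoconfirmed: bool
    # Criterion a (hypothetical) filter suspects, or None if nothing matched.
    suspected_criterion: Optional[str] = None


def hold_actions(page: NewPage) -> list[str]:
    """Return the hold-and-review steps to run for a flagged page, in order.

    The labels are invented placeholders for the proposed steps.
    """
    if page.creator_autoconfirmed or page.suspected_criterion not in FLAGGABLE:
        return []
    return [
        "move-to-holding-bay",
        "set-noindex",
        "add-category:Pages awaiting admin approval",
        "alert-admin-dashboard",
        "message-creator-talk-page",
    ]
```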

Nevertheless, in the absence of the ACTRIAL solution, it may be worth considering channelling all new pages from non-autoconfirmed users to a much-improved Article Wizard, to Articles for Creation, or to a user subpage/sandbox (with a link to the video tutorial on creating a user sandbox, shown on opening an edit window), from a start page along the lines of Article creation workflow.

Anticipated questions:

Q: Why can't a new Page Patroller do the review of the bot/filter flagged pages?

A: Because empirical findings have demonstrated that many patrollers lack a basic understanding of the new page patrol process, their tagging is inaccurate, and they often mark attack pages as patrolled (or tag them with a less urgent criterion).

Q: Why should an admin review the page?

A: Because admins can immediately delete such pages without further ado if need be.

Q: Isn't this the same as, or similar to, Pending Changes or vandalism patrol?

A: No. It only affects certain new pages, and those created by non-autoconfirmed users.

Edited by External Link to Interwiki (Bot) 01:37, 27 December 2011

WereSpielChequers

Hi Kudpung. I think it would be a mistake to put good-faith articles into a bad-faith stream. So if we can get a filter that puts most G3 and G10 pages into a separately coloured stream, then I think we should just do that. The A criteria are supposed to cover good-faith pages, and anything that is correctly filtered under an A code can sit around for a little while without any risk of harm. A1 and A3 pages are not supposed to be tagged or deleted until their creator has had a few minutes to add the second sentence that makes sense of them.
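The routing suggested in this reply (probable G3/G10 pages in a separate stream, good-faith A-code hits left in the normal queue but not tagged until the creator has had time to expand them) can be sketched as two small functions. The function names and stream labels are hypothetical, and the ten-minute grace period is an assumed value standing in for "a few minutes".

```python
from datetime import datetime, timedelta
from typing import Optional

BAD_FAITH_CRITERIA = {"G3", "G10"}
GRACE_PERIOD_CRITERIA = {"A1", "A3"}
# Assumed value: the reply only says "a few minutes".
GRACE_PERIOD = timedelta(minutes=10)


def stream_for(filter_hit: Optional[str]) -> str:
    """Send probable G3/G10 pages to a separate stream; everything else,
    including good-faith A-code hits, stays in the normal queue."""
    return "separate-stream" if filter_hit in BAD_FAITH_CRITERIA else "normal-stream"


def may_tag_now(criterion: str, created: datetime, now: datetime) -> bool:
    """A1/A3 tags wait until the creator has had the grace period to expand
    the page; other criteria may be tagged immediately."""
    if criterion in GRACE_PERIOD_CRITERIA:
        return now - created >= GRACE_PERIOD
    return True
```

Separating "which stream" from "may this be tagged yet" keeps the good-faith A codes visible to patrollers while still enforcing the waiting period.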

23:14, 2 October 2011

Kudpung

I'm aware of the principle by which they are sorted into As and Gs, etc. However, my own experience is that very few A1 and A3 pages ever develop into serious articles. I religiously wait 15 to 30 minutes before tagging an A1 or A2 that is clearly not vandalism, and when I return to them, it's usually to delete them. Nevertheless, this is not to say that they are bad-faith creations. I sometimes find (you won't get NPPers to do this kind of research) that the author has simply abandoned that attempt and started over by recreating the article under a different page name. They don't yet know about sophisticated mechanisms such as 'moving' new pages, creating redirects, making dab pages, etc., and this is one of the areas where we fail to make creating new pages a more welcoming experience.
I well remember the first and only grave error I made when I started creating pages: I had misspelt the name of a French wine. It took me ages to find out how to correct or change the title of the page, and I had no inkling that in Wikispeak, move means rename; I ended up doing a copy-paste move. At that time, nobody had left a welcome message on my talk page with that nice catalogue of links to all the tutorials and info pages. Now there's a thought for Jorm's user landing page at Article creation workflow. Welcoming and profusely thanking new editors for all their incredibly invaluable creations is one thing, but perhaps it would be better if new registrations were to get a partially pre-formatted user page that already includes that list of links? People are working on a better user page design (here's one), and I seem to remember Fetchcomms has been working on another, but I forget how to link to it.

Edited by External Link to Interwiki (Bot) 02:07, 27 December 2011