Topic on Talk:ORES/Draft topic

New article review funnel dynamics

EpochFail (talkcontribs)

Currently, two processes support the review of new article creations in Wikipedia: New Page Patrol (NPP), which is supported by the Page Curation tool, and the Articles for Creation (AfC) working group. From a workflow perspective, the only substantial difference between the two is where they operate and what that space implies. NPP operates in the main article namespace, so patrollers are motivated to make sure problematic new pages don't last long, because those pages show up as part of the encyclopedia. AfC operates in the Draft namespace, where new articles are not linked from the main encyclopedia and are not indexed by search engines.

While AfC and NPP are somewhat independent, they largely work in parallel as a single line of defense against the introduction of spam, vandalism, and other types of non-encyclopedic content. In both AfC and NPP, a small group of volunteers is responsible for making a wide range of judgment calls about each new article. Some judgments are straightforward and require little expertise: e.g., Is this spam or vandalism? Others require a nuanced understanding of a topic area: e.g., Is Ann Bishop a notable biologist or not?

Historically, new article review has been an overwhelming problem. Both AfC and NPP maintain backlogs with tens of thousands of articles[1]. Recently, NPP became so overwhelmed by its backlog, and by the potentially negative effects the backlog has been having on the quality of the encyclopedia, that its members proposed that Wikipedia change policy and restrict the creation of new articles by new editors[2]. This has largely had the effect of re-routing new article review work from NPP to AfC[3], but it has not addressed the backlog at all.

English Wikipedia's multi-stage review funnel for edits to current articles.
English Wikipedia's single-stage review funnel for new article creations.

But not all review processes result in such backlogs. It's interesting to compare the review process for edits to current articles (edit review) with the review funnel for new articles. Edit review is implemented as a dynamic multi-stage filter that works as a distributed cognition process[4][5]. Geiger and Halfaker describe the multi-stage process: AI-augmented robots form the first line of defense, then AI-augmented human-computation tools catch less obvious damage, and finally editors from across the encyclopedia review changes to the articles they're interested in via watchlists. Through this multi-stage process, all edits are reviewed and no backlog forms.

So why doesn't the new article review process look more like this? Before our work, there was no AI support for detecting problematic new articles (for bots or human-computation) and there were no effective routing mechanisms for distributing the workload across subject matter experts.

  1. E.g. see discussions here:,800
  2. en:Wikipedia:Autoconfirmed_article_creation_trial
  4. Geiger, R. S., & Halfaker, A. (2013, August). When the levee breaks: without bots, what happens to Wikipedia's quality control processes? In Proceedings of the 9th International Symposium on Open Collaboration (p. 6). ACM.
  5. Geiger, R. S., & Ribes, D. (2010, February). The work of sustaining order in Wikipedia: the banning of a vandal. In Proceedings of the 2010 ACM conference on Computer supported cooperative work (pp. 117-126). ACM.
EpochFail (talkcontribs)

Ping: User:Sumit.iitp. Not my best writing. But it's some progress. I'll have more tomorrow.

Sumit.iitp (talkcontribs)

Benefits of learning from the WikiProjects directory?

The directory itself is the result of hand-crafted categorization of topics. Our model uses this categorization to generate a machine-readable topic tree, which forms the basis of our predictions. Because the model depends only on the structure of the topic labels, it can easily be rebuilt and retrained whenever a change to the topic hierarchy is desired. It is easy because users do not need to change labels in any database; they can directly update the directory page, which they are already familiar with. On rebuild, our model generates the new topic tree from the updated directory page, extracts labels, and trains on those labels, all in a single pipeline.
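
To make the "directory page to topic tree to labels" step concrete, here is a minimal sketch. It is not the actual pipeline: the markup format (wiki-style `=` headings for topic levels, `*` bullets for WikiProjects), the sample directory text, and the function names `build_topic_tree` and `project_labels` are all assumptions made up for illustration.

```python
import re

# Hypothetical, simplified directory markup (an assumption, not the real page):
# "=" headings give topic levels, "*" bullets list WikiProjects under the
# nearest heading above them.
DIRECTORY_TEXT = """\
== Culture ==
=== Arts ===
* WikiProject Visual arts
* WikiProject Music
== STEM ==
=== Biology ===
* WikiProject Biology
"""

HEADING = re.compile(r"^(=+)\s*(.*?)\s*=+$")


def build_topic_tree(text):
    """Parse heading/bullet markup into a nested dict of topics."""
    root = {"children": {}, "projects": []}
    stack = [(0, root)]  # (heading depth, node); the root sits above all headings
    for line in text.splitlines():
        line = line.strip()
        match = HEADING.match(line)
        if match:
            depth = len(match.group(1))
            # Climb back up until the top of the stack is a shallower heading.
            while stack[-1][0] >= depth:
                stack.pop()
            node = {"children": {}, "projects": []}
            stack[-1][1]["children"][match.group(2)] = node
            stack.append((depth, node))
        elif line.startswith("*"):
            stack[-1][1]["projects"].append(line.lstrip("* ").strip())
    return root


def project_labels(node, path=()):
    """Map each WikiProject to the dotted path of the topic that contains it."""
    labels = {project: ".".join(path) for project in node["projects"]}
    for name, child in node["children"].items():
        labels.update(project_labels(child, path + (name,)))
    return labels


tree = build_topic_tree(DIRECTORY_TEXT)
labels = project_labels(tree)
```

In this sketch, moving a WikiProject under a different heading in the directory text changes its label on the next rebuild without touching any stored data, which is the property described above. The resulting project-to-topic mapping is what a training step would consume.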
