Topic on Talk:Growth/Personalized first day/Structured tasks/Add a link

Experiment visual comparison

Czar (talkcontribs)

Perhaps I missed it somewhere, but what is the visual difference between what the three experiment groups (Groups A, B, C) experienced? Is there a mock-up somewhere? I'm particularly curious about what the "unstructured" interface looked like, since its revert rate was so much higher.

MMiller (WMF) (talkcontribs)

@Czar -- good question. Here's an explanation of the three groups:

  • "Add a link" (40% of new accounts): these users all have the new structured task as their default suggested edit in their feed. In the image, you can see what a user in Hungarian Wikipedia sees after selecting an article from the feed and going through three dialogs that explain the task.
    Add a link experience on desktop
  • "Unstructured" (40% of new accounts): the legacy task is sourced into the suggested edits feed by the presence of a maintenance template. After the user selects it, they are brought to the article where a window pops up explaining that they should add links. They can see the maintenance template at the top and also a blue dot encouraging them to click edit. But they are not guided through the edit -- they have to find their own way. Therefore, it's easy for the newcomer to do the edit wrong or try out a more ambitious change than just adding a link, leading to them being reverted. And interestingly, 25% is about the same revert rate as for the edits that newcomers make outside the Growth features in the four wikis that are included in the data. In English Wikipedia, for example, the revert rate in July 2021 for edits made by accounts less than one month old was 26%. The image shows the experience in English Wikipedia.
    Unstructured link experience on desktop
  • Control (20% of new accounts): these users don't have any Growth team features at all. When they create their accounts, they just land right back where they were before with no homepage or suggested edits. We keep a constant control group so we can always measure how the Growth features are improving the experience.
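
To make the revert-rate comparison above concrete, here is a minimal sketch of the calculation: the share of each group's edits that were undone within some revert window. The column names, the per-edit data layout, and the toy values are assumptions for illustration only; the team's actual analysis pipeline is more involved.

```python
# Rough sketch of the revert-rate comparison across experiment groups.
# The 48-hour revert window and column names are assumptions, not the
# Growth team's actual measurement definitions.
import pandas as pd

# Toy stand-in for per-edit data: one row per newcomer edit, with the
# experiment arm the account was assigned to and whether the edit was
# undone within the revert window.
edits = pd.DataFrame({
    "experiment_group": ["add_link", "add_link", "unstructured",
                         "unstructured", "control", "control"],
    "was_reverted":     [False, False, True, False, True, False],
})

# Revert rate per experiment arm: share of edits that were reverted.
revert_rates = (
    edits.groupby("experiment_group")["was_reverted"]
         .mean()
         .mul(100)
         .round(1)
         .rename("revert_rate_pct")
)
print(revert_rates)
```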

Does this all make sense? What do you make of the difference in revert rate?

Czar (talkcontribs)

The point that a structured task is less free-wheeling than an unstructured edit, and thus less likely to be reverted, sounds like a great hypothesis to me!

I suppose it also depends on how (if?) "random" edits are patrolled on those wikis. I haven't done anti-vandalism patrol in a while, but my understanding is that the tools show editors cases where there is an uncertain but elevated likelihood of vandalism, which I'd suppose is rarely the case with innocuous linking. These kinds of edits can fly below the radar, leading to a lower revert rate. As a proxy for community acceptance, I think that's as good a measure as it realistically gets outside of a controlled environment.

MMiller (WMF) (talkcontribs)

I'm glad you brought up ORES classifications, @Czar -- it made me want to check how ORES feels about these edits, and whether it does, in fact, consider them high quality. As a spot check, I went to Russian and French Wikipedias, filtered the Recent Changes feed to just "add a link" edits, and colored them by their ORES quality classification (going best to worst: green, yellow, orange, red). I was surprised by the results: in Russian, though there is no red, there is plenty of yellow and orange, so it seems that ORES is suspicious. And in French, it's basically all yellow, meaning ORES is ambivalent about the quality of the edits.
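
In case anyone wants to repeat this spot check outside the Recent Changes UI, here is a rough Python sketch: pull recent edits carrying the GrowthExperiments "add a link" change tag and ask ORES how likely each one is to be damaging. The tag name ("newcomer task add link") and the exact response handling are assumptions based on public API documentation, not the team's own tooling.

```python
# Unofficial sketch: list recent "add a link" edits on Russian Wikipedia
# and fetch the ORES "damaging" probability for each revision.
import requests

WIKI_API = "https://ru.wikipedia.org/w/api.php"
ORES_API = "https://ores.wikimedia.org/v3/scores/ruwiki/"

# 1. Recent edits carrying the GrowthExperiments change tag (assumed name).
rc = requests.get(WIKI_API, params={
    "action": "query",
    "list": "recentchanges",
    "rctag": "newcomer task add link",
    "rcprop": "ids|title",
    "rclimit": "50",
    "format": "json",
}).json()
revids = [str(c["revid"]) for c in rc["query"]["recentchanges"]]

# 2. ORES "damaging" probabilities for those revisions.
scores = requests.get(ORES_API, params={
    "models": "damaging",
    "revids": "|".join(revids),
}).json()["ruwiki"]["scores"]

for revid in revids:
    score = scores[revid]["damaging"].get("score")
    if score:  # the "score" key is missing if ORES could not score the edit
        p = score["probability"]["true"]
        print(f"rev {revid}: P(damaging) = {p:.2f}")
```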

I'm not totally sure what to make of this yet, but it is pretty interesting -- I'm tagging @MGerlach (WMF), the creator of the link algorithm, in case he wants to think about this too.

MGerlach (WMF) (talkcontribs)

@MMiller (WMF) this is a really interesting observation (and thanks for tagging me). I know very little about the inner workings of ORES; however, I think that ORES does not just rely on the content of the edit but also uses features based on the editing history of the user. I also believe that ORES is configured differently in different languages, so I don't know right away how to interpret these observations. Nevertheless, this is a really interesting way of thinking about how to assess the quality of the edits recommended by the algorithm. I will have to think more about this.
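
One quick way to check how much the editor's history feeds into these scores would be to ask ORES to return the feature values it used for a given revision and look at the ones that describe the user rather than the content. A minimal sketch follows, assuming the public ORES v3 endpoint and its documented features flag; the revision ID is a placeholder and the response layout is my reading of the public docs, so treat the details as assumptions.

```python
# Hedged sketch: fetch the features ORES used for one revision and print
# the user-related ones (e.g. account age), to gauge how much of the
# "damaging" score depends on the editor's history rather than the edit.
import requests

WIKI = "ruwiki"
REVID = "123456789"  # hypothetical revision ID

resp = requests.get(
    f"https://ores.wikimedia.org/v3/scores/{WIKI}/{REVID}/damaging",
    params={"features": "true"},
).json()

result = resp[WIKI]["scores"][REVID]["damaging"]
for name, value in sorted(result.get("features", {}).items()):
    if ".user." in name:  # features describing the editor, not the content
        print(f"{name} = {value}")
```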
