Growth/Positive reinforcement/Mentorship preliminary analysis, August 2023

From mediawiki.org

The Growth team has been developing features related to mentorship on Wikipedia for several years. Newcomers have access to the Mentor module, which lets them ask their mentor a question that gets posted on the mentor's talk page. In addition to finding questions from newcomers on their talk page, mentors also have access to the Mentor dashboard, where they can get an overview of their mentees' activity on the wiki.

For the Positive reinforcement project, the Growth team decided to look into the effect of these features to help inform conversations with the community at Wikimania 2023. At the time this analysis was done, there were two Wikipedia wikis where a proportion of newcomers would randomly get access to a mentor, English and Spanish. On the English Wikipedia, the proportion was 10% for a long time but was increased to 25% in July 2023, while on Spanish it has been 50% for a long time. This random assignment allows us to compare the group who has a mentor to the one that doesn't.

We gathered a dataset of registrations from English and Spanish Wikipedia spanning June 1 to July 15, 2023. Details about the dataset can be found below in the methodology section. We focused on high-level questions about mentorship as this is a preliminary analysis. Our questions were:

  1. What proportion of newcomers with access to a mentor asks them a question?
  2. Out of those who ask a question, what proportion asks more than one?
  3. How quickly after registration are questions asked?
  4. What are the activation, retention, productivity, and revert rates for newcomers with and without mentorship?

This analysis is preliminary and exploratory, meaning that we were aiming to get an overview of a few topics and to learn whether there were meaningful similarities and differences in the data. As a result, this report focuses on describing what we found. We hope to follow up on this with an explanatory analysis, where we could learn more about why mentorship behaves the way it does.

Proportion asking a question

Figure 1: English Wikipedia sees differences in question-asking percentage between platforms of registration (desktop 1.7%, mobile web 2.7%), while Spanish Wikipedia is stable (2.7% vs 2.5%, respectively).

We find that fewer than 3% of newcomers with access to a mentor ask them a question, and that the proportion varies by wiki and platform of registration. The overall percentage asking a question is 2.1% on English Wikipedia and 2.6% on Spanish Wikipedia.

Figure 1 shows these proportions broken down further by platform of registration. On Spanish Wikipedia they are quite stable, with only a little variation between platforms (2.7% on desktop, 2.5% on mobile web). On English Wikipedia there is a more distinct difference: the desktop proportion is 1.7%, well below the desktop proportion on Spanish Wikipedia, while mobile web newcomers ask at a rate of 2.7%, in line with the rates on Spanish Wikipedia.

Proportion asking multiple questions

We also investigated what proportion of the newcomers who ask a question go on to ask more than one. Here we find a similar overall proportion on both wikis: 7.5% on English Wikipedia and 7.2% on Spanish Wikipedia.

When we separate this by platform of registration, we again find variation on English Wikipedia and a fairly stable situation on Spanish Wikipedia. The statistics are shown below in Table 1.

Table 1: Proportion of newcomers asking more than one mentor question by wiki language and platform of registration.
Language   Platform     % asking more than once
English    Desktop      5.1%
English    Mobile web   10.2%
Spanish    Desktop      6.9%
Spanish    Mobile web   7.6%
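The two proportions reported so far have different denominators: the share asking a question is taken over all newcomers with access to a mentor, while the share asking more than once is taken over those who asked at least one question. The following is a minimal sketch of that calculation, not the code used for this analysis. The input format (one `(wiki, platform, n_questions)` tuple per mentored newcomer) and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def question_shares(newcomers):
    """Compute question-asking shares per (wiki, platform).

    `newcomers` is a hypothetical iterable of (wiki, platform, n_questions)
    tuples, one per newcomer who has access to a mentor.
    Returns {(wiki, platform): (share_asking, share_of_askers_asking_again)}.
    """
    totals = defaultdict(int)   # all mentored newcomers
    askers = defaultdict(int)   # asked at least one question
    repeat = defaultdict(int)   # asked two or more questions
    for wiki, platform, n_questions in newcomers:
        key = (wiki, platform)
        totals[key] += 1
        if n_questions >= 1:
            askers[key] += 1
        if n_questions >= 2:
            repeat[key] += 1

    result = {}
    for key, total in totals.items():
        share_asking = askers[key] / total
        # The second share is undefined when nobody in the group asked.
        share_again = repeat[key] / askers[key] if askers[key] else None
        result[key] = (share_asking, share_again)
    return result
```

Keeping the two denominators separate matters here because so few newcomers ask at all: a share of 10% among askers corresponds to a very small share of all mentored newcomers.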

Time to first question

Figure 2: Newcomers on English Wikipedia ask questions quickly, often within minutes.
Figure 3: Newcomers on Spanish Wikipedia also ask questions quickly.

Since most newcomers only ask a single question, we found it most meaningful to examine the time between registration and a newcomer's first question. Histograms of this time span are shown in Figures 2 and 3. Note that the time axis (the X-axis) in these graphs is on a logarithmic scale, which expands the short time spans (minutes and hours) and compresses the longer ones (days and weeks). We use a logarithmic scale because it allows us to meaningfully display the entire time period of 15 days available in our dataset.
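To illustrate how a logarithmic time axis behaves, here is a minimal sketch of histogram bins that are evenly spaced in log10(seconds). It is not the plotting code behind Figures 2 and 3. The one-minute lower bound, the number of bins per decade, and the function names are assumptions chosen for the example.

```python
import bisect
import math


def log_bin_edges(min_seconds, max_seconds, bins_per_decade=4):
    """Return bin edges spaced evenly on a log10 scale of seconds."""
    lo = math.log10(min_seconds)
    hi = math.log10(max_seconds)
    n_bins = max(1, math.ceil((hi - lo) * bins_per_decade))
    step = (hi - lo) / n_bins
    return [10 ** (lo + i * step) for i in range(n_bins + 1)]


def log_histogram(delays_seconds, edges):
    """Count delays per bin; values outside the range go to the end bins."""
    counts = [0] * (len(edges) - 1)
    for delay in delays_seconds:
        idx = bisect.bisect_right(edges, delay) - 1
        idx = min(max(idx, 0), len(counts) - 1)
        counts[idx] += 1
    return counts


# Hypothetical delays: 90 seconds, 2 minutes, 1 hour, and 3 days.
edges = log_bin_edges(60, 15 * 24 * 3600, bins_per_decade=2)
counts = log_histogram([90, 120, 3600, 3 * 24 * 3600], edges)
```

Every bin covers the same ratio of time rather than the same number of seconds, so the first bin spans only a few minutes while the last one spans several days.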

The top row in each figure shows the time span for newcomers whose first edit is a mentor question. Here we can see that mobile users ask these questions quickly: they are done before an hour has passed or, in the case of Spanish Wikipedia, shortly after the half-hour mark. Users who register on desktop also ask quickly, but some newcomers show up later, for example the next day.

The second row shows users who made one or more other edits before asking their mentor a question. Here we can see that more users are asking questions at later times. These are mainly desktop registrations, and we see a fairly large proportion on Spanish Wikipedia. Some mobile web users also do this, but at a lower rate than on desktop.

Proportion asking on their first edit

We can see from Figures 2 and 3 that for some newcomers a question to their mentor is their first edit; however, the figures don't tell us how frequent this is. Table 2 below shows the proportion of mentor questions that were asked as the first edit. Again, we're looking only at the first question asked by any newcomer, since most newcomers only ask a single question.

Table 2: Proportion of newcomers who ask a mentor question as their first edit
Language   Platform     % asking as first edit
English    Desktop      29.0%
English    Mobile web   32.8%
Spanish    Desktop      27.1%
Spanish    Mobile web   30.5%

We can see from Table 2 that a large proportion of newcomers, roughly a quarter to a third (27.1% to 32.8%), are making their first edit when they ask their mentor a question. The proportions are fairly stable, and on both wikis this is more common for newcomers who register on mobile web than for those who register on desktop.

Mentorship and newcomer activity

The Growth team has four metrics that aim to capture newcomer activity: activation, retention, productivity, and revert rate. As mentioned above, we also examined whether having access to a mentor affects these metrics.

More specifically, the four metrics are defined as follows:

  • Activation: making an edit within 24 hours of registration.
  • Retention: an activated newcomer returns to edit again within 14 days.
  • Productivity: the number of edits made.
  • Revert rate: for newcomers who make at least one edit, the proportion of their edits that were reverted within 48 hours.
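The four definitions above can be sketched for a single newcomer as follows. This is an illustration under stated assumptions, not the Growth team's implementation: it assumes the 14-day retention window starts when the 24-hour activation window ends (15 days in total), that productivity counts edits across both windows, and that each edit is a hypothetical `(edit_time, reverted_time_or_None)` tuple.

```python
from datetime import datetime, timedelta

ACTIVATION_WINDOW = timedelta(hours=24)
RETENTION_WINDOW = timedelta(days=14)   # assumed to follow the activation window
REVERT_WINDOW = timedelta(hours=48)


def newcomer_metrics(registered, edits):
    """Compute the four metrics for one newcomer.

    `edits` is a list of (edit_time, reverted_time_or_None) tuples.
    """
    activation_end = registered + ACTIVATION_WINDOW
    retention_end = activation_end + RETENTION_WINDOW
    # Only edits inside the combined 15-day period are considered.
    in_scope = [(t, r) for t, r in edits if registered <= t < retention_end]

    activated = any(t < activation_end for t, _ in in_scope)
    # Retention is only defined for activated newcomers.
    retained = activated and any(t >= activation_end for t, _ in in_scope)
    productivity = len(in_scope)
    reverted = sum(
        1 for t, r in in_scope if r is not None and r - t <= REVERT_WINDOW
    )
    # Revert rate is undefined for newcomers without edits.
    revert_rate = reverted / productivity if productivity else None
    return {
        "activated": activated,
        "retained": retained,
        "productivity": productivity,
        "revert_rate": revert_rate,
    }
```

Returning `None` for the revert rate of a newcomer without edits mirrors the definition above, which only covers newcomers who make at least one edit.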

We will further refine some of these metrics by namespace (e.g. only counting article & talk edits), only counting non-reverted edits, and by removing mentor questions (because those who don't have access to a mentor can't make those edits).

Activation

Figure 4: Activation increases when newcomers have access to a mentor, but it's often because they ask a question.

We find that newcomers are generally more likely to activate if they have access to a mentor. In many cases this is because they ask their mentor a question. Figure 4 sheds some light on this, where we can see that activation rates for English Wikipedia (top row) are higher on the left (where all edits count) for newcomers with access to a mentor, compared to those on the right (where mentor questions are removed).

Newcomers on Spanish Wikipedia are both similar to and different from the trend on English. Spanish users who registered on mobile web are similar in that those with access to a mentor are more likely to activate, but they are different in that this increase remains when we remove mentor questions. Newcomers who registered on desktop are different in that they are equally likely to activate when all edits are counted, and similar in that they are also equally likely to activate when mentor questions are removed.

Retention

We find no difference in the retention of newcomers who have access to a mentor compared to those who do not, when keeping their first day activity constant. This means that for two newcomers who registered on the same wiki, using the same platform, and who made the same number of edits on their first day, the likelihood of them sticking around and editing again is the same.

Productivity

We find conflicting patterns of productivity between English and Spanish Wikipedia. On English Wikipedia, we find an increase in productivity for newcomers who registered on desktop, and no difference for those who registered on mobile web. When it comes to Spanish Wikipedia, desktop users show no difference, whereas mobile web users see a decrease in productivity when they have access to a mentor.

Revert rate

We find no difference in the revert rate between newcomers who have access to a mentor and those who do not, when keeping their activity constant. In other words, for two newcomers who registered on the same wiki, on the same platform, and who made the same number of edits across the "activation" and "retention" time periods (which together span 15 days), the expected proportion of their edits that were reverted is the same. As described above, this analysis only examines newcomers who made at least one edit.

Methodology

We gathered a dataset of registrations from English and Spanish Wikipedia spanning June 1 to July 15, 2023. During this time period, 50% of newcomers on Spanish Wikipedia were randomly assigned to have access to a mentor, while the other 50% did not get a mentor. On English Wikipedia, 10% of newcomers would randomly get a mentor until July 11, at which point the proportion increased to 25% (ref).

Our data gathering excludes known test accounts, bots, users registered through Wikipedia's API (these are usually mobile app accounts), and users not registered on the given wiki (i.e. autocreated accounts and those registered by another user). We also excluded users with a non-standard mentorship setting (there were a handful of these on each wiki).

The dataset used in this analysis contains 19,305 Spanish accounts and 114,512 English accounts, of which 50.0% and 11.2% respectively had Mentorship enabled.
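To give a sense of scale, the sketch below combines the account counts and mentorship shares above with the overall question-asking percentages reported earlier (2.1% on English, 2.6% on Spanish). The results are approximations derived from rounded percentages, not counts taken from the dataset, and the variable and function names are hypothetical.

```python
# Figures quoted in this report; the percentages are rounded.
ACCOUNTS = {"en": 114_512, "es": 19_305}
MENTOR_SHARE = {"en": 0.112, "es": 0.500}
ASK_SHARE = {"en": 0.021, "es": 0.026}


def approximate_group_sizes(wiki):
    """Return (approx. mentored newcomers, approx. newcomers asking a question)."""
    mentored = ACCOUNTS[wiki] * MENTOR_SHARE[wiki]
    askers = mentored * ASK_SHARE[wiki]
    return mentored, askers


for wiki in ("en", "es"):
    mentored, askers = approximate_group_sizes(wiki)
    print(f"{wiki}: ~{mentored:,.0f} mentored newcomers, ~{askers:,.0f} asked a question")
```

Under these assumptions each wiki has only a few hundred newcomers who asked a question, which is worth keeping in mind when reading the per-platform breakdowns.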

In the above analysis of mentorship's effect on the key Growth metrics of activation, retention, productivity, and revert rate, we make extensive use of regression models. Activation and retention are yes/no outcomes, for which we use a logistic regression model. Productivity is measured by a count of the number of edits, which is known to have a long-tailed distribution. We therefore use a negative binomial model for productivity. Our revert rate analysis uses a zero-one-inflated beta distribution. This is because revert rates calculated across a time window tend to fall into one of three categories: 1) the user has all of their edits reverted (one-inflation), 2) the user has none of their edits reverted (zero-inflation), and 3) the user has some of their edits reverted (resulting in a beta distribution).
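The three categories behind the zero-one-inflated beta distribution can be illustrated with a short sketch. It only partitions a list of revert rates into the three components and does not fit the regression model used in the analysis. The function name and the returned keys are assumptions made for illustration.

```python
def split_revert_rates(rates):
    """Partition per-user revert rates into the three mixture components.

    `rates` holds one value in [0, 1] per newcomer with at least one edit.
    """
    if not rates:
        raise ValueError("need at least one revert rate")
    if any(r < 0.0 or r > 1.0 for r in rates):
        raise ValueError("revert rates must lie in [0, 1]")

    n = len(rates)
    zeros = sum(1 for r in rates if r == 0.0)       # no edits reverted
    ones = sum(1 for r in rates if r == 1.0)        # all edits reverted
    interior = [r for r in rates if 0.0 < r < 1.0]  # some edits reverted
    return {
        "p_zero": zeros / n,   # zero-inflation share
        "p_one": ones / n,     # one-inflation share
        "interior": interior,  # values the beta component would describe
    }
```

For example, `split_revert_rates([0.0, 0.0, 1.0, 0.5, 0.25])` puts two of the five users in the zero component, one in the one component, and leaves 0.5 and 0.25 for the beta component.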