Growth/Analytics updates/Work log/2018-10-18

In T206377 I was asked to get statistics on the context of account creations, in order to understand the extent to which an intervention directly after account creation would potentially disrupt the user's workflow. The HQL query I used to grab data for this can be found in this task comment. The query sums up three measures for each of the mobile and desktop sites on a monthly basis, and these are the three measures specified in the task: the number of accounts created while reading the main page, the number created while editing a page, and the number created from some other reading context (i.e. "neither of the other two"). Note that in the latter case we do not require the user to have been reading an article; for example, it also covers contexts like "searched for something".
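A minimal Python sketch of the classification and monthly counting described above follows. It is not the HQL query from the task comment: the `Registration` record and its fields (`site`, `month`, `return_to`, `return_to_query`) are hypothetical, and treating an `action=edit` or `veaction=edit` parameter in the return-to query string as the editing context is an assumption made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Registration:
    # Hypothetical record shape; the real event data may use other fields.
    site: str             # "desktop" or "mobile"
    month: str            # e.g. "2018-03"
    return_to: str        # page the user was on when they registered
    return_to_query: str  # query string of that page, e.g. "action=edit"


def classify(reg: Registration, main_page: str) -> str:
    """Assign a registration to one of the three contexts from the task."""
    params = reg.return_to_query.split("&") if reg.return_to_query else []
    # Assumption: an edit action in the query string marks the editing context.
    if "action=edit" in params or "veaction=edit" in params:
        return "editing"
    if reg.return_to == main_page:
        return "main page"
    # Everything else, including non-article contexts such as searching.
    return "other reading"


def monthly_counts(regs, main_page):
    """Count registrations per (site, month, context)."""
    return Counter((r.site, r.month, classify(r, main_page)) for r in regs)


# Toy records for illustration only, not real data.
regs = [
    Registration("desktop", "2018-03", "Main_Page", ""),
    Registration("mobile", "2018-03", "Ada_Lovelace", "action=edit"),
    Registration("desktop", "2018-03", "Special:Search", "search=foo"),
]
for key, n in sorted(monthly_counts(regs, "Main_Page").items()):
    print(key, n)
```

Checking the edit action first means that someone who registers after clicking "edit" on the main page is counted as editing, not as reading the main page; that ordering is also an assumption of this sketch.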

The task also asks us to only examine non-autocreated accounts, so autocreated accounts are filtered out. In this comment I list the HQL query I used to verify that they were discarded.
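The filtering and the verification step can be illustrated with a small Python sketch. The `autocreated` flag is a hypothetical field standing in for whatever the event data uses to mark accounts that MediaWiki created automatically (for example, when a user with a global account first visits a wiki) rather than through the registration form.

```python
from dataclasses import dataclass


@dataclass
class AccountCreation:
    user_name: str
    autocreated: bool  # hypothetical flag for automatically created accounts


def self_made_only(creations):
    """Drop autocreated accounts, keeping only users who registered themselves."""
    kept = [c for c in creations if not c.autocreated]
    # Verification step: nothing autocreated should survive the filter.
    assert not any(c.autocreated for c in kept)
    return kept


# Toy records for illustration only.
creations = [
    AccountCreation("ExampleUserA", False),
    AccountCreation("ExampleUserB", True),
    AccountCreation("ExampleUserC", False),
]
print([c.user_name for c in self_made_only(creations)])
# prints ['ExampleUserA', 'ExampleUserC']
```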

Aggregated counts
The task asks for aggregated counts from the past few months, and the data gathered spans March through September 2018. I summed the counts across that time period and created the bar charts seen below. The charts are split by site (desktop and mobile), and then by context (editing, reading the main page, and reading another page).
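Summing the monthly counts over the study period can be sketched as follows. This assumes the monthly counts are available as a mapping from hypothetical `(site, month, context)` keys to counts; the numbers in the example are placeholders, not real data.

```python
from collections import defaultdict

# The seven months covered by the data (March through September 2018).
MONTHS = ["2018-03", "2018-04", "2018-05", "2018-06", "2018-07", "2018-08", "2018-09"]


def aggregate(monthly):
    """Sum {(site, month, context): count} over the study months.

    Returns {(site, context): total}, i.e. one bar per site and context.
    """
    totals = defaultdict(int)
    for (site, month, context), n in monthly.items():
        if month in MONTHS:
            totals[(site, context)] += n
    return dict(totals)


# Placeholder counts for illustration only, not real data.
monthly = {
    ("desktop", "2018-03", "editing"): 10,
    ("desktop", "2018-04", "editing"): 12,
    ("mobile", "2018-03", "other reading"): 30,
    ("mobile", "2018-10", "other reading"): 99,  # outside the study window
}
print(aggregate(monthly))
# prints {('desktop', 'editing'): 22, ('mobile', 'other reading'): 30}
```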

There are several trends visible when looking at these graphs side by side. First, for the Czech, Korean, English, and Ukrainian desktop sites (the three bars on the left side of each chart) the proportions are roughly similar. Most accounts are created from a reading context, primarily from somewhere other than the main page, and relatively few appear to be created from an editing context.
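Because the wikis differ a lot in size, comparing them means comparing proportions rather than raw counts. A minimal sketch of that calculation, assuming per-site totals keyed by hypothetical `(site, context)` tuples with placeholder numbers:

```python
def context_shares(totals, site):
    """Return each context's share of one site's registrations.

    `totals` maps (site, context) -> count, e.g. sums over March-September.
    """
    site_totals = {ctx: n for (s, ctx), n in totals.items() if s == site}
    overall = sum(site_totals.values())
    if overall == 0:
        return {ctx: 0.0 for ctx in site_totals}
    return {ctx: n / overall for ctx, n in site_totals.items()}


# Placeholder totals for illustration only, not real data.
totals = {
    ("desktop", "editing"): 50,
    ("desktop", "main page"): 150,
    ("desktop", "other reading"): 800,
    ("mobile", "editing"): 400,
}
print(context_shares(totals, "desktop"))
# prints {'editing': 0.05, 'main page': 0.15, 'other reading': 0.8}
```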

Second, the proportions are also roughly similar for the Czech, Korean, German, and Ukrainian mobile sites. Few accounts are created from reading the main page; instead, they are created either when editing or when reading a page other than the main page. English Wikipedia is also similar in this sense, but whereas the other four appear to have more accounts created from the mobile editing context, English has more accounts created from the reading context.

Third, the proportions on the German desktop site differ from the trend described above. Here, about as many accounts are created from the editing context as from the main page. I do not know why this happens, but I did try editing without logging in on both German and English Wikipedia and noticed slight differences in the message that comes up. Whereas English links to both "log in" and "create an account", German Wikipedia only links to the latter. There might also be differences in the wording, but as I don't know German I relied on Google Translate to make heads or tails of the messages. Experiments with the wording and/or linking would be needed to understand whether they have an effect.

Lastly, Arabic Wikipedia is different from all the others. In our October 5 work log I found that most accounts there are registered from mobile. We see that clearly here as well: the number of registrations from both the editing and the reading contexts on mobile is higher than the number of registrations from the reading context on the desktop site.

Monthly counts
While the Phabricator task doesn't ask for movement across time, I made graphs for each of the six wikis because I had already done something similar for the October 5 work log. Once again they're split by desktop and mobile, then by each of the three contexts (reading, reading the main page, and editing), and they show the total registrations for each of the seven months in our dataset.
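Turning the monthly counts into one line per wiki, site, and context could look like the sketch below. The `(wiki, site, context, month)` key layout is an assumption, and missing months are filled with zero so every line spans the same seven months. The counts are placeholders, not real data.

```python
from collections import defaultdict

# The seven months covered by the data (March through September 2018).
MONTHS = ["2018-03", "2018-04", "2018-05", "2018-06", "2018-07", "2018-08", "2018-09"]


def monthly_series(counts):
    """Turn {(wiki, site, context, month): n} into {(wiki, site, context): [n, ...]}.

    Each list has one value per month in MONTHS, with 0 where there is no data.
    """
    series = defaultdict(lambda: [0] * len(MONTHS))
    for (wiki, site, context, month), n in counts.items():
        if month in MONTHS:
            series[(wiki, site, context)][MONTHS.index(month)] += n
    return dict(series)


# Placeholder counts for illustration only, not real data.
counts = {
    ("kowiki", "desktop", "main page", "2018-03"): 10,
    ("kowiki", "desktop", "main page", "2018-06"): 40,
    ("kowiki", "desktop", "main page", "2018-07"): 35,
}
print(monthly_series(counts)[("kowiki", "desktop", "main page")])
# prints [10, 0, 0, 40, 35, 0, 0]
```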

It is difficult to determine trends in the data because it does not cover a timespan long enough for us to tell whether we are looking at recurring yearly events or at something out of the ordinary. Looking at the Czech, English, German, and Ukrainian Wikipedias, there appears to be a summer slump in desktop registrations. From working on the ACTRIAL project I know that activity levels on English Wikipedia follow the holiday seasons: there's less activity in June, July, and August (summer holidays in Europe and the US), as well as over the Christmas holidays. That might explain what we're seeing there.
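One rough way to put a number on the apparent summer slump is to compare the mean monthly registrations in June through August with the mean of the other months. This is a sketch with placeholder numbers, not a method used in the original analysis, and with only one year of data it cannot separate a seasonal pattern from a one-off change.

```python
from statistics import mean

SUMMER = {"2018-06", "2018-07", "2018-08"}


def summer_ratio(monthly):
    """Mean monthly registrations in June-August divided by the mean of other months.

    `monthly` maps "YYYY-MM" -> registrations for one wiki, site, and context.
    A ratio below 1 suggests fewer registrations over the summer.
    """
    summer = [n for m, n in monthly.items() if m in SUMMER]
    rest = [n for m, n in monthly.items() if m not in SUMMER]
    return mean(summer) / mean(rest)


# Placeholder counts for illustration only, not real data.
example = {
    "2018-03": 100, "2018-04": 100, "2018-05": 100,
    "2018-06": 70, "2018-07": 60, "2018-08": 80,
    "2018-09": 100,
}
print(summer_ratio(example))
# prints 0.7
```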

We also see a couple of trends that could be worth investigating. Korean Wikipedia had a spike in desktop registrations from reading the main page in June and July. Does that occur every year, or was it unique to 2018? Arabic Wikipedia has a large increase in desktop registrations in September. There's also a shift on Arabic Wikipedia: prior to June most mobile registrations originated from the reading context, while since then the editing context has been slightly ahead. Is that a recent trend, or has it occurred previously? There might be some interesting things to learn here, but it is outside the scope of the current task.

How many accounts?
The Phabricator task asks for the number of accounts created from each context, so let's calculate that for each wiki and context.
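A minimal sketch of that calculation, assuming the March-September totals are keyed by hypothetical `(wiki, site, context)` tuples and that "number of accounts" means desktop and mobile combined. The counts below are placeholders, not the real figures.

```python
from collections import defaultdict


def accounts_per_context(totals):
    """Combine desktop and mobile: {(wiki, site, context): n} -> {(wiki, context): n}."""
    combined = defaultdict(int)
    for (wiki, _site, context), n in totals.items():
        combined[(wiki, context)] += n
    return dict(combined)


# Placeholder counts for illustration only, not real data.
totals = {
    ("arwiki", "desktop", "other reading"): 100,
    ("arwiki", "mobile", "other reading"): 250,
    ("arwiki", "mobile", "editing"): 300,
}
print(accounts_per_context(totals))
# prints {('arwiki', 'other reading'): 350, ('arwiki', 'editing'): 300}
```

Keeping the site in the key instead (i.e. skipping this step) would give the per-site numbers shown in the bar charts; which breakdown the task wants is a judgment call.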