Growth/Analytics updates/Work log/2018-10-18

In T206377 I was asked to gather statistics on the context of account creations, in order to understand the extent to which an intervention directly after account creation would potentially disrupt the user's workflow. I used two HQL queries to gather this data. The first query can be found in this task comment, and the second in this other task comment. A couple of weeks after the initial data gathering, I learned that we appear to be mislabelling the context of potentially many account creations. The specifics are outlined in the second comment linked above, and they are the reason why there are two queries. My updated query improves our data gathering for the English Wikipedia, where it correctly identifies additional account creations from the editing context.

In general, the HQL queries sum up three specific measures for each of the mobile and desktop sites on a monthly basis. These three measures are the same as those specified in the task: the number of accounts created from reading the main page, the number of accounts created from editing a page, and the number of accounts created from some other reading context (i.e. "neither of the other two"). Note that in the latter case we do not require the user to have been reading an article; for example, it also covers contexts such as "searched for something".
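The classification into three contexts and the monthly summing can be sketched in Python as below. This is a minimal illustration, not the HQL queries themselves: the `Creation` record and its fields (`site`, `month`, `return_to`, `editing`) are hypothetical stand-ins for whatever the event data actually contains, and giving the editing context precedence over the main page is an assumption. The main page title differs between wikis, so it is passed in as a parameter.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Creation:
    """One account creation (hypothetical fields, not the real event schema)."""
    site: str       # "desktop" or "mobile"
    month: str      # "YYYY-MM", e.g. "2018-03"
    return_to: str  # title of the page the user was on when they chose to register
    editing: bool   # True if they registered from the editing context


def classify(creation: Creation, main_page: str) -> str:
    """Map a creation to one of the three contexts named in the task."""
    if creation.editing:
        return "editing"  # assumption: editing takes precedence, even on the main page
    if creation.return_to == main_page:
        return "main page"
    # Everything else, including searches, is "neither of the other two".
    return "other reading"


def monthly_counts(creations, main_page: str) -> Counter:
    """Count creations per (site, month, context)."""
    return Counter(
        (c.site, c.month, classify(c, main_page)) for c in creations
    )


events = [
    Creation("desktop", "2018-03", "Main Page", False),
    Creation("desktop", "2018-03", "Paris", True),
    Creation("mobile", "2018-03", "Special:Search", False),
    Creation("mobile", "2018-03", "Paris", False),
]
counts = monthly_counts(events, main_page="Main Page")
print(counts[("mobile", "2018-03", "other reading")])  # 2
```

The search result and the article view both land in "other reading", matching the note above that this bucket is not limited to reading articles.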

The task also asks us to examine only non-autocreated accounts, so autocreated accounts are filtered out. In this comment I list the HQL query I used to verify that they were discarded.
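The filtering step and the accompanying sanity check can be sketched as follows. This is a hypothetical illustration: the `Registration` record and its `autocreated` flag are assumed names, not the actual table or column used in the HQL query. In MediaWiki, autocreated accounts typically arise when a user with an existing global account visits another wiki, which is why they say little about the registration context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    """Hypothetical registration record; field names are illustrative only."""
    user_id: int
    autocreated: bool  # True if the account was created automatically, not by the user


def drop_autocreated(registrations):
    """Keep only accounts that a person explicitly created."""
    return [r for r in registrations if not r.autocreated]


def has_autocreated(registrations) -> bool:
    """Sanity check: True if any autocreated account slipped through."""
    return any(r.autocreated for r in registrations)


regs = [Registration(1, False), Registration(2, True), Registration(3, False)]
kept = drop_autocreated(regs)
print([r.user_id for r in kept])  # [1, 3]
print(has_autocreated(kept))      # False
```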

Aggregated counts
The task asks for aggregated counts from the last few months, and the data gathered spans March through September 2018. I summed up the counts across that time period and created the bar charts seen below. These charts are split by site (desktop and mobile), and then by context (editing, reading the main page, and reading another page).
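Summing the monthly counts over March through September into one total per site and context amounts to a filtered group-by. A minimal sketch, assuming the monthly counts are keyed by a hypothetical `(site, month, context)` tuple; the numbers below are toy values, not the real data.

```python
from collections import Counter


def aggregate(monthly, first="2018-03", last="2018-09"):
    """Sum (site, month, context) counts into (site, context) totals.

    Only months in the inclusive range [first, last] are included; "YYYY-MM"
    strings sort chronologically, so plain string comparison is enough.
    """
    totals = Counter()
    for (site, month, context), n in monthly.items():
        if first <= month <= last:
            totals[(site, context)] += n
    return totals


# Toy numbers for illustration only, not the real data.
monthly = {
    ("desktop", "2018-03", "editing"): 10,
    ("desktop", "2018-04", "editing"): 5,
    ("desktop", "2018-10", "editing"): 99,  # outside the window, ignored
    ("mobile", "2018-05", "main page"): 3,
}
totals = aggregate(monthly)
print(totals[("desktop", "editing")])  # 15
```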

There are several trends visible when looking at these graphs side by side. First, for the Czech, Korean, and Ukrainian desktop sites (the three leftmost bars in the desktop charts), the proportions are roughly similar. Most accounts appear to be created from the reading context, primarily from somewhere other than the main page. Relatively few appear to be created from the editing context. For Korean and Ukrainian, the actual number might be higher because there are two links to create an account if you try to edit while not logged in, and one of them is not captured by our data gathering (see this comment on Phabricator for more details). Czech Wikipedia has only one link to create an account, and the low number of creations there might be because it is somewhat difficult to find (compared to English and German, where the warning message shown when editing without logging in contains a link to create an account).

Second, the proportions on the English and German desktop sites differ from the aforementioned trend. Both see a fair number of accounts created from the editing context, roughly as many as are created from reading the main page. Because this proportion is so different from the Korean and Ukrainian Wikipedias, it might suggest that the proportion of accounts created from an editing context on those two wikis is substantially higher than we report, in other words somewhere in between our reported numbers and those for German Wikipedia.

Third, for the Czech, Korean, German, and Ukrainian mobile sites, the proportions are also roughly similar. Few accounts are created from reading the main page; instead, they are created either when editing or when reading a page other than the main page. English Wikipedia is similar in this sense, but whereas the other four appear to have more accounts created from the mobile editing context, English has more accounts created from the reading context.

Lastly, Arabic Wikipedia differs from all the others. In the October 5 work log, I found that most accounts there are registered on mobile. We see that clearly here as well: the numbers of registrations from both the editing and reading contexts on mobile are higher than the number of registrations from the reading context on the desktop site.

Monthly counts
While the phab task doesn't ask for changes over time, I made graphs for each of the six wikis because I had already done something similar for the October 5 work log. Once again, they are split by desktop and mobile, then by each of the three contexts (reading another page, reading the main page, and editing), and show the number of registrations in each of the seven months in our dataset.

It is difficult to determine trends in the data because it does not cover a timespan long enough for us to tell whether we are looking at repeated yearly events or something out of the ordinary. Looking at the Czech, English, German, and Ukrainian Wikipedias, there appears to be a summer slump in desktop registrations. From working on the ACTRIAL project, I know that activity levels on English Wikipedia follow the holiday seasons, with less activity in June, July, and August (summer holidays in Europe and the US), as well as over the Christmas holiday. That might explain what we're seeing there.
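One simple way to put a number on a possible summer slump is to compare average registrations in June through August with the average in the other months of the window. The sketch below is a hypothetical illustration (the function name and toy series are not from the original analysis), and, as noted above, with only one year of data a low ratio cannot distinguish a yearly pattern from a one-off dip.

```python
from statistics import mean

SUMMER = {"06", "07", "08"}  # June, July, and August


def summer_ratio(series):
    """Mean summer registrations divided by mean non-summer registrations.

    `series` maps "YYYY-MM" -> registrations for one wiki, site, and context,
    and must contain at least one summer and one non-summer month.
    A ratio well below 1 is consistent with a summer slump.
    """
    summer = [n for month, n in series.items() if month[5:] in SUMMER]
    other = [n for month, n in series.items() if month[5:] not in SUMMER]
    return mean(summer) / mean(other)


# Toy numbers for illustration only, not the real data.
series = {
    "2018-03": 100, "2018-04": 100, "2018-05": 100,
    "2018-06": 80, "2018-07": 70, "2018-08": 90,
    "2018-09": 100,
}
print(summer_ratio(series))  # 0.8
```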

We also see a couple of trends that could be worth investigating. Korean Wikipedia had a spike in desktop registrations from reading the main page in June and July. Does that occur every year, or did it only happen in 2018? Arabic Wikipedia has a large increase in desktop registrations in September. There is also a shift on Arabic Wikipedia: prior to June, most mobile registrations originated from the reading context, while since then the editing context has been slightly ahead. Is that a recent trend, or has it occurred previously? There might be some interesting things to learn here, but they are outside the scope of the current task.

How many accounts?
The phab task asks for the number of accounts created from each context, so let's calculate that for each wiki and context. We can visualize the resulting table in various ways. Below are stacked bar plots of the proportions of accounts created on each of the six Wikipedias in our dataset. The first shows the proportion of mobile versus desktop registrations, the second shows the overall proportion of accounts by the context in which they were created, and the last splits it out first by site and then by context.
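The three breakdowns behind the stacked bar plots can be computed from a wiki's per-site, per-context totals as sketched below. This assumes a hypothetical `(site, context)` keyed input, and it assumes the site-then-context breakdown is expressed as a share of the wiki's overall total; normalizing within each site instead would be an equally reasonable reading. The numbers are toy values, not the real data.

```python
from collections import Counter


def proportions(totals):
    """Return the three breakdowns behind the stacked bar plots for one wiki.

    `totals` maps (site, context) -> number of accounts. Each breakdown is
    expressed as a share of the wiki's total, so each one sums to 1.
    """
    grand = sum(totals.values())
    by_site = Counter()
    by_context = Counter()
    for (site, context), n in totals.items():
        by_site[site] += n
        by_context[context] += n

    def share(counts):
        return {key: n / grand for key, n in counts.items()}

    return share(by_site), share(by_context), share(totals)


# Toy numbers for illustration only, not the real data.
totals = {
    ("desktop", "editing"): 20,
    ("desktop", "main page"): 20,
    ("desktop", "other reading"): 10,
    ("mobile", "editing"): 25,
    ("mobile", "other reading"): 25,
}
by_site, by_context, by_both = proportions(totals)
print(by_site)     # {'desktop': 0.5, 'mobile': 0.5}
print(by_context)  # {'editing': 0.45, 'main page': 0.2, 'other reading': 0.35}
```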