Growth/Analytics updates/Work log/2018-10-05

Account creations from mobile
In T205760, I was asked to figure out what percentage of accounts were created from mobile devices vs. desktop. This should exclude auto-created accounts, and it would be preferable if it could be split further by device (e.g. iOS and Android).

I found the ServerSideAccountCreation schema, which should enable me to answer these questions. From what I can tell by exploring the data, if the  flag is set the account was most likely created using a mobile app. Access to account creation through the API is not restricted to the mobile apps, so we might encounter non-app creations. While this binary heuristic is crude, it did appear to flag the vast majority of cases I looked at, and this project did not allow for developing a more sophisticated heuristic.

If the  flag is not set, we can distinguish between the mobile and desktop sites using the   flag. EventLogging also has the  field, but that turns out to never be set to the mobile site in this case (perhaps why the   flag is there?) These two flags combined allows us to split account creations into three categories: apps, mobile, and desktop.

Exploring the data further I found that I can use the  property of the event to identify Android and iOS devices. I chose to not investigate further splits (e.g. identifying Windows phones), and instead put everything that is neither Android nor iOS in "other".

I was interested in getting data from 2016-01-01 onwards and plot a monthly time series, but this was hindered by the Data Lake only having consistent data starting in early February 2018. More data is available, but the tight deadline did not allow for merging of these data sources. Instead, I chose to use 6 months of data, covering February through July 2018. The data for February is not complete, but I am assuming that this affects every category equally since it's a time-based exclusion. Since the task is to get proportions and not specific counts, this should not substantially affect the results.

I ended up writing the HQL query below to grab data for this from the Data Lake. The name of the resulting table and the wiki it grabs data from is changed for each of the six wikis we are grabbing data from.

Proportion of desktop, mobile, and apps
Let's first look at the top-level split, between the different sites/methods: apps, desktop, and mobile. First, I use the whole dataset and calculate the overall proportion of registrations for each. This results in six bar charts, seen below.

The general trend in all these plots is that the proportion of registrations through the mobile site is stable, and there's a reduction in desktop registrations in favour of registrations through the mobile app. This means that our previous plots of the overall proportions is slightly overestimating the desktop registrations, while underestimating the app registrations, compared to the more recent months.

Proportion of Android and Apple
Our datasets also allow us to split each site by whether the registration happened on an Android, iOS, or other type of device. Because the overarching question is registrations on apps/mobile/desktop, I decided to focus this on splitting each of those sites, rather than answer the question "what's the overall proportion of Android/Apple/other?"

Splitting up the dataset and calculating proportions for each site results in the six bar charts seen below.

In these graphs, one overall trend is that iOS dominates the apps registrations. German Wikipedia is an exception, where there is a large number of registrations falling into the "other" category. We use the  flag to categorize account creations into the "apps" category, and while that might be correct in the vast majority cases, account creation through the API is not restricted to the apps. In this case it did not seem worthwhile to spend time on developing a more fine-tuned set of heuristics in order to filter out non-app account creations, which means that we end up with cases like this.

Registrations on the desktop site is typically "other", with Arabic and Korean showing a fair proportion of registrations from Android and iOS devices. Lastly, the trend of Android vs iOS is switched for the mobile site, typically 80/20 in Android's favor. The German Wikipedia is the exception here where it is more even.