Growth/Analytics updates/Welcome survey initial report

From mediawiki.org

As part of the Growth Team's "Personalized first day" project, we deployed the welcome survey on the Czech and Korean Wikipedias on November 19, 2018, after 19:00 UTC. The goal of the survey was to gather initial information about new users so that we can personalize their first day on the wiki and help them achieve their goals. Before deployment, we published an experiment plan detailing what we would measure (and why). This page contains the team's initial report on the survey results, and will be followed by more detailed analyses that answer the other questions from the experiment plan.

In this report, we give an overview of the survey and its responses for accounts registered between deployment and the end of the day on December 17, 2018 (UTC). We did not calculate confidence intervals or statistical significance for the results, and we draw no conclusions about whether differences are significant (e.g. between the two wikis or between groups of users). We also did not cross-tabulate the survey questions against each other, nor analyze them together with EditorJourney data. Instead, these data are presented as preliminary findings, accompanied by discussion of the potential next steps they suggest. A more rigorous analysis will be done in the following quarter.

Overview

  • A large share of users responded to the survey: the response rates for the Czech and Korean wikis were 67% and 62%, respectively.
  • We found no indication that the survey caused new users to leave the site.
  • The most common reason for creating an account on Korean Wikipedia is to read articles, not to edit (29%). This differs in Czech, where 18% give that answer. These high numbers may represent an opportunity to teach these users that editing Wikipedia is possible and easy.
  • A majority of respondents in both languages had not edited Wikipedia before (51% in Czech and 63% in Korean). But these percentages also mean that a large number of people had edited before (anonymously or under another account), and may therefore have some prior knowledge of editing.
  • Korean respondents were much more likely than Czech respondents to type in their own custom topics, rather than only selecting the pre-populated options. 28% of Korean respondents added their own topic, compared with 9% of Czech respondents.
  • Surprisingly, a large number of respondents said they were interested in being contacted to get help with their editing: 36% in Czech and 53% in Korean. This is a strong signal that there is potential and desire for human-to-human help. The list of respondents is in itself a good pool for outreach campaigns.
  • Few people who had not already provided an email address during account creation added one in the survey. The numbers are large enough for the option to be worthwhile (6% in Korean and 7% in Czech), but small enough that we could consider better calls to action to encourage adding an email address.

Background

The original motivation for this survey was to gather information about users that would allow us to personalize their experience. Read here about how we might act on these data in the next phase of the project.

For the four weeks following deployment, the survey was shown to a randomly selected 50% of the users who registered a new account on the two wikis (meaning that users from other wikis whose accounts were autocreated on these wikis on their first visit were not included). This A/B test between a survey group and a control group was chosen so that we could determine whether the survey leads to a lower proportion of users making their first edit within their first 24 hours (what we call "editor activation"). An analysis of the results of that experiment is still to be published.
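The "editor activation" definition above (a first edit within 24 hours of registration) can be sketched as follows. This is a minimal illustration rather than the team's actual measurement code: the function names are hypothetical, and treating the 24-hour boundary as inclusive is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

# Window used in the definition of "editor activation".
ACTIVATION_WINDOW = timedelta(hours=24)


def is_activated(registered_at: datetime, first_edit_at: Optional[datetime]) -> bool:
    """True if the user's first edit came within 24 hours of registration.

    Assumption: the boundary is inclusive, and a user with no edit at all
    (first_edit_at is None) is not activated.
    """
    if first_edit_at is None:
        return False
    return timedelta(0) <= first_edit_at - registered_at <= ACTIVATION_WINDOW


def activation_rate(users) -> float:
    """Share of (registered_at, first_edit_at) pairs that count as activated."""
    users = list(users)
    if not users:
        return 0.0
    return sum(is_activated(reg, edit) for reg, edit in users) / len(users)
```

Comparing `activation_rate` for the survey group with the same figure for the control group is the comparison the A/B test is designed to support.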

A quick reference for what the survey looked like and the questions it contained is available in this mockup. The survey was translated into Czech and Korean.

Response rate

On Czech Wikipedia the survey was shown to 669 users, and on Korean Wikipedia to 836. The user is presented with a series of optional questions. They can then submit the survey by clicking the "Finish" button (even if they have not answered any question), discard their answers with the "Skip survey" button, or simply leave the page, for example by closing the browser. We call this last action "abandonment". The distribution of these actions for both wikis is as follows:

Table 1: Overview and response rates
Action | Czech N | Czech Prop. | Korean N | Korean Prop.
Submitted | 451 | 67.4% | 518 | 62.0%
Skipped | 81 | 12.1% | 99 | 11.8%
Abandoned | 137 | 20.5% | 219 | 26.2%

Table 1 shows that most of the users submitted the survey, which is great! As we will see below, users also answer our questions (rather than submitting a survey with no answers). The abandonment rate appears to be fairly high, and at first we were concerned that this meant the survey was causing users to leave the website entirely, which would be a counter-productive outcome. To look into this, we dug into the data captured via our team's "Understanding first day" project, which gathers data on what new users view during their first 24 hours. We found that in Czech only 47 users (7.0%) left the site, while in Korean it was 99 users (11.8%). Both of these proportions are below the thresholds we had set for deciding whether to change the survey or turn it off. This question will be answered more conclusively when we analyze the control group's rate of abandoning the site after account creation.
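The proportions in Table 1 and the site-leaving shares quoted above can be reproduced from the raw counts. The short sketch below only re-does that arithmetic; the variable and function names are illustrative.

```python
# Counts from Table 1: users shown the survey, by the action they took.
TABLE_1 = {
    "Czech": {"Submitted": 451, "Skipped": 81, "Abandoned": 137},
    "Korean": {"Submitted": 518, "Skipped": 99, "Abandoned": 219},
}
# Users who left the site entirely, as reported in the text.
LEFT_SITE = {"Czech": 47, "Korean": 99}


def proportions(counts):
    """Return each action's share of all users shown the survey, in percent."""
    total = sum(counts.values())
    return {action: round(100 * n / total, 1) for action, n in counts.items()}


for wiki, counts in TABLE_1.items():
    shown = sum(counts.values())  # 669 for Czech, 836 for Korean
    left = round(100 * LEFT_SITE[wiki] / shown, 1)
    print(wiki, shown, proportions(counts), f"left site: {left}%")
```

Note that the denominator is everyone who was shown the survey, so the three actions sum to 100% for each wiki.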

It is also possible to split the response rates by whether the account was created on the desktop or mobile site, but we find that the proportions are generally similar.

Why did you create your account today?

Why did you create your account today?

  • To fix a typo or error in a Wikipedia article
  • To add information to a Wikipedia article
  • To create a new Wikipedia article
  • To read Wikipedia
  • Other (please describe)

Our first question asks why the user created an account, and provides several options, as well as an "Other" option where the user is given a text field to explain further. For our two target Wikipedias, the responses pan out as follows, with proportions based on the number of respondents in each language:

Table 2: Why did you create an account today?
Reason | Czech N | Czech Prop. | Korean N | Korean Prop.
Create a new article | 147 | 32.6% | 102 | 19.7%
Add information to an article | 110 | 24.4% | 131 | 25.3%
To read Wikipedia | 79 | 17.5% | 149 | 28.8%
Fix a typo or error in an article | 76 | 16.9% | 90 | 17.4%
No answer | 21 | 4.7% | 23 | 4.4%
Other | 18 | 4.0% | 23 | 4.4%

The first thing to notice is that the most frequent option differs between the two languages. In Czech it is creating a new article, selected by 32.6% of respondents, while in Korean it is reading (28.8%). In each language, the other wiki's top choice comes third: reading was chosen by 17.5% of Czech respondents, and creating a new article by 19.7% of Korean respondents. It's interesting to learn that reading Wikipedia motivates a lot of account creation, since having an account does not materially change the reading experience. That may point to a misperception around account creation, but may also be an opportunity to engage users both as readers and potential editors.

Adding information to an article is consistently the second option in both languages, and has a comparable proportion of around 25%. The same goes for fixing a typo or an error, which is consistently fourth on the list with about 17% of the responses.

Have you ever edited Wikipedia?

Have you ever edited Wikipedia?

  • Yes, many times
  • Yes, once or twice
  • No, I didn't know I could edit Wikipedia
  • No, other reasons
  • I don't remember

The second question asks whether the user has edited Wikipedia before and lists five potential answers. Some users also submit the survey without responding to this question. Table 3 below gives an overview of the responses; again, proportions are based on the total number of survey responses.

Table 3: Have you edited Wikipedia before?
Response | Czech N | Czech Prop. | Korean N | Korean Prop.
No, I didn't know I could edit Wikipedia | 125 | 27.7% | 191 | 36.9%
No, for other reasons | 103 | 22.8% | 136 | 26.3%
Yes, once or twice | 94 | 20.8% | 62 | 12.0%
I don't remember | 52 | 11.5% | 54 | 10.4%
Yes, many times | 50 | 11.1% | 44 | 8.5%
No response | 27 | 6.0% | 31 | 6.0%

In both languages we find "No, I didn't know I could edit Wikipedia" is the most frequent option, and that a majority of respondents say they had not edited Wikipedia before (combining both "no" options: Czech: 50.5%; Korean: 63.2%). Regarding the "No, I didn't know I could edit Wikipedia" response, it makes sense that many people would give this answer given how many say they are creating their account for the purpose of reading. But we were also surprised that the number was quite so high. One hypothesis is that the question might be interpreted to mean different things by different respondents. One possible interpretation is "No, I didn't know I could edit Wikipedia until this survey question pointed it out", and another is "No, I didn't know I could edit Wikipedia until recently, but once I discovered that I could, I decided to create this account." We will learn some more about this question once we make cross-tabulations against the other questions, and we can consider clearer phrasings of these responses in the future.

It is also worth noting that the order of the responses is the same across both languages, and that it is different from the order the options are shown to the user. This means that the respondents did not simply choose the first answer in the list when responding, but are instead actively letting us know that they haven't edited Wikipedia before.
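The ordering argument above can be checked directly against the counts in Table 3. The sketch below ranks the five options by frequency for each wiki and compares the result with the order in which the survey displays them. The names are illustrative, and "No response" is left out because it is not an option shown to the user.

```python
# Order in which the options are displayed in the survey.
DISPLAY_ORDER = [
    "Yes, many times",
    "Yes, once or twice",
    "No, I didn't know I could edit Wikipedia",
    "No, other reasons",
    "I don't remember",
]
# Response counts from Table 3 as (Czech, Korean).
TABLE_3 = {
    "No, I didn't know I could edit Wikipedia": (125, 191),
    "No, other reasons": (103, 136),
    "Yes, once or twice": (94, 62),
    "I don't remember": (52, 54),
    "Yes, many times": (50, 44),
}


def ranked(column):
    """Options sorted from most to least frequently chosen for one wiki."""
    return sorted(TABLE_3, key=lambda option: TABLE_3[option][column], reverse=True)


czech_rank, korean_rank = ranked(0), ranked(1)
print(czech_rank == korean_rank)    # True: same ranking in both languages
print(czech_rank == DISPLAY_ORDER)  # False: ranking differs from display order
```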

Select some topics you may wish to edit

People can edit Wikipedia articles on topics that they care about. We've listed a few topics below that are popular for editing. Select some topics that you may wish to edit:

Explicitly listed as checkboxes: Arts, Science, Geography, History, Music, Sports, Literature, Religion, Popular culture.

Available in a typeahead dropdown menu: Entertainment, Food and drink, Biography, Military, Economics, Technology, Film, Philosophy, Business, Politics, Government, Engineering, Crafts and hobbies, Games, Health, Social science, Transportation, Education.

The third part of the survey asks the respondents to select some topics that they may wish to edit. Nine topics are shown as checkboxes, and another eighteen topics show up when the user clicks on or types in the field. The field is free-form, allowing respondents to add additional topics. Respondents may choose and add as many topics as they like.

This analysis only covers the suggested topics. Future analyses will address the user-supplied topics, which require translation before they can be analyzed. We show one table below for each language. The table identifies the way a user can select a topic as either "checkbox", meaning it is one of the nine checkboxes; "prefilled", meaning it is one of the eighteen pre-filled topics found in the free-form field; or "other", meaning it is a topic added by the respondent.

Table 4: Czech topics
Source | Topic | N | Prop.
checkbox | science | 198 | 43.9%
checkbox | history | 187 | 41.5%
checkbox | arts | 152 | 33.7%
checkbox | music | 146 | 32.4%
checkbox | sports | 144 | 31.9%
checkbox | popular culture | 132 | 29.3%
checkbox | geography | 130 | 28.8%
checkbox | literature | 123 | 27.3%
checkbox | religion | 94 | 20.8%
prefilled | entertainment | 16 | 3.5%
prefilled | games | 16 | 3.5%
prefilled | politics | 13 | 2.9%
prefilled | film | 13 | 2.9%
prefilled | economics | 10 | 2.2%
prefilled | food and drink | 8 | 1.8%
prefilled | social science | 8 | 1.8%
prefilled | biography | 7 | 1.6%
prefilled | education | 6 | 1.3%
prefilled | crafts and hobbies | 6 | 1.3%
prefilled | technology | 6 | 1.3%
prefilled | military | 4 | 0.9%
prefilled | philosophy | 4 | 0.9%
prefilled | business | 3 | 0.7%
prefilled | government | 2 | 0.4%
prefilled | health | 2 | 0.4%
prefilled | transportation | 2 | 0.4%
other | other | 41 | 9.1%

We can see that the dominating topics are all the ones listed in the checkboxes. The least frequent checkbox is selected by 20.8% of respondents, while the most frequent topic in the free-form field is only chosen by 3.5% of respondents. It is noteworthy that respondents are selecting multiple topics, as opposed to just one.

Table 5: Korean topics
Source | Topic | N | Prop.
checkbox | arts | 218 | 42.1%
checkbox | popular culture | 205 | 39.6%
checkbox | science | 197 | 38.0%
checkbox | history | 179 | 34.6%
checkbox | music | 158 | 30.5%
checkbox | sports | 114 | 22.0%
checkbox | literature | 96 | 18.5%
checkbox | religion | 84 | 16.2%
checkbox | geography | 78 | 15.1%
prefilled | games | 21 | 4.1%
prefilled | entertainment | 13 | 2.5%
prefilled | business | 10 | 1.9%
prefilled | economics | 10 | 1.9%
prefilled | technology | 9 | 1.7%
prefilled | education | 9 | 1.7%
prefilled | film | 8 | 1.5%
prefilled | health | 8 | 1.5%
prefilled | philosophy | 8 | 1.5%
prefilled | engineering | 7 | 1.4%
prefilled | crafts and hobbies | 7 | 1.4%
prefilled | politics | 7 | 1.4%
prefilled | food and drink | 6 | 1.2%
prefilled | military | 6 | 1.2%
prefilled | social science | 6 | 1.2%
prefilled | biography | 6 | 1.2%
prefilled | government | 4 | 0.8%
prefilled | transportation | 2 | 0.4%
other | other | 145 | 28.0%

We see a similar trend in Korean as in Czech: the checkboxes dominate when it comes to selecting topics, although the gap between the least popular checkbox and the most popular pre-filled topic is smaller in Korean (11.0 percentage points) than in Czech (17.3 percentage points).
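The gap between the least popular checkbox and the most popular pre-filled topic follows from the proportions in Tables 4 and 5. The sketch below re-derives the two figures; the dictionary layout and function name are illustrative.

```python
# Proportions (percent of respondents) copied from Tables 4 and 5.
CZECH = {
    "checkbox": [43.9, 41.5, 33.7, 32.4, 31.9, 29.3, 28.8, 27.3, 20.8],
    "prefilled": [3.5, 3.5, 2.9, 2.9, 2.2, 1.8, 1.8, 1.6, 1.3, 1.3, 1.3,
                  0.9, 0.9, 0.7, 0.4, 0.4, 0.4],
}
KOREAN = {
    "checkbox": [42.1, 39.6, 38.0, 34.6, 30.5, 22.0, 18.5, 16.2, 15.1],
    "prefilled": [4.1, 2.5, 1.9, 1.9, 1.7, 1.7, 1.5, 1.5, 1.5, 1.4, 1.4,
                  1.4, 1.2, 1.2, 1.2, 1.2, 0.8, 0.4],
}


def checkbox_gap(topics):
    """Percentage-point gap: least popular checkbox minus top pre-filled topic."""
    return round(min(topics["checkbox"]) - max(topics["prefilled"]), 1)


print("Czech gap:", checkbox_gap(CZECH))    # 17.3
print("Korean gap:", checkbox_gap(KOREAN))  # 11.0
```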

Are you interested in being contacted to get help with editing?

We are considering starting a program for more experienced editors to help newer users with editing. Are you interested in being contacted to get help with editing?

We find that in both languages, a surprisingly large number of users are interested in being contacted. 164 users in Czech (36.4% of all survey respondents) and 273 users in Korean (52.7%) answered "yes" to that question. This means that there is clear interest among new users in getting help with editing Wikipedia, and that this is a potential avenue for community outreach. When we dig deeper into the survey responses, we will also compare the responses to this question with the answers to the questions of whether the user had already edited Wikipedia and why they created an account.

Adding an email address

Users who did not add an email address during their initial account creation are given a second opportunity to add their email address in the survey. We find that very few users do so: only 13 on Czech Wikipedia and 20 on Korean Wikipedia. This corresponds to 6.5% of the Czech users who did not already have an email address[1] when shown the survey, and 5.7% of the Korean users.

Repeat survey responses

Though there is no explicit workflow for doing so, users can take the survey multiple times by revisiting the survey URL. We only store their most recent responses, meaning that we regard their most recent answers as accurately reflecting their interests and opinions. At the same time, we store a count of how many times they have responded or skipped. Table 8 below shows how the number of responses is distributed, where the proportion is out of all users who either saved or skipped the survey.

Table 8: How many times the survey was taken
Number of responses | Czech N | Czech Prop. | Korean N | Korean Prop.
1 | 512 | 96.2% | 593 | 96.1%
2 | 14 | 2.6% | 23 | 3.7%
3 | 4 | 0.8% | 1 | 0.2%
4 | 2 | 0.4% | |

We can see that it's relatively rare that users take the survey multiple times, and if someone does, it's typically only one more time. This means that we see little reason to discard responses based on users taking the survey multiple times and potentially changing their answers.

Sanity checks

We have also run various sanity checks on our data in order to ensure that things are working properly. For example, we have calculated the distribution of users assigned to the survey and control groups, which ideally should be 50/50. This turns out to be the case: overall, the proportions on Czech Wikipedia are 49.7%/50.3% survey/control, and on Korean Wikipedia it is the other way around. We do find some variation when accounts are split into registrations from desktop and mobile (e.g. 47/53 in some cases), but not enough to warrant concern that the randomization has led to imbalanced or biased groups.
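A group-balance check of this kind can be sketched in a few lines. Everything here is illustrative: the function names are hypothetical, the 5-point tolerance is an arbitrary choice rather than a threshold from the report, and the control count in the usage example is an assumed value picked to be consistent with the reported Czech split, not a figure taken from the data.

```python
def split_shares(survey_n: int, control_n: int):
    """Percentage of registrations in the survey and control groups."""
    total = survey_n + control_n
    if total == 0:
        raise ValueError("no registrations to split")
    return (round(100 * survey_n / total, 1), round(100 * control_n / total, 1))


def is_balanced(survey_n: int, control_n: int, tolerance_points: float = 5.0) -> bool:
    """True if the survey share is within tolerance_points of 50%.

    The default tolerance is an arbitrary illustration, not a value
    stated in the report.
    """
    survey_share, _ = split_shares(survey_n, control_n)
    return abs(survey_share - 50.0) <= tolerance_points


# 669 is the number of Czech users shown the survey; 677 is an ASSUMED
# control-group size used only to illustrate the calculation.
print(split_shares(669, 677))
print(is_balanced(669, 677))
```

The same functions can be applied separately to desktop and mobile registrations to look at the per-platform variation mentioned above.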

While working on this report, we have not yet dug carefully into the data to determine if the responses appear to be truthful. For example, if a user answers that they did not know they can edit Wikipedia but also says they had edited Wikipedia many times, we should most likely discard their answers to at least both those questions, potentially the entire survey. This is noted and will be done as part of a more thorough examination of the survey results at some point in the near future.

Footnotes

  1. For more information about our methodology for determining if a user supplied an email address at registration, see our appendix below.

Appendix A: Email added at registration

How did we determine how many users had not provided an email address at signup, so that we could calculate that proportion? This is not trivial, because the MediaWiki database does not store a timestamp of when a user added their email address, and there is no EventLogging schema in use for logging that kind of information. The only piece of information in the database that seemed related is the expiration timestamp of the verification token that is emailed to the user when they enter their email address.

We examined the difference between the timestamps of account registration and verification token expiration for accounts registered between January 1 and July 1 2018 on both Wikipedias and found that it is typically set to slightly more than seven days. How much more is "slightly more"? In the vast majority of cases less than ten seconds, which we think is the delay between the system creating the account and the subsequent emailing of the verification token (at which point the expiration timestamp is set to "seven days from now"). We therefore adopted a simple heuristic for determining if the user supplied an email at registration: it happened if the difference between the two timestamps is less than "one week + ten seconds".
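The heuristic can be written down as a small function. This is a sketch under stated assumptions, not the query the team ran: the function and constant names are hypothetical, timestamps are assumed to be 14-digit MediaWiki-style strings (YYYYMMDDHHMMSS), and a missing expiration timestamp is assumed to mean that no verification token was ever issued.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed timestamp format, e.g. "20181119190000".
MW_TS_FORMAT = "%Y%m%d%H%M%S"
# Cutoff from the heuristic: one week plus ten seconds.
CUTOFF = timedelta(weeks=1, seconds=10)


def email_supplied_at_registration(
    registration_ts: str, token_expires_ts: Optional[str]
) -> bool:
    """Apply the 'one week + ten seconds' heuristic to one account."""
    if not token_expires_ts:
        # Assumption: no token expiry means no email was ever entered.
        return False
    registered = datetime.strptime(registration_ts, MW_TS_FORMAT)
    expires = datetime.strptime(token_expires_ts, MW_TS_FORMAT)
    return expires - registered < CUTOFF


# Token expires 7 days and 3 seconds after registration: email at signup.
print(email_supplied_at_registration("20180301120000", "20180308120003"))
# Token expires almost two weeks after registration: email added later.
print(email_supplied_at_registration("20180301120000", "20180315093000"))
```

An account whose token expiry falls well beyond the cutoff is counted as having added its email address some time after registration.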

Another thing we have to consider is that we do not have information about whether a user supplied an email address at registration but then decided to delete it. This means that they'll show up in our statistic as "did not supply an email at registration". We decided to assert that this is rarely done based on the fact that as of December 19, 64% of Czech registrations and 75% of Korean registrations between January 1 and July 1 did not have a verified email address. This suggested to us that users most likely either supply an email address that they do not check, or do not really care much about email verification, which we took to mean they are also unlikely to delete their email address.

Lastly, the proportion listed in the "added email" section above was not based on an upper limit for how quickly after registration a user can add their email address. This means that users who took the survey shortly after it was deployed have had more time to provide us with an address. In future calculations we will have a limit (e.g. one week), but in the meantime we will assert that if they have not provided us with an address already it's unlikely that they return to do so (in other words, that it's relatively unlikely that a user adds an email address after registration).