Topic on Talk:New Editor Experiences

How do we measure (and define) New Editors success?

16 comments • 18:54, 3 January 2018

What, specifically, are the goals of the New Editors project? And what are the numerical measurements that will help us track success on those goals? Daisy, Pau and I met today to try to hash out that latter question in particular—partly because putting metrics tests in place takes time.

Nailing down the specific goals of the New Editors project is not as obvious as it might seem at first. After a lot of discussion, we’d like to posit that the goal of the New Editors project should be to help new editors progress from a state of wiki ignorance to some (definable) state of wiki experience.

In other words, it’s not our brief to make people expert editors, or to ensure that they contribute over the long term, as valuable as those goals may be. The interventions we pick should be aimed at getting users through their initial period of vulnerability, over the hump, out of the danger zone: to successfully deliver them, in effect, OUT of the state of being “new editors” and into some level of mastery (where they don’t know all the answers but they know how to figure things out). If we agree on that general goal, the questions become: what level defines the upper limit of “newness” for an editor, and how do we measure progress toward that limit?

Here are some ideas for measurements that will help us track users’ progress from ignorance to basic mastery (based on Google’s HEART framework). We're not suggesting any level of improvement on these as a “goal” yet; they are simply indicators of success worth tracking.

Engagement

% of registrants who make their first edit (within a defined timeframe?)
# of new editors monthly who make it past their first 10 (non-reverted?) edits
# of new editors monthly who make it past their first 50 (non-reverted?) edits
# of new editors monthly who make it past their first 100 (non-reverted?) edits
# of accounts created (not our focus at this point, but still relevant)
# of edits a new user makes in the talk (or user talk?) namespaces (not a goal itself, but a good diagnostic tool, since communication has been shown to correlate with success)
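
As a rough illustration, the milestone counts and the first-edit rate listed above could be computed along these lines. This is only a sketch under stated assumptions: the record layout `(user, timestamp, was_reverted)`, the function names, and the 30-day default window are hypothetical, and "making it past" a milestone is read here as saving the N-th non-reverted edit.

```python
from collections import Counter
from datetime import datetime, timedelta


def milestone_month(edits, threshold):
    """Map each user to the (year, month) of their threshold-th non-reverted edit.

    edits: list of (user, timestamp, was_reverted) tuples (assumed layout).
    """
    counts = Counter()
    reached = {}
    for user, ts, reverted in sorted(edits, key=lambda e: e[1]):
        if reverted or user in reached:
            continue
        counts[user] += 1
        if counts[user] == threshold:
            reached[user] = (ts.year, ts.month)
    return reached


def editors_reaching_milestone(edits, threshold):
    """Count, per calendar month, the users who hit the milestone in that month."""
    return Counter(milestone_month(edits, threshold).values())


def first_edit_rate(registrations, edits, window_days=30):
    """Share of registrants whose first edit falls within window_days of signing up.

    registrations: dict of user -> registration timestamp (assumed layout).
    """
    first_edit = {}
    for user, ts, _ in edits:
        if user not in first_edit or ts < first_edit[user]:
            first_edit[user] = ts
    window = timedelta(days=window_days)
    converted = sum(
        1
        for user, registered in registrations.items()
        if user in first_edit
        and timedelta(0) <= first_edit[user] - registered <= window
    )
    return converted / len(registrations) if registrations else 0.0
```

Calling `editors_reaching_milestone(edits, 10)`, then again with 50 and 100, would give the three monthly counts; whether reverted edits should count is left as a switch to decide, as the question marks in the list suggest.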

Retention

% of editors who are active in the second 30 days after their first edit
And/or, the overall # of editors who are active in the second 30 days after their first edit
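
A minimal sketch of the retention measure above, assuming "the second 30 days" means days 30 through 59 counted from each editor's first edit. The `(user, timestamp)` record layout and the function name are hypothetical.

```python
from datetime import datetime, timedelta


def second_month_retention(edits):
    """Return (retained_count, cohort_size, retained_share).

    edits: list of (user, timestamp) tuples (assumed layout).
    An editor counts as retained if they saved at least one edit
    between 30 and 59 days after their first edit.
    """
    first = {}
    for user, ts in edits:
        if user not in first or ts < first[user]:
            first[user] = ts

    retained = set()
    for user, ts in edits:
        offset = ts - first[user]
        if timedelta(days=30) <= offset < timedelta(days=60):
            retained.add(user)

    cohort = len(first)
    share = len(retained) / cohort if cohort else 0.0
    return len(retained), cohort, share
```

Editors whose first edit is less than 60 days before the end of the data have not had a full second window yet, so a real report would probably need to leave them out of the cohort.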

Task success

% of edits in the users’ first 30 days that are not reverted.
% of edits in users’ second 30 days that are not reverted.
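
The two task-success percentages could be sketched as follows. The `(user, timestamp, was_reverted)` layout and the function name are assumptions for illustration, and how "reverted" is detected is left to whatever revert-detection method is eventually chosen.

```python
from datetime import datetime, timedelta


def unreverted_share_by_period(edits, period_days=30, periods=2):
    """Share of non-reverted edits in each consecutive period after a user's first edit.

    edits: list of (user, timestamp, was_reverted) tuples (assumed layout).
    Returns one value per period; None where a period has no edits.
    """
    first = {}
    for user, ts, _ in edits:
        if user not in first or ts < first[user]:
            first[user] = ts

    totals = [0] * periods
    kept = [0] * periods
    for user, ts, reverted in edits:
        index = (ts - first[user]).days // period_days
        if index < periods:
            totals[index] += 1
            if not reverted:
                kept[index] += 1

    return [k / t if t else None for k, t in zip(kept, totals)]
```

With the defaults, the first element covers users' first 30 days and the second covers their second 30 days; edits after day 59 are ignored.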

These are just first proposals for general new user metrics. I’m sure there will be lots of discussion, and other metrics will need to be established that measure particular interventions we embark on.

And we didn’t talk about the other two elements of Google’s HEART system: “Happiness” and “Adoption.” The latter corresponds with recruitment, which we’ve agreed shouldn’t be our focus until a later point in this process.

Happiness is something that we probably should create performance indicators for, but these will likely be assessed by other means than site metrics. E.g., it would be quite useful to track users' answers over time to questions like: “When you try a new task on wiki, do you feel you know how to find out what the policies around that activity are?” But we can save that for another post. Thoughts?

Edited 19:38, 8 December 2017

Whatamidoing (WMF)

It sounds like you're trying to decide where the dividing line is between "new editors" and "not-new editors". Do you think that any of the existing definitions, such as m:Research:Metrics or m:Analytics/Metric definitions, would be suitable for that?

21:10, 9 December 2017

Alsee

Note that ACTRIAL is in progress on EnWiki until March 14th 2018. It may have a significant effect on relevant metrics. It may be a good idea to avoid any data period which spans that date.

It's also notable that during ACTRIAL, new users who wish to create a new article are funneled to make those initial edits in draft space. Metrics for "productive edits" usually exclude edits which are reverted, or page-deleted, within a few days. Edits in draft space are unlikely to be reverted, and unproductive new pages will typically be deleted after half a year rather than deleted in a few days. It would require an unreasonably long time frame for an automated system to try to distinguish productive draft edits from unproductive draft edits.

Edited 11:23, 10 December 2017

Pginer-WMF

@Alsee, thanks for pointing out that ACTRIAL can produce some turbulence in the data for English Wikipedia. In this case the main focus of the New Editor Experiences research is on mid-size wikis, so I'd not expect measurements and initiatives to focus on English Wikipedia. Many findings and solutions probably apply to English too, but it should not be the main or only focus.

In any case, I think the sooner we define and start monitoring the key metrics for the project, the easier it will be to establish solid baselines that account for different variability factors (seasonality, editing peaks due to campaigns/news, other community experiments, etc.).

11:20, 11 December 2017

Halfak (WMF)

OK so, I do not think there's a good distinction between retention metrics and how well newcomers move from "newness" to non-"newness". Retention can be a very fine-detail metric. E.g. what proportion of people came back and did *anything at all* in a second session on Wikipedia while logged in?

I think Alsee made a great point and that we'll need to be wary of the basic "productive edit" measure. I think that if we're not looking for a certain proportion, it'll be a useful metric to track -- especially in an experimental setting where everyone is equally directed to draft-space.

Generally I'd like to step back from defining metrics before we define what "success" is. It seems that Joe is trying to push back against the notion that success is more good edits or more retention. I'd like to explore this more. What's lacking in these operationalizations? E.g. if newcomers become non-newcomers but then leave anyway, is that success? Can you imagine an instance where a newcomer saving more good edits and being retained longer would be undesirable?

17:50, 11 December 2017

JMatazzoni (WMF)

Aaron asks: "if newcomers become non-newcomers but then leave anyway, is that success?" I think the answer is yes, it could be. But let me turn that question around: if a higher percentage of registrants than in the past make it (just to pick something hypothetically) past their 50th edit within a year, but we see no net increase in the ranks of active editorship, is that a failure for New Editors? I don't think so.

My point is this: at various points along any editor's journey, a variety of factors might cause the editor to drop out; interventions could be devised to reduce the risk at each of these junctures. But the New Editors research looked only at the issues that affect editors in their early days, and the interventions we've discussed are designed specifically to get users past these initial challenges. We don't know what interventions would help sustain editors over the long term, and we're not considering such projects.

In a Mars landing, if the rocket escapes Earth but the lander fails somehow, everyone is sad but no one blames the launch team, who met their objective. That's what I'm "pushing back" on: we should measure lots of relevant indicators, but I want us to pick goals that are appropriate to the scope of our activities.

Edited 02:05, 12 December 2017

JMatazzoni (WMF)

Actually, here's a better analogy than my Mars landing example. Pre-school programs improve grade-school success. It's likely that some higher proportion of those successful grade schoolers (who went to preschool) also go on to college. Which is great. But no one, in setting up a preschool program, would set the goal as more college matriculation. Not because they don't want it, but because too many things can happen between preschool and college that are outside the control of preschool administrators and teachers.

19:08, 13 December 2017

Jmorgan (WMF)

I understand what you're saying, and I believe Aaron does as well. I believe what @Halfak (WMF)'s question was getting at is that we already have good evidence of what kind of newcomer behaviors (at what point in their lifecycle, and for what duration) predict "wiki mastery", because they are the same ones that strongly predict their likelihood of retention over the long term. You don't become a wiki master without sticking around, and vice versa.

And we know you can predict long-term retention really well from relatively short-term activity measures. For example, if someone is still editing a week after they've started, they're very likely to be editing a year later. A month in, they're even more likely. We've been calling these "retention" metrics. You can call them whatever you want, but let's use existing metrics and thresholds as our baseline and starting point. You can and should add to these, but don't ignore or replace them.

23:05, 13 December 2017

JMatazzoni (WMF)

Thanks for clarifying, Jonathan. I'll try to clarify what I'm interested in as well, since I think I've touched on a number of ideas.

My main goal here, as I think about it, is tactical. I want to come up with a list of indicators that would help us measure new editor success. Then I want to write some Phabricator tickets to ensure that we are able to get those measurements easily—perhaps even on an automated "report card" like those that exist for various products and constituencies.

My thought is that we can do this AHEAD of starting work on any particular interventions, so that when we propose and then create those interventions, we'll know what type of influence we're trying to have. The way things typically go, by contrast, is that I'll propose that a project should "grow X," but I have no idea how big X is now, so I don't have any particular target. Getting to any destination is a lot harder, needless to say, when you have to make a map of the territory first.

You suggest "let's use existing metrics and thresholds as our baseline and starting point." Sounds good to me. So, what should we be looking at? What's your list of metrics that we should be gathering? Which metrics don't make sense in my list above?

00:16, 14 December 2017

Whatamidoing (WMF)

The Korean Wikipedia barely uses the draftspace (the complete list of draftspace pages amounts to 0.07% of articles), and none of the last 500 pages created by new editors used it.

The Czech wiki doesn't have a Draft: namespace.

It might be useful to say more bluntly what Pau appears to have tried to communicate earlier: This is not about the English Wikipedia. ACTRIAL, for example, can be ignored, because that is only happening at the English Wikipedia.

06:38, 13 December 2017

Jmorgan (WMF)

English Wikipedia is not the only Wikipedia that restricts new article creation by user class. Not sure if other wikis have Draft. Regardless, if we're building solutions to improve new editor experiences, we should be wary of creating solutions that won't work on some of our biggest wikis.

23:10, 13 December 2017

Jmorgan (WMF)

I think that it's helpful to use this HEART framework to brainstorm and prioritize metrics. I hadn't been aware of it; thanks for sharing!

I think you might want to look harder at "adoption" metrics in order to get at new users' level of wiki experience. I'm not talking about adoption of new features that you are testing, but rather adoption of features of the wiki that experienced editors use extensively but new editors don't use very much. One example is article talk pages (which you currently have as an "engagement" metric). Others might be creating a user page, using edit summaries and policy citations, or making edits to the Wikipedia namespace. "Adoption" here means that the new user has learned how to (inter)act on the wiki in a similar way to how veteran editors do it.

I see where you are going with your engagement metrics, but given the timescales you're using (months), these look more like retention metrics. I believe that "engagement" metrics make more sense on smaller timescales: for example, edits-per-editing-session, or length-of-edit-session.
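
To make the session-based measures concrete, here is a minimal sketch of edits-per-session and session length for one editor. It assumes a session ends after an inactivity gap; the one-hour cutoff, the function names, and the input (a plain list of edit timestamps) are illustrative assumptions, not an agreed definition.

```python
from datetime import datetime, timedelta


def split_sessions(timestamps, gap=timedelta(hours=1)):
    """Group one editor's edit timestamps into sessions.

    A new session starts whenever more than `gap` passes between edits.
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


def session_stats(timestamps, gap=timedelta(hours=1)):
    """Return (session_count, mean_edits_per_session, mean_session_length)."""
    sessions = split_sessions(timestamps, gap)
    if not sessions:
        return 0, 0.0, timedelta(0)
    edits_per_session = sum(len(s) for s in sessions) / len(sessions)
    total_length = sum((s[-1] - s[0] for s in sessions), timedelta(0))
    return len(sessions), edits_per_session, total_length / len(sessions)
```

A single-edit session has length zero under this definition, which pulls the mean down; a real metric might add a fixed allowance per session or report the median instead.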

Ultimately, I think that retention metrics make more sense if we're trying to show impact, especially if we're trying to measure the impact of specific interventions--for example, a new onboarding tutorial that is offered to a sample of newly-registered editors. Length of retention is the best proxy we currently have for "wiki mastery".

22:30, 11 December 2017

Mikemorrell49

Two suggestions:

I think the posited goal should be broader than "helping new editors progress from a state of wiki ignorance to some (definable) state of wiki experience". The current goal also implies that the Wikiproject working environment (information, learning, editing, communication, collaboration, tools) is pretty much OK. Incremental improvements are ongoing. The posited goal focuses on helping new editors (in better ways) to learn about - and to work in - this environment. IMHO it's just as likely that the characteristics of the working environment are a factor in editor retention. Various newspaper articles on 'the decline of wikipedia' mention this factor. In both the REBOOT report and the 2017-2018 Plan (Program 4) the goal is stated as: "Improve the new editor experience". This is a broader definition.
To track progress towards this broader goal, qualitative measurements of new editor experience (like survey data) are needed. For the quantitative measurements, too much focus on 'number of edits per time period' gives IMHO too narrow a view of what's really going on. I suggest including things like:

number of new pages proposed, accepted and rejected
number of existing article/media items edited (with types of edit, major/minor, etc)
number of article discussion pages edited
number of user talk pages edited
time spent logged in per month

Hope this helps! Mikemorrell49 (talk) 14:44, 12 December 2017 (UTC)

Edited 14:48, 12 December 2017

Mikemorrell49

One more thing ...

Google's HEART framework looks very applicable in this context. Very useful too! But I would avoid 'cutting corners' in using the framework. The framework is designed so that goals can be defined for each of the HEART components. Then 'signals' (indicators?) can be defined for each component that indicate the extent to which specific goals are being achieved. And lastly, these indicators are translated into more detailed 'metrics' that can be measured and together provide an update on the 'signal' (indicator).

At the moment, we have just one overarching goal and various metrics for 3 of the 5 HEART components.

I strongly suspect that the HEART components are sequentially interrelated. If I'm right, measuring just some of them independently will not tell us what we need to know. For example, a high level of 'Happiness' and 'Satisfaction' is a necessary condition for Engagement. A high level of Engagement is a necessary condition for Adoption, and so on. Again, if I'm right, the whole process of retention and task success starts off by ensuring that new editors are 'Happy' and 'Satisfied': with the Community, with their interaction with the Community, with their 'onboarding' and with their working environment. New Editors who become - on balance - 'unhappy' or dissatisfied with these are unlikely to become Engaged. The first finding in the REBOOT report was that new contributors sign up with different motivations and expectations. 'Happiness' and 'satisfaction' is not uniform but related to these individual motivations and expectations.

Metrics that together indicate the personal level of Happiness/Satisfaction, Engagement, Adoption, etc. are IMHO much more important than measuring 'number of edits'.

Many years ago I worked in Human Resource Management and Development. Topics like 'Motivation', 'Learning', 'Collaboration', and 'Having the right resources' were as hot then as they are now even if the technology has changed a lot since then. There are multiple theories and models about 'motivation' that all have a degree of merit. Personally, I found 'Herzberg's two-factor theory' to be one of the most useful in practice. He distinguished between factors that - if present - led to positive motivation and other factors that - if absent - led to dissatisfaction. Maybe this distinction is useful in surveying new contributors.

18:18, 12 December 2017

JMatazzoni (WMF)

Thanks for pointing me to Herzberg's two-factor theory @Mikemorrell49. It's a useful lens for looking at the various ideas we've entertained to improve conditions for new editors. I see now that most of our proposals are aimed at reducing dissatisfaction (by providing better information, more efficient tools, etc.), while only a few are designed to produce more satisfaction (by providing recognition, for example). Luckily, the work itself of creating knowledge is meaningful to many, and thus has high intrinsic value.

Edited 23:13, 2 January 2018

Whatamidoing (WMF)

Speaking of intrinsic satisfaction: High edit counts are trivial to measure, but they don't necessarily correlate with high satisfaction. Reverting spam or slapping "citation needed" templates on articles is not the most satisfying activity (for most people). But creating significant, satisfying content can often be done in a couple of edits.

Perhaps the list of "success" metrics should include measuring the volume of content added (ideally excluding reverts, as reverting page blanking gives spurious ideas of how much content you "added").
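
One way to sketch such a content-volume measure: sum the bytes each editor added, skipping any revision that is itself a revert (so undoing a page blanking does not count as "adding" the whole page) and any revision that was later reverted. The record layout `(user, size_delta, is_revert, was_reverted)` and the function name are hypothetical, and byte counts are only a rough proxy for content.

```python
def content_added(revisions):
    """Total bytes added per user, excluding reverts and reverted edits.

    revisions: iterable of (user, size_delta, is_revert, was_reverted)
    tuples (assumed layout). size_delta is the change in page size in bytes.
    Removals (negative deltas) are ignored rather than subtracted.
    """
    totals = {}
    for user, size_delta, is_revert, was_reverted in revisions:
        if is_revert or was_reverted or size_delta <= 0:
            continue
        totals[user] = totals.get(user, 0) + size_delta
    return totals
```

Users with no qualifying additions simply do not appear in the result.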

18:54, 3 January 2018