Contributors/Analysis

The Contributors Analysis team is the part of the WMF Contributors team that provides quantitative analysis and research to help guide product development. It's currently just Neil Quinn.

Our responsibilities

 * Do research to support the product development process
 * Maintain dashboards for the Contributors team
 ** Comparative statistics about the visual editor and wikitext editor
 ** Editor engagement dashboard (and other similar URLs; these are old and need to be migrated or pruned)
 * Maintain the Contributors Team's high-level metrics
 * Communicate about metrics and other statistics relevant to Contributors
 ** Update the Contributors metrics on the Wikimedia Audiences page each month
 ** Present at Contributors' quarterly reviews
 ** Present as needed at the metrics meeting

Work process
We track our work using the Contributors-Analysis project on Phabricator.

Requesting analysis
We mainly serve our masters in the Contributors Team, but we also try to help out other parts of the movement when we can. Either way, if you'd like to request editing-related data, analysis, or advice, please use the procedure below. Keep in mind our capacity is limited, so we may have to decline your request.
 * 1) Create a Phabricator task for your request using this link, which automatically tags it Contributors-Analysis and adds the headings for step 2.
 * 2) Give some information in the description:
 ** What's requested. If you know what you want, be specific! For example, don't just ask for "data about multilingual Wiktionary editors"; ask for "the number of contributors who edited more than one Wiktionary in the past month". If you have a question but don't know how it can be answered, say what you've already tried.
 ** Why it's requested. You don't have to write an essay, but give us enough context that we can interpret, adapt, and prioritize your request. For example, "the number of multilingual Wiktionary users will help us decide whether to give a developer a $10,000 grant to write a tool for them."
 ** When it's requested. If you have a deadline, explain what it is and what it's tied to. For example, "the Wiktionary tool developer needs to make summer plans, so we need this information by 15 May." If you don't have any particular deadline, just leave this blank.
 *** Don't just say "as soon as possible." If we drop everything we're doing and work all night, tomorrow is probably possible; is your request so urgent that we need to do that? :)
 ** Any other helpful information, like relevant documentation.
 * 3) Sit back, wait, and try to think about something other than that tantalizing data. We aim to at least explain whether we can fulfill a request within 2 workdays. If that period has passed, feel free to ping us in a comment on the task or by emailing Neil (nquinn@wikimedia.org).
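
To illustrate why a specific request is easier to fulfill, here is a minimal Python sketch of how "the number of contributors who edited more than one Wiktionary in the past month" maps directly onto a computation. It is only an illustration: the function name, the (user, wiki) row format, and the sample data are hypothetical and are not taken from any actual Contributors Analysis tooling or database schema.

```python
from collections import defaultdict


def count_multi_wiki_editors(edits, min_wikis=2):
    """Count users who edited at least `min_wikis` distinct wikis.

    `edits` is an iterable of (user, wiki) pairs, one pair per edit,
    already restricted to the time period of interest (an assumption
    of this sketch).
    """
    wikis_by_user = defaultdict(set)
    for user, wiki in edits:
        wikis_by_user[user].add(wiki)
    return sum(1 for wikis in wikis_by_user.values() if len(wikis) >= min_wikis)


# Hypothetical sample rows standing in for one month of Wiktionary edits.
sample_edits = [
    ("Alice", "enwiktionary"),
    ("Alice", "frwiktionary"),
    ("Bob", "enwiktionary"),
    ("Bob", "enwiktionary"),
    ("Carol", "dewiktionary"),
    ("Carol", "enwiktionary"),
    ("Carol", "frwiktionary"),
]

print(count_multi_wiki_editors(sample_edits))
```

A vague request such as "data about multilingual Wiktionary editors" leaves open which wikis, which time period, and which threshold to use; the specific wording settles each of those choices.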

Methodologies
We have a variety of methodologies available to us:
 * Quantitative research using our large suite of environmental data
 * Experimental research (e.g. A/B testing of new products)
 * Survey research
 ** Large-scale surveys of existing community members
 ** Quick surveys of existing editors or readers
 ** Surveys of potential community members (e.g. advertising on Facebook, asking questions of our Facebook followers)
 * User research (small-scale but deep qualitative interview research)
 * Reviews of existing peer-reviewed literature
 * SWAGs

Important questions

 * We know mobile users edit a significant amount (producing about 4–5% of non-bot edits as of 2016). What do those edits do? How does their quality compare to desktop edits? How much content do they actually contribute?
 * What caused the dramatic drop in existing active editors from May to November 2013?
 * What caused the big jump in non-(identified-)bot edits from June to July 2016?
 * What caused the big drop in new active editors from June to July 2016?
 * What are the seasonal trends in our major metrics? Does the accepted wisdom of the main trend being the Sep–May academic year hold up?

Metrics

 * Documentation of our metrics
 * meta:Research:Editor month dataset, which we currently use to calculate our metrics
 * meta:Research:Editor model
 * meta:Research:Metrics standardization (outdated but still useful)
 * meta:Statistics (this is meant to list all the different dashboards and metric sources available. It's an unholy mess, but there are some good links in there).

MediaWiki application databases

 * mw:Manual:Database layout

EventLogging

 * wikitech:Analytics/EventLogging
 * Schemas used by Contributors
 ** ServerSideAccountCreation
 ** Edit
 ** MobileWikiAppEdit
 ** ContentTranslation
 ** ContentTranslationCTA
 ** ContentTranslationError
 ** Echo
 ** more listed at Research:Schema

Hadoop
Currently we don't use Hadoop much because it doesn't have a lot of the editing data we need, but that may change in the future.
 * Analytics cluster

Other

 * ORES

Dashboards

 * Dashiki, the dashboarding framework developed by Analytics Engineering
 * Visual and wikitext editing comparison
 * Multimedia health
 * Editor engagement dashboards (outdated, needs to be purged and then consolidated)

Research

 * meta:Research:The Rise and Decline
 * meta:Research:VisualEditor's effect on newly registered editors
 * Use of notifications (February 2016)

Infrastructure

 * wikitech:MariaDB
 * wikitech:Analytics/Data access

Other researchers at the WMF

 * meta:Research:FAQ
 * Analytics team
 * Research department
 * Discovery Analysis team
 * Operations manual
 * Their analyst onboarding procedures

Useful programs

 * MediaWiki analysis utilities
 * Multiquery
 * Sequel Pro