Community metrics

How is the MediaWiki community doing? Let's analyze the data available in order to see the trends in contributions, membership, newcomers...


 * /October 2012

DISCLAIMER: "MediaWiki" here refers to any technical activity (development, testing, sysadmin, documentation...) under the hood of mediawiki.org and any Wikimedia project.

Problems we want to solve
We lack objective data to see how the MediaWiki community is doing: the best and worst examples, and the areas requiring our attention, promotion and extra care.

Questions that need an answer in order to plan better community processes, events and outreach activities:


 * What is the productivity of our community? Are we getting more or less contributions? How is the workflow doing in terms of review time, patches accepted / declined / forgotten?
 * What are the areas with the most activity? Which areas are most attractive to newcomers? Does this match community priorities?
 * What is the size of our community? Are we expanding, stalled or shrinking in terms of number of contributors?
 * What is the age of our community? Are we getting fresh contributors? Are we keeping the old ones around and active?
 * Where is the community located? Where are the hot spots for organizing tech activities and collaborating with Wikimedia chapters?
 * Is the current meritocratic structure efficient? What is the weight of the WMF vs the rest of the contributors? Can newcomers and non-affiliated contributors thrive?

Tactics
Proposed:


 * 1) Dream and document.
 * 2) Prioritize based on feasibility and urgency.
 * 3) Set up a first report that is refreshed automatically and grow from there.

Contributors
There is no perfect measure for the MediaWiki community. To get a first prototype, we will start by considering Gerrit users.


 * All users.
 * Contributors with Gerrit account.
 * [[Image:Attention niels epting.svg|18px]] How to extract this from Gerrit?
 * Core developers with merge permissions.
 * [[Image:Attention niels epting.svg|18px]] How to extract this from Gerrit?
 * Active in the past week / month / year.
 * [[Image:Attention niels epting.svg|18px]] How to extract this from Gerrit?
 * Using the script sent to wikitech-l a few months ago!
 * WMF employees, other MediaWiki professionals, hobbyists.
 * [[Image:Attention niels epting.svg|18px]] We seem to be processing this data.
 * New accounts.
 * How many requests (approved, declined?) per week / month / year.
 * [[Image:Attention niels epting.svg|18px]] Are we processing this data? The approved can be retrieved from Gerrit. Is the declined relevant?
 * Primary motivation: new or existing project - which projects.
 * [[Image:Attention niels epting.svg|18px]] We are not processing this data.
 * WMF employees, other MediaWiki professionals, hobbyists.
 * [[Image:Attention niels epting.svg|18px]] We seem to be processing this data.
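A minimal sketch of how the "active in the past week / month / year" question could be answered once change data has been exported from Gerrit. It assumes the export comes from the `gerrit query --format=JSON` command, with one JSON object per line, an `owner.email` field, a `createdOn` timestamp in epoch seconds and a trailing stats row. The sample records and email addresses are invented for illustration, and this is not the script sent to wikitech-l.

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical sample of exported change data: one JSON object per line,
# followed by a stats row. Field names are an assumption about the export.
SAMPLE = """\
{"project": "mediawiki/core", "owner": {"email": "alice@example.org"}, "status": "MERGED", "createdOn": 1351728000}
{"project": "mediawiki/extensions/Foo", "owner": {"email": "bob@example.org"}, "status": "NEW", "createdOn": 1350259200}
{"project": "mediawiki/core", "owner": {"email": "carol@example.org"}, "status": "MERGED", "createdOn": 1320105600}
{"type": "stats", "rowCount": 3}
"""


def parse_changes(text):
    """Return change records, skipping blank lines and the stats row."""
    changes = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if record.get("type") == "stats":
            continue
        changes.append(record)
    return changes


def active_owners(changes, now, days):
    """Return the owner emails with a change created in the last `days` days."""
    cutoff = (now - timedelta(days=days)).timestamp()
    return {c["owner"]["email"] for c in changes if c["createdOn"] >= cutoff}


changes = parse_changes(SAMPLE)
now = datetime(2012, 11, 6, tzinfo=timezone.utc)  # fixed date so the result is reproducible
for label, days in (("week", 7), ("month", 30), ("year", 365)):
    print(label, len(active_owners(changes, now, days)))
```

Counting change owners only captures people who submit patches. Reviewers and people with merge permissions would need other fields or other queries.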

Software projects

 * Projects in Gerrit
 * Types of project: MediaWiki core, extensions, mobile, infrastructure...
 * [[Image:Attention niels epting.svg|18px]] How to extract this from Gerrit?
 * Active in the past week / month / year.
 * [[Image:Attention niels epting.svg|18px]] How to extract this from Gerrit?
 * Officially supported.
 * [[Image:Attention niels epting.svg|18px]] We are not processing this data.
 * Considered stable, beta, experimental.
 * [[Image:Attention niels epting.svg|18px]] We are not processing this data.
 * Data per project:
 * Patches (merged, rejected, waiting) and reviews.
 * [[Image:Attention niels epting.svg|18px]] How to extract this from Gerrit?
 * Response time for patches submitted, see Signpost investigation: code review times.
 * Especially interesting to check review wait times for extensions (where many newcomers start) compared with core features maintained by WMF employees.
 * [[Image:Attention niels epting.svg|18px]] How to extract this from Gerrit?
 * Committers and reviewers.
 * [[Image:Attention niels epting.svg|18px]] How to extract this from Gerrit?
 * WMF employees, other MediaWiki professionals, hobbyists.
 * [[Image:Attention niels epting.svg|18px]] How to extract this from Gerrit?
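The per-project data listed above could be sketched as follows: patch counts by status, plus a rough review-time figure. The record layout (`project`, `status`, `createdOn`, `lastUpdated`) is an assumption about what a Gerrit export contains, the sample values are invented, and `lastUpdated` is used only as a proxy for the merge time, so the numbers are an approximation.

```python
from collections import Counter, defaultdict
from statistics import median

DAY = 86400  # seconds

# Hypothetical change records; timestamps are epoch seconds.
CHANGES = [
    {"project": "mediawiki/core", "status": "MERGED", "createdOn": 0, "lastUpdated": 1 * DAY},
    {"project": "mediawiki/core", "status": "MERGED", "createdOn": 0, "lastUpdated": 3 * DAY},
    {"project": "mediawiki/core", "status": "NEW", "createdOn": 0, "lastUpdated": 2 * DAY},
    {"project": "mediawiki/extensions/Foo", "status": "MERGED", "createdOn": 0, "lastUpdated": 10 * DAY},
    {"project": "mediawiki/extensions/Foo", "status": "ABANDONED", "createdOn": 0, "lastUpdated": 5 * DAY},
]


def status_counts(changes):
    """Count changes per project and per status (merged, abandoned, still open)."""
    counts = defaultdict(Counter)
    for change in changes:
        counts[change["project"]][change["status"]] += 1
    return counts


def median_days_to_merge(changes):
    """Median days between creation and last update, for merged changes only."""
    durations = defaultdict(list)
    for change in changes:
        if change["status"] == "MERGED":
            elapsed = change["lastUpdated"] - change["createdOn"]
            durations[change["project"]].append(elapsed / DAY)
    return {project: median(days) for project, days in durations.items()}


for project, counter in status_counts(CHANGES).items():
    print(project, dict(counter), median_days_to_merge(CHANGES).get(project))
```

Comparing the median for extensions with the median for core is one way to approach the review wait time question, although a median hides long-forgotten patches, which may deserve a separate count.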

Suggest more
What else do we want to know? Let's agree on the answers without being conditioned by existing data or tools. Then we will see what can be reasonably done.


 * Projects activity
 * Most active: continuous contributions, a diversity of contributors, newcomers...
 * Quality: open bugs, response to issues, user satisfaction.
 * Collaboration channels
 * Which channels are being used for technical collaboration.
 * Population: ins, outs, active, idle.
 * Participation: volume, signal, noise.
 * Contributors
 * Who are we? What skills are we contributing? Where are we based? How long have we been around?
 * Most active, productive, committed, responsive.
 * Newcomers: influx, popular motivations and destinations.
 * Meritocracy: who has extra permissions, responsibilities, reputation.
 * Countries where they work from.
 * [[Image:Attention niels epting.svg|18px]] Can this data be retrieved from the Gerrit web server? Is it ok to do it?
 * This is not logged and, even if it were, the privacy policy would prevent it from being made available.

Own infra
Several of the tools we use produce data. Sometimes the data is processed and ready to be consumed; often it is raw.


 * Gerrit for code contribution and review.
 * cmd-query for Gerrit.
 * Bash script from Mark Traceur irc link (number of Gerrit committers)
 * Bugzilla for bug and task handling.
 * Mailman for mailing list activity.
 * IRC channels for chat activity.
 * MediaWiki instances for doc editing.
 * Events, online & offline.

3rd parties
MediaWiki technical activity can also be found and measured out there.


 * GitHub (many projects?)
 * Python script from Mark Traceur irc link
 * Ohloh (many projects)
 * Missing projects/repositories as of 13:32, 6 November 2012 (UTC):
 * 33 extensions recently migrated from SVN,
 * apps*, integration*, labs*
 * all operations* except operations/puppet
 * all wikimedia* except wikimedia/fundraising*
 * less important (?) ones like qa, search
 * Twitter and other microblogging handles, e.g. Wikimedia Tech Staff.
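The Ohloh coverage gaps listed above are expressed as name patterns with exceptions, which can be turned into a small check to rerun whenever the list of repositories changes. This is only a sketch: the patterns are copied from the list as of 6 November 2012, the 33 migrated extensions are left out because they are not named here, and the repository names in the usage lines are hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns reported as missing from Ohloh (snapshot from the list above).
MISSING_PATTERNS = ("apps*", "integration*", "labs*", "operations*", "wikimedia*", "qa", "search")

# Exceptions from the same list: these are covered even though a broader pattern matches.
COVERED_PATTERNS = ("operations/puppet", "wikimedia/fundraising*")


def is_missing_from_ohloh(repo):
    """Return True if the repository name falls in a gap described by the patterns."""
    if any(fnmatchcase(repo, pattern) for pattern in COVERED_PATTERNS):
        return False
    return any(fnmatchcase(repo, pattern) for pattern in MISSING_PATTERNS)


# Hypothetical repository names, used only to exercise the rules.
for repo in ("operations/puppet", "operations/example", "wikimedia/fundraising/example", "labs/example"):
    print(repo, is_missing_from_ohloh(repo))
```

Checking the exceptions first keeps the "all X except Y" rules from the list readable as two plain tuples.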

Tools to analyze and report data
Free software is a requirement.


 * MediaWiki Gerrit stats.
 * How to query Gerrit data.
 * Pentaho community edition - see the Pentaho page at Wikitech.
 * Metrics Grimoire.

Team
Who is working on this.


 * Quim Gil volunteers to push this task forward.

Also wondering whether the Analytics team wants to be involved, or should at least be made aware.