Community metrics

Measuring the MediaWiki community
How is the MediaWiki community doing? Let's analyze the available data to see the trends in contributions, membership, newcomers...

DISCLAIMER: "MediaWiki" here covers any technical activity (development, testing, sysadmin, documentation...) under the umbrella of mediawiki.org and any Wikimedia project.

Tactics
Proposed:


 * 1) Dream and document.
 * 2) Prioritize based on feasibility and urgency.
 * 3) Set up a first report that is refreshed automatically and grow from there.

Contributors
There is no perfect measure for the MediaWiki community. Just for the sake of having a first prototype, we will start by considering Gerrit users.


 * All users.
 * Contributors with Gerrit account.
 * [[Image:Attention niels epting.svg|18px]] How to extract this from Gerrit?
 * Core developers with merge permissions.
 * [[Image:Attention niels epting.svg|18px]] How to extract this from Gerrit?
 * Active in the past week / month / year.
 * [[Image:Attention niels epting.svg|18px]] How to extract this from Gerrit?
 * WMF employees, other MediaWiki professionals, hobbyists.
 * [[Image:Attention niels epting.svg|18px]] We are not processing this data.
 * Countries where they work from.
 * [[Image:Attention niels epting.svg|18px]] Can this data be retrieved from the Gerrit web server? Is it ok to do it?
 * New accounts.
 * How many requests (approved, declined?) per week / month / year.
 * [[Image:Attention niels epting.svg|18px]] Are we processing this data? Approved requests can be retrieved from Gerrit. Are declined requests relevant?
 * Primary motivation: new or existing project - which projects.
 * [[Image:Attention niels epting.svg|18px]] We are not processing this data.
 * WMF employees, other MediaWiki professionals, hobbyists.
 * [[Image:Attention niels epting.svg|18px]] We are not processing this data.
 * Countries where they work from.
 * [[Image:Attention niels epting.svg|18px]] Can this data be retrieved from the Gerrit web server? Is it ok to do it?
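
One possible answer to the "How to extract this from Gerrit?" questions above, as a minimal sketch only. It assumes the Gerrit instance exposes the REST endpoint /changes/, whose JSON responses start with the anti-XSSI prefix )]}' and list changes with "owner", "status", "project" and "updated" fields. The SAMPLE payload, the account ids and the function names are made up for illustration; no request is sent to a real server. It counts change owners only, so reviewers and other kinds of contributors are not covered.

```python
import json
from datetime import datetime, timedelta

# Gerrit prepends this line to JSON responses to prevent XSSI attacks.
XSSI_PREFIX = ")]}'"

# Hypothetical response body; a real one would come from a query such as
# GET /changes/?q=project:mediawiki/core (not performed here).
SAMPLE = """)]}'
[
 {"project": "mediawiki/core", "status": "MERGED",
  "owner": {"_account_id": 1}, "updated": "2012-09-28 10:00:00.000000000"},
 {"project": "mediawiki/core", "status": "NEW",
  "owner": {"_account_id": 2}, "updated": "2012-09-05 09:30:00.000000000"},
 {"project": "mediawiki/extensions/Example", "status": "ABANDONED",
  "owner": {"_account_id": 1}, "updated": "2011-12-01 08:00:00.000000000"}
]"""


def parse_gerrit_json(body):
    """Strip the anti-XSSI prefix, then decode the JSON payload."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def parse_timestamp(value):
    """Parse a Gerrit UTC timestamp, dropping the fractional seconds."""
    return datetime.strptime(value.split(".")[0], "%Y-%m-%d %H:%M:%S")


def active_owners(changes, now, days):
    """Return the account ids owning a change updated in the last `days` days."""
    cutoff = now - timedelta(days=days)
    return {
        change["owner"]["_account_id"]
        for change in changes
        if parse_timestamp(change["updated"]) >= cutoff
    }


changes = parse_gerrit_json(SAMPLE)
now = datetime(2012, 10, 1)  # fixed reference date so the sketch is repeatable
for label, days in (("week", 7), ("month", 30), ("year", 365)):
    print(label, len(active_owners(changes, now, days)))
```

A scheduled job could run the same counting over fresh query results to feed the automatically refreshed report mentioned under Tactics.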

Software projects

 * Projects in Gerrit
 * Types of project: MediaWiki core, extensions, mobile, infrastructure...
 * [[Image:Attention niels epting.svg|18px]] We are not processing this data.
 * Active in the past week / month / year.
 * [[Image:Attention niels epting.svg|18px]] How to extract this from Gerrit?
 * Officially supported.
 * [[Image:Attention niels epting.svg|18px]] We are not processing this data.
 * Considered stable, beta, experimental.
 * [[Image:Attention niels epting.svg|18px]] We are not processing this data.
 * Data per project:
 * Commits (merged, rejected, waiting) and reviews.
 * [[Image:Attention niels epting.svg|18px]] How to extract this from Gerrit?
 * Committers and reviewers.
 * [[Image:Attention niels epting.svg|18px]] How to extract this from Gerrit?
 * WMF employees, other MediaWiki professionals, hobbyists.
 * [[Image:Attention niels epting.svg|18px]] We are not processing this data.
 * Countries where they work from.
 * [[Image:Attention niels epting.svg|18px]] Can this data be retrieved from the Gerrit web server? Is it ok to do it?
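
The per-project data listed above could be tallied along these lines. This is a sketch under assumptions, not an agreed method: it takes change records shaped like the ones Gerrit's REST API returns ("project", "status", "owner"), and it assumes that "merged", "rejected" and "waiting" correspond to the Gerrit statuses MERGED, ABANDONED and NEW. The sample records and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Assumed mapping from Gerrit change statuses to the labels used on this page.
STATUS_LABELS = {"MERGED": "merged", "ABANDONED": "rejected", "NEW": "waiting"}

# Hypothetical change records, already decoded from JSON.
SAMPLE_CHANGES = [
    {"project": "mediawiki/core", "status": "MERGED", "owner": {"_account_id": 1}},
    {"project": "mediawiki/core", "status": "MERGED", "owner": {"_account_id": 2}},
    {"project": "mediawiki/core", "status": "NEW", "owner": {"_account_id": 2}},
    {"project": "mediawiki/extensions/Example", "status": "ABANDONED",
     "owner": {"_account_id": 3}},
]


def changes_per_project(changes):
    """Count changes per project, split into merged / rejected / waiting."""
    table = defaultdict(Counter)
    for change in changes:
        label = STATUS_LABELS.get(change["status"], "other")
        table[change["project"]][label] += 1
    return table


def committers_per_project(changes):
    """Collect the distinct change owners seen in each project."""
    owners = defaultdict(set)
    for change in changes:
        owners[change["project"]].add(change["owner"]["_account_id"])
    return owners


counts = changes_per_project(SAMPLE_CHANGES)
committers = committers_per_project(SAMPLE_CHANGES)
for project in sorted(counts):
    print(project, dict(counts[project]), len(committers[project]), "committers")
```

Reviewers are left out on purpose: identifying them would need the review messages or labels of each change, which this sketch does not assume are available.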

Suggest more
What else do we want to know? Let's agree on the answers without being conditioned by existing data or tools. Then we will see what can be reasonably done.


 * Projects activity
 * Most active: continuous contributions, a diversity of contributors, newcomers...
 * Quality: open bugs, response to issues, user satisfaction.
 * Response time, see Signpost investigation: code review times.
 * Collaboration channels
 * Which channels are being used for technical collaboration.
 * Population: ins, outs, active, idle.
 * Participation: volume, signal, noise.
 * Contributors
 * Who are we? What skills are we contributing? Where are we based? How long have we been around?
 * Most active, productive, committed, responsive.
 * Newcomers: influx, popular motivations and destinations.
 * Meritocracy: who has extra permissions, responsibilities, reputation.
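
As an illustration of the response time idea above, code review time could be approximated as the delay between a change being created and being merged. The sketch below assumes each change record carries "created" and "submitted" timestamps in Gerrit's UTC format, with "submitted" present only for merged changes; the sample data and function names are hypothetical, and other definitions of response time (for example, time to first review) are equally valid.

```python
from datetime import datetime
from statistics import median

# Hypothetical change records; timestamps follow Gerrit's UTC format.
SAMPLE_CHANGES = [
    {"status": "MERGED", "created": "2012-09-01 00:00:00.000000000",
     "submitted": "2012-09-02 00:00:00.000000000"},
    {"status": "MERGED", "created": "2012-09-01 00:00:00.000000000",
     "submitted": "2012-09-04 00:00:00.000000000"},
    {"status": "MERGED", "created": "2012-09-01 00:00:00.000000000",
     "submitted": "2012-09-11 00:00:00.000000000"},
    {"status": "NEW", "created": "2012-09-20 00:00:00.000000000"},
]


def parse_timestamp(value):
    """Parse a Gerrit UTC timestamp, dropping the fractional seconds."""
    return datetime.strptime(value.split(".")[0], "%Y-%m-%d %H:%M:%S")


def days_to_merge(change):
    """Days elapsed between creation and merge of a single change."""
    delta = parse_timestamp(change["submitted"]) - parse_timestamp(change["created"])
    return delta.total_seconds() / 86400


def median_days_to_merge(changes):
    """Median days to merge over merged changes, or None if there are none."""
    durations = [
        days_to_merge(change)
        for change in changes
        if change.get("status") == "MERGED" and "submitted" in change
    ]
    return median(durations) if durations else None


print(median_days_to_merge(SAMPLE_CHANGES))
```

The median is used rather than the mean so that a few changes left waiting for a long time do not dominate the figure.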

Own infra
Several of the tools we use produce data. Sometimes the data is processed and ready to be consumed, but often it is raw.


 * Gerrit for code contribution and review.
 * Bugzilla for bug and task handling.
 * Mailman for mailing list activity.
 * IRC channels for chat activity.
 * MediaWiki instances for doc editing.
 * Events, online & offline.

3rd parties
MediaWiki technical activity can also be found and measured on third-party sites.


 * GitHub (many projects?)
 * Ohloh (many projects)
 * Twitter - Wikimedia Tech Staff.

Tools to analyze and report data
Free software is a requirement.


 * MediaWiki Gerrit stats.
 * Pentaho community edition - see the Pentaho page at Wikitech.
 * Metrics Grimoire.

Team
Who is working on this.


 * Quim Gil volunteers to push this task forward.

Also wondering whether the Analytics team wants to be, or should be, involved or at least kept aware.