Community metrics

How is the MediaWiki / Wikimedia tech community doing? Let's analyze the data available in order to highlight the contributors and areas setting an example, and also the bottlenecks or inactive corners requiring our attention.

Your feedback and requests are welcome at the discussion page and at the Analytics mailing list.

Reports
We aim to publish reports interpreting the data obtained on a quarterly basis. Below you can find the initial reports published, based on data retrieved manually.
 * /2013-Q1/ - we are starting to publish the reports on a quarterly basis.
 * /December 2012/
 * /November 2012/
 * /October 2012/

Metrics dashboard
Under development since June 2013, currently at the provisional URL below. The final location will be at Wikitech Labs in a few days, and then we will announce it officially to the community.
 * http://bitergia.com/projects/mediawiki-dashboard/browser/

Updated daily, this dashboard provides data about our Git repositories, Bugzilla and mailing lists. Gerrit and IRC are planned and coming soon. The sections below detail which sources are being scanned.

We are also polishing the data (finding duplicates, assigning contributors to the WMF and other organizations...). If you see any mistake or room for improvement, please report it.

Powered by Open Source projects Metrics Grimoire and Viz Grimoire. See also the development specific to this dashboard in GitHub. Bugs, enhancement requests and patches for these projects must be submitted directly upstream.

Git
 * The source code repos analyzed are mediawiki/core and all the MediaWiki extensions, which can be listed with:
 ssh -p 29418 gerrit.wikimedia.org gerrit ls-projects | grep "mediawiki/extensions"
 * FIXME: This is only a portion (a big one, yes) of all the repositories we need to scan. The default is everything at gerrit.wikimedia.org but let's look at every repo before adding it just in case.

gerrit.wikimedia.org

 * FIXME - Soon-ish.

bugzilla.wikimedia.org

 * The products analyzed are MediaWiki and MediaWiki extensions
 * FIXME - The time-to-fix graph goes through the roof! (funny, isn't it?) :)
 * FIXME - Fine tune By repository.

mediawiki.org

 * FIXME - What is the plan with MediaWiki metrics?

lists.wikimedia.org

 * FIXME - mailing lists missing: mediawiki-l, ee, qa... more?
 * FIXME - Is it possible to specify the number of subscribers?

IRC

 * It will be incorporated into the analysis before the end of August.

Contributors
The process of merging user identities from different data sources has three steps:
 * unifypeople.py analyzes people in SCM (Git), trying to join identities based on the email and the name
 * its2identities.py does the same for ITS identities (Bugzilla)
 * mls2identities.py does the same for MLS identities (Mailman)
At the end there is a common upeople (unique people) table, and each data source maps its own people table to it. Gerrit (SCR) and IRC are not yet supported.
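
The idea of joining identities by email and name can be illustrated with a minimal sketch. This is not the code of unifypeople.py or the other scripts; the function name unify_people, the tuple layout and the matching rules (same normalized email, or same normalized full name) are assumptions made only for illustration.

```python
def unify_people(identities):
    """Map each identity to a unique-person id (illustrative only).

    `identities` is a list of (source, name, email) tuples. Two identities
    are merged when they share a normalized email or a normalized full
    name. Returns a dict: identity index -> id of its unique person.
    """
    parent = list(range(len(identities)))

    def find(i):
        # Follow parent links to the group representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        # Merge two groups, keeping the smallest index as representative.
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    seen = {}  # matching key -> index of the first identity that had it
    for idx, (_source, name, email) in enumerate(identities):
        keys = []
        if email:
            keys.append(("email", email.strip().lower()))
        # Assumption: only multi-word names are distinctive enough to match.
        if name and " " in name.strip():
            keys.append(("name", " ".join(name.lower().split())))
        for key in keys:
            if key in seen:
                union(idx, seen[key])
            else:
                seen[key] = idx

    return {idx: find(idx) for idx in range(len(identities))}
```

With made-up identities from SCM, ITS and MLS, entries that share an email or a full name end up under the same unique person, while unrelated ones stay separate. A real implementation would need more careful heuristics, since name matching alone can merge different people.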

The user pages for top contributors are linked from the top tables in the metrics browser. For example, for SCM the third global committer has his own personal page.

Once the unique people table exists, other categories are built on top of it. For example, the classification by company is initially done with a script that uses email domains where available. The classification supports time periods, to cover the case of a unique person who has worked for several companies. There is also some experimental support for countries.
 * FIXME - Contributors must be linked to WMF and other orgs.
 * FIXME - Is By country relevant? Do we want to gather that data?
 * FIXME - Plan for linking this data to user profiles? Where?
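
As a rough illustration of the company classification described above, the sketch below guesses an organization from an email domain and looks up an affiliation for a given date. The table DOMAIN_TO_ORG, the function names and the (org, start, end) layout are hypothetical and are not taken from the dashboard's scripts.

```python
from datetime import date

# Hypothetical domain table; real mappings would need manual curation.
DOMAIN_TO_ORG = {
    "wikimedia.org": "WMF",
    "example.com": "Example Corp",
}


def initial_affiliation(email):
    """Guess an organization from the email domain, or None if unknown."""
    if not email or "@" not in email:
        return None
    domain = email.rsplit("@", 1)[1].lower()
    return DOMAIN_TO_ORG.get(domain)


def org_on(affiliations, day):
    """Return the organization a unique person belonged to on `day`.

    `affiliations` is a list of (org, start, end) tuples, where `end`
    may be None for a period that is still open.
    """
    for org, start, end in affiliations:
        if start <= day and (end is None or day <= end):
            return org
    return None


# Usage with made-up data: one person, two consecutive employers.
history = [
    ("Example Corp", date(2010, 1, 1), date(2012, 6, 30)),
    ("WMF", date(2012, 7, 1), None),
]
print(org_on(history, date(2011, 3, 1)))
print(org_on(history, date(2013, 1, 15)))
```

Keeping the periods as explicit date ranges lets a contribution be attributed to the organization the person worked for at the time, rather than to their current one.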

Other data sources and tools
Git
 * Wikimedia stats in Ohloh including many projects.
 * "How many unique contributors submitted unique pull requests to a https://github.com/wikimedia/ repo" - Python script by marktraceur.

Gerrit
 * MediaWiki Gerrit stats (is it working? 2013-06-28) and how to query Gerrit data.
 * Number of gerrit committers (marktraceur's bash script)
 * cmd-query for Gerrit.

Bugzilla
 * Bugzilla Weekly Report.

mediawiki.org
 * Monthly statistics of page views and how the data is gathered.

Mailman
 * Wikimedia Mail Stats: PowerPosters.

Problems we want to solve
We lack objective data to see how the MediaWiki community is doing: the best and worst examples, and the areas requiring our attention, promotion and extra care.

Questions that need an answer in order to plan better community processes, events and outreach activities:


 * What is the productivity of our community? Are we getting more or less contributions? How is the workflow doing in terms of review time, patches accepted / declined / forgotten?
 * What are the areas with the most activity? What are the areas most attractive to newcomers? Does this match community priorities?
 * What is the size of our community? Are we expanding, stalled or shrinking in terms of number of contributors?
 * What is the age of our community? Are we getting fresh contributors? Are we keeping the old ones around and active?
 * Where is the community located? Where are the hot spots for organizing tech activities and collaborating with Wikimedia chapters?
 * Is the current meritocratic structure efficient? What is the weight of the WMF vs the rest of the contributors? Can newcomers and non-affiliated contributors thrive?

Team
Quim Gil from the Wikimedia Engineering Community team is coordinating the Metrics Dashboard project, which is being implemented by Bitergia as contractors.

The Bitergia team working on the MediaWiki dashboard consists of Daniel Izquierdo, Luis Cañas and Jesus Gonzalez Barahona, with Alvaro del Castillo as project manager.

The ownership of this project will transition to the Wikimedia Analytics team during 2013-14.