Community metrics

How is the MediaWiki / Wikimedia tech community doing? Let's analyze the data available in order to highlight the contributors and areas setting an example, and also the bottlenecks or inactive corners requiring our attention.

Your feedback and requests are welcome at the discussion page and at the Analytics mailing list.

Metrics dashboard
Under development since June 2013. The URL below is provisional: the final location will be at Wikitech Labs in a few days, and then we will announce it officially to the community.
 * http://bitergia.com/projects/mediawiki-dashboard/browser/

Updated daily, this dashboard provides data about our Git repositories, Bugzilla and mailing lists. Gerrit and IRC are planned and coming soon. Below you can find details about which sources are being scanned.

We are also polishing the data (finding duplicates, assigning contributors to the WMF and other organizations...). If you see any mistake or possibility of improvement, please report it.

Reports
We aim to publish quarterly reports interpreting the data obtained. Below you can find the initial reports, which were based on data retrieved manually.
 * /2013-Q1/ - we are starting to publish the reports on a quarterly basis.
 * /December 2012/
 * /November 2012/
 * /October 2012/

Problems we want to solve
We lack objective data to see how the MediaWiki community is doing: the best and worst examples, and the areas requiring our attention, promotion and extra care.

Questions that need an answer in order to plan better community processes, events and outreach activities:


 * What is the productivity of our community? Are we getting more or less contributions? How is the workflow doing in terms of review time, patches accepted / declined / forgotten?
 * What are the areas with the most activity? What are the areas most attractive to newcomers? Does this match community priorities?
 * What is the size of our community? Are we expanding, stalled or shrinking in terms of number of contributors?
 * What is the age of our community? Are we getting fresh contributors? Are we keeping the old ones around and active?
 * Where is the community located? Where are the hot spots for organizing tech activities and collaborating with Wikimedia chapters?
 * Is the current meritocratic structure efficient? What is the weight of the WMF vs the rest of the contributors? Can newcomers and non-affiliated contributors thrive?
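
To make the workflow question above more concrete, here is a minimal sketch of how review time and accepted / declined / forgotten counts could be computed once the change data is available. The record fields, the sample values, the mapping of "declined" to ABANDONED changes and the 90-day threshold for "forgotten" are all assumptions made for illustration, not an agreed definition.

```python
from datetime import datetime
from statistics import median

# Hypothetical sample records; a real run would load these from Gerrit.
changes = [
    {"status": "MERGED", "created": "2013-05-01", "closed": "2013-05-03"},
    {"status": "MERGED", "created": "2013-05-02", "closed": "2013-05-10"},
    {"status": "ABANDONED", "created": "2013-05-04", "closed": "2013-05-05"},
    {"status": "NEW", "created": "2013-01-15", "closed": None},
    {"status": "NEW", "created": "2013-06-01", "closed": None},
]


def days_between(start, end):
    """Whole days between two YYYY-MM-DD dates."""
    fmt = "%Y-%m-%d"
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).days


def workflow_summary(changes, today, forgotten_after_days=90):
    """Count accepted, declined and forgotten patches, plus median review time.

    Assumptions: "declined" means ABANDONED, and "forgotten" means still
    open (NEW) for more than `forgotten_after_days` days.
    """
    review_days = [
        days_between(c["created"], c["closed"])
        for c in changes
        if c["status"] == "MERGED"
    ]
    forgotten = [
        c
        for c in changes
        if c["status"] == "NEW"
        and days_between(c["created"], today) > forgotten_after_days
    ]
    return {
        "accepted": len(review_days),
        "declined": sum(1 for c in changes if c["status"] == "ABANDONED"),
        "forgotten": len(forgotten),
        "median_review_days": median(review_days) if review_days else None,
    }


summary = workflow_summary(changes, today="2013-06-30")
print(summary)
```

Using the median rather than the mean keeps a few very old patches from dominating the review-time figure; whether that is the right choice for our reports is open for discussion.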

Suggest more
What else do we want to know? Let's agree on the answers without being conditioned by existing data or tools. Then we will see what can be reasonably done.


 * Projects activity
   * Most active: continuous contributions, a diversity of contributors, newcomers...
   * Quality: open bugs, response to issues, user satisfaction.
 * Collaboration channels
   * Which channels are being used for technical collaboration.
   * Population: ins, outs, active, idle.
   * Participation: volume, signal, noise.
 * Contributors
   * Who are we? What skills are we contributing? Where are we based? How long have we been around?
   * Most active, productive, committed, responsive.
   * Newcomers: influx, popular motivations and destinations.
   * Meritocracy: who has extra permissions, responsibilities, reputation.
   * Countries where they work from.
     * Can this data be retrieved from the Gerrit web server? Is it ok to do it?
       * This is not logged and, even if it were, it would not be available because of the privacy policy.
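
As an illustration of the "Population: ins, outs, active, idle" item above, the following sketch compares who was active in two consecutive periods. The function name, the contributor names and the exact definitions of "ins", "outs" and "idle" are assumptions for the sake of the example; the community may prefer different definitions.

```python
def population_flows(previous, current, known):
    """Compare two periods of activity in a channel or project.

    previous, current: sets of people active in each period.
    known: set of everyone seen at any time before the current period.
    """
    return {
        "active": set(current),
        "ins": current - known,              # seen for the first time now
        "outs": previous - current,          # active last period, silent now
        "idle": known - previous - current,  # seen in the past, silent in both
    }


# Hypothetical names, for illustration only.
known = {"alice", "bob", "carol", "dave"}
previous = {"alice", "bob", "carol"}
current = {"alice", "carol", "erin"}

flows = population_flows(previous, current, known)
for label in ("active", "ins", "outs", "idle"):
    print(label, sorted(flows[label]))
```

The same comparison works for any source that can list who participated in a period, be it a mailing list, an IRC channel or a repository.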

See also Analytics/Dreams.

Own infra
Several of the tools we use produce data. Sometimes the data is processed and ready to be consumed; often it is raw.


 * Gerrit for code contribution and review.
   * cmd-query for Gerrit.
   * bash script from Mark Traceur irc link (number of Gerrit committers)
   * See Gerrit statistics.
 * Bugzilla for bug and task handling.
   * See Bugzilla Weekly Report.
 * Mailman for mailing list activity. (see also wikistats viz of power posters)
 * IRC channels for chat activity.
 * MediaWiki instances for doc editing.
   * See monthly statistics of page views and how the data is gathered.
 * Events, online & offline.
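
As a rough idea of what consuming the raw Gerrit data could look like, the sketch below parses the line-delimited JSON that Gerrit's SSH query command prints (something like `ssh -p 29418 <host> gerrit query --format=JSON status:merged`, with the host left as a placeholder) and counts distinct committers and changes per project. This is not the bash script mentioned in the list; the sample records and the project name mediawiki/extensions/Example are made up for illustration.

```python
import json
from collections import Counter

# Made-up sample of the output format: one JSON object per line,
# followed by a final row of type "stats".
sample_output = """\
{"project":"mediawiki/core","owner":{"name":"Alice"},"status":"MERGED"}
{"project":"mediawiki/core","owner":{"name":"Bob"},"status":"MERGED"}
{"project":"mediawiki/extensions/Example","owner":{"name":"Alice"},"status":"MERGED"}
{"type":"stats","rowCount":3,"runTimeMilliseconds":12}
"""


def parse_changes(text):
    """Return the change records, skipping blank lines and the stats row."""
    changes = []
    for line in text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        if record.get("type") == "stats":
            continue
        changes.append(record)
    return changes


changes = parse_changes(sample_output)
committers = {c["owner"]["name"] for c in changes}
per_project = Counter(c["project"] for c in changes)

print("committers:", len(committers))
print("changes per project:", dict(per_project))
```

Counting by owner name is the simplest option but it will split one person across several identities, which is the same duplicate problem we are polishing in the dashboard data.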

3rd parties
MediaWiki technical activity can also be found and measured out there.


 * GitHub (many projects?)
   * python script from Mark Traceur irc link
 * Ohloh (many projects)
   * Missing projects/repositories as of 13:32, 6 November 2012 (UTC):
     * 33 extensions recently migrated from SVN; almost all of them listed in [//www.mediawiki.org/w/index.php?title=Git/Conversion/Extensions_still_in_svn&oldid=593520#Migrate_to_Git_2]
     * apps*, labs*: apps/mobile/WikiLovesMonuments is the only real project, already tracked here.
     * integration*
       * Under integration* most projects are upstream forks, not real genuine code. The Integration team said Puppet was the only project worth tracking by now. If you find something relevant is missing, please specify.--Qgil (talk) 16:32, 9 November 2012 (UTC)
         * I don't know the code here but I'm very surprised that e.g. integration/testswarm should not be relevant. --Nemo 08:05, 10 November 2012 (UTC)
     * all operations* except operations/puppet
       * Most projects are deb packages or upstream forks, not real genuine code. The Integration team said Puppet was the only project worth tracking by now. If you find something relevant is missing, please specify.--Qgil (talk) 16:32, 9 November 2012 (UTC)
         * operations/mediawiki-config is surely relevant; also why not apache-config, dns, dumps, dumps/test?, mediawiki-multiversion, network-diagrams
     * all wikimedia* except wikimedia/fundraising*
       * I asked and the answer was that there is nothing worth tracking. If you find something relevant is missing, please specify.--Qgil (talk) 16:32, 9 November 2012 (UTC)
         * They all seem relevant to me. If for some reason custom plugins and so are not considered relevant, I guess all bot* should as we already have PWB. --Nemo 08:05, 10 November 2012 (UTC)
     * qa, search: nothing worth tracking
       * we are starting to get more and more contributions in qa/browsertests, could we get Ohloh tracking for that? --Zeljko.filipin(WMF) (talk) 10:16, 19 June 2013 (UTC)
 * Twitter and other Microblogging handles, e.g. Wikimedia Tech Staff.
 * Good idea! Done quickly, feel free adding data (Ohloh is like a wiki). Otherwise I'll do it before the workshop.--Qgil (talk) 17:31, 19 June 2013 (UTC)

Tools to analyze and report data
Free software is a requirement.


 * MediaWiki Gerrit stats.
 * How to query Gerrit data.
 * Pentaho community edition - see the Pentaho page at Wikitech.
 * Metrics Grimoire.
 * bugdaystats

Team
Who is working on this.


 * Quim Gil is volunteering to push this task forward.

We are also wondering whether the Analytics team wants to be, or should be, involved or at least aware.