Community metrics

How is the MediaWiki / Wikimedia tech community doing? Let's analyze the data available in order to highlight the contributors and areas setting an example, and also the bottlenecks or inactive corners requiring our attention.

Your feedback and requests are welcome in Phabricator (project Analytics-Tech-community-metrics). You can also comment on the discussion page and on the Analytics mailing list.

Median age of open changesets
"Time from last patchset" in days at "Age of open changesets (monthly snapshots)".

Open changesets waiting for review
"Waiting for review" at "Backlog of open changesets (monthly snapshots)"

New changesets submitted per month
The "submitted" line in "submitted vs. Merged changes vs. Abandoned".

Active Gerrit code review users per month
Uploaders, Reviewers, Committers

{ "legends": [{"stroke": "color","title": "type","fill": "color"}], "scales": [ {     "type": "time", "name": "x", "domain": {"data": "chart","field": "x"}, "zero": false, "range": "width", "nice": true },   {      "clamp": true, "type": "linear", "name": "y", "domain": {"data": "chart","field": "y"}, "zero": true, "range": "height", "nice": true },   {      "domain": {"data": "chart","field": "series"}, "type": "ordinal", "name": "color", "range": "category10" } ],  "version": 2, "marks": [ {     "type": "group", "marks": [ {         "properties": { "enter": { "y": {"scale": "y","field": "y"}, "x": {"scale": "x","field": "x"}, "stroke": {"scale": "color","field": "series"}, "strokeWidth": {"value": 2.5} },           "update": {"stroke": {"scale": "color","field": "series"}}, "hover": {"stroke": {"value": "red"}} },         "type": "line" }     ],      "from": { "data": "chart", "transform": [{"groupby": ["series"],"type": "facet"}] }   }  ],  "height": 350, "axes": [{"scale": "x","type": "x"},{"scale": "y","type": "y"}], "data": [ {     "name": "chart", "format": {"type": "json","parse": {"x": "date"}}, "values": [ {"y": 205,"series": "uploaders","x": "Sep 2014"}, {"y": 202,"series": "uploaders","x": "Oct 2014"}, {"y": 188,"series": "uploaders","x": "Nov 2014"}, {"y": 209,"series": "uploaders","x": "Dec 2014"}, {"y": 207,"series": "uploaders","x": "Jan 2015"}, {"y": 210,"series": "uploaders","x": "Feb 2015"}, {"y": 218,"series": "uploaders","x": "Mar 2015"}, {"y": 202,"series": "uploaders","x": "Apr 2015"}, {"y": 207,"series": "uploaders","x": "May 2015"}, {"y": 209,"series": "uploaders","x": "Jun 2015"}, {"y": 194,"series": "uploaders","x": "Jul 2015"}, {"y": 193,"series": "uploaders","x": "Aug 2015"}, {"y": 213,"series": "uploaders","x": "Sep 2015"}, {"y": 213,"series": "uploaders","x": "Oct 2015"}, {"y": 201,"series": "uploaders","x": "Nov 2015"}, {"y": 213,"series": "uploaders","x": "Dec 2015"}, {"y": 228,"series": "uploaders","x": "Jan 2016"}, {"y": 
207,"series": "uploaders","x": "Feb 2016"}, {"y": 228,"series": "uploaders","x": "Mar 2016"}, {"y": 211,"series": "uploaders","x": "Apr 2016"}, {"y": 202,"series": "uploaders","x": "May 2016"}, {"y": 204,"series": "uploaders","x": "Jun 2016"}, {"y": 184,"series": "uploaders","x": "Jul 2016"}, {"y": 196,"series": "uploaders","x": "Aug 2016"}, {"y": 215,"series": "uploaders","x": "Sep 2016"}, {"y": 206,"series": "uploaders","x": "Oct 2016"}, {"y": 210,"series": "uploaders","x": "Nov 2016"}, {"y": 226,"series": "uploaders","x": "Dec 2016"}, {"y": 181,"series": "reviewers","x": "Sep 2014"}, {"y": 176,"series": "reviewers","x": "Oct 2014"}, {"y": 170,"series": "reviewers","x": "Nov 2014"}, {"y": 170,"series": "reviewers","x": "Dec 2014"}, {"y": 187,"series": "reviewers","x": "Jan 2015"}, {"y": 190,"series": "reviewers","x": "Feb 2015"}, {"y": 184,"series": "reviewers","x": "Mar 2015"}, {"y": 183,"series": "reviewers","x": "Apr 2015"}, {"y": 178,"series": "reviewers","x": "May 2015"}, {"y": 188,"series": "reviewers","x": "Jun 2015"}, {"y": 200,"series": "reviewers","x": "Jul 2015"}, {"y": 189,"series": "reviewers","x": "Aug 2015"}, {"y": 195,"series": "reviewers","x": "Sep 2015"}, {"y": 192,"series": "reviewers","x": "Oct 2015"}, {"y": 180,"series": "reviewers","x": "Nov 2015"}, {"y": 181,"series": "reviewers","x": "Dec 2015"}, {"y": 198,"series": "reviewers","x": "Jan 2016"}, {"y": 180,"series": "reviewers","x": "Feb 2016"}, {"y": 181,"series": "reviewers","x": "Mar 2016"}, {"y": 182,"series": "reviewers","x": "Apr 2016"}, {"y": 179,"series": "reviewers","x": "May 2016"}, {"y": 165,"series": "reviewers","x": "Jun 2016"}, {"y": 163,"series": "reviewers","x": "Jul 2016"}, {"y": 169,"series": "reviewers","x": "Aug 2016"}, {"y": 181,"series": "reviewers","x": "Sep 2016"}, {"y": 169,"series": "reviewers","x": "Oct 2016"}, {"y": 186,"series": "reviewers","x": "Nov 2016"}, {"y": 181,"series": "reviewers","x": "Dec 2016"}, {"y": 113,"series": "committers","x": "Sep 2014"}, 
{"y": 123,"series": "committers","x": "Oct 2014"}, {"y": 119,"series": "committers","x": "Nov 2014"}, {"y": 112,"series": "committers","x": "Dec 2014"}, {"y": 117,"series": "committers","x": "Jan 2015"}, {"y": 125,"series": "committers","x": "Feb 2015"}, {"y": 121,"series": "committers","x": "Mar 2015"}, {"y": 117,"series": "committers","x": "Apr 2015"}, {"y": 119,"series": "committers","x": "May 2015"}, {"y": 132,"series": "committers","x": "Jun 2015"}, {"y": 132,"series": "committers","x": "Jul 2015"}, {"y": 126,"series": "committers","x": "Aug 2015"}, {"y": 130,"series": "committers","x": "Sep 2015"}, {"y": 131,"series": "committers","x": "Oct 2015"}, {"y": 125,"series": "committers","x": "Nov 2015"}, {"y": 119,"series": "committers","x": "Dec 2015"}, {"y": 125,"series": "committers","x": "Jan 2016"}, {"y": 126,"series": "committers","x": "Feb 2016"}, {"y": 128,"series": "committers","x": "Mar 2016"}, {"y": 127,"series": "committers","x": "Apr 2016"}, {"y": 125,"series": "committers","x": "May 2016"}, {"y": 121,"series": "committers","x": "Jun 2016"}, {"y": 114,"series": "committers","x": "Jul 2016"}, {"y": 129,"series": "committers","x": "Aug 2016"}, {"y": 132,"series": "committers","x": "Sep 2016"}, {"y": 126,"series": "committers","x": "Oct 2016"}, {"y": 134,"series": "committers","x": "Nov 2016"}, {"y": 130,"series": "committers","x": "Dec 2016"} ]   }  ],  "width": 800 }
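
The chart data above is a flat list of points, each with a month ("x"), a count ("y") and a "series" name. As a minimal sketch of how such a list could be summarized, the following groups a few of those points (November and December 2016, copied from the data above) by series and averages them. The function name `average_per_series` is only illustrative and not part of any dashboard tooling.

```python
import json
from collections import defaultdict
from statistics import mean

# A few "values" entries copied from the chart data (Nov and Dec 2016).
values_json = """
[
  {"y": 210, "series": "uploaders", "x": "Nov 2016"},
  {"y": 226, "series": "uploaders", "x": "Dec 2016"},
  {"y": 186, "series": "reviewers", "x": "Nov 2016"},
  {"y": 181, "series": "reviewers", "x": "Dec 2016"},
  {"y": 134, "series": "committers", "x": "Nov 2016"},
  {"y": 130, "series": "committers", "x": "Dec 2016"}
]
"""


def average_per_series(values):
    """Return the mean monthly count for each series name (illustrative helper)."""
    grouped = defaultdict(list)
    for point in values:
        grouped[point["series"]].append(point["y"])
    return {series: mean(counts) for series, counts in grouped.items()}


averages = average_per_series(json.loads(values_json))
for series, value in averages.items():
    print(series, value)
```

The same grouping would work on the full "values" list if it were extracted from the graph definition.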

MediaWiki Core code review
Number of MediaWiki Core changesets waiting for review (CR0 or CR+1):

Age in days of open MediaWiki Core patchsets:

Active users in Phabricator
Monthly active users in Bugzilla (from 2013-02 to 2014-10) and in Phabricator (from 2014-12 to last month).

New accounts in Phabricator
New Phabricator accounts created every month. About 4003 users registered in Wikimedia Phabricator between its creation in September 2014 and October 2015.

korma.wmflabs.org
korma.wmflabs.org was our tool until January 2017.

How to update user data
Our goal is to provide a tool allowing users to edit their own data directly (T60585). Meanwhile, users can request updates to their personal data by creating a Phabricator task that includes:
 * real name
 * username(s) and email address(es) used for your contributions
 * current and previous affiliations, with the dates of change of affiliation
 * current location (country)

At the moment we can only process single affiliations (T95238). If you are contributing under different affiliations (e.g. Wikimedia Foundation as part of your work, Independent in your free time), we recommend using different usernames and email addresses.

Managing identities
SortingHat is the tool to manage identities. It helps in the following ways:
 * To centralize all information in a database.
 * To deal with several identities: a developer may have several identities depending on the data source she is working on. This tool helps to identify, for each identity of a developer, where that information came from.
 * To avoid using the database directly: a command line interface deals with it.
 * To manage extra developer attributes: it has support for managing affiliations and other developer attributes such as nationalities or bot activity.
 * To manage blacklists: these are typically used where bots are committing changes, or for names or email addresses that are too generic, such as "root".

Merging all of the identities into one database can be done in two ways: a more detailed one, or an incremental one. The detailed process uses extra scripts to parse the identity information. However, it is time-consuming, so it is typically used only when the identities database is first created. Later updates of the database typically follow the incremental process.
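
To illustrate the idea of several identities belonging to one developer, here is a minimal sketch that groups identities from different data sources by email address and skips blacklisted entries. It is not SortingHat's actual matching logic. The `BLACKLIST` contents, the `group_identities` helper and the sample identities are all hypothetical.

```python
from collections import defaultdict

# Hypothetical blacklist of overly generic names and addresses.
BLACKLIST = {"root", "root@localhost"}


def group_identities(identities):
    """Group (source, name, email) identities that share an email address."""
    unique = defaultdict(list)
    for source, name, email in identities:
        # Skip identities whose name or address is blacklisted.
        if name.lower() in BLACKLIST or email.lower() in BLACKLIST:
            continue
        # Email addresses are compared case-insensitively.
        unique[email.lower()].append((source, name))
    return dict(unique)


# Made-up identities coming from two data sources.
identities = [
    ("git", "Jane Doe", "jane@example.org"),
    ("gerrit", "jdoe", "Jane@example.org"),
    ("git", "root", "root@localhost"),
]

merged = group_identities(identities)
print(merged)
```

Each key of the result stands for one unique contributor, and each value keeps track of which data source every identity came from.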

SortingHat also provides a way to export all of this data. This helps to look for other developer identities and merge them through the command line.

These exported JSON files follow the same structure:

As an example, if an identity is required to be merged with another identity, the command "sortinghat merge" is used and the original ".identities.id" is merged into the specified "".

The most useful SortingHat commands to deal with identities are the following ones:
 * sortinghat merge: to merge unique identities
 * sortinghat affiliate: to affiliate an identity to some organization
 * sortinghat show: to show information about an identity
 * sortinghat profile: to show profile information of that unique identity

User pages are linked from the contributor names in the top tables of each data source section.

In addition to several identities, there is extra information per contributor:
 * If the contributor is a bot
 * The country of the contributor
 * Canonical uuid (hash) to identify such contributor
 * Canonical name and email to identify such contributor

More information about SortingHat usage is available on its README page.

Bots
SortingHat also keeps information about which identities correspond to bots. For that, it uses the "is_bot" field in the "profiles" table. If the field is 1, the identity is considered a bot. Currently, the only way to tag an identity as a bot is to change the database directly.
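
As a rough sketch of what such a database change could look like, the example below uses an in-memory SQLite database with a simplified, assumed "profiles" table. Only the table name and the "is_bot" field come from the description above. The "uuid" and "name" columns, the sample rows and the `mark_as_bot` helper are assumptions for illustration, and the real SortingHat database and schema may differ.

```python
import sqlite3

# In-memory stand-in for the identities database (assumed, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profiles (uuid TEXT PRIMARY KEY, name TEXT, is_bot INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO profiles (uuid, name) VALUES (?, ?)",
    [("a1", "Jane Doe"), ("b2", "jenkins-bot")],  # made-up sample rows
)


def mark_as_bot(connection, uuid):
    """Set is_bot to 1 for the profile with the given uuid (hypothetical helper)."""
    connection.execute("UPDATE profiles SET is_bot = 1 WHERE uuid = ?", (uuid,))
    connection.commit()


mark_as_bot(conn, "b2")
bots = [row[0] for row in conn.execute("SELECT name FROM profiles WHERE is_bot = 1")]
print(bots)
```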

wikimedia.biterg.io
This section is a work in progress.

wikimedia.biterg.io is the Wikimedia Tech community metrics dashboard. It was preceded by korma.wmflabs.org until 2016. Data sources include Git and Gerrit repositories, Phabricator's Maniphest (with only basic support as of February 2017), mediawiki.org, some mailing lists, and some IRC channels (Phabricator's Differential will be supported in the future). Its data is refreshed regularly.

Bugs and feature requests about wikimedia.biterg.io can be reported in Wikimedia Phabricator's Analytics-Tech-community-metrics project.

User interface


The top bar lists Dashboards (also called Panels). By default the  is chosen. Each dashboard offers numerous widgets, and a result list at the bottom of the page (commits in Git, emails in mailing lists, etc.).

The interactive Widgets at the bottom display the actual data. Some panels support clicking displayed items to get more specific information about those items and some panels also allow downloading and exporting the displayed data as CSV or JSON.

You can share URLs of dashboards with applied filters by selecting the Share icon to the right of the Advanced filter field.

Applying filters
In the right corner of the top bar, the Time filter allows adjusting the time span of all the data being displayed in the widgets.

Some widgets allow creating Filters: The mouse pointer turns into a plus symbol when hovering over a listed panel item and clicking the item will apply an additional filter for that item. When creating a filter in Kibana, the filter is displayed in green below the Advanced filter text field and is applied to the view. In the screenshot above, 'Bots' and 'Empty commits' are excluded from the data displayed in the panels. When hovering over a filter, you can enable/disable, pin/unpin (the filter will still be applied when you open that page again), invert (e.g. to get all companies listed except for one), remove or edit (e.g. to change the organization name) the filter. The "Actions" menu to the right of the filters offers the same actions to apply them to all filters at once. For more information, see Discover Filters.

The Advanced filter text field allows searching for text in any items (commit messages, user names, repository names, etc.). It allows querying a subset of results provided by the time filter and filters already applied. By default, any free text items in any database columns are included (entering this also resets a search). The query syntax is based on the Lucene query syntax. Also see Kibana Queries and Filters for more information.

Using the advanced filter, you can also prefix searches with the names of database columns to match a number or a phrase in a specific column.
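
As a small sketch of how field-prefixed queries in Lucene query syntax can be put together, the helper below writes numbers bare and strings as quoted phrases, then joins clauses with AND. The column names "author_name" and "lines_changed" are hypothetical and may not exist in the dashboard's indexes. The helper functions are only illustrative.

```python
def field_query(field, value):
    """Build a field-prefixed term: bare for numbers, quoted phrase for strings."""
    if isinstance(value, (int, float)):
        return f"{field}:{value}"
    # Escape backslashes and double quotes inside the quoted phrase.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def all_of(*clauses):
    """Require every clause to match by joining them with AND."""
    return " AND ".join(clauses)


# Hypothetical column names, for illustration only.
query = all_of(
    field_query("author_name", "Jane Doe"),
    field_query("lines_changed", 42),
)
print(query)  # author_name:"Jane Doe" AND lines_changed:42
```

The resulting string could then be pasted into the Advanced filter text field.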

TODO: In the future, list some query examples for advanced filters here.

Configuration
Files in https://github.com/Bitergia/mediawiki-repositories define which mailing lists and IRC channels get indexed. They also define which Git and Gerrit repositories get blacklisted/ignored (for example when they are only imported from upstream sources), so that activity that did not happen in our community is not shown. Note: As of February 2017, blacklisting is not yet working properly.

wikimedia.biterg.io is based on Kibana dashboards and Elasticsearch. The database provides indexes whose fields are used in panels, widgets, and searches.

Architecture and source code
Details on the underlying software architecture can be found on grimoirelab.github.io.

Source code is available on https://github.com/grimoirelab. Most code is written in Python. The existing repositories are:
 * : Numerous JSON files. Contains all of the panels currently available for current architecture.
 * : Data retrieval platform which creates JSON files.  contains the available backends. Data is stored in Elasticsearch.
 * : Commander tool to run perceval and set up the panels.
 * : A fork of Kibana which contains changes until they get merged in the upstream code base.
 * : An incubator for new ideas.

Further links

 * Extensive user documentation by the Xen Project
 * Kibana User Guide (upstream documentation)
 * Kibana User Guide: Dashboards and Panels (upstream documentation)
 * Example dashboards of other organizations: Eclipse, OPNFV, CoreOS

If you would like to see specific customizations, please file a request in Wikimedia Phabricator including a user story.

Other data sources and tools
Git
 * Wikimedia stats in OpenHub/Ohloh including many projects.
 * "How many unique contributors submitted unique pull requests to a https://github.com/wikimedia/ repo" - Python script by marktraceur.

Gerrit
 * Gerrit/Navigation
 * MediaWiki Gerrit stats (Is it working? 2013-06-28) and how to query Gerrit data.
 * Number of gerrit committers (marktraceur's bash script)
 * cmd-query for Gerrit.

Phabricator
 * "Phabricator monthly statistics" emails on the wikitech-l mailing list - see its archives.

mediawiki.org
 * Monthly statistics of page views and how the data is gathered.

Mailman
 * Wikimedia Mail Stats: PowerPosters.

Team
Quim Gil and Andre Klapper from the Wikimedia Engineering Community team are coordinating the Metrics Dashboard project, which is being implemented by Bitergia as contractors.

The Bitergia team working on the MediaWiki dashboard consists of Daniel Izquierdo, Luis Cañas, and Jesus Gonzalez Barahona, with Alvaro del Castillo as project manager.

The ownership of this project might get transferred to the Wikimedia Analytics team at some point.