Community metrics

How is the MediaWiki / Wikimedia tech community doing? Let's analyze the data available in order to highlight the contributors and areas setting an example, and also the bottlenecks or inactive corners requiring our attention.

Your feedback and requests are welcome in Phabricator (project Analytics-Tech-community-metrics). You can also comment on the discussion page and on the Analytics mailing list.

Median age of open changesets
"Time from last patchset" in days

Note: Updated data for 2017 is blocked on T151557.

Open changesets waiting for review
"Waiting for review"

Note: Updated data for 2017 is blocked on T151555.

New changesets submitted per month

 * For each month, see the corresponding bar in the "Changesets Per Status" widget on https://wikimedia.biterg.io:443/goto/0b8a63737b0ae35e0415ac97462a3189
 * For the complete time frame, see the "# Reviews" value in the "Gerrit" widget on https://wikimedia.biterg.io:443/goto/0b8a63737b0ae35e0415ac97462a3189

Active Gerrit code review users per month
Note: Updated data for 2017 on code reviewers per month is blocked on T151559. Updated data for 2017 on code committers per month is blocked on T151558.


 * For uploaders for each month, see the corresponding bar in the "Changesets Submitters" widget on https://wikimedia.biterg.io:443/goto/0b8a63737b0ae35e0415ac97462a3189
 * For uploaders for the complete time frame, see the "# Submitters" value in the "Gerrit" widget on https://wikimedia.biterg.io:443/goto/0b8a63737b0ae35e0415ac97462a3189

{ "legends": [{"stroke": "color","title": "type","fill": "color"}], "scales": [ {     "type": "time", "name": "x", "domain": {"data": "chart","field": "x"}, "zero": false, "range": "width", "nice": true },   {      "clamp": true, "type": "linear", "name": "y", "domain": {"data": "chart","field": "y"}, "zero": true, "range": "height", "nice": true },   {      "domain": {"data": "chart","field": "series"}, "type": "ordinal", "name": "color", "range": "category10" } ],  "version": 2, "marks": [ {     "type": "group", "marks": [ {         "properties": { "enter": { "y": {"scale": "y","field": "y"}, "x": {"scale": "x","field": "x"}, "stroke": {"scale": "color","field": "series"}, "strokeWidth": {"value": 2.5} },           "update": {"stroke": {"scale": "color","field": "series"}}, "hover": {"stroke": {"value": "red"}} },         "type": "line" }     ],      "from": { "data": "chart", "transform": [{"groupby": ["series"],"type": "facet"}] }   }  ],  "height": 350, "axes": [{"scale": "x","type": "x"},{"scale": "y","type": "y"}], "data": [ {     "name": "chart", "format": {"type": "json","parse": {"x": "date"}}, "values": [ {"y": 205,"series": "uploaders","x": "Sep 2014"}, {"y": 202,"series": "uploaders","x": "Oct 2014"}, {"y": 188,"series": "uploaders","x": "Nov 2014"}, {"y": 209,"series": "uploaders","x": "Dec 2014"}, {"y": 207,"series": "uploaders","x": "Jan 2015"}, {"y": 210,"series": "uploaders","x": "Feb 2015"}, {"y": 218,"series": "uploaders","x": "Mar 2015"}, {"y": 202,"series": "uploaders","x": "Apr 2015"}, {"y": 207,"series": "uploaders","x": "May 2015"}, {"y": 209,"series": "uploaders","x": "Jun 2015"}, {"y": 194,"series": "uploaders","x": "Jul 2015"}, {"y": 193,"series": "uploaders","x": "Aug 2015"}, {"y": 213,"series": "uploaders","x": "Sep 2015"}, {"y": 213,"series": "uploaders","x": "Oct 2015"}, {"y": 201,"series": "uploaders","x": "Nov 2015"}, {"y": 213,"series": "uploaders","x": "Dec 2015"}, {"y": 228,"series": "uploaders","x": "Jan 2016"}, {"y": 
207,"series": "uploaders","x": "Feb 2016"}, {"y": 228,"series": "uploaders","x": "Mar 2016"}, {"y": 211,"series": "uploaders","x": "Apr 2016"}, {"y": 202,"series": "uploaders","x": "May 2016"}, {"y": 204,"series": "uploaders","x": "Jun 2016"}, {"y": 184,"series": "uploaders","x": "Jul 2016"}, {"y": 196,"series": "uploaders","x": "Aug 2016"}, {"y": 215,"series": "uploaders","x": "Sep 2016"}, {"y": 206,"series": "uploaders","x": "Oct 2016"}, {"y": 210,"series": "uploaders","x": "Nov 2016"}, {"y": 226,"series": "uploaders","x": "Dec 2016"}, {"y": 181,"series": "reviewers","x": "Sep 2014"}, {"y": 176,"series": "reviewers","x": "Oct 2014"}, {"y": 170,"series": "reviewers","x": "Nov 2014"}, {"y": 170,"series": "reviewers","x": "Dec 2014"}, {"y": 187,"series": "reviewers","x": "Jan 2015"}, {"y": 190,"series": "reviewers","x": "Feb 2015"}, {"y": 184,"series": "reviewers","x": "Mar 2015"}, {"y": 183,"series": "reviewers","x": "Apr 2015"}, {"y": 178,"series": "reviewers","x": "May 2015"}, {"y": 188,"series": "reviewers","x": "Jun 2015"}, {"y": 200,"series": "reviewers","x": "Jul 2015"}, {"y": 189,"series": "reviewers","x": "Aug 2015"}, {"y": 195,"series": "reviewers","x": "Sep 2015"}, {"y": 192,"series": "reviewers","x": "Oct 2015"}, {"y": 180,"series": "reviewers","x": "Nov 2015"}, {"y": 181,"series": "reviewers","x": "Dec 2015"}, {"y": 198,"series": "reviewers","x": "Jan 2016"}, {"y": 180,"series": "reviewers","x": "Feb 2016"}, {"y": 181,"series": "reviewers","x": "Mar 2016"}, {"y": 182,"series": "reviewers","x": "Apr 2016"}, {"y": 179,"series": "reviewers","x": "May 2016"}, {"y": 165,"series": "reviewers","x": "Jun 2016"}, {"y": 163,"series": "reviewers","x": "Jul 2016"}, {"y": 169,"series": "reviewers","x": "Aug 2016"}, {"y": 181,"series": "reviewers","x": "Sep 2016"}, {"y": 169,"series": "reviewers","x": "Oct 2016"}, {"y": 186,"series": "reviewers","x": "Nov 2016"}, {"y": 181,"series": "reviewers","x": "Dec 2016"}, {"y": 113,"series": "committers","x": "Sep 2014"}, 
{"y": 123,"series": "committers","x": "Oct 2014"}, {"y": 119,"series": "committers","x": "Nov 2014"}, {"y": 112,"series": "committers","x": "Dec 2014"}, {"y": 117,"series": "committers","x": "Jan 2015"}, {"y": 125,"series": "committers","x": "Feb 2015"}, {"y": 121,"series": "committers","x": "Mar 2015"}, {"y": 117,"series": "committers","x": "Apr 2015"}, {"y": 119,"series": "committers","x": "May 2015"}, {"y": 132,"series": "committers","x": "Jun 2015"}, {"y": 132,"series": "committers","x": "Jul 2015"}, {"y": 126,"series": "committers","x": "Aug 2015"}, {"y": 130,"series": "committers","x": "Sep 2015"}, {"y": 131,"series": "committers","x": "Oct 2015"}, {"y": 125,"series": "committers","x": "Nov 2015"}, {"y": 119,"series": "committers","x": "Dec 2015"}, {"y": 125,"series": "committers","x": "Jan 2016"}, {"y": 126,"series": "committers","x": "Feb 2016"}, {"y": 128,"series": "committers","x": "Mar 2016"}, {"y": 127,"series": "committers","x": "Apr 2016"}, {"y": 125,"series": "committers","x": "May 2016"}, {"y": 121,"series": "committers","x": "Jun 2016"}, {"y": 114,"series": "committers","x": "Jul 2016"}, {"y": 129,"series": "committers","x": "Aug 2016"}, {"y": 132,"series": "committers","x": "Sep 2016"}, {"y": 126,"series": "committers","x": "Oct 2016"}, {"y": 134,"series": "committers","x": "Nov 2016"}, {"y": 130,"series": "committers","x": "Dec 2016"} ]   }  ],  "width": 800 }

MediaWiki Core code review
Number of MediaWiki Core changesets waiting for review in Gerrit (CR0 or CR+1):

Note: Updated data for 2017 is blocked on T151555.

Age in days of open MediaWiki Core patchsets:

Note: Updated data for 2017 is blocked on T151557.

Active users in Phabricator
Monthly active users in Bugzilla (from 2013-02 to 2014-10) and in Phabricator (from 2014-12 to last month). Data source: "Phabricator monthly statistics" email on wikitech-l

New accounts in Phabricator
New Phabricator accounts created every month. Data source: "Phabricator monthly statistics" email on wikitech-l

wikimedia.biterg.io
wikimedia.biterg.io is the Wikimedia Tech community metrics dashboard. It is a work in progress and was preceded by korma.wmflabs.org until 2016. Data sources include Git and Gerrit repositories, Phabricator's Maniphest (with only basic support so far), mediawiki.org, some mailing lists, and some IRC channels. Phabricator's Differential will be supported in the future. The data is refreshed regularly.

Bugs and feature requests about wikimedia.biterg.io can be reported in Wikimedia Phabricator's Analytics-Tech-community-metrics project.

wikimedia.biterg.io offers:
 * Drill down: clicking an element applies a filtered view
 * Time frame selection
 * Sharing URLs for certain views or embedding views
 * Exporting data
 * API access via the Elasticsearch API
 * The ability for Wikimedia administrators to create widgets and panels themselves
 * An advanced filter search box

User interface


The side bar lists Dashboards (also called Panels). By default the  is chosen. Each dashboard offers numerous widgets, and a result list at the bottom of the page (commits in Git, emails in mailing lists, etc.).

The interactive Widgets at the bottom display the actual data. Some panels let you click displayed items to get more specific information about them, and some also allow downloading and exporting the displayed data as CSV or JSON.

You can share URLs of dashboards with applied filters via "Share" in the top bar, which also offers to create Short URLs.

Applying filters
In the right corner of the top bar, the Time filter allows adjusting the time span of all the data being displayed in the widgets.

Some widgets allow creating Filters: The mouse pointer turns into a plus symbol when hovering over a listed panel item and clicking the item will apply an additional filter for that item. When creating a filter in Kibana, the filter is displayed in grey below the Advanced filter text field and is applied to the view. In the screenshot above, only changesets with 'status: Merged' and by independent authors are shown in the panels. When hovering over a filter, you can enable/disable, pin/unpin (the filter will still be applied when you open that page again), invert (e.g. to get all companies listed except for one), remove or edit (e.g. to change the organization name) the filter. The "Actions" menu to the right of the filters offers the same actions to apply them to all filters at once. For more information, see Discover Filters.

The Advanced filter text field allows searching for text in any items (commit messages, user names, repository names, etc.). It queries a subset of the results provided by the time filter and the filters already applied. By default, any free text items in any database columns are included (entering this also resets a search). You basically enter a field name and its value, separated by a colon. The query syntax is based on the Lucene query syntax. Also see Kibana Queries and Filters for more information.
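
As a rough illustration of the field-and-value form, the following Python sketch assembles a query string that could be pasted into the Advanced filter text field. The helper names are made up for this example. The field names "status" and "author_org_name" appear elsewhere on this page, but the value "Independent" is only an assumption about how independent authors are labelled in the data.

```python
def lucene_term(field: str, value: str) -> str:
    """Return a Lucene clause of the form field:"value", with the value quoted as a phrase."""
    # Inside a quoted phrase only the backslash and the double quote need escaping.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


def lucene_and(*clauses: str) -> str:
    """Join several clauses with the Lucene AND operator."""
    return " AND ".join(clauses)


# "Independent" is an assumed value; check the actual data before relying on it.
query = lucene_and(
    lucene_term("status", "Merged"),
    lucene_term("author_org_name", "Independent"),
)
print(query)  # status:"Merged" AND author_org_name:"Independent"
```

Quoting every value as a phrase keeps values with spaces (for example organization names) in one piece. Field names and values have to match the spelling and capitalization used in the index.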

As of August 2017 the list of available fields (database columns) is only available to administrators and no auto-complete suggestions are offered (this problem could be worked around by creating a panel that lists the names of the available columns, via "Discover"). To perform advanced search queries, you need to know the names of the available fields.

Some more notes on advanced filters:
 * The type of field (string, number, date, etc.) influences the query syntax
 * Queries are case sensitive
 * You can only create queries which use fields within the respective index that is used in a panel (put simply, "indices" in Elasticsearch are a kind of database). Otherwise the search will return "No results found".
 * Fields not available in an index by default use  for numbers and   for strings

Configuration
Files in https://github.com/Bitergia/mediawiki-repositories define which mailing lists and IRC channels get indexed. Until February 2017 they also defined which Git and Gerrit repositories were blacklisted/ignored (for example when they are only imported from upstream sources) in order to not have activity shown that did not happen in our community. For the time being, you can apply explicit filters to exclude such repositories.

wikimedia.biterg.io is based on Kibana dashboards and Elasticsearch. The database provides indexes whose fields are used in panels, widgets and for searches.

Architecture and source code
Details on the underlying software architecture can be found on grimoirelab.github.io. A comprehensive GrimoireLab Training Tutorial is available.

Source code is available. Most code is written in Python. The existing repositories are:
 * : Data retrieval platform which creates JSON files.  contains the available backends. Data is stored in Elasticsearch.
 * : Commander tool to run perceval and set up the panels.
 * : Visualization on top of Elasticsearch. A fork of Kibana which contains changes until they get merged in the upstream code base.
 * : Numerous JSON files. Contains all of the panels currently available for the current architecture.
 * : An incubator for new ideas.
 * : Command line interface to manage the data in our database. For admins, a complete database dump is available as a JSON file which allows manual account merging, updating affiliations, adding country information or marking an account as a bot. See its upstream documentation for more details.

The steps performed are basically: Sources → Data gathering (mining via Perceval) → Data enrichment (e.g. producing indexes in Elasticsearch via GrimoireELK) → Visualization (Elasticsearch and Kibana).
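
To make the order of these steps more concrete, here is a toy sketch in plain Python. It does not use Perceval, GrimoireELK or Elasticsearch, and it is not how those tools are implemented. The sample items, the function names and the shape of the raw data are all invented for illustration; only the field name "author_org_name" is taken from this page.

```python
from collections import Counter

# Hypothetical raw items, standing in for what a data source might return.
RAW_ITEMS = [
    {"repo": "example/repo-a", "author": {"name": "Alice", "org": "Org A"}},
    {"repo": "example/repo-a", "author": {"name": "Bob"}},
    {"repo": "example/repo-b", "author": {"name": "Carol", "org": "Org A"}},
]


def gather(source):
    """Data gathering: yield raw items one by one, as they come from the source."""
    yield from source


def enrich(raw):
    """Data enrichment: flatten a raw item into simple fields that are easy to query."""
    author = raw["author"]
    return {
        "repository": raw["repo"],
        "author_name": author["name"],
        "author_org_name": author.get("org", "Unknown"),
    }


def summarize(docs, field):
    """Visualization stand-in: count documents per value of one field."""
    return Counter(doc[field] for doc in docs)


enriched = [enrich(item) for item in gather(RAW_ITEMS)]
counts = summarize(enriched, "author_org_name")
print(counts.most_common())  # [('Org A', 2), ('Unknown', 1)]
```

The point of the separation is that raw data is collected once, while enrichment decides which flat fields are later available for filters and widgets.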

For administrators
The Edit mode for administrators provides additional functionality in the top bar and the side bar. You can analyze specific data, create and edit widgets, visualizations and dashboards (also custom elements).

Discover allows you to analyze specific data.


 * Choose a database from the dropdown in the left panel. Then expand the time span.
 * Results are displayed as a list of dropdown data items. Opening a dropdown displays all fields and their values as JSON or a table. A Kibana/ES visualization based on the JSON data is displayed on top.
 * Specific fields can be added as columns to the displayed results by adding/removing those fields in the left panel. It is basically a huge matrix, and if we wanted more data, more fields could be added in the future (e.g. "Gender").

Visualize allows creating a new visualization/widget (available types are e.g. data table, line chart, pie chart) or opening an existing saved visualization. Admins can rearrange and save visualizations. If you alter a saved visualization and want to keep the previous one, save the new one under a new name and then insert it into the dashboard.


 * When opening an existing saved visualization, the right panel shows the visualization view. The left panel shows the definitions: There are y-axis metrics (what is measured for each group) and x-axis buckets (how things are grouped).
 * Metrics have an Aggregation (e.g. median, sum, unique count, percentiles) on a certain Field and a Custom Label to display.
 * Buckets have the same parameters and an Interval (e.g. to display yearly instead of weekly bars).


 * To write a new visualization from scratch, choose Create a new visualization and select for example "pie chart". Choose From a new search (which requires knowing the name of the index in Kibana) and select for example the "git" index. An empty pie chart will be shown as nothing is defined yet (no buckets, hence it is the total number of everything).
 * Under buckets, choose Select buckets type and choose for example Split Slices. Set Aggregation to for example Terms (meaning: look for a specific field in every commit). Set Field to a value, for example "author_org_name" (meaning: by organization names). Set Order for example to Descending and Size to 10 to display the ten biggest companies in the pie chart. To display these changes, click the green Apply changes button at the top on the left.


 * Advanced: You can also Add sub-buckets at the bottom. For example, if you visualize bars and want to split each displayed bar to display several companies, go for Split bars. The order of buckets can be important when having sub-buckets, for example if you split bars before the x-axis in the previous example, the legend field in the visualization will be ordered by displaying the most active company in the first place of the legend list.


 * Advanced: When creating a new visualization you can also choose From a saved search instead of From a new search to create new visualizations on top of searches instead of indices to avoid using a full index. Beforehand, under Discover you have to define a search as a specific view of a search.
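
For readers who prefer to think in terms of Elasticsearch requests, the pie chart example above (Terms aggregation on "author_org_name", Descending, Size 10) roughly corresponds to a terms aggregation. The sketch below only builds the request body as a Python dictionary. The function name and the aggregation label "slices" are made up here, and the request Kibana actually generates may differ in its details.

```python
import json


def top_terms_aggregation(field: str, size: int = 10) -> dict:
    """Build a search body that returns the `size` most frequent values of `field`."""
    return {
        "size": 0,  # return no individual documents, only the aggregation result
        "aggs": {
            "slices": {  # arbitrary label chosen for this example
                "terms": {
                    "field": field,
                    "size": size,
                    "order": {"_count": "desc"},  # biggest buckets first
                }
            }
        },
    }


body = top_terms_aggregation("author_org_name", size=10)
print(json.dumps(body, indent=2))
```

Such a body would be sent to the search endpoint of the index the visualization is built on, for example the "git" index mentioned above.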

Dashboard allows creating and editing dashboards.


 * When an administrator loads an existing dashboard (via Load Saved Dashboard), modifies it (e.g. dragging around widgets), and saves the changes under the same name, the view of that dashboard is modified for all users. When using a different name for a dashboard, an administrator would still have to add the link to that new dashboard to make it available for all users.

Timelion is supposed to allow you to create time series using DSL queries.

Dev Tools: The Console allows building custom Elasticsearch queries.
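
As a minimal sketch of what such a custom query could look like, the following Python snippet builds a request body that combines a Lucene query string with a time range, similar to combining the Advanced filter text field with the Time filter. The function name is invented for this example, and the time field name "grimoire_creation_date" is an assumption: check the fields of the index you are querying.

```python
import json


def build_search_body(lucene_query: str, time_field: str, start: str, end: str) -> dict:
    """Combine a Lucene query string with a time range in one Elasticsearch search body."""
    return {
        "query": {
            "bool": {
                # Free-form part, parsed by Elasticsearch using Lucene query syntax.
                "must": [{"query_string": {"query": lucene_query}}],
                # Time span: start is inclusive, end is exclusive.
                "filter": [{"range": {time_field: {"gte": start, "lt": end}}}],
            }
        }
    }


body = build_search_body(
    "status:Merged",
    time_field="grimoire_creation_date",  # assumed field name
    start="2016-01-01",
    end="2017-01-01",
)
print(json.dumps(body, indent=2))
```

In the Console, the printed JSON would follow a request line of the form GET <index>/_search, where <index> is the name of the index to query.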

Management offers access to internal settings.


 * The Indices tab allows configuring an index pattern. It lists information about all indices and all index series (a collection of indices). You can see all the fields by name or type. Via controls on the right, you could for example convert the type of a field from "string" to "date". This is also the place to make Kibana aware of new data in Elasticsearch by adding the name of the corresponding Elasticsearch index.
 * The Objects tab lists all saved objects such as Dashboards, Searches and Visualizations and allows editing them directly, e.g. to change the number of buckets from 5 to 10 in a visualization. This is currently not possible via the UI and it is also prone to break the raw configuration.
 * You can choose any object from the lists and export it as a JSON file.

Further links

 * GrimoireLab training tutorial
 * Extensive user documentation by the Xen Project
 * Kibana User Guide (upstream documentation)
 * Kibana User Guide: Dashboards and Panels (upstream documentation)
 * Building visualizations, GrimoireCon 2017, Brussels
 * Example dashboards of other organizations: Document Foundation, Eclipse, OPNFV, CoreOS, Mozilla's Rust, Linux.

If you would like to see specific customizations, please file a request in Wikimedia Phabricator including a user story.

Other data sources and tools
Git
 * Wikimedia stats in OpenHub/Ohloh including many projects.
 * "How many unique contributors submitted unique pull requests to a https://github.com/wikimedia/ repo" - Python script by marktraceur.

Gerrit
 * Gerrit/Navigation
 * MediaWiki Gerrit stats  (Is it working? 2013-06-28)  and how to query Gerrit data.
 * Number of gerrit committers (marktraceur's bash script)
 * cmd-query for Gerrit.

Phabricator
 * "Phabricator monthly statistics" emails on the wikitech-l mailing list - see its archives.

mediawiki.org
 * monthly Statistics of page views and how the data is gathered.

Mailman
 * Wikimedia Mail Stats: PowerPosters.

Team
Quim Gil and Andre Klapper from the Wikimedia Engineering Community team are coordinating the Metrics Dashboard project, which is being implemented by Bitergia as contractors.

The Bitergia team working on the MediaWiki dashboard consists of Daniel Izquierdo, Luis Cañas and Jesus Gonzalez Barahona, with Alvaro del Castillo as project manager.

The ownership of this project might get transferred to the Wikimedia Analytics team at some point.