Community metrics

From MediaWiki.org

How is the MediaWiki / Wikimedia tech community doing? Let's analyze the data available in order to highlight the contributors and areas setting an example, and also the bottlenecks or inactive corners requiring our attention.

Your feedback and requests are welcome in Phabricator (project Analytics-Tech-community-metrics). You can also comment at the discussion page and at the Analytics mailing list.

Key performance indicators

Median age of open changesets

"Time from last patchset" in days
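The KPI is a median over ages measured from each open changeset's last patchset. The sketch below shows that arithmetic, assuming you already have the timestamp of the latest patchset for every open changeset; the dashboard's exact computation is not documented here, and the sample dates are hypothetical.

```python
from datetime import datetime, timezone
from statistics import median


def median_age_days(last_patchset_times, now):
    """Median age, in days, of open changesets measured from their last patchset.

    Raises statistics.StatisticsError if no timestamps are given.
    """
    ages = [(now - t).total_seconds() / 86400 for t in last_patchset_times]
    return median(ages)


# Hypothetical sample data: three open changesets
now = datetime(2017, 2, 1, tzinfo=timezone.utc)
last_patchsets = [
    datetime(2017, 1, 31, tzinfo=timezone.utc),  # 1 day old
    datetime(2017, 1, 22, tzinfo=timezone.utc),  # 10 days old
    datetime(2016, 12, 2, tzinfo=timezone.utc),  # 61 days old
]
print(median_age_days(last_patchsets, now))  # 10.0
```

The median is used rather than the mean so that a few very old, abandoned changesets do not dominate the indicator.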

Open changesets waiting for review

"Waiting for review"

New changesets submitted per month

The "submitted" line

Active Gerrit code review users per month

Other reports

MediaWiki Core code review

Number of MediaWiki Core changesets waiting for review in Gerrit (CR0 or CR+1):
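"Waiting for review (CR0 or CR+1)" means no Code-Review +2, -1 or -2 vote is present. A minimal sketch of counting such changes from Gerrit's REST API follows. It assumes the base URL https://gerrit.wikimedia.org/r and requests label summaries with o=LABELS, where Gerrit reports the voters as approved (+2), recommended (+1), disliked (-1) and rejected (-2). The request is only built, not sent, and the sample response is hypothetical.

```python
import json
from urllib.parse import urlencode

GERRIT = "https://gerrit.wikimedia.org/r"  # assumed base URL


def changes_url(project):
    # o=LABELS asks Gerrit to include per-label summaries in each change
    query = urlencode({"q": f"status:open project:{project}", "o": "LABELS"})
    return f"{GERRIT}/changes/?{query}"


def parse_gerrit_json(body):
    # Gerrit prefixes JSON responses with )]}' to prevent XSSI attacks
    prefix = ")]}'"
    if body.startswith(prefix):
        body = body[len(prefix):]
    return json.loads(body)


def waiting_for_review(changes):
    """Keep open changes whose Code-Review is 0 or +1 (no +2, -1 or -2 vote)."""
    waiting = []
    for change in changes:
        cr = change.get("labels", {}).get("Code-Review", {})
        if not any(key in cr for key in ("approved", "rejected", "disliked")):
            waiting.append(change)
    return waiting


# Hypothetical response body for three open changes
sample = ")]}'\n" + json.dumps([
    {"_number": 1, "labels": {"Code-Review": {}}},
    {"_number": 2, "labels": {"Code-Review": {"recommended": {"_account_id": 7}}}},
    {"_number": 3, "labels": {"Code-Review": {"disliked": {"_account_id": 8}}}},
])
print([c["_number"] for c in waiting_for_review(parse_gerrit_json(sample))])  # [1, 2]
```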

Age in days of open MediaWiki Core patchsets:

Active users in Phabricator

Monthly active users in Bugzilla (from 2013-02 to 2014-10) and in Phabricator (from 2014-12 to last month).

New accounts in Phabricator

New Phabricator accounts created every month. About 4003 users registered in Wikimedia Phabricator between its creation in September 2014 and October 2015.

wikimedia.biterg.io

wikimedia.biterg.io is the Wikimedia Tech community metrics dashboard. It replaced korma.wmflabs.org, which was used until 2016. Data sources include Git and Gerrit repositories, Phabricator's Maniphest (with only basic support as of February 2017), mediawiki.org, some mailing lists, and some IRC channels. Support for Phabricator's Differential is planned. The data is refreshed regularly.

Bugs and feature requests about wikimedia.biterg.io can be reported in Wikimedia Phabricator's Analytics-Tech-community-metrics project.

wikimedia.biterg.io offers:

  • Drill down: clicking an element applies a filtered view
  • Time frame selection
  • Sharing URLs for certain views or embedding views
  • Exporting data
  • API access via the Elasticsearch API
  • The ability for Wikimedia administrators to create widgets and panels themselves
  • An advanced filter search box
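API access goes through Elasticsearch's standard _search endpoint. The sketch below builds, but does not send, such a request: a Lucene query_string (the same syntax as the Advanced filter field) plus a date range (the equivalent of the Time filter). The endpoint URL, the index name "git" and the date field "grimoire_creation_date" are assumptions for illustration; check the real names with an administrator.

```python
import json
from urllib.request import Request

# Hypothetical values: the real endpoint, index names and credentials are not documented here
ES_URL = "https://es.example.org"
INDEX = "git"


def search_request(query, since, until, size=10):
    """Build (but do not send) an Elasticsearch _search request."""
    body = {
        "size": size,
        "query": {
            "bool": {
                # Free-text / field:value query, as typed in the Advanced filter field
                "must": [{"query_string": {"query": query}}],
                # Time span, as chosen in the Time filter (field name assumed)
                "filter": [
                    {"range": {"grimoire_creation_date": {"gte": since, "lte": until}}}
                ],
            }
        },
    }
    return Request(
        f"{ES_URL}/{INDEX}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = search_request("author_org_name:Independent", "2017-01-01", "2017-01-31")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would send it; Elasticsearch returns the matching documents under hits.hits in the response JSON.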

User interface

Screenshot

The top bar lists Dashboards (also called Panels). By default the Overview is chosen. Each dashboard offers numerous widgets, and a result list at the bottom of the page (commits in Git, emails in mailing lists, etc.).

The interactive Widgets at the bottom display the actual data. Some panels support clicking displayed items to get more specific information about those items and some panels also allow downloading and exporting the displayed data as CSV or JSON.

You can share URLs of dashboards with applied filters by selecting the Share icon to the right of the Advanced filter field.

Applying filters

In the right corner of the top bar, the Time filter allows adjusting the time span of all the data being displayed in the widgets.

Some widgets allow creating filters: the mouse pointer turns into a plus symbol when hovering over a listed item, and clicking the item applies an additional filter for that item. A filter created in Kibana is displayed in green below the Advanced filter text field and applied to the view. In the screenshot above, 'Bots' and 'Empty commits' are excluded from the data displayed in the panels. When hovering over a filter, you can enable/disable it, pin/unpin it (a pinned filter is still applied when you open that page again), invert it (e.g. to list all companies except one), remove it, or edit it (e.g. to change the organization name). The "Actions" menu to the right of the filters applies the same actions to all filters at once. For more information, see Discover Filters.
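Conceptually, each active filter becomes a clause of an Elasticsearch bool query: normal filters must match, inverted filters must not match, and disabled filters are skipped. The sketch below approximates that translation; the exact clauses Kibana generates differ between versions, and the field names ("author_bot", "files", "author_org_name") are hypothetical stand-ins for the 'Bots' and 'Empty commits' filters in the screenshot.

```python
from dataclasses import dataclass


@dataclass
class Filter:
    field: str
    value: object
    negate: bool = False    # corresponds to the "invert" action
    disabled: bool = False  # corresponds to the "disable" action


def filters_to_query(filters):
    """Combine filters into one Elasticsearch bool query (simplified)."""
    must, must_not = [], []
    for f in filters:
        if f.disabled:
            continue  # disabled filters stay listed but are not applied
        clause = {"match_phrase": {f.field: f.value}}
        (must_not if f.negate else must).append(clause)
    return {"bool": {"must": must, "must_not": must_not}}


# Hypothetical field names: exclude bots and empty commits, keep one filter disabled
query = filters_to_query([
    Filter("author_bot", True, negate=True),
    Filter("files", 0, negate=True),
    Filter("author_org_name", "Independent", disabled=True),
])
print(query)
```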

The Advanced filter text field allows searching for text in any item (commit messages, user names, repository names, etc.). It queries the subset of results already restricted by the time filter and any applied filters. The default query is *, which matches any free text in any database column; entering * again resets a search. A query basically consists of a field name and its value, optionally combined with operators, such as author_org_name:Independent AND author_bot:TRUE. The query syntax is based on the Lucene query syntax. Also see Kibana Queries and Filters for more information.
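Values that contain Lucene special characters (such as the / in a repository name) or spaces must be escaped or quoted, or the query will not mean what you expect. The helper below is a minimal sketch that builds field:value queries joined with AND; it is not part of the dashboard, and the field names in the example are taken from the query above.

```python
# Characters with special meaning in the Lucene query syntax
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(value):
    """Escape a value so that it is matched literally in a Lucene query."""
    if any(ch.isspace() for ch in value):
        # Values with spaces become phrases; inside quotes only \ and " need escaping
        inner = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'"{inner}"'
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in value)


def field_query(**terms):
    """Join field:value pairs with AND, for use in the Advanced filter field."""
    return " AND ".join(f"{field}:{escape_term(str(value))}" for field, value in terms.items())


print(field_query(author_org_name="Independent", author_bot="TRUE"))
# author_org_name:Independent AND author_bot:TRUE
```

Remember that queries are case sensitive, so the helper deliberately leaves the case of values untouched.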

As of February 2017 the list of available fields (database columns) is only available to administrators and no auto-complete suggestions are offered (this problem could be worked around by creating a panel that lists the names of the available columns, via "Discover"). To perform advanced search queries, you need to know the names of the available fields.

Some more notes on advanced filters:

  • The type of field (string, number, date, etc.) influences the query syntax
  • Queries are case sensitive
  • You can only create queries that use fields within the index used by a panel (simplified: an "index" in Elasticsearch is roughly a database); otherwise the search returns "No results found".
  • Fields that are not available in an index default to -1 for numbers and na for strings
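When processing exported or API data, these placeholder values are easy to mistake for real ones (a -1 would skew an average, for example). A minimal sketch that converts them to Python's None, assuming -1 and "na" never occur as genuine values in the fields you process:

```python
def clean_missing(doc):
    """Replace the -1 / "na" placeholders with None.

    Assumes -1 and "na" are never real values in the fields being processed.
    """
    cleaned = {}
    for key, value in doc.items():
        if isinstance(value, bool):
            cleaned[key] = value  # booleans are ints in Python; keep them untouched
        elif isinstance(value, (int, float)) and value == -1:
            cleaned[key] = None
        elif value == "na":
            cleaned[key] = None
        else:
            cleaned[key] = value
    return cleaned


print(clean_missing({"files": -1, "author_org_name": "na", "lines_added": 3}))
```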

Configuration

Files in https://github.com/Bitergia/mediawiki-repositories define which mailing lists and IRC channels get indexed. They also define which Git and Gerrit repositories get blacklisted (ignored), for example when they are only imported from upstream sources, so that activity that did not happen in our community is not shown. Note: As of February 2017, blacklisting is not yet working properly.
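The idea behind the blacklist is a simple exclusion step before indexing. The sketch below illustrates it with shell-style wildcard patterns; the actual file format and matching rules used in mediawiki-repositories are not described on this page, so the patterns and repository names are hypothetical.

```python
from fnmatch import fnmatchcase


def filter_repositories(repositories, blacklist):
    """Drop repositories that match any blacklist pattern (shell-style wildcards)."""
    return [
        repo for repo in repositories
        if not any(fnmatchcase(repo, pattern) for pattern in blacklist)
    ]


# Hypothetical repository names and patterns
repos = ["mediawiki/core", "mediawiki/extensions/Foo", "upstream-mirror/bar"]
blacklist = ["upstream-mirror/*"]
print(filter_repositories(repos, blacklist))
# ['mediawiki/core', 'mediawiki/extensions/Foo']
```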

wikimedia.biterg.io is based on Kibana dashboards and Elasticsearch. The database provides indexes whose fields are used in panels, widgets and searches.

Architecture and source code

Details on the underlying software architecture can be found on grimoirelab.github.io. A comprehensive GrimoireLab Training Tutorial is available.

Source code is available. Most code is written in Python. The existing repositories are:

  • perceval: Data retrieval platform which creates JSON files. perceval/backends contains the available backends. Data is stored in Elasticsearch.
  • arthur: Commander tool to run perceval and set up the panels.
  • kibiter: Visualization on top of Elasticsearch. A fork of Kibana which contains changes until they get merged into the upstream code base.
  • panels: Numerous JSON files. Contains all of the panels currently available for the current architecture.
  • GrimoireELK: An incubator for new ideas.
  • SortingHat: Command line interface to manage the data in our database. For admins, a complete database dump is available as a JSON file which allows manual account merging, updating affiliations, adding country information or marking an account as a bot. See its upstream documentation for more details.
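Account merging means recognizing that several accounts (different emails, usernames) belong to one person. The sketch below shows the general idea with a union-find structure that merges identities sharing an email address (case-insensitive) or a username. It is not SortingHat's API, and SortingHat's real matching rules are configurable and may differ; the sample identities are hypothetical.

```python
def merge_identities(identities):
    """Group identity dicts into individuals that share an email or username."""
    parent = list(range(len(identities)))

    def find(i):
        # Follow parents to the group's root, halving paths along the way
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    seen = {}  # (key, normalized value) -> index of first identity with it
    for i, ident in enumerate(identities):
        for key in ("email", "username"):
            value = ident.get(key)
            if not value:
                continue
            token = (key, value.lower())
            if token in seen:
                parent[find(i)] = find(seen[token])
            else:
                seen[token] = i

    groups = {}
    for i, ident in enumerate(identities):
        groups.setdefault(find(i), []).append(ident)
    return list(groups.values())


# Hypothetical accounts: the first three belong to the same person
ids = [
    {"email": "jdoe@example.org", "username": "jdoe"},
    {"email": "JDoe@example.org", "username": None},
    {"email": "john@example.com", "username": "jdoe"},
    {"email": "other@example.net", "username": "other"},
]
print([len(group) for group in merge_identities(ids)])  # [3, 1]
```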

The steps performed are basically: Sources → Data gathering (mining via Perceval) → Data enrichment (e.g. producing indexes in Elasticsearch via GrimoireELK) → Visualization (Elasticsearch and Kibana).
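The enrichment and visualization steps can be pictured in miniature as below. The raw item is only loosely modeled on the shape of Perceval's Git output (an envelope with uuid and origin around a data dict holding commit, Author, message and files); the enriched field names are a simplified assumption, not GrimoireELK's actual schema. Indexing into Elasticsearch is omitted, and the "visualization" is reduced to the aggregation a commits-by-author widget would show.

```python
import re
from collections import Counter

# "Name <email>" as found in the Author field of raw Git items
AUTHOR_RE = re.compile(r"^(?P<name>.*?)\s*<(?P<email>[^>]*)>$")


def enrich_commit(item):
    """Enrichment step: flatten one raw Git item into a document for indexing."""
    data = item["data"]
    match = AUTHOR_RE.match(data["Author"])
    return {
        "uuid": item["uuid"],
        "origin": item["origin"],
        "hash": data["commit"],
        "author_name": match.group("name") if match else data["Author"],
        "author_email": match.group("email") if match else None,
        "files": len(data.get("files", [])),
        "message": data["message"],
    }


def commits_per_author(docs):
    """Visualization step in miniature: what a 'commits by author' widget aggregates."""
    return Counter(doc["author_name"] for doc in docs)


# Hypothetical raw item, shaped loosely like Perceval's Git output
raw = {
    "uuid": "0001",
    "origin": "https://example.org/repo.git",
    "data": {
        "commit": "abc123",
        "Author": "Jane Doe <jane@example.org>",
        "message": "Fix typo",
        "files": [{"file": "README"}, {"file": "setup.py"}],
    },
}
docs = [enrich_commit(raw)]
print(commits_per_author(docs))  # Counter({'Jane Doe': 1})
```

Keeping enrichment separate from gathering means the raw data only has to be mined once, while the enriched indexes can be rebuilt whenever new fields are needed.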

For administrators

Analyzing specific data

Under Discover, choose a database from the dropdown in the left panel, then expand the time span. Results are displayed as a list of expandable entries; opening an entry displays all of its fields and their values as a table or as JSON. A Kibana/Elasticsearch visualization based on the JSON data is displayed on top. Specific fields can be shown as columns in the results by adding or removing them in the left panel. The data is basically a huge matrix, and more fields (e.g. "Gender") could be added in the future if needed.
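Choosing columns in Discover amounts to projecting selected fields out of each document in that matrix. The sketch below does the same for documents you have already retrieved (for example through the API) and writes them as CSV; the field names are hypothetical, and missing values are written as na to match the dashboard's convention for strings.

```python
import csv
import io


def to_csv(docs, columns):
    """Project the chosen fields out of each document and render them as CSV."""
    buf = io.StringIO()
    # extrasaction="ignore" drops unselected fields; restval fills missing ones
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore", restval="na")
    writer.writeheader()
    writer.writerows(docs)
    return buf.getvalue()


# Hypothetical documents with more fields than we want to show
docs = [
    {"author_name": "A", "author_org_name": "Independent", "hash": "abc123"},
    {"author_name": "B"},
]
print(to_csv(docs, ["author_name", "author_org_name"]))
```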

Creating visualizations and dashboards

This section is a work in progress.

Further links

If you would like to see specific customizations, please file a request in Wikimedia Phabricator including a user story.

Other data sources and tools

Git

Gerrit

Phabricator

  • "Phabricator monthly statistics" emails on the wikitech-l mailing list - see its archives.


mediawiki.org

Mailman

Team

Quim Gil and Andre Klapper from the Wikimedia Engineering Community team are coordinating the Metrics Dashboard project, which is being implemented by Bitergia as contractors.

The Bitergia team working on the MediaWiki dashboard consists of Daniel Izquierdo, Luis Cañas and Jesus Gonzalez Barahona, with Alvaro del Castillo as project manager.

The ownership of this project might be transferred to the Wikimedia Analytics team at some point.

See also