Community metrics

From MediaWiki.org
(Redirected from Community Metrics)
Jump to: navigation, search

How is the MediaWiki / Wikimedia tech community doing? Let's analyze the data available in order to highlight the contributors and areas setting an example, and also the bottlenecks or inactive corners requiring our attention.

Your feedback and requests are welcome in Bugzilla (product Analytics, component Tech community metrics). You can also comment at the discussion page and at the Analytics mailing list.

Reports[edit | edit source]

We aim to publish reports interpreting the data obtained on a quarterly basis. Below you can find the initial reports published, based on data retrieved manually.

Metrics dashboard[edit | edit source]

Under development since June 2013:

Updated daily (check), this dashboard provides data about our Git repositories, Bugzilla, Mailing Lists, Gerrit and IRC. Below you can find the details about what sources are being scanned.

We are also polishing the data (finding duplicates, assigning contributors to the WMF and other organizations...). If you see any mistake or possibility of improvement please report it.

Powered by Open Source projects Metrics Grimoire and Viz Grimoire. See also the development specific to this dashboard in GitHub. Bugs, enhancement requests and patches for these projects must be submitted directly upstream.

Sources[edit | edit source]

Work in progress. The list here includes the sources currently scanned and the ones planned to be added in the short term.

Git[edit | edit source]

  • The source code repos analyzed are mediawiki/core and all the mediawiki extensions:
    • FIXME: This is only a portion (a big one, yes) of all the repositories we need to scan. The default is everything at gerrit.wikimedia.org but let's look at every repo before adding it just in case.
ssh -p 29418 gerrit.wikimedia.org gerrit ls-projects | grep "mediawiki/extensions

gerrit.wikimedia.org[edit | edit source]

  • The source code repos analyzed are mediawiki/core and all the mediawiki extensions

bugzilla.wikimedia.org[edit | edit source]

mediawiki.org[edit | edit source]

lists.wikimedia.org[edit | edit source]

  • FIXME - mailing lists missing: mediawiki-l, ee, qa... more?
  • FIXME - Is it possible to specify the number of subscribers?

IRC[edit | edit source]

  • Pending to define the channels to be added to the dashboard.

Contributors[edit | edit source]

The process to merge users identities from different data sources has three steps:

At the end there is a common upeople (unique people) table for all data sources and all data sources map its people table to this common upeople table. Gerrit (SCR) and IRC are not yet supported.

The user pages for Top contributors are linked in the top tables in the metrics browser. For example for SCM the third global committer has his own personal page.

Once unique people exists, other categories are created using it. For example, companies classification is done initially with a script that uses email domains if available. The classification supports periods of time to cover that a unique people has worked for several companies. There is some experimental support also for countries.

  • FIXME - Contributors must be linked to WMF and other orgs.
  • FIXME - Is By country relevant? Do we want to gather that data?
  • FIXME - Plan for linking this data to user profiles? Where?

Other data sources and tools[edit | edit source]

Git

Gerrit

Bugzilla

mediawiki.org

Mailman

Key performance indicators[edit | edit source]

Key factors to watch, in the scope of projects deployed in Wikimedia servers:

  • Are the teams more efficient processing contributions?
  • Is the share of non-WMF contributions growing?
  • Are WMF and non-WMF contributions treated equally?
  • Are the attraction and retention of new contributors improving?
  • Are we improving the sustainability of our community?

All this indicators are computed using the databases updated daily.

Who contributes code[edit | edit source]

Who is contributing merged code each quarter? How is the weight of the WMF evolving? What regions have a higher density of contributors? The evolution of the total amount of merged commits should be visible too.

  • Number of developers and commits by organization: Wikimedia (WMF, WMDE...), known companies, OSS projects (if relevant) and independents.
  • Number of developers and commits by country, based on the data provided.

Two charts? What type?

Queries[edit | edit source]

Reviews Database

  • Basic:

SELECT COUNT(issues.id) AS total, people.id, name, email, user_id 

FROM issues, people

WHERE issues.submitted_by=people.id and status='merged' 

GROUP BY people.id ORDER BY total DESC

  • Email domains:

SELECT DISTINCT(SUBSTRING(email,LOCATE('@',email))) AS domain, SUM(t.total) as dtotal 

FROM people, (SELECT COUNT(issues.id) as total, people.id

FROM issues, people

WHERE issues.submitted_by=people.id AND status='merged'

GROUP BY people.id) t 

WHERE people.id=t.id

GROUP BY domain ORDER BY dtotal DESC

Metrics[edit | edit source]

  • DB updated on 2013-08-22
  • Total revisions: 56127
  • Total abandoned: 3015
  • Total contributors with merged code: 319
  • Total contributors with abandoned code: 263
  • Total organizations with merged code: 108
  • Total organizations with abandoned code: 82
  • Number of revisions merged per contributor
  • Number of revisions abandoned per contributor
  • Number of revisions merged per organization
  • Number of revisions abandoned per organization
  • Ratios merged/abandoned
  • Tops

Analysis[edit | edit source]

  • Top 10 for contributors: 7 WMF, 3 WMDE.
  • Only one organization has more than 500 merged: Wikimedia (@wikimedia.org + @wikimedia.de): 13593 + 2132
  • Only one organization has more than 300 abandoned: Wikimedia.

Gerrit review queue[edit | edit source]

Changes dates from gerrit ssh API are wrong until 2013-05 so time to review is only available after that. Time zones are not covered yet.

How long is the Gerrit review queue over time? How long does it take to review code contributions? Are we improving? Are Wikimedia's staff and non-staff contributions processed equally? Who is doing better and worse?

  • Number of Wikimedia staff / non-Wikimedia-staff commits reviewed in <2 days, <1 week, <1 month, <3 months, >3 months or unreviewed (values may be fine tuned based on actual data).
  • Same as above, per project. Ranked from slowest to fastest.

How to calculate the average time? A few old open changes might distort the picture. What if we compare the pure average of the 100% of commits with the average removing the 10% of slowest and fastest?

One chart with extra data in a table? What kind of chart?

The people and companies and countries analysis will be trivial once we get the size and time to review basic queries working.

Queries[edit | edit source]

  • Total new reviews:
SELECT COUNT(*) FROM issues WHERE status='NEW';
  • Distribution in time for open reviews
SELECT COUNT(*),YEAR(submitted_on),MONTH(submitted_on) 
FROM issues 
WHERE status='NEW'
GROUP BY YEAR(submitted_on),MONTH(submitted_on) ORDER BY submitted_on;
  • Time evolution for merged reviews
SELECT count(*) as merged,YEAR(mod_date) AS year, MONTH(mod_date) AS month
FROM issues,issues_ext_gerrit WHERE issues.id=issues_ext_gerrit.issue_id 
AND status='merged' 
GROUP BY YEAR(mod_date),MONTH(mod_date) ORDER BY mod_date
  • Time evolution for abandoned reviews
SELECT count(*) as merged,YEAR(mod_date) AS year, MONTH(mod_date) AS month
FROM issues,issues_ext_gerrit WHERE issues.id=issues_ext_gerrit.issue_id 
AND status='abandoned' 
GROUP BY YEAR(mod_date),MONTH(mod_date) ORDER BY mod_date
  • Time evolution for all reviews SUM (NEW + MERGED month)
SELECT merged+abandoned+new, t.year, t.month FROM
( SELECT count(*) as merged,YEAR(mod_date) AS year, MONTH(mod_date) AS month
  FROM issues,issues_ext_gerrit WHERE issues.id=issues_ext_gerrit.issue_id 
  AND (status='merged' OR status='abandoned'
  GROUP BY YEAR(mod_date),MONTH(mod_date) ORDER BY mod_date) t,
( SELECT COUNT(*) as new,YEAR(submitted_on) as year,MONTH(submitted_on) as month
  FROM issues 
  WHERE status='NEW'
  GROUP BY YEAR(submitted_on),MONTH(submitted_on) ORDER BY submitted_on) t1
WHERE t.year = t1.year and t.month = t1.month;
  • Time to review basic in days
SELECT issues.id, DATEDIFF(max(changed_on),submitted_on) AS revtime, max(changed_on) as date, submitted_by, email 
FROM issues, changes, issues_ext_gerrit, people 
WHERE changed_on>'2013-05' AND issues_ext_gerrit.issue_id = issues.id 
  AND issues.id = changes.issue_id AND issues.submitted_by=people.id
  AND field='Code-Review' and new_value='2' 
GROUP BY changes.issue_id order by revtime desc limit 10
  • Time to review evolution in time in days
SELECT SUM(revtime)/COUNT(revtime), YEAR(date), MONTH(date) 
FROM (
  SELECT issues.id, DATEDIFF(max(changed_on),submitted_on) AS revtime, max(changed_on) as date, submitted_by, email 
  FROM issues, changes, issues_ext_gerrit, people 
  WHERE changed_on>'2013-05' AND issues_ext_gerrit.issue_id = issues.id 
    AND issues.id = changes.issue_id AND issues.submitted_by=people.id
    AND field='Code-Review' and new_value='2'  
GROUP BY changes.issue_id order by revtime desc) t
GROUP BY YEAR(date),MONTH(date) ORDER BY date;

Metrics[edit | edit source]

  • Total current queue size: 1033
  • Evolution in time queue size (for the current pending reviews), new reviews:
|      NEW |               YEAR |               MONTH |
|        1 |               2012 |                   3 |
|        7 |               2012 |                   4 |
|       11 |               2012 |                   5 |
|       10 |               2012 |                   6 |
|        5 |               2012 |                   7 |
|       11 |               2012 |                   8 |
|       15 |               2012 |                   9 |
|       20 |               2012 |                  10 |
|       21 |               2012 |                  11 |
|       46 |               2012 |                  12 |
|       28 |               2013 |                   1 |
|       59 |               2013 |                   2 |
|       73 |               2013 |                   3 |
|       55 |               2013 |                   4 |
|      145 |               2013 |                   5 |
|      103 |               2013 |                   6 |
|      182 |               2013 |                   7 |
|      241 |               2013 |                   8 |
  • Evolution in time for merged issues
| MERGED | YEAR | MONTH |
|     15 | 2012 |     2 |
|    404 | 2012 |     3 |
|   1399 | 2012 |     4 |
|   2397 | 2012 |     5 |
|   2828 | 2012 |     6 |
|   2468 | 2012 |     7 |
|   4329 | 2012 |     8 |
|   2784 | 2012 |     9 |
|   4283 | 2012 |    10 |
|   3903 | 2012 |    11 |
|   4311 | 2012 |    12 |
|   3752 | 2013 |     1 |
|   3412 | 2013 |     2 |
|   3645 | 2013 |     3 |
|   3216 | 2013 |     4 |
|   3168 | 2013 |     5 |
|   3654 | 2013 |     6 |
|   3958 | 2013 |     7 |
|   2201 | 2013 |     8 |
  • Evolution in time for abandoned issues
|ABANDONED | YEAR | MONTH |
|       20 | 2012 |     2 |
|      101 | 2012 |     3 |
|      132 | 2012 |     4 |
|       98 | 2012 |     5 |
|      171 | 2012 |     6 |
|      111 | 2012 |     7 |
|      148 | 2012 |     8 |
|      136 | 2012 |     9 |
|      172 | 2012 |    10 |
|      221 | 2012 |    11 |
|      162 | 2012 |    12 |
|      193 | 2013 |     1 |
|      164 | 2013 |     2 |
|      337 | 2013 |     3 |
|      201 | 2013 |     4 |
|      175 | 2013 |     5 |
|      207 | 2013 |     6 |
|      196 | 2013 |     7 |
|       70 | 2013 |     8 |


  • Evolution in time of queue size for all issues (merged+abandoned+new):
|      TOTAL | YEAR | MONTH |
|        506 | 2012 |     3 |
|       1538 | 2012 |     4 |
|       2506 | 2012 |     5 |
|       3009 | 2012 |     6 |
|       2584 | 2012 |     7 |
|       4488 | 2012 |     8 |
|       2935 | 2012 |     9 |
|       4475 | 2012 |    10 |
|       4145 | 2012 |    11 |
|       4519 | 2012 |    12 |
|       3973 | 2013 |     1 |
|       3635 | 2013 |     2 |
|       4055 | 2013 |     3 |
|       3472 | 2013 |     4 |
|       3488 | 2013 |     5 |
|       3964 | 2013 |     6 |
|       4336 | 2013 |     7 |
|       2512 | 2013 |     8 |


  • Time to review for the Top 10 slowest reviews
|   id  | revtime | date                | submitted_by | email                        |
|   146 |     437 | 2013-08-16 07:46:28 |           11 | liangent@g           |
|   208 |     378 | 2013-08-13 03:39:28 |           59 | hashar@f               |
| 13796 |     317 | 2013-07-22 19:41:39 |          208 | jan@j           |
|   323 |     307 | 2013-08-02 19:21:13 |           56 | daniel@n   |
| 17642 |     300 | 2013-08-02 19:17:39 |           90 | chughakshay16@g      |
| 17641 |     300 | 2013-08-02 19:18:08 |           90 | chughakshay16@g      |
|    23 |     295 | 2013-08-21 11:12:16 |           30 | amir.aharoni@m |
| 49272 |     284 | 2013-07-10 23:09:57 |          203 | toniher@c              |
|   920 |     278 | 2013-07-01 23:09:47 |          105 | helder.wiki@g        |
| 20147 |     276 | 2013-07-01 17:04:39 |           75 | vitalif@y           |
  • Evolution in time to review in days:
| SUM(revtime)/COUNT(revtime) | YEAR(date) | MONTH(date) |
|                      2.8988 |       2013 |           5 |
|                      2.5753 |       2013 |           6 |
|                      2.7894 |       2013 |           7 |
|                      4.8007 |       2013 |           8 |

Analysis[edit | edit source]

  • Total open reviews is growing faster in Jul and Aug 2013.
  • The time to review has grown in August
  • Merged issues has declined in August.
  • The rhythm of merged issues changes between months at the start. Now (2013-08) is more stable.
  • High abandoned rate in 2013-03: 337

Issues[edit | edit source]

  • Time to review is only computed once changes dates are correct: after 2013-05

Code contributors new / gone[edit | edit source]

Who are the new code contributors (commits + reviews)? Are they increasing their involvement? Who seems to be on a way out or gone? How are our contributor intake & loss evolving?

  • Number of new contributors with 1 / 2-5 / 6+ changes submitted in the past 3 months (values may be fine tuned based on actual data).
  • (How to register increasing engagement versus one-offs or new contributors disengaging and vanishing after a short period?)
  • Number of contributors stopping contributing or decreasing continuously in the past 3 months.

Two charts? Which kind of charts?

Queries[edit | edit source]

  • New (min(submitted_on)) code contributors for last 3 months with more than 5 contributions:
SELECT id, email, total, age 
FROM 
( SELECT people.id, COUNT(*) AS total, DATEDIFF(NOW(),min(submitted_on)) AS age, email 
  FROM issues, people 
  WHERE issues.submitted_by=people.id 
  GROUP BY people.id) a 
WHERE age <= 90 and total>5 
ORDER BY AGE
  • New (min(submitted_on)) code contributors for last 3 months with more than 5 contributions MERGED:
SELECT id, email, total, age 
FROM 
( SELECT people.id, COUNT(*) AS total, DATEDIFF(NOW(),min(submitted_on)) AS age, email 
  FROM issues, people 
  WHERE issues.submitted_by=people.id AND status='merged'
  GROUP BY people.id) a 
WHERE age <= 90 and total>5 
ORDER BY AGE
  • New (min(submitted_on)) code contributors for last 3 months with more than 5 contributions ABANDONED:
SELECT id, email, total, age 
FROM 
( SELECT people.id, COUNT(*) AS total, DATEDIFF(NOW(),min(submitted_on)) AS age, email 
  FROM issues, people 
  WHERE issues.submitted_by=people.id AND status='abandoned'
  GROUP BY people.id) a 
WHERE age <= 90 and total>5 
ORDER BY AGE


  • Gone code contributors, last contributions more than 6 months ago.
SELECT email,age FROM 
( SELECT people.id, COUNT(*) AS total, DATEDIFF(NOW(),max(submitted_on)) AS age, email 
  FROM issues, people
  WHERE issues.submitted_by=people.id
  GROUP BY people.id) t 
WHERE age > 180 order by age
  • Total gone code contributors, last contributions more than 6 months ago.
SELECT COUNT(email), age FROM 
( SELECT people.id, COUNT(*) AS total, DATEDIFF(NOW(),max(submitted_on)) AS age, email 
  FROM issues, people
  WHERE issues.submitted_by=people.id
  GROUP BY people.id) t 
WHERE age > 180 order by age
  • Evolution in time of age of gone code contributors:
SELECT COUNT(email) as total, YEAR(last_contrib), MONTH(last_contrib) FROM
( SELECT email, age, last_contrib FROM 
  ( SELECT people.id, COUNT(*) AS total, DATEDIFF(NOW(),max(submitted_on)) AS age, max(submitted_on) AS last_contrib, email 
    FROM issues, people
    WHERE issues.submitted_by=people.id
    GROUP BY people.id) t 
  WHERE age > 180 order by age
) t1
GROUP BY YEAR(last_contrib), MONTH(last_contrib) ORDER BY last_contrib

Metrics[edit | edit source]

  • Total new code contributors: 13
  • Total gone code contributors: 127 (total contributors: 394)
  • New code contributors
| id  | email                     | total | age  |
|  21 | bdavis@w      |     6 |    7 |
| 198 | jack@c |    14 |   31 |
|  20 | rainerrillke@h  |    19 |   40 |
| 156 | kartik.mistry@g   |    10 |   54 |
| 332 | karima.rafes@g    |    15 |   58 |
| 411 | nilesh@n        |    41 |   59 |
| 409 | simon.eu@g        |     7 |   59 |
| 360 | abreault@w    |    20 |   65 |
|  69 | neverett@w    |    46 |   66 |
|  97 | ebrahim@b       |     9 |   66 |
| 330 | eu.vlasenko@g     |     6 |   76 |
| 313 | sam@s       |     7 |   87 |
| 402 | david@s   |     6 |   88 |
13 rows in set (0.18 sec)
  • New code contributors merged
| id  | email                        | total | age  |
|  21 | bdavis@w         |     6 |    7 |
| 198 | jack@c    |    12 |   31 |
|  20 | rainerrillke@h     |    13 |   40 |
| 411 | nilesh@n           |    31 |   54 |
| 156 | kartik.mistry@g      |    10 |   54 |
| 332 | karima.rafes@g       |    14 |   58 |
| 360 | abreault@w       |    18 |   64 |
|  69 | neverett@w       |    38 |   66 |
|  97 | ebrahim@b          |     8 |   66 |
|  87 | yuvipanda+suchabot@g |    10 |   66 |
10 rows in set (0.17 sec)
  • New code contributors abandoned
| id  | email                      | total | age  |
|  82 | addshore@w     |    15 |   54 |
| 314 | joel.natividad@o |     6 |   69 |
2 rows in set (0.05 sec)
  • Evolution of last contribution of gone code contributors:
| total | YEAR(last_contrib) | MONTH(last_contrib) |
|     1 |               2012 |                   2 |
|     2 |               2012 |                   3 |
|     7 |               2012 |                   4 |
|     3 |               2012 |                   5 |
|    22 |               2012 |                   6 |
|    13 |               2012 |                   7 |
|     7 |               2012 |                   8 |
|     8 |               2012 |                   9 |
|    11 |               2012 |                  10 |
|    13 |               2012 |                  11 |
|    15 |               2012 |                  12 |
|     8 |               2013 |                   1 |
|    17 |               2013 |                   2 |

17 people sent their last contribution on 2013-02 and do not contribute more with code submissions.

Analysis[edit | edit source]

  • 3 new people from WMF, 7 from other email domains

Comments[edit | edit source]

The analysis could be done also for organization or country, not just people.

Evolution of new comers in time could be really cool.

Bugzilla response time[edit | edit source]

How long does it take to give a first response to reporters of Low-Immediate / trivial-blocker bugs? How long until acknowledged bugs are resolved? Are we using the importance parameters consistently? Are we improving?

  • Average time for an accepted bug report between bug creation date and PATCH_TO_REVIEW status being set
  • Average time for an accepted bug report between PATCH_TO_REVIEW status being set and RESOLVED FIXED status being set.
  • Average time for an accepted bug report between bug creation date and first comment by not the reporter her/himself.

Same question as in Gerrit review queue about how to calculate average times.

Three charts? What chart types?

Related bug report.

Top contributors[edit | edit source]

Wikimedia professionals apart, who are the top tech community contributors, what are their areas of activity and where are they based? Let's list everybody, not just the top 10. This will help the WMF and the Wikimedia movement knowing and supporting these contributors better.

  • Combined ranking of contributors of Git/Gerrit, Bugzilla, MediaWiki, Mailman, IRC. We'll need to find the formula.
  • Rankings for each channel.

Tables are good. No need for charts?

Team[edit | edit source]

Quim Gil from the Wikimedia Engineering Community team is coordinating the Metrics Dasboard project, which is being implemented by Bitergia as contractors.

The Bitergia team working in the MediaWiki dashboard is formed by Daniel Izquierdo, Luis Cañas and Jesus Gonzalez Barahona and Alvaro del Castillo as project manager.

The ownership of this project will transition to the Wikimedia Analytics team during 2013-14.

See also[edit | edit source]