Talk:Community metrics/Archive 1

Fishing expeditions
To avoid problems associated with data analysis, you may want to describe your problem statements/hypotheses first. - Amgine (talk) 16:07, 2 October 2012 (UTC)
 * Very good point! A first stab at the fish: User:Qgil/MediaWiki Community Metrics.--Qgil (talk) 16:34, 2 October 2012 (UTC)

Drafting Bugzilla metrics
The agreed entries have been moved to Community_metrics/Master_report. Thank you for the discussion!

Below you can find the hard bones that are left.

People

 * Active in the past 12 months. (likely needs a SQL query -- User:Malyacko)
 * New accounts in the last month. (needs a SQL query -- User:Malyacko)
 * Of these, how many filed a bug or a comment.
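The three "People" counts can be pinned down roughly as follows. This is a minimal sketch over hypothetical, simplified records (`Account`, `Contribution` and `people_metrics` are illustrative names). Real data would come from the SQL queries mentioned above, and Bugzilla's actual schema is not modelled here. "Past 12 months" and "last month" are approximated as 365 and 30 days.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical, simplified records; in practice they would be filled from
# SQL queries against the Bugzilla database (schema not modelled here).
@dataclass
class Account:
    login: str
    created: datetime

@dataclass
class Contribution:  # one bug filed or one comment posted
    who: str
    when: datetime

def people_metrics(accounts, contributions, now):
    year_ago = now - timedelta(days=365)   # "past 12 months", approximated
    month_ago = now - timedelta(days=30)   # "last month", approximated
    active = {c.who for c in contributions if c.when >= year_ago}
    new_accounts = {a.login for a in accounts if a.created >= month_ago}
    contributors = {c.who for c in contributions}
    return {
        "active_last_12_months": len(active),
        "new_accounts_last_month": len(new_accounts),
        "new_accounts_that_filed": len(new_accounts & contributors),
    }

# Made-up example data.
now = datetime(2012, 12, 1)
accounts = [Account("a", datetime(2012, 11, 20)), Account("b", datetime(2012, 11, 25)),
            Account("c", datetime(2010, 1, 1))]
contributions = [Contribution("a", datetime(2012, 11, 21)),
                 Contribution("c", datetime(2012, 6, 1)),
                 Contribution("d", datetime(2011, 1, 1))]
print(people_metrics(accounts, contributions, now))
```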

Activity

 * Average time to resolution (FIXED, INVALID, etc.).
 * In the past year.
 * In the past month.


 * New comments (total, last year, last month).
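Here is one possible reading of the "Activity" items, as a sketch with assumptions. Each bug is a hypothetical `(created, resolved)` pair. A bug is counted in a window when it was *resolved* inside that window, whatever the resolution. Reopened bugs are ignored. The function names are illustrative only.

```python
from datetime import datetime, timedelta
from statistics import mean

def avg_days_to_resolution(bugs, now, window_days):
    """bugs: (created, resolved) datetime pairs; resolved is None while open.
    Averages over bugs resolved within the last window_days (any resolution)."""
    start = now - timedelta(days=window_days)
    days = [(resolved - created).total_seconds() / 86400
            for created, resolved in bugs
            if resolved is not None and resolved >= start]
    return mean(days) if days else None

def new_comment_counts(comment_times, now):
    """Total, last-year and last-month comment counts (365 / 30 days assumed)."""
    return {
        "total": len(comment_times),
        "last_year": sum(t >= now - timedelta(days=365) for t in comment_times),
        "last_month": sum(t >= now - timedelta(days=30) for t in comment_times),
    }

# Made-up example data.
now = datetime(2013, 1, 1)
bugs = [(datetime(2012, 12, 1), datetime(2012, 12, 11)),
        (datetime(2012, 1, 1), datetime(2012, 3, 1)),
        (datetime(2012, 12, 20), None)]
comments = [datetime(2012, 12, 25), datetime(2012, 6, 1), datetime(2011, 1, 1)]
print(avg_days_to_resolution(bugs, now, 30), avg_days_to_resolution(bugs, now, 365))
print(new_comment_counts(comments, now))
```

Counting by resolution date answers "how long did the bugs we closed recently take". Counting by creation date would instead describe recently filed bugs, and would leave out most of them because they are still open.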

New bugs

 * Average time to first response to new bugs.
 * In the past year.
 * In the past month.
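"Response" needs a definition before it can be averaged. The sketch below assumes it means the first comment by someone other than the reporter, the same definition used further down in "Response time statistics". Bug records are hypothetical dicts. Bugs that nobody has answered yet are left out, which flatters the average; counting them separately is probably worth it.

```python
from datetime import datetime, timedelta
from statistics import mean

def first_response_hours(bug):
    """Hours from creation to the first comment by someone other than the
    reporter, or None if nobody else has commented yet."""
    replies = [t for author, t in bug["comments"] if author != bug["reporter"]]
    if not replies:
        return None
    return (min(replies) - bug["created"]).total_seconds() / 3600

def avg_first_response_hours(bugs, now, window_days):
    start = now - timedelta(days=window_days)
    hours = []
    for bug in bugs:
        if bug["created"] < start:   # only bugs filed within the window
            continue
        h = first_response_hours(bug)
        if h is not None:            # bugs still waiting for a response are skipped
            hours.append(h)
    return mean(hours) if hours else None

# Made-up example data.
now = datetime(2013, 1, 1)
bugs = [
    {"reporter": "r1", "created": datetime(2012, 12, 10),
     "comments": [("r1", datetime(2012, 12, 10, 1)), ("dev", datetime(2012, 12, 10, 6))]},
    {"reporter": "r2", "created": datetime(2012, 12, 20),
     "comments": [("r2", datetime(2012, 12, 21))]},
    {"reporter": "r3", "created": datetime(2012, 6, 1),
     "comments": [("dev", datetime(2012, 6, 2))]},
]
print(avg_first_response_hours(bugs, now, 30), avg_first_response_hours(bugs, now, 365))
```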

What is the MediaWiki community?
Might be a good idea to define that before you start asking questions about it :-)

Are you talking about the community of MediaWiki developers, MediaWiki reusers (as in: people who are involved in operating a MediaWiki site in some form), MediaWiki users (as in: anyone who is a registered and active user of a MediaWiki wiki), or the community around MediaWiki.org specifically? The proposed metrics seem to concentrate on the first and the last (there is no mention of, e.g., the number of MediaWiki installations worldwide, which is a good measure of how large the wider MediaWiki community is), in which case it might be clearer to call it the Wikimedia tech community. --Tgr (talk) 01:50, 11 November 2012 (UTC)
 * So far, it seems to be simply defined as "all those who care about improving MediaWiki": does it need to be more than this? That means not only using it but also filing and triaging bugs, working on code, and improving documentation on this wiki (are there any metrics for this wiki?). Wikimedians are not the only ones editing this wiki, and the more users we have, the more bug reports will presumably be filed; not to mention corporate users who contribute code, of course. --Nemo 02:02, 11 November 2012 (UTC)


 * The truth is, I'm not sure myself. :) The main goal is to map contributors, contributions, and opportunities to contribute. So far it hasn't been a priority (for me) to find out how many passive MW users are out there, but this doesn't mean it can't be a priority in the future, or a priority for someone else right now. What is clear is that a community of many diverse users of a piece of software is better than a community with, e.g., only one superuser. In that sense, knowing whether the number of MW instances out there is growing would be useful as well. I'm happy to include new metrics in the reports as long as there is a way to retrieve them. PS: about the name, maybe it is a good idea to call it the Wikimedia tech community anyway. Not sure yet, though... --Qgil (talk) 19:27, 26 November 2012 (UTC)

What wiki stats?
It's time to start retrieving statistics related to documentation. What useful stats can we extract from this wiki? Starting with Special:Statistics.--Qgil (talk) 18:13, 4 December 2012 (UTC)
 * would probably give all needed info, but only if 35198 is fixed. Nemo 10:04, 6 December 2012 (UTC)
 * I didn't know about http://stats.wikimedia.org/wikispecial/EN/TablesWikipediaMEDIAWIKI.htm . Thank you!--Qgil (talk) 15:42, 6 December 2012 (UTC)

Bug fixers this month
How useful is this statistic? Most bugs are not assigned to anybody when they're fixed, or the assignee is not the real fixer, or it's impossible to say who the fixer is (for instance, on shell requests it's hard to say who "fixed" the bug; the most important responsibility belongs to the shell user). Maybe it should be renamed "Assignees of bugs closed this month" for clarity. Nemo 10:04, 6 December 2012 (UTC)
 * At least it's useful to show that our bug handling workflow could be improved. When a developer takes on a bug report, he should assign it to himself. This is a good way to avoid having two people working on the same issue. Let's keep it for now; perhaps we will see a trend toward fixing this problem.--Qgil (talk) 15:36, 6 December 2012 (UTC)
 * The statistic doesn't take into account the assignee field, but the user setting a RESOLVED state. --Dereckson (talk) 16:12, 6 December 2012 (UTC)
 * Addendum. This is only true for the weekly statistic received by mail, not for this report. --Dereckson (talk) 17:00, 6 December 2012 (UTC)
 * When you quickly fix an issue (as opposed to working on something for a while), setting the assignee field is an additional step that doesn't add much value for anybody. Statistics for RESOLVED FIXED should be gathered in Gerrit instead. For all other Bugzilla resolutions, these could be seen as statistics about triagers etc. --Malyacko (talk) 16:28, 6 December 2012 (UTC)
 * Good news: Bugzilla 4.2 has a "take" link to quickly set yourself as the bug assignee. --Dereckson (talk) 17:00, 6 December 2012 (UTC)
 * Are we seeing such a trend? --Nemo 15:32, 22 January 2013 (UTC)


 * At least between November and December, yes, but I made a mistake and forgot to update the data in the December report. :(  Nobody: 242 ---> 105. Wikidata bugs: 75 ---> 24. Let's see what happens in January.--Qgil (talk) 19:17, 22 January 2013 (UTC)
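The thread above mixes two possible definitions of "bug fixer": the user who set RESOLVED, and the assignee of the closed bug. The sketch below computes both from the same data so they can be compared. History rows are a hypothetical simplified shape, `(bug_id, who, field, new_value)` in chronological order, not Bugzilla's real activity table. "nobody" stands in for the default assignee. Restricting to one month is left to the caller.

```python
from collections import Counter

def resolvers_vs_assignees(changes, assignees):
    """changes: chronological (bug_id, who, field, new_value) rows (hypothetical shape).
    assignees: bug_id -> assignee at closing time.
    Returns (who last set RESOLVED, assignee of those bugs) as Counters."""
    last_resolver = {}
    for bug_id, who, field, new_value in changes:
        if field == "bug_status" and new_value == "RESOLVED":
            last_resolver[bug_id] = who   # a later RESOLVED overrides an earlier one
    by_resolver = Counter(last_resolver.values())
    by_assignee = Counter(assignees[bug_id] for bug_id in last_resolver)
    return by_resolver, by_assignee

# Made-up example data.
changes = [(1, "alice", "bug_status", "RESOLVED"),
           (2, "bob", "bug_status", "ASSIGNED"),
           (2, "bob", "bug_status", "RESOLVED"),
           (3, "alice", "bug_status", "RESOLVED")]
assignees = {1: "nobody", 2: "bob", 3: "nobody"}
by_resolver, by_assignee = resolvers_vs_assignees(changes, assignees)
print(by_resolver, by_assignee)
```

A high "nobody" count in the second Counter is exactly the workflow gap discussed above. The first Counter credits whoever closed the bug, which may be a triager rather than the person who wrote the fix.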

Code reviews
It would be interesting to add a statistic on the number of code reviews (specifically, comments), especially until is fixed. I think the only "reliable" way is to use the gerrit-wm feeds, although they are sometimes down and some non-MediaWiki repos are not reported on #mediawiki: after downloading the logs tarball, something like  is enough to count. Merges are harder to measure because self-merges should obviously be excluded. --Nemo 15:32, 22 January 2013 (UTC)


 * Makes sense. I'm happy adding the data to the metrics report if it's available somewhere I can look at.--Qgil (talk) 19:18, 22 January 2013 (UTC)
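Once the feed lines have been parsed, the counting step is simple. This sketch does not try to parse the gerrit-wm log format. It assumes events have already been reduced to hypothetical `(kind, actor, owner)` tuples, and it applies the one rule stated above: a merge by the change's own owner is not counted as a review.

```python
from collections import Counter

def review_counts(events):
    """events: (kind, actor, owner) tuples (hypothetical, already parsed), where
    kind is "comment" or "merge", actor did it and owner owns the change."""
    comments = Counter(actor for kind, actor, owner in events if kind == "comment")
    merges = Counter(actor for kind, actor, owner in events
                     if kind == "merge" and actor != owner)  # exclude self-merges
    return comments, merges

# Made-up example data.
events = [("comment", "alice", "bob"), ("comment", "bob", "bob"),
          ("merge", "alice", "bob"), ("merge", "carol", "carol")]
comments, merges = review_counts(events)
print(comments, merges)
```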

Suggest more
(Moved from the page)

What else do we want to know? Let's agree on the answers without being conditioned by existing data or tools. Then we will see what can be reasonably done.


 * Projects activity
 * Most active: continuous contributions, a diversity of contributors, newcomers...
 * Quality: open bugs, response to issues, user satisfaction.
 * Collaboration channels
 * Which channels are being used for technical collaboration.
 * Population: ins, outs, active, idle.
 * Participation: volume, signal, noise.
 * Contributors
 * Who are we? What skills are we contributing? Where are we based? How long have we been around?
 * Most active, productive, committed, responsive.
 * Newcomers: income flux, popular motivations and destinations.
 * Meritocracy: who has extra permissions, responsibilities, reputation.
 * Countries they work from.
 * Can this data be retrieved from the Gerrit web server? Is it ok to do it?
 * This is not logged, and would not be available if it was due to the privacy policy.
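"Population: ins, outs, active, idle" could be made concrete for a single channel with two consecutive windows. The definitions below are assumptions, not something agreed in this discussion:
 * active: posted in the current window.
 * ins: first post ever falls in the current window.
 * outs: posted in the previous window but not in the current one.
 * idle: posted at some earlier point, but in neither window.

```python
from datetime import date

def population(posts, prev_start, cur_start, cur_end):
    """posts: (person, date) pairs for one channel.
    Previous window = [prev_start, cur_start), current = [cur_start, cur_end)."""
    first_seen, in_prev, in_cur, seen_before = {}, set(), set(), set()
    for person, d in posts:
        if d >= cur_end:
            continue                      # ignore anything after the current window
        first_seen[person] = min(d, first_seen.get(person, d))
        if d >= cur_start:
            in_cur.add(person)
        else:
            seen_before.add(person)
            if d >= prev_start:
                in_prev.add(person)
    return {
        "active": in_cur,
        "ins": {p for p in in_cur if first_seen[p] >= cur_start},
        "outs": in_prev - in_cur,
        "idle": seen_before - in_prev - in_cur,
    }

# Made-up example data.
posts = [("ann", date(2013, 5, 10)), ("ann", date(2013, 6, 5)), ("ben", date(2013, 6, 20)),
         ("cid", date(2013, 5, 2)), ("dee", date(2013, 1, 1))]
result = population(posts, date(2013, 5, 1), date(2013, 6, 1), date(2013, 7, 1))
print(result)
```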

See also Analytics/Dreams. --Qgil (talk) 22:43, 28 June 2013 (UTC)

Tool Labs access to Gerrit database
I have filed bug 50422 to replicate a (redacted) version of the Gerrit database to Tool Labs. This would let volunteer devs hook up whatever stats they want more efficiently than the current method (having to trawl through the API / git). Yuvipanda (talk) 14:06, 29 June 2013 (UTC)

"What mailing lists are worth scanning?"
Asks Quim. Answer: all of them (if public), and if you don't manage to then your tool is broken (doesn't scale). :) We already have http://www.infodisiac.com/Wikipedia/ScanMail/, you only have to take all the blue mailing lists there I suppose (plus the bunch created lately). --Nemo 05:34, 2 July 2013 (UTC)
 * Does "all of them" mean the ones not marked defunct under Mailing_lists/overview? Are all those lists active? Do we really need to scan, say, MetaVid-l? I agree the current scope is very limited and insufficient, but we should avoid adding cruft just because technically we can.--Qgil (talk) 20:47, 2 July 2013 (UTC)
 * Ok, see http://korma.wmflabs.org/browser/mls-repos.html --Qgil (talk) 07:06, 17 July 2013 (UTC)
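The question "are all those lists active?" can be answered mechanically before deciding what to scan. Here is a minimal sketch assuming per-list post dates are already available. The 365-day window and the minimum post count are arbitrary assumptions, and the dates below are made up.

```python
from datetime import date, timedelta

def lists_worth_scanning(post_dates, today, window_days=365, min_posts=1):
    """post_dates: list name -> post dates. Keeps lists with at least min_posts
    posts in the last window_days days (both thresholds are assumptions)."""
    start = today - timedelta(days=window_days)
    return sorted(name for name, dates in post_dates.items()
                  if sum(d >= start for d in dates) >= min_posts)

# Made-up dates for illustration only.
post_dates = {"wikitech-l": [date(2013, 7, 1), date(2013, 6, 1)],
              "metavid-l": [date(2010, 1, 1)],
              "mediawiki-l": [date(2013, 1, 15)]}
print(lists_worth_scanning(post_dates, date(2013, 7, 17)))
```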

GitHub
I saw an edit summary: "Let's forget about GitHub. Everything that matters should [be] in Gerrit." I believe some of our mobile work is GitHub-only right now, correct? Sharihareswara (WMF) (talk) 22:21, 5 July 2013 (UTC)
 * Yes, after checking with the Mobile team we confirmed that GitHub isn't relevant for Wikimedia tech metrics nowadays.--Qgil (talk) 14:51, 6 July 2013 (UTC)

Bugzilla products to scan
Looking at the number of tickets per product, bigger products are more interesting and complex than products you can easily get an overview of. However, to find inactive products or bus factors, covering all of them could be nice.

Must haves:
  Product                  Total    Open
  MediaWiki                21257    4393
  MediaWiki extensions     15424    4192
  Wikimedia                 9602    1066
  VisualEditor              1135     371

Nice to have:
  Product                  Total    Open
  Wikipedia App              784     109
  Wikimedia Labs             553     164

...and the rest feels optional. --AKlapper (WMF) (talk) 14:38, 11 July 2013 (UTC)
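The counts above can be checked with a little arithmetic. The sketch below ranks products by total tickets and shows the share still open. The cut-off of 1000 total tickets is an assumption that happens to reproduce the "must haves" split above; it is not a rule stated in the comment.

```python
# Total and open ticket counts per Bugzilla product, as listed above (11 July 2013).
PRODUCTS = {
    "MediaWiki": (21257, 4393),
    "MediaWiki extensions": (15424, 4192),
    "Wikimedia": (9602, 1066),
    "VisualEditor": (1135, 371),
    "Wikipedia App": (784, 109),
    "Wikimedia Labs": (553, 164),
}

def tier(total, threshold=1000):
    # Assumed cut-off; it matches the split above but is not a stated rule.
    return "must have" if total >= threshold else "nice to have"

for name, (total, open_count) in sorted(PRODUCTS.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:22} {tier(total):13} {open_count / total:6.1%} open")
```

By this measure, VisualEditor has about a third of its tickets open (371 of 1135), against about a ninth for Wikimedia (1066 of 9602).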

Response time statistics
Apart from the statistics mentioned above (e.g. the number of people active in Bugzilla in the last month), these are stats that I've been interested in and that Bugzilla itself does not provide. I know that some researchers have used Bugzilla database dumps to find this out, and it would be awesome if this didn't require dumps first. I don't expect us to provide these stats any time soon, so it's really just a wishlist.


 * Average time for a (non-enhancement) bug report between bug creation date and PATCH_TO_REVIEW status being set
 * Average time for a (non-enhancement) bug report between PATCH_TO_REVIEW status being set and RESOLVED FIXED status being set
 * Average time for a (non-enhancement) bug report between bug creation date and the first comment by someone other than the reporter

One problem I see specifically for the "MediaWiki extensions" Bugzilla product is that I'm only interested in official extensions that are deployed on WMF servers, and not in all the other third party extensions. --AKlapper (WMF) (talk) 14:46, 11 July 2013 (UTC)


 * Wouldn't "between bug creation date and PATCH_TO_REVIEW" be more useful if we could restrict it to accepted bugs? Otherwise the mix includes bugs that are left open because there is disagreement about them. The other two, yes, make sense.--Qgil (talk) 15:34, 12 July 2013 (UTC)


 * Oh true, good point! +1! --AKlapper (WMF) (talk) 12:19, 15 July 2013 (UTC)


 * Added to Community_metrics.--Qgil (talk) 22:11, 20 August 2013 (UTC)
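The first two wishlist items are averages between two points in a bug's history, with the filters agreed in this thread. This sketch rests on several assumptions:
 * Each bug is a hypothetical dict with a chronological `history` of `(timestamp, state)` pairs, where status and resolution are folded into one string such as "RESOLVED FIXED".
 * `accepted` is a precomputed flag; how acceptance is decided is left open.
 * `deployed`, if given, is the set of "MediaWiki extensions" components to keep, e.g. the WMF-deployed ones.

```python
from datetime import datetime
from statistics import mean

def first_time(history, state):
    """First timestamp at which the bug entered `state`, or None."""
    return next((t for t, s in history if s == state), None)

def avg_days_between(bugs, start_state, end_state, accepted_only=False, deployed=None):
    """start_state=None measures from bug creation; enhancements are skipped."""
    days = []
    for b in bugs:
        if b["severity"] == "enhancement":
            continue
        if accepted_only and not b["accepted"]:
            continue
        if (deployed is not None and b["product"] == "MediaWiki extensions"
                and b["component"] not in deployed):
            continue  # third-party extension, not deployed on WMF servers
        start = b["created"] if start_state is None else first_time(b["history"], start_state)
        end = first_time(b["history"], end_state)
        if start is not None and end is not None:
            days.append((end - start).total_seconds() / 86400)
    return mean(days) if days else None

# Made-up example data; component names are hypothetical.
bugs = [
    {"severity": "normal", "product": "MediaWiki", "component": "Parser", "accepted": True,
     "created": datetime(2013, 7, 1),
     "history": [(datetime(2013, 7, 3), "PATCH_TO_REVIEW"), (datetime(2013, 7, 4), "RESOLVED FIXED")]},
    {"severity": "enhancement", "product": "MediaWiki", "component": "Parser", "accepted": True,
     "created": datetime(2013, 7, 1), "history": [(datetime(2013, 7, 11), "PATCH_TO_REVIEW")]},
    {"severity": "major", "product": "MediaWiki extensions", "component": "SomeThirdPartyExtension",
     "accepted": True, "created": datetime(2013, 7, 1),
     "history": [(datetime(2013, 7, 6), "PATCH_TO_REVIEW")]},
    {"severity": "normal", "product": "MediaWiki", "component": "Parser", "accepted": False,
     "created": datetime(2013, 7, 1), "history": [(datetime(2013, 7, 9), "PATCH_TO_REVIEW")]},
]
print(avg_days_between(bugs, None, "PATCH_TO_REVIEW", accepted_only=True,
                       deployed={"SomeDeployedExtension"}))
print(avg_days_between(bugs, "PATCH_TO_REVIEW", "RESOLVED FIXED"))
```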

"Wikimedia (@wikimedia.org + @wikimedia.de)"
What does this mean? You are not using email domains to identify the organisation, are you? Core developers in particular often use the personal email addresses they started with as volunteers many years ago. In Gerrit, for WMF you can use the group 'wmf', which, as far as I understand, is always up to date because it's synced with LDAP. I use it to filter my searches with ownerin:wmf etc. WMDE is another story. --Nemo 07:43, 24 August 2013 (UTC)
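The difference between the two approaches shows up as soon as one staff member commits from a personal address. In this sketch the usernames, the non-Wikimedia domains and the membership set are all made up. How the Gerrit 'wmf' group membership would actually be fetched is not shown. As noted above, there is no equivalent group for WMDE, so only the domain check covers it here.

```python
def affiliation_by_domain(email):
    domain = email.rsplit("@", 1)[-1].lower()
    return {"wikimedia.org": "WMF", "wikimedia.de": "WMDE"}.get(domain, "other")

def affiliation_by_group(username, wmf_group_members):
    # wmf_group_members: usernames in the Gerrit 'wmf' group (fetching it is not shown)
    return "WMF" if username in wmf_group_members else "other"

# Hypothetical authors; bob is staff but commits with a personal address.
authors = [("alice", "alice@wikimedia.org"), ("bob", "bob@example.net"),
           ("carol", "carol@example.org")]
wmf_group = {"alice", "bob"}
for user, email in authors:
    print(user, affiliation_by_domain(email), affiliation_by_group(user, wmf_group))
```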

Definition of unreviewed
This page lacks a definition of an unreviewed commit. I always use this one, originally recommended by Robla in a wikitech-l post: Gerrit/Navigation ( : if a commit is OK or has no CR, what it needs is CR/+2; only if it has -1/-2 does it need coding work, and hence it is probably not a current target for CR). --Nemo 07:46, 24 August 2013 (UTC)
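The rule quoted above translates into a small classifier over the Code-Review votes currently on an open change. This is a sketch of that reading only. Change numbers are hypothetical. An open change carrying +2 (approved but not merged) is not addressed by the rule, so here it falls into "needs review".

```python
def review_state(code_review_votes):
    """Classify an open change from its current Code-Review votes (-2..+2)."""
    if any(v < 0 for v in code_review_votes):
        return "needs work"    # -1/-2: waiting on the author, not a CR target
    return "needs review"      # no votes or only positive ones: waiting for CR/+2

# Hypothetical open changes and their votes.
open_changes = {101: [], 102: [1], 103: [1, -1], 104: [-2]}
unreviewed = [c for c, votes in open_changes.items() if review_state(votes) == "needs review"]
print(unreviewed)
```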

"The time to review has grown in August"
I don't understand how the average above is calculated. What time to review do you calculate for open commits? If you consider only merged commits and average the time they took, the metric will show an increase in time to review whenever the backlog of old commits is worked on. --Nemo 13:05, 26 August 2013 (UTC)
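The effect described above can be reproduced with a toy example using hypothetical day numbers. Two old changes sit open through the first month and are merged in the second. Two numbers are computed per month:
 * the mean time-to-merge of changes merged that month, which is the merged-only average;
 * the mean age of changes still open at month end.

In this example the first number jumps just when the backlog is cleared, while the second one drops.

```python
from statistics import mean

def month_stats(changes, month_start, month_end):
    """changes: (opened, merged) day numbers; merged is None while open.
    Returns (mean days-to-merge of changes merged in [month_start, month_end),
             mean age at month_end of changes still open then)."""
    merged = [m - o for o, m in changes if m is not None and month_start <= m < month_end]
    still_open = [month_end - o for o, m in changes
                  if o < month_end and (m is None or m >= month_end)]
    return (mean(merged) if merged else None,
            mean(still_open) if still_open else None)

# Hypothetical: two old changes opened on day 0 are only merged on day 45.
changes = [(0, 45), (0, 45), (20, 22), (25, 26), (50, 52), (55, 56)]
print(month_stats(changes, 0, 30))   # first month
print(month_stats(changes, 30, 60))  # second month
```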