Developer Satisfaction Survey/2019

Background
During FY2019 annual planning, Developer Satisfaction was identified as a target outcome for the Wikimedia Technology department's Developer Productivity program.

In our first attempt at measuring progress towards this target, the Release Engineering Team conducted a survey in which we collected feedback from the Wikimedia developer community about their overall satisfaction with various systems and tools. This page attempts to summarize the feedback into some numbers as well as broad themes that we were able to identify in the feedback received.

Because the privacy of respondents is important to us, we will not be publishing the raw responses. Instead, we will paraphrase the most common complaints, suggestions and comments, and present some statistics and other observations.

What we asked
Respondents were asked to rate their satisfaction with several broad areas of our developer tooling and infrastructure. Each area was rated on a scale from 1 (very dissatisfied) to 5 (very satisfied). This was followed by several free-form questions that solicited feedback in the respondent's own words. The satisfaction ratings have been summarized by taking the average of all responses in each category.
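As a rough illustration of the averaging described above, the following sketch computes a per-category mean from a handful of responses, optionally restricted to one respondent group. The data, the field names (`group`, `ratings`) and the function `average_by_category` are all hypothetical and invented for this example; they are not the survey's actual responses or the tooling we used.

```python
from collections import defaultdict

# Hypothetical responses, made up purely for illustration.
# Each rating is on the survey's 1 (very dissatisfied) to 5 (very satisfied) scale.
responses = [
    {"group": "staff", "ratings": {"Code Review": 4, "Local Development": 5}},
    {"group": "volunteer", "ratings": {"Code Review": 2, "Local Development": 4}},
    {"group": "staff", "ratings": {"Code Review": 5, "Local Development": 4}},
]


def average_by_category(responses, group=None):
    """Return the mean rating per category, rounded to two decimals.

    If `group` is given, only responses from that group are counted.
    """
    buckets = defaultdict(list)
    for response in responses:
        if group is not None and response["group"] != group:
            continue
        for category, rating in response["ratings"].items():
            # Zero was not a valid answer, so anything outside 1..5 is rejected.
            if not 1 <= rating <= 5:
                raise ValueError(f"rating out of range for {category}: {rating}")
            buckets[category].append(rating)
    return {
        category: round(sum(ratings) / len(ratings), 2)
        for category, ratings in buckets.items()
    }


overall = average_by_category(responses)
staff_only = average_by_category(responses, group="staff")
```

Calling `average_by_category(responses, group="volunteer")` would give the same kind of subgroup breakdown used later in the analysis. Respondents who skipped a category simply contribute nothing to that category's average in this sketch.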

Who responded
In total, we received 58 responses. Of those, 47 (about 81%) came from Wikimedia staff and contractors, 10 came from volunteers, and just 1 came from a third-party contributor. In the future we should make an effort to reach more volunteers and third-party developers. For this first survey, keep in mind that the data are heavily biased towards the opinions of staff and contractors.

Analysis
Most of the responses could be summarized as simply "satisfied", with most of the categories averaging near 4. Below we discuss notable exceptions to this generalization and any other observations that can be gleaned from the available data. One thing stands out immediately when looking at the average scores: Code Review received the lowest score by quite a margin. At 3.0, it is well below the next lowest score, Local Development at 3.7. Because it was not possible to respond with zero in any category, the lowest possible score is effectively 1, so a 3.0 sits at the midpoint of the scale. The number is even worse if we look only at the responses we received from volunteers. That subgroup gave Code Review a very disappointing average rating of 2.75, compared with an average of 3.42 among Wikimedia staff and contractors.

Qualitative Analysis of Respondent Comments
We attempted to divide the content of respondent comments into several categories to identify common themes and areas for improvement. We created diagrams with the comments in each theme in order to come up with "How can we..." questions that will help kick off future investigation and brainstorming for improvements in each category.



Categories

 * Automated Testing
 * Collaborating
 * Debugging
 * Deploying
 * Developing
 * Finding Things
 * Gaining Knowledge
 * Getting Reviews
 * UI Testing

UI Testing
[Figure: UI Testing comments diagram]
UI Testing analysis produced three questions:
 * How can we do a better job of catching bugs before they hit production?
 * How can we enable quick and easy production-like testing environments for developers?
 * How can we share information about the staging environment?