Developer Satisfaction Survey/2020

What we asked
Respondents were asked to rate their satisfaction with several broad areas of our developer tooling and infrastructure. Each area was rated on a scale of 1 (very dissatisfied) to 5 (very satisfied). This was followed by several free-form questions which solicited feedback in the respondent's own words. The satisfaction ratings have been summarized by taking the average of all responses in each category.
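The summarization step described above can be sketched as follows; the category names and ratings here are illustrative placeholders, not actual survey data:

```python
# Sketch: summarize 1-5 satisfaction ratings by averaging per category.
# The categories and responses below are made up for illustration.
from statistics import mean

responses = {
    "Local development": [4, 3, 5, 2, 4],
    "Beta Cluster": [2, 3, 2, 1, 3],
}

summary = {category: round(mean(ratings), 2)
           for category, ratings in responses.items()}
print(summary)
```

Each category's summary score is simply the arithmetic mean of its responses, rounded for presentation.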

Who responded
In total, we received 75 responses. Of those, 60 came from Wikimedia staff/contractors, 11 from volunteers, and just 4 from 3rd-party contributors.

Overall results




Qualitative: Local development environment
This section focused on the local development environment, of which there is currently no standard. Ongoing work by Release Engineering has focused on two things: (1) supplying a simple MediaWiki development environment, and (2) determining requirements for, and creating, a more complex development environment. One goal is greater parity with the production and continuous integration environments, which are migrating to containers.

Questions were focused on two areas: how people selected their development environment, and how the ongoing work could be improved.


 * How people decided on an environment
 * Most respondents wanted a simple, fast, and easy-to-use environment. Most selected a Docker-based environment to get this, while a minority thought Vagrant or LAMP would provide those benefits.
 * Based on the feedback, recommendations and usage within a development team play a large role in selecting a development environment; Docker and Vagrant were recommended almost equally.
 * Many respondents are comfortable with what they know, so they don't try out new development environment solutions.
 * A fair number of respondents expressed problems setting up and using the Vagrant environment, or with its lack of support and maintenance, while a couple of folks expressed the opposite sentiment.
 * Some people need a complex environment that is more like production, and they almost equally selected LAMP or Vagrant to accommodate those needs.


 * How can we improve?
 * The vast majority of respondents indicated that they didn’t know about the efforts to improve the developer environment or didn’t notice anything, while a few expressed feeling like improvements were being made.
 * Many respondents expressed the desire for a supported docker-based development environment.
 * The most-requested features were an improved ability to run tests, more documentation, more standardization, and access to production-like content when testing.

Key Takeaways
 * 1) Release Engineering could work to improve communication with the developer community. People also want to see more planning documents and a timeline.
 * 2) The decision to move to a docker-based development environment is well-received, but people want to have a standard, well maintained and supported environment.
 * 3) It’s unclear whether the desire for documentation is the result of having no standard place for documentation, it being out of date, or it not existing. It’s probably a combination of all three.
 * 4) Improving testing abilities in the development environment and providing wiki content could help developers feel more satisfied with their environment.

Qualitative: Beta Cluster
This section focused on the Beta Cluster environment, which provides a frequently updated, imperfect replica of production. The two main areas of feedback were functionality and ownership.


 * Functionality
 * The overwhelming majority find the Beta Cluster unstable and, when it is not performing correctly, hard to debug: it is difficult to tell whether the issue is their new code or something else.
 * There were a modest number of appreciative comments about its existence and respondents' reliance on it.


 * Ownership
 * Ownership is seen as the paramount issue with the current Beta Cluster. Given that it is a system that attempts to (partially) mimic production, the assumption is that whoever (team or individual) owns a service or piece of code in production should be responsible for its upkeep in the Beta Cluster. This is not always feasible given the differences between the two environments and the prioritization of production stability.


 * Suggestions
 * Most suggestions revolved around a system providing multiple single-use (on-demand) test environments that can be shared by engineers, test engineers, or product managers to test specific committed or yet-to-be-committed changes.

Qualitative: Continuous integration and testing
This section focused on Continuous Integration and Testing generally. Other than general feedback, respondents were asked for feedback on running tests locally, CI queue (wait) time, and CI configuration for their repositories. Suggestions are collected at the end for all areas.


 * General
 * A few respondents repeated a desire for a frontend build step, citing previous conversations on the topic.
 * One comment mentioned that the test suite had issues, not the infrastructure for running them. This is presumably in reference to the low test coverage in MediaWiki.


 * Running tests locally
 * Generally respondents have seen this area as improving.


 * CI queue (wait) time
 * Generally respondents indicate that this is improving or mostly fine.


 * Configuring CI
 * The centralized and “arcane” setup for CI was a repeated complaint.


 * Suggestions
 * Running extension tests against multiple versions of MediaWiki and PHP.
 * For local testing, having the ability to run tests only for changed file(s).
 * Report back failures from CI as soon as they happen (instead of waiting for the full test suite to finish).
 * Put CI configuration directly in the repositories, instead of centralized. “Blubber for non-services” was mentioned more than once as a quick way of explaining the level of configurability that people hoped for.

Qualitative: Code review workflows
This section focused on “Code Review workflows”, so the answers are less tool-specific and more about workflow, process, and culture. Highlights include:

Patchset vs pull request: There is a mix of positive and negative feedback regarding Gerrit’s patchset-focused reviews (versus pull-request or feature-branch reviewing), with some indicating they “have grown to love it” and others lamenting not using more popular hosted solutions.

Code review standards: Respondents identified both a “bikeshed culture” and variation in review standards. Automated tooling (eg: phpcs) has helped with the latter, but there is a desire for more explicit guiding principles for all reviews. There was a callout of appreciation for the Code Review Working Group.

Identifying reviewers: Overall there is acceptance that the Wikimedia technical community is struggling with quickly and easily finding effective reviewers for code submissions.

One problem is the high incidence of unmaintained software in use. Suggestions included: increased use of developer/maintainers, including a CODE_OWNERS (or similar) file in each repository, and re-exploring the short-lived bot that automatically added reviewers based on file commit history.

Another is the divide between Wikimedia Foundation staff and the rest of the community: those on Foundation teams get prompt(er) feedback than those not on teams. There is speculation that patchset-level review compounds this problem by splitting attention across multiple reviews.

Timeliness of reviews: Overall there is an acceptance that review timeliness is either variable based on Foundation staff status or languishing generally. A few respondents suggested greater managerial support to improve timeliness, through measures such as explicitly allotted review time per quarter.

Qualitative: Deployments
This section focused on deployments including tooling, scheduling, and getting new services or extensions into production.


 * General
 * There was a general (though not universal) desire to do MediaWiki deployments (aka “the train”) more frequently.
 * Relatedly, there were a few respondents that noted more frequent deploys may be hindered by blocker issues arising in production.
 * Tooling
 * An almost universal desire for more automation of our deployment mechanisms, notably including rollbacks to known good states.
 * Some expressed a desire for more/improved tooling for testing and monitoring in the Beta Cluster.
 * Scheduling
 * A repeated desire as above for more frequent train deploys.
 * Relatedly, some identified that the train going from group0 to group2 over 3 days was “too quick”.
 * At the same time, it was identified that ending the train on a Thursday meant an increased chance that the train would be blocked until the following week (due to Friday being a no-deploy day).
 * Some European-based volunteers noted that the current SWAT windows are inconvenient for them (either during normal work hours or during normal dinner hours).
 * The mechanism for adding yourself and your changes to SWAT windows was identified as cumbersome; respondents felt it should be streamlined and simplified.
 * Risk mitigation
 * One respondent identified the complexity and risk in working on complex, long-lived features that are hard to roll out in smaller chunks. They seemed to lean towards some sort of shared, feature-complete testing environment for their team to meet the need.
 * One respondent identified the lack of a complete testing environment for deployments and related tooling as a risk.

Qualitative: Production systems
This section focused on Wikimedia production systems including logging and alerting.


 * Logging and alerting
 * There is a desire to enable or make easier the creation of alerts based on Logstash queries (eg: ~”I wish my team could be alerted when our extension is showing up in log spam”).
 * One suggestion was that every team be responsible for monitoring logs for errors and bugs in their codebase, instead of having that work done by other individuals (eg: the Engineering Productivity teams).
 * At the same time, one respondent noted that the pings in Phabricator about errors in their team’s code from those reviewing logs were very helpful.
 * Multi-tenancy (with associated policies/standards) was identified as a missing feature of our logging and alerting systems.


 * Other
 * A repeated desire for a staging or similar environment.
 * Using more systems testing (in addition to/instead of monitoring) was identified as a way of improvement.

Qualitative: Development and productivity tooling
This section had respondents give feedback on the suite of official development and productivity tooling (including Phabricator and Gerrit).


 * Phabricator:
 * The 29 comments directly regarding Phabricator generally ranged from “don’t mind it” to “love it” (with emoticons). Few (3) explicitly stated it should be replaced.
 * There were several explicit requests to improve search within Phabricator, as well as other suggestions: multiple assignees, task graph improvements, scrum analysis tooling, and improved APIs for measuring team effectiveness.


 * Gerrit:
 * Gerrit had a much higher variability in overall satisfaction, ranging from “burn it down/move away from it now” to “I love it/please never move away.” Among those who voiced concerns with Gerrit, the primary motivation tended to be familiarity with pull/merge requests (vs patchsets).
 * Regarding the patchset vs pull-request issue, two non-contradictory points were primarily brought up: patchset-level reviewing increases the thoroughness of reviews, while the pull-request model is more familiar to new developers.
 * Lack of familiarity with patchset reviewing was also identified as a major hindrance to outside contributions. 7 comments indicated this directly.
 * Suggested improvements to Gerrit included: improvements to commenting (eg: allow reaction emojis per comment), improved code browsing, and more self-service for most actions (eg: new repo creation).


 * Code Search:
 * Among those aware of the Code Search functionality, there is near-uniform, enthusiastic appreciation of its availability.
 * However, 5 comments explicitly indicated they were not familiar with the service.
 * Suggested improvements included improved navigation, ensuring all new repositories are added in a timely manner, better handling of whitespace (eg: searching for “\n\n”), and more context aware search.

Qualitative: Documentation
This section primarily focused on two areas of documentation: MediaWiki (inclusive of mediawiki.org, code comments, and doc.wikimedia.org) and the Wikitech wiki (primarily used for Wikimedia production and Wikimedia Cloud Services).


 * MediaWiki
 * Generally the feedback was consistent: appreciation for the documentation that exists, coupled with acknowledgement of the large variance in quality, up-to-dateness, and discoverability, all common issues with documentation in any environment.
 * Code-level documentation (eg: function level block comments) was seen as more useful than on-wiki information.
 * Basic use-cases were seen as covered well while more complex ones were not. An example is how to set up a Structured Data on Commons installation.
 * Suggestions included: versioning documentation along with the software, enforcing documentation as a part of writing software, and using structured data within the documentation along with more robust widgets and templates.


 * Wikitech
 * Overall the feedback is similar to the feedback for MediaWiki documentation, though a recurring theme was that the Wikitech wiki is more out of date than MediaWiki.
 * The deployment calendar was mentioned once as a point of frustration (editing the wall of text).
 * Suggestions included regular reviews of content and integrating tools’ pages with the tooladmin interface to encourage keeping things up to date for tool authors.

[The raw comments, including suggestions, from this section were shared directly with the Wikimedia team driving documentation work, Technical Engagement.]

Qualitative: Missing tooling or services
This section of the survey focused on eliciting missing tooling or services that our developers want. Some notable desires:


 * Demo/Test wikis: There is a consistent desire for a service that provides one-off/single-use wikis (or sets of wikis), including a yet-unmerged patch, to do in-depth testing or product demos. It is acknowledged that the complexity required to do this is very high.
 * Frontend build step: While the number of respondents is low (2), the consistent desire for a frontend build step is acknowledged.
 * Single Sign-On (SSO): Respondents were disheartened by the number of separate accounts needed to accomplish work here and expressed interest in a Single Sign-On solution.

Qualitative: Other suggestions
There are other one-off suggestions that will be shared directly (summarized) with relevant stakeholders on topics such as: browser testing, local development, product approval of code, and tutorials.

Improvements for the next survey

 * Include explicit satisfaction ratings of Gerrit, Phabricator, Code Search, and any other high-use tools of interest, particularly in the “Development and Productivity tooling” section.