Code Health Group/Quality Big Picture

The following is a summary of my (JBranaa (WMF)'s) observations, thoughts, and suggested actions based on several formal discussions as well as many informal discussions I’ve participated in since starting at WMF in January 2017.

I am a strong believer that everyone is responsible for Quality. Although there may be roles and/or groups that have “Quality” in their title or name, they should be focused on facilitating high-quality practices, behaviors, and decisions throughout the organization. In order to do so, a multi-pronged approach is necessary. To that end, the summary is segmented into three primary areas: People, Process, and Tools.

Before we dig in, I’d like to acknowledge those I spoke with: Erik Bernhardson, Erika Bjune, Tyler Cipriani, Gilles Dubuc, Dan Duvall, Elliot Eggleston, Zeljko Filipin, Grace Gellerman, Greg Grossmeier, Roan Kattouw, Andre Klapper, Giuseppe Lavagetto, Guillaume Lederrey, Marko Obrovac, Jon Robson, Bernd Sitzmann, Elena Tonkovidova, Rummana Yasmeen.

People (Culture)[edit]

Culture is a key influencer in the decisions that people make day in and day out. If people truly believe that delivering something of sub-par quality is unacceptable, then they will do the things necessary to assure the quality of their work.

A recurring theme throughout the various discussions I’ve been part of is that although there are pockets of goodness, there are also things that are impeding us from consistently delivering high quality software. Although this can be taken as a negative comment, I believe it is rather positive as it indicates a pretty broad desire to continue to improve things. This is mostly anecdotal due to the lack of data, but it is worth taking note of.

Shifting a culture can be difficult. It’s debatable whether shifting a culture is an overt "this will be our culture" type of activity, or whether it's more about defining actions, decision-making criteria, and measurements that support the desired culture and letting it get there organically. In either case, the following are some suggested improvements that could move the needle in the right direction.

Prospective Areas of Improvement[edit]

QA Tribe (now called QA SIG)[edit]

This is an existing group of internal folks who discuss and support each other in the space of QA and testing. Right now it's a free-form sync-up. This is fine, but we could also have specific topics that people want to discuss. Although the group isn't only for QA folks, it is predominantly QA folks today. We may want to actively try to expand this group or invite guests on occasion.

Why is this important?

The QA tribe helps build an internal community of those interested in QA topics. This helps strengthen the culture around QA and testing. It also enables cross-pollination of approaches as well as organic growth of standard practices.

Next Steps:

  • Ask the tribe whether they'd be interested in broadening participation or simply having guest attendees.
  • Identify a specific topic for each month.

Tech Talks[edit]

This is a fairly well-established tool that we can use to build the environment needed to foster a Quality culture. These talks would cover software development and testing best practices implemented at Wikimedia. This would support developers new to the Foundation as well as provide refreshers and ongoing education to existing developers.

Why is this important?

Tech talks can enable the creation of a community through shared practices. They can also expose more junior developers to senior, respected developers who place a lot of importance on quality.

Next Steps:

  • Select topics.
  • Identify external/internal speakers.

Code Health Group[edit]

In the spirit of the QA Tribe, the Code Health Group would be a cross-team group made up of people from various engineering disciplines (developers, architects, release engineers, QA Engineers, etc…). At a high level, this group would identify and develop engineering practices and infrastructure to enable and measure code health.

The group would focus on things such as testability, technical debt, code readability/reviewability, code complexity, branching strategies, etc... Unlike the Team Practices Group, the Code Health Group is focused predominantly on engineering and doesn’t delve into project/task management. We would strive to reduce any potential overlap in topic areas.

Why is this important?

Code health is a cornerstone of delivering high-quality software frequently and consistently, because it allows developers to make small, well-understood changes that don’t threaten the stability of the broader system.

Next Steps:

  • Identify prospective members of group.
  • Meet up and establish charter of group.

Process[edit]

Not unlike tools, processes are a good thing to have as long as they support the end-goals and enable people to do their work. From what I’ve been able to surmise thus far, WMF has many different ways of getting the job done.

As in many organizations, these existing approaches/processes have grown organically at the team level. That being said, there are certain areas where processes are more standard out of necessity, such as continuous integration and deployment. In addition, TPG is also working towards bringing common practices to the organization.

Prospective Areas of Improvement[edit]

Estimation[edit]

Along with requirements, estimation is a key upstream contributor to the success of our projects. Although WMF doesn’t seem to have the same schedule-driven culture as many organizations, “schedule” still tends to drive decisions that may not be in the best interest of quality. One thing that came across during my discussions was that much of the schedule pressure that people feel is self-inflicted.

Why is this important?

Estimation, or better estimation, will allow us to actively avoid situations where testing and other important activities get cut due to schedule constraints (self-inflicted or otherwise). Let’s face it: when it comes to software, code is king. It’s the one thing you can’t deploy/ship without, so when schedules get tight, it’s everything else that tends to get cut.

This area focuses on developing best practices. Although the primary outcome of the estimation process may seem to be the estimate, as with planning, the value is truly found in doing the activity itself. Planning and estimation activities tend to lead to discussion between people as well as a richer understanding of all the work that needs to be done.

Next Steps:

  • Map out current estimation practices with the assistance of TPG.
  • Define estimation best practices and promote them through Tech Talks, TPG, and grassroots efforts like the QA Tribe.

Bug Management[edit]

Bugs are an important insight into the quality of the software. Generally speaking, any issues, concerns, or dissatisfactions with software make their way to the developers via bug reports.

Today, WMF uses Phabricator to manage defects as well as all other tasks. However, unlike when Bugzilla was in use, defects cannot be consistently distinguished from all the other kinds of tasks.

Why is this important?

It allows us to get a view of reported bugs. Bugs are not just a bad experience for users; they are also costly due to the rework they cause. The more we know about the bugs that escape, the better we will be able to address the contributing factors. Bug management will also enable us to track progress as we implement improvements.

Next Steps:

  • Review existing tags to see what is already out there to use as a basis.
  • Create a query that reports all bugs (excluding feature requests and other non-defect tasks from the results); see the sketch below.
  • Define a common approach for identifying bugs.
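
The query itself could be as small as the sketch below. It assumes defects get a dedicated project tag (the tag name "bug" here is a placeholder for whatever the common approach settles on) and uses Phabricator's Conduit maniphest.search endpoint with an API token; treat it as a starting point rather than a finished report.

<syntaxhighlight lang="python">
"""Hedged sketch: list open defect tasks via Phabricator's Conduit API.

Assumes defects are marked with a project tag (hypothetical tag "bug")
and that a Conduit API token is available.
"""
import requests

CONDUIT_URL = "https://phabricator.wikimedia.org/api/maniphest.search"


def fetch_open_bugs(api_token, bug_tag="bug"):
    # Conduit accepts search constraints as PHP-style form fields.
    payload = {
        "api.token": api_token,
        "constraints[projects][0]": bug_tag,   # placeholder defect tag
        "constraints[statuses][0]": "open",
    }
    resp = requests.post(CONDUIT_URL, data=payload, timeout=30)
    resp.raise_for_status()
    return resp.json()["result"]["data"]


if __name__ == "__main__":
    tasks = fetch_open_bugs(api_token="api-...")  # placeholder token
    print(len(tasks), "open defect tasks (first page of results only)")
</syntaxhighlight>

Note that maniphest.search paginates its results, so a real report would need to follow the result cursor; the point here is only that a shared tag makes "all bugs" a one-constraint query.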

Test Metrics[edit]

Test metrics are not currently tracked. Things like coverage, results, and test churn provide insight into test effectiveness. Today, the CI infrastructure does track test results, but due to the 30-day retention window, there is no historical context available.

Why is this important?

It allows us to review progress in terms of testing. It also allows us to correlate changes in testing with improvements in other areas, like a reduction in escaped bugs. It's not only good to see what is happening in-flight, but also how that compares to where we were. One often overlooked aspect is the effectiveness of the tests themselves.

Next Steps:

  • Identify KPIs for testing.
  • Create a central results repository.
  • Develop a mechanism that collects and stores metrics (see the sketch below).
  • Develop data visualization (dashboard).
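
As a strawman for the collection mechanism, the sketch below assumes CI jobs emit JUnit-style XML result files (a common CI output format; whether our jobs already produce it would need to be confirmed) and appends a few headline numbers per run to a JSON Lines file standing in for the central results repository.

<syntaxhighlight lang="python">
"""Hedged sketch: roll a JUnit-style XML report up into per-run metrics."""
import json
import time
import xml.etree.ElementTree as ET


def summarize_junit(path):
    root = ET.parse(path).getroot()
    # A report may be a single <testsuite> or a <testsuites> wrapper.
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    summary = {"tests": 0, "failures": 0, "errors": 0, "skipped": 0}
    for suite in suites:
        for key in summary:
            summary[key] += int(suite.get(key, 0))
    return summary


def record_run(job_name, report_path, store="test-metrics.jsonl"):
    entry = {"job": job_name, "timestamp": int(time.time())}
    entry.update(summarize_junit(report_path))
    with open(store, "a") as fh:
        fh.write(json.dumps(entry) + "\n")


if __name__ == "__main__":
    record_run("mediawiki-core-tests", "junit.xml")  # illustrative names
</syntaxhighlight>

Accumulating records like these over time is what gives us the historical context the 30-day CI window can't.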

Test Strategy[edit]

Testing at the Foundation is diverse, with pockets of goodness. Although the CI pipeline encourages some common language such as “Unit Testing”, “Integration Testing”, and “Browser-Based End-to-End Testing”, those activities are not always understood or applied in a common way. As a result, efficiency and effectiveness tend to suffer.

In addition to the testing language and approach, the testing strategy can also help us identify the roles of various team members in testing activities (developers, quality assurance engineers, exploratory testers, outsourced testers, etc...).

Why is this important?

It allows us to communicate effectively about testing and make sure the right kinds of testing are occurring at the right times and at the right scope. In the end, if tests are not effective or efficient, they quickly lose support and the testing effort erodes.

Next Steps:

  • With the help of the QA Tribe, develop a first pass at the testing language.
  • Define roles and responsibilities.

Tools (Technology)[edit]

Continuous Integration[edit]

This is an area that seems to be fairly well under control. There's a good amount of activity here, and people don't seem to think it gets in the way. WMF has a functioning CI environment that is also in the process of being updated. Overall, those who are aware of the future CI work being done today are very supportive.

Browser-Based Testing[edit]

The browser-based testing infrastructure is also fairly well received. The primary request is that more advance notice be given before upgrading components, as happened recently with Selenium; apparently that upgrade caused disruption that might have been avoided. The other piece of feedback is that teams would like to be able to engage with RelEng on more of a project basis rather than ad hoc pairing sessions.

Test Infrastructure[edit]

Those who have seen the progression of the testing infrastructure, such as Beta Cluster and MediaWiki-Vagrant, have very positive things to say about how far things have come. That being said, the most commonly requested improvements are:

  • Stability of MW-Vagrant and Beta Cluster.
  • Parity with production environment.
  • Broader and production-like data to test against.

Prospective Areas of Improvement[edit]

Environment Parity[edit]

I think the primary improvement would be parity of the testing environments with production. The new CI work is moving towards that, but may still be lacking parts like more realistic test data and common configuration (same extensions, etc...).

Why is this important?

Dependencies on the environment are becoming more and more common in system development. Whether it be libraries, apps, or configuration, the more the environment the system is tested in differs from the environment it's deployed in, the lower the confidence in the results.

Next Steps:

  • Explicitly define what “environment parity” means (see the sketch below for one concrete slice).
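
As one concrete slice of that definition, the sketch below compares the extensions reported by a production wiki and by its Beta Cluster counterpart through the action API's siteinfo query. The Beta Cluster hostname is my assumption and would need to be verified; configuration variables, library versions, and data would need similar checks.

<syntaxhighlight lang="python">
"""Hedged sketch: diff installed extensions between production and beta."""
import requests

PROD_API = "https://en.wikipedia.org/w/api.php"
BETA_API = "https://en.wikipedia.beta.wmflabs.org/w/api.php"  # assumed hostname


def installed_extensions(api_url):
    resp = requests.get(api_url, params={
        "action": "query",
        "meta": "siteinfo",
        "siprop": "extensions",
        "format": "json",
    }, timeout=30)
    resp.raise_for_status()
    return {ext.get("name", "?") for ext in resp.json()["query"]["extensions"]}


if __name__ == "__main__":
    prod = installed_extensions(PROD_API)
    beta = installed_extensions(BETA_API)
    print("Only in production:", sorted(prod - beta))
    print("Only in beta:", sorted(beta - prod))
</syntaxhighlight>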

Robust Test Data[edit]

This revolves around having a dataset to test against that better reflects production. I think the ideal situation is a mirror of the production dataset to test against, but that may not be practical, as the dataset for the wikis is very large.

Why is this important?

Currently, developers are required to reproduce entries in the test databases in order to do some of their testing. This is not only inefficient, but the overhead of test data setup may also result in some testing not occurring until new code is deployed.

Next Steps:

  • Investigate options for gaining access to a larger dataset (see the sketch below for one lightweight option).
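
One lightweight option, sketched below, is pulling a handful of real pages from production via Special:Export and loading them into a local MediaWiki-Vagrant wiki with the standard importDump.php maintenance script. The page titles and the /vagrant/mediawiki path are illustrative assumptions, and this obviously does not address database-level fidelity or scale.

<syntaxhighlight lang="python">
"""Hedged sketch: copy a few production pages into a local test wiki."""
import subprocess
import requests

EXPORT_URL = "https://en.wikipedia.org/wiki/Special:Export"
PAGES = ["Earth", "Moon", "Wikipedia:Sandbox"]  # illustrative sample


def fetch_export(titles):
    # Special:Export returns an XML dump of the requested pages.
    resp = requests.post(EXPORT_URL, data={
        "pages": "\n".join(titles),
        "curonly": "1",  # latest revision of each page only
    }, timeout=60)
    resp.raise_for_status()
    return resp.content


def import_into_test_wiki(xml_bytes, mediawiki_dir="/vagrant/mediawiki"):
    # importDump.php reads a dump from stdin; the path assumes
    # MediaWiki-Vagrant defaults and is an assumption.
    subprocess.run(
        ["php", "maintenance/importDump.php"],
        input=xml_bytes, cwd=mediawiki_dir, check=True,
    )


if __name__ == "__main__":
    import_into_test_wiki(fetch_export(PAGES))
</syntaxhighlight>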


Data Retention and Visualization Infrastructure[edit]

The current environment is not lacking in data. Between the CI environment, Phabricator, and various other data sources, in-flight analysis is possible. However, the environment seems to lack long-term data retention (for CI build logs/metadata). Tools like Phabricator facilitate historical data analysis as it relates to tasks, but our CI environment is focused on a 30-day window of data. The lack of long-term data retention impedes any meaningful historical analysis.

Prospective Areas of Improvement[edit]

Long-Term Data Retention[edit]

This focuses on the ability to retain and access engineering data for the long term. Part of this work is to define which data to retain, and at what level of detail. In some cases it may be adequate to store only summary metrics.

Why is this important?

Simply put, it enables change and helps measure progress. For an organization to improve itself, it needs to be able to measure itself. Today, that’s difficult, at least as it relates to software quality.

Although anecdotal information may be adequate to trigger change and improvement, it’s not enough to sustain the effort. As improvement efforts are put in place, an organization needs to be able to make decisions about whether to invest more, less, or even abandon an effort. This is difficult to accomplish without data.

Next Steps:

  • Identify what to retain.
  • Identify how to retain it (see the sketch below for one possible shape).
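
To make the "how" concrete, the sketch below archives per-build summary data into a small SQLite table so the numbers outlive the raw CI logs. The column set is illustrative; settling the real list of fields is exactly what the "identify what to retain" step is for.

<syntaxhighlight lang="python">
"""Hedged sketch: a tiny long-term archive for per-build CI metadata."""
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS ci_builds (
    build_id    TEXT PRIMARY KEY,
    job_name    TEXT NOT NULL,
    status      TEXT NOT NULL,     -- e.g. SUCCESS / FAILURE / ABORTED
    duration_s  INTEGER,
    finished_at INTEGER            -- unix timestamp
)
"""


def archive_build(db_path, build):
    # `build` is a dict whose keys match the columns above.
    with sqlite3.connect(db_path) as conn:
        conn.execute(SCHEMA)
        conn.execute(
            "INSERT OR REPLACE INTO ci_builds VALUES "
            "(:build_id, :job_name, :status, :duration_s, :finished_at)",
            build,
        )


if __name__ == "__main__":
    archive_build("ci-archive.db", {
        "build_id": "mediawiki-core-tests-1234",  # illustrative values
        "job_name": "mediawiki-core-tests",
        "status": "SUCCESS",
        "duration_s": 412,
        "finished_at": 1500000000,
    })
</syntaxhighlight>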

Visualization[edit]

Data’s great, but information is better. Visualization is about transforming data into information that can be acted on, and it is most often delivered through dashboards. The underlying goal of this effort is to create dashboards that pull together data from various sources into views people can act on.

Why is this important?

Visualizations and dashboards provide a consistent view into the data that the whole organization can understand and collaborate on. Raw data can have a high barrier to entry and may not be actionable without additional analysis of related data points. Visualizations can pull together varied data points to provide actionable information that is readily available.

Next Steps:

  • Identify metrics to gather.
  • Design the dashboard (see the aggregation sketch below).
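
As a first cut at the aggregation step behind such a dashboard, the sketch below reads the JSON Lines metrics store from the test-metrics sketch earlier on this page and computes a per-job failure rate that a dashboard panel could chart over time. The file and field names follow that earlier sketch and are illustrative.

<syntaxhighlight lang="python">
"""Hedged sketch: turn raw per-run test metrics into per-job failure rates."""
import json
from collections import defaultdict


def failure_rates(store="test-metrics.jsonl"):
    totals = defaultdict(lambda: {"tests": 0, "failed": 0})
    with open(store) as fh:
        for line in fh:
            entry = json.loads(line)
            totals[entry["job"]]["tests"] += entry["tests"]
            totals[entry["job"]]["failed"] += entry["failures"] + entry.get("errors", 0)
    return {
        job: (t["failed"] / t["tests"] if t["tests"] else 0.0)
        for job, t in totals.items()
    }


if __name__ == "__main__":
    for job, rate in sorted(failure_rates().items()):
        print(f"{job}: {rate:.1%} of test results failing")
</syntaxhighlight>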