Wednesday, December 8, 2021

What of Quality?

In our industry, we have come to use the word ‘quality’ to mean what the Quality Organization does. In other words, we have defined quality as product defect tracking, measurement, and prediction based upon testing and validation of functionality. This has been the norm for over 70 years, ever since the term ‘bug’ was introduced into our software jargon. Now is a good time to step back and ask: what are all the attributes that determine product quality? Is product quality more than defect rates over time?

From a business perspective, customers consume our software product, which creates value for them and therefore delivers revenue to us. The value delivered to the customer is what enables us to keep creating new capabilities for them. What if we tie our definition of quality to the value that we continuously improve and deliver to our customers? If you agree, then we can ask: what aspects of our process help us improve value, and hence quality? Remember, we make business decisions to fund developers to write code delivered in releases that are validated and consumed by customers. If you tend to agree with this, we could inspect Quality by inspecting the aspects that create Value, those being Business Decisions, Development, Code, Releases, Validation and Customer Usage. Let’s call these six aspects Quality Contributors.

Can we compare Quality Contributors between organizations that develop and operate their businesses in a Traditional/Sequential way and Scrum-based ones? We could deploy hordes of analysts to gather and process all types of numbers, but I am still not sure that we could solve the apples-to-oranges comparison problem.

Maybe we can use relative comparisons based upon the fundamental behaviors or trends in how Traditional/Sequential and Scrum-based organizations create Customer Value. What if the relative comparison shows that one approach is better in four of the six Quality Contributors? Would we consider quality to be higher in that approach than in the other? Maybe. Let’s try this thinking and hope that we side-step a three-to-three tie.

Before we start, allow me to specify that in this comparison, we are talking about world-class, mature Traditional/Sequential and Scrum organizations who have mastered their respective methods.  There are examples of both types of organizations who have mastered what is presented here. There are no magic-happens-here gaps in what is compared (albeit, there are gaps in how any one organization may do Traditional/Sequential and Scrum methods today). This posting does not contemplate mixed modes, like Kanban, Scrum-Fall, SAFe variations, Spiral, etc.

The Business Decisions Quality Contributor is defined to be ‘time from business commitment to customer availability, and the cost of changes during that period’. For example, a business will study the market, determine a list of required features, develop a comprehensive plan and at some point, early in development, make a business commitment to the program. This starts the clock ticking. After this time, the cost of change means the cost of disruption and adjustment to the plan. The cost of change could be large depending upon how far along the program is and how impactful the change.

The Development Quality Contributor needs a bit of an introduction before we define it. A key question is, ‘When is quality created during development?’ Some argue that quality is tested in by QA. I argue quality is validated by QA. QA doesn’t create or test in quality. Rather, quality is created in the mind of the developer once they comprehend the customer need and before they finish the last line of code. 

There is a time lag between when the developer finishes writing that last line of code and when they know that the code has been validated as ‘known-good’. If a defect is found before the code is known-good, the developer must recall precisely the customer need and their code to correctly fix the defect. This period is the ‘Risk of Recall’. We know that developers who are distracted, even for short periods of time, must expend effort to re-engage the creative process. We know that the longer the time between creation and recall, the more intense the effort to recall the details and fidelity of the work. Erroneous recall can create follow-on defects that are unintentionally added with the fixes. At the point of GA, the developer knows that their code is known-good. At this point, they have the ‘Freedom to Forget’.

For this paper, the Development Quality Contributor is ‘the magnitude of Risk of Recall over time until the developer reaches the Freedom to Forget point’.
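One way to picture this definition is to add up, for each piece of work, how long it waits between being finished and being validated as known-good. The sketch below is only an illustration under that assumption: the function name `recall_exposure`, the week numbers, and the choice of a simple sum are all hypothetical and not a measurement from any real organization.

```python
def recall_exposure(finished_at, known_good_at):
    """Return (total, average) time that finished work waits to become known-good.

    Both arguments are lists of times (weeks here, an assumption), one entry per
    piece of work. The wait is the window in which a defect would force a recall.
    """
    waits = [good - done for done, good in zip(finished_at, known_good_at)]
    if not waits or any(w < 0 for w in waits):
        raise ValueError("each item must be validated at or after it is finished")
    total = sum(waits)
    return total, total / len(waits)


# Hypothetical data: four items finished in weeks 1 to 4.
# Case 1: everything is validated together in week 10.
late_validation = recall_exposure([1, 2, 3, 4], [10, 10, 10, 10])
# Case 2: each item is validated one week after it is finished.
quick_validation = recall_exposure([1, 2, 3, 4], [2, 3, 4, 5])
```

With these made-up numbers, the late-validation case accumulates 30 item-weeks of exposure against 4 for the quick-validation case, which is the shape of the difference the definition is meant to capture.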

Let’s go to the next Quality Contributor, Code. All code has defects and defect densities. Let’s break the defects out into two groups. Code quality and defect density studies from the 1970s and 1980s found that, based upon the original code’s design and implementation, there is a ‘basal’ rate of defect discovery that is relatively unchanged over time. Even with intensive testing and defect fixing efforts, basal defect rates remain constant. The only method that fundamentally changes a component’s basal defect rate is to re-design and re-implement the component from scratch. What those studies don’t guarantee is that the basal defect rate will be lower after the re-design and re-implementation. In fact, it could be higher. The basal defect rate is the first component of the Code Quality Contributor. There are also ‘development’ defects, which are added during development and discovered during validation before the code is known-good. The rate of discovery of development defects is the second component of the Code Quality Contributor.

The Code Quality Contributor is ‘the development defect rate plus the basal defect rate over time’.

The Release Quality Contributor is how much work accumulates in a partially done state that must be completed before the release is shipped, that is, work in process (WIP) over time. Let’s look at manufacturing for a comparison. Assume a business assembles the same product at two factories. Each factory has four manufacturing steps and a capacity of 1,000 products per month.

Factory A first assembles the month’s 1,000 subassemblies through Step 1. Factory A then takes the 1,000 subassemblies through Step 2, then through Step 3. Hopefully, by the end of the month, Factory A finishes the 1,000 products by completing Step 4. Factory A’s queue depth is 1,000 at each step in the process. Factory B, however, takes one subassembly through Step 1 and moves it to Step 2 before starting another at Step 1. Factory B continues this process as the subassembly moves to Step 3 and Step 4. Factory B’s queue depth at each step is one. Hopefully, Factory B ships 250 products each week to reach 1,000 products shipped by the end of the month. Factory A accumulates WIP that peaks at 1,000 subassemblies during the month. Factory B has at most 4 WIP subassemblies at any moment in time. While both Factory A and Factory B ship 1,000 products per month, lower WIP is considered consistent with higher quality. If a process or material defect is discovered, say during Step 4, Factory B with its lower WIP will discover the defect sooner, total rework is lower, and waste is minimized.
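The two factories can be sketched as a small simulation. This is a minimal illustration under simplifying assumptions of my own: every step takes one ‘tick’ per batch, a tick is a move between steps rather than calendar time, and the function name `simulate_flow` is hypothetical. It reports the peak WIP and how many units have already been started by the time the first unit reaches the last step, which is the point where a Step 4 defect could first be noticed.

```python
def simulate_flow(total_units, steps, batch_size):
    """Return (peak_wip, units_started_when_first_unit_reaches_last_step)."""
    stations = [0] * steps          # units currently sitting at each step
    not_started = total_units
    finished = 0
    peak_wip = 0
    started_at_detection = None

    while finished < total_units:
        # Units that were at the last step are now finished products.
        finished += stations[-1]
        # Move every batch one step downstream, last step first.
        for i in range(steps - 1, 0, -1):
            stations[i] = stations[i - 1]
        # Release the next batch into Step 1.
        release = min(batch_size, not_started)
        stations[0] = release
        not_started -= release

        peak_wip = max(peak_wip, sum(stations))
        if started_at_detection is None and stations[-1] > 0:
            started_at_detection = total_units - not_started

    return peak_wip, started_at_detection


factory_a = simulate_flow(total_units=1000, steps=4, batch_size=1000)
factory_b = simulate_flow(total_units=1000, steps=4, batch_size=1)
```

Under these assumptions Factory A peaks at 1,000 units of WIP and has started all 1,000 units before anything reaches Step 4, while Factory B peaks at 4 and has started only 4.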

As in manufacturing, product development cycles also carry WIP. WIP is the partial work accumulated until the product is ready for shipment to customers. WIP in development, as in a factory, is an indicator of quality and represents risk due to the unknown work remaining. In development, we don’t have discrete steps that are as easily discernible as manufacturing steps. Nor are we able to inspect software WIP as easily as subassembly WIP. To ensure that we have finished all WIP, we run validation tests to confirm that no WIP remains before the product goes to market. We are able to discern the effort placed in development (WIP building) and the amount of validation completed (WIP decreasing) until the release is back at known-good.

We will use the definition ‘development effort less validation effort until known-good is reached over time’ as our Release Quality Contributor.
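Read literally, this definition is a running balance: cumulative development effort minus cumulative validation effort, tracked period by period until it returns to zero (known-good). The sketch below shows that arithmetic only. The function name `wip_curve` and the effort figures are hypothetical, chosen to show the shape of a large-batch and a small-batch curve rather than to represent real data.

```python
from itertools import accumulate


def wip_curve(development, validation):
    """Running WIP per period: cumulative development minus cumulative validation."""
    net = [dev - val for dev, val in zip(development, validation)]
    return list(accumulate(net))


# Hypothetical effort units per period.
# Large batch: three periods of development, then one period of validation.
big_batch = wip_curve([3, 3, 3, 0], [0, 0, 0, 9])
# Small batch: each unit of development is validated in the following period.
small_batch = wip_curve([1, 0, 1, 0], [0, 1, 0, 1])
```

Here the large-batch curve climbs to 9 before dropping back to zero, while the small-batch curve never rises above 1 and returns to zero twice in the same span.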

Validation is the process by which we come to know that the product is known-good and ready for customer consumption. Validation proves that all WIP has been completed, otherwise a flag (a development defect) is raised for development to ensure that the work gets completed before shipment. Let’s assume that Traditional/Sequential and Scrum-based validation processes are rigorous and equally funded. We will inspect the utilization of validation resources and frequency of validation stalls. A validation stall is when the validation process is stopped and reverted to a previous state due to a failure in the product under test. Validation stalls create inefficiency of testing and, potentially, schedule impacts.  Resiliency of the validation process can minimize validation stalls, and our assumption is that the validation process is world class. However, defects will periodically halt validation. 

The Validation Quality Contributor is ‘the percentage of validation resource utilized and rate of validation stalls over time’.

‘Code Currency’ is a major industry topic. Code Currency is defined as the percentage of customers who operate ‘current’ or GA(n-1) code. (If Finance was hoping that currency was about making more money with the code, they were disappointed.) There are two key quality reasons that customers should always be using the latest code. The most recent release has the benefit of the most mature version of the validation process, and it has the most fixes that address the basal defects. The customer, even without using a single new feature, has a better-quality product. Of course, the new features are an added benefit. Any delay in adopting the newest release needlessly lowers a customer’s perception of the product’s quality.

The last Quality Contributor is Customer Usage based upon Code Currency, where we measure the rates of customers’ adoption or usage of the most recent release.
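As a measurement, Code Currency is simply the share of the install base running the current release. A minimal sketch, assuming we have one release label per customer; the function name `code_currency` and the release labels such as "GA7" are hypothetical:

```python
def code_currency(installed_releases, current_release):
    """Percentage of customers running the current GA release."""
    if not installed_releases:
        raise ValueError("need at least one customer")
    on_current = sum(1 for release in installed_releases if release == current_release)
    return 100.0 * on_current / len(installed_releases)


# Hypothetical install base: one entry per customer.
install_base = ["GA7", "GA7", "GA6", "GA5", "GA7"]
currency = code_currency(install_base, "GA7")
```

In this made-up install base, three of five customers are on the current release, so Code Currency is 60%. Tracking that figure release over release gives the adoption rate used in the comparison.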

One final point: I use GA(n) to refer to the current release under development, GA(n-1) for the previous release, and GA(n+1) for the next release.

For a Traditional/Sequential business, I’m going to fix the period between GA releases to be 3 quarters with commitment for the GA(n+1) happening a quarter before GA(n) ships. The planning horizon is approximately a year. The Traditional/Sequential business development processes are all mature and best practice. 

For a Scrum business, I’m going to fix the sprint at a two-week duration, with the teams delivering GA(n) at the end of the sprint. GA(n+1) happens two weeks later. The Scrum business is mature in doing correct grooming, modern code validation and state-of-the-art deployment techniques for either as-a-service delivery or enterprise software (yes, there is enterprise software being updated every two weeks... just watch your laptop do its Windows thing or watch Amazon deploy their AWS services in their data centers).

Ready to start comparing Traditional/Sequential and Scrum-based organizations? At the end of each comparison, I’ll declare whether the Traditional/Sequential or the Scrum-based business wins, or whether it is a draw, and why.

Business Decisions Quality Contributor

A Traditional/Sequential business commits to the GA(n+1) release one quarter before GA(n). The period for potential re-planning is 4 quarters, and the cost of a re-plan increases as WIP builds because the plan is already in execution and the code is in a state of partial implementation. Changes in plans mean going back and rooting out partial work while adding in new work. The cost of a re-plan after Functional Complete is lower because there is too little time left for new development, so the rational options are limited. Another cost of change arises when a narrow, isolated change impacts the GA(n+1) date. In this case, all other work is delayed simply because the release is delayed. If a competitor does something near mid-cycle of GA(n+1), the business must either make the most costly change to GA(n+1) or wait to respond in the next release, in this case 1.5 years out.

A Scrum business spends extensive resources grooming work so that all scrum teams can complete the customer increment GA(n), however small, within a two-week sprint. Changes in plans impact the previous grooming and, depending on the degree of the change, impact GA(n+1) and later releases. An isolated change in one team does not affect the other teams in delivering GA(n) or the subsequent GA(n+1).

If a competitor does something mid-cycle of a sprint, the team can take on grooming and tradeoffs to decide when to phase it into the teams’ work plans. While that is happening, teams continue to complete customer value increments in GA(n).

Comparing the two, the Traditional/Sequential business has a longer period of re-planning and cost impacts on a release simply because of the longer release time. The intensity and cost of grooming is higher in Scrum, and with a change some or all of the grooming can become waste. One could argue that these are roughly equal. The difference that surfaces is how much of the existing work or customer value goes to market and how quickly the future work can be redirected and brought to market. In this comparison, the Scrum business has the advantage, with teams creating small increments and all increments going to market at the end of the sprint, GA(n). The increments reflecting the changes will get to the market quicker, in the GA(n+1) and GA(n+2) sprints.

Development Quality Contributor

Traditional/Sequential development starts doing work as early as, if not before, the commitment agreement, a year before GA(n+1) releases. The developer’s Risk of Recall starts low and rises continuously until GA(n+1) happens. All developers hit their peak Risk of Recall just before release because validation could uncover a development defect at any time. Even at GA(n+1), they have already started development on GA(n+2), so they never truly hit a Freedom to Forget point at GA. While they will eventually reach a Freedom to Forget point for a specific piece of work when it releases, there is always WIP, so they never have a point in time during the release where they can completely forget.

Scrum teams will start doing work right after the sprint review meeting. They must return their code to GA quality (known-good) with each completed user story; this can happen multiple times within a sprint. Their Risk of Recall rises and falls with each user story implemented and completed.

The Development Quality Contributor is significantly better with Scrum as developers can be focused on one customer increment until done at known-good. Once done, they can freshly take on the next piece of work with the freedom to forget the previous story’s work.

Code Quality Contributor

For simplicity, let’s assume post-GA code for both Traditional/Sequential and Scrum businesses has roughly the same basal defect rate. I’m happy to reconsider this assumption if there’s research showing one to be materially different from the other.

Focusing on development defect rates, the large-batch, large-WIP nature of a Traditional/Sequential business, with validation happening late in the release plan, means an on-going buildup of development defects in the code. These defects are uncovered when QA fires up validation. As validation progresses, the development defect rate spikes. Developers focus on fixes. The development defect rate then drops until it reaches zero for the final validation.

A Scrum business has a near-constant development defect rate. The reason is that test pressure is constantly being applied to the code and developers must return the code to known-good continuously. There is no on-going buildup of development defects. The feedback cycles to developers are much quicker. Any newly introduced defects are readily surfaced and fixed.

Some will give me eye-rolls when I say that, while Traditional/Sequential development has spiky defect rates, the net number of bugs found and fixed will be similar to Scrum development over the same period of time. Looks like we’re heading to a tie on the Code Quality Contributor by admitting this, right? But wait. While the number of development defects of both methods may be the same over time, the faster time to detection and fix with Scrum means that a higher-quality fix is available sooner, with fewer propagating impacts to other teams.

Additionally, over the course of a Traditional/Sequential development effort, there will be scope change; the team may spend time finishing WIP and fixing defects that are no longer important to the release. With Scrum, by contrast, the scope change pushes the affected product backlog items down, so the team won’t have defects for the now-deprioritized feature because it was never created. The Code Quality Contributor is better in Scrum than in Traditional/Sequential.

Release Quality Contributor

A Traditional/Sequential business builds WIP from commitment until functional complete and starts to burn down WIP as validation takes hold. The WIP buildup lasts multiple quarters, with a quarter or so at the back end to burn WIP down as the product returns to known-good at GA(n). In other words, once development starts on the release, the code is in a known-bad state until GA(n), where it momentarily returns to known-good.

In Scrum, the WIP is extremely small and contained within each Scrum team. They return the quality of their code back to known-good multiple times within a sprint and always before a story is ‘done’. In other words, the product is always expected to be known-good with short, frequent windows where a team has the code known bad.

The Release Quality Contributor is better in Scrum due to lower WIP and the greater share of time the product spends in a known-good state.

Validation Quality Contributor

In a Traditional/Sequential business, given the size and duration of WIP and the development defect rates, the impact on validation stalls is significant. Even with state-of-the-art validation and resources, the code is untestable during development and early validation. Validation resources must wait until the code reaches functional complete, and even then there are validation stalls due to the spike in development defect rates. In a Scrum business, the same factors have an equal but opposite effect: the small size and short duration of WIP, and the steady development defect rate, keep validation stalls low. Because the code is always kept in a near known-good state, validation can run non-stop. Even the most minor stalls are noticed immediately and have major impacts, so teams are constantly developing ways to keep the system operational even when their updated code fails. (Read up on continuous deployment with blue/green operations as examples.)

Validation Quality Contributor is better in Scrum.

Customer Usage Quality Contributor

According to some Traditional/Sequential business Quality organizations, after GA(n-1) release is declared ‘target code’ the reasonable adoption rate of that type of code is approximately 15% per quarter. It takes the time needed to declare the GA release ‘target code’ plus up to 6 quarters to hit 100% adoption; approximately 2 years. Why so slow? I would argue because Traditional/Sequential businesses have conditioned customers to expect releases won’t work until they fix a few remaining bugs post GA or ‘target code’, and these organizations often make upgrades visible, opt-in, carefully planned, and resource intensive events. 

Scrum businesses push adoption to 100% within a sprint post GA. That’s a two-week period. If you want to account for blue/green and phased automated pushes, 100% adoption is achieved within two sprints, or four weeks, post GA. Why so fast? I would argue that it is because the code is always kept in a known-good state, always under continuously improving validation pressure, and under near 100% adoption with automated update, rollback and phased usage, so customers never see the impacts of the failures.

Customer Usage Quality Contributor is better in Scrum.

My tally shows 6 to 0 in favor of the Scrum business. If you are wondering whether I stacked the deck or faked the comparisons, the answer is an absolute no.

What has happened is that Scrum and other innovations have created a self-reinforcing positive cycle where improvement in one Quality Contributor reinforces improvement in another. While any one Quality Contributor may only be incrementally better, the combination of all six creates a powerful new dynamic in quality. Scrum businesses deliver demonstrably higher quality customer value every time.