Cloud Savvy

Monday, April 11, 2022

Strategy, Backlogs, Sprints and Scrum

Every business operates in a market and develops a business strategy to guide its way forward to capitalize on its strengths and opportunities. The challenge is how to operationalize the strategy in meaningful steps forward to achieve strategic goals as a single operating business.

A company can focus on constructing Scrum@Scale around a structured, hierarchical product backlog that supports the business strategy. The business strategy is supported by a small number of Strategy outcomes. The Strategy outcomes are supported by Theme outcomes. Themes are broken down into Epics, and further into User Stories. Scrum Teams use this backlog context to plan their sprints and create the highest customer value possible.

Scrum is focused on short sprints, each delivering a small increment of customer value. The focus of a Scrum-based organization tends to be on ensuring that Scrum Teams are always productive. With two-week sprints and the unrelenting drumbeat of sprint planning, daily standups, sprint reviews, retrospectives and grooming, all under the pressure of burn-down charts and accelerating velocity, organizations often become extremely tactical and struggle to get above the fray.

Organizations that become this focused on the urgent needs of the Scrum Teams quickly lose sight of the business strategy and long-term customer needs. Enter the Executive MetaScrum, or EMS! The company and Scrum@Scale empower the EMS to get above the fray and guide the Scrum organization toward fulfilling the business strategy. However nice this sounds, leaders struggle to translate strategic direction into two-week increments for the Scrum Teams.

Before jumping into more details on how strategy and Scrum work together, allow me to develop an example of a business strategy and groomed backlog for a moment.

Let’s say that the business strategy calls for shifting the business away from a purchased software license model to a subscription-based, software-as-a-service (SaaS) model. Let’s say that the EMS has groomed a Strategy outcome, roughly one year out, to get 80% of purchased software license customers using a new SaaS-based model. This includes monitoring, support and an update tool that automates defect queries, recommends updates, assesses the likelihood of a successful upgrade and, eventually, guides the upgrade process. This Strategy outcome is the first of several Strategy outcomes on the backlog that continue the movement of purchased software toward SaaS. But this also means that any further Strategy outcomes are at least a year away.

The business grooms the Strategy outcome into the following Theme outcomes. The Theme outcomes are abbreviated here; written out in full, each would be expressed as:

‘As a Kubernetes cluster admin overwhelmed by the complexity of dealing with so many applications, I only have 1/100 of my cycles available for any one on-prem application to be updated and need <some customer outcome>, and I know that I’m done when <some customer acceptance criteria>.’

For brevity, we have: A) Provide Kubernetes Admin Real-time monitoring and support information, B) Provide Kubernetes Admin update decision-making insights, and C) Take over Kubernetes Admin update tasks. These are then broken down into the following Epic outcomes, abbreviated here as: A1) SaaS set up (basic accounts, licensing access, customer information), A2) SaaS customer connectivity (SaaS to on-prem SW communication), A3) SaaS defect connectivity (SaaS to defect DB with correct filters), A4) SaaS update connectivity (presents basic information to customer on updates), A5) SaaS support chat, A6) SaaS defect search, A7) SaaS update search, A8) SaaS account resets, A9) SaaS customer signup promotion discount, A10) SaaS customer survey, A11) SaaS usage metrics, B12) SaaS defect notification recommendation based upon installed SW, B13) SaaS update recommendation based upon installed SW, B14) SaaS code currency metrics, B15) SaaS recording of on-prem SW update metrics, C16) SaaS auto-recording of on-prem SW update failures, C17) SaaS basic risk assessment of on-prem SW upgrade, and C18) SaaS basic ‘guided’ update of on-prem SW. Whew, that’s a lot of work to be done, isn’t it?

I’ll spare the ongoing details of breaking the Epic outcomes into specific User Stories and tasks, but you can start to imagine various Scrum Teams from SaaS development, on-prem SW development, IT operations and support development involved in the ongoing breakdown of work into customer story increments, sprint goals and sprint backlogs. As this work fans out across the organization, the complexity grows. All attention, from the business down to the individual engineers, is consumed by the details.

Not bad, right? You can see how we would go about getting the Strategy done. Start with Theme A. Do A1, then A2, A3, A4, A5, etc. Follow with Theme B. Do B12, B13, etc. Then on to Theme C and its Epics. Easy.

Sequencing Theme A with its Epics (A1, A2, etc.), then Theme B and its Epics, seems like obviously good planning. Out-of-order execution of parts of Themes and Epics seems to defy good predictive planning and good hierarchical structure. But no one is that good at planning and grooming at the outset; otherwise, we’d all retire at 30. Since we’re not that good at planning and grooming, expect to see a lot of out-of-order execution, as well as refactoring as we go. Still easy, right?

Well, it’s worse. Since when do we have only one Strategy outcome and three Themes? The reality is that the business has 3 to 7 Strategy outcomes and upwards of 50 Themes, all happening in parallel. So, the on-prem SW Scrum Teams have two or more other Strategy outcomes and maybe 30 Theme outcomes, as do the SaaS Scrum Teams, the IT Scrum Teams, etc. The various stories, dependencies, short-term work and longer-term work start to interact, forming a complex array of customer story increments, sprint goals and sprint backlogs. Yikes!

Let’s also not forget about sales deals, revenue pressures, demanding customers, longer-term architecture projects and technical debt. These, too, press for attention. This adds to the near-term focus pressure and takes attention away from the business strategy. Does this seem real enough yet?

Strategies can be simple. They can be complex. Depending upon our abilities, the execution can be straightforward or prolonged. Depending upon our competitors, the results can be extremely rewarding, muted and/or confounded. Let’s use some games to illustrate these points: tic-tac-toe, checkers and chess.

Tic-tac-toe, also known as ‘Noughts and Crosses’ or X’s and O’s, is a paper-and-pencil game for two players, X and O, who take turns marking the spaces in a 3×3 grid. The player who succeeds in placing three of their marks in a horizontal, vertical or diagonal row is the winner.

Draughts, or checkers, is a group of two-player strategy board games involving diagonal moves of uniform game pieces and mandatory captures by jumping over opponent pieces. The most familiar version is played on a checkered board with 64 squares arranged in an 8×8 grid.

Chess is a two-player strategy board game played on a checkered board with 64 squares arranged in an 8×8 grid. Play involves no hidden information. Each player begins with 16 pieces: one king, one queen, two rooks, two knights, two bishops, and eight pawns. Each piece type moves differently, with the most powerful being the queen and the least powerful the pawn. The objective is to checkmate the opponent's king by placing it under an inescapable threat of capture.

In tic-tac-toe, the strategy is simple. The moves are simple. When the marketplace is simple and first-mover advantage is in play, focusing everything on the first and second moves determines the winner. There are at most 9 moves in the game, so each move is a big decision relative to the whole game. In this case, having one Strategy outcome, a few Themes and everyone focused on one or two major outcomes is best. Very straightforward, but not a typical business reality for big companies.

In checkers, there are more pieces, and the game board is bigger. However, movements are restricted, and capabilities must be earned. Checkers requires strategy and initiative refinements throughout the game. Strategy can be bold at the beginning, until the opponent’s moves take effect. Moves are small and incremental, much like a sprint. Strategies are brought about by a series of incremental moves. Checkers is like a complex business market because players have a strategy and potentially one or more outcomes they are trying to bring about. However, it is also unlike a complex business market, because products have nearly unlimited flexibility in how they are brought about and how they operate in the marketplace.

Are you noticing the pattern? The game board, opponent, strategy, play outcomes, pieces and moves are akin to a market, competitors, strategy, Strategy outcomes, Theme outcomes, and sprints delivering incremental customer value, respectively.

Let’s keep this pattern in mind as we look at chess. Chess is like checkers, except that its six piece types have different capabilities. Pawns are numerous but are restricted to forward movement, usually one space at a time. The queen is the most powerful, as she can move any number of spaces in any direction along a straight line per move. The other pieces have capabilities in between these two. The game strategy must take the pieces’ capabilities into account and engage them smartly. Strategy and outcomes are constantly monitored and adjusted as the other player makes moves. Moving a piece to a desired location may first require moves of other pieces before the desired piece can even move, during which time the opponent is also making moves. This complexity is more like a business market and its competitors. While we want to deliver certain outcomes, we’re inhibited and/or enabled by our capabilities, and we must keep certain pieces on the board viable as we develop the next plays for other pieces yet to be engaged in the game. We may be focusing on one area of the game board while our competitor is focused on another. While moves are visible, the strategy behind them is invisible and constantly being re-evaluated. Every play, every sprint, seems minor but is important for the strategy of the long game.

Like all analogies, the chess game isn’t a perfect comparison to a real business strategy or market, but it illustrates well the value of constantly re-evaluating strategy and the cost and necessity of strategic change. It also demonstrates well how minor moves can upset an opponent’s strategy and/or outcomes. The most important lesson here is that chess provides a well-known learning framework for explaining how strategy flows into Strategy outcomes, Theme outcomes, Epic outcomes and Stories (completed incrementally with each sprint), and how these can build into a powerful market force.

Back to our earlier example, where our Strategy outcome is ‘to get 80% of purchased software license customers using a new SaaS-based model, within a year’. Let’s define our Strategy S6 outcome as follows: a monitoring, support and update tool that automates defect queries, recommends updates, assesses the likelihood of a successful upgrade and, eventually, guides the upgrade process.

Adding complexity, let’s say that we have 5 other Strategy outcomes, S1 to S5. For ease, each Strategy outcome relates to a specific software offering that we already have (S1 to S5) or will have (in the case of S6). There’s no need to specify the S1 to S5 Theme and Epic outcomes in detail here, except to note that their Themes, Epics, etc. are of similar complexity to those of S6.

If we continue with the chess analogy, our marketplace is the chess board, and our major competitor is our opponent. We and our opponent have our products and/or offerings in the marketplace and are investing in them. The changes in the products reflect the piece moves.

Imagine that our offerings are reflected by the Strategy outcomes that we want to achieve over the course of the year (the game period). The S1 to S6 outcomes are the positions of our six key pieces in the marketplace by year end, meaning capability in position to take action in the coming year. (I’m using a year as the game play horizon for this post.)

Now, we must move our offerings to their associated Strategy outcomes through a series of moves based upon our capabilities, in the given period. While our business will worry about each Strategy outcome, for the purposes of this post, we are going to let S1 to S5 randomly impact our ability to get S6 in position by year end (the Strategy outcome). Remember, our S6 outcome will also be affected by our competitors’ moves.

The last part of the setup is the moves. A move is taken at the end of a two-week sprint, with every Scrum Team creating Definition of Done-Production Ready (DoD-PR) increments. This completeness of increments is critical to the game, because partial moves, or the inability to move a piece when needed, would undermine the game. Equally, because we’re doing mature Scrum, each offering is ready to be moved at any moment in time. We are always at production-ready quality at the end of each sprint cycle for all offerings.

The point of small increments being DoD-PR at the end of each sprint for all offerings cannot be stressed enough here. Having work in progress (WIP) prevent us from moving forward on the strategy would be akin to moving a chess piece back and forth on the game board because that’s the only move available. Consider the cost of these wasted moves during the game.

Let’s look at how we’ll bring about our Strategy S6 outcome during the year using two-week sprints, or moves.

Remember our S6 Outcome is: ‘80% of purchased software license customers using a new SaaS based monitoring, support and update tool that automates defect queries, recommends updates, assesses the likelihood of a successful upgrade and, eventually, guides the upgrade process.’

S6 THEMES are:

A) Provide Kubernetes Admin Real-time monitoring and support information

B) Provide Kubernetes Admin update decision making insights

C) Take over Kubernetes Admin update tasks

S6 EPICS are:

A1) SaaS set up (basic accounts, licensing access, customer information)

A2) SaaS customer connectivity (SaaS to on-prem SW communication)

A3) SaaS defect connectivity (SaaS to defect DB with correct filters)

A4) SaaS update connectivity (presents basic information to customer on updates)

A5) SaaS support chat

A6) SaaS defect search

A7) SaaS update search

A8) SaaS account resets

A9) SaaS customer signup promotion discount

A10) SaaS customer survey

A11) SaaS usage metrics

B12) SaaS defect notification recommendation based upon installed SW

B13) SaaS update recommendation based upon installed SW

B14) SaaS code currency metrics

B15) SaaS recording of on-prem SW update metrics

C16) SaaS auto-recording of on-prem SW update failures

C17) SaaS basic risk assessment of on-prem SW upgrade

C18) SaaS basic ‘guided’ update of on-prem SW.

For simplicity, I have clocked the S1 to S6 position changes for each sprint, or move. Any one of these could take more or less time, but you can see the pattern laid out to bring about the Strategy outcomes.

Strategy	Position	Moves
S6	1st	Get A1, A2 done by the SaaS and IT Scrum Teams
	2nd	Include SaaS A) in all S1 to S5 offerings’ licenses and drive volunteer customers to sign up
	3rd	Get A3, A4 and A8 done
	4th	Get A6, A9 and A10 done; include a SaaS A) discount if customers sign up for an account to use the defect DB tool, an incremental discount for update usage, and a final discount for automated updates.
	5th	(and next position for S1) Get A6 done for offering S1
	6th	(and next positions for S2 and S3) Get A6 done for offerings S2 and S3
	7th	Get A5 done; drive usage as high as possible by offering a support discount for customer defect queries
	8th	(and next positions for S4 and S5) Get A6 done for offerings S4 and S5
	9th	Get A11 and B13 done.
	10th	(and next position for S1) Get B13 and B12 done for S1.
	11th	(and next position for S2 and S3) Get B13 and B12 done for S2 and S3
	12th	(and next position for S4 and S5) Get B13 and B12 done for S4 and S5
	13th	Get B14 and B15 done
	14th	(and next position for S1) Get B15 done for S1
	15th	(and next position for S2 and S3) Get B15 done for S2 and S3
	16th	(and next position for S4 and S5) Get B15 done for S4 and S5, get B13 and C16 done
	17th	(and next position for S1) Get B13 and C16 done for S1
	18th	(and next position for S2 and S3) Get B13 and C16 done for S2 and S3
	19th	(and next position for S4 and S5) Get B13 and C16 done for S4 and S5, get C17 done
	20th	(and next position for S1) Get C17 and C18 done for S1
	21st	(and next position for S2 and S3) Get C17 and C18 done for S2 and S3
	22nd	(and next position for S4 and S5) Get C17 and C18 done for S4 and S5

Did you keep track of all the moves and complexity in the 22 sprints? It’s pretty difficult, but if you tick off the Themes and Epics, you’ll see that by year end, S6 is in good shape to fulfill the outcome.

Now, try to imagine a situation anywhere above where we are unable to deliver our next move. Having work partially complete (WIP) does this. Then imagine this WIP accumulating and preventing other pieces from moving. See how much more complex the sequencing becomes. Therefore, DoD-PR and small increments must be reached and maintained across the Strategies and Scrum Teams.
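To see how a single slipped move ripples outward, here is a minimal Python sketch of a longest-path schedule over Epic dependencies. The prerequisite graph is hypothetical (the post does not say which Epics depend on which), and the sketch assumes each Epic takes one sprint to reach DoD-PR, plus one extra sprint for every sprint it sits as WIP, with enough Scrum Teams to work independent Epics in parallel.

```python
# Hypothetical prerequisites: an Epic can start only after these are DoD-PR.
PREREQS = {
    "A1": [],
    "A2": ["A1"],
    "A3": ["A2"],
    "A4": ["A2"],
    "A6": ["A3"],
    "B13": ["A4", "A6"],
}


def finish_sprints(prereqs, wip_sprints):
    """Return the sprint in which each Epic reaches DoD-PR.

    An Epic starts in the sprint after its last prerequisite finishes and
    takes one sprint, plus any extra sprints it spends as WIP.
    """
    finished = {}

    def finish(epic):
        if epic not in finished:
            start = max((finish(p) for p in prereqs[epic]), default=0) + 1
            finished[epic] = start + wip_sprints.get(epic, 0)
        return finished[epic]

    for epic in prereqs:
        finish(epic)
    return finished


clean = finish_sprints(PREREQS, {})
one_slip = finish_sprints(PREREQS, {"A2": 1})
two_slips = finish_sprints(PREREQS, {"A2": 1, "A3": 1})
for label, plan in [("clean", clean), ("A2 WIP", one_slip), ("A2+A3 WIP", two_slips)]:
    print(f"{label}: B13 reaches DoD-PR in sprint {plan['B13']}")
```

In this made-up graph, B13 lands in sprint 5 when every move is DoD-PR, sprint 6 when A2 carries WIP for one sprint, and sprint 7 when A3 also carries WIP, because everything downstream waits on the unfinished work.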

Now let’s see how much you were following.

Did you notice that A7 was not implemented? As we laid out the steps, the plan was modified to no longer give customers the ability to query for updates (feature usage was expected to be short-lived, and the feature wouldn’t drive any incremental customers to sign up). Not doing A7 is not a failure; rather, it is a saving. Doing something unnecessary is a form of waste. Seeking these refinements during the year means retiring Strategy outcomes faster.

Did you notice that A11, the usage metrics that measure the Strategy outcome, was not completed until the 9th move? While critical to management and/or the outcome, it was deemed less valuable until enough of the SaaS offering was complete and integrated with the S1 to S5 offerings’ work. If necessary, the A11 work could have been moved out further in time.
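Ticking off the Epics by hand is error-prone, so here is a minimal Python sketch that transcribes the 22 moves and reports which Epics never appear and where each first appears. Two transcription choices are assumptions: the 13th move is recorded as B14 and B15, the only reading the Epic list supports, and the 2nd move (licensing and sign-up) is treated as naming no Epic.

```python
# Epic IDs from the S6 Epic list.
EPICS = [
    "A1", "A2", "A3", "A4", "A5", "A6", "A7", "A8", "A9", "A10", "A11",
    "B12", "B13", "B14", "B15", "C16", "C17", "C18",
]

# Epics named in each of the 22 moves, in order (transcribed from the table).
MOVES = [
    ["A1", "A2"],           # 1
    [],                     # 2: licensing and sign-up drive, no Epic named
    ["A3", "A4", "A8"],     # 3
    ["A6", "A9", "A10"],    # 4
    ["A6"],                 # 5: for S1
    ["A6"],                 # 6: for S2 and S3
    ["A5"],                 # 7
    ["A6"],                 # 8: for S4 and S5
    ["A11", "B13"],         # 9
    ["B13", "B12"],         # 10
    ["B13", "B12"],         # 11
    ["B13", "B12"],         # 12
    ["B14", "B15"],         # 13 (assumed B14, see lead-in)
    ["B15"],                # 14
    ["B15"],                # 15
    ["B15", "B13", "C16"],  # 16
    ["B13", "C16"],         # 17
    ["B13", "C16"],         # 18
    ["B13", "C16", "C17"],  # 19
    ["C17", "C18"],         # 20
    ["C17", "C18"],         # 21
    ["C17", "C18"],         # 22
]


def first_move(epics, moves):
    """Map each scheduled Epic to the 1-based move where it first appears."""
    seen = {}
    for number, move in enumerate(moves, start=1):
        for epic in move:
            seen.setdefault(epic, number)
    return {e: seen[e] for e in epics if e in seen}


schedule = first_move(EPICS, MOVES)
never_scheduled = [e for e in EPICS if e not in schedule]
print("Never scheduled:", never_scheduled)
print("A11 first appears at move", schedule["A11"])
```

Run as written, it reports A7 as the only unscheduled Epic and A11 first appearing at move 9, which matches the two observations above.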

Equally, you can imagine that, as we start to work this plan, the competitor to our S2 offering makes a change, putting pressure on the business to bring our S2 offering to market sooner, even to the point of delaying other offerings to the following year.

Also, you could imagine that, while we’re clocking away incrementally, S5 work is always highly problematic due to its extreme complexity. The plan could be reworked to move all S5 work beyond the S6 outcome period to isolate this risk. We could even go further by deciding to stop all work on S5, effectively sacrificing S5 to market forces in order to ensure the S1 to S4 and S6 outcomes.

Next, imagine that the other Strategy outcomes have equally detailed sequences of moves, internal difficulties and competitors’ moves to consider. You can see why, as we engage the marketplace competitor, we must continuously reconsider the moves, their ordering and/or the Strategy outcomes themselves.

What makes this all possible is that the ordering is thought through and written down, moves are small production increments, and when change is needed, the context and impacts are readily understood. The changes can be communicated with the rationale across all Scrum Teams consistently, so everyone knows the state of play.

Equally, you can see why the Executive MetaScrum (EMS) review of strategy and Strategy outcomes is both constant and on-going.

The EMS’s role is to reconsider, with each sprint, whether the Strategy outcomes should remain unchanged and whether the Theme outcome priorities still hold. This does not mean making changes with each sprint. The EMS must even be open to retiring a Strategy outcome when sufficient progress has been made and the business needs another Strategy outcome started.

The metametaScrums’ review of Strategy outcome fulfillment is equally engaging at their level of work. The same goes for the metaScrums at the Epic and User-Story level. You can see why each Scrum Team reaching production-ready for each User Story within a sprint helps the business’s flexibility. Equally, incremental progress on Epics aids the understanding of progress on Themes, which aids the understanding of progress on Strategies, so that each level of metaScrum makes the right decisions on retirement, changes and bringing forward new outcomes.
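The roll-up of progress from Stories to Epics, Themes and Strategies can be sketched as a small tree in Python. This is an illustration, not a Scrum@Scale tool: the `Outcome` class and `story` helper are hypothetical, the sketch assumes every User Story counts equally, and a story is only credited once it is production-ready. The S6 stories below are made-up placeholders.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Outcome:
    """A backlog item at any level: Strategy, Theme, Epic or User Story."""

    name: str
    children: list[Outcome] = field(default_factory=list)
    done: bool = False  # used only for leaf User Stories (DoD-PR reached)

    def stories(self):
        """Yield every leaf User Story beneath (or equal to) this outcome."""
        if not self.children:
            yield self
        for child in self.children:
            yield from child.stories()

    def progress(self) -> float:
        """Fraction of leaf User Stories that are production-ready."""
        leaves = list(self.stories())
        return sum(s.done for s in leaves) / len(leaves)


def story(name, done=False):
    """Convenience constructor for a leaf User Story."""
    return Outcome(name, done=done)


# Hypothetical slice of the S6 backlog; story names are placeholders.
s6 = Outcome("S6", [
    Outcome("Theme A", [
        Outcome("A1", [story("A1-1", True), story("A1-2", True)]),
        Outcome("A2", [story("A2-1", True), story("A2-2")]),
    ]),
    Outcome("Theme B", [
        Outcome("B12", [story("B12-1"), story("B12-2")]),
    ]),
])

for theme in s6.children:
    print(f"{theme.name}: {theme.progress():.0%}")
print(f"{s6.name}: {s6.progress():.0%}")
```

Here Theme A reports 75%, Theme B 0% and S6 50%. Weighting by story count is the simplest choice; a real backlog might weight by story points or by value instead.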

While in our example we set the Strategy S6 outcome to ‘80% adoption’, you could see that goal being raised or lowered during the year, based upon strategy and/or changes to other Strategy outcomes. Additionally, had we hit 90% adoption with just Themes A and B, we might have delayed the work on Theme C and pushed out the associated discounts to help profit margins in the current year.

This raises the point that often Strategy outcomes are ‘gray’ because of the time horizon (a year out) and unknown nature of the game play during the year. While 80% is concrete, the EMS retains the ability to adjust the goal and/or outcomes as the year plays out.

Does this mean that Annual Planning cycles are meaningless to the business? No. Annual Plans remain critical to set annual goals and align various functions, like Sales, to reduce ongoing thrashing and churn. Annual Plans do change, albeit infrequently, as the year progresses and new understanding of reality emerges. You can imagine that, while Annual Plans may change infrequently, Strategy outcome changes might occur a bit more frequently, Theme outcome changes much more frequently and Epic outcome changes very frequently. The key is constructing a systematic and meaningful way of assessing, controlling and communicating the changes.

Finally, let’s head back to tic-tac-toe and checkers. While these games are simpler and their winning strategies are easier to master, there is less flexibility to tolerate mistakes or to overcome obstacles as they materialize. The challenge of chess is that its flexibility derives from the complexity of bringing about the outcome. Chess requires a deeper comprehension of the game, the pieces, the strategy and each move. Winning in a multi-billion-dollar marketplace requires a similarly deep comprehension.

Every move must count!

Friday, April 8, 2022

T-shaped Development Model

(Note: this posting is written in story format. The situation described did occur.)

‘It cannot be done. We can prove it to you. When is your next visit?’ They sounded almost giddy. My engineering team had been hard at work on their Scrum transformation. They were doing well with shift-left testing, story grooming, spikes, sprints, planning, reviews, retrospectives and agile architecture. They were solid on implementing small increments of work to known good. Their next hurdle was to completely implement and release stories within a two-week sprint. After attempting this for a fair number of sprints, they had proof that a story could not be implemented within such a small window of time. I was heading out for my routine visit, so I scheduled the meeting to hear their proof.

A bit of back story on this team. They were one of 20 Scrum teams building a private cloud offering with fully automated virtual and physical provisioning and full ITIL integration and reporting. The 20 teams were dispersed across the US and India. They had deployments in production with hosting providers and end-user customers. The product’s architecture was complex and initially built with third-party components. The team was removing the costly third-party components and replacing them with their own IP and open source to lower costs and improve integration. They were over 2 years into Scrum adoption and were reaching maturity. The team with the proof had two of the smartest engineers on the project.

As I walked into the conference room, the team was prepared. They knew that they had challenged me before and walked away with more to learn. However, this time they had everything worked out. They started to lay out their case: 2 days to do the architecture review and ensure that the solution would pass muster, 3 days for design work and working through any dependencies across components, 4 days for implementation, 2 days for documentation and 3 days for test and final integration. Add them up, and we have 14 days of work. A two-week sprint is at best 9.5 days of work. The work cannot be completed within a sprint. Even if they could cut a day from each step (except, of course, documentation, because that goes to another group), they would need 10 days. This still doesn’t fit. They went even further. They tried to break the user stories into even smaller increments. They showed how they reached the point of diminishing returns, where the overhead of architecture checks and validation did not justify the amount of work. They tried to reduce the implementation time, but they needed time to verify that the implementation could be done and to adjust the design as they implemented. They had to ensure that everything was consistent before going to documentation. They knew that they had me. They rested their case.

I responded that they were, in fact, correct. Given how they did the work, really a mini-waterfall process, they could not reduce the time any further. They started to smile. I saw a fist pump in the air from one of the junior team members. However, they could, in fact, make the work fit within a two-week sprint if they re-ordered it, even for their longest example. They stared at me. The fist dropped.

I went to the whiteboard to write. The 2-day architecture review would stand on its own. The 3 days of design work, 3 days of test development (this is known as test-driven development; I prefer test-driven design) and 2 days for documentation could, in fact, be done in parallel. The final step is doing the implementation exactly to the design and documentation, and to pass the tests, all unmodified. This adds up to 2 days plus 3 days plus 4 days, or 9 days. It fits within the 9.5 days of a two-week sprint. In their example where they trimmed a day off each step, this shortens to 6 days, well within a two-week sprint. I termed this style of out-of-order work the T-shaped development model: architecture verification is the left arm of the T; design, documentation and test make up the center of the T and are done in parallel; and implementation is the right arm of the T.
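The whiteboard arithmetic can be checked with a short Python sketch. It compares the mini-waterfall (every step in sequence) with the T-shaped ordering (architecture, then design, documentation and test in parallel, then implementation). It assumes the 3 days the team allotted to test and final integration carry over unchanged as the 3 days of test development, and it uses the 9.5-day sprint capacity from the team’s own case.

```python
# Step durations in days, as the team presented them.
STEPS = {"architecture": 2, "design": 3, "implementation": 4,
         "documentation": 2, "test": 3}
SPRINT_CAPACITY = 9.5  # usable working days in a two-week sprint


def waterfall_days(steps):
    """Mini-waterfall: every step runs one after another."""
    return sum(steps.values())


def t_shaped_days(steps):
    """T-shaped: architecture first, then design, documentation and test in
    parallel (the longest of the three sets the pace), then implementation."""
    center = max(steps["design"], steps["documentation"], steps["test"])
    return steps["architecture"] + center + steps["implementation"]


# The team's trimmed case: one day off every step except documentation.
trimmed = {k: (v if k == "documentation" else v - 1) for k, v in STEPS.items()}

for label, steps in [("as presented", STEPS), ("trimmed", trimmed)]:
    w, t = waterfall_days(steps), t_shaped_days(steps)
    print(f"{label}: waterfall {w} days (fits: {w <= SPRINT_CAPACITY}), "
          f"T-shaped {t} days (fits: {t <= SPRINT_CAPACITY})")
```

This reproduces the numbers in the story: 14 and 10 days for the waterfall ordering, neither of which fits, versus 9 and 6 days for the T-shaped ordering, both of which do. The saving comes entirely from overlapping the center of the T, so it holds only if design, documentation and test can truly proceed independently.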

They challenged.

The team learns a lot during implementation. They adjust the design, make improvements and sometimes renegotiate with the PO on the story or user-outcome. Doing the implementation at the end, exactly to the design, documentation and test, means that this cannot be done. I confirmed that the T-shaped development model would prevent this learning iteration during the implementation. I claimed that preventing the iteration is exactly the right thing to do. Scrum values transparency. If the team is making improvements hidden away from the PO and the business, how can the stakeholders, via the PO, weigh in? The T-shaped development model requires that any learning made during implementation goes into the backlog as future improvements or increments. All future changes, improvements or new features in the backlog are prioritized and groomed based upon value. If the ideas uncovered during implementation are valued sufficiently, the PO will move them up against other backlog items. If the ideas uncovered during implementation aren’t of sufficient value, they are never implemented. The team is rightly directed to higher-value backlog items. Hence, with the T-shaped development model, the team avoids the waste of delivering unvalued work/features to customers.

They challenged more.

They use the implementation to know what can be done. I pointed out that if they shift to Test Driven Design where they implement the tests (that all fail at onset, of course) based upon the design and end user documentation, they can reach a similar understanding of what should be done. Since there is no implementation yet, they can adjust the design, documentation and/or test faster. If necessary, the PO can review the documentation to verify that, when done, it meets the user outcome that she is expecting. The implementation is just done to those exacting expectations. I argued that during implementation, the engineer is 100% focused and knows exactly what is needed. There is no second guessing. The engineer knows exactly what ‘done’ is. The implementation quality will be higher.

They weren’t done challenging.

They cannot engage documentation in such a fashion. The documentation team only works on final increments and in batches from all the Scrum teams. I agreed. Now is the time to change how documentation is done. Historically, documentation owned the content, editing (editor role) and delivery mechanism. Now, documentation content is done by the Scrum team, incrementally. The documentation team takes on the role of enabler of Scrum team developed content to flow through to customers in a consistent and incremental fashion. The Documentation team moves from editor to publisher. They own and set the standard (really an aspect of architecture here) for how content flows in a coherent fashion from Scrum team to customer.

The team paused and thought. They moved to acceptance.

They agreed that the T-shaped development model, if done, would enable the team to implement stories within a 2-week sprint. To their credit, they responded that they now understood how work re-ordering could be done and would result in a viable, well-engineered outcome. They did not yet know how to do it. They agreed that they would take on this approach and figure out how to make it work.

It took the team about six months to master the T-shaped development model. They had to hammer out an agreement with documentation on this new approach. They invested in mastering TDD tools and methods. After they got the T-shaped development model working, they did achieve stories being done within a two-week sprint. They also discovered that there was leverage in their work from one story to the next. They increased their velocity. They found that they could lower the overhead of architecture, design, TDD, documentation and implementation too, because they took out the time built in for iterations and improvements. They used the backlog and grooming instead. They no longer held to the 14-day, or even 6-day, minimum cycle time dictated by their previously claimed ‘overhead’ dogmas.

Thursday, April 7, 2022

Engineering Management Is Critical To Scrum

Who’s accountable? Who’s responsible? Who’s leading? These are key questions to every organization’s basic design. In traditional, functional organizations, Engineering Managers are accountable, responsible and leading development activities. In Scrum-based organizations, the answers may be different than one might expect.

Traditional, Functional Organizations

In traditional, functional organizations, Engineering Managers are accountable, responsible and leading development activities. If there’s an issue or commitment needed, just engage the Engineering Manager to resolve it. Move up the organization as needed. Simple!

Or is it really? As we’ve discussed in the Affecting Change blog posting, traditional, functional organizations are highly complex due to their process, planning and functional specialization. Seldom are Engineering Managers solely the accountable, responsible leaders. One must consider who owns architecture, product requirements/priorities, process and personnel management to truly address who’s accountable.

Scrum-based Organizations

In Scrum-based organizations, the questions of who’s accountable, responsible and leading still exist, but their answers may differ from what one might expect. While this blog posting is focused on the Engineering Manager’s role, let’s step back to Scrum, as defined by ScrumAlliance.Org.

At the core of Scrum is the Scrum team. In Scrum, the team has been provided everything that it needs, as a self-organized team, to convert a Sprint Backlog into production-quality increments of customer value. This includes the right talent/skills, tools/methods, architecture/design and product backlog. The Product Owner sets the sprint goals and product backlog. The Product Owner is responsible for the ‘what’ and the Scrum team is responsible for the ‘how’. The Scrum Master is responsible for making sure that the Scrum Team follows the Scrum process correctly, namely: Sprint Planning, Daily Scrum Stand-ups, Impediment Management, Sprint Reviews and Sprint Retrospectives. Beyond these responsibilities, Scrum is silent, because each organization will approach the specifics of team formation, tools/methods, architecture/design and product backlog construction as appropriate for its business needs.

A key question from Engineering leadership has been: is that all there is to the Engineering Manager role? Just some talent management and technical advice? Fortunately, the answer is that there’s much, much more to the Engineering Manager role.

Taking a step back

We have been exploring the differences between large-batch, waterfall development and small-batch, Scrum development processes. Let’s review several of the differences before we jump back into the Engineering Management responsibilities discussion.

There are, at least, six major shifts in how work is done when going from large-batch, waterfall development to small-batch, Scrum development and these are: Batch Size, Scrum Management, Scrum Maturity, Teamwork, Architecture and Product Backlogs.

The Batch Size change, from a multi-month or multi-year development effort to an engineering effort measured in days, is major. A stepwise approach requires a coach and guide: first to small batches where the Definition of Done is Functional Complete, then to small batches where the Definition of Done is System Test Complete, and then to small batches where the Definition of Done is Customer Ready. The re-engineering of how engineers work to reach each of these steps is extremely large and complicated.

Organizations often create Scrum management guidelines to describe precisely and explicitly how Scrum is done in their organizations at scale. Scrum management guidelines must build strictly on Scrum as defined by ScrumAlliance.Org, which in turn builds on Agile principles. Each of these layers adds constraints and agreements as to how Scrum Teams operate, especially across teams and organizations. Scrum management guidelines will be refined over time as Scrum capabilities mature and as teams reach new agreements on how to work. Scrum management guidelines must be understood by, and explained to, all engineers and teams.

Scrum roles are complex and take time to master. Scrum management guidelines specify the Scrum Maturity Model for all roles, including the Scrum Team. Teams and individuals use the Scrum Maturity Model to understand all aspects of their roles and to self-assess how they are doing. The models are used in retrospectives to identify potential areas of improvement and development.

Teamwork is often not discussed widely within organizations, yet it is important to Scrum. Briefly, there is a stepwise progression to teamwork: individuals, pairs, teams and team of teams. Most traditional organizations are focused on the individual and their individual performance. Scrum organizations frequently pair individuals in all their development activities (often called pair programming or pair development). Pair programming has been demonstrated to deliver higher quality code, faster learning and more creativity. Pairs are formed into Scrum Teams who work together to achieve the Sprint Goals based upon the prioritized Sprint Backlog. Scrum Teams have demonstrated self-organization, self-management, quick learning, high creativity and high productivity. Scrum holds the team accountable for its decisions and results during a Sprint Review. Scrum@Scale is based on the concept of Team of Teams (as popularized by McChrystal’s book, Team of Teams) and extends the basic Scrum models for scale.

Another major shift is how Architecture and Design principles are approached and developed. In Mary and Tom Poppendieck’s book, Lean Software Development: An Agile Toolkit, they describe Built-in Integrity (Chapter 6). Built-in Integrity has both perceived integrity and conceptual integrity, with the ability to allow refactoring as knowledge and understanding build. The key is to be clear on your core principles for how you are approaching architecture, then be clear on the customer-perceived integrity (or user model), and then specify the conceptual integrity (or offer/product/system model). Both integrities must allow for learning and refactoring. With this comes the need to allow teams to operate correctly, at scale, given the principles, perceived integrity/model and conceptual integrity/model. Exactly how this should and will be done has to be guided by Engineering Managers and Architects.

The last major shift is how product requirements are specified, prioritized and groomed. Organizations groom the product backlog starting with the Strategy and deriving strategic outcomes, which in turn derive Themes, Epics and Stories. POs and their teams will groom Epics, Stories and Tasks.
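To picture that hierarchy, here is a minimal Python sketch of a backlog tree. It is only an illustration under stated assumptions: the `BacklogItem` class, the `LEVELS` list and the rule that a child sits exactly one level below its parent are hypothetical modeling choices, not part of Scrum@Scale or any tool. The sample titles loosely echo the SaaS example from the Strategy, Backlogs, Sprints and Scrum posting.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical backlog levels, ordered from broadest to most granular.
LEVELS = ["Strategy", "Strategy outcome", "Theme", "Epic", "Story", "Task"]


@dataclass
class BacklogItem:
    level: str
    title: str
    children: List["BacklogItem"] = field(default_factory=list)

    def add(self, child: "BacklogItem") -> "BacklogItem":
        # Assumed rule: a child must sit exactly one level below its parent.
        if LEVELS.index(child.level) != LEVELS.index(self.level) + 1:
            raise ValueError(f"{child.level} cannot sit directly under {self.level}")
        self.children.append(child)
        return child

    def stories(self) -> List["BacklogItem"]:
        # Collect the Stories beneath this item; these are what teams pull into sprints.
        if self.level == "Story":
            return [self]
        return [story for child in self.children for story in child.stories()]


# Illustrative backlog; titles are hypothetical.
strategy = BacklogItem("Strategy", "Shift from purchased licenses to SaaS")
outcome = strategy.add(BacklogItem("Strategy outcome", "80% of license customers on SaaS"))
theme = outcome.add(BacklogItem("Theme", "Automated update tool"))
epic = theme.add(BacklogItem("Epic", "Recommend updates"))
epic.add(BacklogItem("Story", "Query known defects for an installed version"))
epic.add(BacklogItem("Story", "Rank candidate updates"))

print([s.title for s in strategy.stories()])
```

The point of the tree is traceability: every Story a team pulls into a sprint can be walked back up to the Strategy it supports.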

Areas of influence and roles

A key aspect of all roles is how they interact with and influence others. This needs to be considered as we look at four key roles: Engineering Manager, Product Owner, Architect and Engineering Services (tools, build, integration, validation and DevOps). While each owns a specific area, they also interact, collaborate with and influence one another. For example, Engineering Managers will work with Product Owners to help with grooming, dependencies and team capabilities. Similarly, Architects will work with Product Owners and Engineering Managers. Everyone will collaborate with Engineering Services, as needed.

Scrum works because Scrum Teams are given everything that they need to convert Sprint Backlog Items into customer value within a two-week sprint. While this is simple to state, there is a tremendous amount of work done outside of the team to make this statement a possibility.

There are four critical inputs, external to the Scrum Team, that ‘enable’ Scrum Teams to accomplish their mission. These inputs are a well-groomed Product Backlog, written agile architecture, engineering services and engineering management. When these four inputs are done well, Scrum Teams have the potential to function correctly as Scrum Teams. If any of these inputs is done poorly, Scrum Teams will struggle, if not outright fail. As you would expect, there’s tremendous influencing and collaboration happening between the Scrum Team, Engineering Manager, Product Owner, Architect and Engineering Services to ensure the inputs are exceptional.

Isn’t this about the Engineering Manager?

With that context, let’s highlight the Engineering Manager’s major areas of responsibility.

First, the Engineering Manager is ultimately responsible for ensuring the Scrum Team is correctly formed, skilled, self-organizing, functional and enabled. As noted above, the Engineering Manager must be fully aware of the six major shifts and where the team is on its journey to mastery with regard to each shift. While the Product Owner is responsible for the product backlog, and the Architect is responsible for architecture, the Engineering Manager must guide the team, the Product Owner and the Architect through their collective transitions, learning and mastery. No one else has this role and no one else has this perspective.

Second, the Engineering Manager is key to ensuring that the critical inputs to the team are of exceptional quality. If the quality of the team inputs is an issue, the Engineering Manager must identify and communicate root-causes, corrections and next-steps with leadership.

Third, Engineering Managers engage directly with the Scrum Team only when they must act, and never when the Scrum Team is successful. Let’s break this down. When Engineering Managers must intervene directly in the Scrum Team, they are doing so because the Scrum Team is not well formed, not correctly skilled, not self-organizing or not functioning. Whenever the Engineering Manager is active directly within the team, the team is failing in some respect. Conversely, whenever the Scrum Team is creating customer value in its role with Product Owners and others, and the Engineering Manager is not active in the Scrum Team, the Engineering Manager has a well-formed Scrum Team.

To achieve this, an Engineering Manager must work ahead of the team in each of the six major shifts, collaborating with peer Engineering Managers, Engineering Leaders and partners (Product Owners, Architects and Engineering Services) before the team reaches each new level of maturity. Hence, the Engineering Manager role is fulfilled by leading the change into each of the six major shifts, working with those responsible to move organizational maturity forward.

Examples

As teams are finishing up the last of their big-batch WIP, an Engineering Manager must be working with the Product Owner to guide them to small batches, with the validation team on better system test, with architects on better articulation of principles, etc.

An Engineering Manager may intervene with the Scrum team to point out that its retrospectives focus only on external entities beyond the team’s control, and to refocus the team on what it can control and where it can improve.

An Engineering Manager should jump into a failing Sprint Planning meeting to explain what the Product Owner is struggling to explain, help the Scrum Master guide the meeting, and get the Scrum Team into the right planning behavior so that the Scrum Team takes over by the end of the Sprint Planning Meeting.

In short, the Engineering Manager must lead the change, the maturity, the mastery of all things Scrum while knowing when the time is right to let the team and individuals fly solo until the team is ready for their next step in their maturity journey.

Thursday, March 31, 2022

Large Organizations Affect or Effect Change?

Do you remember when to use the word ‘affect’ and when to use the word ‘effect’? ‘Affect’ is normally a verb and means to cause change, whereas ‘effect’ is normally a noun and means the result of the change. You got it! Easy, right?

Let’s expand that thought a little. How about ‘affecting change’ and ‘effecting change’? Using ‘effect’ as a verb jumps out at you, doesn’t it? Let’s go to the all-knowing web, where we find that affecting change and effecting change are not the same. ‘Affecting change’ means the change was already in progress when an external force acted on it. ‘Effecting change’ means that an external force caused the change.

This subtle but important difference is at the core of this blog post and helps explain why change in Scrum-based organizations is different than change in traditional, functional organizations. Simply put, one affects change in Scrum-based organizations whereas one effects change in traditional organizations.

Traditional Organizations

Many of today’s traditional organizations can trace their roots back to the explosive growth of the automotive industry and the days of Alfred Sloan around the 1920s. We still have many of the vestiges from those early organizations: functional organization, large product development cycles and formal processes. Generally, traditional businesses organize functionally. They have sales, marketing, operations, manufacturing, engineering and quality assurance departments. The concept is to bring together similar expertise, areas of responsibilities and hierarchical leaders to bring about highly efficient, effective and consistent decisions and results.

Traditional functional groups develop processes and decision frameworks to engage other functions. The cross-functional lifecycle processes, like a Product Development Lifecycle, are used to make coordinated, consistent agreements between functional groups. As functional organizations grow, they develop processes and decision frameworks to work consistently within their function. They create engineering lifecycles, engineering program management, and specialized functions within their organization, for example software engineering, hardware engineering and quality engineering. Within these specialized functions, they create processes and decision frameworks, for example, design reviews, code reviews, and product build processes.

As these processes are developed and used, they become well-known and common across the traditional business and within the functional organization. Process effectiveness is constantly evaluated and, when necessary, the process goes through a modification process. Eventually, some processes become codified. Some processes even become standardized, with an external committee that monitors for compliance.

Why are these processes needed? Simply put, they are needed for consistency of execution and to facilitate commitments between functions and groups. Traditional organizations develop products that are time phased in decision making between the functions.

For example, Product Marketing gains approval for funding and requirements at a commitment phase. Engineering takes the lead to develop the product. Product Marketing engages to price and position the resulting product. Manufacturing gears up production. Corporate Marketing organizes the product launch. Sales does sales/channel training and takes orders. Support fixes what breaks.

It is unclear if traditional organizations created large product development efforts or conversely, large product development efforts necessitated traditional, functional organizations. My guess is that they evolved together over time. Large development efforts or big batch releases often use a Waterfall development process. With big batch releases, we build up large amounts of momentum, or work-in-process (WIP) and use a Product Development Lifecycle process to keep functional organizations aligned.

This creates a constant fly-wheel of overlapping programs under constant development as the organization delivers new products to customers.

An unintended consequence of functional organization is a lack of transparency. As a functional organization develops and codifies its processes, the deep understanding remains within the functional organization. The details of how a functional organization operates can be obscured behind this ‘process jargon’. Functional leadership can provide updates in their organizations’ process jargon, undermining communication. This is a key reason why program managers, as arbiters of organizational jargon, are used to translate between functions and so increase communication and visibility.

Scrum-based Organizations.

Scrum-based organizations are relatively new. Scrum first jumped onto the scene in 1986, when Takeuchi and Nonaka proposed it in a Harvard Business Review article. It appeared again in 1995, when Sutherland and Schwaber presented the Scrum development process at the OOPSLA (Object-Oriented Programming, Systems, Languages & Applications) conference in Austin, Texas. The first Scrum book was published in 2001, and ScrumAlliance.Org formed in 2001. The industry has yet to perfect how to build a large business based upon Scrum. Given the relative success of Amazon, Google and Microsoft, some large experiments are underway.

At the core of Scrum are transparency, small batches of work, constant improvements and constant change. All desired future work is kept in a visible, prioritized and incrementally/well-groomed backlog. All work progress and results are equally visible and inspectable.

The work batch size is kept very small: a team completes each increment within 2 to 4 weeks, with 2 weeks being the preferred duration. Each time period, or sprint, ends with a retrospective where the team seeks improvements to employ in the next sprint. Since the duration is short and the batches are small, teams can wrap up work in short order and pick up new work quickly.

This doesn’t mean that Scrum organizations are devoid of strategy, architecture, documentation, functional specialization and processes. They have these too; however, at the core of a Scrum organization are small batches and constant change. The amount of work-in-process, the time from executive decision to initial results, and the complexity are all relatively lower.

Did you happen to notice that I didn’t say anything about which function uses Scrum? Scrum isn’t just for product development. There are company boards of directors who use Scrum, as do marketing departments and sales departments. So, one could imagine a Scrum organization from top to bottom, across all functions.

Given the small batches, bringing each of those to market quickly means communication must happen frequently. Alignment of priorities happens transparently with dependencies and work ordering factored into the backlog grooming and planning. To make this happen, you can see that functional organization processes take a back seat to cross-functional work output. Transparency is required from team to team, function to function and lower-levels to upper-levels.

Small batches are key to keeping organizational risk minimal, since any loss due to failure or redirection is relatively small. Small batches require frequent communication between functions to keep the flow of value to customers going.

How change happens in traditional organizations.

Traditional organizations, realistically, don’t change much. Looking at large high-tech companies, we can all list some that failed to see a technology shift and closed shop. Large high-tech companies aren’t the only ones to resist change. Many companies are struggling today with the shift to online shopping or the Uber-ization of their industry.

Effective change happens in traditional organizations only when there’s strong executive support and concerted focus on making the change happen. For example, some companies have a Business Transformation organization that enables cross-organizational change management. The change process has been formalized to consistently and effectively engage with all organizations to make change happen. Once change has happened, the traditional organizations tend to standardize on the resulting adjusted process. To force change again, a new change effort must be undertaken.

By looking at how a traditional organization is built, we can see why a concerted effort is needed. There is a large amount of hysteresis (the dependence of the state of a system on its history) built into the functional organization. The cross-functional and internal processes being codified means that carefully applied efforts must be taken before modification can happen. The large batch size of development programs magnifies any impacts due to change. Lastly, with overlapping development programs, there never seems to be a good time with low risk for learning and improvement.

One can see why strong executive support with a specialized transformation organization is required to effect change in a traditional organization. Transformation effects change in a traditional organization.

How change happens in Scrum-based Organizations.

Scrum organizations effectively have change built into their DNA. With transparency and small batch sizes, adjustments carry low risk, and change is a basic expectation rather than an exception. Change is simply done by refining the backlog, adjusting the priorities, etc.

There’s no longer a need for an external force. The internal mechanism for change always exists. Thus, one affects change in a Scrum organization.

If Scrum organizations also have strategy, architecture, documentation, functional specialization and processes, why don’t they suffer from the same change problems?

Changing cross-functional processes will still be complex and will likely require the same senior executive support and an external function to drive change. However, Scrum organizations have the benefit of transparency, small batches and no large, overlapping development programs, all of which lower the risk of change. As a result, sponsorship for change, and its risk, can be taken on at lower levels of the organization.

Traditional Transformation and Scrum Organizations, can they co-exist?

Transformation for a traditional organization often requires an external transformation organization with senior executive sponsorship to take on the risk management of the resulting change. To do this, the transformation organization will create a support structure to make visible the plans and risk to mitigate impacts. As needed, the transformation organization will work cross-functionally and at all levels within the organization to achieve successful change. The transformation organization becomes a temporary scaffolding to support the traditional organization until the change is done and accepted.

In a Scrum organization, the visible product backlog, with the Product Owner as the sole interface to a team, is key to keeping the complex system of small batches and teams functioning. If another, albeit temporary, transitional scaffolding is put into place for change, the Scrum organization will fail to function, as teams will effectively have two or more POs.

Transformation takes place when the organization’s strategy necessitates change, and new strategic steps and theme outcomes are added with clearly required business outcomes. (Note: recall that strategic steps are achieved with 1 to 3 years of effort and are retired as themes/epics are achieved.) Themes can be added to facilitate change at a lower level and at a quicker pace. Quicker and lower still, epics and stories can be added to the backlog.

Scrum organizations take on change according to how the backlog priorities are set. Hence, if there is a new shift in how a product is brought to market, then with the right strategy, strategic steps, themes and epics, along with the change in priorities, the Scrum organization will respond within a few sprint cycles. The amount of disruption to existing work will be small. The impact on previously high-priority pillars, themes and, possibly, epics will be instantaneous. Caution is required when making quick changes at such a high level.

Transformation needs to be ‘groomed’ into the backlog just as other customer value is groomed. As the strategy changes, as business needs come to light, as competitors respond and as we innovate, transformation is integrated with planning to utilize the change mechanisms natively inherent in Scrum organizations.

Transformation affects change in Scrum organizations.

Wednesday, December 8, 2021

What of Quality?

In our industry, we have come to use the word ‘quality’ to mean what the Quality Organization does; in other words, we have defined quality to mean product defect tracking, measurement and prediction based upon testing and validation of functionality. This has been the norm for over 70 years, ever since the term ‘bug’ was introduced into our software jargon. Now is a good time to step back and ask: what are all the attributes that determine product quality? Is product quality more than defect rates over time?

From a business perspective, customers consume our software product, which creates value for them and therefore delivers revenue to us. The value delivered to the customer is what enables us to keep creating new capabilities for them. What if we tie our definition of quality to the value that we continuously improve and deliver to our customers? If you agree, then we can ask: what aspects of our process help us improve value, and hence quality? Remember, we make business decisions to fund developers to write code delivered in releases that are validated and consumed by customers. If you agree with this, we can inspect Quality by inspecting the aspects that create Value: Business Decisions, Development, Code, Releases, Validation and Customer Usage. Let’s call these six aspects Quality Contributors.

Can we compare Quality Contributors between organizations doing Traditional/Sequential development and operating businesses, and Scrum-based ones? We could deploy hordes of analysts to gather and process all types of numbers. Even then, I’m not sure that we could address the apples-to-oranges comparison problem.

Maybe we can use relative comparisons based upon the fundamental behaviors or trends in how Traditional/Sequential and Scrum-based organizations create Customer Value. What if the relative comparison of the six Quality Contributors shows that one approach is relatively better in four of the six Quality Contributors? Would we consider the quality to be higher in that approach? Maybe. Let’s try this thinking and hope that we side-step a three-to-three tie.

Before we start, allow me to specify that in this comparison, we are talking about world-class, mature Traditional/Sequential and Scrum organizations who have mastered their respective methods. There are examples of both types of organizations who have mastered what is presented here. There are no magic-happens-here gaps in what is compared (albeit, there are gaps in how any one organization may do Traditional/Sequential and Scrum methods today). This posting does not contemplate mixed modes, like Kanban, Scrum-Fall, SAFe variations, Spiral, etc.

The Business Decisions Quality Contributor is defined to be ‘time from business commitment to customer availability, and the cost of changes during that period’. For example, a business will study the market, determine a list of required features, develop a comprehensive plan and, at some point early in development, make a business commitment to the program. This starts the clock ticking. After this point, the cost of change means the cost of disruption and adjustment to the plan. The cost of change could be large, depending upon how far along the program is and how impactful the change is.

The Development Quality Contributor needs a bit of an introduction before we define it. A key question is, ‘When is quality created during development?’ Some argue that quality is tested in by QA. I argue quality is validated by QA. QA doesn’t create or test in quality. Rather, quality is created in the mind of the developer once they comprehend the customer need and before they finish the last line of code.

There is a time lag between when the developer finishes writing that last line of code and when they know that the code has been validated as ‘known-good’. Before the code is known-good, if a defect is found, the developer must recall precisely the customer need and their code to correctly fix the defect. This period is the ‘Risk of Recall’. We know that developers who are distracted, even for short periods of time, must expend effort to re-engage the creative process. We know that the longer the time between creation and recall, the more intense the effort to recall the details and fidelity of the work. Erroneous recall can create follow-on defects that are unintentionally added with the fixes. At the point of GA, the developer knows that their code is known-good. At this point, they have the ‘Freedom to Forget’.

For this posting, the Development Quality Contributor is ‘the magnitude of Risk of Recall over time until the developer reaches the Freedom to Forget point’.
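One way to put a number on the Risk of Recall is to measure how long each piece of finished code waits before it becomes known-good at GA. The Python sketch below does that under loudly simplified assumptions that are mine, not the original analysis: one unit of code is finished per day, all of it becomes known-good exactly at GA, three quarters are approximated as 39 weeks (273 days), and the average wait stands in for the ‘magnitude’. The function names are hypothetical.

```python
def recall_windows(completion_days, ga_day):
    """Days each piece of code waits between completion and GA (known-good)."""
    return [ga_day - day for day in completion_days if day <= ga_day]


def average_recall_window(completion_days, ga_day):
    """Average exposure to Risk of Recall, a simple proxy for its magnitude."""
    windows = recall_windows(completion_days, ga_day)
    return sum(windows) / len(windows)


# Hypothetical: one unit of code finished per day, all known-good at GA.
traditional = average_recall_window(range(1, 274), ga_day=273)  # ~3-quarter release
scrum = average_recall_window(range(1, 15), ga_day=14)          # one 2-week sprint
print(traditional, scrum)  # 136.0 6.5
```

Under these assumptions, a developer in the three-quarter cycle carries recall risk for roughly 136 days on average, versus about a week within a two-week sprint.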

Let’s go to the next Quality Contributor, Code. All code has defects and defect densities. Let’s break the defects out into two groups. Code quality and defect density studies of the 1970s and 1980s found that, based upon the original code’s design and implementation, there is a ‘basal’ rate of defect discovery that is relatively unchanged over time. Even with intensive testing and defect-fixing efforts, basal defect rates remain constant. The only method that fundamentally changes a component’s basal defect rate is to re-design and re-implement the component from scratch. What those studies don’t guarantee is that, after the re-design/re-implementation, the basal defect rate will be lower. In fact, the basal defect rate could be higher. The basal defect rate is the first component of the Code Quality Contributor. There are also ‘development’ defects, added during development and discovered during validation before the code is known-good. The rate of discovery of development defects is the second component of the Code Quality Contributor.

The Code Quality Contributor is ‘the development defect rate plus the basal defect rate over time’.
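As a tiny illustration of that definition, the Python sketch below adds a constant basal rate to a per-period series of development defect discoveries. The weekly counts are made-up numbers, and treating the basal rate as exactly constant is a simplification of ‘relatively unchanged over time’.

```python
def code_quality_contributor(development_rates, basal_rate):
    """Defects discovered per period: development defects plus a constant basal rate."""
    return [dev + basal_rate for dev in development_rates]


# Hypothetical weekly discovery counts while a release moves toward known-good.
print(code_quality_contributor([5, 3, 1, 0], basal_rate=2))  # [7, 5, 3, 2]
```

Once validation has flushed out the development defects, only the basal rate remains, which is why re-design is the only lever on that floor.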

The Release Quality Contributor is how much work accumulates in a partially done state that must be completed before the release ships, or work-in-process (WIP) over time. Let’s look at manufacturing for a comparison. Assume a business assembles the same product at two factories; each factory has four manufacturing steps and a capacity of 1,000 products per month.

Factory A first assembles the month’s 1,000 subassemblies through Step 1. Factory A then takes the 1,000 subassemblies through Step 2, then through Step 3. Hopefully, by the end of the month, Factory A finishes the 1,000 products by completing Step 4. Factory A’s queue depth is 1,000 at each step in the process. Factory B, however, takes one subassembly through Step 1 and moves it to Step 2 before starting another at Step 1. Factory B continues this process as the subassembly moves to Step 3 and Step 4. Factory B’s queue depth at each step is one. Hopefully, Factory B ships 250 products each week to reach 1,000 products shipped by the end of the month. Factory A accumulates work-in-process (WIP) that peaks at 1,000 subassemblies during the month. Factory B has at most four WIP subassemblies at any moment in time. While both Factory A and B ship 1,000 products per month, lower WIP is considered consistent with higher quality. If a process or material defect is discovered, say during Step 4, Factory B, with its lower WIP, will discover the defect sooner, total rework is lower, and waste is minimized.
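To make the WIP difference concrete, here is a minimal Python sketch that counts WIP for the two factories. It assumes a simplified model that is not in the original example: Factory A performs one step operation at a time across its whole batch, and Factory B runs a four-station pipeline that starts one new subassembly per tick. The names `batch_wip` and `flow_wip` are hypothetical, and the sketch compares only WIP, not throughput or elapsed time.

```python
STEPS = 4


def batch_wip(n_units, steps=STEPS):
    """Factory A: every unit finishes step k before any unit starts step k+1.

    Returns the WIP count after each individual step operation.
    """
    wip, history = 0, []
    for step in range(steps):
        for _ in range(n_units):
            if step == 0:
                wip += 1          # unit has started
            if step == steps - 1:
                wip -= 1          # unit has finished the last step
            history.append(wip)
    return history


def flow_wip(n_units, steps=STEPS):
    """Factory B: one-piece flow; each station hands its unit forward every tick.

    Returns the WIP count at the end of each tick.
    """
    stations = [None] * steps     # unit id held at each step, or None
    next_unit, finished, history = 0, 0, []
    while finished < n_units:
        if stations[-1] is not None:
            finished += 1         # unit leaving the last step is done
        stations = [None] + stations[:-1]   # move every unit one step forward
        if next_unit < n_units:
            stations[0] = next_unit         # start a new unit at Step 1
            next_unit += 1
        history.append(sum(s is not None for s in stations))
    return history


print(max(batch_wip(1000)), max(flow_wip(1000)))  # 1000 4
```

The peaks match the example: 1,000 for Factory A and 4 for Factory B. The same shape explains the rework point: a Step 1 defect first noticed at Step 4 has already touched every subassembly in Factory A’s batch, but only the few in Factory B’s pipeline.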

As in manufacturing, product development cycles also carry WIP. WIP is the partial work accumulated until the product is ready for shipment to customers. WIP in development, as in a factory, is an indicator of quality and represents risk due to the unknown work remaining. In development, we don’t have discrete product development steps as easily discernible as manufacturing steps. Nor are we able to inspect software WIP as easily as subassembly WIP. To ensure that we have finished all WIP, we run validation tests to confirm that no WIP remains before the product goes to market. We are able to discern the effort placed in development (WIP building) and the amount of validation completed (WIP decreasing) until the release is back at known-good.

We will use the definition ‘development effort less validation effort until known-good is reached over time’ as our Release Quality Contributor.
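The definition reads naturally as a running total, so here is a minimal Python sketch of it. The effort figures (in hypothetical engineer-weeks per period) are invented for illustration: a Traditional/Sequential-style profile that develops first and validates later, and a Scrum-style profile that validates alongside development with the same total effort. `release_wip` is a hypothetical name.

```python
from itertools import accumulate


def release_wip(dev_effort, validation_effort):
    """Running WIP: cumulative development effort minus cumulative validation effort."""
    return list(accumulate(d - v for d, v in zip(dev_effort, validation_effort)))


# Hypothetical effort per period; both profiles total 30 units of each.
sequential = release_wip([10, 10, 10, 0, 0, 0], [0, 0, 0, 10, 10, 10])
scrum_like = release_wip([5, 5, 5, 5, 5, 5], [4, 6, 4, 6, 4, 6])
print(sequential)  # [10, 20, 30, 20, 10, 0]
print(scrum_like)  # [1, 0, 1, 0, 1, 0]
```

Both profiles end at zero, meaning known-good, but the area under the curve, and the peak risk carried along the way, differ sharply.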

Validation is the process by which we come to know that the product is known-good and ready for customer consumption. Validation proves that all WIP has been completed, otherwise a flag (a development defect) is raised for development to ensure that the work gets completed before shipment. Let’s assume that Traditional/Sequential and Scrum-based validation processes are rigorous and equally funded. We will inspect the utilization of validation resources and frequency of validation stalls. A validation stall is when the validation process is stopped and reverted to a previous state due to a failure in the product under test. Validation stalls create inefficiency of testing and, potentially, schedule impacts. Resiliency of the validation process can minimize validation stalls, and our assumption is that the validation process is world class. However, defects will periodically halt validation.

The Validation Quality Contributor is ‘the percentage of validation resources utilized and the rate of validation stalls over time’.
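A minimal sketch of how this contributor might be tabulated, assuming a hypothetical per-period log of validation capacity, hours actually used, and stall count. The field names and numbers are illustrative, not taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class ValidationPeriod:
    """One reporting period of validation activity (hypothetical schema)."""
    capacity_hours: float  # validation resource hours available in the period
    used_hours: float      # hours validation was actually running
    stalls: int            # times validation halted and reverted to a prior state


def validation_contributor(periods: list[ValidationPeriod]) -> list[tuple[float, int]]:
    """Return (utilization %, stall count) for each period."""
    result = []
    for p in periods:
        utilization = 100.0 * p.used_hours / p.capacity_hours
        result.append((round(utilization, 1), p.stalls))
    return result


# Illustrative numbers only: a stall-heavy period followed by a smooth one.
log = [ValidationPeriod(40, 10, 3), ValidationPeriod(40, 36, 1)]
print(validation_contributor(log))  # [(25.0, 3), (90.0, 1)]
```

Tracking both numbers together matters: low utilization with many stalls is the pattern the later comparison attributes to late, large-batch validation.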

‘Code Currency’ is a major industry topic. Code Currency is defined as the percentage of customers who operate ‘current’ or GA(n-1) code. (If Finance was hoping that currency was about making more money with the code, they will be disappointed.) There are two key quality reasons that customers should always be using the latest code. First, the most recent release benefits from the most mature version of the validation process. Second, it has the most fixes addressing basal defects. The customer, even without using a single new feature, has a better-quality product. Of course, the new features are an added benefit. Any delay in adopting the newest release needlessly lowers the customer’s perception of the product’s quality.

The last Quality Contributor is Customer Usage based upon Code Currency, where we measure the rates of customers’ adoption or usage of the most recent release.

One final point: I use GA(n) for the current release under development, GA(n-1) for the previous release and GA(n+1) for the next release.
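Code Currency and its rate of change can be computed from a snapshot of which release each customer runs. In this sketch, releases are assumed to be numbered with integers, and which releases count as ‘current’ is passed in as a parameter rather than fixed, so either reading of the definition can be plugged in. The function names and install data are hypothetical.

```python
def code_currency(customer_releases: list[int], current_releases: set[int]) -> float:
    """Percentage of customers running a release that counts as 'current'."""
    if not customer_releases:
        return 0.0
    on_current = sum(1 for release in customer_releases if release in current_releases)
    return 100.0 * on_current / len(customer_releases)


def adoption_rates(currency_snapshots: list[float]) -> list[float]:
    """Change in Code Currency, in percentage points, between consecutive snapshots."""
    return [later - earlier for earlier, later in zip(currency_snapshots, currency_snapshots[1:])]


# Hypothetical install base of five customers; release 7 is the latest shipped.
installs = [7, 7, 6, 5, 7]
print(code_currency(installs, {7}))        # 60.0 (latest release only)
print(code_currency(installs, {7, 6}))     # 80.0 (latest plus the one before it)
print(adoption_rates([20.0, 35.0, 50.0]))  # [15.0, 15.0]
```

The Customer Usage contributor is then the series produced by `adoption_rates` over regular snapshots.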

For a Traditional/Sequential business, I’m going to fix the period between GA releases at 3 quarters, with the commitment to GA(n+1) happening a quarter before GA(n) ships. The planning horizon is approximately a year. The Traditional/Sequential business’s development processes are all mature and follow best practice.

For a Scrum business, I’m going to fix the sprint at two weeks, with GA(n) delivered at the end of the sprint. GA(n+1) follows two weeks later. The Scrum business is mature in correct grooming, modern code validation and state-of-the-art deployment techniques for either as-a-service delivery or enterprise software (yes, there is enterprise software being updated every two weeks... just watch your laptop do its Windows thing or watch Amazon deploy their AWS services in their data centers).

Ready to start comparing Traditional/Sequential and Scrum-based organizations? At the end of each comparison, I’ll declare whether the Traditional/Sequential or the Scrum-based business wins, or whether it’s a draw, and why.

Business Decision Quality Contributor

A Traditional/Sequential business commits to the GA(n+1) release one quarter before GA(n). The period for potential re-planning is 4 quarters, and the cost of a re-plan increases as WIP builds because the plan is already in execution and the code is partially implemented. A change in plans means going back and rooting out partial work while adding in new work. The cost of a re-plan after Functional Complete is lower because there is little time left for new development, so rational options are limited. Another cost of change arises when a narrow, isolated change moves the GA(n+1) date. In this case, all other work is delayed simply because the release is delayed. If a competitor does something near mid-cycle of GA(n+1), the business must choose between the costliest change to GA(n+1) and waiting to respond in the next release, in this case 1.5 years out.

A Scrum business spends extensive resources grooming work so that every Scrum team can complete its customer increment for GA(n), however small, within a two-week sprint. A change in plans affects the previous grooming and, depending on the degree of the change, GA(n+1) and later releases. An isolated change in one team does not affect the other teams’ delivery of GA(n) or the subsequent GA(n+1).

If a competitor does something mid-sprint, the teams can groom the response and weigh tradeoffs to decide when to phase it into their work plans. Meanwhile, teams continue to complete customer value increments in GA(n).

Comparing the two, the Traditional/Sequential business has a longer re-planning period and cost impacts on a release simply because of the longer release cycle. The intensity and cost of grooming are higher in Scrum, and with a change, some or all of the grooming can become waste. One could argue that these roughly balance out. The difference that surfaces is how much of the existing work, or customer value, goes to market and how quickly future work can be redirected and brought to market. In this comparison, the Scrum business has the advantage: teams create small increments, and all increments go to market at the end of the sprint as GA(n). The increments reflecting the changes reach the market sooner, in the GA(n+1) and GA(n+2) sprints.

Development Quality Contributor

Traditional/Sequential development starts work as early as, if not before, the commitment agreement, a year before GA(n+1) releases. The developer’s Risk to Recall starts slowly and rises continuously until GA(n+1). All developers hit their peak Risk to Recall just before release because validation could uncover a development defect at any time. Even at GA(n+1), they have already started development on GA(n+2), so they never truly hit a Freedom to Forget point at GA. While they eventually reach a Freedom to Forget point for a specific piece of work once it releases, there is always WIP, so there is never a point during the release when they can completely forget.

Scrum teams start work right after the sprint review meeting. They must return their code to GA quality (known-good) with each completed user story, which can happen multiple times within a sprint. Their Risk to Recall rises and falls with each user story implemented and completed.

The Development Quality Contributor is significantly better with Scrum, as developers can focus on one customer increment until it is done at known-good. Once done, they can take on the next piece of work fresh, with the Freedom to Forget the previous story’s work.

Code Quality Contributor

For simplicity reasons, let’s assume post-GA code for both Traditional/Sequential and Scrum businesses have roughly the same basal defect rates. I’m happy to re-consider this assumption if there’s research showing one being materially different than another.

Turning to development defect rates: in a Traditional/Sequential business, the large-batch, large-WIP nature of the work, combined with validation happening late in the release plan, means an ongoing buildup of development defects in the code. These defects are uncovered when QA fires up validation. As validation progresses, the development defect rate spikes. Developers focus on fixes. The development defect rate drops until it reaches zero for the final validation.

A Scrum business has a near-constant development defect rate. The reason is that test pressure is constantly applied to the code and developers must continuously return the code to known-good. There is no ongoing buildup of development defects. The feedback cycles to developers are much quicker, so any newly introduced defects are readily surfaced and fixed.

Some will roll their eyes when I say that, while Traditional/Sequential development has spiky defect rates, the net number of bugs found and fixed will be similar to Scrum development over the same period. Looks like we’re heading to a tie on the Code Quality Contributor by admitting this, right? But wait. While both methods may produce the same number of development defects over time, Scrum’s faster time to detection and fix means that a higher-quality fix is available sooner, with fewer propagating impacts on other teams.
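The detection-latency argument can be checked with a small model. Assume, hypothetically, one development defect introduced per week for 26 weeks. The Traditional/Sequential team finds each defect when validation starts in week 26. The Scrum team finds each defect at the end of the two-week sprint in which it was introduced. The defect count is identical; only the timing differs. The week numbers and function names are illustrative assumptions.

```python
from collections import Counter
from typing import Callable

WEEKS_OF_DEVELOPMENT = 26  # hypothetical: late validation starts in week 26
SPRINT_WEEKS = 2


def traditional_detection_week(introduced: int, validation_start: int = WEEKS_OF_DEVELOPMENT) -> int:
    """Defects wait until late validation begins."""
    return max(introduced, validation_start)


def scrum_detection_week(introduced: int, sprint_weeks: int = SPRINT_WEEKS) -> int:
    """Defects surface by the end of the sprint that introduced them."""
    return (introduced // sprint_weeks + 1) * sprint_weeks


def mean_latency(detect: Callable[[int], int], introduced_weeks: list[int]) -> float:
    """Average weeks between a defect being introduced and being found."""
    latencies = [detect(week) - week for week in introduced_weeks]
    return sum(latencies) / len(latencies)


introduced = list(range(WEEKS_OF_DEVELOPMENT))  # one defect per week, weeks 0..25

print(mean_latency(traditional_detection_week, introduced))  # 13.5
print(mean_latency(scrum_detection_week, introduced))        # 1.5

# Same total, different shape: one spike versus a steady trickle.
print(Counter(map(traditional_detection_week, introduced)))  # all 26 found in week 26
print(Counter(map(scrum_detection_week, introduced)))        # two found at each sprint end
```

Under these assumptions, both methods find 26 defects, but the average defect waits 13.5 weeks to be found in the late-validation model versus 1.5 weeks in the sprint model.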

Additionally, over the course of a Traditional/Sequential development effort there will be scope change, and the team may spend time finishing WIP and fixing defects that are no longer important to the release. With Scrum, by contrast, a scope change pushes the affected items down the product backlog, so the team never carries defects for the now-deprioritized feature because it was never built. The Code Quality Contributor is better in Scrum than in Traditional/Sequential.

Release Quality Contributor

A Traditional/Sequential business builds WIP from commitment until Functional Complete and starts to burn down WIP as validation takes hold. The WIP buildup lasts multiple quarters, with a quarter or so at the back end to burn WIP down as the product returns to known-good at GA(n). In other words, once development starts on the release, the code is in a known-bad state until GA(n), when it momentarily returns to known-good.

In Scrum, WIP is extremely small and contained within each Scrum team. Teams return their code to known-good multiple times within a sprint and always before a story is ‘done’. In other words, the product is always expected to be known-good, with short, frequent windows where a team has the code in a known-bad state.

Release Quality Contributor is better in Scrum due to lower WIP and the greater share of time the product spends in a known-good state.
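To put rough numbers on time spent in known-good, here is a hedged sketch using invented daily effort profiles over a hypothetical 180-working-day release window. The Traditional/Sequential profile builds for 120 days and validates for 60. The Scrum profile builds a small story one day and validates it back to known-good the next. WIP is cumulative development effort less validation effort, as in the Release Quality Contributor definition; all numbers are assumptions chosen for illustration.

```python
from itertools import accumulate

DAYS = 180  # hypothetical working days in one release window


def wip_series(dev: list[int], val: list[int]) -> list[int]:
    """Daily WIP as cumulative (development - validation) effort.

    The profiles below never validate more than has been built, so no
    floor at zero is needed here.
    """
    return list(accumulate(d - v for d, v in zip(dev, val)))


def share_known_good(series: list[int]) -> float:
    """Fraction of days that end with zero WIP."""
    return sum(1 for w in series if w == 0) / len(series)


# Traditional/Sequential: 120 days of building, then 60 days of validation.
trad = wip_series([3] * 120 + [0] * 60, [0] * 120 + [6] * 60)

# Scrum: build a small story one day, validate it back to known-good the next.
scrum = wip_series([3, 0] * (DAYS // 2), [0, 3] * (DAYS // 2))

print(max(trad), share_known_good(trad))    # peak 360; known-good only on the last (GA) day
print(max(scrum), share_known_good(scrum))  # 3 0.5
```

The same total effort produces a peak WIP of 360 with a single known-good day in the first profile, versus a peak of 3 with known-good on half the days in the second.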

Validation Quality Contributor

In a Traditional/Sequential business, given the size and duration of WIP and the development defect rates, the impact on validation stalls is significant. Even with state-of-the-art validation and resources, the code is untestable during development and early validation. Validation resources must wait until the code reaches Functional Complete, and even then there are validation stalls due to the spike in development defect rates.

In a Scrum business, the effect is equal and opposite: the small size and short duration of WIP and the steady development defect rate reduce validation stalls. Because the code is always kept in a near known-good state, validation can run non-stop. Even minor stalls are noticed immediately and have major impact, so teams constantly develop ways to keep the system operational even when their updated code fails. (Read up on continuous deployment with blue/green operations for examples.)
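Blue/green deployment is one such technique. Below is a minimal in-memory sketch, assuming two identical environments named ‘blue’ and ‘green’, a router that sends all traffic to whichever is live, and a caller-supplied smoke test. All of these names are hypothetical; real systems do this with load balancers or DNS. The new version goes to the idle environment, and traffic flips only if the smoke test passes, so a failing update never reaches users and validation keeps running against the last known-good version.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class BlueGreenRouter:
    """Tracks two environments and which one receives live traffic."""
    versions: dict[str, Optional[str]] = field(
        default_factory=lambda: {"blue": None, "green": None})
    live: str = "blue"

    def idle(self) -> str:
        """The environment not currently serving traffic."""
        return "green" if self.live == "blue" else "blue"

    def serving(self) -> Optional[str]:
        """Version users currently see."""
        return self.versions[self.live]

    def deploy(self, version: str, smoke_test: Callable[[str], bool]) -> bool:
        """Install on the idle side; flip traffic only if the smoke test passes."""
        target = self.idle()
        self.versions[target] = version
        if smoke_test(version):
            self.live = target  # cut over; the old side stays as a rollback target
            return True
        return False            # live side untouched; users keep the old version


router = BlueGreenRouter(versions={"blue": "v1", "green": None})
router.deploy("v2", smoke_test=lambda v: True)   # passes: green goes live
router.deploy("v3", smoke_test=lambda v: False)  # fails: green keeps serving v2
print(router.live, router.serving())             # green v2
```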

Validation Quality Contributor is better in Scrum.

Customer Usage Quality Contributor

According to some Traditional/Sequential business Quality organizations, once the GA(n-1) release is declared ‘target code’, the reasonable adoption rate for that type of code is approximately 15% per quarter. It takes the time needed to declare the GA release ‘target code’ plus up to 6 quarters to hit 100% adoption: approximately 2 years. Why so slow? I would argue it is because Traditional/Sequential businesses have conditioned customers to expect that releases won’t work until a few remaining bugs are fixed post-GA or post-‘target code’, and these organizations often make upgrades visible, opt-in, carefully planned and resource-intensive events.
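The arithmetic behind the two-year figure can be sketched as follows, assuming a straight-line climb (real adoption curves are rarely linear). At exactly 15 percentage points per quarter, the climb takes 100 / 15 ≈ 6.7 quarters, so 7 whole quarters; ‘up to 6 quarters’ corresponds to a rate nearer 16.7% per quarter. Adding a hypothetical one quarter to declare ‘target code’ gives 8 quarters, or two years. The function names are illustrative.

```python
import math


def adoption_curve(rate_per_quarter: float) -> list[float]:
    """Cumulative adoption (%) at the end of each quarter, capped at 100."""
    quarters = math.ceil(100 / rate_per_quarter)
    return [min(100.0, float(rate_per_quarter) * q) for q in range(1, quarters + 1)]


def quarters_to_full_adoption(rate_per_quarter: float, declaration_quarters: int) -> int:
    """Quarters from GA until every customer runs the release."""
    return declaration_quarters + len(adoption_curve(rate_per_quarter))


print(adoption_curve(15))                # [15.0, 30.0, 45.0, 60.0, 75.0, 90.0, 100.0]
print(quarters_to_full_adoption(15, 1))  # 8 quarters, i.e. two years
```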

In a Scrum business, adoption reaches 100% within a sprint post-GA. That’s a two-week period. If you account for blue/green and phased automated pushes, 100% adoption is achieved within two sprints, or four weeks, post-GA. Why so fast? I would argue it is because the code is always kept in a known-good state, always under continuously improving validation pressure, and under near-100% adoption with automated updates, rollback and phased usage, so customers never see the impacts of failures.
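A phased, automated push with rollback can be sketched as a loop over traffic percentages. This is a minimal illustration, not any particular vendor’s mechanism: the phase percentages and the `healthy` check are hypothetical, and a real system would watch error rates and latency over time at each phase.

```python
from typing import Callable, Sequence


def phased_rollout(phases: Sequence[int], healthy: Callable[[int], bool]) -> tuple[int, bool]:
    """Widen exposure phase by phase; roll back on the first failed health check.

    Returns (percent of customers left on the new release, succeeded).
    """
    exposed = 0
    for percent in phases:
        exposed = percent
        if not healthy(exposed):
            return 0, False  # rollback: everyone returns to the previous known-good release
    return exposed, True


print(phased_rollout([1, 10, 50, 100], healthy=lambda pct: True))      # (100, True)
print(phased_rollout([1, 10, 50, 100], healthy=lambda pct: pct < 50))  # (0, False)
```

The design choice is that a failure at any phase returns every customer to the prior release, which is why customers rarely see a bad update even though releases ship every sprint.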

Customer Usage Quality Contributor is better in Scrum.

My tally shows 6 to 0 in favor of the Scrum business. If you are wondering whether I stacked the deck or faked the comparisons, the answer is absolutely not.

What has happened is that Scrum and other innovations have created a self-reinforcing positive cycle in which improvement in one Quality Contributor reinforces improvement in another. While any one Quality Contributor may be only incrementally better, the combination of all six creates a powerful new quality dynamic. Scrum businesses deliver demonstrably higher-quality customer value every time.

Thursday, September 9, 2021

Trusting In Pairs

I was hired into a company to run an enterprise product engineering organization. My job, beyond feature development and release delivery, was to correct a failing two-year Scrum transition.

After two years, the organization was delivering higher productivity per engineer, improved feature quality and increased development velocity. The Scrum development process had been corrected and was maturing. The time had come for restructuring.

Customers were using more Cloud offerings and were requesting a Cloud offering from the company. Since fewer development engineers were needed on the enterprise product team, as part of the restructuring I spun off a team of 30 highly skilled engineers and a manager to do strategy and early offering development for Cloud users. I took leadership of the team because of my Cloud experience.

I needed the new team to increase their learning speed by an order of magnitude, adopt new architectures, use new languages and embrace DevOps. While none of this was in dispute with executive management or the team, adopting pairwise development (pair programming) would be the key enabler. I had discussed pairwise development with executive management earlier. It was considered too radical because of the potential for talent loss and confusion over HR’s focus on individual performance.

Getting a credible strategy and initial offering into the executives’ hands was critical. Other alternatives such as outsourcing, using a different internal team, hiring a new technical leader and hiring a new team were considered and ruled out because of timing, lack of skilled talent and lack of executive management support. Most of the company considered Cloud offerings to be more fad than reality. Using the existing team members meant that we had to increase their learning speed, knowledge and new value creation.

While there are many blogs and articles describing the benefits and pitfalls of pairwise programming, I found no definitive measures of before and after improvements, no assessment of team characteristics that would indicate success, and no clear business analysis of economic pros and cons. I decided to build an argument based upon knowledge, earned trust and focused experimentation.

Pivotal Labs was founded on the principles of Agile, customer value first and pairwise programming, with their hands-on Dojo Labs. I asked the team to join me for an afternoon visiting the Cambridge Dojo Lab. After the visit, I held a group meeting where I talked about our journey together over the past two years, our skills gaps, our need for quick customer value development, and what lies ahead for our customers who will adopt Cloud offerings. I asked the team to take the next step to pairwise programming for the next 90 days as we focused on our first offering. I offered to own the risk of failure and said that the first pairings would be a starting point, to be adjusted as needed. After 90 days, we would assess their experiences and choose to adopt or adjust together as a team. I left the meeting to allow team discussion.

The team agreed with the reasoning and the approach. They took the weekend to consider whom each would choose as a partner. Fortunately, the pairing requests were reasonable, and the manager was able to handle the conflicts. Though it was awkward at first, they started to work in pairs on the sprint objectives. Slowly, most pairs took on their own singular identity and worked closely together on assignments. For the pairs who were struggling, the manager worked with them and restructured a few pairs to increase the chances of functional pairings.

As soon as the team agreed to pairing, I engaged executive management across the division. I informed them of the reasoning and the team’s agreement. Since the risk was limited, the need was clear and the potential upside was explained, management signed off on us continuing with pairwise development. I engaged HR to ensure that we would abide by their focus on individual performance.

Immediately, the rate of learning accelerated. Pairs were more willing to take on things they didn’t know. They helped each other understand and apply the new technologies. Where individuals had historically wanted months to read, learn and apply, pairs took days to dig into new topics and quickly showed results. Pairs instructed the team at large on their findings and demonstrated the newfound value.

Pairs showed an increase in their willingness to take risks. The ‘I have got your back’ reality of a pair’s partnership helped a pair member share concerns and find solutions. This allowed the pair to take on more risk since another person was actively engaged to identify and address risks immediately.

After the 90-day period, the team agreed to keep pairwise development. Only one senior developer decided that he could not work in this model and left the project. The team created a go-forward strategy and delivered the cloud offering as a proof of viability. Subsequently, the team joined another SaaS effort and released this new SaaS offering into production. More than two years later, the team continued to use pairwise development. When asked if they would go back, they could not imagine doing development any other way.

I should have done a few things differently. I should have dug deeper into documented cases where pairwise programming improved productivity, especially learning and application speed and risk taking. I should have bootstrapped a smaller team earlier; for example, the initial team of 30 might then have gone pairwise up to 6 months earlier. I should have spent more time with management early on, especially on risk mitigations, and leveraged the company’s culture of investing in employee skills.

Looking back, I believe that highly skilled engineers want to keep improving their professional skills. Showing how new approaches and technology help their productivity and professional standing was a powerful force for motivating learning and facilitating application.
