Planning and Estimation

Estimation, scheduling, and mindset anti-patterns that create unrealistic commitments and resistance to change.

Anti-patterns in how work is estimated and scheduled, and in how the organization thinks about the feasibility of continuous delivery.

Anti-patterns in this section (category and quality impact):

  • Distant Date Commitments - Organizational & Cultural - Medium
  • Velocity as a Team Productivity Metric - Organizational & Cultural - Medium
  • Estimation Theater - Organizational & Cultural - Medium
  • Velocity as Individual Metric - Organizational & Cultural - High

1 - Distant Date Commitments

Fixed scope committed to months in advance causes pressure to cut corners as deadlines approach, making quality flex instead of scope.

Category: Organizational & Cultural | Quality Impact: Medium

What This Looks Like

A roadmap is published. It lists features with target quarters attached: Feature A in Q2, Feature B in Q3, Feature C by year-end. The estimates were rough - assembled by combining gut feel and optimistic assumptions - but they are now treated as binding commitments. Stakeholders plan marketing campaigns, sales conversations, and partner timelines around these dates.

Months later, the team is three weeks from the committed quarter and the feature is 60 percent done. The scope was more complex than the estimate assumed. Dependencies were discovered. The team makes a familiar choice: ship what exists, skip the remaining testing, and call it done. The feature ships incomplete. The marketing campaign runs. Support tickets arrive.

What makes this pattern distinctive from ordinary deadline pressure is the time horizon. The commitment was made so far in advance that the people making it could not have known what the work actually involved. The estimate was pure speculation, but it acquired the force of a contract somewhere between the planning meeting and the stakeholder presentation.

Common variations:

  • The annual roadmap. Every January, leadership commits the year’s deliverables. By March, two dependencies have shifted and one feature turned out to be three features. The roadmap is already wrong, but nobody is permitted to change it because it was “committed.”
  • The public announcement problem. A feature is announced at a conference or in a press release before the team has estimated it. The team finds out about their new deadline from a news article. The announcement locks the date in a way that no internal process can unlock.
  • The cascading dependency commitment. Team A commits to delivering something Team B depends on. Team B commits to something Team C depends on. Each team’s estimate assumed the upstream team would be on time. When Team A slips by two weeks, everyone slips, but all dates remain officially unchanged.
  • The “stretch goal” that becomes the plan. What was labeled a stretch goal in the planning meeting appears on the roadmap without the qualifier. The team is now responsible for delivering something that was never a real commitment in the first place.

The telltale sign: when a team member asks “can we adjust scope?” the answer is “the date was already communicated externally” - and nobody remembers whether that was actually true.

Why This Is a Problem

A team discovers in week six that the feature requires a dependency that does not yet exist. The date was committed four months ago. There is no mechanism to surface this as a planning input, so quality absorbs the gap. Distant date commitments break the feedback loop between discovery and planning. When the gap between commitment and delivery is measured in months, the organization has no mechanism to incorporate what is learned during development. The plan is frozen at the moment of maximum ignorance.

It reduces quality

When scope is locked months before delivery and reality diverges from the plan, quality absorbs the gap. The team cannot reduce scope because the commitment was made at the feature level. They cannot move the date because it was communicated to stakeholders. The only remaining variable is how thoroughly the work is done. Tests get skipped. Edge cases are deferred to a future release. Known defects ship with “will fix in the next version” attached.

This is not a failure of discipline - it is the rational response to an impossible constraint. A team that cannot negotiate scope or time has no other lever. Teams that work with short planning horizons and rolling commitments can maintain quality because they can reduce scope to match actual capacity as understanding develops.

It increases rework

Distant commitments encourage big-batch planning. When dates are set a quarter or more out, the natural response is to plan a quarter or more of work to fill the window. Large batches mean large integrations. Large integrations mean complex merges, late-discovered conflicts, and rework that compounds.

The commitment also creates sunk-cost pressure. When a team has spent two months building toward a committed feature and discovers the approach is wrong, they face pressure to continue rather than pivot. The commitment was based on an approach; changing the approach feels like abandoning the commitment. Teams hide or work around fundamental problems rather than surface them, accumulating rework that eventually has to be paid.

It makes delivery timelines unpredictable

There is a paradox here: commitments made months in advance feel like they increase predictability because dates are known - but they actually decrease it. The dates are not based on actual work understanding; they are based on early guesses. When the guesses prove wrong, the team has two choices: slip visibly (missing the committed date) or slip invisibly (shipping incomplete or defect-laden work on time). Both outcomes undermine trust in delivery timelines.

Teams that commit to shorter horizons and iterate deliver more predictably because their commitments are based on what they actually understand. A two-week commitment made at the start of a sprint has a fundamentally different information basis than a six-month commitment made at an annual planning session.

Impact on continuous delivery

CD shortens the feedback loop between building and learning. Distant date commitments work against this by locking the plan before feedback can arrive. A team practicing CD might discover in week two that a feature needs to be redesigned. That discovery is valuable - it should change the plan. But if the plan was committed months ago and communicated externally, the discovery becomes a problem to manage rather than information to act on.

CD depends on the team’s ability to adapt as they learn. Fixed distant commitments treat the plan as more reliable than the evidence. They make the discipline of continuous delivery harder to justify because they frame “we need to reduce scope to maintain quality” as a failure rather than a normal response to new information.

How to Fix It

Step 1: Map current commitments and their basis

List every active commitment with a date attached. For each one, note when the commitment was made, what information existed at the time, and how much has changed since. This makes visible how far the original estimate has drifted from current reality. Share the analysis with leadership - not as an indictment, but as a calibration conversation about how accurate distant commitments tend to be.

Step 2: Introduce a commitment horizon policy

Propose a tiered commitment structure:

  • Hard commitments (communicated externally, scope locked): Only for work that starts within 4 weeks. Anything further is a forecast, not a commitment.
  • Soft commitments (directionally correct, scope adjustable): Up to one quarter out.
  • Roadmap themes (investment areas, no scope or date implied): Beyond one quarter.

This does not eliminate planning - it reframes what planning produces. The output is “we are investing in X this quarter” rather than “we will ship feature Y with this exact scope by this exact date.”

Step 3: Establish a regular scope-negotiation cadence (Weeks 2-4)

Create a monthly review for any active commitment more than four weeks out. Ask: Is the scope still accurate? Has the estimate changed? What is the latest realistic delivery range? Make scope adjustment a normal part of the process rather than an admission of failure. Stakeholders who participate in regular scope conversations are less surprised than those who receive a quarterly “we need to slip” announcement.

Step 4: Practice breaking features into independently valuable pieces (Weeks 3-6)

Work with product ownership to decompose large features into pieces that can ship and provide value independently. Features designed as all-or-nothing deliveries are the root cause of most distant date pressure. When the first slice ships in week four, the conversation shifts from “are we on track for the full feature in Q3?” to “here is what users have now; what should we build next?”

Step 5: Build the history that enables better forecasts (Ongoing)

Track the gap between initial commitments and actual delivery. Over time, this history becomes the basis for realistic planning. “Our Q-length features take on average 1.4x the initial estimate” is useful data that justifies longer forecasting ranges and more scope flexibility. Present this data to leadership as evidence that the current commitment model carries hidden inaccuracy.
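The "1.4x" figure above is the kind of number this history produces. A minimal sketch of the calculation, with hypothetical feature names and durations:

```python
# Compute the average estimate-to-actual ratio from a delivery history.
# All feature names and durations are hypothetical illustrations.
from statistics import mean

# (feature, estimated_weeks, actual_weeks)
history = [
    ("feature-a", 6, 9),
    ("feature-b", 4, 5),
    ("feature-c", 8, 13),
    ("feature-d", 3, 4),
]

ratios = [actual / estimated for _, estimated, actual in history]
print(f"average estimate-to-actual ratio: {mean(ratios):.2f}x")
# With this sample data the ratio comes out around 1.4x.
```

Even four or five data points make the calibration conversation concrete; the point is the trend, not per-feature precision.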

Objections and responses:

  • “Our stakeholders need dates to plan around” - Stakeholders need to plan, but plans built on inaccurate dates fail anyway. Start by presenting a range (“sometime in Q3”) for the next commitment and explain the confidence level behind it. Stakeholders who understand the uncertainty plan more realistically than those given false precision.
  • “If we don’t commit, nothing will get prioritized” - Prioritization does not require date-locked scope commitments. Replace the next date-locked roadmap item with an investment theme and an ordered backlog. Show stakeholders the top five items and ask them to confirm the order rather than the date.
  • “We already announced this externally” - External announcements of future features are a separate risk-management problem. Going forward, work with marketing and sales to communicate directional roadmaps rather than specific feature-and-date commitments.

Measuring Progress

  • Commitment accuracy rate: Percentage of commitments that deliver their original scope on the original date - expect this to be lower than assumed
  • Lead time: Should decrease as features are decomposed and shipped incrementally rather than held for a committed date
  • Scope changes per feature: Should be treated as normal signal, not failure - an increase in visible scope changes means the process is becoming more honest
  • Change fail rate: Should decrease as the pressure to rush incomplete work to a committed date is reduced
  • Time from feature start to first user value: Should decrease as features are broken into smaller independently shippable pieces

2 - Velocity as a Team Productivity Metric

Story points are used as a management KPI for team output, incentivizing point inflation and maximizing velocity instead of delivering value.

Category: Organizational & Cultural | Quality Impact: Medium

What This Looks Like

Every sprint, the team’s velocity is reported to management. Leadership tracks velocity on a dashboard alongside other delivery metrics. When velocity drops, questions come. When velocity is high, the team is praised. The implicit message is clear: story points are the measure of whether the team is doing its job.

Sprint planning shifts focus accordingly. Estimates creep upward as the team learns which guesses are rewarded. A story that might be a 3 gets estimated as a 5 to account for uncertainty - and because 5 points is worth more to the velocity metric than 3. Technical tasks with no story points get squeezed out of sprints because they contribute nothing to the number management is watching. Work items are split and combined not to reduce batch size but to maximize the point count in any given sprint.

Conversations about whether to do things correctly versus doing things quickly become conversations about what yields more points. Refactoring that would improve long-term delivery speed has no points and therefore no advocates. Rushing a feature to get the points before the sprint closes is rational behavior when velocity is the goal.

Common variations:

  • Velocity as capacity planning. Management uses last sprint’s velocity to determine how much to commit in the next sprint, treating the estimate as a productivity floor to maintain rather than a rough planning tool.
  • Velocity comparison across teams. Teams are compared by velocity score, even though point values are not calibrated across teams and have no consistent meaning.
  • Velocity as performance review input. Individual or team velocity numbers appear in performance discussions, directly incentivizing point inflation.
  • Velocity recovery pressure. When velocity drops due to external factors (vacations, incidents, refactoring), pressure mounts to “get velocity back up” rather than understanding why it dropped.

The telltale sign: the team knows their average velocity and actively manages toward it, rather than managing toward finishing valuable work.

Why This Is a Problem

Velocity is a planning tool, not a productivity measure. When it becomes a KPI, the measurement changes the system it was meant to measure.

It reduces quality

A team skips code review on a Friday afternoon to close one more story before the sprint ends. The defect ships on Monday. It shows up in production two weeks later. Fixing it costs more than the review would have taken - but the velocity metric never records the cost, only the point. That calculation repeats sprint after sprint.

Technical debt accumulates because work that does not yield points gets consistently deprioritized. The team is not negligent - they are responding rationally to the incentive structure. A high-velocity team with mounting technical debt will eventually slow down despite the good-looking numbers, but the measurement system gives no warning until the slowdown is already happening.

Teams that measure quality indicators - defect escape rate, code coverage, lead time, change fail rate - rather than story output maintain quality as a first-class concern because it is explicitly measured. Velocity tracks effort, not quality.

It increases rework

A story is estimated at 8 points to make the sprint look good. The acceptance criteria are written loosely to fit the inflated estimate. QA flags it as not meeting requirements. The story is reopened, refined, and completed again - generating more velocity points in the process. Rework that produces new points is a feature of the system, not a failure.

When the team’s incentive is to maximize points rather than to finish work that users value, the connection between what gets built and what is actually needed weakens. Vague scope produces stories that come back because the requirements were misunderstood, and implementations that miss the mark because the acceptance criteria were written to fit the estimate rather than the need.

Teams that measure cycle time from commitment to done - rather than velocity - are incentivized to finish work correctly the first time, because rework delays the metric they are measured on.

It makes delivery timelines unpredictable

Management commits to a delivery date based on projected velocity. The team misses it. Velocity was inflated - 5-point stories that were really 3s, padding added “for uncertainty.” The team was not moving as fast as the number suggested. The missed commitment produces pressure to inflate estimates further, which makes the next commitment even less reliable.

Story points are intentionally relative estimates, not time-based. They are only meaningful within a single team’s calibration. Using them to predict delivery dates or compare output across teams requires them to be something they are not. Management decisions made on velocity data inherit all the noise and gaming that the metric has accumulated.

Teams that use actual delivery metrics - lead time, throughput, cycle time - can make realistic forecasts because these measures track how long work actually takes from start to done. Velocity tracks how many points the team agreed to assign to work, which is a different and less useful thing.

Impact on continuous delivery

Continuous delivery depends on small, frequent, high-quality changes flowing steadily through the pipeline. Velocity optimization produces the opposite: large stories (more points per item), cutting quality steps (higher short-term velocity), and deprioritizing pipeline and infrastructure investment (no points). The team optimizes for the number that management watches while the delivery system that CD depends on degrades.

CD metrics - deployment frequency, lead time, change fail rate, mean time to restore - measure the actual delivery system rather than team activity. Replacing velocity with CD metrics aligns team behavior with delivery outcomes. Teams measured on deployment frequency and lead time invest in the practices that improve those measures: automation, small batches, fast feedback, and continuous integration.

How to Fix It

Step 1: Stop reporting velocity externally

Remove velocity from management dashboards and stakeholder reports. It is an internal planning tool, not an organizational KPI. If management needs visibility into delivery output, introduce lead time and release frequency as replacements.

Explain the change: velocity measures team effort in made-up units. Lead time and release frequency measure actual delivery outcomes.

Step 2: Introduce delivery metrics alongside velocity (Weeks 2-3)

While stopping velocity reporting, start tracking:

  • Lead time - how long work takes from start to production
  • Deployment frequency - how often changes reach users
  • Change fail rate - what fraction of changes cause problems in production
  • Mean time to restore - how quickly failures are recovered

These metrics capture what management actually cares about: how fast does value reach users and how reliably?

Step 3: Decouple estimation from capacity planning

Teams that do not inflate estimates do not need velocity tracking to forecast. Use historical cycle time data to forecast completion dates. A story that is similar in size to past stories will take approximately as long as past stories took - measured in real time, not points.

If the team still uses points for relative sizing, that is fine. Stop using the sum of points as a throughput metric.
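A cycle-time forecast can stay very lightweight. The sketch below uses hypothetical cycle times and a simple nearest-rank percentile helper (not a library call):

```python
# Forecast a delivery range from historical cycle times (days from
# start to done) instead of story points. All numbers are hypothetical.
cycle_times = sorted([2, 3, 1, 5, 2, 4, 3, 2, 6, 3, 2, 4])

def percentile(sorted_xs, p):
    """Nearest-rank percentile on an already-sorted list."""
    idx = min(len(sorted_xs) - 1, int(p * len(sorted_xs)))
    return sorted_xs[idx]

p50 = percentile(cycle_times, 0.50)
p85 = percentile(cycle_times, 0.85)
print(f"a typical story finishes in {p50} days; 85% finish within {p85} days")
```

The 85th percentile gives a defensible "by when" answer without estimating any individual story.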

Step 4: Redirect sprint planning toward flow

Change the sprint planning question from “how many points can we commit to?” to “what is the highest-priority work the team can finish this sprint?” Focus on finishing in-progress items before starting new ones. Use WIP limits rather than point targets.

Objections and responses:

  • “How will management know if the team is productive?” - Lead time and release frequency directly measure productivity. Velocity measures activity, which is not the same thing.
  • “We use velocity for sprint capacity planning” - Use historical cycle time and throughput (stories completed per sprint) instead. These are less gameable and more accurate for forecasting.
  • “Teams need goals to work toward” - Set goals on delivery outcomes - “reduce lead time by 20%,” “deploy daily” - rather than on effort metrics. Outcome goals align the team with what matters.
  • “Velocity has been stable for years, why change?” - Stable velocity indicates the team has found a comfortable equilibrium, not that delivery is improving. If lead time and change fail rate are also good, there is no problem. If they are not, velocity is masking it.

Step 5: Replace performance conversations with delivery conversations

Remove velocity from any performance review or team health conversation. Replace with: are users getting value faster? Is quality improving or degrading? Is the team’s delivery capability growing?

These conversations produce different behavior than velocity conversations. They reward investment in automation, testing, and reducing batch size - all of which improve actual delivery speed.

Measuring Progress

  • Lead time: Decreasing trend as the team focuses on finishing rather than accumulating points
  • Release frequency: Increasing as the team ships smaller batches rather than large point-heavy sprints
  • Change fail rate: Stable or decreasing as quality shortcuts decline
  • Story point inflation rate: Estimates stabilize or decrease as gaming incentive is removed
  • Technical debt items in backlog: Should reduce as non-pointed work can be prioritized on its merits
  • Rework rate: Stories requiring revision after completion should decrease

3 - Estimation Theater

Hours are spent estimating work that changes as soon as development starts, creating false precision for inherently uncertain work.

Category: Organizational & Cultural | Quality Impact: Medium

What This Looks Like

The sprint planning meeting has been running for three hours. The team is on story number six of fourteen. Each story follows the same ritual: a developer reads the description aloud, the team discusses what might be involved, someone raises a concern that leads to a five-minute tangent, and eventually everyone holds up planning poker cards. The cards show a spread from 2 to 13. The team debates until they converge on 5. The number is recorded. Nobody will look at it again except to calculate velocity.

The following week, development starts. The developer working on story six discovers that the acceptance criteria assumed a database table that does not exist, the API the feature depends on behaves differently than the description implied, and the 5-point estimate was derived from a misunderstanding of what the feature actually does. The work takes three times as long as estimated. The number 5 in the backlog does not change.

Estimation theater is the full ceremony of estimation without the predictive value. The organization invests heavily in producing numbers that are rarely accurate and rarely used to improve future estimates. The ritual continues because stopping feels irresponsible, even though the estimates are not making delivery more predictable.

Common variations:

  • The re-estimate spiral. A story was estimated at 8 points last sprint when context was thin. This sprint, with more information, the team re-estimates it at 13. The sprint capacity calculation changes. The process of re-estimation takes longer than the original estimate session. The final number is still wrong.
  • The complexity anchor. One story is always chosen as the “baseline” complexity. All other stories are estimated relative to it. The baseline story was estimated months ago by a different team composition. Nobody actually remembers why it was 3 points, but it anchors everything else.
  • The velocity treadmill. Velocity is tracked as a performance metric. Teams learn to inflate estimates to maintain a consistent velocity number. A story that would take one day gets estimated at 3 points to pad the sprint. The number reflects negotiation, not complexity.
  • The estimation meeting that replaces discovery. The team is asked to estimate stories that have not been broken down or clarified. The meeting becomes an improvised discovery session. Real estimation cannot happen without the information that discovery would provide, so the numbers produced are guesses dressed as estimates.

The telltale sign: when a developer is asked how long something will take, they think “two days” but say “maybe 5 points” - because the real unit has been replaced by a proxy that nobody knows how to interpret.

Why This Is a Problem

A team spends three hours estimating fourteen stories. The following week, the first story takes three times longer than estimated because the acceptance criteria were never clarified. The three hours produced a number; they did not produce understanding. Estimation theater does not eliminate uncertainty - it papers over it with numbers that feel precise but are not. Organizations that invest heavily in estimation tend to invest less in the practices that actually reduce uncertainty: small batches, fast feedback, and iterative delivery.

It reduces quality

Heavy estimation processes create pressure to stick to the agreed scope of a story, even when development reveals that the agreed scope is wrong. If a developer discovers during implementation that the feature needs additional work not covered in the original estimate, raising that information feels like failure - “it was supposed to be 5 points.” The team either ships the incomplete version that fits the estimate or absorbs the extra work invisibly and misses the sprint commitment.

Both outcomes hurt quality. Shipping to the estimate when the implementation is incomplete produces defects. Absorbing undisclosed work produces false velocity data and makes the next sprint plan inaccurate. Teams that use lightweight forecasting and frequent scope negotiation can surface “this turned out to be bigger than expected” as normal information rather than an admission of planning failure.

It increases rework

Estimation sessions frequently substitute for real story refinement. The team spends time arguing about the number of points rather than clarifying acceptance criteria, identifying dependencies, or splitting the story into smaller deliverable pieces. The estimate gets recorded but the ambiguity that would have been resolved during real refinement remains in the work.

When development starts and the ambiguity surfaces - as it always does - the developer has to stop, seek clarification, wait for answers, and restart. This interruption is rework in the sense that it was preventable. The time spent generating the estimate produced no information that helped; the time not spent on genuine acceptance criteria clarification creates a real gap that costs more later.

It makes delivery timelines unpredictable

The primary justification for estimation is predictability: if we know how many points of work we have and our velocity, we can forecast when we will finish. This math works only when points translate consistently to time, and they rarely do. Story points are affected by team composition, story quality, technical uncertainty, dependencies, and the hidden work that did not make it into the description.

Teams that rely on point-based velocity for forecasting end up with wide confidence intervals they do not acknowledge. “We’ll finish in 6 sprints” sounds precise, but the underlying data is noisy enough that “sometime in the next 4 to 10 sprints” would be more honest. Teams that use empirical throughput - counting the number of stories completed per period regardless of size - and deliberately keep stories small tend to forecast more accurately with less ceremony.
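An empirical throughput forecast of this kind can be produced by resampling past sprints. A sketch under hypothetical numbers; the seed is fixed only to make the example repeatable:

```python
# Monte Carlo forecast: how many sprints to clear a backlog, based on
# resampling historical per-sprint throughput. Numbers are hypothetical.
import random

random.seed(7)  # deterministic for the example

throughput_history = [4, 6, 3, 5, 7, 4, 5]  # stories finished per sprint
backlog_size = 20
trials = 10_000

outcomes = []
for _ in range(trials):
    remaining, sprints = backlog_size, 0
    while remaining > 0:
        remaining -= random.choice(throughput_history)
        sprints += 1
    outcomes.append(sprints)

outcomes.sort()
p50 = outcomes[trials // 2]
p85 = outcomes[int(trials * 0.85)]
print(f"50% of trials finish within {p50} sprints; 85% within {p85} sprints")
```

Reporting the 50% and 85% figures together is the honest version of "sometime in the next 4 to 10 sprints": a range with a stated confidence level instead of a single precise-looking date.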

Impact on continuous delivery

CD depends on small, frequent changes moving through the pipeline. Estimation theater goes hand in hand with large, complex stories - the kind of work that is hard to estimate and hard to integrate. The ceremony of estimation discourages decomposition: if every story requires a full planning poker ritual, there is pressure to keep the number of stories low, which means keeping stories large.

CD also benefits from a team culture where surprises are surfaced quickly and plans adjust. Heavy estimation cultures punish surfacing surprises because surprises mean the estimate was wrong. The resulting silence - developers not raising problems because raising problems is culturally costly - is exactly the opposite of the fast feedback that CD requires.

How to Fix It

Step 1: Measure estimation accuracy for one sprint

Collect data before changing anything. For every story in the current sprint, record the estimate in points and the actual time in days or hours. At the end of the sprint, calculate the average error. Present the results without judgment. In most teams, estimates are off by a factor of two or more on a per-story basis even when the sprint “hits velocity.” This data creates the opening for a different approach.
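The per-story comparison can be a few lines of analysis over the tracker export. Ticket IDs, points, and durations below are hypothetical sample data:

```python
# Compare point estimates to actual elapsed days for one sprint.
# Story IDs, points, and durations are hypothetical sample data.
stories = [
    # (story, estimated_points, actual_days)
    ("PAY-101", 3, 2.0),
    ("PAY-102", 5, 9.5),
    ("PAY-103", 2, 1.0),
    ("PAY-104", 8, 4.0),
    ("PAY-105", 5, 12.0),
]

# Implied days-per-point for each story. Consistent estimates would
# produce roughly the same ratio everywhere; a wide spread means the
# points carry little predictive information.
ratios = [days / points for _, points, days in stories]
spread = max(ratios) / min(ratios)
print(f"days per point ranges {min(ratios):.2f}-{max(ratios):.2f} "
      f"({spread:.1f}x spread)")
```

A spread of 4-5x between the cheapest and most expensive point is common, and it is the number that opens the conversation about dropping the ceremony.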

Step 2: Experiment with #NoEstimates for one sprint

Commit to completing stories without estimating in points. Apply a strict rule: no story enters the sprint unless it can be completed in one to three days. This forces the decomposition and clarity that estimation sessions often skip. Track throughput - number of stories completed per sprint - rather than velocity. Compare predictability at the sprint level between the two approaches.
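One way to make that comparison concrete is the spread of each series relative to its mean (coefficient of variation). The sprint numbers below are hypothetical:

```python
# Compare sprint-level predictability of velocity (points) vs
# throughput (story count) via coefficient of variation; lower means
# more predictable. All numbers are hypothetical.
from statistics import mean, pstdev

velocity = [34, 21, 40, 25, 38, 22]   # points per sprint
throughput = [7, 6, 8, 6, 7, 7]       # stories per sprint

def cv(xs):
    return pstdev(xs) / mean(xs)

print(f"velocity CV: {cv(velocity):.2f}, throughput CV: {cv(throughput):.2f}")
```

When stories are forced down to a consistent one-to-three-day size, the story count tends to be the steadier of the two series, which is exactly what makes it the better forecasting input.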

Step 3: Replace story points with size categories if estimation continues (Weeks 2-3)

Replace point-scale estimation with a simple three-category system if the team is not ready to drop estimation entirely: small (one to two days), medium (three to four days), large (needs splitting). Stories tagged “large” do not enter the sprint until they are split. The goal is to get all stories to small or medium. Size categories take five minutes to assign; point estimation takes hours. The predictive value is similar.

Step 4: Make refinement the investment, not estimation (Ongoing)

Redirect the time saved from estimation ceremonies into story refinement: clarifying acceptance criteria, identifying dependencies, writing examples that define the boundaries of the work. Well-refined stories with clear acceptance criteria deliver more predictability than well-estimated stories with fuzzy criteria.

Step 5: Track forecast accuracy and improve (Ongoing)

Track how often sprint commitments are met, regardless of whether you are using throughput, size categories, or some estimation approach. Review misses in retrospective with a root-cause focus: was the story poorly understood? Was there an undisclosed dependency? Were the acceptance criteria ambiguous? Fix the root cause, not the estimate.

Objections and responses:

  • “Management needs estimates for planning” - Management needs forecasts. Empirical throughput (stories per sprint) combined with a prioritized backlog provides forecasts without per-story estimation. “At our current rate, the top 20 stories will be done in 4-5 sprints” is a forecast that management can plan around.
  • “How do we know what fits in a sprint without estimates?” - Apply a size rule: no story larger than two days. Divide team capacity (people times working days per sprint) by that ceiling and you have your sprint limit. Try it for one sprint and compare predictability to the previous point-based approach.
  • “We’ve been doing this for years; changing will be disruptive” - The disruption is one or two sprints of adjustment. The ongoing cost of estimation theater - hours per sprint of planning that does not improve predictability - is paid every sprint, indefinitely. One-time disruption to remove a recurring cost is a good trade.
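The capacity rule reduces to simple arithmetic: person-days divided by the story-size ceiling gives a conservative sprint limit. A sketch with hypothetical team numbers:

```python
# Conservative sprint limit from a story-size ceiling. Team numbers
# are hypothetical; adjust for your own sprint length and ceremonies.
team_size = 5
working_days_per_sprint = 9   # ten-day sprint minus ceremony time
max_days_per_story = 2        # the agreed size ceiling

person_days = team_size * working_days_per_sprint
# Worst case: every story hits the ceiling, so at least this many fit.
sprint_limit = person_days // max_days_per_story
print(f"{person_days} person-days -> room for at least {sprint_limit} stories")
```

Because real stories come in under the ceiling, the limit is a floor on capacity, not a target to fill.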

Measuring Progress

  • Planning time per sprint: Should decrease as per-story estimation is replaced by size categorization or dropped entirely
  • Sprint commitment reliability: Should improve as stories are better refined and sized consistently
  • Development cycle time: Should decrease as stories are decomposed to a consistent size and ambiguity is resolved before development starts
  • Stories completed per sprint: Should increase and stabilize as stories become consistently small
  • Re-estimate rate: Should drop toward zero as the process moves away from point estimation

Related Practices

  • Work Decomposition - The practice that makes small, consistent stories possible
  • Small Batches - Why smaller work items improve delivery more than better estimates
  • Working Agreements - Establishing shared norms around what “ready to start” means
  • Metrics-Driven Improvement - Using throughput data as a more reliable planning input than velocity
  • Limiting WIP - Reducing the number of stories in flight improves delivery more than improving estimation

4 - Velocity as Individual Metric

Story points or velocity are used to evaluate individual performance. Developers game the metrics instead of delivering value.

Category: Organizational & Cultural | Quality Impact: High

What This Looks Like

During sprint review, a manager pulls up a report showing how many story points each developer completed. Sarah finished 21 points. Marcus finished 13. The manager asks Marcus what happened. Marcus starts padding his estimates next sprint. Sarah starts splitting her work into more tickets so the numbers stay high. The team learns that the scoreboard matters more than the outcome.

Common variations:

  • The individual velocity report. Management tracks story points per developer per sprint and uses the trend to evaluate performance. Developers who complete fewer points are questioned in one-on-ones or performance reviews.
  • The defensive ticket. Developers create tickets for every small task (attending a meeting, reviewing a PR, answering a question) to prove they are working. The board fills with administrative noise that obscures the actual delivery work.
  • The clone-and-close. When a story rolls over into the next sprint, the developer closes it and creates a new one to avoid the appearance of an incomplete sprint. The original story’s history is lost. The rollover is hidden.
  • The seniority expectation. Senior developers are expected to complete more points than juniors. Seniors avoid helping others because pairing, mentoring, and reviewing do not produce points. Knowledge sharing becomes a career risk.

The telltale sign: developers spend time managing how their work appears in Jira rather than managing the work itself.

Why This Is a Problem

Velocity was designed as a team planning tool. It helps the team forecast how much work they can take into a sprint. When management repurposes it as an individual performance metric, every incentive shifts from delivering outcomes to producing numbers.

It reduces quality

When developers are measured by points completed, they optimize for throughput over correctness. Cutting corners on testing, skipping edge cases, and merging code that “works for now” all produce more points per sprint. Quality gates feel like obstacles to the metric rather than safeguards for the product.

Teams that measure outcomes instead of output focus on delivering working software. A developer who spends two days pairing with a colleague to get a critical feature right is contributing more than one who rushes three low-quality stories to completion.

It increases rework

Rushed work produces defects. Defects discovered later require context rebuilding and rework that costs more than doing it right the first time. But the rework appears in a future sprint as new points, which makes the developer look productive again. The cycle feeds itself: rush, ship defects, fix defects, claim more points.

When the team owns velocity collectively, the incentive reverses. Rework is a drag on team velocity, so the team has a reason to prevent it through better testing, review, and collaboration.

It makes delivery timelines unpredictable

Individual velocity tracking encourages estimate inflation. Developers learn to estimate high so they can “complete” more points and look productive. Over time, the relationship between story points and actual effort dissolves. A “5-point story” means whatever the developer needs it to mean for the scorecard. Sprint planning based on inflated estimates becomes fiction.

When velocity is a team planning tool with no individual consequence, developers estimate honestly because accuracy helps the team plan, and there is no personal penalty for a lower number.

It destroys collaboration

Helping a teammate debug their code, pairing on a tricky problem, or doing a thorough code review all take time away from completing your own stories. When individual points are tracked, every hour spent helping someone else is an hour that does not appear on your scorecard. The rational response is to stop helping.

Teams that do not track individual velocity collaborate freely. Swarming on a blocked item is natural because the team shares a goal (deliver the sprint commitment) rather than competing for individual credit.

Impact on continuous delivery

CD depends on a team that collaborates fluidly: reviewing each other’s code quickly, swarming on blockers, sharing knowledge across the codebase. Individual velocity tracking poisons all of these behaviors. Developers hoard work, avoid reviews, and resist pairing because none of it produces points. The team becomes a collection of individuals optimizing their own metrics rather than a unit delivering software together.

How to Fix It

Step 1: Stop reporting individual velocity

Remove individual velocity from all dashboards, reports, and one-on-one discussions. Report only team velocity. This single change removes the incentive to game and restores velocity to its intended purpose: helping the team plan.

If management needs visibility into individual contribution, use peer feedback, code review participation, and qualitative assessment rather than story points.
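As a sketch of what team-only reporting looks like in code - the records and field names below are hypothetical, not any issue tracker's real API - velocity is aggregated at the team level and per-developer numbers never leave the function:

```python
from collections import defaultdict

# Hypothetical completed-story records; the field names are illustrative,
# not any issue tracker's real API.
completed = [
    {"team": "payments", "developer": "sarah", "points": 8},
    {"team": "payments", "developer": "marcus", "points": 5},
    {"team": "payments", "developer": "lee", "points": 3},
]

def team_velocity(stories):
    """Aggregate points per team; per-developer numbers never leave this function."""
    totals = defaultdict(int)
    for story in stories:
        totals[story["team"]] += story["points"]
    return dict(totals)

print(team_velocity(completed))  # {'payments': 16}
```

The design choice matters more than the code: if the per-developer breakdown is never computed in the reporting layer, it cannot quietly reappear on a dashboard.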

Step 2: Clean up the board

Remove defensive tickets. If it is not a deliverable work item, it does not belong on the board. Meetings, PR reviews, and administrative tasks are part of the job, not separate trackable units. Reduce the board to work that delivers value so the team can see what actually matters.

Step 3: Redefine what velocity measures

Make it explicit in the team’s working agreement: velocity is a team planning tool. It measures how much work the team can take into a sprint. It is not a performance metric, a productivity indicator, or a comparison tool. Write this down. Refer to it when old habits resurface.

Step 4: Measure outcomes instead of output

Replace individual velocity tracking with outcome-oriented measures:

  • How often does the team deliver working software to production?
  • How quickly are defects found and fixed?
  • How predictable are the team’s delivery timelines?

These measures reward collaboration, quality, and sustainable pace rather than individual throughput.
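Two of these outcome measures can be computed directly from event timestamps. A minimal sketch with invented data (every date and field name below is illustrative):

```python
from datetime import datetime, timedelta

# Hypothetical event logs; every timestamp below is invented for illustration.
deploys = [datetime(2024, 3, d) for d in (1, 3, 5, 8, 10, 12)]
defects = [
    {"found": datetime(2024, 3, 4, 9), "fixed": datetime(2024, 3, 4, 15)},
    {"found": datetime(2024, 3, 9, 10), "fixed": datetime(2024, 3, 10, 10)},
]

def deploys_per_week(events):
    """How often does the team deliver working software?"""
    span_days = (max(events) - min(events)).days or 1
    return len(events) * 7 / span_days

def mean_time_to_fix(items):
    """How quickly are defects found and fixed?"""
    total = sum((d["fixed"] - d["found"] for d in items), timedelta())
    return total / len(items)

print(f"{deploys_per_week(deploys):.1f} deploys/week")  # 3.8 deploys/week
print(mean_time_to_fix(defects))                        # 15:00:00
```

Both numbers are team-level by construction - there is no per-developer dimension to game.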

Objection | Response
“How do we know if someone isn’t pulling their weight?” | Peer feedback, code review participation, and retrospective discussions surface contribution problems far more accurately than story points. Points measure estimates, not effort or impact.
“We need metrics for performance reviews” | Use qualitative signals: code review quality, mentoring, incident response, knowledge sharing. These measure what actually matters for team performance.
“Developers will slack off without accountability” | Teams with shared ownership and clear sprint commitments create stronger accountability than individual tracking. Peer expectations are more motivating than management scorecards.

Measuring Progress

Metric | What to look for
Defensive tickets on the board | Should drop to zero
Estimate consistency | Story point meanings should stabilize as gaming pressure disappears
Team velocity variance | Should decrease as estimates become honest planning tools
Collaboration indicators (pairing, review participation) | Should increase as helping others stops being a career risk

5 - Deadline-Driven Development

Arbitrary deadlines override quality, scope, and sustainability. Everything is priority one. The team cuts corners to hit dates and accumulates debt that slows future delivery.

Category: Organizational & Cultural | Quality Impact: High

What This Looks Like

A stakeholder announces a launch date. The team has not estimated the work. The date is not based on the team’s capacity or the scope of the feature. It is based on a business event, an executive commitment, or a competitor announcement. The team is told to “just make it happen.”

The team scrambles. Tests are skipped. Code reviews become rubber stamps. Shortcuts are taken with the promise of “cleaning it up after launch.” Launch day arrives. The feature ships with known defects. The cleanup never happens because the next arbitrary deadline is already in play.

Common variations:

  • Everything is priority one. Multiple stakeholders each insist their feature is the most urgent. The team has no mechanism to push back because there is no single product owner with prioritization authority. The result is that all features are half-done rather than any feature being fully done.
  • The date-then-scope pattern. The deadline is set first, then the team is asked what they can deliver by that date. But when the team proposes a reduced scope, the stakeholder insists on the full scope anyway. The “negotiation” is theater.
  • The permanent crunch. Every sprint is a crunch sprint. There is no recovery period after a deadline because the next deadline starts immediately. The team never operates at a sustainable pace. Overtime becomes the baseline, not the exception.
  • Maintenance as afterthought. Stability work, tech debt reduction, and operational improvements are never prioritized because they do not have a deadline attached. Only work that a stakeholder is waiting for gets scheduled. The system degrades continuously.

The telltale sign: the team cannot remember the last sprint where they were not rushing to meet someone else’s date.

Why This Is a Problem

Arbitrary deadlines create a cycle where cutting corners today makes the team slower tomorrow, which makes the next deadline even harder to meet, which requires more corners to be cut. Each iteration degrades the codebase, the team’s morale, and the organization’s delivery capacity.

It reduces quality

When the deadline is immovable and the scope is non-negotiable, quality is the only variable left. Tests are skipped because “we’ll add them later.” Code reviews are rushed because the reviewer knows the author cannot change anything significant without missing the date. Known defects ship because fixing them would delay the launch.

Teams that negotiate scope against fixed timelines can maintain quality on whatever they deliver. A smaller feature set that works correctly is more valuable than a full feature set riddled with defects.

It increases rework

Every shortcut taken to meet a deadline becomes rework later. The test that was skipped means a defect that ships to production and comes back as a bug ticket. The code review that was rubber-stamped means a design problem that requires refactoring in a future sprint. The tech debt that was accepted becomes a drag on every future feature in that area.

The rework is invisible in the moment because it lands in future sprints. But it compounds. Each deadline leaves behind more debt, and each subsequent feature takes longer because it has to work around or through the accumulated shortcuts.

It makes delivery timelines unpredictable

Paradoxically, deadline-driven development makes delivery less predictable, not more. The team’s actual velocity is masked by heroics and overtime. Management sees that the team “met the deadline” and concludes they can do it again. But the team met it by burning down their capacity reserves. The next deadline of equal scope will take longer because the team is tired and the codebase is worse.

Teams that work at a sustainable pace with realistic commitments deliver more predictably. Their velocity is honest, their estimates are reliable, and their delivery dates are based on data rather than wishes.

It erodes trust in both directions

The team stops believing that deadlines are real because so many of them are arbitrary. Management stops believing the team’s estimates because the team has been meeting impossible deadlines through overtime (proving the estimates were “wrong”). Both sides lose confidence in the other. The team pads estimates defensively. Management sets earlier deadlines to compensate. The gap between stated dates and reality widens.

Impact on continuous delivery

CD requires sustained investment in automation, testing, and pipeline infrastructure. Every sprint spent in deadline-driven crunch is a sprint where that investment does not happen. The team cannot improve their delivery practices because they are too busy delivering under pressure.

CD also requires a sustainable pace. A team that is always in crunch cannot step back to automate a deployment, improve a test suite, or set up monitoring. These improvements require protected time that deadline-driven organizations never provide.

How to Fix It

Step 1: Make the cost visible

Track two things: the shortcuts taken to meet each deadline (skipped tests, deferred refactoring, known defects shipped) and the time spent in subsequent sprints on rework from those shortcuts. Present this data as the “deadline tax” that the organization is paying.
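The tracking does not need tooling to start; a per-sprint tally is enough. A toy sketch of the "deadline tax" calculation, with invented numbers:

```python
# Hypothetical per-sprint records of shortcuts taken and the rework they
# caused in later sprints; all numbers are invented for illustration.
sprints = [
    {"sprint": 14, "shortcuts": 6, "rework_hours": 0},
    {"sprint": 15, "shortcuts": 4, "rework_hours": 18},
    {"sprint": 16, "shortcuts": 5, "rework_hours": 31},
]

def deadline_tax(records, capacity_hours=240):
    """Share of team capacity spent reworking earlier shortcuts, as a percentage."""
    rework = sum(r["rework_hours"] for r in records)
    return 100 * rework / (capacity_hours * len(records))

print(f"deadline tax: {deadline_tax(sprints):.1f}% of capacity")  # 6.8% of capacity
```

Presenting the result as a percentage of capacity frames the conversation in terms stakeholders already use for budgeting.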

Step 2: Establish the iron triangle explicitly

When a deadline arrives, make the tradeoff explicit: scope, quality, and timeline form a triangle. The team can adjust scope or timeline. Quality is not negotiable. Document this as a team working agreement and share it with stakeholders.

Present options: “We can deliver the full scope by date X, or we can deliver this reduced scope by your requested date. Which do you prefer?” Force the decision rather than absorbing the impossible commitment silently.

Step 3: Reserve capacity for sustainability

Allocate 20 percent of each sprint to non-deadline work: tech debt reduction, test improvements, pipeline enhancements, and operational stability. Protect this allocation from stakeholder pressure. Frame it as investment: “This 20 percent is what makes the other 80 percent faster next quarter.”

Step 4: Demonstrate the sustainable pace advantage (Month 2+)

After a few sprints of protected sustainability work, compare delivery metrics to the deadline-driven period. Development cycle time should be shorter. Rework should be lower. Sprint commitments should be more reliable. Use this data to make the case for continuing the approach.

Objection | Response
“The business date is real and cannot move” | Some dates are genuinely fixed (regulatory deadlines, contractual obligations). For those, negotiate scope. For everything else, question whether the date is a real constraint or an arbitrary target. Most “immovable” dates move when the alternative is shipping broken software.
“We don’t have time for sustainability work” | You are already paying for it in rework, production incidents, and slow delivery. The question is whether you pay proactively (20 percent reserved capacity) or reactively (40 percent lost to accumulated debt).
“The team met the last deadline, so they can meet this one” | They met it by burning overtime and cutting quality. Check the defect rate, the rework in subsequent sprints, and the team’s morale. The deadline was “met” by borrowing from the future.

Measuring Progress

Metric | What to look for
Shortcuts taken per sprint | Should decrease toward zero as quality becomes non-negotiable
Rework percentage | Should decrease as shortcuts stop creating future debt
Sprint commitment reliability | Should increase as commitments become realistic
Change fail rate | Should decrease as quality stops being sacrificed for deadlines
Unplanned work percentage | Should decrease as accumulated debt is paid down

6 - The 'We're Different' Mindset

The belief that CD works for others but not here - “we’re regulated,” “we’re too big,” “our technology is too old” - is used to justify not starting.

Category: Organizational & Cultural | Quality Impact: Medium

What This Looks Like

A team attends a conference talk about CD. The speaker describes deploying dozens of times per day, automated pipelines catching defects before they reach users, developers committing directly to trunk. On the way back to the office, the conversation is skeptical: “That’s great for a startup with a greenfield codebase, but we have fifteen years of technical debt.” Or: “We’re in financial services - we have compliance requirements they don’t deal with.” Or: “Our system is too integrated; you can’t just deploy one piece independently.”

Each statement contains a grain of truth. The organization is regulated. The codebase is old. The system is tightly coupled. But the grain of truth is used to dismiss the entire direction rather than to scope the starting point. “We cannot do it perfectly today” becomes “we should not start at all.”

This pattern is often invisible as a pattern. Each individual objection sounds reasonable. Regulators do impose constraints. Legacy codebases do create real friction. The problem is not any single objection but the pattern of always finding a reason why this organization is different from the ones that succeeded - and never finding a starting point small enough that the objection does not apply.

Common variations:

  • “We’re regulated.” Compliance requirements are used as a blanket veto on any CD practice. Nobody actually checks whether the regulation prohibits the practice. The regulation is invoked as intuition, not as specific cited text.
  • “Our technology is too old.” The mainframe, the legacy monolith, or the undocumented Oracle schema is treated as an immovable object. CD is for teams that started with modern stacks. The legacy system is never examined for which parts could be improved now.
  • “We’re too big.” Size is cited as a disqualifier. “Amazon can do it because they built their systems for it from the start, but we have 50 teams all depending on each other.” The coordination complexity is real, but it is treated as permanent rather than as a problem to be incrementally reduced.
  • “Our customers won’t accept it.” The belief that customers require staged rollouts, formal release announcements, or quarterly update cycles - often without ever asking the customers. The assumed customer requirement substitutes for an actual customer requirement.
  • “We tried it once and it didn’t work.” A failed pilot - often under-resourced, poorly scoped, or abandoned after the first difficulty - is used as evidence that the approach does not apply to this organization. A single unsuccessful attempt becomes generalized proof of impossibility.

The telltale sign: the conversation about CD always ends with a “but” - and the team reaches the “but” faster each time the topic comes up.

Why This Is a Problem

The “we’re different” mindset is self-reinforcing. Each time a reason not to start is accepted, the organization’s delivery problems persist, which produces more evidence that the system is too hard to change, which makes the next reason not to start feel more credible. The gap between the organization and its more capable peers widens over time.

It reduces quality

A defect introduced today will be found in manual regression testing three weeks from now, after batch changes have compounded it with a dozen other modifications. The developer has moved on, the context is gone, and the fix takes three times as long as it would have at the time of writing. That cost repeats on every release.

Each release involves more manual testing, more coordination, more risk from large batches of accumulated changes. The “we’re different” position does not protect quality; it protects the status quo while quality quietly erodes. Organizations that do start CD improvement, even in small steps, consistently report better defect detection and lower production incident rates than they had before.

It increases rework

An hour of manual regression testing, run by people who did not write the code, is an hour that automation would eliminate - and the cost compounds with every release. Manual test execution, manual deployment, and manual environment setup each represent repeated effort that the “we’re different” mindset locks in permanently.

Teams that do not practice CD tend to have longer feedback loops. A defect introduced today is discovered in integration testing three weeks from now, at which point the developer has to context-switch back to code they no longer remember clearly. The rework of late defect discovery is real, measurable, and avoidable - but only if the team is willing to build the testing and integration practices that catch defects earlier.

It makes delivery timelines unpredictable

Ask a team operating under this mindset when the next release will ship. They cannot tell you. Long release cycles, complex manual processes, and large batches of accumulated changes combine to make each release a unique, uncertain event. When every release is a special case, there is no baseline for improvement and no predictable delivery cadence.

CD improves predictability precisely because it makes delivery routine. When deployment happens frequently through an automated pipeline, each deployment is small, understood, and follows a consistent process. The “we’re different” organizations have the most to gain from this routinization - and the longest path to it, which the mindset ensures they never begin.

Impact on continuous delivery

The “we’re different” mindset prevents CD adoption not by identifying insurmountable barriers but by preventing the work of understanding which barriers are real, which are assumed, and which could be addressed with modest effort. Most organizations that have successfully adopted CD started with systems and constraints that looked, from the outside, like the objections their peers were raising.

The regulated industries argument deserves direct rebuttal: banks, insurance companies, healthcare systems, and defense contractors practice CD. The regulation constrains what must be documented and audited, not how frequently software is tested and deployed. The teams that figured this out did not have a different regulatory environment - they had a different starting assumption about whether starting was possible.

How to Fix It

Step 1: Audit the objections for specificity

List every reason currently cited for why CD is not applicable. For each reason, find the specific constraint: cite the regulation by name, identify the specific part of the legacy system that cannot be changed, describe the specific customer requirement that prevents frequent deployment. Many objections do not survive the specificity test - they dissolve into “we assumed this was true but haven’t checked.”

For those that survive, determine whether the constraint applies to all practices or only some. A compliance requirement that mandates separation of duties does not prevent automated testing. A legacy monolith that cannot be broken up this year can still have its deployment automated.

Step 2: Find one team and one practice where the objections do not apply

Even in highly constrained organizations, some team or some part of the system is less constrained than the general case. Identify the team with the cleanest codebase, the fewest dependencies, the most autonomy over their deployment process. Start there. Apply one practice - automated testing, trunk-based development, automated deployment to a non-production environment. Generate evidence that it works in this organization, with this technology, under these constraints.

Step 3: Document the actual regulatory constraints (Weeks 2-4)

Engage the compliance or legal team directly with a specific question: “Here is a practice we want to adopt. Does our regulatory framework prohibit it?” In most cases the answer is “no” or “yes, but here is what you would need to document to satisfy the requirement.” The documentation requirement is manageable; the vague assumption that “regulation prohibits this” is not.

Bring the regulatory analysis back to the engineering conversation. “We checked. The regulation requires an audit trail for deployments, not a human approval gate. Our pipeline can generate the audit trail automatically.” Specificity defuses the objection.
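As a sketch of that point - assuming the requirement really is an audit trail rather than a human approval gate - a pipeline step could append a deployment record automatically. The schema below is illustrative, not any specific regulator's:

```python
import json
from datetime import datetime, timezone

def write_audit_record(commit_sha, environment, pipeline_run, path="audit.jsonl"):
    """Append one deployment record to an audit log.

    The fields are illustrative; a real implementation would record whatever
    the compliance team confirms the regulation actually requires.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "commit": commit_sha,
        "environment": environment,
        "pipeline_run": pipeline_run,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
    return record

rec = write_audit_record("a1b2c3d", "production", "build-4812")
print(rec["commit"])  # a1b2c3d
```

An append-only log like this is typically easier to audit than a trail of manual sign-off emails, which is the argument to bring back to compliance.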

Step 4: Run a structured constraint analysis (Weeks 3-6)

For each genuine technical constraint identified in Step 1, assess:

  • Can this constraint be removed in 30 days? 90 days? 1 year?
  • What would removing it make possible?
  • What is the cost of not removing it over the same period?

This produces a prioritized improvement backlog grounded in real constraints rather than assumed impossibility. The framing shifts from “we can’t do CD” to “here are the specific things we need to address before we can adopt this specific practice.”
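One way to turn that assessment into a ranked backlog is to weigh each constraint's ongoing cost against how soon it could plausibly be removed. A toy example - the constraint names, horizons, and hours are all invented:

```python
# Hypothetical constraint inventory from Step 1; hours and horizons are invented.
constraints = [
    {"name": "manual deploy approval", "removal_days": 30, "monthly_cost_hours": 40},
    {"name": "shared test database", "removal_days": 90, "monthly_cost_hours": 25},
    {"name": "monolith build time", "removal_days": 365, "monthly_cost_hours": 60},
]

def prioritize(items):
    """Rank constraints by ongoing cost relative to how soon each could be removed."""
    return sorted(
        items,
        key=lambda c: c["monthly_cost_hours"] / c["removal_days"],
        reverse=True,
    )

for c in prioritize(constraints):
    print(c["name"])  # cheapest-to-remove, highest-cost constraints come first
```

The exact scoring formula matters less than the discipline: every item carries a named constraint, a horizon, and a cost, not a vague impossibility.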

Step 5: Build the internal case with evidence (Ongoing)

Each successful improvement creates evidence that contradicts the “we’re different” position. A team that automated their deployment in a regulated environment has demonstrated that automation and compliance are compatible. A team that moved to trunk-based development on a fifteen-year-old codebase has demonstrated that age is not a barrier to good practices. Document these wins explicitly and share them. The “we’re different” mindset is defeated by examples, not arguments.

Objection | Response
“We’re in a regulated industry and have compliance requirements” | Name the specific regulation and the specific requirement. Most compliance frameworks require traceability and separation of duties, which automated pipelines satisfy better than manual processes. Regulated organizations including banks, insurers, and healthcare companies practice CD today.
“Our technology is too old to automate” | Age does not prevent incremental improvement. The first goal is not full CD - it is one automated test that catches one class of defect earlier. Start there. The system does not need to be fully modernized before automation provides value.
“We’re too large and too integrated” | Size and integration complexity are the symptoms that CD addresses. The path through them is incremental decoupling, starting with the highest-value seams. Large integrated systems benefit from CD more than small systems do - the pain of manual releases scales with size.
“Our customers require formal release announcements” | Check whether this is a stated customer requirement or an assumed one. Many “customer requirements” for quarterly releases are internal assumptions that have never been tested with actual customers. Feature flags can provide customers the stability of a formal release while the team deploys continuously.

Measuring Progress

Metric | What to look for
Number of “we can’t do this because” objections with specific cited evidence | Should decrease as objections are tested against reality and either resolved or properly scoped
Release frequency | Should increase as barriers are addressed and deployment becomes more routine
Lead time | Should decrease as practices that reduce handoffs and manual steps are adopted
Number of teams practicing at least one CD-adjacent practice | Should grow as the pilot demonstrates viability
Change fail rate | Should remain stable or improve as automation replaces manual processes

7 - Deferring CD Until After the Rewrite

CD adoption is deferred until a mythical rewrite that may never happen, while the existing system continues to be painful to deploy.

Category: Organizational & Cultural | Quality Impact: Medium

What This Looks Like

The engineering team has a plan. The current system is a fifteen-year-old monolith: undocumented, tightly coupled, slow to build, and painful to deploy. Everyone agrees it needs to be replaced. The new architecture is planned: microservices, event-driven, cloud-native, properly tested from the start. When the new system is ready, the team will practice CD properly.

The rewrite was scoped two years ago. The first service was delivered. The second is in progress. The third has been descoped twice. The monolith continues to receive new features because the business cannot wait for the rewrite. The old system is as painful to deploy as ever. New features are being added to the system that was supposed to be abandoned. The rewrite horizon has moved from “Q4 this year” to “sometime next year” to “when we get the migration budget approved.”

The team is waiting for a future state to start doing things better. The future state keeps retreating. The present state keeps getting worse.

Common variations:

  • The platform prerequisite. “We can’t practice CD until we have the new platform.” The new platform is eighteen months away. In the meantime, deployments remain manual and painful. The platform arrives - and is missing the one capability the team needed, which requires another six months of work.
  • The containerization first. “We need to containerize everything before we can build a proper pipeline.” Containerization is a reasonable goal, but it is not a prerequisite for automated testing, trunk-based development, or deployment automation. The team waits for containerization before improving any practice.
  • The greenfield sidestep. When asked why the current system does not have automated tests, the answer is “that codebase is untestable; we’re writing the new system with tests.” The new system is a side project that may never replace the primary system. Meanwhile, the primary system ships defects that tests would have caught.
  • The waiting for tooling. “Once we’ve migrated to [new CI tool], we’ll build out the pipeline properly.” The tooling migration takes a year. Building the pipeline properly does not start when the tool arrives because by then a new prerequisite has emerged.

The telltale sign: the phrase “once we finish the rewrite” has appeared in planning conversations for more than a year, and the completion date has moved at least twice.

Why This Is a Problem

Deferral is a form of compounding debt. Each month the existing system continues to be deployed manually is a month of manual deployment effort that automation would have eliminated. Each month without automated testing is a month of defects that would have been caught earlier. The future improvement, when it arrives, must pay for itself against an accumulating baseline of foregone benefit.

It reduces quality

A user hits a bug in the existing system today. The fix is delayed because the team is focused on the rewrite. “We’ll get it right in the new system” is not comfort to the user affected now - or to the users who will be affected by the next bug from a codebase with no automated tests.

There is also a structural risk: the existing system continues to receive features. Features added to the “soon to be replaced” system are written without the quality discipline the team plans to apply to the new system. The technical debt accelerates because everyone knows the system is temporary. By the time the rewrite is complete - if it ever is - the existing system has accumulated years of change made under the assumption that quality does not matter because the system will be replaced.

It increases rework

The new system goes live. Within two weeks, the business discovers it does not handle a particular edge case that the old system handled silently for years. Nobody wrote it down. The team spends a sprint reverse-engineering and replicating behavior that a test suite on the old system would have documented automatically. This happens not once but repeatedly throughout the migration.

Deferring test automation also defers the discovery of architectural problems. In teams that write tests, untestable code is discovered immediately when trying to write the first test. In teams that defer testing to the new system, the architectural problems that make testing hard are discovered only during the rewrite - when they are significantly more expensive to address.

It makes delivery timelines unpredictable

The rewrite was scoped at six months. At month four, the team discovers the existing system has integrations nobody documented. The timeline moves to nine months. At month seven, scope increases because the business added new requirements. The horizon is always receding.

When the rewrite slips, the CD adoption it was supposed to unlock also slips. The team is delivering against two roadmaps: the existing system’s features (which the business needs now) and the new system’s construction (which nobody is willing to slow down). Both slip. The existing system’s delivery timeline remains painful. The new system’s delivery timeline is aspirational and usually wrong.

Impact on continuous delivery

CD is a set of practices that can be applied incrementally to existing systems. Waiting for a rewrite to start those practices means not benefiting from them for the duration of the rewrite and then having to build them fresh on the new system without the organizational experience of having used them on anything real.

Teams that introduce CD practices to existing systems - even painful, legacy systems - build the organizational muscle memory and tooling that transfers to the new system. Automated testing on the legacy system, however imperfect, is experience that informs how tests are written on the new system. Deployment automation for the legacy system is practice for deployment automation on the new system. Deferring CD defers not just the benefits but the organizational learning.

How to Fix It

Step 1: Identify what can improve now, without the rewrite

List the specific practices the team is deferring to the rewrite. For each one, identify the specific technical barrier: “We can’t add tests because class X has 12 dependencies that cannot be injected.” Then determine whether the barrier applies to all parts of the system or only some.

In most legacy systems, there are areas with lower coupling that can be tested today. There is a deployment process that can be automated even if the application architecture is not ideal. There is a build process that can be made faster. Not everything is blocked by the rewrite.

Step 2: Start the “strangler fig” for at least one CD practice (Weeks 2-4)

The strangler fig pattern - wrapping old behavior with new - applies to practices as well as architecture. Choose one CD practice and apply it to the new code being added to the existing system, even while the old code remains unchanged.

For example: all new classes written in the existing system are testable (properly isolated with injected dependencies). Old untestable classes are not rewritten, but no new untestable code is added. Over time, the testable fraction of the codebase grows. The rewrite is not a prerequisite for this improvement - a team agreement is.
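As a minimal sketch of that team agreement (all class and dependency names here are hypothetical), new code takes its collaborators through the constructor, so a test can substitute fakes without touching the legacy class at all:

```python
# Hypothetical example: old code reaches out to its dependencies directly,
# which makes it impossible to test without a live database and mail server.
# (ProductionDatabase and SmtpMailer are placeholders; this class is shown
# for contrast and never invoked here.)
class LegacyReportJob:
    def run(self):
        db = ProductionDatabase()        # hard-wired dependency
        mailer = SmtpMailer("mail.internal")
        mailer.send(db.fetch_report())

# New code under the team agreement: dependencies are injected.
class ReportJob:
    def __init__(self, db, mailer):
        self.db = db
        self.mailer = mailer

    def run(self):
        self.mailer.send(self.db.fetch_report())

# A test can now exercise the new class in isolation with fakes.
class FakeDb:
    def fetch_report(self):
        return "report"

class FakeMailer:
    def __init__(self):
        self.sent = []

    def send(self, msg):
        self.sent.append(msg)

mailer = FakeMailer()
ReportJob(FakeDb(), mailer).run()
assert mailer.sent == ["report"]
```

The old untestable class stays where it is; the agreement only governs what gets added from now on.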

Step 3: Automate the deployment of the existing system (Weeks 3-8)

Manual deployment of the existing system is a cost paid on every deployment. Deployment automation does not require a new architecture. Even a monolith with a complex deployment process can have that process codified in a pipeline script. The benefit is immediate. The organizational experience of running an automated deployment pipeline transfers directly to the new system when it is ready.
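One low-tech way to codify a manual runbook, sketched below with placeholder step names and commands (each command is echoed rather than executed, so this is a dry run), is a fail-fast script that runs every step in order and aborts on the first failure:

```python
# Hypothetical sketch: a manual deployment runbook codified as an ordered,
# fail-fast script. Step names and commands are placeholders; each command
# is echoed rather than executed, so the sketch is a dry run.
import subprocess

STEPS = [
    ("build artifact",  ["echo", "mvn -q package"]),
    ("run smoke tests", ["echo", "mvn -q verify"]),
    ("upload artifact", ["echo", "scp target/app.war deploy@host:/opt/app"]),
    ("restart service", ["echo", "ssh deploy@host systemctl restart app"]),
]

def deploy():
    completed = []
    for name, cmd in STEPS:
        # check=True aborts the script on the first failing step,
        # the same way a CI stage would stop the pipeline.
        subprocess.run(cmd, check=True)
        completed.append(name)
    return completed

if __name__ == "__main__":
    print(deploy())
```

Even this crude form removes the "did we remember step 3?" class of failure, and it migrates naturally into a CI pipeline definition later.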

Step 4: Set a “both systems healthy” standard for the rewrite (Weeks 4-8)

Reframing the rewrite as a migration rather than an escape hatch changes the team’s relationship to the existing system. The standard: both systems should be healthy. The existing system receives the same deployment pipeline investment as the new system. Tests are written for new features on the existing system. Operational monitoring is maintained on the existing system.

This creates two benefits. First, the existing system is better cared for. Second, the team stops treating the rewrite as the only path to quality improvement, which reduces the urgency that has been artificially attached to the rewrite timeline.

Step 5: Establish criteria for declaring the rewrite “done” (Ongoing)

Rewrites without completion criteria never end. Define explicitly what the rewrite achieves: what functionality must be migrated, what performance targets must be met, what CD practices must be operational. When those criteria are met, the rewrite is done. This prevents the horizon from receding indefinitely.

| Objection | Response |
| --- | --- |
| “The existing codebase is genuinely untestable - you cannot add tests to it” | Some code is very hard to test. But “very hard” is not “impossible.” Characterization testing, integration tests at the boundary, and applying the strangler fig to new additions are all available. Even imperfect test coverage on an existing system is better than none. |
| “We don’t want to invest in automation for code we’re about to throw away” | You are not about to throw it away - you have been about to throw it away for two years. The expected duration of the investment is the duration of the rewrite, which is already longer than estimated. A year of automated deployment benefit is real return. |
| “The new system will be built with CD from the start, so we’ll get the benefits there” | That is true, but it ignores that the existing system is what your users depend on today. Defects escaping from the existing system cost real money, regardless of how clean the new system’s practices will be. |
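Characterization testing, mentioned above, pins down what the legacy code currently does rather than what a spec says it should do. A minimal sketch, using a hypothetical pricing function as the stand-in for inherited logic:

```python
# Hypothetical sketch of a characterization test: record what the legacy
# code *currently* does and pin that behavior down, so a migration cannot
# change it silently. The function below is a placeholder for inherited
# logic nobody fully understands.
def legacy_price(quantity, customer_type):
    price = quantity * 10
    if customer_type == "partner":
        price *= 0.9
    if quantity > 100:
        price -= 50
    return price

# The expected values were captured by running the legacy code against
# representative inputs, not derived from any specification.
CHARACTERIZATION = {
    (1, "retail"): 10,
    (10, "partner"): 90.0,
    (200, "retail"): 1950,
    (200, "partner"): 1750.0,
}

for (qty, ctype), expected in CHARACTERIZATION.items():
    assert legacy_price(qty, ctype) == expected
```

The suite makes no claim that the recorded behavior is correct, only that it is current; that is exactly the safety net a rewrite needs, because it documents the edge cases the old system handled silently.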

Measuring Progress

| Metric | What to look for |
| --- | --- |
| Percentage of new code in existing system covered by automated tests | Should increase from the current baseline as new code is held to a higher standard |
| Release frequency | Should increase as deployment automation reduces the friction of deploying the existing system |
| Lead time | Should decrease for the existing system as manual steps are automated |
| Rewrite completion percentage vs. original estimate | Tracking this honestly surfaces how much the horizon has moved |
| Change fail rate | Should decrease for the existing system as test coverage increases |