Phase 4: Deliver on Demand

The capability to deploy any change to production at any time, using the delivery strategy that fits your context.

Key question: “Can we deliver any change to production when the business needs it?”

This is the destination: you can deploy any change that passes the pipeline to production whenever you choose. Some teams will auto-deploy every commit (continuous deployment). Others will deploy on demand when the business is ready. Both are valid - the capability is what matters, not the trigger.

What You’ll Do

  1. Deploy on demand - Remove the last manual gates so any green build can reach production
  2. Use progressive rollout - Canary, blue-green, and percentage-based deployments
  3. Explore ACD - AI-assisted continuous delivery patterns
  4. Learn from experience reports - How other teams made the journey

Continuous Delivery vs. Continuous Deployment

These terms are often confused. The distinction matters for this phase:

  • Continuous delivery means every commit that passes the pipeline could be deployed to production at any time. The capability exists. A human or business process decides when.
  • Continuous deployment means every commit that passes the pipeline is deployed to production automatically. No human decision is involved.

Continuous delivery is the goal of this migration guide. Continuous deployment is one delivery strategy that works well for certain contexts - SaaS products, internal tools, services behind feature flags. It is not a higher level of maturity. A team that deploys on demand with a one-click deploy is just as capable as a team that auto-deploys every commit.

Why This Phase Matters

When your foundations are solid, your pipeline is reliable, and your batch sizes are small, deploying any change becomes low-risk. The remaining barriers are organizational, not technical: approval processes, change windows, release coordination. This phase addresses those barriers so the team has the option to deploy whenever the business needs it.

Signs You’ve Arrived

  • Any commit that passes the pipeline can reach production within minutes
  • The team deploys frequently (daily or more) with no drama
  • Mean time to recovery is measured in minutes
  • The team has confidence that any deployment can be safely rolled back
  • New team members can deploy on their first day
  • The deployment strategy (on-demand or automatic) is a team choice, not a constraint

1 - Deploy on Demand

Remove the last manual gates and deploy every change that passes the pipeline.

Deploy on demand means that any change that passes the full automated pipeline can reach production without waiting for a human to press a button, open a ticket, or schedule a window. This page covers the prerequisites, the transition from continuous delivery to continuous deployment, and how to address the organizational concerns that are the real barriers.

Continuous Delivery vs. Continuous Deployment

These two terms are often confused. The distinction matters:

  • Continuous Delivery: Every commit that passes the pipeline could be deployed to production. A human decides when to deploy.
  • Continuous Deployment: Every commit that passes the pipeline is deployed to production. No human decision is required.

If you have completed Phases 1-3 of this migration, you have continuous delivery. This page is about removing that last manual decision and moving to continuous deployment.

Why Remove the Last Gate?

The manual deployment decision feels safe. It gives someone a chance to “eyeball” the change before it goes to production. In practice, it does the opposite.

The Problems with Manual Gates

| Problem | Why It Happens | Impact |
|---|---|---|
| Batching | If deploys are manual, teams batch changes to reduce the number of deploy events | Larger batches increase risk and make rollback harder |
| Delay | Changes wait for someone to approve, which may take hours or days | Longer lead time, delayed feedback |
| False confidence | The approver cannot meaningfully review what the automated pipeline already tested | The gate provides the illusion of safety without actual safety |
| Bottleneck | One person or team becomes the deploy gatekeeper | Creates a single point of failure for the entire delivery flow |
| Deploy fear | Infrequent deploys mean each deploy is higher stakes | Teams become more cautious, batches get larger, risk increases |

The Paradox of Manual Safety

The more you rely on manual deployment gates, the less safe your deployments become. This is because manual gates lead to batching, batching increases risk, and increased risk justifies more manual gates. It is a vicious cycle.

Continuous deployment breaks this cycle. Small, frequent, automated deployments are individually low-risk. If one fails, the blast radius is small and recovery is fast.

Prerequisites for Deploy on Demand

Before removing manual gates, verify that these conditions are met. Each one is covered in earlier phases of this migration.

Non-Negotiable Prerequisites

| Prerequisite | What It Means | Where to Build It |
|---|---|---|
| Comprehensive automated tests | The test suite catches real defects, not just trivial cases | Testing Fundamentals |
| Fast, reliable pipeline | The pipeline completes in under 15 minutes and rarely fails for non-code reasons | Deterministic Pipeline |
| Automated rollback | You can roll back a bad deployment in minutes without manual intervention | Rollback |
| Feature flags | Incomplete features are hidden from users via flags, not deployment timing | Feature Flags |
| Small batch sizes | Each deployment contains 1-3 small changes, not dozens | Small Batches |
| Production-like environments | Test environments match production closely enough that test results are trustworthy | Production-Like Environments |
| Observability | You can detect production issues within minutes through monitoring and alerting | Metrics-Driven Improvement |

Assessment: Are You Ready?

Answer these questions honestly:

  1. When was the last time your pipeline caught a real bug? If the answer is “I don’t remember,” your test suite may not be trustworthy enough.
  2. How long does a rollback take? If the answer is more than 15 minutes, automate it first.
  3. Do deploys ever fail for non-code reasons? (Environment issues, credential problems, network flakiness.) If yes, stabilize your pipeline first.
  4. Does the team trust the pipeline? If team members regularly say “let me check one more thing before we deploy,” trust is not there yet. Build it through retrospectives and transparent metrics.

The Transition: Three Approaches

Approach 1: Shadow Mode

Run continuous deployment alongside manual deployment. Every change that passes the pipeline is automatically deployed to a shadow production environment (or a canary group). A human still approves the “real” production deployment.

Duration: 2-4 weeks.

What you learn: How often the automated deployment would have been correct. If the answer is “every time” (or close to it), the manual gate is not adding value.

Transition: Once the team sees that the shadow deployments are consistently safe, remove the manual gate.

Approach 2: Opt-In per Team

Allow individual teams to adopt continuous deployment while others continue with manual gates. This works well in organizations with multiple teams at different maturity levels.

Duration: Ongoing. Teams opt in when they are ready.

What you learn: Which teams are ready and which need more foundation work. Early adopters demonstrate the pattern for the rest of the organization.

Transition: As more teams succeed, continuous deployment becomes the default. Remaining teams are supported in reaching readiness.

Approach 3: Direct Switchover

Remove the manual gate for all teams at once. This is appropriate when the organization has high confidence in its pipeline and all teams have completed Phases 1-3.

Duration: Immediate.

What you learn: Quickly reveals any hidden dependencies on the manual gate (e.g., deploy coordination between teams, configuration changes that ride along with deployments).

Transition: Be prepared to temporarily revert if unforeseen issues arise. Have a clear rollback plan for the process change itself.

Addressing Organizational Concerns

The technical prerequisites are usually met before the organizational ones. These are the conversations you will need to have.

“What about change management / ITIL?”

Change management frameworks like ITIL define a “standard change” category: a pre-approved, low-risk, well-understood change that does not require a Change Advisory Board (CAB) review. Continuous deployment changes qualify as standard changes because they are:

  • Small (one to a few commits)
  • Automated (same pipeline every time)
  • Reversible (automated rollback)
  • Well-tested (comprehensive automated tests)

Work with your change management team to classify pipeline-passing deployments as standard changes. This preserves the governance framework while removing the bottleneck.

“What about compliance and audit?”

Continuous deployment does not eliminate audit trails - it strengthens them. Every deployment is:

  • Traceable: Tied to a specific commit, which is tied to a specific story or ticket
  • Reproducible: The same pipeline produces the same result every time
  • Recorded: Pipeline logs capture every test that passed, every approval that was automated
  • Reversible: Rollback history shows when and why a deployment was reverted

Provide auditors with access to pipeline logs, deployment history, and the automated test suite. This is a more complete audit trail than a manual approval signature.

“What about database migrations?”

Database migrations require special care in continuous deployment because they cannot be rolled back as easily as code changes.

Rules for database migrations in CD:

  1. Migrations must be backward-compatible. The previous version of the code must work with the new schema.
  2. Use expand/contract pattern. First deploy the new column/table (expand). Then deploy the code that uses it. Then remove the old column/table (contract). Each step is a separate deployment.
  3. Never drop a column in the same deployment that stops using it. There is always a window where both old and new code run simultaneously.
  4. Test migrations in production-like environments before they reach production.
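The expand/contract rule can be made concrete as a sequence of separate deployments. A minimal Python sketch, where the table and column names (`users`, `email_verified`, `legacy_email_flag`) are hypothetical examples standing in for your own schema:

```python
def expand_contract_plan(table, new_column_ddl, old_column):
    """Return the three separate deployments of the expand/contract pattern.

    Each tuple is its own deployment; the DROP never ships in the same
    deployment as the code change that stops using the old column.
    """
    return [
        ("expand",   [f"ALTER TABLE {table} ADD COLUMN {new_column_ddl}"]),
        ("code",     []),  # ship code that reads/writes the new column; no schema change
        ("contract", [f"ALTER TABLE {table} DROP COLUMN {old_column}"]),
    ]

plan = expand_contract_plan(
    "users", "email_verified BOOLEAN DEFAULT FALSE", "legacy_email_flag"
)
for step, statements in plan:
    print(step, statements)
```

Between any two steps, the previous version of the code keeps working against the current schema, which is what makes each deployment individually safe to roll back.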

“What if we deploy a breaking change?”

This is why you have automated rollback and observability. The sequence is:

  1. Deployment happens automatically
  2. Monitoring detects an issue (error rate spike, latency increase, health check failure)
  3. Automated rollback triggers (or on-call engineer triggers manual rollback)
  4. The team investigates and fixes the issue
  5. The fix goes through the pipeline and deploys automatically

The key insight: this sequence takes minutes with continuous deployment. With manual deployment on a weekly schedule, the same breaking change would take days to detect and fix.

After the Transition

What Changes for the Team

| Before | After |
|---|---|
| “Are we deploying today?” | Deploys happen automatically, all the time |
| “Who’s doing the deploy?” | Nobody - the pipeline does it |
| “Can I get this into the next release?” | Every merge to trunk is the next release |
| “We need to coordinate the deploy with team X” | Teams deploy independently |
| “Let’s wait for the deploy window” | There are no deploy windows |

What Stays the Same

  • Code review still happens (before merge to trunk)
  • Automated tests still run (in the pipeline)
  • Feature flags still control feature visibility (decoupling deploy from release)
  • Monitoring still catches issues (but now recovery is faster)
  • The team still owns its deployments (but the manual step is gone)

The First Week

The first week of continuous deployment will feel uncomfortable. This is normal. The team will instinctively want to “check” deployments that happen automatically. Resist the urge to add manual checks back. Instead:

  • Watch the monitoring dashboards more closely than usual
  • Have the team discuss each automatic deployment in standup for the first week
  • Celebrate the first deployment that goes out without anyone noticing - that is the goal

Key Pitfalls

1. “We adopted continuous deployment but kept the approval step ‘just in case’”

If the approval step exists, it will be used, and you have not actually adopted continuous deployment. Remove the gate completely. If something goes wrong, use rollback - do not use a pre-deployment gate.

2. “Our deploy cadence didn’t actually increase”

Continuous deployment only increases deploy frequency if the team is integrating to trunk frequently. If the team still merges weekly, they will deploy weekly - automatically, but still weekly. Revisit Trunk-Based Development and Small Batches.

3. “We have continuous deployment for the application but not the database/infrastructure”

Partial continuous deployment creates a split experience: application changes flow freely but infrastructure changes still require manual coordination. Extend the pipeline to cover infrastructure as code, database migrations, and configuration changes.

Measuring Success

| Metric | Target | Why It Matters |
|---|---|---|
| Deployment frequency | Multiple per day | Confirms the pipeline is deploying every change |
| Lead time | < 1 hour from commit to production | Confirms no manual gates are adding delay |
| Manual interventions per deploy | Zero | Confirms the process is fully automated |
| Change failure rate | Stable or improving | Confirms automation is not introducing new failures |
| MTTR | < 15 minutes | Confirms automated rollback is working |

Next Step

Continuous deployment deploys every change, but not every change needs to go to every user at once. Progressive Rollout strategies let you control who sees a change and how quickly it spreads.


  • Infrequent Releases - the primary symptom that deploy on demand resolves
  • Merge Freeze - a symptom caused by manual deployment gates that disappears with continuous deployment
  • Fear of Deploying - a cultural symptom that fades as automated deployments become routine
  • CAB Gates - an organizational anti-pattern that this guide addresses through standard change classification
  • Manual Deployments - the pipeline anti-pattern that deploy on demand eliminates
  • Deployment Frequency - the key metric for measuring deploy-on-demand adoption

2 - Progressive Rollout

Use canary, blue-green, and percentage-based deployments to reduce deployment risk.

Progressive rollout strategies let you deploy to production without deploying to all users simultaneously. By exposing changes to a small group first and expanding gradually, you catch problems before they affect your entire user base. This page covers the three major strategies, when to use each, and how to implement automated rollback.

Why Progressive Rollout?

Even with comprehensive tests, production-like environments, and small batch sizes, some issues only surface under real production traffic. Progressive rollout is the final safety layer: it limits the blast radius of any deployment by exposing the change to a small audience first.

This is not a replacement for testing. It is an addition. Your automated tests should catch the vast majority of issues. Progressive rollout catches the rest - the issues that depend on real user behavior, real data volumes, or real infrastructure conditions that cannot be fully replicated in test environments.

The Three Strategies

Strategy 1: Canary Deployment

A canary deployment routes a small percentage of production traffic to the new version while the majority continues to hit the old version. If the canary shows no problems, traffic is gradually shifted.

Canary deployment traffic split diagram:

                        ┌─────────────────┐
                   5%   │  New Version    │  ← Canary
                ┌──────►│  (v2)           │
                │       └─────────────────┘
  Traffic ──────┤
                │       ┌─────────────────┐
                └──────►│  Old Version    │  ← Stable
                  95%   │  (v1)           │
                        └─────────────────┘

How it works:

  1. Deploy the new version alongside the old version
  2. Route 1-5% of traffic to the new version
  3. Compare key metrics (error rate, latency, business metrics) between canary and stable
  4. If metrics are healthy, increase traffic to 25%, 50%, 100%
  5. If metrics degrade, route all traffic back to the old version

When to use canary:

  • Changes that affect request handling (API changes, performance optimizations)
  • Changes where you want to compare metrics between old and new versions
  • Services with high traffic volume (you need enough canary traffic for statistical significance)

When canary is not ideal:

  • Changes that affect batch processing or background jobs (no “traffic” to route)
  • Very low traffic services (the canary may not get enough traffic to detect issues)
  • Database schema changes (both versions must work with the same schema)

Implementation options:

| Infrastructure | How to Route Traffic |
|---|---|
| Kubernetes + service mesh (Istio, Linkerd) | Weighted routing rules in VirtualService |
| Load balancer (ALB, NGINX) | Weighted target groups |
| CDN (CloudFront, Fastly) | Origin routing rules |
| Application-level | Feature flag with percentage rollout |

Strategy 2: Blue-Green Deployment

Blue-green deployment maintains two identical production environments. At any time, one (blue) serves live traffic and the other (green) is idle or staging.

Blue-green deployment traffic switch diagram:

  BEFORE:
    Traffic ──────► [Blue - v1]  (ACTIVE)
                    [Green]      (IDLE)

  DEPLOY:
    Traffic ──────► [Blue - v1]  (ACTIVE)
                    [Green - v2] (DEPLOYING / SMOKE TESTING)

  SWITCH:
    Traffic ──────► [Green - v2] (ACTIVE)
                    [Blue - v1]  (STANDBY / ROLLBACK TARGET)

How it works:

  1. Deploy the new version to the idle environment (green)
  2. Run smoke tests against green to verify basic functionality
  3. Switch the router/load balancer to point all traffic at green
  4. Keep blue running as an instant rollback target
  5. After a stability period, repurpose blue for the next deployment
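The switch mechanics can be modeled as a tiny state machine. A toy sketch, assuming the router is an in-process object; a real implementation would update a load balancer, DNS record, or service mesh rule instead:

```python
class BlueGreenRouter:
    """Toy model of blue-green switching: 'active' serves traffic,
    the other environment is the staging/rollback target."""

    def __init__(self, initial_version):
        self.envs = {"blue": initial_version, "green": None}
        self.active = "blue"

    @property
    def idle(self):
        # whichever environment is not currently serving traffic
        return "green" if self.active == "blue" else "blue"

    def deploy_to_idle(self, version):
        self.envs[self.idle] = version   # step 1: deploy to the idle env

    def switch(self):
        self.active = self.idle          # step 3: instant cutover

    def serving(self):
        return self.envs[self.active]

router = BlueGreenRouter("v1")
router.deploy_to_idle("v2")   # green runs v2; blue still serves traffic
router.switch()               # all traffic moves to green (v2)
router.switch()               # rollback is just switching back to blue (v1)
```

The point the model makes: both deploy and rollback are a single pointer flip, which is why blue-green rollback takes seconds.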

When to use blue-green:

  • You need instant, complete rollback (switch the router back)
  • You want to test the deployment in a full production environment before routing traffic
  • Your infrastructure supports running two parallel environments cost-effectively

When blue-green is not ideal:

  • Stateful applications where both environments share mutable state
  • Database migrations (the new version’s schema must work for both environments during transition)
  • Cost-sensitive environments (maintaining two full production environments doubles infrastructure cost)

Rollback speed: Seconds. Switching the router back is the fastest rollback mechanism available.

Strategy 3: Percentage-Based Rollout

Percentage-based rollout gradually increases the number of users who see the new version. Unlike canary (which is traffic-based), percentage rollout is typically user-based - a specific user always sees the same version during the rollout period.

Percentage-based rollout schedule:

  Hour 0:    1% of users → v2,  99% → v1
  Hour 2:    5% of users → v2,  95% → v1
  Hour 8:   25% of users → v2,  75% → v1
  Day 2:    50% of users → v2,  50% → v1
  Day 3:   100% of users → v2

How it works:

  1. Enable the new version for a small percentage of users (using feature flags or infrastructure routing)
  2. Monitor metrics for the affected group
  3. Gradually increase the percentage over hours or days
  4. At any point, reduce the percentage back to 0% if issues are detected

When to use percentage rollout:

  • User-facing feature changes where you want consistent user experience (a user always sees v1 or v2, not a random mix)
  • Changes that benefit from A/B testing data (compare user behavior between groups)
  • Long-running rollouts where you want to collect business metrics before full exposure

When percentage rollout is not ideal:

  • Backend infrastructure changes with no user-visible impact
  • Changes that affect all users equally (e.g., API response format changes)

Implementation: Percentage rollout is typically implemented through Feature Flags (Level 2 or Level 3), using the user ID as the hash key to ensure consistent assignment.
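A minimal sketch of hash-based assignment, assuming SHA-256 bucketing; the function name `in_rollout` and the modulo scheme are illustrative, not any particular flag system's API:

```python
import hashlib

def in_rollout(user_id, flag_name, percentage):
    """Deterministically assign a user to a rollout percentage.

    The same user always lands in the same bucket for a given flag, so
    their experience stays consistent as the percentage grows. Hashing
    flag_name together with user_id keeps each flag's split independent.
    """
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100   # stable bucket in [0, 100)
    return bucket < percentage

# A user's assignment never flips as long as the percentage only increases.
assert in_rollout("user-42", "new-checkout", 100)
assert not in_rollout("user-42", "new-checkout", 0)
```

Because buckets are stable, raising the percentage from 5 to 25 only adds users to v2; nobody who already saw v2 is moved back to v1.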

Choosing the Right Strategy

| Factor | Canary | Blue-Green | Percentage |
|---|---|---|---|
| Rollback speed | Seconds (reroute traffic) | Seconds (switch environments) | Seconds (disable flag) |
| Infrastructure cost | Low (runs alongside existing) | High (two full environments) | Low (same infrastructure) |
| Metric comparison | Strong (side-by-side comparison) | Weak (before/after only) | Strong (group comparison) |
| User consistency | No (each request may hit different version) | Yes (all users see same version) | Yes (each user sees consistent version) |
| Complexity | Moderate | Moderate | Low (if you have feature flags) |
| Best for | API changes, performance changes | Full environment validation | User-facing features |

Many teams use more than one strategy. A common pattern:

  • Blue-green for infrastructure and platform changes
  • Canary for service-level changes
  • Percentage rollout for user-facing feature changes

Automated Rollback

Progressive rollout is only effective if rollback is automated. A human noticing a problem at 3 AM is not a reliable rollback mechanism.

Metrics to Monitor

Define automated rollback triggers before deploying. Common triggers:

| Metric | Trigger Condition | Example |
|---|---|---|
| Error rate | Canary error rate > 2x stable error rate | Stable: 0.1%, Canary: 0.3% -> rollback |
| Latency (p99) | Canary p99 > 1.5x stable p99 | Stable: 200ms, Canary: 400ms -> rollback |
| Health check | Any health check failure | HTTP 500 on /health -> rollback |
| Business metric | Conversion rate drops > 5% for canary group | 10% conversion -> 4% conversion -> rollback |
| Saturation | CPU or memory exceeds threshold | CPU > 90% for 5 minutes -> rollback |
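Triggers like these can be encoded as a plain comparison function. A sketch with illustrative metric names; the thresholds mirror the examples above, but tune them per service:

```python
def should_rollback(canary, stable):
    """Return the first tripped trigger, or None if the canary looks healthy.

    Metric dicts and threshold values here are illustrative examples,
    not a real monitoring system's schema.
    """
    if canary["error_rate"] > 2 * stable["error_rate"]:
        return "error rate > 2x stable"
    if canary["p99_ms"] > 1.5 * stable["p99_ms"]:
        return "p99 latency > 1.5x stable"
    if not canary["health_ok"]:
        return "health check failure"
    if stable["conversion"] > 0 and \
       (stable["conversion"] - canary["conversion"]) / stable["conversion"] > 0.05:
        return "conversion dropped > 5%"
    return None

stable = {"error_rate": 0.001, "p99_ms": 200, "health_ok": True, "conversion": 0.10}
canary = {"error_rate": 0.003, "p99_ms": 210, "health_ok": True, "conversion": 0.10}
print(should_rollback(canary, stable))  # the error-rate trigger trips
```

The important property is that the function is deterministic and defined before the deploy starts; the rollout controller only asks it yes/no questions.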

Automated Rollback Flow

Automated rollback flow diagram:

Deploy new version
       │
       ▼
Route 5% of traffic to new version
       │
       ▼
Monitor for 15 minutes
       │
       ├── Metrics healthy ──────► Increase to 25%
       │                                │
       │                                ▼
       │                          Monitor for 30 minutes
       │                                │
       │                                ├── Metrics healthy ──────► Increase to 100%
       │                                │
       │                                └── Metrics degraded ─────► ROLLBACK
       │
       └── Metrics degraded ─────► ROLLBACK
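A custom-script version of this flow might look roughly like the following, where `get_metrics`, `set_traffic`, and `healthy` are placeholders for your own metrics query, traffic-routing call, and health predicate:

```python
import time

def run_rollout(get_metrics, set_traffic, healthy,
                stages=((5, 900), (25, 1800), (100, 0)), poll_seconds=30):
    """Walk through (traffic %, monitoring window in seconds) stages,
    checking metrics at each poll; any unhealthy reading routes all
    traffic back to the stable version."""
    for percent, window in stages:
        set_traffic(percent)
        deadline = time.monotonic() + window
        while True:
            if not healthy(get_metrics()):
                set_traffic(0)      # ROLLBACK: everything back to stable
                return "rolled back"
            if time.monotonic() >= deadline:
                break               # window passed cleanly; advance
            time.sleep(poll_seconds)
    return "fully rolled out"
```

The stage list encodes the 5% → 25% → 100% progression from the diagram; shortening the windows or adding stages changes the rollout shape without touching the control logic.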

Implementation Tools

| Tool | How It Helps |
|---|---|
| Argo Rollouts | Kubernetes-native progressive delivery with automated analysis and rollback |
| Flagger | Progressive delivery operator for Kubernetes with Istio, Linkerd, or App Mesh |
| Spinnaker | Multi-cloud deployment platform with canary analysis |
| Custom scripts | Query your metrics system, compare thresholds, trigger rollback via API |

The specific tool matters less than the principle: define rollback criteria before deploying, monitor automatically, and roll back without human intervention.

Implementing Progressive Rollout

Step 1: Choose Your First Strategy

Pick the strategy that matches your infrastructure:

  • If you already have feature flags: start with percentage-based rollout
  • If you have Kubernetes with a service mesh: start with canary
  • If you have parallel environments: start with blue-green

Step 2: Define Rollback Criteria

Before your first progressive deployment:

  1. Identify the 3-5 metrics that define “healthy” for your service
  2. Define numerical thresholds for each metric
  3. Define the monitoring window (how long to wait before advancing)
  4. Document the rollback procedure (even if automated, document it for human understanding)

Step 3: Run a Manual Progressive Rollout

Before automating, run the process manually:

  1. Deploy to a canary or small percentage
  2. A team member monitors the dashboard for the defined window
  3. The team member decides to advance or roll back
  4. Document what they checked and how they decided

This manual practice builds understanding of what the automation will do.

Step 4: Automate the Rollout

Replace the manual monitoring with automated checks:

  1. Implement metric queries that check your rollback criteria
  2. Implement automated traffic shifting (advance or rollback based on metrics)
  3. Implement alerting so the team knows when a rollback occurs
  4. Test the automation by intentionally deploying a known-bad change (in a controlled way)

Key Pitfalls

1. “Our canary doesn’t get enough traffic for meaningful metrics”

If your service handles 100 requests per hour, a 5% canary gets 5 requests per hour - not enough to detect problems statistically. Solutions: use a higher canary percentage (25-50%), use longer monitoring windows, or use blue-green instead (which does not require traffic splitting).

2. “We have progressive rollout but rollback is still manual”

Progressive rollout without automated rollback is half a solution. If the canary shows problems at 2 AM and nobody is watching, the damage occurs before anyone responds. Automated rollback is the essential companion to progressive rollout.

3. “We treat progressive rollout as a replacement for testing”

Progressive rollout is the last line of defense, not the first. If you are regularly catching bugs in canary that your test suite should have caught, your test suite needs improvement. Progressive rollout should catch rare, production-specific issues - not common bugs.

4. “Our rollout takes days because we’re too cautious”

A rollout that takes a week negates the benefits of continuous deployment. If your confidence in the pipeline is low enough to require a week-long rollout, the issue is pipeline quality, not rollout speed. Address the root cause through better testing and more production-like environments.

Measuring Success

| Metric | Target | Why It Matters |
|---|---|---|
| Automated rollbacks per month | Low and stable | Confirms the pipeline catches most issues before production |
| Time from deploy to full rollout | Hours, not days | Confirms the team has confidence in the process |
| Incidents caught by progressive rollout | Tracked (any number) | Confirms the progressive rollout is providing value |
| Manual interventions during rollout | Zero | Confirms the process is fully automated |

Next Step

With deploy on demand and progressive rollout, your technical deployment infrastructure is complete. ACD explores how AI-assisted patterns can extend these practices further.


3 - Experience Reports

Real-world stories from teams that have made the journey to continuous deployment.

Theory is necessary but insufficient. This page collects experience reports from organizations that have adopted continuous deployment at scale, including the challenges they faced, the approaches they took, and the results they achieved. These reports demonstrate that CD is not limited to startups or greenfield projects - it works in large, complex, regulated environments.

Why Experience Reports Matter

Every team considering continuous deployment faces the same objection: “That works for [Google / Netflix / small startups], but our situation is different.” Experience reports counter this objection with evidence. They show that organizations of every size, in every industry, with every kind of legacy system, have found a path to continuous deployment.

No experience report will match your situation exactly. That is not the point. The point is to extract patterns: what obstacles did these teams encounter, and how did they overcome them?

Walmart: CD at Retail Scale

Context

Walmart operates one of the world’s largest e-commerce platforms alongside its massive physical retail infrastructure. Changes to the platform affect millions of transactions per day. The organization had a traditional release process with weekly deployment windows and multi-stage manual approval.

The Challenge

  • Scale: Thousands of developers across hundreds of teams
  • Risk tolerance: Any outage affects revenue in real time
  • Legacy: Decades of existing systems with deep interdependencies
  • Regulation: PCI compliance requirements for payment processing

What They Did

  • Invested in a centralized deployment platform (OneOps, later Concord) that standardized the deployment pipeline across all teams
  • Broke the monolithic release into independent service deployments
  • Implemented automated canary analysis for every deployment
  • Moved from weekly release trains to on-demand deployment per team

Key Lessons

  1. Platform investment pays off. Building a shared deployment platform let hundreds of teams adopt CD without each team solving the same infrastructure problems.
  2. Compliance and CD are compatible. Automated pipelines with full audit trails satisfied PCI requirements more reliably than manual approval processes.
  3. Cultural change is harder than technical change. Teams that had operated on weekly release cycles for years needed coaching and support to trust automated deployment.

Microsoft: From Waterfall to Daily Deploys

Context

Microsoft’s Azure DevOps (formerly Visual Studio Team Services) team made a widely documented transformation from 3-year waterfall releases to deploying multiple times per day. This transformation happened within one of the largest software organizations in the world.

The Challenge

  • History: Decades of waterfall development culture
  • Product complexity: A platform used by millions of developers
  • Organizational size: Thousands of engineers across multiple time zones
  • Customer expectations: Enterprise customers expected stability and predictability

What They Did

  • Broke the product into independently deployable services (ring-based deployment)
  • Implemented a ring-based rollout: Ring 0 (team), Ring 1 (internal Microsoft users), Ring 2 (select external users), Ring 3 (all users)
  • Invested heavily in automated testing, achieving thousands of tests running in minutes
  • Moved from a fixed release cadence to continuous deployment with feature flags controlling release
  • Used telemetry to detect issues in real time and triggered automated rollback when metrics degraded

Key Lessons

  1. Ring-based deployment is progressive rollout. Microsoft’s ring model is an implementation of the progressive rollout strategies described in this guide.
  2. Feature flags enabled decoupling. By deploying frequently but releasing features incrementally via flags, the team could deploy without worrying about feature completeness.
  3. The transformation took years, not months. Moving from 3-year cycles to daily deployment was a multi-year journey with incremental progress at each step.

Google: Engineering Productivity at Scale

Context

Google is often cited as the canonical example of continuous deployment, deploying changes to production thousands of times per day across its vast service portfolio.

The Challenge

  • Scale: Billions of users, millions of servers
  • Monorepo: Most of Google operates from a single repository with billions of lines of code
  • Interdependencies: Changes in shared libraries can affect thousands of services
  • Velocity: Thousands of engineers committing changes every day

What They Did

  • Built a culture of automated testing where tests are a first-class deliverable, not an afterthought
  • Implemented a submit queue that runs automated tests on every change before it merges to the trunk
  • Invested in build infrastructure (Blaze/Bazel) that can build and test only the affected portions of the codebase
  • Used percentage-based rollout for user-facing changes
  • Made rollback a one-click operation available to every team

Key Lessons

  1. Test infrastructure is critical infrastructure. Google’s ability to deploy frequently depends entirely on its ability to test quickly and reliably.
  2. Monorepo and CD are compatible. The common assumption that CD requires microservices with separate repos is false. Google deploys from a monorepo.
  3. Invest in tooling before process. Google built the tooling (build systems, test infrastructure, deployment automation) that made good practices the path of least resistance.

Amazon: Two-Pizza Teams and Ownership

Context

Amazon’s transformation to service-oriented architecture and team ownership is one of the most influential in the industry. The “two-pizza team” model and “you build it, you run it” philosophy directly enabled continuous deployment.

The Challenge

  • Organizational size: Hundreds of thousands of employees
  • System complexity: Thousands of services powering amazon.com and AWS
  • Availability requirements: Even brief outages are front-page news
  • Pace of innovation: Competitive pressure demands rapid feature delivery

What They Did

  • Decomposed the system into independently deployable services, each owned by a small team
  • Gave teams full ownership: build, test, deploy, operate, and support
  • Built internal deployment tooling (Apollo) that automates canary analysis, rollback, and one-click deployment
  • Established the practice of deploying every commit that passes the pipeline, with automated rollback on metric degradation
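The deploy-then-auto-rollback loop described above might look like this in outline. Function names and the error-rate threshold are illustrative assumptions, not Apollo's actual interface:

```python
def should_roll_back(baseline_error_rate: float,
                     canary_error_rate: float,
                     tolerance: float = 0.01) -> bool:
    """Flag a deployment for rollback when the new version's error
    rate degrades beyond the tolerated margin over the stable fleet."""
    return canary_error_rate > baseline_error_rate + tolerance

def deploy_commit(deploy, rollback, read_error_rates) -> str:
    """Deploy, observe metrics, and roll back automatically on
    degradation - no human in the loop."""
    deploy()
    baseline, canary = read_error_rates()
    if should_roll_back(baseline, canary):
        rollback()
        return "rolled back"
    return "promoted"
```

The design point is that the comparison runs on every deployment, so rollback is a routine automated outcome rather than an incident response.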

Key Lessons

  1. Ownership drives quality. When the team that writes the code also operates it in production, they write better code and build better monitoring.
  2. Small teams move faster. Two-pizza teams (6-10 people) can make decisions without bureaucratic overhead.
  3. Automation eliminates toil. Amazon’s internal deployment tooling means deploying requires no special expertise - any team member can deploy (and the pipeline usually deploys automatically).

HP: CD in Hardware-Adjacent Software

Context

HP’s LaserJet firmware team demonstrated that continuous delivery principles apply even to embedded software, a domain often considered incompatible with frequent deployment.

The Challenge

  • Embedded software: Firmware that runs on physical printers
  • Long development cycles: Firmware releases had traditionally been annual
  • Quality requirements: Firmware bugs require physical recalls or complex update procedures
  • Team size: Large, distributed teams with varying skill levels

What They Did

  • Invested in automated testing infrastructure for firmware
  • Reduced build times from days to under an hour
  • Moved from annual releases to frequent incremental updates
  • Implemented continuous integration with automated test suites running on simulator and hardware

Key Lessons

  1. CD principles are universal. Even embedded firmware can benefit from small batches, automated testing, and continuous integration.
  2. Build time is a critical constraint. Reducing build time from days to under an hour unlocked the ability to test frequently, which enabled frequent integration, which enabled frequent delivery.
  3. Results were dramatic: Development costs reduced by approximately 40%, programs delivered on schedule increased by roughly 140%.

Flickr: “10+ Deploys Per Day”

Context

Flickr’s 2009 presentation “10+ Deploys Per Day: Dev and Ops Cooperation” is credited with helping launch the DevOps movement. At a time when most organizations deployed quarterly, Flickr was deploying more than ten times per day.

The Challenge

  • Web-scale service: Serving billions of photos to millions of users
  • Ops/Dev divide: Traditional separation between development and operations teams
  • Fear of change: Deployments were infrequent because they were risky

What They Did

  • Built automated infrastructure provisioning and deployment
  • Implemented feature flags to decouple deployment from release
  • Created a culture of shared responsibility between development and operations
  • Made deployment a routine, low-ceremony event that anyone could trigger
  • Used IRC bots (and later chat-based tools) to coordinate and log deployments
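Decoupling deployment from release with a flag can be as simple as a guarded code path: the new code ships dark, and a configuration change - not a deploy - releases it. A minimal sketch with hypothetical page names; real flag stores live in a config service:

```python
# In-process flag store for illustration; production systems read
# flags from a config service so they flip without redeploying.
FLAGS = {"new-photo-page": False}

def old_photo_page(user: str) -> str:
    return f"classic page for {user}"

def new_photo_page(user: str) -> str:
    return f"redesigned page for {user}"

def render_photo_page(user: str) -> str:
    # Both code paths are deployed; the flag decides which is *released*.
    if FLAGS.get("new-photo-page", False):
        return new_photo_page(user)
    return old_photo_page(user)
```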

Key Lessons

  1. Culture is the enabler. Flickr’s technical practices were important, but the cultural shift - developers and operations working together, shared responsibility, mutual respect - was what made frequent deployment possible.
  2. Tooling should reduce friction. Flickr’s deployment tools were designed to make deploying as easy as possible. The easier it is to deploy, the more often people deploy, and the smaller each deployment becomes.
  3. Transparency builds trust. Logging every deployment in a shared channel let everyone see what was deploying, who deployed it, and whether it caused problems. This transparency built organizational trust in frequent deployment.

VXS: “CD: Superhuman Efforts are the New Normal”

Context

VXS Decision is a startup like thousands of others: founder-led vision, underfunded, short on time and resources. Targeting enterprise customers raised the question: how do you deliver reliable, enterprise-grade software without the resources of an enterprise? Answering it led to the discovery of the framework of principles and patterns now formulated as “Agentic CD.”

The Challenge

  • Produce demoware or build for real use?
  • Fast output leads to structural inconsistency
  • Architectural drift
  • How and what to document?
  • Keeping the codebase maintainable

What They Did

  • Experimented with LLMs for code generation
  • Applied rigorous CD practices to the work with AI agents
  • Mandated additional first-class artifacts in the repo
  • Standardized the approach of working with AI agents
  • Compressed Agentic CD pipeline cycles to deliver entire features in hours

Key Lessons

  1. Agents drift. Documentation layered on top of the codebase contains inconsistency and duplication.
  2. You need to extend your definition of ‘deliverable’. Code must not merely exist and pass the tests; it must also be consistent with the documented architecture and descriptions.
  3. First-class artifacts are the true product. These include intent, behaviour, design, and decisions. With these, an LLM can reconstruct the product even without having access to the code itself.
  4. You need a third folder in your repo. Where formerly /src and /test did all the work, the /docs folder becomes your lifeline.

Agentic CD Additions

Additional practices required for LLM-assisted development:

  1. Intent-first workflow. Anchor the implementation with a proper intent statement: what, why, for whom.
  2. Delta & overlap analysis. Agents can compare new features against the existing system, detect redundancy, conflict, structural drift. The most interesting question becomes: “How does this relate to what we currently do?”
  3. Structured documentation layers. User guides, feature descriptions, architectural decision records (ADRs) and system structure documentation become the glue of your system.
  4. Human in the loop. Key artifacts can be generated by agents, but HITL is necessary to catch drift. Intent and decisions are human territory; behaviour and design must be actively guided by humans.
  5. The docs are for the machine, not for humans. Documentation artifacts must be structured to guide Agents in implementation with minimal context windows, not to “read nicely” for humans.
    • ASCII art beats photos, illustrations or doodles.
    • Short paragraphs, no filler words. Consistent language.
    • Optimize documentation so agents can be pointed to the right paragraphs quickly and effectively.
    • Cross-reference documents to reduce Agentic search efforts.
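The intent statement from step 1 can be kept as a small structured artifact the agent consumes before any code is written. The `Intent` type and its fields below are illustrative assumptions, not part of the VXS framework:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Intent:
    """Anchor artifact for a feature: what, why, and for whom."""
    what: str
    why: str
    for_whom: str

    def as_doc(self) -> str:
        # Short, filler-free, consistent wording: optimized for an
        # agent's context window, not for reading pleasure.
        return (f"INTENT\nWhat: {self.what}\n"
                f"Why: {self.why}\nFor: {self.for_whom}\n")
```

Checked into /docs, such artifacts give the agent an unambiguous starting point and something concrete to diff new work against.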

Outcomes

  • Delivery Speed measured in end-to-end cycle time:
    • less than 1 hour for small changes and roughly 1 day for a large feature set
    • sustained 10x-30x increase in development throughput, consistent over months
  • Quality: every feature ships with documentation, test coverage, linting, security review, and architectural consistency, avoiding typical “AI slop” patterns
  • Operational Confidence boosted by ensuring every change is integrated, validated, reproducible, and deployable from a technical, organizational and product perspective alike.
  • Team Scalability:
    • approach teachable to new joiners within days
    • getting the startup out of the “resource pickle.”

Key Lessons

  1. LLMs without CD discipline create entropy: speed without structure degrades system integrity
  2. Agentic CD principles are scale-independent: the same patterns apply in a startup as in an enterprise. The startup even benefits more, because it can scale/pivot within hours.
  3. Agentic development requires additional artifacts: those documents you thought you could skip to speed things up? They become your product!
  4. The bottleneck moves from typing code to maintaining coherence: you will invest more time keeping your first-class documents correct and consistent than writing code. Referencing the right document sections becomes your steering panel.

The VXS Journey to Discover Agentic CD

In 2023, early experiments with LLM-generated code looked promising but quickly broke down in practice. The models produced working code, but integration was tedious, structure drifted, and quality was inconsistent. Available tooling accelerated output but also amplified architectural chaos. Attempts to adopt community conventions created additional noise and documentation bloat rather than clarity. The result was a clear pattern: without structure, AI increases speed but destroys coherence.

The breakthrough came from systematically applying Continuous Delivery principles directly to agentic development. Every feature began with an explicit intent, aligned against existing system structure, documented, tested, and only then implemented. Documentation, ADRs, and tests became first-class artifacts in the repository, acting as control surfaces for the AI. With a single pipeline and strict definition of “deployable,” the system stabilized. The outcome was sustained 10x-30x delivery performance with consistent quality. This showed that Continuous Delivery is not dependent on scale or large platform teams - its principles hold even in a startup using agentic development.

Common Patterns Across Reports

Despite the diversity of these organizations, several patterns emerge consistently:

1. Investment in Automation Precedes Cultural Change

Every organization built the tooling first. Automated testing, automated deployment, automated rollback - these created the conditions where frequent deployment was possible. Cultural change followed when people saw that the automation worked.

2. Incremental Adoption, Not Big Bang

No organization switched to continuous deployment overnight. They all moved incrementally: shorter release cycles first, then weekly deploys, then daily, then on-demand. Each step built confidence for the next.

3. Team Ownership Is Essential

Organizations that gave teams ownership of their deployments (build it, run it) moved faster than those that kept deployment as a centralized function. Ownership creates accountability, which drives quality.

4. Feature Flags Are Universal

Every organization in these reports uses feature flags to decouple deployment from release. This is not optional for continuous deployment - it is foundational.

5. The Results Are Consistent

Regardless of industry, size, or starting point, organizations that adopt continuous deployment consistently report:

  • Higher deployment frequency (daily or more)
  • Lower change failure rate (small changes fail less)
  • Faster recovery (automated rollback, small blast radius)
  • Higher developer satisfaction (less toil, more impact)
  • Better business outcomes (faster time to market, reduced costs)

Applying These Lessons to Your Migration

You do not need to be Google-sized to benefit from these patterns. Extract what applies:

  1. Start with automation. Build the pipeline, the tests, the rollback mechanism.
  2. Adopt incrementally. Move from monthly to weekly to daily. Do not try to jump to 10 deploys per day on day one.
  3. Give teams ownership. Let teams deploy their own services.
  4. Use feature flags. Decouple deployment from release.
  5. Measure and improve. Track DORA metrics. Run experiments. Use retrospectives.

These are the practices covered throughout this migration guide. The experience reports confirm that they work - not in theory, but in production, at scale, in the real world.

Further Reading

For additional case studies, see:

  • Accelerate by Nicole Forsgren, Jez Humble, and Gene Kim - The research behind DORA metrics, with extensive case study data
  • Continuous Delivery by Jez Humble and David Farley - The foundational text, with detailed examples from multiple organizations
  • The DevOps Handbook by Gene Kim, Jez Humble, Patrick Debois, and John Willis - Case studies from organizations across industries
  • Retrospectives - the practice of learning from experience that these reports exemplify at an industry scale
  • Metrics-Driven Improvement - the approach every experience report team used to guide their CD adoption
  • Feature Flags - a universal pattern across all experience reports for decoupling deployment from release
  • Progressive Rollout - the rollout strategies (canary, ring-based, percentage) described in the Microsoft and Google reports
  • DORA Recommended Practices - the research-backed capabilities that these experience reports validate in practice
  • Coordinated Deployments - a symptom every organization in these reports eliminated through independent service deployment