Phase 4: Deliver on Demand
The capability to deploy any change to production at any time, using the delivery strategy that fits your context.
Key question: “Can we deliver any change to production when the business needs it?”
This is the destination: you can deploy any change that passes the pipeline to production
whenever you choose. Some teams will auto-deploy every commit (continuous deployment). Others
will deploy on demand when the business is ready. Both are valid - the capability is what
matters, not the trigger.
What You’ll Do
- Deploy on demand - Remove the last manual gates so any green build can reach production
- Use progressive rollout - Canary, blue-green, and percentage-based deployments
- Explore ACD - AI-assisted continuous delivery patterns
- Learn from experience reports - How other teams made the journey
Continuous Delivery vs. Continuous Deployment
These terms are often confused. The distinction matters for this phase:
- Continuous delivery means every commit that passes the pipeline could be deployed to
production at any time. The capability exists. A human or business process decides when.
- Continuous deployment means every commit that passes the pipeline is deployed to
production automatically. No human decision is involved.
Continuous delivery is the goal of this migration guide. Continuous deployment is one delivery
strategy that works well for certain contexts - SaaS products, internal tools, services behind
feature flags. It is not a higher level of maturity. A team that deploys on demand with a
one-click deploy is just as capable as a team that auto-deploys every commit.
Why This Phase Matters
When your foundations are solid, your pipeline is reliable, and your batch sizes are small,
deploying any change becomes low-risk. The remaining barriers are organizational, not
technical: approval processes, change windows, release coordination. This phase addresses those
barriers so the team has the option to deploy whenever the business needs it.
Signs You’ve Arrived
- Any commit that passes the pipeline can reach production within minutes
- The team deploys frequently (daily or more) with no drama
- Mean time to recovery is measured in minutes
- The team has confidence that any deployment can be safely rolled back
- New team members can deploy on their first day
- The deployment strategy (on-demand or automatic) is a team choice, not a constraint
Related Content
1 - Deploy on Demand
Remove the last manual gates and deploy every change that passes the pipeline.
Deploy on demand means that any change which passes the full automated pipeline can reach production without waiting for a human to press a button, open a ticket, or schedule a window. This page covers the prerequisites, the transition from continuous delivery to continuous deployment, and how to address the organizational concerns that are the real barriers.
Continuous Delivery vs. Continuous Deployment
These two terms are often confused. The distinction matters:
- Continuous Delivery: Every commit that passes the pipeline could be deployed to production. A human decides when to deploy.
- Continuous Deployment: Every commit that passes the pipeline is deployed to production. No human decision is required.
If you have completed Phases 1-3 of this migration, you have continuous delivery. This page is about removing that last manual decision and moving to continuous deployment.
Why Remove the Last Gate?
The manual deployment decision feels safe. It gives someone a chance to “eyeball” the change before it goes to production. In practice, it does the opposite.
The Problems with Manual Gates
| Problem | Why It Happens | Impact |
|---|---|---|
| Batching | If deploys are manual, teams batch changes to reduce the number of deploy events | Larger batches increase risk and make rollback harder |
| Delay | Changes wait for someone to approve, which may take hours or days | Longer lead time, delayed feedback |
| False confidence | The approver cannot meaningfully review what the automated pipeline already tested | The gate provides the illusion of safety without actual safety |
| Bottleneck | One person or team becomes the deploy gatekeeper | Creates a single point of failure for the entire delivery flow |
| Deploy fear | Infrequent deploys mean each deploy is higher stakes | Teams become more cautious, batches get larger, risk increases |
The Paradox of Manual Safety
The more you rely on manual deployment gates, the less safe your deployments become. This is because manual gates lead to batching, batching increases risk, and increased risk justifies more manual gates. It is a vicious cycle.
Continuous deployment breaks this cycle. Small, frequent, automated deployments are individually low-risk. If one fails, the blast radius is small and recovery is fast.
Prerequisites for Deploy on Demand
Before removing manual gates, verify that these conditions are met. Each one is covered in earlier phases of this migration.
Non-Negotiable Prerequisites
- A test suite the team trusts to catch real defects
- Automated rollback that completes in well under 15 minutes
- A pipeline that does not fail for non-code reasons (environments, credentials, network)
- Trunk-based development with small, frequent merges
- Monitoring that detects production issues without a human watching
Assessment: Are You Ready?
Answer these questions honestly:
- When was the last time your pipeline caught a real bug? If the answer is “I don’t remember,” your test suite may not be trustworthy enough.
- How long does a rollback take? If the answer is more than 15 minutes, automate it first.
- Do deploys ever fail for non-code reasons? (Environment issues, credential problems, network flakiness.) If yes, stabilize your pipeline first.
- Does the team trust the pipeline? If team members regularly say “let me check one more thing before we deploy,” trust is not there yet. Build it through retrospectives and transparent metrics.
The Transition: Three Approaches
Approach 1: Shadow Mode
Run continuous deployment alongside manual deployment. Every change that passes the pipeline is automatically deployed to a shadow production environment (or a canary group). A human still approves the “real” production deployment.
Duration: 2-4 weeks.
What you learn: How often the automated deployment would have been correct. If the answer is “every time” (or close to it), the manual gate is not adding value.
Transition: Once the team sees that the shadow deployments are consistently safe, remove the manual gate.
Approach 2: Opt-In per Team
Allow individual teams to adopt continuous deployment while others continue with manual gates. This works well in organizations with multiple teams at different maturity levels.
Duration: Ongoing. Teams opt in when they are ready.
What you learn: Which teams are ready and which need more foundation work. Early adopters demonstrate the pattern for the rest of the organization.
Transition: As more teams succeed, continuous deployment becomes the default. Remaining teams are supported in reaching readiness.
Approach 3: Direct Switchover
Remove the manual gate for all teams at once. This is appropriate when the organization has high confidence in its pipeline and all teams have completed Phases 1-3.
Duration: Immediate.
What you learn: Quickly reveals any hidden dependencies on the manual gate (e.g., deploy coordination between teams, configuration changes that ride along with deployments).
Transition: Be prepared to temporarily revert if unforeseen issues arise. Have a clear rollback plan for the process change itself.
Addressing Organizational Concerns
The technical prerequisites are usually met before the organizational ones. These are the conversations you will need to have.
“What about change management / ITIL?”
Change management frameworks like ITIL define a “standard change” category: a pre-approved, low-risk, well-understood change that does not require a Change Advisory Board (CAB) review. Continuous deployment changes qualify as standard changes because they are:
- Small (one to a few commits)
- Automated (same pipeline every time)
- Reversible (automated rollback)
- Well-tested (comprehensive automated tests)
Work with your change management team to classify pipeline-passing deployments as standard changes. This preserves the governance framework while removing the bottleneck.
“What about compliance and audit?”
Continuous deployment does not eliminate audit trails - it strengthens them. Every deployment is:
- Traceable: Tied to a specific commit, which is tied to a specific story or ticket
- Reproducible: The same pipeline produces the same result every time
- Recorded: Pipeline logs capture every test that passed, every approval that was automated
- Reversible: Rollback history shows when and why a deployment was reverted
Provide auditors with access to pipeline logs, deployment history, and the automated test suite. This is a more complete audit trail than a manual approval signature.
“What about database migrations?”
Database migrations require special care in continuous deployment because they cannot be rolled back as easily as code changes.
Rules for database migrations in CD:
- Migrations must be backward-compatible. The previous version of the code must work with the new schema.
- Use expand/contract pattern. First deploy the new column/table (expand). Then deploy the code that uses it. Then remove the old column/table (contract). Each step is a separate deployment.
- Never drop a column in the same deployment that stops using it. There is always a window where both old and new code run simultaneously.
- Test migrations in production-like environments before they reach production.
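The expand/contract rules above can be sketched as an ordered migration plan. This is an illustrative model, not a real migration tool: the table name, column names, and `is_safe` helper are hypothetical, and the SQL strings are examples of what each step might contain.

```python
# Hypothetical expand/contract plan for replacing a column.
# Each step is a SEPARATE deployment; every schema version is
# compatible with the code deployed immediately before and after it.
MIGRATION_PLAN = [
    # 1. Expand: add the new column alongside the old one.
    ("expand",   "ALTER TABLE orders ADD COLUMN shipped_at TIMESTAMP NULL"),
    # 2. Deploy code that writes both columns and reads the new one.
    ("code",     "dual-write shipped_at and legacy ship_date"),
    # 3. Backfill historical rows so reads of the new column are complete.
    ("backfill", "UPDATE orders SET shipped_at = ship_date WHERE shipped_at IS NULL"),
    # 4. Deploy code that no longer touches the old column.
    ("code",     "stop reading and writing ship_date"),
    # 5. Contract: drop the old column in its own deployment.
    ("contract", "ALTER TABLE orders DROP COLUMN ship_date"),
]

def is_safe(plan):
    """Check the ordering rule: expand first, contract last and alone."""
    phases = [phase for phase, _ in plan]
    return phases.index("expand") < phases.index("contract") and phases[-1] == "contract"
```

Because old and new code always run simultaneously for some window, the drop in step 5 only happens after a deployment (step 4) has removed every reader and writer of the old column.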
“What if we deploy a breaking change?”
This is why you have automated rollback and observability. The sequence is:
- Deployment happens automatically
- Monitoring detects an issue (error rate spike, latency increase, health check failure)
- Automated rollback triggers (or on-call engineer triggers manual rollback)
- The team investigates and fixes the issue
- The fix goes through the pipeline and deploys automatically
The key insight: this sequence takes minutes with continuous deployment. With manual deployment on a weekly schedule, the same breaking change would take days to detect and fix.
After the Transition
What Changes for the Team
| Before | After |
|---|---|
| “Are we deploying today?” | Deploys happen automatically, all the time |
| “Who’s doing the deploy?” | Nobody - the pipeline does it |
| “Can I get this into the next release?” | Every merge to trunk is the next release |
| “We need to coordinate the deploy with team X” | Teams deploy independently |
| “Let’s wait for the deploy window” | There are no deploy windows |
What Stays the Same
- Code review still happens (before merge to trunk)
- Automated tests still run (in the pipeline)
- Feature flags still control feature visibility (decoupling deploy from release)
- Monitoring still catches issues (but now recovery is faster)
- The team still owns its deployments (but the manual step is gone)
The First Week
The first week of continuous deployment will feel uncomfortable. This is normal. The team will instinctively want to “check” deployments that happen automatically. Resist the urge to add manual checks back. Instead:
- Watch the monitoring dashboards more closely than usual
- Have the team discuss each automatic deployment in standup for the first week
- Celebrate the first deployment that goes out without anyone noticing - that is the goal
Key Pitfalls
1. “We adopted continuous deployment but kept the approval step ‘just in case’”
If the approval step exists, it will be used, and you have not actually adopted continuous deployment. Remove the gate completely. If something goes wrong, use rollback - do not use a pre-deployment gate.
2. “Our deploy cadence didn’t actually increase”
Continuous deployment only increases deploy frequency if the team is integrating to trunk frequently. If the team still merges weekly, they will deploy weekly - automatically, but still weekly. Revisit Trunk-Based Development and Small Batches.
3. “We have continuous deployment for the application but not the database/infrastructure”
Partial continuous deployment creates a split experience: application changes flow freely but infrastructure changes still require manual coordination. Extend the pipeline to cover infrastructure as code, database migrations, and configuration changes.
Measuring Success
| Metric | Target | Why It Matters |
|---|---|---|
| Deployment frequency | Multiple per day | Confirms the pipeline is deploying every change |
| Lead time | < 1 hour from commit to production | Confirms no manual gates are adding delay |
| Manual interventions per deploy | Zero | Confirms the process is fully automated |
| Change failure rate | Stable or improving | Confirms automation is not introducing new failures |
| MTTR | < 15 minutes | Confirms automated rollback is working |
Next Step
Continuous deployment deploys every change, but not every change needs to go to every user at once. Progressive Rollout strategies let you control who sees a change and how quickly it spreads.
Related Content
- Infrequent Releases - the primary symptom that deploy on demand resolves
- Merge Freeze - a symptom caused by manual deployment gates that disappears with continuous deployment
- Fear of Deploying - a cultural symptom that fades as automated deployments become routine
- CAB Gates - an organizational anti-pattern that this guide addresses through standard change classification
- Manual Deployments - the pipeline anti-pattern that deploy on demand eliminates
- Deployment Frequency - the key metric for measuring deploy-on-demand adoption
2 - Progressive Rollout
Use canary, blue-green, and percentage-based deployments to reduce deployment risk.
Progressive rollout strategies let you deploy to production without deploying to all users simultaneously. By exposing changes to a small group first and expanding gradually, you catch problems before they affect your entire user base. This page covers the three major strategies, when to use each, and how to implement automated rollback.
Why Progressive Rollout?
Even with comprehensive tests, production-like environments, and small batch sizes, some issues only surface under real production traffic. Progressive rollout is the final safety layer: it limits the blast radius of any deployment by exposing the change to a small audience first.
This is not a replacement for testing. It is an addition. Your automated tests should catch the vast majority of issues. Progressive rollout catches the rest - the issues that depend on real user behavior, real data volumes, or real infrastructure conditions that cannot be fully replicated in test environments.
The Three Strategies
Strategy 1: Canary Deployment
A canary deployment routes a small percentage of production traffic to the new version while the majority continues to hit the old version. If the canary shows no problems, traffic is gradually shifted.
How it works:
- Deploy the new version alongside the old version
- Route 1-5% of traffic to the new version
- Compare key metrics (error rate, latency, business metrics) between canary and stable
- If metrics are healthy, increase traffic to 25%, 50%, 100%
- If metrics degrade, route all traffic back to the old version
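The canary progression above can be sketched as a simple control loop. This is a hedged illustration, not a production controller: `set_weight` and `metrics_healthy` are placeholders for your real router (service mesh, load balancer, or flag system) and metrics queries, and the step percentages and wait time are the example values from the list.

```python
import time

CANARY_STEPS = [5, 25, 50, 100]  # traffic percentages from the steps above

def run_canary(set_weight, metrics_healthy, wait_s=300):
    """Walk a canary through increasing traffic weights.

    set_weight(pct)    - reprograms the router to send pct% to the canary
    metrics_healthy()  - compares canary vs. stable metrics, returns bool
    Both callables are stand-ins for real infrastructure calls.
    """
    for pct in CANARY_STEPS:
        set_weight(pct)
        time.sleep(wait_s)      # let metrics accumulate at this step
        if not metrics_healthy():
            set_weight(0)       # route all traffic back to stable
            return "rolled_back"
    return "promoted"
```

Note that the loop either reaches 100% or returns traffic to the stable version; there is no state in which a degraded canary keeps receiving traffic.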
When to use canary:
- Changes that affect request handling (API changes, performance optimizations)
- Changes where you want to compare metrics between old and new versions
- Services with high traffic volume (you need enough canary traffic for statistical significance)
When canary is not ideal:
- Changes that affect batch processing or background jobs (no “traffic” to route)
- Very low traffic services (the canary may not get enough traffic to detect issues)
- Database schema changes (both versions must work with the same schema)
Implementation options:
| Infrastructure | How to Route Traffic |
|---|
| Kubernetes + service mesh (Istio, Linkerd) | Weighted routing rules in VirtualService |
| Load balancer (ALB, NGINX) | Weighted target groups |
| CDN (CloudFront, Fastly) | Origin routing rules |
| Application-level | Feature flag with percentage rollout |
Strategy 2: Blue-Green Deployment
Blue-green deployment maintains two identical production environments. At any time, one (blue) serves live traffic and the other (green) is idle or staging.
How it works:
- Deploy the new version to the idle environment (green)
- Run smoke tests against green to verify basic functionality
- Switch the router/load balancer to point all traffic at green
- Keep blue running as an instant rollback target
- After a stability period, repurpose blue for the next deployment
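The five steps above reduce to a short decision sequence. The sketch below assumes three placeholder callables (`deploy_to`, `smoke_test`, `switch_router`) standing in for whatever tooling drives your environments and router; it is illustrative, not a real deployment script.

```python
def blue_green_deploy(deploy_to, smoke_test, switch_router, live="blue"):
    """Sketch of the blue-green sequence; all callables are placeholders.

    Returns the name of the environment serving traffic afterwards.
    """
    idle = "green" if live == "blue" else "blue"
    deploy_to(idle)              # 1. deploy the new version to the idle env
    if not smoke_test(idle):     # 2. verify before any traffic moves
        return live              # abort: live traffic was never touched
    switch_router(idle)          # 3. all traffic cuts over at once
    return idle                  # 4. the old env stays up as rollback target
```

The property worth noticing: a failed smoke test leaves production completely untouched, because verification happens before the router switch rather than after.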
When to use blue-green:
- You need instant, complete rollback (switch the router back)
- You want to test the deployment in a full production environment before routing traffic
- Your infrastructure supports running two parallel environments cost-effectively
When blue-green is not ideal:
- Stateful applications where both environments share mutable state
- Database migrations (the new version’s schema must work for both environments during transition)
- Cost-sensitive environments (maintaining two full production environments doubles infrastructure cost)
Rollback speed: Seconds. Switching the router back is the fastest rollback mechanism available.
Strategy 3: Percentage-Based Rollout
Percentage-based rollout gradually increases the number of users who see the new version. Unlike canary (which is traffic-based), percentage rollout is typically user-based - a specific user always sees the same version during the rollout period.
How it works:
- Enable the new version for a small percentage of users (using feature flags or infrastructure routing)
- Monitor metrics for the affected group
- Gradually increase the percentage over hours or days
- At any point, reduce the percentage back to 0% if issues are detected
When to use percentage rollout:
- User-facing feature changes where you want consistent user experience (a user always sees v1 or v2, not a random mix)
- Changes that benefit from A/B testing data (compare user behavior between groups)
- Long-running rollouts where you want to collect business metrics before full exposure
When percentage rollout is not ideal:
- Backend infrastructure changes with no user-visible impact
- Changes that affect all users equally (e.g., API response format changes)
Implementation: Percentage rollout is typically implemented through Feature Flags (Level 2 or Level 3), using the user ID as the hash key to ensure consistent assignment.
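A minimal sketch of that hashing scheme, assuming a per-feature salt so different rollouts get independent buckets (the function name and salt are illustrative, not from any particular flag library):

```python
import hashlib

def in_rollout(user_id: str, percentage: int, salt: str = "new-checkout") -> bool:
    """Deterministically assign a user to the rollout group.

    Hashing user_id with a per-feature salt places each user in a
    stable bucket in [0, 100), so the same user always sees the same
    version, and raising the percentage only ever adds users - it
    never flips anyone back to the old version.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percentage
```

This is why percentage rollout gives user consistency where request-level canary routing does not: assignment is a pure function of the user, not of the request.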
Choosing the Right Strategy
| Factor | Canary | Blue-Green | Percentage |
|---|---|---|---|
| Rollback speed | Seconds (reroute traffic) | Seconds (switch environments) | Seconds (disable flag) |
| Infrastructure cost | Low (runs alongside existing) | High (two full environments) | Low (same infrastructure) |
| Metric comparison | Strong (side-by-side comparison) | Weak (before/after only) | Strong (group comparison) |
| User consistency | No (each request may hit different version) | Yes (all users see same version) | Yes (each user sees consistent version) |
| Complexity | Moderate | Moderate | Low (if you have feature flags) |
| Best for | API changes, performance changes | Full environment validation | User-facing features |
Many teams use more than one strategy. A common pattern:
- Blue-green for infrastructure and platform changes
- Canary for service-level changes
- Percentage rollout for user-facing feature changes
Automated Rollback
Progressive rollout is only effective if rollback is automated. A human noticing a problem at 3 AM is not a reliable rollback mechanism.
Metrics to Monitor
Define automated rollback triggers before deploying. Common triggers:
| Metric | Trigger Condition | Example |
|---|---|---|
| Error rate | Canary error rate > 2x stable error rate | Stable: 0.1%, Canary: 0.3% -> rollback |
| Latency (p99) | Canary p99 > 1.5x stable p99 | Stable: 200ms, Canary: 400ms -> rollback |
| Health check | Any health check failure | HTTP 500 on /health -> rollback |
| Business metric | Conversion rate drops > 5% for canary group | 10% conversion -> 4% conversion -> rollback |
| Saturation | CPU or memory exceeds threshold | CPU > 90% for 5 minutes -> rollback |
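The trigger conditions in the table can be expressed as one predicate evaluated against metric snapshots. The thresholds below are the illustrative values from the table, not recommendations, and the dictionary keys are assumed names for this sketch.

```python
def should_rollback(stable: dict, canary: dict) -> bool:
    """Evaluate the rollback triggers from the table above.

    `stable` and `canary` are point-in-time metric snapshots; a real
    implementation would query these from your monitoring system.
    """
    if canary["error_rate"] > 2 * stable["error_rate"]:      # e.g. 0.1% -> 0.3%
        return True
    if canary["p99_ms"] > 1.5 * stable["p99_ms"]:            # e.g. 200ms -> 400ms
        return True
    if not canary["health_ok"]:                              # any health check failure
        return True
    if canary["conversion"] < stable["conversion"] * 0.95:   # > 5% conversion drop
        return True
    return False
```

Defining this predicate before the deployment, in reviewable code, is what makes "roll back without human intervention" trustworthy: the criteria are agreed on while everyone is calm, not debated mid-incident.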
Tools for Automated Rollback
| Tool | How It Helps |
|---|---|
| Argo Rollouts | Kubernetes-native progressive delivery with automated analysis and rollback |
| Flagger | Progressive delivery operator for Kubernetes with Istio, Linkerd, or App Mesh |
| Spinnaker | Multi-cloud deployment platform with canary analysis |
| Custom scripts | Query your metrics system, compare thresholds, trigger rollback via API |
The specific tool matters less than the principle: define rollback criteria before deploying, monitor automatically, and roll back without human intervention.
Implementing Progressive Rollout
Step 1: Choose Your First Strategy
Pick the strategy that matches your infrastructure:
- If you already have feature flags: start with percentage-based rollout
- If you have Kubernetes with a service mesh: start with canary
- If you have parallel environments: start with blue-green
Step 2: Define Rollback Criteria
Before your first progressive deployment:
- Identify the 3-5 metrics that define “healthy” for your service
- Define numerical thresholds for each metric
- Define the monitoring window (how long to wait before advancing)
- Document the rollback procedure (even if automated, document it for human understanding)
Step 3: Run a Manual Progressive Rollout
Before automating, run the process manually:
- Deploy to a canary or small percentage
- A team member monitors the dashboard for the defined window
- The team member decides to advance or rollback
- Document what they checked and how they decided
This manual practice builds understanding of what the automation will do.
Step 4: Automate the Rollout
Replace the manual monitoring with automated checks:
- Implement metric queries that check your rollback criteria
- Implement automated traffic shifting (advance or rollback based on metrics)
- Implement alerting so the team knows when a rollback occurs
- Test the automation by intentionally deploying a known-bad change (in a controlled way)
Key Pitfalls
1. “Our canary doesn’t get enough traffic for meaningful metrics”
If your service handles 100 requests per hour, a 5% canary gets 5 requests per hour - not enough to detect problems statistically. Solutions: use a higher canary percentage (25-50%), use longer monitoring windows, or use blue-green instead (which does not require traffic splitting).
2. “We have progressive rollout but rollback is still manual”
Progressive rollout without automated rollback is half a solution. If the canary shows problems at 2 AM and nobody is watching, the damage occurs before anyone responds. Automated rollback is the essential companion to progressive rollout.
3. “We treat progressive rollout as a replacement for testing”
Progressive rollout is the last line of defense, not the first. If you are regularly catching bugs in canary that your test suite should have caught, your test suite needs improvement. Progressive rollout should catch rare, production-specific issues - not common bugs.
4. “Our rollout takes days because we’re too cautious”
A rollout that takes a week negates the benefits of continuous deployment. If your confidence in the pipeline is low enough to require a week-long rollout, the issue is pipeline quality, not rollout speed. Address the root cause through better testing and more production-like environments.
Measuring Success
| Metric | Target | Why It Matters |
|---|---|---|
| Automated rollbacks per month | Low and stable | Confirms the pipeline catches most issues before production |
| Time from deploy to full rollout | Hours, not days | Confirms the team has confidence in the process |
| Incidents caught by progressive rollout | Tracked (any number) | Confirms the progressive rollout is providing value |
| Manual interventions during rollout | Zero | Confirms the process is fully automated |
Next Step
With deploy on demand and progressive rollout, your technical deployment infrastructure is complete. ACD explores how AI-assisted patterns can extend these practices further.
3 - Experience Reports
Real-world stories from teams that have made the journey to continuous deployment.
Theory is necessary but insufficient. This page collects experience reports from organizations that have adopted continuous deployment at scale, including the challenges they faced, the approaches they took, and the results they achieved. These reports demonstrate that CD is not limited to startups or greenfield projects - it works in large, complex, regulated environments.
Why Experience Reports Matter
Every team considering continuous deployment faces the same objection: “That works for [Google / Netflix / small startups], but our situation is different.” Experience reports counter this objection with evidence. They show that organizations of every size, in every industry, with every kind of legacy system, have found a path to continuous deployment.
No experience report will match your situation exactly. That is not the point. The point is to extract patterns: what obstacles did these teams encounter, and how did they overcome them?
Walmart: CD at Retail Scale
Context
Walmart operates one of the world’s largest e-commerce platforms alongside its massive physical retail infrastructure. Changes to the platform affect millions of transactions per day. The organization had a traditional release process with weekly deployment windows and multi-stage manual approval.
The Challenge
- Scale: Thousands of developers across hundreds of teams
- Risk tolerance: Any outage affects revenue in real time
- Legacy: Decades of existing systems with deep interdependencies
- Regulation: PCI compliance requirements for payment processing
What They Did
- Invested in a centralized deployment platform (OneOps, later Concord) that standardized the deployment pipeline across all teams
- Broke the monolithic release into independent service deployments
- Implemented automated canary analysis for every deployment
- Moved from weekly release trains to on-demand deployment per team
Key Lessons
- Platform investment pays off. Building a shared deployment platform let hundreds of teams adopt CD without each team solving the same infrastructure problems.
- Compliance and CD are compatible. Automated pipelines with full audit trails satisfied PCI requirements more reliably than manual approval processes.
- Cultural change is harder than technical change. Teams that had operated on weekly release cycles for years needed coaching and support to trust automated deployment.
Microsoft: From Waterfall to Daily Deploys
Context
Microsoft’s Azure DevOps (formerly Visual Studio Team Services) team made a widely documented transformation from 3-year waterfall releases to deploying multiple times per day. This transformation happened within one of the largest software organizations in the world.
The Challenge
- History: Decades of waterfall development culture
- Product complexity: A platform used by millions of developers
- Organizational size: Thousands of engineers across multiple time zones
- Customer expectations: Enterprise customers expected stability and predictability
What They Did
- Broke the product into independently deployable services
- Implemented a ring-based rollout: Ring 0 (team), Ring 1 (internal Microsoft users), Ring 2 (select external users), Ring 3 (all users)
- Invested heavily in automated testing, achieving thousands of tests running in minutes
- Moved from a fixed release cadence to continuous deployment with feature flags controlling release
- Used telemetry to detect issues in real-time and automated rollback when metrics degraded
Key Lessons
- Ring-based deployment is progressive rollout. Microsoft’s ring model is an implementation of the progressive rollout strategies described in this guide.
- Feature flags enabled decoupling. By deploying frequently but releasing features incrementally via flags, the team could deploy without worrying about feature completeness.
- The transformation took years, not months. Moving from 3-year cycles to daily deployment was a multi-year journey with incremental progress at each step.
Google: Engineering Productivity at Scale
Context
Google is often cited as the canonical example of continuous deployment, deploying changes to production thousands of times per day across its vast service portfolio.
The Challenge
- Scale: Billions of users, millions of servers
- Monorepo: Most of Google operates from a single repository with billions of lines of code
- Interdependencies: Changes in shared libraries can affect thousands of services
- Velocity: Thousands of engineers committing changes every day
What They Did
- Built a culture of automated testing where tests are a first-class deliverable, not an afterthought
- Implemented a submit queue that runs automated tests on every change before it merges to the trunk
- Invested in build infrastructure (Blaze/Bazel) that can build and test only the affected portions of the codebase
- Used percentage-based rollout for user-facing changes
- Made rollback a one-click operation available to every team
Key Lessons
- Test infrastructure is critical infrastructure. Google’s ability to deploy frequently depends entirely on its ability to test quickly and reliably.
- Monorepo and CD are compatible. The common assumption that CD requires microservices with separate repos is false. Google deploys from a monorepo.
- Invest in tooling before process. Google built the tooling (build systems, test infrastructure, deployment automation) that made good practices the path of least resistance.
Amazon: Two-Pizza Teams and Ownership
Context
Amazon’s transformation to service-oriented architecture and team ownership is one of the most influential in the industry. The “two-pizza team” model and “you build it, you run it” philosophy directly enabled continuous deployment.
The Challenge
- Organizational size: Hundreds of thousands of employees
- System complexity: Thousands of services powering amazon.com and AWS
- Availability requirements: Even brief outages are front-page news
- Pace of innovation: Competitive pressure demands rapid feature delivery
What They Did
- Decomposed the system into independently deployable services, each owned by a small team
- Gave teams full ownership: build, test, deploy, operate, and support
- Built internal deployment tooling (Apollo) that automates canary analysis, rollback, and one-click deployment
- Established the practice of deploying every commit that passes the pipeline, with automated rollback on metric degradation
Key Lessons
- Ownership drives quality. When the team that writes the code also operates it in production, they write better code and build better monitoring.
- Small teams move faster. Two-pizza teams (6-10 people) can make decisions without bureaucratic overhead.
- Automation eliminates toil. Amazon’s internal deployment tooling means that deploying is not a skilled activity - any team member can deploy (and the pipeline usually deploys automatically).
HP: CD in Hardware-Adjacent Software
Context
HP’s LaserJet firmware team demonstrated that continuous delivery principles apply even to embedded software, a domain often considered incompatible with frequent deployment.
The Challenge
- Embedded software: Firmware that runs on physical printers
- Long development cycles: Firmware releases had traditionally been annual
- Quality requirements: Firmware bugs require physical recalls or complex update procedures
- Team size: Large, distributed teams with varying skill levels
What They Did
- Invested in automated testing infrastructure for firmware
- Reduced build times from days to under an hour
- Moved from annual releases to frequent incremental updates
- Implemented continuous integration with automated test suites running on simulator and hardware
Key Lessons
- CD principles are universal. Even embedded firmware can benefit from small batches, automated testing, and continuous integration.
- Build time is a critical constraint. Reducing build time from days to under an hour unlocked the ability to test frequently, which enabled frequent integration, which enabled frequent delivery.
- The results were dramatic: development costs fell by approximately 40%, and programs delivered on schedule increased by roughly 140%.
Flickr: “10+ Deploys Per Day”
Context
Flickr’s 2009 presentation “10+ Deploys Per Day: Dev and Ops Cooperation” is credited with helping launch the DevOps movement. At a time when most organizations deployed quarterly, Flickr was deploying more than ten times per day.
The Challenge
- Web-scale service: Serving billions of photos to millions of users
- Ops/Dev divide: Traditional separation between development and operations teams
- Fear of change: Deployments were infrequent because they were risky
What They Did
- Built automated infrastructure provisioning and deployment
- Implemented feature flags to decouple deployment from release
- Created a culture of shared responsibility between development and operations
- Made deployment a routine, low-ceremony event that anyone could trigger
- Used IRC bots (and later chat-based tools) to coordinate and log deployments
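The feature-flag pattern - deployed code staying dark until a flag releases it - can be sketched as follows. This is a generic illustration, assuming a hypothetical `new_photo_viewer` flag and percentage bucketing; it is not Flickr's actual flag system:

```python
import hashlib

# Flag configuration: code for the new feature is already deployed to
# production, but only 10% of users see it until the rollout widens.
FLAGS = {
    "new_photo_viewer": {"enabled": True, "percent": 10},
}

def is_enabled(flag: str, user_id: str) -> bool:
    cfg = FLAGS.get(flag)
    if not cfg or not cfg["enabled"]:
        return False
    # Deterministic bucketing: the same user always lands in the same
    # bucket, so their experience is stable across requests.
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < cfg["percent"]

def render_photo_page(user_id: str) -> str:
    if is_enabled("new_photo_viewer", user_id):
        return "new viewer"   # released to this user
    return "old viewer"       # deployed code path still dormant
```

Because release becomes a configuration change rather than a deployment, widening the rollout (or backing out of it) takes seconds and involves no new code.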
Key Lessons
- Culture is the enabler. Flickr’s technical practices were important, but the cultural shift - developers and operations working together, shared responsibility, mutual respect - was what made frequent deployment possible.
- Tooling should reduce friction. Flickr’s deployment tools were designed to make deploying as easy as possible. The easier it is to deploy, the more often people deploy, and the smaller each deployment becomes.
- Transparency builds trust. Logging every deployment in a shared channel let everyone see what was deploying, who deployed it, and whether it caused problems. This transparency built organizational trust in frequent deployment.
VXS: “CD: Superhuman Efforts are the New Normal”
Context
VXS Decision is a startup like thousands of others: founder-led, under-funded, short on time and resources. But it targets enterprise customers, which raises the question: how do you deliver reliable, enterprise-grade software without the resources of an enterprise?
Answering that question led to the framework of principles and patterns now formulated as “Agentic CD.”
The Challenge
- Produce demoware, or build software people will actually use?
- Fast output leads to structural inconsistency
- Architectural drift
- How and what to document?
- Keeping the codebase maintainable
What They Did
- Experimented with LLMs for code generation
- Applied rigorous CD practices to the work with AI agents
- Mandated additional first-class artifacts in the repo
- Standardized the approach of working with AI agents
- Compressed Agentic CD pipeline cycles to deliver entire features in hours
Key Lessons
- Agents drift. Documentation layered on top of the codebase contains inconsistency and duplication.
- You need to extend your definition of ‘deliverable’. Code must not merely exist and pass the tests; it must also be consistent with the documented architecture and descriptions.
- First-class artifacts are the true product. These include intent, behaviour, design, and decisions. With these, an LLM can reconstruct the product even without access to the code itself.
- You need a third folder in your repo. Where formerly /src and /test did all the work, the /docs folder becomes your lifeline.
Agentic CD Additions
Additional practices required for LLM-assisted development:
- Intent-first workflow. Anchor the implementation with a proper intent statement: what, why, for whom.
- Delta & overlap analysis. Agents can compare new features against the existing system and detect redundancy, conflicts, and structural drift. The most interesting question becomes: “How does this relate to what we currently do?”
- Structured documentation layers. User guides, feature descriptions, architectural decision records (ADRs) and system structure documentation become the glue of your system.
- Human in the Loop. Key artifacts can be generated by agents, but HITL is necessary to catch drift. Intent and decisions are human territory; behaviour and design must be actively guided by humans.
- The docs are for the machine, not for humans. Documentation artifacts must be structured to guide Agents in implementation with minimal context windows, not to “read nicely” for humans.
- ASCII art beats photos, illustrations or doodles.
- Short paragraphs, no filler words. Consistent language.
- Optimize documentation so agents can be pointed to the right paragraphs quickly and effectively.
- Cross-reference documents to reduce Agentic search efforts.
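As a concrete illustration of an intent-first artifact, a minimal intent statement might look like the following. The feature, fields, and layout are illustrative, not a prescribed VXS format:

```markdown
# Intent: Bulk photo export

What: Allow users to export all photos in an album as a single ZIP archive.
Why: Enterprise customers need offline backups for compliance audits.
For whom: Account admins on the Enterprise plan.

Related: docs/features/albums.md, ADR-014 (storage layout)
Out of scope: scheduled/recurring exports.
```

Note the short paragraphs, consistent language, and explicit cross-references - the document is structured so an agent can anchor an implementation with minimal context, not so it reads nicely for humans.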
Outcomes
- Delivery Speed measured in end-to-end cycle time:
- less than 1 hour for small changes and roughly 1 day for a large feature set
- sustained 10x-30x increase in development throughput, consistent over months
- Quality: every feature ships with documentation, test coverage, linting, security review, and architectural consistency, avoiding typical “AI slop” patterns
- Operational Confidence: boosted by ensuring every change is integrated, validated, reproducible, and deployable from technical, organizational, and product perspectives alike.
- Team Scalability:
- approach teachable to new joiners within days
- getting the startup out of the “resource pickle.”
Key Lessons
- LLMs without CD discipline create entropy: speed without structure degrades system integrity
- Agentic CD principles are scale-independent: the same patterns apply in a startup as in an enterprise. The startup even benefits more, because it can scale/pivot within hours.
- Agentic development requires additional artifacts: those documents you thought you could skip to speed things up? They become your product!
- The bottleneck moves from typing code to maintaining coherence: you will invest more time keeping your first-class documents correct and consistent than writing code. Referencing the right document sections becomes your steering panel.
The VXS Journey to Discover Agentic CD
In 2023, early experiments with LLM-generated code looked promising but quickly broke down in practice. The models produced working code, but integration was tedious, structure drifted, and quality was inconsistent. Available tooling accelerated output but also amplified architectural chaos. Attempts to adopt community conventions created additional noise and documentation bloat rather than clarity. The result was a clear pattern: without structure, AI increases speed but destroys coherence.
The breakthrough came from systematically applying Continuous Delivery principles directly to agentic development. Every feature began with an explicit intent, aligned against existing system structure, documented, tested, and only then implemented. Documentation, ADRs, and tests became first-class artifacts in the repository, acting as control surfaces for the AI. With a single pipeline and strict definition of “deployable,” the system stabilized. The outcome was sustained 10x-30x delivery performance with consistent quality. This showed that Continuous Delivery is not dependent on scale or large platform teams - its principles hold even in a startup using agentic development.
Common Patterns Across Reports
Despite the diversity of these organizations, several patterns emerge consistently:
1. Investment in Automation Precedes Cultural Change
Every organization built the tooling first. Automated testing, automated deployment, automated rollback - these created the conditions where frequent deployment was possible. Cultural change followed when people saw that the automation worked.
2. Incremental Adoption, Not Big Bang
No organization switched to continuous deployment overnight. They all moved incrementally: shorter release cycles first, then weekly deploys, then daily, then on-demand. Each step built confidence for the next.
3. Team Ownership Is Essential
Organizations that gave teams ownership of their deployments (build it, run it) moved faster than those that kept deployment as a centralized function. Ownership creates accountability, which drives quality.
4. Feature Flags Are Universal
Every organization in these reports uses feature flags to decouple deployment from release. This is not optional for continuous deployment - it is foundational.
5. The Results Are Consistent
Regardless of industry, size, or starting point, organizations that adopt continuous deployment consistently report:
- Higher deployment frequency (daily or more)
- Lower change failure rate (small changes fail less)
- Faster recovery (automated rollback, small blast radius)
- Higher developer satisfaction (less toil, more impact)
- Better business outcomes (faster time to market, reduced costs)
Applying These Lessons to Your Migration
You do not need to be Google-sized to benefit from these patterns. Extract what applies:
- Start with automation. Build the pipeline, the tests, the rollback mechanism.
- Adopt incrementally. Move from monthly to weekly to daily. Do not try to jump to 10 deploys per day on day one.
- Give teams ownership. Let teams deploy their own services.
- Use feature flags. Decouple deployment from release.
- Measure and improve. Track DORA metrics. Run experiments. Use retrospectives.
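As a sketch of the "measure and improve" step, two of the DORA metrics - deployment frequency and change failure rate - can be computed from a simple deploy log. The log format here is illustrative; in practice the data would come from your deployment tooling's audit trail:

```python
from datetime import date

# Illustrative deploy log: (date, succeeded).
deploys = [
    (date(2024, 5, 1), True),
    (date(2024, 5, 1), True),
    (date(2024, 5, 2), False),  # this deploy caused an incident
    (date(2024, 5, 3), True),
]

# Window covered by the log, inclusive of both endpoints.
days = (max(d for d, _ in deploys) - min(d for d, _ in deploys)).days + 1
deploy_frequency = len(deploys) / days                           # deploys/day
change_failure_rate = sum(not ok for _, ok in deploys) / len(deploys)

print(f"{deploy_frequency:.2f} deploys/day")             # 1.33 deploys/day
print(f"{change_failure_rate:.0%} change failure rate")  # 25% change failure rate
```

Tracking these two numbers over time is enough to see whether moving from monthly to weekly to daily deploys is actually reducing risk rather than merely increasing activity.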
These are the practices covered throughout this migration guide. The experience reports confirm that they work - not in theory, but in production, at scale, in the real world.
Further Reading
For additional case studies, see:
- Accelerate by Nicole Forsgren, Jez Humble, and Gene Kim - The research behind DORA metrics, with extensive case study data
- Continuous Delivery by Jez Humble and David Farley - The foundational text, with detailed examples from multiple organizations
- The DevOps Handbook by Gene Kim, Jez Humble, Patrick Debois, and John Willis - Case studies from organizations across industries
Related Content
- Retrospectives - the practice of learning from experience that these reports exemplify at an industry scale
- Metrics-Driven Improvement - the approach every experience report team used to guide their CD adoption
- Feature Flags - a universal pattern across all experience reports for decoupling deployment from release
- Progressive Rollout - the rollout strategies (canary, ring-based, percentage) described in the Microsoft and Google reports
- DORA Recommended Practices - the research-backed capabilities that these experience reports validate in practice
- Coordinated Deployments - a symptom every organization in these reports eliminated through independent service deployment