Pipeline reference architectures for single-team, multi-team, and distributed service delivery, with quality gates sequenced by defect detection priority.
This section defines quality gates sequenced by defect detection priority and three
pipeline patterns that apply them. Quality gates are derived from the
Systemic Defect Fixes catalog and sequenced so the cheapest, fastest
checks run first.
Gates marked with [Pre-Feature] must be in place and passing before any new feature
work begins. They form the baseline safety net that every commit runs through. Adding
features without these gates means defects accumulate faster than the team can detect them.
Gates marked with ▲ are enhanced by AI - the AI shifts
detection earlier or catches issues that rule-based tools miss. See the
Systemic Defect Fixes catalog for details.
Quality Gates in Priority Sequence
The gate sequence follows a single principle: fail fast, fail cheap. Gates that catch
the most common defects with the least execution time run first. Each gate listed below
maps to one or more defect sources from the catalog.
Pre-commit Gates
These run on the developer’s machine before code leaves the workstation. They provide
sub-second to sub-minute feedback.
These gates must be active before starting feature work
Without these gates passing on every commit to trunk, defects accumulate faster than the
team can detect them. If any are missing, add them before writing new features. The
Foundations phase covers how to establish
this baseline.
Linting and formatting
Static type checking
Secret scanning
SAST for injection patterns
Compilation / build
Unit tests
Dependency vulnerability scan
Contract tests at every integration boundary
Schema migration validation
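As an illustration of the fail-fast ordering, the sketch below runs local gates in sequence and stops at the first failure. The make targets are placeholders, not part of any specific toolchain; substitute your own lint, type-check, secret-scan, and unit-test commands.

```python
#!/usr/bin/env python3
"""Fail-fast local gate runner (sketch). The make targets are placeholders;
substitute your own lint, type-check, secret-scan, and unit-test commands."""
import subprocess
import sys
import time

# Cheapest, fastest checks first so feedback arrives as early as possible.
GATES = [
    ("lint + format", ["make", "lint"]),
    ("static types", ["make", "typecheck"]),
    ("secret scan", ["make", "secrets"]),
    ("unit tests", ["make", "test-unit"]),
]

def main() -> int:
    for name, cmd in GATES:
        start = time.monotonic()
        result = subprocess.run(cmd)
        elapsed = time.monotonic() - start
        if result.returncode != 0:
            print(f"gate FAILED: {name} ({elapsed:.1f}s) - fix before committing")
            return result.returncode  # fail fast: later gates never run
        print(f"gate passed: {name} ({elapsed:.1f}s)")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```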
Pipeline Patterns
These three patterns apply the quality gates above to progressively more complex team
and deployment topologies. Most organizations start with Pattern 1 and evolve toward
Pattern 3 as team count and deployment independence requirements grow.
Single Team, Single Deployable - a single team owns a modular monolith with one
linear pipeline from commit to production
Multiple Teams, Single Deployable - multiple teams own
sub-domain modules within a shared modular monolith, each with its own sub-pipeline
feeding a thin integration pipeline
Independent Teams, Independent Deployables - each team owns an independently
deployable service with its own pipeline, with API contract verification replacing
integration testing
Each quality gate above is derived from the Systemic Defect Fixes
catalog. The catalog organizes defects by origin - product and discovery, integration,
knowledge, change and complexity, testing gaps, process, data, dependencies, security, and
performance. The pipeline gates are the automated enforcement points for the systemic
prevention strategies described in the catalog.
Gates marked with ▲ correspond to catalog entries where AI
shifts detection earlier than current rule-based automation. For expert agent patterns that
implement these gates in an agentic CD context, see
ACD Pipeline Enforcement.
When adding or removing gates, consult the catalog to ensure that no defect category loses
its detection point. A gate that seems redundant may be the only automated check for a
specific defect source.
Further Reading
For a deeper treatment of pipeline design, stage sequencing, and deployment strategies, see
Dave Farley’s
Continuous Delivery Pipelines, which covers pipeline
architecture patterns in detail.
Phase 2: Pipeline - the migration phase that establishes the pipeline
Slow Pipelines - what happens when pipeline architecture is not optimized
ACD - additional pipeline constraints when AI agents contribute changes
1.1 - Single Team, Single Deployable
A linear pipeline pattern for a single team owning a modular monolith.
This architecture suits a team of up to 8-10 people owning a
modular monolith - a single deployable
application with well-defined internal module boundaries. The codebase is organized by
domain, not by technical layer. Each module encapsulates its own data, logic, and
interfaces, communicating with other modules through explicit internal APIs. The
application deploys as one unit, but its internal structure makes it possible to reason
about, test, and change one module without understanding the entire codebase. The pipeline
is linear with parallel stages where dependencies allow.
graph TD
classDef prefeature fill:#0d7a32,stroke:#0a6128,color:#fff
classDef ci fill:#224968,stroke:#1a3a54,color:#fff
classDef parallel fill:#30648e,stroke:#224968,color:#fff
classDef accept fill:#6c757d,stroke:#565e64,color:#fff
classDef prod fill:#a63123,stroke:#8a2518,color:#fff
A["Pre-commit Gates<br/><small>Lint, Types, Secrets, SAST</small>"]:::prefeature
B["Build + Unit Tests"]:::prefeature
C["Contract + Schema Tests"]:::prefeature
D["Security Scans"]:::parallel
E["Performance Benchmarks"]:::parallel
F["Acceptance Tests<br/><small>Production-Like Env</small>"]:::accept
G["Create Immutable Artifact"]:::ci
H["Deploy Canary / Progressive"]:::prod
I["Health Checks + SLO Monitors<br/>Auto-Rollback"]:::prod
A -->|"commit to trunk"| B
B --> C
C --> D & E
D --> F
E --> F
F --> G
G --> H
H --> I
Key Characteristics
One pipeline, one artifact: The entire application builds and deploys as a single
immutable artifact. There is no fan-out or fan-in.
Linear with parallel branches: Security scans and performance benchmarks run in
parallel because neither depends on the other. Everything else is sequential.
Trunk-based development: All developers commit to trunk at least daily. The pipeline
runs on every commit.
Total target time: Under 15 minutes from commit to production-ready artifact.
Acceptance tests may extend this to 20 minutes for complex applications.
Ownership: The team owns the pipeline definition, which lives in the same repository
as the application code.
When This Architecture Breaks Down
This architecture stops working when:
The system becomes too large for a single team to manage.
Build times exceed the target even after optimization, slowing the team's ability to respond quickly
Different parts of the application need different deployment cadences
When these symptoms appear, consider splitting into the
multi-team architecture or decomposing the application into
independently deployable services with their
own pipelines.
Related Content
Quality Gates - the full gate sequence this pipeline applies
Pipeline Architecture - how to evolve pipeline architecture from entangled to loosely coupled
1.2 - Multiple Teams, Single Deployable
A sub-pipeline pattern for multiple teams contributing domain modules to a shared modular monolith.
This architecture suits organizations where multiple teams contribute to a single
deployable modular monolith - a common
pattern for large applications, mobile apps, or platforms where the final artifact must
be assembled from team contributions.
The modular monolith structure is what makes multi-team ownership possible. Each team
owns a specific module representing a bounded sub-domain of the application. Team A
might own checkout and payments, Team B owns inventory and fulfillment, Team C owns
user accounts and authentication. Modules communicate through explicit internal APIs,
not by reaching into each other’s database tables or calling private methods. Each
team’s sub-pipeline validates only their module. A shared integration pipeline assembles
and verifies the combined result.
This ownership model is critical. Without clear module boundaries, teams step on each
other’s code, sub-pipelines trigger on unrelated changes, and merge conflicts replace
pipeline contention as the bottleneck. The module split must follow the application’s
domain boundaries, not its technical layers. A team that owns “the database layer” or
“the API controllers” will always be coupled to every other team. A team that owns
“payments” can change its database, API, and UI independently. If the codebase is not
yet structured as a modular monolith, restructure it before adopting this architecture;
otherwise the sub-pipelines will constantly interfere with each other.
Module ownership by domain: Each team owns a bounded module of the application’s
functionality. Ownership is defined by domain, not by technical layer. The team is
responsible for all code, tests, and pipeline configuration within their module.
Team-owned sub-pipelines: Each team runs their own pre-commit, build, unit test,
contract test, and security gates independently. A team’s sub-pipeline validates only
their module and is their fast feedback loop.
Contract tests at both levels: Teams run contract tests in their sub-pipeline to
catch boundary issues at the module edges. The integration pipeline runs cross-module
contract tests to verify the assembled result.
Integration pipeline is thin: The integration pipeline does not re-run each team’s
tests. It validates only what cannot be validated in isolation - cross-module
integration, the assembled artifact, and end-to-end acceptance tests.
Sub-pipeline target time: Under 10 minutes. This is the team’s primary feedback loop
and must stay fast.
Integration pipeline target time: Under 15 minutes. If it grows beyond this, the
integration test suite needs decomposition or the application needs architectural changes
to enable independent deployment.
Trunk-based development with path filters: All teams commit to the same trunk.
Sub-pipelines trigger based on path filters aligned to module boundaries, so a
change to the payments module does not trigger the inventory sub-pipeline.
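Most CI platforms support path filters natively; as an illustration of the underlying logic, the sketch below maps changed file paths to the sub-pipelines that should run. The module directory layout and pipeline names are assumptions for the example.

```python
"""Sketch: decide which sub-pipelines to trigger from the changed file paths.
The module directory layout and pipeline names are illustrative assumptions."""
from pathlib import PurePosixPath

# Module boundary -> owning team's sub-pipeline.
MODULE_PIPELINES = {
    "modules/payments": "payments-sub-pipeline",
    "modules/inventory": "inventory-sub-pipeline",
    "modules/accounts": "accounts-sub-pipeline",
}

def pipelines_to_trigger(changed_files: list[str]) -> set[str]:
    triggered: set[str] = set()
    for path in changed_files:
        prefix = "/".join(PurePosixPath(path).parts[:2])  # e.g. "modules/payments"
        pipeline = MODULE_PIPELINES.get(prefix)
        if pipeline is None:
            # Shared code outside any module boundary: run every sub-pipeline.
            return set(MODULE_PIPELINES.values())
        triggered.add(pipeline)
    return triggered

# A change to the payments module triggers only the payments sub-pipeline.
assert pipelines_to_trigger(["modules/payments/api.py"]) == {"payments-sub-pipeline"}
```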
Preventing the Integration Pipeline from Becoming a Bottleneck
The integration pipeline is a shared resource and the most likely bottleneck in this
architecture. To keep it fast:
Move tests left into sub-pipelines: Every test that can run in a sub-pipeline should
run there. The integration pipeline should only contain tests that require the full
assembled artifact.
Use contract tests aggressively: Contract tests in sub-pipelines catch most
integration issues without needing the full system. The integration pipeline’s contract
tests are a verification layer, not the primary detection point.
Run the integration pipeline on every commit to trunk: Do not batch. Batching
creates large changesets that are harder to debug when they fail.
Parallelize acceptance tests: Group acceptance tests by feature area and run groups
in parallel.
Monitor integration pipeline duration: Set an alert if it exceeds 15 minutes. Treat
this the same as a failing test - fix it immediately.
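As a minimal sketch of that duration alert, assuming pipeline durations are already available from your CI platform and notify() is a placeholder for your chat or paging integration:

```python
"""Sketch: flag integration pipeline runs that exceed the duration budget.
notify() is a placeholder for your chat or paging integration."""
from datetime import timedelta

BUDGET = timedelta(minutes=15)

def check_duration(pipeline_name: str, duration: timedelta, notify) -> None:
    if duration > BUDGET:
        overrun = (duration - BUDGET).total_seconds() / 60
        notify(f"{pipeline_name} ran {overrun:.0f} min over the 15-minute budget - "
               "treat this like a failing test and fix it immediately")
```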
When to Move Away from This Architecture
This architecture is a pragmatic pattern for organizations that cannot yet decompose their
monolith into independently deployable services. The long-term goal is
loose coupling -
independent services with independent pipelines that do not need a shared integration step.
Signs you are ready to decompose:
Contract tests catch virtually all integration issues in sub-pipelines
The integration pipeline adds little value beyond what sub-pipelines already verify
Teams are blocked by integration pipeline queuing more than once per week
Different parts of the application need different deployment cadences
Related Content
Quality Gates - the full gate sequence this pipeline applies
Team Alignment to Code - how to structure teams around domain boundaries so this pipeline pattern works
1.3 - Independent Teams, Independent Deployables
A fully independent pipeline pattern for teams deploying their own services in any order, with API contract verification replacing integration testing.
This is the target architecture for continuous delivery at scale. Each team owns an
independently deployable service with its own pipeline, its own release cadence, and
its own path to production. No team waits for another team to deploy. No integration
pipeline serializes their work. The only shared infrastructure is the API contract
layer that defines how services communicate.
This architecture demands disciplined API management. Without it, independent deployment
is an illusion - teams deploy whenever they want, but they break each other constantly.
Fully independent deployment: Each team deploys on its own schedule. Team A can
deploy ten times a day while Team C deploys once a week. No coordination is required.
No shared integration pipeline: There is no fan-in step. Each pipeline goes
straight from artifact creation to production. This eliminates the integration bottleneck
entirely.
Contract tests replace integration tests: Instead of testing all services together,
each team verifies its API contracts independently. The level of contract verification
depends on how much coordination is possible between teams (see
contract verification approaches below).
Each team owns its full pipeline: From pre-commit to production monitoring. No
shared pipeline definitions, no central platform team gating deployments.
Why API Management Is Critical
Independent deployment only works when teams can change their service without breaking
others. This requires a shared understanding of API boundaries that is enforced
automatically, not through meetings or documents that drift.
Without API management, independent pipelines create independent failures. Teams
deploy incompatible changes, discover the breakage in production, and revert to
coordinated releases to stop the bleeding. This is worse than the multi-team architecture
because it creates the illusion of independence while delivering the reliability of chaos.
What API Management Requires
Published API schemas: Every service publishes its API contract (OpenAPI, AsyncAPI,
Protobuf, or equivalent) as a versioned artifact. The schema is the source of truth for
what the service provides.
Contract verification (see approaches below):
At minimum, providers verify backward compatibility against their own published schema.
Where cross-team coordination is feasible, consumer-driven contracts add stronger
guarantees.
Backward compatibility enforcement: Every API change is checked for backward
compatibility against the published schema. Breaking changes require a new API version
using the expand-then-contract pattern:
Deploy the new version alongside the old
Migrate consumers to the new version
Remove the old version only after all consumers have migrated
Schema registry: A central registry (Confluent Schema Registry, a simple artifact
repository, or a Pact Broker where consumer-driven contracts are used) stores published
schemas. Pipelines pull from this registry to run compatibility checks. The registry is
shared infrastructure, but it does not gate deployments - it provides data that each
team’s pipeline uses to make its own go/no-go decision.
API versioning strategy: Teams agree on a versioning convention (URL path versioning,
header versioning, or semantic versioning for message schemas) and enforce it through
pipeline gates. The convention must be simple enough that every team follows it without
deliberation.
Contract Verification Approaches
Not all teams can coordinate on shared contract tooling. The right approach depends on
the relationship between provider and consumer teams. These approaches are listed from
least to most coordination required. Use the strongest approach your context supports.
| Approach | How It Works | Coordination Required | Best When |
|---|---|---|---|
| Provider schema compatibility | Provider's pipeline checks every change for backward compatibility against its own published schema (e.g., OpenAPI diff). No consumer involvement needed. | None between teams | Teams are in different organizations, or consumers are external/unknown |
| Provider-maintained consumer tests | Provider team writes tests that exercise known consumer usage patterns based on API analytics, documentation, or past breakage. | Minimal - provider observes consumers | Provider can see consumer traffic patterns but cannot require consumer participation |
| Consumer-driven contracts | Consumers publish pacts describing the subset of the provider API they depend on. Provider runs these pacts in its pipeline. See Contract Tests. | High - shared tooling, broker, and agreement to maintain pacts | Teams are in the same organization with shared tooling and willingness to maintain pacts |
Most organizations use a mix. Internal teams with shared tooling can adopt consumer-driven
contracts. Teams consuming third-party or cross-organization APIs use provider schema
compatibility checks and provider-maintained consumer tests.
The critical requirement is not which approach you use but that every provider pipeline
verifies backward compatibility before deployment. The minimum viable contract
verification is an automated schema diff against the published API - if the diff contains
a breaking change, the pipeline fails.
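A minimal sketch of that gate, assuming the old and new OpenAPI documents are already parsed into dictionaries; it flags removed paths and operations as breaking. Purpose-built schema-diff tools cover many more breaking-change cases and are preferable in practice.

```python
"""Sketch: fail the pipeline when an API change is not backward compatible.
Assumes both OpenAPI documents are already parsed into dicts (e.g. from YAML)."""

def breaking_changes(old_spec: dict, new_spec: dict) -> list[str]:
    problems = []
    old_paths = old_spec.get("paths", {})
    new_paths = new_spec.get("paths", {})
    for path, old_ops in old_paths.items():
        if path not in new_paths:
            # Removing a path breaks every consumer still calling it.
            problems.append(f"removed path: {path}")
            continue
        for method in ("get", "post", "put", "patch", "delete"):
            if method in old_ops and method not in new_paths[path]:
                problems.append(f"removed operation: {method.upper()} {path}")
    # A fuller check would also cover new required parameters, removed response
    # fields, and changed types; dedicated schema-diff tools handle these cases.
    return problems

def compatibility_gate(old_spec: dict, new_spec: dict) -> None:
    problems = breaking_changes(old_spec, new_spec)
    if problems:
        raise SystemExit("Breaking API change(s): " + "; ".join(problems))
```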
Additional Quality Gates for Distributed Architectures
This architecture is the goal for organizations with:
Multiple teams that need different deployment cadences
Services with well-defined, stable API boundaries
Teams mature enough to own their full delivery pipeline
Investment in contract testing tooling and API governance
When This Architecture Fails
Shared database schemas: Multiple services can share a database engine without
problems. The failure mode is shared schemas - when Service A and Service B both read
from and write to the same tables, a schema migration by one service can break the
other’s queries. Each service must own its own schema. If two services need the same
data, expose it through an API or event, not through direct table access.
Synchronous dependency chains: If Service A calls Service B which calls Service C
in the request path, a deployment of C can break A through B. Circuit breakers and
fallbacks are required at every boundary, and contract tests must cover failure modes,
not just success paths. A minimal circuit-breaker sketch follows this list.
No contract verification discipline: If teams skip backward compatibility checks
or let contract test failures slide, breakage shifts from the pipeline to production.
The architecture degrades into uncoordinated deployments with production as the
integration environment. At minimum, every provider must run automated schema
compatibility checks - even without consumer-driven contracts.
Missing observability: When services deploy independently, debugging production
issues requires distributed tracing, correlated logging, and SLO monitoring across
service boundaries. Without this, independent deployment means independent
troubleshooting with no way to trace cause and effect.
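For the synchronous dependency chain failure mode above, a circuit breaker is the standard defense. The sketch below is an illustrative minimal breaker, not a substitute for a hardened library: after repeated failures it stops calling the downstream service and returns a fallback until a cooldown elapses.

```python
"""Sketch: a minimal circuit breaker guarding calls to a downstream service.
Illustrative only - production systems should use a hardened library."""
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # monotonic timestamp when the breaker tripped

    def call(self, func, *args, fallback=None, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                return fallback  # breaker open: skip the downstream call entirely
            self.opened_at = None  # half-open: allow one trial call through
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            return fallback
        self.failures = 0  # success closes the breaker again
        return result
```

Wrapping the Service B client in a breaker like this means a bad deployment of Service C degrades A's responses to fallbacks rather than cascading timeouts through the chain.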
Relationship to the Other Architectures
Architecture 3 is where Architecture 2 teams evolve to: from a single team with one deployable, to multiple teams sharing one deployable, to independent teams with independent deployables.
The move from 2 to 3 happens incrementally. Extract one service at a time. Give it
its own pipeline. Establish contract tests between it and the monolith. When the contract
tests are reliable, stop running the extracted service’s code through the integration
pipeline. Repeat until the integration pipeline is empty.
Related Content
Quality Gates - the full gate sequence this pipeline applies
A catalog of defect sources across the delivery value stream with earliest detection points, AI shift-left opportunities, and systemic prevention strategies.
Defects do not appear randomly. They originate from specific, predictable sources in the delivery
value stream. This reference catalogs those sources so teams can shift detection left, automate
where possible, and apply AI where it adds real value to the feedback loop.
The goal is systems thinking: detect issues as early as possible in the value stream so feedback informs continuous improvement in how we work, not just reactive fixes to individual defects.
▲ AI shifts detection earlier than current automation alone
Dark cells = current automation is sufficient; AI adds no additional value
No marker = AI assists at the current detection point but does not shift it earlier
How to Use This Catalog
Pick your pain point. Find the category where your team loses the most time to defects or rework. Start there, not at the top.
Focus on the Systemic Prevention column. Automated detection catches defects faster, but systemic prevention eliminates entire categories. Prioritize the prevention fix for each issue you selected.
Measure before and after. Track defect escape rate by category and time-to-detection. If the systemic fix is working, both metrics improve within weeks.
AI adds the most value where detection requires reasoning across multiple signals that existing
tools cannot correlate: ambiguous requirements, undocumented assumptions, semantic code impact,
and knowledge gaps. Where deterministic tools already solve the problem (infrastructure drift,
null safety, branch age), AI adds cost without benefit. Look for the ▲ markers to find the highest-value AI opportunities.
Related Content
ACD - Extend continuous delivery with constraints for AI agent-generated changes
Defects that originate before a single line of code is written - the most expensive category because they compound through every downstream phase.
These defects originate before a single line of code is written. They are the most expensive to
fix because they compound through every downstream phase.
| Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention |
|---|---|---|---|---|
| Building the wrong thing | Discovery | Product analytics platforms, usage trend alerts | ▲ Synthesize user feedback, support tickets, and usage data to surface misalignment earlier than production metrics | Validated user research before backlog entry; dual-track agile |
| Solving a problem nobody has | Discovery | Support ticket clustering tools, feature adoption tracking | ▲ Semantic analysis of interview transcripts, forums, and support tickets to identify real vs. assumed pain | Problem validation as a stage gate; publish problem brief before solution |
| Correct problem, wrong solution | Discovery | A/B testing frameworks, feature flag cohort comparison | Evaluate prototypes against problem definitions; generate alternative approaches | Prototype multiple approaches; measurable success criteria first |
| Meets spec but misses user intent | Requirements | Session replay tools, rage-click and error-loop detection | ▲ Review acceptance criteria against user behavior data to flag misalignment | Acceptance criteria focused on user outcomes, not checklists |
| Over-engineering beyond need | Design | Static analysis for dead code and unused abstractions | ▲ Flag unnecessary abstraction layers and premature optimization in code review | YAGNI principle; justify every abstraction layer |
| Prioritizing wrong work | Discovery | DORA metrics versus business outcomes, WSJF scoring | Synthesize roadmap, customer data, and market signals to surface opportunity costs | WSJF prioritization with outcome data |
| Inaccessible UI excludes users | Pre-commit | axe-core, pa11y, Lighthouse accessibility audits | Current tooling sufficient | WCAG compliance as acceptance criteria; automated accessibility checks in pipeline |
Related Content
Defect Sources - full catalog overview and how to use it
Anti-Patterns - patterns that undermine delivery performance
2.2 - Integration & Boundaries Defects
Defects at system boundaries that are invisible to unit tests and often survive until production. Contract testing and deliberate boundary design are the primary defenses.
Defects at system boundaries are invisible to unit tests and often survive until production.
Contract testing and deliberate boundary design are the primary defenses.
Contract Tests - verify that your test doubles still match reality
2.3 - Knowledge & Communication Defects
Defects that emerge from gaps between what people know and what the code expresses - the hardest to detect with automated tools and the easiest to prevent with team practices.
These defects emerge from gaps between what people know and what the code expresses.
They are the hardest to detect with automated tools and the easiest to prevent with team practices.
| Issue | Earliest Detection (Automation) | Automated Detection | Earlier Detection with AI | Systemic Prevention |
|---|---|---|---|---|
| Implicit domain knowledge not in code | Coding | Magic number detection, code ownership analytics | ▲ Identify undocumented business rules and knowledge gaps from code and test analysis | Domain-Driven Design with ubiquitous language; embed rules in code |
| Ambiguous requirements | Requirements | Flag stories without acceptance criteria, BDD spec coverage tracking | ▲ Review requirements for ambiguity, missing edge cases, and contradictions; generate test scenarios | Three Amigos before work; example mapping; executable specs |
| Tribal knowledge loss | Coding | Bus factor analysis from commit history, single-author concentration alerts | ▲ Generate documentation from code and tests; flag documentation drift from implementation | Pair/mob programming as default; rotate on-call; living docs |
| Divergent mental models across teams | Design | Divergent naming detection, contract test failures | ▲ Compare terminology and domain models across codebases to detect semantic mismatches | Shared domain models; explicit bounded contexts |
Related Content
Defect Sources - full catalog overview and how to use it
Anti-Patterns - patterns that undermine delivery performance
2.4 - Change & Complexity Defects
Defects caused by the act of changing existing code. The larger the change and the longer it lives outside trunk, the higher the risk.
These defects are caused by the act of changing existing code. The larger the change and the
longer it lives outside trunk, the higher the risk.
Anti-Patterns - patterns that undermine delivery performance
2.5 - Testing & Observability Gap Defects
Defects that survive because the safety net has holes. The fix is not more testing - it is better-targeted testing and observability that closes the specific gaps.
These defects survive because the safety net has holes. The fix is not more testing: it is
better-targeted testing and observability that closes the specific gaps.
Anti-Patterns - patterns that undermine delivery performance
2.7 - Data & State Defects
Data defects are particularly dangerous because they can corrupt persistent state. Unlike code defects, data corruption often cannot be fixed by deploying a new version.
Data defects are particularly dangerous because they can corrupt persistent state. Unlike code
defects, data corruption often cannot be fixed by deploying a new version.
Issue
Earliest Detection (Automation)
Automated Detection
Earlier Detection with AI
Systemic Prevention
Schema migration and backward compatibility failures
Security and compliance defects are silent until they are catastrophic. The gap between what the code does and what policy requires is invisible without deliberate, automated verification at every stage.
Security and compliance defects are silent until they are catastrophic. They share a pattern:
the gap between what the code does and what policy requires is invisible without deliberate,
automated verification at every stage.
Anti-Patterns - patterns that undermine delivery performance
2.10 - Performance & Resilience Defects
Performance defects degrade gradually, often hiding behind averages until a threshold tips and the system fails under real load. Detection requires baselines, budgets, and automated enforcement - not periodic manual testing.
Performance defects are rarely binary. They degrade gradually, often hiding behind averages
until a threshold tips and the system fails under real load. Detection requires baselines,
budgets, and automated enforcement - not periodic manual testing.
Concise definitions of the core continuous delivery practices from MinimumCD.
These pages define the minimum practices required for continuous delivery. Each page covers
what the practice is, why it matters, and what the minimum criteria are. For migration
guidance and tactical how-to content, follow the links to the corresponding phase pages.
Integrate work to trunk at least daily with automated testing to maintain a releasable codebase.
Definition
Continuous Integration (CI) is the activity of each developer integrating work to the trunk of version control at least daily and verifying that the work is, to the best of our knowledge, releasable.
CI is not just about tooling - it is fundamentally about team workflow and working agreements.
All changes integrate into a single shared trunk with no intermediate branches.
“Trunk-based development has been shown to be a predictor of high performance in software development and delivery. It is characterized by fewer than three active branches in a code repository; branches and forks having very short lifetimes (e.g., less than a day) before being merged; and application teams rarely or never having ‘code lock’ periods when no one can check in code or do pull requests due to merging conflicts, code freezes, or stabilization phases.”
Accelerate by Nicole Forsgren Ph.D., Jez Humble & Gene Kim
Definition
Trunk-based development (TBD) is a team workflow where changes are integrated into the trunk with no intermediate integration (develop, test, etc.) branch. The two common workflows are making changes directly to the trunk or using very short-lived branches that branch from the trunk and integrate back into the trunk.
Release branches are an intermediate step that some choose on their path to continuous delivery while improving their quality processes in the pipeline. True CD releases from the trunk.
Minimum Activities Required
All changes integrate into the trunk
If branches from the trunk are used:
They originate from the trunk
They re-integrate to the trunk
They are short-lived and removed after the merge
What Is Improved
Smaller changes: TBD emphasizes small, frequent changes that are easier for the team to review and more resistant to impactful merge conflicts. Conflicts become rare and trivial.
We must test: TBD requires us to implement tests as part of the development process.
Better teamwork: We need to work more closely as a team. This has many positive impacts, not least we will be more focused on getting the team’s highest priority done.
Better work definition: Small changes require us to decompose the work into a level of detail that helps uncover things that lack clarity or do not make sense. This provides much earlier feedback on potential quality issues.
Replaces process with engineering: Instead of controlling the release of features with branches and process, we can control it with engineering techniques known as evolutionary coding methods. These techniques bring additional stability benefits that process controls cannot provide.
Reduces risk: Long-lived branches carry two common risks. First, the change will not integrate cleanly and the merge conflicts result in broken or lost features. Second, the branch will be abandoned, usually because of the first reason.
Migration Guidance
For detailed guidance on adopting TBD during your CD migration, see:
All deployments flow through one automated pipeline - no exceptions.
Definition
The deployment pipeline is the single, standardized path for all changes to reach any environment - development, testing, staging, or production. No manual deployments, no side channels, no “quick fixes” bypassing the pipeline. If it is not deployed through the pipeline, it does not get deployed.
Key Principles
Single path: All deployments flow through the same pipeline
No exceptions: Even hotfixes and rollbacks go through the pipeline
Automated: Deployment is triggered automatically after pipeline validation
Auditable: Every deployment is tracked and traceable
Consistent: The same process deploys to all environments
What Is Improved
Reliability: Every deployment is validated the same way
Traceability: Clear audit trail from commit to production
Consistency: Environments stay in sync
Speed: Automated deployments are faster than manual
Safety: Quality gates are never bypassed
Confidence: Teams trust that production matches what was tested
Recovery: Rollbacks are as reliable as forward deployments
Migration Guidance
For detailed guidance on establishing a single path to production, see:
Single Path to Production - Phase 2 pipeline practice with anti-patterns, code examples, and getting started steps
The same inputs to the pipeline always produce the same outputs.
Definition
A deterministic pipeline produces consistent, repeatable results. Given the same inputs (code, configuration, dependencies), the pipeline will always produce the same outputs and reach the same pass/fail verdict. The pipeline’s decision on whether a change is releasable is definitive - if it passes, deploy it; if it fails, fix it.
Key Principles
Repeatable: Running the pipeline twice with identical inputs produces identical results
Authoritative: The pipeline is the final arbiter of quality, not humans
Immutable: No manual changes to artifacts or environments between pipeline stages
Trustworthy: Teams trust the pipeline’s verdict without second-guessing
What Makes a Pipeline Deterministic
Version control everything: Source code, IaC, pipeline definitions, test data, dependency lockfiles, tool versions
Lock dependency versions: Always use lockfiles. Never rely on latest or version ranges.
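As one example of enforcing that rule, the sketch below fails the build when a Python requirements file contains anything other than exact pins. The file name and pinning convention are assumptions; adapt the same idea to whatever package manager your stack uses.

```python
"""Sketch: fail the build if any dependency is not pinned to an exact version.
Assumes a requirements.txt-style file; adapt the idea to your package manager."""
import re
import sys

EXACT_PIN = re.compile(r"^[A-Za-z0-9._\[\]-]+==\S+$")

def unpinned_dependencies(path: str) -> list[str]:
    bad = []
    with open(path) as f:
        for raw in f:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blanks and comments
            if not EXACT_PIN.match(line):
                bad.append(line)  # version ranges, "latest", or bare names
    return bad

if __name__ == "__main__":
    problems = unpinned_dependencies("requirements.txt")
    if problems:
        sys.exit("Unpinned dependencies break determinism: " + ", ".join(problems))
```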
Automated criteria that determine when a change is ready for production.
Definition
The “definition of deployable” is your organization’s agreed-upon set of non-negotiable quality criteria that every artifact must pass before it can be deployed to any environment. This definition should be automated, enforced by the pipeline, and treated as the authoritative verdict on whether a change is ready for deployment.
Key Principles
Pipeline is definitive: If the pipeline passes, the artifact is deployable - no exceptions
Automated validation: All criteria are checked automatically, not manually
Consistent across environments: The same standards apply whether deploying to test or production
Fails fast: The pipeline rejects artifacts that do not meet the standard immediately
What Should Be in Your Definition
Your definition of deployable should include automated checks for:
Accelerate - Nicole Forsgren, Jez Humble, Gene Kim
3.6 - Immutable Artifacts
Build once, deploy everywhere. The artifact is never modified after creation.
Definition
Central to CD is that we are validating the artifact with the pipeline. It is built once and deployed to all environments. A common anti-pattern is building an artifact for each environment. The pipeline should generate immutable, versioned artifacts.
Immutable Pipeline: Failures should be addressed by changes in version control so that two executions with the same configuration always yield the same results. Never go to the failure point, make adjustments in the environment, and re-start from that point.
Immutable Artifacts: Some package management systems allow the creation of release candidate versions. For example, it is common to find -SNAPSHOT versions in Java. However, this means the artifact’s behavior can change without modifying the version. Version numbers are cheap. If we are to have an immutable pipeline, it must produce an immutable artifact. Never use or produce -SNAPSHOT versions.
Immutability provides the confidence to know that the results from the pipeline are real and repeatable.
What Is Improved
Everything must be version controlled: source code, environment configurations, application configurations, and even test data. This reduces variability and improves the quality process.
Confidence in testing: The artifact validated in pre-production is byte-for-byte identical to what runs in production.
Faster rollback: Previous artifacts are unchanged in the artifact repository, ready to be redeployed.
Audit trail: Every artifact is traceable to a specific commit and pipeline run.
Migration Guidance
For detailed guidance on implementing immutable artifacts, see:
Immutable Artifacts - Phase 2 pipeline practice with anti-patterns, good patterns, and getting started steps
Test in environments that mirror production to catch environment-specific issues early.
Definition
It is crucial to leverage pre-production environments in your CD pipeline to run all of your tests (unit, integration, UAT, manual QA, E2E) early and often. Test environments increase interaction with new features and exposure to bugs - both of which are important prerequisites for reliable software.
Types of Pre-Production Environments
Most organizations employ both static and short-lived environments and utilize them for case-specific stages of the SDLC:
Staging environment: The last environment that teams run automated tests against prior to deployment, particularly for testing interaction between all new features after a merge. Its infrastructure reflects production as closely as possible.
Ephemeral environments: Full-stack, on-demand environments spun up on every code change. Each ephemeral environment is leveraged in your pipeline to run E2E, unit, and integration tests on every code change. These environments are defined in version control, created and destroyed automatically on demand. They are short-lived by definition but should closely resemble production. They replace long-lived “static” environments and the maintenance required to keep those stable.
What Is Improved
Infrastructure is kept consistent: Test environments deliver results that reflect real-world performance. Fewer unprecedented bugs reach production since using prod-like data and dependencies allows you to run your entire test suite earlier.
Test against latest changes: These environments rebuild upon code changes with no manual intervention.
Test before merge: Attaching an ephemeral environment to every PR enables E2E testing in your CI before code changes get deployed to staging.
Migration Guidance
For detailed guidance on implementing production-like environments, see:
Production-Like Environments - Phase 2 pipeline practice with environment parity, ephemeral environments, and getting started steps
Rollback on-demand means the ability to quickly and safely revert to a previous working version of your application at any time, without requiring special approval, manual intervention, or complex procedures. It should be as simple and reliable as deploying forward.
Key Principles
Fast: Rollback completes in minutes, not hours. Target < 5 minutes.
Automated: No manual steps or special procedures. Single command or click.
Safe: Rollback is validated just like forward deployment.
Simple: Any team member can execute it without specialized knowledge.
Tested: Rollback mechanism is regularly tested, not just used in emergencies.
What Is Improved
Mean Time To Recovery (MTTR): Drops from hours to minutes
Deployment frequency: Increases due to reduced risk
Team confidence: Higher willingness to deploy
Customer satisfaction: Faster incident resolution
On-call burden: Reduced stress for on-call engineers
Migration Guidance
For detailed guidance on implementing rollback capability, see:
Rollback - Phase 2 pipeline practice with blue-green, canary, feature flag, and database-safe rollback patterns
Separate what varies between environments from what does not.
Definition
Application configuration defines the internal behavior of your application and is bundled with the artifact. It does not vary between environments. This is distinct from environment configuration (secrets, URLs, credentials) which varies by deployment.
Detailed definitions for key delivery metrics. Understand what to measure and why.
These metrics help you assess your current delivery performance and track improvement
over time. Start with the metrics most relevant to your current phase.
How often developers integrate code changes to the trunk. A leading indicator of CI maturity and small batch delivery.
Definition
Integration Frequency measures the average number of production-ready pull requests
a team merges to trunk per day, normalized by team size. On a team of five
developers, healthy continuous integration practice produces at least five
integrations per day, roughly one per developer.
This metric is a direct indicator of how well a team practices
Continuous Integration.
Teams that integrate frequently work in small batches, receive fast feedback, and
reduce the risk associated with large, infrequent merges.
Integration Frequency formula
integrationFrequency = mergedPullRequests / day / numberOfDevelopers
A value of 1.0 or higher per developer per day indicates that work is being
decomposed into small, independently deliverable increments.
How to Measure
Count trunk merges. Track the number of pull requests (or direct commits)
merged to main or trunk each day.
Normalize by team size. Divide the daily count by the number of developers
actively contributing that day.
Calculate the rolling average. Use a 5-day or 10-day rolling window to
smooth daily variation and surface meaningful trends.
Most source control platforms expose this data through their APIs:
GitHub: list merged pull requests via the REST or GraphQL API.
GitLab: query merged merge requests per project.
Bitbucket: use the pull request activity endpoint.
Alternatively, count commits to the default branch if pull requests are not used.
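Once the merge timestamps are collected, the calculation is small. The sketch below assumes a list of merge dates pulled from one of the APIs above and computes the rolling per-developer frequency.

```python
"""Sketch: integration frequency = merges to trunk per developer per day."""
from datetime import date, timedelta

def integration_frequency(merge_dates: list[date], developers: int,
                          window_days: int = 10) -> float:
    """Average merges per developer per day over the trailing window."""
    cutoff = max(merge_dates) - timedelta(days=window_days)
    recent = [d for d in merge_dates if d > cutoff]
    return len(recent) / window_days / developers

# Example: 40 merges in the last 10 days from a team of 5 -> 0.8 per developer
# per day, just below the 1.0 target for healthy continuous integration.
```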
Targets
| Level | Integration Frequency (per developer per day) |
|---|---|
| Low | Less than 1 per week |
| Medium | A few times per week |
| High | Once per day |
| Elite | Multiple times per day |
The elite target aligns with trunk-based development, where developers push small
changes to the trunk multiple times daily and rely on automated testing and feature
flags to manage risk.
Common Pitfalls
Meaningless commits. Teams may inflate the count by integrating trivial or
empty changes. Pair this metric with code review quality and defect rate.
Breaking the trunk. Pushing faster without adequate test coverage leads to a
red build and slows the entire team. Always pair Integration Frequency with build
success rate and Change Fail Rate.
Counting the wrong thing. Merges to long-lived feature branches do not count.
Only merges to the trunk or main integration branch reflect true CI practice.
Ignoring quality. If defect rates rise as integration
frequency increases, the team is skipping quality steps. Use defect rate as a
guardrail metric.
Connection to CD
Integration Frequency is the foundational metric for Continuous Delivery. Without
frequent integration, every downstream metric suffers:
Smaller batches reduce risk. Each integration carries less change, making
failures easier to diagnose and fix.
Faster feedback loops. Frequent integration means the CI pipeline runs more
often, catching issues within minutes instead of days.
Enables trunk-based development. High integration frequency is incompatible
with long-lived branches. Teams naturally move toward short-lived branches or
direct trunk commits.
Reduces merge conflicts. The longer code stays on a branch, the more likely
it diverges from trunk. Frequent integration keeps the delta small.
Prerequisite for deployment frequency. You cannot deploy more often than you
integrate. Improving this metric directly unblocks improvements to
Release Frequency.
Time from code commit to a deployable artifact. A critical constraint on feedback speed and mean time to repair.
Definition
Build Duration measures the elapsed time from when a developer pushes a commit
until the CI pipeline produces a deployable artifact and all automated quality
gates have passed. This includes compilation, unit tests, integration tests, static
analysis, security scans, and artifact packaging.
Build Duration represents the minimum possible time between deciding to make a
change and having that change ready for production. It sets a hard floor on
Lead Time and directly constrains how quickly a team can
respond to production incidents.
This metric is sometimes referred to as “pipeline cycle time” or “CI cycle time.”
The book Accelerate references it as part of “hard lead time.”
How to Measure
Record the commit timestamp. Capture when the commit arrives at the CI
server (webhook receipt or pipeline trigger time).
Record the artifact-ready timestamp. Capture when the final pipeline stage
completes successfully and the deployable artifact is published.
Calculate the difference. Subtract the commit timestamp from the
artifact-ready timestamp.
Track the median and p95. The median shows typical performance. The 95th
percentile reveals worst-case builds that block developers.
Most CI platforms expose build duration natively:
GitHub Actions: createdAt and updatedAt on workflow runs.
GitLab CI: pipeline created_at and finished_at.
Jenkins: build start time and duration fields.
CircleCI: workflow duration in the Insights dashboard.
Set up alerts when builds exceed your target threshold so the team can investigate
regressions immediately.
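With the two timestamps captured, the median and p95 fall out directly. The sketch below assumes commit and artifact-ready timestamps have already been collected as pairs.

```python
"""Sketch: median and p95 build duration from (commit, artifact-ready) pairs."""
from datetime import datetime
from statistics import median, quantiles

def report(runs: list[tuple[datetime, datetime]]) -> str:
    durations = sorted(
        (ready - commit).total_seconds() / 60 for commit, ready in runs
    )
    p95 = quantiles(durations, n=20)[-1]  # 95th percentile (needs >= 2 samples)
    return f"median={median(durations):.1f} min  p95={p95:.1f} min"
```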
Targets
| Level | Build Duration |
|---|---|
| Low | More than 30 minutes |
| Medium | 10 to 30 minutes |
| High | 5 to 10 minutes |
| Elite | Less than 5 minutes |
The ten-minute threshold is a widely recognized guideline. Builds longer than ten
minutes break developer flow, discourage frequent integration, and increase the
cost of fixing failures.
Common Pitfalls
Removing tests to hit targets. Reducing test count or skipping test types
(integration, security) lowers build duration but degrades quality. Always pair
this metric with Change Fail Rate and defect rate.
Ignoring queue time. If builds wait in a queue before execution, the
developer experiences the queue time as part of the feedback delay even though it
is not technically “build” time. Measure wall-clock time from commit to result.
Optimizing the wrong stage. Profile the pipeline before optimizing. Often a
single slow test suite or a sequential step that could run in parallel dominates
the total duration.
Flaky tests. Tests that intermittently fail cause retries, effectively
doubling or tripling build duration. Track flake rate alongside build duration.
Connection to CD
Build Duration is a critical bottleneck in the Continuous Delivery pipeline:
Constrains Mean Time to Repair. When production is down, the build pipeline
is the minimum time to get a fix deployed. A 30-minute build means at least 30
minutes of downtime for any fix, no matter how small. Reducing build duration
directly improves MTTR.
Enables frequent integration. Developers are unlikely to integrate multiple
times per day if each integration takes 30 minutes to validate. Short builds
encourage higher Integration Frequency.
Shortens feedback loops. The sooner a developer learns that a change broke
something, the less context they have lost and the cheaper the fix. Builds under
ten minutes keep developers in flow.
Supports continuous deployment. Automated deployment pipelines cannot deliver
changes rapidly if the build stage is slow. Build duration is often the largest
component of Lead Time.
To improve Build Duration:
Parallelize stages. Run unit tests, linting, and security scans concurrently
rather than sequentially.
Replace slow end-to-end tests. Move heavyweight end-to-end tests to an
asynchronous post-deploy verification stage. Use contract tests and service
virtualization in the main pipeline.
Decompose large services. Smaller codebases compile and test faster. If build
duration is stubbornly high, consider breaking the service into smaller domains.
Cache aggressively. Cache dependencies, Docker layers, and compilation
artifacts between builds.
Set a build time budget. Alert the team whenever a new test or step pushes
the build past your target, so test efficiency is continuously maintained.
Average time from when work starts until it is running in production. A key flow metric for identifying delivery bottlenecks.
Definition
Development Cycle Time measures the elapsed time from when a developer begins work
on a story or task until that work is deployed to production and available to users.
It captures the full construction phase of delivery: coding, code review, testing,
integration, and deployment.
This is distinct from Lead Time, which includes the time a request
spends waiting in the backlog before work begins. Development Cycle Time focuses
exclusively on the active delivery phase.
The Accelerate research uses “lead time for changes” (measured from commit to
production) as a key DORA metric. Development Cycle Time extends this slightly
further back to when work starts, capturing the full development process including
any time between starting work and the first commit.
How to Measure
Record when work starts. Capture the timestamp when a story moves to
“In Progress” in your issue tracker, or when the first commit for the story
appears.
Record when work reaches production. Capture the timestamp of the
production deployment that includes the completed story.
Calculate the difference. Subtract the start time from the production
deploy time.
Report the median and distribution. The median provides a typical value.
The distribution (or a control chart) reveals variability and outliers that
indicate process problems.
Sources for this data include:
Issue trackers (Jira, GitHub Issues, Azure Boards): status transition
timestamps.
Source control: first commit timestamp associated with a story.
Deployment logs: timestamp of production deployments linked to stories.
Linking stories to deployments is essential. Use commit message conventions (e.g.,
story IDs in commit messages) or deployment metadata to create this connection.
Targets
| Level | Development Cycle Time |
|---|---|
| Low | More than 2 weeks |
| Medium | 1 to 2 weeks |
| High | 2 to 7 days |
| Elite | Less than 2 days |
Elite teams deliver completed work to production within one to two days of starting
it. This is achievable only when work is decomposed into small increments, the
pipeline is fast, and deployment is automated.
Common Pitfalls
Marking work “Done” before it reaches production. If “Done” means “code
complete” rather than “deployed,” the metric understates actual cycle time. The
Definition of Done must include production deployment.
Skipping the backlog. Moving items from “Backlog” directly to “Done” after
deploying hides the true wait time and development duration. Ensure stories pass
through the standard workflow stages.
Splitting work into functional tasks. Breaking a story into separate
“development,” “testing,” and “deployment” tasks obscures the end-to-end cycle
time. Measure at the story or feature level.
Ignoring variability. A low average can hide a bimodal distribution where
some stories take hours and others take weeks. Use a control chart or histogram
to expose the full picture.
Optimizing for speed without quality. If cycle time drops but
Change Fail Rate rises, the team is cutting corners.
Use quality metrics as guardrails.
Connection to CD
Development Cycle Time is the most comprehensive measure of delivery flow and sits
at the heart of Continuous Delivery:
Exposes bottlenecks. A long cycle time reveals where work gets stuck:
waiting for code review, queued for testing, blocked by a manual approval, or
delayed by a slow pipeline. Each bottleneck is a target for improvement.
Drives smaller batches. The only way to achieve a cycle time under two days
is to decompose work into very small increments. This naturally leads to smaller
changes, less risk, and faster feedback.
Reduces waste from changing priorities. Long cycle times mean work in progress
is exposed to priority changes, context switches, and scope creep. Shorter cycles
reduce the window of vulnerability.
Improves feedback quality. The sooner a change reaches production, the sooner
the team gets real user feedback. Short cycle times enable rapid learning and
course correction.
Total time from when a change is committed until it is running in production. A DORA key metric for delivery throughput.
Definition
Lead Time measures the total elapsed time from when a code change is committed to
the version control system until that change is successfully running in production.
This is one of the four key metrics identified by the DORA (DevOps Research and
Assessment) team as a predictor of software delivery performance.
In the broader value stream, “lead time” can also refer to the time from a customer
request to delivery. The DORA definition focuses specifically on the segment from
commit to production, which the Accelerate research calls “lead time for changes.”
This narrower definition captures the efficiency of your delivery pipeline and
deployment process.
Lead Time includes Build Duration plus any additional time
for deployment, approval gates, environment provisioning, and post-deploy
verification. It is a superset of build time and a subset of
Development Cycle Time, which also includes the
coding phase before the first commit.
How to Measure
Record the commit timestamp. Use the timestamp of the commit as recorded in
source control (not the local author timestamp, but the time it was pushed or
merged to the trunk).
Record the production deployment timestamp. Capture when the deployment
containing that commit completes successfully in production.
Calculate the difference. Subtract the commit time from the deploy time.
Aggregate across commits. Report the median lead time across all commits
deployed in a given period (daily, weekly, or per release).
Data sources:
Source control: commit or merge timestamps from Git, GitHub, GitLab, etc.
Pipeline platform: pipeline completion times from Jenkins, GitHub Actions,
GitLab CI, etc.
Deployment tooling: production deployment timestamps from Argo CD, Spinnaker,
Flux, or custom scripts.
For teams practicing continuous deployment, lead time may be nearly identical to
build duration. For teams with manual approval gates or scheduled release windows,
lead time will be significantly longer.
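If your deployment records list the commits they contain, the median lead time can be computed with a simple join, as in the sketch below; the record shapes are assumptions for the example.

```python
"""Sketch: median lead time = production deploy time minus trunk commit time.
Assumes each deployment record carries the commit SHAs it contains."""
from datetime import datetime
from statistics import median

def median_lead_time_hours(commit_times: dict[str, datetime],
                           deployments: list[tuple[datetime, list[str]]]) -> float:
    lead_times = []
    seen: set[str] = set()
    for deploy_time, shas in sorted(deployments, key=lambda d: d[0]):
        for sha in shas:
            if sha in commit_times and sha not in seen:
                seen.add(sha)  # attribute each commit to its first production deploy
                hours = (deploy_time - commit_times[sha]).total_seconds() / 3600
                lead_times.append(hours)
    return median(lead_times)
```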
Targets
| Level | Lead Time for Changes |
|---|---|
| Low | More than 6 months |
| Medium | 1 to 6 months |
| High | 1 day to 1 week |
| Elite | Less than 1 hour |
These levels are drawn from the DORA State of DevOps research. Elite performers
deliver changes to production in under an hour from commit, enabled by fully
automated pipelines and continuous deployment.
Common Pitfalls
Measuring only build time. Lead time includes everything after the commit,
not just the CI pipeline. Manual approval gates, scheduled deployment windows,
and environment provisioning delays must all be included.
Ignoring waiting time. A change may sit in a queue waiting for a release
train, a change advisory board (CAB) review, or a deployment window. This wait
time is part of lead time and often dominates the total.
Tracking requests instead of commits. Some teams measure from customer request
to delivery. While valuable, this conflates backlog prioritization with delivery
efficiency. Keep this metric focused on the commit-to-production segment.
Hiding items from the backlog. Requests tracked in spreadsheets or side
channels before entering the backlog distort lead time measurements. Ensure all
work enters the system of record promptly.
Reducing quality to reduce lead time. Shortening approval processes or
skipping test stages reduces lead time at the cost of quality. Pair this metric
with Change Fail Rate as a guardrail.
Connection to CD
Lead Time is one of the four DORA metrics and a direct measure of your delivery
pipeline’s end-to-end efficiency:
Reveals pipeline bottlenecks. A large gap between build duration and lead time
points to manual processes, approval queues, or deployment delays that the team
can target for automation.
Measures the cost of failure recovery. When production breaks, lead time is
the minimum time to deliver a fix (unless you roll back). This makes lead time
a direct input to Mean Time to Repair.
Drives automation. The primary way to reduce lead time is to automate every
step between commit and production: build, test, security scanning, environment
provisioning, deployment, and verification.
Reflects deployment strategy. Teams using continuous deployment have lead
times measured in minutes. Teams using weekly release trains have lead times
measured in days. The metric makes the cost of batching visible.
Connects speed and stability. The DORA research shows that elite performers
achieve both low lead time and low Change Fail Rate.
Speed and quality are not trade-offs. They reinforce each other when the
delivery system is well-designed.
To improve Lead Time:
Automate the deployment pipeline end to end, eliminating manual gates.
Replace change advisory board (CAB) reviews with automated policy checks and
peer review.
Deploy on every successful build rather than batching changes into release trains.
Reduce Build Duration to shrink the largest component of
lead time.
Monitor and eliminate environment provisioning delays.
Percentage of production deployments that cause a failure or require remediation. A DORA key metric for delivery stability.
Definition
Change Fail Rate measures the percentage of deployments to production that result
in degraded service, negative customer impact, or require immediate remediation
such as a rollback, hotfix, or patch. A deployment counts as a failed change when it:
Requires a hotfix deployed within a short window (commonly 24 hours).
Triggers a production incident attributed to the change.
Requires manual intervention to restore service.
This is one of the four DORA key metrics. It measures the stability side of
delivery performance, complementing the throughput metrics of
Lead Time and Release Frequency.
How to Measure
Count total production deployments over a defined period (weekly, monthly).
Count deployments classified as failures using the criteria above.
Divide failures by total deployments and express as a percentage.
Data sources:
Deployment logs: total deployment count from your CD platform.
Incident management: incidents linked to specific deployments (PagerDuty,
Opsgenie, ServiceNow).
Rollback records: deployments that were reverted, either manually or by
automated rollback.
Hotfix tracking: deployments tagged as hotfixes or emergency changes.
Automate the classification where possible. For example, if a deployment is
followed by another deployment of the same service within a defined window (e.g.,
one hour), flag the original as a potential failure for review.
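As a rough illustration of that heuristic, the sketch below scans a hypothetical array of
{ service, timestamp } deployment records pulled from your CD platform and flags any
deployment that was followed by another deployment of the same service within one hour:
const WINDOW_MS = 60 * 60 * 1000; // one hour

function flagPotentialFailures(deployments) {
  // deployments: [{ service, timestamp }] sorted by timestamp (ms since epoch)
  return deployments.filter((deploy, i) =>
    deployments.slice(i + 1).some(
      (next) =>
        next.service === deploy.service &&
        next.timestamp - deploy.timestamp <= WINDOW_MS
    )
  );
}
Flagged deployments still need human review - a quick follow-up deployment is not always
a remediation.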
Targets
Level    Change Fail Rate
Low      46 to 60%
Medium   16 to 45%
High     0 to 15%
Elite    0 to 5%
These levels are drawn from the DORA State of DevOps research. Elite performers
maintain a change fail rate below 5%, meaning fewer than 1 in 20 deployments causes
a problem.
Common Pitfalls
Not recording failures. Deploying fixes without logging the original failure
understates the true rate. Ensure every incident and rollback is tracked.
Reclassifying defects. Creating review processes that reclassify production
defects as “feature requests” or “known limitations” hides real failures.
Inflating deployment count. Re-deploying the same working version to increase
the denominator artificially lowers the rate. Only count deployments that contain
new changes.
Pursuing zero defects at the cost of speed. An obsessive focus on eliminating
all failures can slow Release Frequency to a crawl. A
small failure rate with fast recovery is preferable to near-zero failures with
monthly deployments.
Ignoring near-misses. Changes that cause degraded performance but do not
trigger a full incident are still failures. Define clear criteria for what
constitutes a failed change and apply them consistently.
Connection to CD
Change Fail Rate is the primary quality signal in a Continuous Delivery pipeline:
Validates pipeline quality gates. A rising change fail rate indicates that
the automated tests, security scans, and quality checks in the pipeline are not
catching enough defects. Each failure is an opportunity to add or improve a
quality gate.
Enables confidence in frequent releases. Teams will only deploy frequently
if they trust the pipeline. A low change fail rate builds this trust and
supports higher Release Frequency.
Smaller changes fail less. The DORA research consistently shows that smaller,
more frequent deployments have lower failure rates than large, infrequent
releases. Improving Integration Frequency naturally
improves this metric.
Drives root cause analysis. Each failed change should trigger a blameless
investigation: what automated check could have caught this? The answers feed
directly into pipeline improvements.
Balances throughput metrics. Change Fail Rate is the essential guardrail for
Lead Time and Release Frequency. If
those metrics improve while change fail rate worsens, the team is trading quality
for speed.
To improve Change Fail Rate:
Deploy smaller changes more frequently to reduce the blast radius of failures.
Identify the root cause of each failure and add automated checks to prevent
recurrence.
Strengthen the test suite, particularly integration and contract tests that
validate interactions between services.
Implement progressive delivery (canary releases, feature flags) to limit the
impact of defective changes before they reach all users.
Conduct blameless post-incident reviews and feed learnings back into the
delivery pipeline.
Average time from when a production incident is detected until service is restored. A DORA key metric for recovery capability.
Definition
Mean Time to Repair (MTTR) measures the average elapsed time between when a
production incident is detected and when it is fully resolved and service is
restored to normal operation.
MTTR reflects an organization’s ability to recover from failure. It encompasses
detection, diagnosis, fix development, build, deployment, and verification. A
short MTTR depends on the entire delivery system working well: fast builds,
automated deployments, good observability, and practiced incident response.
The Accelerate research identifies MTTR as one of the four key DORA metrics and
notes that “software delivery performance is a combination of lead time, release
frequency, and MTTR.” It is the stability counterpart to the throughput metrics.
How to Measure
Record the detection timestamp. This is when the team first becomes aware of
the incident, typically when an alert fires, a customer reports an issue, or
monitoring detects an anomaly.
Record the resolution timestamp. This is when the incident is resolved and
service is confirmed to be operating normally. Resolution means the customer
impact has ended, not merely that a fix has been deployed.
Calculate the duration for each incident.
Compute the average across all incidents in a given period.
Data sources:
Incident management platforms: PagerDuty, Opsgenie, ServiceNow, or
Statuspage provide incident lifecycle timestamps.
Monitoring and alerting: alert trigger times from Datadog, Prometheus
Alertmanager, CloudWatch, or equivalent.
Deployment logs: timestamps of rollbacks or hotfix deployments.
Report both the mean and the median. The mean can be skewed by a single long
outage, so the median gives a better sense of typical recovery time. Also track
the maximum MTTR per period to highlight worst-case incidents.
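A minimal sketch of that calculation, assuming each incident record carries detectedAt and
resolvedAt timestamps in milliseconds:
function repairStats(incidents) {
  // Durations in minutes, sorted ascending
  const durations = incidents
    .map((incident) => (incident.resolvedAt - incident.detectedAt) / 60000)
    .sort((a, b) => a - b);

  const mean = durations.reduce((sum, d) => sum + d, 0) / durations.length;
  const mid = Math.floor(durations.length / 2);
  const median =
    durations.length % 2 === 1
      ? durations[mid]
      : (durations[mid - 1] + durations[mid]) / 2;

  return { mean, median, max: durations[durations.length - 1] };
}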
Targets
Level    Mean Time to Repair
Low      More than 1 week
Medium   1 day to 1 week
High     Less than 1 day
Elite    Less than 1 hour
Elite performers restore service in under one hour. This requires automated
rollback or roll-forward capability, fast build pipelines, and well-practiced
incident response processes.
Common Pitfalls
Closing incidents prematurely. Marking an incident as resolved before the
customer impact has actually ended artificially deflates MTTR. Define “resolved”
clearly and verify that service is truly restored.
Not counting detection time. If the team discovers a problem informally
(e.g., a developer notices something odd) and fixes it before opening an
incident, the time is not captured. Encourage consistent incident reporting.
Ignoring recurring incidents. If the same issue keeps reappearing, each
individual MTTR may be short, but the cumulative impact is high. Track recurrence
as a separate quality signal.
Conflating MTTR with MTTD. Mean Time to Detect (MTTD) and Mean Time to
Repair overlap but are distinct. If you only measure from alert to resolution,
you miss the detection gap, the time between when the problem starts and when
it is detected. Both matter.
Optimizing MTTR without addressing root causes. Getting faster at fixing
recurring problems is good, but preventing those problems in the first place is
better. Pair MTTR with Change Fail Rate to ensure the
number of incidents is also decreasing.
Connection to CD
MTTR is a direct measure of how well the entire Continuous Delivery system supports
recovery:
Pipeline speed is the floor. The minimum possible MTTR for a roll-forward
fix is the Build Duration plus deployment time. A 30-minute
build means you cannot restore service via a code fix in less than 30 minutes.
Reducing build duration directly reduces MTTR.
Automated deployment enables fast recovery. Teams that can deploy with one
click or automatically can roll back or roll forward in minutes. Manual
deployment processes add significant time to every incident.
Feature flags accelerate mitigation. If a failing change is behind a feature
flag, the team can disable it in seconds without deploying new code. This can
reduce MTTR from minutes to seconds for flag-protected changes.
Observability shortens detection and diagnosis. Good logging, metrics, and
tracing help the team identify the cause of an incident quickly. Without
observability, diagnosis dominates the repair timeline.
Practice improves performance. Teams that deploy frequently have more
experience responding to issues. High Release Frequency
correlates with lower MTTR because the team has well-rehearsed recovery
procedures.
Trunk-based development simplifies rollback. When trunk is always deployable,
the team can roll back to the previous commit. Long-lived branches and complex
merge histories make rollback risky and slow.
To improve MTTR:
Keep the pipeline always deployable so a fix can be deployed at any time.
Reduce Build Duration so a roll-forward fix can reach production quickly.
Automate deployment and rollback so recovery does not depend on manual steps.
Put risky changes behind feature flags so they can be disabled without a new deployment.
Invest in observability (logging, metrics, and tracing) to shorten detection and diagnosis.
Practice incident response so recovery procedures are well-rehearsed.
How often changes are deployed to production. A DORA key metric for delivery throughput and team capability.
Definition
Release Frequency (also called Deployment Frequency) measures how often a team
successfully deploys changes to production. It is expressed as deployments per day,
per week, or per month, depending on the team’s current cadence.
This is one of the four DORA key metrics. It measures the throughput side of
delivery performance: how rapidly the team can get completed work into the hands
of users. Higher release frequency enables faster feedback, smaller batch sizes,
and reduced deployment risk.
Each deployment should deliver a meaningful change. Re-deploying the same artifact
or deploying empty changes does not count.
How to Measure
Count production deployments. Record each successful deployment to the
production environment over a defined period.
Exclude non-changes. Do not count re-deployments of unchanged artifacts,
infrastructure-only changes (unless relevant), or deployments to non-production
environments.
Calculate frequency. Divide the count by the time period. Express as
deployments per day (for high performers) or per week/month (for teams earlier
in their journey).
Data sources:
CD platforms: Argo CD, Spinnaker, Flux, Octopus Deploy, or similar tools
track every deployment.
Pipeline logs: GitHub Actions, GitLab CI, Jenkins, and CircleCI
record deployment job executions.
Custom deployment scripts: Add a logging line that records the timestamp,
service name, and version to a central log or metrics system.
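For the custom-script option above, a minimal sketch of such a logging line - the endpoint,
payload shape, and environment variables are assumptions to adapt to your own log or
metrics system:
async function recordDeployment() {
  await fetch("https://metrics.internal/deployments", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      service: process.env.SERVICE_NAME, // hypothetical environment variables
      version: process.env.GIT_SHA,
      deployedAt: new Date().toISOString(),
    }),
  });
}

recordDeployment();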
Targets
Level    Release Frequency
Low      Less than once per 6 months
Medium   Once per month to once per 6 months
High     Once per week to once per month
Elite    Multiple times per day
These levels are drawn from the DORA State of DevOps research. Elite performers
deploy on demand, multiple times per day, with each deployment containing a small
set of changes.
Common Pitfalls
Counting empty deployments. Re-deploying the same artifact or building
artifacts that contain no changes inflates the metric without delivering value.
Count only deployments with meaningful changes.
Ignoring failed deployments. If you count deployments that are immediately
rolled back, the frequency looks good but the quality is poor. Pair with
Change Fail Rate to get the full picture.
Equating frequency with value. Deploying frequently is a means, not an end.
Deploying 10 times a day delivers no value if the changes do not meet user needs.
Release Frequency measures capability, not outcome.
Batch releasing to hit a target. Combining multiple changes into a single
release to deploy “more often” defeats the purpose. The goal is small, individual
changes flowing through the pipeline independently.
Focusing on speed without quality. If release frequency increases but
Change Fail Rate also increases, the team is releasing
faster than its quality processes can support. Slow down and improve the pipeline.
Connection to CD
Release Frequency is the ultimate output metric of a Continuous Delivery pipeline:
Validates the entire delivery system. High release frequency is only possible
when the pipeline is fast, tests are reliable, deployment is automated, and the
team has confidence in the process. It is the end-to-end proof that CD is working.
Reduces deployment risk. Each deployment carries less change when deployments
are frequent. Less change means less risk, easier rollback, and simpler
debugging when something goes wrong.
Enables rapid feedback. Frequent releases get features and fixes in front of
users sooner. This shortens the feedback loop and allows the team to course-correct
before investing heavily in the wrong direction.
Exercises recovery capability. Teams that deploy frequently practice the
deployment process daily. When a production incident occurs, the deployment
process is well-rehearsed and reliable, directly improving
Mean Time to Repair.
Decouples deploy from release. At high frequency, teams separate the act of
deploying code from the act of enabling features for users. Feature flags,
progressive delivery, and dark launches become standard practice.
Number of work items started but not yet completed. A leading indicator of flow problems, context switching, and delivery delays.
Definition
Work in Progress (WIP) is the total count of work items that have been started but
not yet completed and delivered to production. This includes all types of work:
stories, defects, tasks, spikes, and any other items that a team member has begun
but not finished.
Work in Progress formula
wip = countOf(items where status is between "started" and "done")
WIP is a leading indicator from Lean manufacturing. Unlike trailing metrics such as
Development Cycle Time or
Lead Time, WIP tells you about problems that are happening right
now. High WIP predicts future delivery delays, increased cycle time, and lower
quality.
Little’s Law provides the mathematical relationship:
Little’s Law: cycle time as a function of WIP
cycleTime = wip / throughput
If throughput (the rate at which items are completed) stays constant, increasing WIP
directly increases cycle time. The only way to reduce cycle time without working
faster is to reduce WIP.
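A worked example with hypothetical numbers: a team with 10 items in progress and a
throughput of 2 completed items per day has an average cycle time of 10 / 2 = 5 days.
Cutting WIP to 5 while throughput stays at 2 per day brings average cycle time down to
2.5 days, without anyone working faster.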
How to Measure
Count all in-progress items. At a regular cadence (daily or at each standup),
count the number of items in any active state on your team’s board. Include
everything between “To Do” and “Done.”
Normalize by team size. Divide WIP by the number of team members to get a
per-person ratio. This makes the metric comparable across teams of different sizes.
Track over time. Record the WIP count daily and observe trends. A rising WIP
count is an early warning of delivery problems.
Data sources:
Kanban boards: Jira, Azure Boards, Trello, GitHub Projects, or physical
boards. Count cards in any column between the backlog and done.
Issue trackers: Query for items with an “In Progress,” “In Review,”
“In QA,” or equivalent active status.
Manual count: At standup, ask: “How many things are we actively working on
right now?”
The simplest and most effective approach is to make WIP visible by keeping the team
board up to date and counting active items daily.
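A minimal sketch of that daily count, assuming a board export with a status field and
statuses named as below:
const ACTIVE_STATUSES = ["In Progress", "In Review", "In QA"];

function wipReport(items, teamSize) {
  // items: [{ title, status }] exported from the team board
  const wip = items.filter((item) => ACTIVE_STATUSES.includes(item.status)).length;
  return { wip, perPerson: wip / teamSize };
}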
Targets
Level    WIP per Team
Low      More than 2x team size
Medium   Between 1x and 2x team size
High     Equal to team size
Elite    Less than team size (ideally half)
The guiding principle is that WIP should never exceed team size. A team of five
should have at most five items in progress at any time. Elite teams often work
in pairs, bringing WIP to roughly half the team size.
Common Pitfalls
Hiding work. Not moving items to “In Progress” when working on them keeps
WIP artificially low. The board must reflect reality. If someone is working on
it, it should be visible.
Marking items done prematurely. Moving items to “Done” before they are
deployed to production understates WIP. The Definition of Done must include
production deployment.
Creating micro-tasks. Splitting a single story into many small tasks
(development, testing, code review, deployment) and tracking each separately
inflates the item count without changing the actual work. Measure WIP at the
story or feature level.
Ignoring unplanned work. Production support, urgent requests, and
interruptions consume capacity but are often not tracked on the board. If the
team is spending time on it, it is WIP and should be visible.
Setting WIP limits but not enforcing them. WIP limits only work if the team
actually stops starting new work when the limit is reached. Treat WIP limits as
a hard constraint, not a suggestion.
Connection to CD
WIP is the most actionable flow metric and directly impacts every aspect of
Continuous Delivery:
Predicts cycle time. Per Little’s Law, WIP and cycle time are directly
proportional. Reducing WIP is the fastest way to reduce
Development Cycle Time without changing anything
else about the delivery process.
Reduces context switching. When developers juggle multiple items, they lose
time switching between contexts. Research consistently shows that each additional
item in progress reduces effective productivity. Low WIP means more focus and
faster completion.
Exposes blockers. When WIP limits are in place and an item gets blocked, the
team cannot simply start something new. They must resolve the blocker first. This
forces the team to address systemic problems rather than working around them.
Enables continuous flow. CD depends on a steady flow of small changes moving
through the pipeline. High WIP creates irregular, bursty delivery. Low WIP
creates smooth, predictable flow.
Improves quality. When teams focus on fewer items, each item gets more
attention. Code reviews happen faster, testing is more thorough, and defects are
caught sooner. This naturally reduces Change Fail Rate.
Supports trunk-based development. High WIP often correlates with many
long-lived branches. Reducing WIP encourages developers to complete and integrate
work before starting something new, which aligns with
Integration Frequency goals.
To reduce WIP:
Set explicit WIP limits for the team and enforce them. Start with a limit equal
to team size and reduce it over time.
Prioritize finishing work over starting new work. At standup, ask “What can I
help finish?” before “What should I start?”
Prioritize code review and pairing to unblock teammates over picking up new items.
Make the board visible and accurate. Use it as the single source of truth for
what the team is working on.
Identify and address recurring blockers that cause items to stall in progress.
Test architecture, types, and best practices for building confidence in your delivery pipeline.
A reliable test suite is essential for continuous delivery. This page describes the test
architecture that gives your pipeline the confidence to deploy any change - even when
dependencies outside your control are unavailable. The child pages cover each test type
in detail.
Beyond the Test Pyramid
The test pyramid - many unit tests at the base, fewer integration tests in the middle, a handful
of end-to-end tests at the top - has been the dominant mental model for test strategy since Mike
Cohn introduced it. The core insight is sound: push testing as low as possible. Lower-level
tests are faster, more deterministic, and cheaper to maintain. Higher-level tests are slower,
more brittle, and more expensive.
But as a prescriptive model, the pyramid is overly simplistic. Teams that treat it as a rigid
ratio end up in unproductive debates about whether they have “too many” integration tests or “not
enough” unit tests. The shape of your test distribution matters far less than whether your tests,
taken together, give you the confidence to deploy.
What actually matters
The pyramid’s principle - write tests with different granularity - remains correct. But for
CD, the question is not “do we have the right pyramid shape?” The question is:
Can our pipeline determine that a change is safe to deploy without depending on any system we
do not control?
This reframes the testing conversation. Instead of counting tests by type and trying to match a
diagram, you design a test architecture where:
Fast, deterministic tests catch the vast majority of defects and run on every commit.
These tests use test doubles for anything outside
the team’s control. They give you a reliable go/no-go signal in minutes.
Contract tests verify that your test doubles still match reality. They run asynchronously
and catch drift between your assumptions and the real world - without blocking your pipeline.
A small number of non-deterministic tests validate that the fully integrated system works.
These run post-deployment and provide monitoring, not gating.
This structure means your pipeline can confidently say “yes, deploy this” even if a downstream
API is having an outage, a third-party service is slow, or a partner team hasn’t deployed their
latest changes yet. Your ability to deliver is decoupled from the reliability of systems you do
not own.
The anti-pattern: the ice cream cone
Most teams that struggle with CD have an inverted test distribution - too many slow, expensive
end-to-end tests and too few fast, focused tests.
The ice cream cone makes CD impossible. Manual testing gates block every release. End-to-end tests
take hours, fail randomly, and depend on external systems being healthy. The pipeline cannot give
a fast, reliable answer about deployability, so deployments become high-ceremony events.
Test Architecture
A test architecture is the deliberate structure of how different test types work together across
your pipeline to give you deployment confidence. Each layer has a specific role, and the layers
reinforce each other.
Layer 4 - End-to-end tests: verify complete user journeys through the fully integrated
system. Not deterministic. Role: monitoring, not gating - runs post-deployment. Layers 1-3
(unit, integration, and functional tests) are the deterministic gates summarized in the
table below.
Static Analysis runs alongside layers 1-3, catching code quality, security, and
style issues without executing the code. Test Doubles are used throughout
layers 1-3 to isolate external dependencies.
How the layers work together
Test layers by pipeline stage
Pipeline stage    Test layer            Deterministic?   Blocks deploy?
────────────────────────────────────────────────────────────────────────
On every commit   Unit tests            Yes              Yes
                  Integration tests     Yes              Yes
                  Functional tests      Yes              Yes
Asynchronous      Contract tests        No               No (triggers review)
Post-deployment   E2E smoke tests       No               Triggers rollback if critical
                  Synthetic monitoring  No               Triggers alerts
The critical insight: everything that blocks deployment is deterministic and under your
control. Everything that involves external systems runs asynchronously or post-deployment. This
is what gives you the independence to deploy any time, regardless of the state of the world
around you.
Pre-merge vs post-merge
The table above maps to two distinct phases of your pipeline, each with different goals and
constraints.
Pre-merge (before code lands on trunk): Run unit, integration, and functional tests. These
must all be deterministic and fast. Target: under 10 minutes total. This is the quality gate that
every change must pass. If pre-merge tests are slow, developers batch up changes or skip local
runs, both of which undermine continuous integration.
Post-merge (after code lands on trunk, before or after deployment): Re-run the full
deterministic suite against the integrated trunk to catch merge-order interactions. Run contract
tests, E2E smoke tests, and synthetic monitoring. Target: under 30 minutes for the full
post-merge cycle.
Why re-run pre-merge tests post-merge? Two changes can each pass pre-merge independently but
conflict when combined on trunk. The post-merge run catches these integration effects. If a
post-merge failure occurs, the team fixes it immediately - trunk must always be releasable.
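A hypothetical example: one change renames a shared helper function while another change,
branched before the rename, adds a call to the old name. Each passes its own pre-merge run,
but the combined trunk no longer builds; the post-merge run is what surfaces the conflict.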
Testing Matrix
Use this reference to decide what type of test to write and where it runs in your pipeline.
Do
Run tests on every commit. If tests do not run automatically, they will be skipped.
Keep the deterministic suite under 10 minutes. If it is slower, developers will stop
running it locally.
Fix broken tests immediately. A broken test is equivalent to a broken build.
Delete tests that do not provide value. A test that never fails and tests trivial behavior
is maintenance cost with no benefit.
Test behavior, not implementation. Use a
black box approach - verify what the code
does, not how it does it. As Ham Vocke advises: “if I enter values x and y, will the
result be z?” - not the sequence of internal calls that produce z. Avoid
white box testing that asserts on internals.
Use test doubles for external dependencies. Your deterministic tests should run without
network access to external systems.
Validate test doubles with contract tests. Test doubles that drift from reality give false
confidence.
Treat test code as production code. Give it the same care, review, and refactoring
attention.
Run automated accessibility checks on every commit. WCAG compliance scans are fast,
deterministic, and catch violations that are invisible to sighted developers. Treat them
like security scans: automate the detectable rules and reserve manual review for
subjective judgment.
Do Not
Do not tolerate flaky tests. Quarantine or delete them immediately.
Do not gate your pipeline on non-deterministic tests. E2E and contract test failures
should trigger review or alerts, not block deployment.
Do not couple your deployment to external system availability. If a third-party API being
down prevents you from deploying, your test architecture has a critical gap.
Do not write tests after the fact as a checkbox exercise. Tests written without
understanding the behavior they verify add noise, not value.
Do not test private methods directly. Test the public interface; private methods are tested
indirectly.
Do not share mutable state between tests. Each test should set up and tear down its own
state.
Do not use sleep/wait for timing-dependent tests. Use explicit waits, polling, or
event-driven assertions.
Do not require a running database or external service for unit tests. That makes them
integration tests - which is fine, but categorize them correctly.
Fast, deterministic tests that verify a unit of behavior through its public interface, asserting on what the code does rather than how it works.
Definition
A unit test is a deterministic test that exercises a unit of behavior (a single
meaningful action or decision your code makes) and verifies that the observable outcome is
correct. The “unit” is not a function, method, or class. It is a behavior: given these inputs,
the system produces this result. A single behavior may involve one function or several
collaborating objects. What matters is that the test treats the code as a
black box and asserts only on what it produces,
not on how it produces it.
All external dependencies are replaced with test doubles so the test runs
quickly and produces the same result every time.
White box testing (asserting on internal method
calls, call order, or private state) creates change-detector tests that break during routine
refactoring without catching real defects. Prefer testing through the public interface (methods,
APIs, exported functions) and asserting on return values, state changes visible to consumers,
or observable side effects.
The purpose of unit tests is to:
Verify that a unit of behavior produces the correct observable outcome.
Cover high-complexity logic where many input permutations exist, such as business rules, calculations, and state transitions.
Keep cyclomatic complexity visible and manageable through good separation of concerns.
When to Use
During development: run the relevant subset of unit tests continuously while writing
code. TDD (Red-Green-Refactor) is the most effective workflow.
On every commit: use pre-commit hooks or watch-mode test runners so broken tests never
reach the remote repository.
In CI: execute the full unit test suite on every pull request and on the trunk after
merge to verify nothing was missed locally.
Unit tests are the right choice when the behavior under test can be exercised without network
access, file system access, or database connections. If you need any of those, you likely need
an integration test or a functional test instead.
Characteristics
Property       Value
Speed          Milliseconds per test
Determinism    Always deterministic
Scope          A single unit of behavior
Dependencies   All replaced with test doubles
Network        None
Database       None
Breaks build   Yes
Examples
A JavaScript unit test verifying a pure utility function:
JavaScript unit test for castArray utility
// castArray.test.js
describe("castArray", () => {
  it("should wrap non-array items in an array", () => {
    expect(castArray(1)).toEqual([1]);
    expect(castArray("a")).toEqual(["a"]);
    expect(castArray({ a: 1 })).toEqual([{ a: 1 }]);
  });

  it("should return array values by reference", () => {
    const array = [1];
    expect(castArray(array)).toBe(array);
  });

  it("should return an empty array when no arguments are given", () => {
    expect(castArray()).toEqual([]);
  });
});
A Java unit test using Mockito to isolate the system under test:
Java unit test with Mockito stub isolating the controller
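A minimal sketch, with hypothetical UserController and UserService classes, showing the
collaborator stubbed out so the assertion targets only the observable result:
import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.junit.jupiter.api.Test;

class UserControllerTest {

    @Test
    void returnsGreetingForKnownUser() {
        // Stub the collaborator so the test stays fast and deterministic
        UserService userService = mock(UserService.class);
        when(userService.findName("u-42")).thenReturn("Ada");

        UserController controller = new UserController(userService);

        // Assert on the observable outcome, not on internal calls
        assertEquals("Hello, Ada", controller.greet("u-42"));
    }
}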
Anti-Patterns
White box testing: asserting on internal
state, call order, or private method behavior rather than observable output. These
change-detector tests break during refactoring without catching real defects. Test through
the public interface instead.
Testing private methods: private implementations are meant to change. They are
exercised indirectly through the behavior they support. Test the public interface instead.
No assertions: a test that runs code without asserting anything provides false
confidence. Lint rules can catch this automatically.
Disabling or skipping tests: skipped tests erode confidence over time. Fix or remove
them.
Confusing “unit” with “function”: a unit of behavior may span multiple collaborating
objects. Forcing one-test-per-function creates brittle tests that mirror the implementation
structure rather than verifying meaningful outcomes.
Ice cream cone testing: relying primarily on slow E2E tests while neglecting fast unit
tests inverts the test pyramid and slows feedback.
Chasing coverage numbers: gaming coverage metrics (e.g., running code paths without
meaningful assertions) creates a false sense of confidence. Focus on behavior coverage
instead.
Connection to CD Pipeline
Unit tests occupy the base of the test pyramid. They run in the earliest stages of the
CD pipeline and provide the fastest feedback loop:
Local development: watch mode reruns tests on every save.
Pre-commit: hooks run the suite before code reaches version control.
PR verification: CI runs the full suite and blocks merge on failure.
Trunk verification: CI reruns tests on the merged HEAD to catch integration issues.
Because unit tests are fast and deterministic, they should always break the build on failure.
A healthy CD pipeline depends on a large, reliable suite of
black box unit tests that verify behavior
rather than implementation, giving developers the confidence to refactor freely and ship
small changes frequently.
Deterministic tests that verify how units interact together or with external system boundaries using test doubles for non-deterministic dependencies.
Definition
An integration test is a deterministic test that verifies how the unit under test interacts
with other units without directly accessing external sub-systems. It may validate multiple
units working together (sometimes called a “sociable unit test”) or the portion of the code
that interfaces with an external network dependency while using a test double to represent
that dependency.
For clarity: an “integration test” is not a test that broadly integrates multiple
sub-systems. That is an end-to-end test.
When to Use
Integration tests provide the best balance of speed, confidence, and cost. Use them when:
You need to verify that multiple units collaborate correctly (for example, a service
calling a repository that calls a data mapper).
You need to validate the interface layer to an external system (HTTP client, message
producer, database query) while keeping the external system replaced by a test double.
You want to confirm that a refactoring did not break behavior. Integration tests that
avoid testing implementation details survive refactors without modification.
You are building a front-end component that composes child components and needs to verify
the assembled behavior from the user’s perspective.
If the test requires a live network call to a system outside localhost, it is either a
contract test or an E2E test.
Characteristics
Property       Value
Speed          Milliseconds to low seconds
Determinism    Always deterministic
Scope          Multiple units or a unit plus its boundary
Dependencies   External systems replaced with test doubles
Network        Localhost only
Database       Localhost / in-memory only
Breaks build   Yes
Examples
A JavaScript integration test verifying that a connector returns structured data:
Integration test - connector returning structured data
describe("retrieving Hygieia data",()=>{it("should return counts of merged pull requests per day",async()=>{const result =await hygieiaConnector.getResultsByDay(
hygieiaConfigs.integrationFrequencyRoute,
testTeam,
startDate,
endDate
);expect(result.status).toEqual(200);expect(result.data).toBeInstanceOf(Array);expect(result.data[0]).toHaveProperty("value");expect(result.data[0]).toHaveProperty("dateStr");});it("should return an empty array if the team does not exist",async()=>{const result =await hygieiaConnector.getResultsByDay(
hygieiaConfigs.integrationFrequencyRoute,0,
startDate,
endDate
);expect(result.data).toEqual([]);});});
Subcategories
Service integration tests validate how the system under test responds to information
from an external service. Use virtual services or static mocks; pair with
contract tests to keep the doubles current.
Database integration tests validate query logic against a controlled data store. Prefer
in-memory databases, isolated DB instances, or personalized datasets over shared live data.
Front-end integration tests render the component tree and interact with it the way a
user would. Follow the accessibility order of operations for element selection: visible text
and labels first, ARIA roles second, test IDs only as a last resort.
Anti-Patterns
Peeking behind the curtain: using tools that expose component internals (e.g.,
Enzyme’s instance() or state()) instead of testing from the user’s perspective.
Mocking too aggressively: replacing every collaborator turns an integration test into a
unit test and removes the value of testing real interactions. Only mock what is necessary to
maintain determinism.
Testing implementation details: asserting on internal state, private methods, or call
counts rather than observable output.
Introducing a test user: creating an artificial actor that would never exist in
production. Write tests from the perspective of a real end-user or API consumer.
Tolerating flaky tests: non-deterministic integration tests erode trust. Fix or remove
them immediately.
Duplicating E2E scope: if the test integrates multiple deployed sub-systems with live
network calls, it belongs in the E2E category, not here.
Connection to CD Pipeline
Integration tests form the largest portion of a healthy test suite (the “trophy” or the
middle of the pyramid). They run alongside unit tests in the earliest CI stages:
Local development: run in watch mode or before committing.
PR verification: CI executes the full suite; failures block merge.
Trunk verification: CI reruns on the merged HEAD.
Because they are deterministic and fast, integration tests should always break the build.
A team whose refactors break many tests likely has too few integration tests and too many
fine-grained unit tests. As Kent C. Dodds advises: “Write tests, not too many, mostly
integration.”
Deterministic tests that verify all modules of a sub-system work together from the actor’s perspective, using test doubles for external dependencies.
Definition
A functional test is a deterministic test that verifies all modules of a sub-system are
working together. It introduces an actor (typically a user interacting with the UI or a
consumer calling an API) and validates the ingress and egress of that actor within the
system boundary. External sub-systems are replaced with test doubles to
keep the test deterministic.
Functional tests cover broad-spectrum behavior: UI interactions, presentation logic, and
business logic flowing through the full sub-system. They differ from
end-to-end tests in that side effects are mocked and never cross boundaries
outside the system’s control.
Functional tests are sometimes called component tests. Martin Fowler calls them
sociable unit tests
to distinguish them from solitary unit tests that stub all collaborators: a sociable test
allows real collaborators within the sub-system boundary while still replacing external
dependencies with test doubles.
When to Use
You need to verify a complete user-facing feature from input to output within a single
deployable unit (e.g., a service or a front-end application).
You want to test how the UI, business logic, and data layers interact without depending
on live external services.
You need to simulate realistic user workflows (filling in forms, navigating pages,
submitting API requests) while keeping the test fast and repeatable.
You are validating acceptance criteria for a user story and want a test that maps
directly to the specified behavior.
You need to verify keyboard navigation, focus management, and screen reader
announcements as part of feature verification. Accessibility behavior is user-facing
behavior and belongs in functional tests.
If the test needs to reach a live external dependency, it is an E2E test. If it
tests a single unit in isolation, it is a unit test.
Examples
A functional test for a REST API using an in-process server and mocked downstream services:
REST API functional test - order creation with mocked inventory service
describe("POST /orders",()=>{it("should create an order and return 201",async()=>{// Arrange: mock the inventory service responsenock("https://inventory.internal").get("/stock/item-42").reply(200,{available:true,quantity:10});// Act: send a request through the full application stackconst response =awaitrequest(app).post("/orders").send({itemId:"item-42",quantity:2});// Assert: verify the user-facing responseexpect(response.status).toBe(201);expect(response.body.orderId).toBeDefined();expect(response.body.status).toBe("confirmed");});it("should return 409 when inventory is insufficient",async()=>{nock("https://inventory.internal").get("/stock/item-42").reply(200,{available:true,quantity:0});const response =awaitrequest(app).post("/orders").send({itemId:"item-42",quantity:2});expect(response.status).toBe(409);expect(response.body.error).toMatch(/insufficient/i);});});
A front-end functional test exercising a login flow with a mocked auth service:
Front-end functional test - login flow with mocked auth service
describe("Login page",()=>{it("should redirect to the dashboard after successful login",async()=>{
mockAuthService.login.mockResolvedValue({token:"abc123"});render(<App />);await userEvent.type(screen.getByLabelText("Email"),"ada@example.com");await userEvent.type(screen.getByLabelText("Password"),"s3cret");await userEvent.click(screen.getByRole("button",{name:"Sign in"}));expect(await screen.findByText("Dashboard")).toBeInTheDocument();});});
Accessibility Verification
Functional tests already exercise the UI from the actor’s perspective, making them the
natural place to verify that interactions work for all users. Accessibility assertions
fit alongside existing functional assertions rather than in a separate test suite.
A functional test verifying keyboard-only interaction and running axe-core assertions
against the rendered page:
Accessibility functional test - keyboard navigation and axe-core WCAG assertions
import { axe, toHaveNoViolations } from "jest-axe";

expect.extend(toHaveNoViolations);

describe("Checkout flow", () => {
  it("should be completable using only the keyboard", async () => {
    render(<CheckoutPage />);

    // Navigate to the first form field using Tab
    await userEvent.tab();
    expect(screen.getByLabelText("Card number")).toHaveFocus();

    // Fill in the form using keyboard only
    await userEvent.type(screen.getByLabelText("Card number"), "4111111111111111");
    await userEvent.tab();
    await userEvent.type(screen.getByLabelText("Expiry"), "12/27");
    await userEvent.tab();

    // Submit with Enter
    await userEvent.keyboard("{Enter}");
    expect(await screen.findByText("Order confirmed")).toBeInTheDocument();

    // Verify no accessibility violations in the final state
    const results = await axe(document.body);
    expect(results).toHaveNoViolations();
  });
});
Anti-Patterns
Using live external services: this makes the test non-deterministic and slow. Use test
doubles for anything outside the sub-system boundary.
Testing through the database: sharing a live database between tests introduces ordering
dependencies and flakiness. Use in-memory databases or mocked data layers.
Ignoring the actor’s perspective: functional tests should interact with the system the
way a user or consumer would. Reaching into internal APIs or bypassing the UI defeats the
purpose.
Duplicating unit test coverage: functional tests should focus on feature-level behavior
and happy/critical paths, not every edge case. Leave permutation testing to unit tests.
Slow test setup: if spinning up the sub-system takes too long, invest in faster
bootstrapping (in-memory stores, lazy initialization) rather than skipping functional tests.
Deferring accessibility testing to a manual audit phase: accessibility defects caught
in a quarterly audit are weeks or months old. Automated WCAG checks in functional tests
catch violations on every commit, just like any other regression.
Connection to CD Pipeline
Functional tests run after unit and integration tests in the pipeline, typically as part of
the same CI stage:
Pre-commit: functional tests run locally before every commit. Because they are
deterministic and scoped to the sub-system, they are fast enough to give immediate
feedback without slowing the development loop.
PR verification: functional tests run in CI against the sub-system in isolation,
giving confidence that the feature works before merge.
Trunk verification: the same tests run on the merged HEAD to catch conflicts.
Pre-deployment gate: functional tests can serve as the final deterministic gate before
a build artifact is promoted to a staging environment.
Because functional tests are deterministic, they should break the build on failure.
They are more expensive than unit and integration tests, so teams should focus on
happy-path and critical-path scenarios while keeping the total count manageable.
Non-deterministic tests that validate the entire software system along with its integration with external interfaces and production-like scenarios.
Definition
End-to-end (E2E) tests validate the entire software system, including its integration with
external interfaces. They exercise complete production-like scenarios using real (or
production-like) data and environments to simulate real-time settings. No test doubles are
used. The test hits live services, databases, and third-party integrations just as a real
user would.
Because they depend on external systems, E2E tests are typically non-deterministic: they
can fail for reasons unrelated to code correctness, such as network instability or
third-party outages.
When to Use
E2E tests should be the least-used test type due to their high cost in execution time and
maintenance. Use them for:
Happy-path validation of critical business flows (e.g., user signup, checkout, payment
processing).
Smoke testing a deployed environment to verify that key integrations are functioning.
Cross-team workflows that span multiple sub-systems and cannot be tested any other way.
Do not use E2E tests to cover edge cases, error handling, or input validation. Those
scenarios belong in unit, integration, or
functional tests.
Vertical vs. Horizontal E2E Tests
Vertical E2E tests target features under the control of a single team:
Favoriting an item and verifying it persists across refresh.
Creating a saved list and adding items to it.
Horizontal E2E tests span multiple teams:
Navigating from the homepage through search, item detail, cart, and checkout.
Horizontal tests are significantly more complex and fragile. Due to their large failure
surface area, they are not suitable for blocking release pipelines.
Characteristics
Property       Value
Speed          Seconds to minutes per test
Determinism    Typically non-deterministic
Scope          Full system including external integrations
Dependencies   Real services, databases, third-party APIs
Network        Full network access
Database       Live databases
Breaks build   Generally no (see guidance below)
Examples
A vertical E2E test verifying user lookup through a live web interface:
Vertical E2E test - user lookup via live web interface
@Test
public void verifyValidUserLookup() throws Exception {
    // Act -- interact with the live application
    homePage.getUserData("validUserId");
    waitForElement(By.xpath("//span[@id='name']"));

    // Assert -- verify real data returned from the live backend
    assertEquals("Ada Lovelace", homePage.getName());
    assertEquals("Engineering", homePage.getOrgName());
    assertEquals("Grace Hopper", homePage.getManagerName());
}
A browser-based E2E test using a tool like Playwright:
Browser-based E2E test - add to cart and checkout with Playwright
test("user can add an item to cart and check out",async({ page })=>{await page.goto("https://staging.example.com");await page.getByRole("link",{name:"Running Shoes"}).click();await page.getByRole("button",{name:"Add to Cart"}).click();await page.getByRole("link",{name:"Cart"}).click();awaitexpect(page.getByText("Running Shoes")).toBeVisible();await page.getByRole("button",{name:"Checkout"}).click();awaitexpect(page.getByText("Order confirmed")).toBeVisible();});
Anti-Patterns
Using E2E tests as the primary safety net: this is the “ice cream cone” anti-pattern.
E2E tests are slow and fragile; the majority of your confidence should come from unit and
integration tests.
Blocking the pipeline with horizontal E2E tests: these tests span too many teams and
failure surfaces. Run them asynchronously and review failures out of band.
Ignoring flaky failures: E2E tests often fail for environmental reasons. Track the
frequency and root cause of failures. If a test is not providing signal, fix it or remove
it.
Testing edge cases in E2E: exhaustive input validation and error-path testing should
happen in cheaper, faster test types.
Not capturing failure context: E2E failures are expensive to debug. Capture
screenshots, network logs, and video recordings automatically on failure.
Connection to CD Pipeline
E2E tests run in the later stages of the delivery pipeline, after the build artifact has
passed all deterministic tests and has been deployed to a staging or pre-production
environment:
Post-deployment smoke tests: a small, fast suite of vertical E2E tests verifies that
the deployment succeeded and critical paths work.
Scheduled regression suites: broader E2E suites (including horizontal tests) run on a
schedule rather than on every commit.
Production monitoring: customer experience alarms (synthetic monitoring) are a form of
continuous E2E testing that runs in production.
Because E2E tests are non-deterministic, they should not break the build in most cases. A
team may choose to gate on a small set of highly reliable vertical E2E tests, but must invest
in reducing false positives to make this valuable. CD pipelines should be optimized for rapid
recovery of production issues rather than attempting to prevent all defects with slow,
fragile E2E gates.
Non-deterministic tests that validate test doubles by verifying API contract format against live external systems.
Definition
A contract test validates that the test doubles used in
integration tests still accurately represent the real external system.
Contract tests run against the live external sub-system and exercise the portion of the
code that interfaces with it. Because they depend on live services, contract tests are
non-deterministic and should not break the build. Instead, failures should trigger a
review to determine whether the contract has changed and the test doubles need updating.
A contract test validates contract format, not specific data. It verifies that response
structures, field names, types, and status codes match expectations, not that particular
values are returned.
Contract tests have two perspectives:
Provider: the team that owns the API verifies that all changes are backwards compatible
(unless a new API version is introduced). Every build should validate the provider contract.
Consumer: the team that depends on the API verifies that they can still consume the
properties they need, following
Postel’s Law: “Be conservative in
what you do, be liberal in what you accept from others.”
When to Use
You have integration tests that use test doubles (mocks, stubs, recorded
responses) to represent external services, and you need assurance those doubles remain
accurate.
You consume a third-party or cross-team API that may change without notice.
You provide an API to other teams and want to ensure that your changes do not break their
expectations (consumer-driven contracts).
You are adopting contract-driven development, where contracts are defined during design
so that provider and consumer teams can work in parallel using shared mocks and fakes.
Characteristics
Property       Value
Speed          Seconds (depends on network latency)
Determinism    Non-deterministic (hits live services)
Scope          Interface boundary between two systems
Dependencies   Live external sub-system
Network        Yes (calls the real dependency)
Database       Depends on the provider
Breaks build   No (failures trigger review, not build failure)
Examples
A provider contract test verifying that an API response matches the expected schema:
Provider contract test - schema validation
describe("GET /users/:id contract",()=>{it("should return a response matching the user schema",async()=>{const response =awaitfetch("https://api.partner.com/users/1");const body =await response.json();// Validate structure, not specific dataexpect(response.status).toBe(200);expect(body).toHaveProperty("id");expect(typeof body.id).toBe("number");expect(body).toHaveProperty("name");expect(typeof body.name).toBe("string");expect(body).toHaveProperty("email");expect(typeof body.email).toBe("string");});});
A consumer-driven contract test using Pact:
Consumer-driven contract test with Pact
describe("Order Service - Inventory Provider Contract",()=>{it("should receive stock availability in the expected format",async()=>{// Define the expected interactionawait provider.addInteraction({state:"item-42 is in stock",uponReceiving:"a request for item-42 stock",withRequest:{method:"GET",path:"/stock/item-42"},willRespondWith:{status:200,body:{available: Matchers.boolean(true),quantity: Matchers.integer(10),},},});// Exercise the consumer code against the mock providerconst result =await inventoryClient.checkStock("item-42");expect(result.available).toBe(true);});});
Anti-Patterns
Using contract tests to validate business logic: contract tests verify structure and
format, not behavior. Business logic belongs in functional tests.
Breaking the build on contract test failure: because these tests hit live systems,
failures may be caused by network issues or temporary outages, not actual contract changes.
Treat failures as signals to investigate.
Neglecting to update test doubles: when a contract test fails because the upstream API
changed, the test doubles in your integration tests must be updated to match. Ignoring
failures defeats the purpose.
Running contract tests too infrequently: the frequency should be proportional to the
volatility of the interface. Highly active APIs need more frequent contract validation.
Testing specific data values: asserting that name equals "Alice" makes the test
brittle. Assert on types, required fields, and response codes instead.
Connection to CD Pipeline
Contract tests run asynchronously from the main CI build, typically on a schedule:
Provider side: provider contract tests (schema validation, response code checks) are
often implemented as deterministic unit tests and run on every commit as part of the
provider’s CI pipeline.
Consumer side: consumer contract tests run on a schedule (e.g., hourly or daily)
against the live provider. Failures are reviewed and may trigger updates to test doubles
or conversations between teams.
Consumer-driven contracts: when using tools like Pact, the consumer publishes
contract expectations and the provider runs them continuously. Both teams communicate when
contracts break.
Contract tests are the bridge that keeps your fast, deterministic integration test suite
honest. Without them, test doubles can silently drift from reality, and your integration
tests provide false confidence.
Code analysis tools that evaluate non-running code for security vulnerabilities, complexity, and best practice violations.
Definition
Static analysis (also called static testing) evaluates non-running code against rules for
known good practices. Unlike other test types that execute code and observe behavior, static
analysis inspects source code, configuration files, and dependency manifests to detect
problems before the code ever runs.
Static analysis serves several key purposes:
Catches errors that would otherwise surface at runtime.
Warns of excessive complexity that degrades the ability to change code safely.
Identifies security vulnerabilities and coding patterns that provide attack vectors.
Enforces coding standards by removing subjective style debates from code reviews.
Alerts to dependency issues such as outdated packages, known CVEs, license
incompatibilities, or supply-chain compromises.
When to Use
Static analysis should run continuously, at every stage where feedback is possible:
In the IDE: real-time feedback as developers type, via editor plugins and language
server integrations.
On save: format-on-save and lint-on-save catch issues immediately.
Pre-commit: hooks prevent problematic code from entering version control.
In CI: the full suite of static checks runs on every PR and on the trunk after merge,
verifying that earlier local checks were not bypassed.
Static analysis is always applicable. Every project, regardless of language or platform,
benefits from linting, formatting, and dependency scanning.
Characteristics
Property       Value
Speed          Seconds (typically the fastest test category)
Determinism    Always deterministic
Scope          Entire codebase (source, config, dependencies)
Dependencies   None (analyzes code at rest)
Network        None (except dependency scanners)
Database       None
Breaks build   Yes
Examples
Linting
A .eslintrc.json configuration enforcing test quality rules:
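A minimal sketch using eslint-plugin-jest rules - the specific rules are a suggested
starting point, not a complete configuration:
{
  "plugins": ["jest"],
  "rules": {
    "jest/expect-expect": "error",
    "jest/no-disabled-tests": "warn",
    "jest/no-focused-tests": "error",
    "jest/no-identical-title": "error"
  }
}
These rules catch assertion-free tests, skipped tests, and accidentally committed .only
filters.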
Type Checking
Statically typed languages catch type mismatches at compile time, eliminating entire classes
of runtime errors. Java, for example, rejects incompatible argument types before the code runs:
Java type checking example
public static double calculateTotal(double price, int quantity) {
    return price * quantity;
}

// Compiler error: incompatible types: String cannot be converted to double
calculateTotal("19.99", 3);
Dependency Scanning
Tools like npm audit, Snyk, or Dependabot scan for known vulnerabilities:
npm audit output example
$ npm audit
found 2 vulnerabilities (1 moderate, 1 high)
moderate: Prototype Pollution in lodash <4.17.21
high: Remote Code Execution in log4j <2.17.1
Common static analysis categories:
Category                Purpose
Complexity analysis     Flags overly deep or long code blocks that breed defects
Type checking           Prevents type-related bugs, replacing some unit tests
Security scanning       Detects known vulnerabilities and dangerous coding patterns
Dependency scanning     Checks for outdated, hijacked, or insecurely licensed deps
Accessibility linting   Detects missing alt text, ARIA violations, contrast failures, semantic HTML issues
Accessibility Linting
Accessibility linting catches deterministic WCAG violations the same way a security scanner
catches known vulnerability patterns. Automated checks cover structural issues (missing alt
text, invalid ARIA attributes, insufficient contrast ratios, broken heading hierarchy) while
manual review covers subjective aspects like whether alt text is actually meaningful.
A .pa11yci configuration running WCAG 2.1 AA checks against rendered pages:
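A minimal sketch - the URLs are placeholders for your own rendered pages:
{
  "defaults": {
    "standard": "WCAG2AA",
    "timeout": 30000
  },
  "urls": [
    "http://localhost:3000/",
    "http://localhost:3000/login",
    "http://localhost:3000/checkout"
  ]
}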
An axe-core unit test asserting that a rendered component has no accessibility violations:
axe-core accessibility test with jest-axe
import { axe, toHaveNoViolations } from "jest-axe";

expect.extend(toHaveNoViolations);

it("should have no accessibility violations", async () => {
  const { container } = render(<LoginForm />);
  const results = await axe(container);
  expect(results).toHaveNoViolations();
});
Anti-Patterns
Disabling rules instead of fixing code: suppressing linter warnings or ignoring
security findings erodes the value of static analysis over time.
Not customizing rules: default rulesets are a starting point. Write custom rules for
patterns that come up repeatedly in code reviews.
Running static analysis only in CI: by the time CI reports a formatting error, the
developer has context-switched. IDE plugins and pre-commit hooks provide immediate feedback.
Ignoring dependency vulnerabilities: known CVEs in dependencies are a direct attack
vector. Treat high-severity findings as build-breaking.
Treating static analysis as optional: static checks should be mandatory and enforced.
If developers can bypass them, they will.
Connection to CD Pipeline
Static analysis is the first gate in the CD pipeline, providing the fastest feedback:
IDE / local development: plugins run in real time as code is written.
Pre-commit: hooks run linters, formatters, and accessibility checks on changed
components, blocking commits that violate rules.
PR verification: CI runs the full static analysis suite (linting, type checking,
security scanning, dependency auditing, accessibility linting) and blocks merge on
failure.
Trunk verification: the same checks re-run on the merged HEAD to catch anything
missed.
Scheduled scans: dependency and security scanners run on a schedule to catch newly
disclosed vulnerabilities in existing dependencies.
Because static analysis requires no running code, no test environment, and no external
dependencies, it is the cheapest and fastest form of quality verification. A mature CD
pipeline treats static analysis failures the same as test failures: they break the build.
Patterns for isolating dependencies in tests: stubs, mocks, fakes, spies, and dummies.
Definition
Test doubles are stand-in objects that replace real production dependencies during testing.
The term comes from the film industry’s “stunt double.” Just as a stunt double replaces an
actor for dangerous scenes, a test double replaces a costly or non-deterministic dependency
to make tests fast, isolated, and reliable.
Test doubles allow you to:
Remove non-determinism by replacing network calls, databases, and file systems with
predictable substitutes.
Control test conditions by forcing specific states, error conditions, or edge cases that
would be difficult to reproduce with real dependencies.
Increase speed by eliminating slow I/O operations.
Isolate the system under test so that failures point directly to the code being tested,
not to an external dependency.
Types of Test Doubles
Dummy: Passed around but never actually used; fills parameter lists. Example use case: a required logger parameter in a constructor.
Stub: Provides canned answers to calls made during the test and does not respond to anything outside what is programmed. Example use case: returning a fixed user object from a repository.
Spy: A stub that also records information about how it was called (arguments, call count, order). Example use case: verifying that an analytics event was sent once.
Mock: Pre-programmed with expectations about which calls will be made; verification happens on the mock itself. Example use case: asserting that sendEmail() was called with specific arguments.
Fake: Has a working implementation, but takes shortcuts not suitable for production. Example use case: an in-memory database replacing PostgreSQL.
Choosing the Right Double
Use stubs when you need to supply data but do not care how it was requested.
Use spies when you need to verify call arguments or call count.
Use mocks when the interaction itself is the primary thing being verified.
Use fakes when you need realistic behavior but cannot use the real system.
Use dummies when a parameter is required by the interface but irrelevant to the test.
When to Use
Test doubles are used in every layer of deterministic testing:
Unit tests: nearly all dependencies are replaced with test doubles to
achieve full isolation.
Integration tests: external sub-systems (APIs, databases, message
queues) are replaced, but internal collaborators remain real.
Functional tests: dependencies that cross the sub-system boundary
are replaced to maintain determinism.
Test doubles should be used less in later pipeline stages.
End-to-end tests use no test doubles by design.
Examples
A JavaScript stub providing a canned response:
JavaScript stub returning a fixed user
// Stub: return a fixed user regardless of input
const userRepository = {
  findById: jest.fn().mockResolvedValue({
    id: "u1",
    name: "Ada Lovelace",
    email: "ada@example.com",
  }),
};

const user = await userService.getUser("u1");
expect(user.name).toBe("Ada Lovelace");
A Java spy verifying interaction:
Java spy verifying call count with Mockito
@Test
public void shouldCallUserServiceExactlyOnce() {
    UserService spyService = Mockito.spy(userService);
    doReturn(testUser).when(spyService).getUserInfo("u123");

    User result = spyService.getUserInfo("u123");

    verify(spyService, times(1)).getUserInfo("u123");
    assertEquals("Ada", result.getName());
}
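A fake, by contrast, has real working behavior without production infrastructure. A minimal
JavaScript sketch, assuming a hypothetical user repository interface:
// Fake: an in-memory repository with real save/find behavior, but no persistence
class InMemoryUserRepository {
  constructor() {
    this.users = new Map();
  }
  async save(user) {
    this.users.set(user.id, user);
    return user;
  }
  async findById(id) {
    return this.users.get(id) ?? null;
  }
}

const repo = new InMemoryUserRepository();
await repo.save({ id: "u1", name: "Ada Lovelace" });
const found = await repo.findById("u1");
expect(found.name).toBe("Ada Lovelace");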
Anti-Patterns
Mocking what you do not own: wrapping a third-party API in a thin adapter and mocking
the adapter is safer than mocking the third-party API directly. Direct mocks couple your
tests to the library’s implementation.
Over-mocking: replacing every collaborator with a mock turns the test into a mirror of
the implementation. Tests become brittle and break on every refactor. Only mock what is
necessary to maintain determinism.
Not validating test doubles: if the real dependency changes its contract, your test
doubles silently drift. Use contract tests to keep doubles honest.
Complex mock setup: if setting up mocks requires dozens of lines, the system under test
may have too many dependencies. Consider refactoring the production code rather than adding
more mocks.
Using mocks to test implementation details: asserting on the exact sequence and count
of internal method calls creates change-detector tests. Prefer asserting on observable
output.
Connection to CD Pipeline
Test doubles are a foundational technique that enables the fast, deterministic tests required
for continuous delivery:
Early pipeline stages (static analysis, unit tests, integration tests) rely heavily on
test doubles to stay fast and deterministic. This is where the majority of defects are
caught.
Later pipeline stages (E2E tests, production monitoring) use fewer or no test doubles,
trading speed for realism.
Contract tests run asynchronously to validate that test doubles still match reality,
closing the gap between the deterministic and non-deterministic stages of the pipeline.
The guiding principle from Justin Searls applies: “Don’t poke too many holes in reality.”
Use test doubles when you must, but prefer real implementations when they are fast and
deterministic.
Why test suite speed matters for developer effectiveness and how cognitive limits set the targets.
Why speed has a threshold
The 10-minute CI target and the preference for sub-second unit tests are not arbitrary. They come
from how human cognition handles interrupted work. When a developer makes a change and waits for
test results, three things determine whether that feedback is useful: whether the developer still
holds the mental model of the change, whether they can act on the result immediately, and whether
the wait is short enough that they do not context-switch to something else.
Research on task interruption and working memory consistently shows that context switches are
expensive. Gloria Mark’s research at UC Irvine found that it takes an average of 23 minutes for
a person to fully regain deep focus after being interrupted during a task, and that interrupted
tasks take twice as long and contain twice as many errors as uninterrupted
ones [1]. If the test suite itself takes 30 minutes, the total cost of a single
feedback cycle approaches an hour - and most of that time is spent re-loading context, not fixing
code.
The cognitive breakpoints
Jakob Nielsen’s foundational research on response times identified three thresholds that govern
how users perceive and respond to system delays: 0.1 seconds (feels instantaneous), 1 second
(noticeable but flow is maintained), and 10 seconds (attention limit - the user starts thinking
about other things) [2]. These thresholds, rooted in human perceptual and
cognitive limits, apply directly to developer tooling.
Different feedback speeds produce fundamentally different developer behaviors:
Under 1 second: Feels instantaneous. The developer stays in flow, treating the test result as part of the editing cycle [2]. Working memory is fully intact; the change and the result are experienced as a single action.
1 to 10 seconds: The developer waits. Attention may drift briefly but returns without effort. Working memory is intact, and the developer can act on the result immediately.
10 seconds to 2 minutes: The developer starts to feel the wait. They may glance at another window or check a message, but they do not start a new task. Working memory begins to decay; the developer can still recover context quickly, but each additional second increases the chance of distraction [2].
2 to 10 minutes: The developer context-switches. They check email, review a PR, or start thinking about a different problem. When the result arrives, they must actively return to the original task. Working memory is partially lost, and rebuilding context takes several minutes depending on the complexity of the change [1].
Over 10 minutes: The developer fully disengages and starts a different task. The test result arrives as an interruption to whatever they are now doing. Working memory of the original change is gone, and rebuilding it takes upward of 23 minutes [1]. Investigating a failure means re-reading code they wrote an hour ago.
The 10-minute CI target exists because it is the boundary between “developer waits and acts on
the result” and “developer starts something else and pays a full context-switch penalty.” Below
10 minutes, feedback is actionable. Above 10 minutes, feedback becomes an interruption. DORA’s
research on continuous integration reinforces this: tests should complete in under 10 minutes to
support the fast feedback loops that high-performing teams depend on [3].
What this means for test architecture
These cognitive breakpoints should drive how you structure your test suite:
Local development (under 1 second). Unit tests for the code you are actively changing should
run in watch mode, re-executing on every save. At this speed, TDD becomes natural - the test
result is part of the writing process, not a separate step. This is where you test complex logic
with many permutations.
Pre-push verification (under 2 minutes). The full unit test suite and the functional tests
for the component you changed should complete before you push. At this speed, the developer
stays engaged and acts on failures immediately. This is where you catch regressions.
CI pipeline (under 10 minutes). The full deterministic suite - all unit tests, all functional
tests, all integration tests - should complete within 10 minutes of commit. At this speed, the
developer has not yet fully disengaged from the change. If CI fails, they can investigate while
the code is still fresh.
Post-deploy verification (minutes to hours). E2E smoke tests and contract test validation
run after deployment. These are non-deterministic, slower, and less frequent. Failures at this
level trigger investigation, not immediate developer action.
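One way to make these tiers concrete in a JavaScript project is to give each tier its own command.
The script names and directory layout below are hypothetical and assume Jest:
{
  "scripts": {
    "test:watch": "jest --watch test/unit",
    "test:unit": "jest test/unit",
    "test:pre-push": "jest test/unit test/functional",
    "test:ci": "jest test/unit test/functional test/integration"
  }
}
The watch script covers the inner editing loop, the pre-push script covers the under-2-minute tier,
and the CI script runs the full deterministic suite within the 10-minute budget.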
When a test suite exceeds 10 minutes, the solution is not to accept slower feedback. It is to
redesign the suite: replace E2E tests with functional tests using test doubles, parallelize test
execution, and move non-deterministic tests out of the gating path.
Impact on application architecture
Test feedback speed is not just a testing concern - it puts pressure on how you design your
systems. A monolithic application with a single test suite that takes 40 minutes to run forces
every developer to pay the full context-switch penalty on every change, regardless of which
module they touched.
Breaking a system into smaller, independently testable components is often motivated as much by
test speed as by deployment independence. When a component has its own focused test suite that
runs in under 2 minutes, the developer working on that component gets fast, relevant feedback.
They do not wait for tests in unrelated modules to finish.
This creates a virtuous cycle: smaller components with clear boundaries produce faster test
suites, which enable more frequent integration, which encourages smaller changes, which are
easier to test. Conversely, a tightly coupled monolith produces a slow, tangled test suite that
discourages frequent integration, which leads to larger changes, which are harder to test and
more likely to fail.
Architecture decisions that improve test feedback speed include:
Clear component boundaries with well-defined interfaces, so each component can be tested
in isolation with test doubles for its dependencies.
Separating business logic from infrastructure so that core rules can be unit tested in
milliseconds without databases, queues, or network calls (see the sketch after this list).
Independently deployable services with their own test suites, so a change to one service
does not require running the entire system’s tests.
Avoiding shared mutable state between components, which forces integration tests and
introduces non-determinism.
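As a sketch of the business-logic separation point above, assuming a hypothetical discount rule in
JavaScript: the core calculation takes plain data and returns plain data, so it can be unit tested
in milliseconds, while a thin handler deals with persistence.
// Pure business logic: no I/O, unit-testable in milliseconds
function applyDiscount(order, customer) {
  const rate = customer.loyaltyYears >= 5 ? 0.1 : 0;
  return { ...order, total: order.total * (1 - rate) };
}

// Infrastructure-facing wrapper: loads data, delegates to the pure function, persists the result
async function applyDiscountHandler(orderId, customerId, { orderRepo, customerRepo }) {
  const order = await orderRepo.findById(orderId);
  const customer = await customerRepo.findById(customerId);
  const discounted = applyDiscount(order, customer);
  await orderRepo.save(discounted);
  return discounted;
}

// The unit test needs no database, queue, or network call
expect(applyDiscount({ total: 100 }, { loyaltyYears: 7 }).total).toBe(90);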
If your test suite is slow and you cannot make it faster by optimizing test execution alone, the
architecture is telling you something. A system that is hard to test quickly is also hard to
change safely - and both problems have the same root cause.
The compounding cost of slow feedback
Slow feedback does not just waste time - it changes behavior. When the suite takes 40 minutes,
developers adapt:
They batch changes to avoid running the suite more than necessary, creating larger and riskier
commits.
They stop running tests locally because the wait is unacceptable during active development.
They push to CI and context-switch, paying the full rebuild penalty on every cycle.
They rerun failures instead of investigating, because re-reading the code they wrote an hour
ago is expensive enough that “maybe it was flaky” feels like a reasonable bet.
Each of these behaviors degrades quality independently. Together, they make continuous integration
impossible. A team that cannot get feedback on a change within 10 minutes cannot sustain the
practice of integrating changes multiple times per day [4].
Sources
Further reading
Build Duration - Measuring and improving CI pipeline speed
Nicole Forsgren, Jez Humble, and Gene Kim, Accelerate: The Science of Lean Software and DevOps, IT Revolution Press, 2018.
5.9 - Testing Glossary
Definitions for testing terms as they are used on this site.
These definitions reflect how this site uses each term. They are not universal definitions -
other communities may use the same words differently.
Black Box Testing
A testing approach where the test exercises code through its public interface and asserts
only on observable outputs - return values, state changes visible to consumers, or side
effects such as messages sent. The test has no knowledge of internal implementation details.
Black box tests are resilient to refactoring because they verify what the code does, not
how it does it. Contrast with white box testing.
Functional Acceptance Tests
Automated tests that verify a system behaves as specified. Functional acceptance tests
exercise end-to-end user workflows in a
production-like environment and confirm the implementation
matches the acceptance criteria. They answer “did we build what was specified?” rather than
“does the code work?” They do not validate whether the specification itself is correct -
only real user feedback can confirm we are building the right thing.
Test-Driven Development (TDD)
A development practice where tests are written before the production code that makes them
pass. TDD supports CD by ensuring high test coverage, driving simple design, and producing
a fast, reliable test suite. TDD feeds into the testing fundamentals
required in Phase 1.
Virtual Service
A test double that simulates a real external service over the network, responding to HTTP
requests with pre-configured or recorded responses. Unlike in-process stubs or mocks, a
virtual service runs as a standalone process and is accessed via real network calls, making
it suitable for functional testing and integration testing where your application needs to
make actual HTTP requests against a dependency. Tools such as WireMock, Mountebank, and
Hoverfly can create virtual services from recorded traffic or API specifications. See
Test Doubles.
White Box Testing
A testing approach where the test has knowledge of and asserts on internal implementation
details - specific methods called, call order, internal state, or code paths taken. White
box tests verify how the code works, not what it produces. These tests are fragile
because any refactoring of internals breaks them, even when behavior is unchanged. Avoid
white box testing in unit tests; prefer black box testing that asserts
on observable outcomes.
The practices that drive software delivery performance, as identified by DORA research.
The DevOps Research and Assessment (DORA) research program has identified practices that
predict high software delivery performance. These practices are not tools or technologies.
They are cultural conditions and behaviors that enable teams to deliver software quickly,
reliably, and sustainably.
This page organizes the DORA recommended practices by their relevance to each migration phase. Use it
as a reference to understand which practices you are building at each stage of your journey
and which ones to focus on next.
Using This Table
“Primary” means the phase where the practice is the main focus of improvement work.
“Ongoing” means the practice is relevant in every phase and should be continuously
nurtured. “Started” or “Expanded” means the practice is introduced or deepened in that
phase. No entry means the practice is not a primary concern in that phase, though it may
still be relevant.
These practices directly support the mechanics of getting software from commit to production.
They are the primary focus of Phases 1 and 2 of the migration.
Version Control
All production artifacts (application code, test code, infrastructure configuration,
deployment scripts, and database schemas) are stored in version control and can be
reproduced from a single source of truth.
Migration relevance: This is a prerequisite for Phase 1. If any part of your delivery
process depends on files stored on a specific person’s machine or a shared drive, address that
before beginning the migration.
Continuous Integration
Developers integrate their work to trunk at least daily. Each integration triggers an
automated build and test process. Broken builds are fixed within minutes.
Trunk-Based Development
Developers work in small batches and merge to trunk at least daily. Branches, if used, are
short-lived (less than one day). There are no long-lived feature branches.
Test Automation
A comprehensive suite of automated tests provides confidence that the software is deployable.
Tests are reliable, fast, and maintained as carefully as production code.
Test Data Management
Test data is managed in a way that allows automated tests to run independently, repeatably,
and without relying on shared mutable state. Tests can create and clean up their own data.
Shift Left on Security
Security is integrated into the development process rather than added as a gate at the end.
Automated security checks run in the pipeline. Security requirements are part of the
definition of deployable.
Migration relevance: Integrated during Phase 2: Pipeline Architecture
as automated quality gates rather than manual review steps.
Architecture Practices
These practices address the structural characteristics of your system that enable or prevent
independent, frequent deployment.
Loosely Coupled Architecture
Teams can deploy their services independently without coordinating with other teams. Changes
to one service do not require changes to other services. APIs have well-defined contracts.
These practices address how work is planned, prioritized, and delivered.
Customer Feedback
Product decisions are informed by direct feedback from customers. Teams can observe how
features are used in production and adjust accordingly.
Migration relevance: Becomes fully enabled in Phase 4: Deliver on Demand
when every change reaches production quickly enough for real customer feedback to inform
the next change.
Value Stream Visibility
The team has a clear view of the entire delivery process from request to production, including
wait times, handoffs, and rework loops.
Migration relevance: Phase 0: Value Stream Mapping.
This is the first activity in the migration because it informs every decision that follows.
Working in Small Batches
Work is broken down into small increments that can be completed, tested, and deployed
independently. Each increment delivers measurable value or validated learning.
Work-in-Process (WIP) Limits
Teams have explicit WIP limits that constrain the number of items in any stage of the delivery
process. WIP limits are enforced and respected.
Migration relevance: Phase 3: Limiting WIP. Reducing WIP
is one of the most effective ways to improve lead time and delivery predictability.
Visual Management
The state of all work is visible to the entire team through dashboards, boards, or other
visual tools. Anyone can see what is in progress, what is blocked, and what has been deployed.
Migration relevance: All phases. Visual management supports the identification of
constraints in Phase 0 and the enforcement of WIP limits in Phase 3.
Monitoring and Observability
Teams have access to production metrics, logs, and traces that allow them to understand system
behavior, detect issues, and diagnose problems quickly.
Proactive Notification
Teams are alerted to problems before customers are affected. Monitoring thresholds and
anomaly detection trigger notifications that enable rapid response.
Migration relevance: Becomes critical in Phase 4 when deployments are continuous and
automated. Proactive notification is what makes continuous deployment safe.
Collaboration Among Teams
Development, operations, security, and product teams work together rather than in silos.
Handoffs are minimized. Shared responsibility replaces blame.
Migration relevance: All phases, but especially Phase 2: Pipeline
where the pipeline must encode the quality criteria from all disciplines (security, testing,
operations) into automated gates.
Practices Relevant in Every Phase
The following practices are not tied to a specific migration phase. They are conditions
that support every phase and should be cultivated continuously throughout the migration.
Empowered Teams. Teams choose their own tools, technologies, and approaches within
organizational guardrails. Teams that cannot make local decisions about their pipeline, test
strategy, or deployment approach will be unable to iterate quickly enough to make progress.
Team Experimentation. Teams can try new ideas, tools, and approaches without requiring
lengthy approval. Failed experiments are treated as learning, not waste. The migration itself
is an experiment that requires psychological safety and organizational support.
Generative Culture. Following Ron Westrum’s typology, a generative culture is characterized
by high cooperation, shared risk, and focus on the mission. Teams in pathological or
bureaucratic cultures will struggle with every phase because practices like TBD and CI require
trust and psychological safety.
Learning Culture. The organization invests in learning. Teams have time for experimentation,
training, and knowledge sharing. The CD migration is a learning journey that requires time and
space to learn new practices, make mistakes, and improve.
Job Satisfaction. Team members find their work meaningful and have the autonomy and resources
to do it well. The migration should improve job satisfaction by reducing
toil and giving teams faster feedback. If the migration is experienced as a
burden, something is wrong with the approach.
Transformational Leadership. Leaders support the migration with vision, resources, and
organizational air cover. Without leadership support, the migration will stall when it
encounters the first organizational blocker.
Visual guide showing how CD practices depend on and build upon each other.
The full interactive dependency tree is at
practices.minimumcd.org. This page summarizes the key
dependency chains and how they map to the migration phases in this guide.
Continuous delivery is not a single practice you adopt. It is a system of interdependent
practices where each one supports and enables others. Understanding these dependencies helps
you plan your migration in the right order, addressing foundational practices before building
on them.
Using the Tree to Diagnose Problems
When something in your delivery process is not working, trace it through the dependency tree
to find the root cause.
Deployments keep failing.
Look at what feeds CD in the tree. Is your pipeline deterministic? Are you using immutable
artifacts? Is your application config externalized? The failure is likely in one of the
pipeline practices.
CI builds are constantly broken.
Look at what feeds CI. Are developers actually practicing TBD (integrating daily)? Is the test
suite reliable, or is it full of flaky tests? Is the build automated end-to-end? The broken
builds are a symptom of a problem in the development practices layer.
You cannot reduce batch size.
Look at what feeds small batches. Is work being decomposed into vertical slices? Are feature
flags available so partial work can be deployed safely? Is the architecture decoupled enough
to allow independent deployment? The batch size problem originates in one of these upstream
practices.
Every feature requires cross-team coordination to deploy.
Look at team structure. Are teams organized around domains they can deliver independently, or
around technical layers that force handoffs for every feature? If deploying a feature requires
the frontend team, backend team, and DBA team to coordinate a release window, the team
structure is preventing independent delivery. No amount of pipeline automation fixes this.
The team boundaries need to change.
Migration Tip
When you encounter a problem, resist the urge to fix the symptom. Use the
dependency tree to trace the problem to its root cause.
Fixing the symptom (for example, adding more manual testing to catch deployment failures) will
not solve the underlying issue and often adds toil that makes things worse. Fix the dependency
that is broken, and the downstream problem resolves itself.
Mapping to Migration Phases
The dependency tree directly informs the sequencing of migration phases: each dependency layer
maps to the migration phase where it becomes the primary focus, and the phases are ordered to
respect those dependencies. Development practices (BDD, trunk-based development) are
cross-cutting and support every phase, while team structure should be addressed early because
it constrains architecture and work decomposition.
Understanding the Dependency Model
How Dependencies Work
CD sits at the top of the tree. It depends directly on many practices, each of which has its own
dependencies. When practice A depends on practice B, it means B is a prerequisite or enabler
for A. You cannot reliably adopt A without B in place.
For example, continuous delivery depends directly on:
Continuous testing, automated database changes, and test environments
Integration: continuous integration
Environment: automated environment provisioning, monitoring and alerting
Organizational: cross-functional product teams, developer-driven support, prioritized features
Development: ATDD and modular system design
Each of these has its own dependency chain. The application pipeline alone depends on automated
testing, deployment automation, automated artifact versioning, and quality gates. Automated
testing in turn depends on build automation. Build automation depends on version control and
dependency management. The chain runs deep.
Key Dependency Chains
BDD enables testing enables CI enables CD
Behavior-Driven Development produces clear, testable acceptance criteria. Those criteria drive
functional testing and acceptance test-driven development. A comprehensive, fast test suite
enables Continuous Integration with confidence. And CI is the foundational prerequisite for CD.
If your team skips BDD, stories are ambiguous. If stories are ambiguous, tests are incomplete
or wrong. If tests are unreliable, CI is unreliable. And if CI is unreliable, CD is impossible.
Trunk-Based Development enables CI
CI requires that all developers integrate to a shared trunk at least once per day. If your team
uses long-lived feature branches, you are not doing CI regardless of how often your build server
runs. TBD is not optional for CD. It is a prerequisite.
Cross-functional teams enable component ownership enables modular systems
How teams are organized determines what they can deliver independently. A team organized around a
domain (owning the services, data, and interfaces for that domain) can decompose work into
vertical slices within their boundary and deploy without
coordinating with other teams. A team organized around a technical layer (the “frontend team,”
the “DBA team”) cannot. Every feature requires handoffs across layer teams, and deployment
requires coordinating all of them.
Conway’s Law makes this structural: the system’s architecture will mirror the team structure.
In the dependency tree, cross-functional product teams enable component ownership, which enables
the modular system design that CD requires.
Version control is the root of everything
Nearly every automation practice traces back to version control. Build automation, configuration
management, infrastructure automation, and component ownership all depend on it. If your version
control practices are weak (infrequent commits, poor branching discipline, configuration stored
outside version control), the entire tree above it is compromised.
8 - Glossary
Key terms and definitions used throughout this guide.
This glossary defines the terms used across every phase of the CD migration guide. Where a term
has a specific meaning within a migration phase, the relevant phase is noted.
A
Acceptance Criteria
Concrete expectations for a change, expressed as observable outcomes that can be used as fitness
functions - executed as deterministic tests or evaluated by review agents. In
ACD, acceptance criteria include a done definition (what
“done” looks like from an observer’s perspective) and an evaluation design (test cases with
known-good outputs). They constrain the agent: comprehensive criteria prevent incorrect code
from passing, while shallow criteria allow code that passes tests but violates intent. See
Acceptance Criteria.
The application of continuous delivery in environments where software changes are proposed by
AI agents. ACD extends CD with additional constraints, delivery artifacts, and pipeline
enforcement to reliably constrain agent autonomy without slowing delivery. ACD assumes the
team already practices continuous delivery. Without that foundation, the agentic extensions
have nothing to extend. See Agentic Continuous Delivery.
An AI system that uses tool calls in a loop to complete multi-step tasks autonomously. Unlike a
single LLM call that returns a response, an agent can invoke tools, observe results, and decide
what to do next until a goal is met or a stopping condition is reached. An agent’s behavior is
shaped by its prompt - the complete set of instructions, context, and constraints it receives at
the start of a session. See Agentic CD.
A packaged, versioned output of a build process (e.g., a container image, JAR file, or binary).
In a CD pipeline, artifacts are built once and promoted through environments without
modification. See Immutable Artifacts.
The set of delivery measurements taken before beginning a migration, used as the benchmark
against which improvement is tracked. See Phase 0 - Baseline Metrics.
The amount of change included in a single deployment. Smaller batches reduce risk, simplify
debugging, and shorten feedback loops. Reducing batch size is a core focus of
Phase 3 - Small Batches.
A collaboration practice where developers, testers, and product representatives define expected
behavior using structured examples before code is written. BDD produces executable
specifications that serve as both documentation and automated tests. BDD supports effective
work decomposition by forcing clarity about what a
story actually means before development begins.
A deployment strategy that maintains two identical production environments. New code is deployed
to the inactive environment, verified, and then traffic is switched. See
Progressive Rollout.
The elapsed time between creating a branch and merging it to trunk. CD requires branch lifetimes
measured in hours, not days or weeks. Long branch lifetimes are a symptom of poor work
decomposition or slow code review. See Trunk-Based Development.
A deployment strategy where a new version is rolled out to a small subset of users or servers
before full rollout. If the canary shows no issues, the deployment proceeds to 100%. See
Progressive Rollout.
The practice of ensuring that every change to the codebase is always in a deployable state and
can be released to production at any time through a fully automated pipeline. Continuous
delivery does not require that every change is deployed automatically, but it requires that
every change could be deployed automatically. This is the primary goal of this migration
guide.
The percentage of deployments to production that result in a degraded service and require
remediation (e.g., rollback, hotfix, or patch). One of the four DORA metrics. See
Metrics - Change Fail Rate.
The practice of integrating code changes to a shared trunk at least once per day, where each
integration is verified by an automated build and test suite. CI is a prerequisite for CD, not
a synonym. A team that runs automated builds on feature branches but merges weekly is not doing
CI. See Build Automation.
In the Theory of Constraints, the single factor most limiting the throughput of a system.
During a CD migration, your job is to find and fix constraints in order of impact. See
Identify Constraints.
The complete assembled input provided to an LLM for a single inference call. Context includes
the system prompt, tool definitions, any reference material or documents, conversation history,
and the current user request. “Context” and “prompt” are often used interchangeably; the
distinction is that “context” emphasizes what information is present, while “prompt” emphasizes
the structured input as a whole. Context is measured in tokens. As context grows, costs
and latency increase and performance can degrade when relevant information is buried far from
the end of the context. See Tokenomics.
The maximum number of tokens an LLM can process in a single call, spanning both input and
output. The context window is a hard limit; exceeding it requires truncation or a redesigned
approach. Large context windows (150,000+ tokens) create false confidence - more available
space does not mean better performance, and filling the window increases both latency and cost.
See Tokenomics.
An extension of continuous delivery where every change that passes the automated pipeline is
deployed to production without manual intervention. Continuous delivery ensures every change
can be deployed; continuous deployment ensures every change is deployed. See
Phase 4 - Deliver on Demand.
A change that has passed all automated quality gates defined by the team and is ready for
production deployment. The definition of deployable is codified in the pipeline, not decided
by a person at deployment time. See Deployable Definition.
The elapsed time from the first commit on a change to that change being deployable. This
measures the efficiency of your development and pipeline process, excluding upstream wait times.
See Metrics - Development Cycle Time.
Dependency
Code, service, or resource whose behavior is not defined in the current module. Dependencies
vary by location and ownership:
Internal dependency - code in another file or module within the same repository, or in
another repository your team controls. Internal dependencies share your release cycle and
your team can change them directly.
External dependency - a third-party library, external API, or
managed service outside your team’s direct control.
The distinction matters for testing. Internal dependencies are part of your own codebase and
should be exercised through real code paths in tests. Replacing them with
test doubles couples your tests to
implementation details and causes rippling failures during routine refactoring. Reserve test
doubles for external dependencies and runtime connections where real
invocation is impractical or non-deterministic.
The four key metrics identified by the DORA (DevOps Research and Assessment) research program
as predictive of software delivery performance: deployment frequency, lead time for changes,
change failure rate, and mean time to restore service. See DORA Recommended Practices.
A dependency on code or services outside your team’s direct control. External
dependencies include third-party libraries, public APIs, managed cloud services, and any
resource whose release cycle and availability your team cannot influence.
External dependencies are the primary case where test doubles add value. A test double for an
external API verifies your integration logic without relying on network availability or
third-party rate limits. By contrast, mocking internal code - another class in the same
repository or a module your team owns - creates fragile tests that break whenever the internal
implementation changes, even when the behavior is correct.
When evaluating whether to mock something, ask: “Can my team change this code and release it
in our pipeline?” If yes, it is an internal dependency and should be tested through real code
paths. If no, it is an external dependency and a test double is appropriate.
A team organized around user-facing features or customer journeys rather than owned product
subdomains. A feature team is cross-functional - it contains the skills to deliver a feature
end-to-end - but it does not own a stable domain of code. Multiple feature teams may modify
the same components, with no single team accountable for quality or consistency within them.
In practice: feature teams must re-orient on code they do not continuously maintain each time
a feature requires it; quality agreements cannot be enforced within the team because other
teams also modify the same code; and while feature teams appear to minimize inter-team
dependencies, they produce the opposite - everyone who can change a component is effectively
on the same large, loosely communicating team. Feature teams are structurally equivalent to
long-lived project teams.
A mechanism that allows code to be deployed to production with new functionality disabled,
then selectively enabled for specific users, percentages of traffic, or environments. Feature
flags decouple deployment from release. See Feature Flags.
The ratio of active work time to total elapsed time in a delivery process. A flow efficiency of
15% means that for every hour of actual work, roughly 5.7 hours are spent waiting. Value stream
mapping reveals your flow efficiency. See Value Stream Mapping.
A team that owns every layer of a user-facing capability - UI, API, and data store - and whose
public interface is designed for human users. A vertical slice for a full-stack product team
delivers one observable behavior from the user interface through to the database. The slice is
done when a user can observe the behavior through that interface. Contrast with
subdomain product team.
A branching model created by Vincent Driessen in 2010 that uses multiple long-lived branches
(main, develop, release/*, hotfix/*, feature/*) with specific merge rules and
directions. GitFlow was designed for infrequent, scheduled releases and is fundamentally
incompatible with continuous delivery because it defers integration, creates multiple paths
to production, and adds merge complexity. See the
TBD Migration Guide
for a step-by-step path from GitFlow to trunk-based development.
A dependency that must be resolved before work can proceed. In delivery, hard dependencies
include things like waiting for another team’s API, a shared database migration, or an
infrastructure provisioning request. Hard dependencies create queues and increase lead time.
Eliminating hard dependencies is a focus of
Architecture Decoupling.
A sprint dedicated to stabilizing and fixing defects before a release. The existence of
hardening sprints is a strong signal that quality is not being built in during regular
development. Teams practicing CD do not need hardening sprints because every commit is
deployable. See Testing Fundamentals.
An approach that frames every change as an experiment with a predicted outcome. Instead of
specifying a change as a requirement to implement, the team states a hypothesis: “We believe
[this change] will produce [this outcome] because [this reason].” After deployment, the team
validates whether the predicted outcome occurred. Changes that confirm the hypothesis build
confidence. Changes that refute it produce learning that informs the next hypothesis. This
creates a feedback loop where every deployed change generates a signal, whether it “succeeds”
or not. See Hypothesis-Driven Development
for the full lifecycle and
Agent Delivery Contract
for how hypotheses integrate with specification artifacts.
A build artifact that is never modified after creation. The same artifact that is tested in the
pipeline is the exact artifact that is deployed to production. Configuration differences between
environments are handled externally. See Immutable Artifacts.
The elapsed time from when a production incident is detected to when service is restored. One
of the four DORA metrics. Teams practicing CD have short MTTR because deployments are small,
rollback is automated, and the cause of failure is easy to identify. See
Metrics - Mean Time to Repair.
A single deployable application whose codebase is organized into well-defined modules with
explicit boundaries. Each module encapsulates a bounded domain and communicates with other
modules through defined interfaces, not by reaching into shared database tables or calling
internal methods directly. The application deploys as one unit, but its internal structure
allows teams to reason about, test, and change one module independently. See
Pipeline Reference Architecture and
Premature Microservices.
An agent that coordinates the work of other agents. The orchestrator receives a high-level goal,
breaks it into sub-tasks, delegates those sub-tasks to specialized sub-agents, and
assembles the results. Because orchestrators accumulate context across multiple steps, context
hygiene at agent boundaries is especially important - what the orchestrator passes to each
sub-agent is a cost and quality decision. See Tokenomics.
A test or staging environment that matches production in configuration, infrastructure, and
data characteristics. Testing in environments that differ from production is a common source
of deployment failures. See Production-Like Environments.
The complete structured input provided to an LLM for a single inference call. A prompt is not
a one- or two-sentence question. In production agentic systems, a prompt is a composed document
that typically includes: a system instruction block (role definition, constraints, output format
requirements), tool definitions, relevant context (documents, code, conversation history), and
the user’s request or task description. The system instruction block and tool definitions alone
can consume thousands of tokens before any user content is included. Understanding what a prompt
actually contains is a prerequisite for effective tokenomics. See
Tokenomics.
A server-side optimization where stable portions of a prompt are stored and reused across
repeated calls instead of being processed as new input each time. Effective caching requires
placing static content (system instructions, tool definitions, reference documents) at the
beginning of the prompt so cache hits cover the maximum token span. Dynamic content (user
request, current state) goes at the end where it does not invalidate the cached prefix.
See Tokenomics.
The ability to revert a production deployment to a previous known-good state. CD requires
automated rollback that takes minutes, not hours. See Rollback.
A dependency that can be worked around or deferred. Unlike hard dependencies, soft dependencies
do not block work but may influence sequencing or design decisions. Feature flags can turn many
hard dependencies into soft dependencies by allowing incomplete integrations to be deployed in
a disabled state.
Story Points
A relative estimation unit used by some teams to forecast effort. Story points are frequently
misused as a productivity metric, which creates perverse incentives to inflate estimates and
discourages the small work decomposition that CD requires. If your organization uses story
points as a velocity target, see Metrics-Driven Improvement.
A specialized agent invoked by an orchestrator to perform a specific,
well-defined task. Sub-agents should receive only the context relevant to their task - not
the orchestrator’s full accumulated context. Passing oversized context bundles to sub-agents
is a common source of unnecessary token consumption and can degrade performance by burying
relevant information. See Tokenomics.
A team that owns a bounded subdomain within a larger distributed system - full-stack within
their service (API, business logic, data store) but not directly user-facing. Their public
interface is designed for machines: other services or teams consume it through a defined API
contract. A vertical slice for a subdomain product team delivers one observable behavior
through that contract. The slice is done when the API satisfies the agreed behavior for its
service consumers. Contrast with full-stack product team.
The static, stable instruction block placed at the start of a prompt that establishes
the model’s role, constraints, output format requirements, and tool definitions. Unlike the
user-provided portion of the prompt, system prompts change rarely between calls and are the
primary candidates for prompt caching. Keeping the system prompt concise and
placing it first maximizes cache effectiveness and reduces per-call input costs.
See Tokenomics.
A source-control branching model where all developers integrate to a single shared branch
(trunk) at least once per day. Short-lived feature branches (less than a day) are acceptable.
Long-lived feature branches are not. TBD is a prerequisite for CI, which is in turn a
prerequisite for CD. See Trunk-Based Development.
The billing and capacity unit for LLMs. A token is roughly three-quarters of an English word.
All LLM costs, latency, and context limits are measured in tokens, not words, sentences, or
API calls. Input and output tokens are priced and counted separately. Output tokens typically
cost 2-5x more than input tokens because generating tokens is computationally more expensive
than reading them. Frontier models cost 10-20x more per token than smaller alternatives.
See Tokenomics.
Repetitive, manual work related to maintaining a production service that is automatable, has
no lasting value, and scales linearly with service size. Examples include manual deployments,
manual environment provisioning, and manual test execution. Eliminating toil is a primary
benefit of building a CD pipeline.
Work that arrives outside the planned backlog - production incidents, urgent bug fixes,
ad hoc requests. High levels of unplanned work indicate systemic quality or operational
problems. Teams with high change failure rates generate their own unplanned work through
failed deployments. Reducing unplanned work is a natural outcome of improving change failure
rate through CD practices.
A visual representation of every step required to deliver a change from request to production,
showing process time, wait time, and percent complete and accurate at each step. The
foundational tool for Phase 0 - Assess.
A user story that delivers a thin slice of functionality across all layers of the system
(UI, API, database, etc.) rather than a horizontal slice that implements one layer completely.
Vertical slices are independently deployable and testable, which is essential for CD. Vertical
slicing is a core technique in Work Decomposition.
The number of work items that have been started but not yet completed. High WIP increases lead
time, reduces focus, and increases context-switching overhead. Limiting WIP is a key practice
in Phase 3 - Limiting WIP.
An explicit, documented set of team norms covering how work is defined, reviewed, tested, and
deployed. Working agreements create shared expectations and reduce friction. See
Working Agreements.
Frequently asked questions about continuous delivery and this migration guide.
About This Guide
Why does this migration guide exist?
Many teams say they want to adopt continuous delivery but do not know where to start. The CD
landscape is full of tools, frameworks, and advice, but there is no clear, sequenced path from
“we deploy monthly” to “we can deploy any change at any time.” This guide provides that path.
It is built on the MinimumCD definition of continuous delivery and
draws on practices from the Dojo Consortium and the
DORA research. The content is organized as a phased migration journey
from your current state to continuous delivery rather than as a description of what CD looks
like when you are already there.
Who is this guide for?
This guide is for development teams, tech leads, and engineering managers who want to improve
their software delivery practices. It is designed for teams that are currently deploying
infrequently (monthly, quarterly, or less) and want to reach a state where any change can be
deployed to production at any time.
You do not need to be starting from zero. If your team already has CI in place, you can begin
with Phase 2: Pipeline. If you have a pipeline but deploy infrequently, start
with Phase 3: Optimize. Use the Phase 0 assessment to find your
starting point.
Should we adopt this guide as an organization or as a team?
Start with a single team. CD adoption works best when a team can experiment, learn, and iterate
without waiting for organizational consensus. Once one team demonstrates results (shorter lead
times, lower change failure rate, more frequent deployments), other teams will have a concrete
example to follow.
Organizational adoption comes after team adoption, not before. The role of organizational
leadership is to create the conditions for teams to succeed: stable team composition, tool
funding, policy flexibility for deployment processes, and protection from pressure to cut
corners on quality.
How do we use this guide for improvement?
Start with Phase 0: Assess. Map your value stream, measure your current
performance, and identify your top constraints. Then work through the phases in order, focusing
on one constraint at a time.
The guide is not a checklist to complete in sequence. It is a reference that helps you decide
what to work on next. Some teams will spend months in Phase 1 building testing fundamentals.
Others will move quickly to Phase 2 because they already have strong development practices.
Your value stream map and metrics tell you where to invest.
Revisit your assessment periodically. As you improve, new constraints will emerge. The phases
give you a framework for addressing them.
Continuous Delivery Concepts
What is the difference between continuous delivery and continuous deployment?
Continuous delivery means every change to the codebase is always in a deployable state and
can be released to production at any time through a fully automated pipeline. The decision to
deploy may still be made by a human, but the capability to deploy is always present.
Continuous deployment is an extension of continuous delivery where every change that passes
the automated pipeline is deployed to production without manual intervention.
This migration guide takes you through continuous delivery (Phases 0-3) and then to continuous
deployment (Phase 4). Continuous delivery is the prerequisite. You cannot safely automate
deployment decisions until your pipeline reliably determines what is deployable.
Is continuous delivery the same as having a CD pipeline?
No. Many teams have a CD pipeline tool (Jenkins, GitHub Actions, GitLab CI, etc.) but are
not practicing continuous delivery. A pipeline tool is necessary but not sufficient.
Continuous delivery also requires trunk-based development, comprehensive test automation, a
single path to production, immutable artifacts, and the ability to deploy any green build.
If your team has a pipeline but uses long-lived feature branches, deploys only at the end of a
sprint, or requires manual testing before a release, you have a pipeline tool but you are not
practicing continuous delivery. The current-state checklist
in Phase 0 helps you assess the gap.
What does “the pipeline is the only path to production” mean?
It means there is exactly one way for any change to reach production: through the automated
pipeline. No one can SSH into a server and make a change. No one can skip the test suite for
an “urgent” fix. No one can deploy from their local machine.
This constraint is what gives you confidence. If every change in production has been through
the same build, test, and deployment process, you know what is running and how it got there.
If exceptions are allowed, you lose that guarantee, and your ability to reason about production
state degrades.
During your migration, establishing this single path is a key milestone in
Phase 2.
What does “application configuration” mean in the context of CD?
Application configuration refers to values that change between environments but are not part of
the application code: database connection strings, API endpoints, feature flag states, logging
levels, and similar settings.
In a CD pipeline, configuration is externalized. It lives outside the artifact and is injected
at deployment time. This is what makes immutable artifacts
possible. You build the artifact once and deploy it to any environment by providing the
appropriate configuration.
If configuration is embedded in the artifact (for example, hardcoded URLs or environment-specific
config files baked into a container image), you must rebuild the artifact for each environment,
which means the artifact you tested is not the artifact you deploy. This breaks the immutability
guarantee. See Application Config.
What is an “immutable artifact” and why does it matter?
An immutable artifact is a build output (container image, binary, package) that is never
modified after it is created. The exact artifact that passes your test suite is the exact
artifact that is deployed to staging, and then to production. Nothing is recompiled, repackaged,
or patched between environments.
This matters because it eliminates an entire category of deployment failures: “it worked in
staging but not in production” caused by differences in the build. If the same bytes are
deployed everywhere, build-related discrepancies are impossible.
Immutability requires externalizing configuration (see above) and storing artifacts in a
registry or repository. See Immutable Artifacts.
What does “deployable” mean?
A change is deployable when it has passed all automated quality gates defined in the pipeline.
The definition is codified in the pipeline itself, not decided by a person at deployment time.
Typical gates include:
The artifact is built and stored in the artifact registry
Deployment to a production-like environment succeeds
Smoke tests in the production-like environment pass
If any of these gates fail, the change is not deployable. The pipeline makes this determination
automatically and consistently. See Deployable Definition.
What is the difference between deployment and release?
Deployment is the act of putting code into a production environment.
Release is the act of making functionality available to users.
These are different events, and decoupling them is one of the most powerful techniques in CD.
You can deploy code to production without releasing it to users by using
feature flags. The code is running in production, but the new
functionality is disabled. When you are ready, you enable the flag and the feature is released.
This decoupling is important because it separates the technical risk (will the deployment
succeed?) from the business risk (will users like the feature?). You can manage each risk
independently. Deployments become routine technical events. Releases become deliberate business
decisions.
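A minimal sketch of the mechanism, assuming a hypothetical feature flag client in JavaScript:
// Deployed to production, but the new flow is released only where the flag is enabled
async function checkout(cart, user, flags) {
  if (await flags.isEnabled("new-checkout-flow", user)) {
    return newCheckoutFlow(cart, user); // released to the users the flag targets
  }
  return legacyCheckoutFlow(cart, user); // everyone else keeps the current behavior
}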
Migration Questions
How long does the migration take?
It depends on where you start and how much organizational support you have. As a rough guide:
Phase 0 (Assess): 1-2 weeks
Phase 1 (Foundations): 1-6 months, depending on current testing and TBD maturity
Phase 2 (Pipeline): 1-3 months
Phase 3 (Optimize): 2-6 months
Phase 4 (Deliver on Demand): 1-3 months
These ranges assume a single team working on the migration alongside regular delivery work.
The biggest variable is Phase 1: teams with no test automation or TBD practice will spend
longer building foundations than teams that already have these in place.
Do not treat these timelines as commitments. The migration is an iterative improvement process,
not a project with a deadline.
Do we stop delivering features during the migration?
No. The migration is done alongside regular delivery work, not instead of it. Each migration
practice is adopted incrementally: you do not stop the world to rewrite your test suite or
redesign your pipeline.
For example, in Phase 1 you adopt trunk-based development by reducing branch lifetimes
gradually: from two weeks to one week to two days to same-day. You add automated tests
incrementally, starting with the highest-risk code paths. You decompose work into smaller
stories one sprint at a time.
The migration practices themselves improve your delivery speed, so the investment pays off
as you go. Teams that have completed Phase 1 typically report delivering features faster than
before, not slower.
What if our organization requires manual change approval (CAB)?
Many organizations have Change Advisory Board (CAB) processes that require manual approval
before production deployments. This is one of the most common organizational blockers for CD.
The path forward is to replace the manual approval with automated evidence: a mature CD
pipeline provides stronger safety guarantees than a committee meeting, and your DORA metrics
can demonstrate this. Most CAB processes were designed for monthly releases with hundreds of
changes per batch; when you deploy daily with one or two changes, the risk profile is
fundamentally different. See CAB Gates
for a detailed approach to this transition.
What if we have a monolithic architecture?
You can practice continuous delivery with a monolith. CD does not require microservices. Many
of the highest-performing teams in the DORA research deploy monolithic applications multiple
times per day.
What matters is that your architecture supports independent testing and deployment. A
well-structured monolith with a comprehensive test suite and a reliable pipeline can achieve
CD. A poorly structured collection of microservices with shared databases and coordinated
releases cannot.
Architecture decoupling is addressed in Phase 3, but
it is about enabling independent deployment and reducing coordination costs, not about adopting
any particular architectural style.
What if our tests are slow or unreliable?
This is one of the most common starting conditions. A slow or flaky test suite undermines
every CD practice: developers stop trusting the tests, broken builds are ignored, and the
pipeline becomes a bottleneck rather than an enabler. The fix is incremental: quarantine
flaky tests, parallelize execution, rebalance toward fast unit tests, and set a pipeline
time budget (under 10 minutes). See
Testing Fundamentals and the
Testing reference section for detailed guidance.
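As one way to apply the quarantine advice, here is a hedged pytest sketch using a custom marker; the marker name is an assumption, and any marker you register in your pytest configuration works the same way. The main pipeline runs `pytest -m "not quarantine"` so flaky tests cannot block the build, while a separate job keeps running them until they are fixed or deleted.

```python
import pytest

# Hypothetical convention: known-flaky tests carry a "quarantine" marker and are
# excluded from the trunk pipeline; deterministic tests stay in the main suite.
@pytest.mark.quarantine
def test_inventory_sync_retries_on_timeout():
    ...  # intermittently fails; quarantined until stabilized

def test_inventory_sync_happy_path():
    assert True  # fast and deterministic; runs on every commit
```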
Where do I start if I am not sure which phase applies to us?
If you do not have time for a full assessment, ask yourself these questions:
Do all developers integrate to trunk at least daily? If no, start with Phase 1.
Do you have a single automated pipeline that every change goes through? If no, start with Phase 2.
Can you deploy any green build to production on demand? If no, focus on the gap between your current state and Phase 2 completion criteria.
Do you deploy at least weekly? If no, look at Phase 3 for batch size and flow optimization.
Is CD about speed or quality?
Quality. The purpose of the pipeline is to validate that an artifact is production-worthy or
reject it. Do not chase daily deployments without first building confidence in your ability to
detect failure. Move validation as close to the developer as possible: run it on the desktop,
run it again on merge to trunk, run it again when the trunk changes.
Testing is not limited to functional tests. You need to test for security, compliance,
performance, and everything else required in your context. Set error budgets and do not exceed
them. When your error budget is spent, stop shipping features and invest in pipeline
hardening. When something breaks in production, harden the pipeline. When exploratory testing
uncovers an edge case, harden the pipeline. The primary goal is to build efficient and
effective quality gates. Only then can you move quickly.
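As a worked illustration of the error-budget idea, a 99.9% availability target over a 30-day window leaves roughly 43 minutes of budget; the target and window here are illustrative, not prescribed by this guide.

```python
# Illustrative error-budget arithmetic: when the budget is spent, feature work
# pauses and pipeline or production hardening takes priority.
slo = 0.999
window_minutes = 30 * 24 * 60                      # 43,200 minutes in a 30-day window
error_budget_minutes = (1 - slo) * window_minutes
print(f"{error_budget_minutes:.1f} minutes of error budget")  # 43.2
```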
10 - Resources
Books, videos, and further reading on continuous delivery and deployment.
This page collects the books, websites, and videos that inform the practices in this migration
guide. Resources are organized by topic and annotated with which migration phase they are most
relevant to.
Books
Continuous Delivery and Deployment
Modern Software Engineering by Dave Farley
Farley’s broader take on what it means to do software engineering well. Covers the principles
behind CD - iterating toward a goal, getting fast feedback, working in small steps - and
connects them to test-driven development, managing complexity, and designing for testability.
Useful for teams that want to understand the why behind CD practices, not just the how.
Most relevant to: All phases
Continuous Delivery Pipelines by Dave Farley
A practical, focused guide to building CD pipelines. Farley covers pipeline design, testing
strategies, and deployment patterns in a direct, implementation-oriented style. Start here
if you want a concise guide to the pipeline practices in Phase 2.
Continuous Delivery by Jez Humble and David Farley
The foundational text on CD. Published in 2010, it remains the most comprehensive treatment
of the principles and practices that make continuous delivery work. Covers version control
patterns, build automation, testing strategies, deployment pipelines, and release management.
If you read one book before starting your migration, read this one.
Most relevant to: All phases
Accelerate by Nicole Forsgren, Jez Humble, and Gene Kim
Presents the DORA research findings that link technical practices to organizational
performance. Covers the four key metrics (deployment frequency, lead time, change failure
rate, MTTR) and the capabilities that predict high performance. Essential reading for anyone
who needs to make the business case for a CD migration.
Engineering the Digital Transformation by Gary Gruver
Addresses the organizational and leadership challenges of large-scale delivery
transformation. Gruver draws on his experience leading transformations at HP and other large
enterprises. Particularly valuable for leaders sponsoring a migration who need to understand
the change management, communication, and sequencing challenges ahead.
Most relevant to: Organizational leadership across all phases
Release It! by Michael T. Nygard
Covers the design and architecture patterns that make production systems resilient. Topics
include stability patterns (circuit breakers, bulkheads, timeouts), deployment patterns, and
the operational realities of running software at scale. Essential reading before entering
Phase 4, where the team has the capability to deploy any change on demand.
The DevOps Handbook by Gene Kim, Jez Humble, Patrick Debois, and John Willis
A practical companion to The Phoenix Project. Covers the Three Ways (flow, feedback, and
continuous learning) and provides detailed guidance on implementing DevOps practices. Useful
as a reference throughout the migration.
Most relevant to: All phases
The Phoenix Project by Gene Kim, Kevin Behr, and George Spafford
A novel that illustrates DevOps principles through the story of a fictional IT organization
in crisis. Useful for building organizational understanding of why delivery improvement
matters, especially for stakeholders who will not read a technical book.
Most relevant to: Building organizational buy-in during Phase 0
Testing
Growing Object-Oriented Software, Guided by Tests by Steve Freeman and Nat Pryce
The definitive guide to test-driven development in practice. Goes beyond unit testing to
cover acceptance testing, test doubles, and how TDD drives design. Essential reading for
Phase 1 testing fundamentals.
Working Effectively with Legacy Code by Michael Feathers
Practical techniques for adding tests to untested code, breaking dependencies, and
incrementally improving code that was not designed for testability. Indispensable if your
migration starts with a codebase that has little or no automated testing.
User Story Mapping by Jeff Patton
A practical guide to breaking features into deliverable increments using story maps. Patton’s
approach directly supports the vertical slicing discipline required for small batch delivery.
The Principles of Product Development Flow by Donald Reinertsen
A rigorous treatment of flow economics in product development. Covers queue theory, batch
size economics, WIP limits, and the cost of delay. Dense but transformative. Reading this
book will change how you think about every aspect of your delivery process.
Making Work Visible by Dominica DeGrandis
Focuses on identifying and eliminating the “time thieves” that steal productivity: too much
WIP, unknown dependencies, unplanned work, conflicting priorities, and neglected work. A
practical companion to the WIP limiting practices in Phase 3.
Refactoring Databases: Evolutionary Database Design by Scott Ambler and Pramod Sadalage
The definitive guide to managing database schema changes incrementally. Covers expand-contract
migrations, backward-compatible schema changes, and techniques for evolving databases without
downtime. Essential reading for teams whose deployment pipeline includes database changes.
Covers the architectural patterns that enable independent deployment, including service
boundaries, API design, data management, and testing strategies for distributed systems.
Team Topologies by Matthew Skelton and Manuel Pais
Addresses the relationship between team structure and software architecture (Conway’s Law in
practice). Covers team types, interaction modes, and how to evolve team structures to support
fast flow. Valuable for addressing the organizational blockers that surface throughout the
migration.
Most relevant to: Organizational design across all phases
Websites
Defines the minimum set of practices required to claim you are doing continuous delivery.
This migration guide uses the MinimumCD definition as its target state. Start here to
understand what CD actually requires.
A community-maintained collection of CD practices, metrics definitions, and improvement
patterns. Many of the definitions and frameworks in this guide are adapted from the Dojo
Consortium’s work.
The DevOps Research and Assessment site, which publishes the annual State of DevOps report
and provides resources for measuring and improving delivery performance.
The comprehensive reference for trunk-based development patterns. Covers short-lived
feature branches, feature flags, branch by abstraction, and release branching strategies.
Martin Fowler’s site contains authoritative articles on continuous integration, continuous
delivery, microservices, refactoring, and software design. Key articles include
“Continuous Integration” and “Continuous Delivery.”
Videos
Dave Farley’s YouTube channel provides weekly videos covering CD practices, pipeline design,
testing strategies, and software engineering principles. Accessible and practical.
Most relevant to: All phases
“Continuous Delivery” by Jez Humble (various conference talks)
Jez Humble’s conference presentations cover the principles and research behind CD. His talk
“Why Continuous Delivery?” is an excellent introduction for teams and stakeholders who are
new to the concept.
Most relevant to: Building understanding during Phase 0
“Refactoring” and “TDD” talks by Martin Fowler and Kent Beck
Foundational talks on the development practices that support CD. Understanding TDD and
refactoring is essential for Phase 1 testing fundamentals.
“The Smallest Thing That Could Possibly Work” by Bryan Finster
Covers the work decomposition and small batch delivery practices that are central to this
migration guide. Focuses on practical techniques for breaking work into vertical slices.
A concrete walkthrough of a production deployment pipeline in a regulated financial services
environment. Demonstrates that CD practices are compatible with compliance requirements.
An article-length overview of deployment pipeline structure, covering commit stage, acceptance
testing, and release stages. A good companion to the pipeline phase of this guide.
If you are starting your migration and want to read in the most useful order:
Accelerate, to understand the research and build the business case
Continuous Delivery (Humble & Farley), to understand the full picture
Continuous Delivery Pipelines (Farley), for practical pipeline implementation
Working Effectively with Legacy Code, if your codebase lacks tests
The Principles of Product Development Flow, to understand flow optimization
Release It!, before moving to continuous deployment
Migration Tip
You do not need to read all of these before starting your migration. Start with the practices
in Phase 1, read Accelerate for the business case, and refer to the other resources as you
reach the relevant migration phase. The most important thing is to start delivering
improvements, not to finish a reading list.