Testing

Test architecture, types, and best practices for building confidence in your delivery pipeline.

A reliable test suite is essential for continuous delivery. This page describes the test architecture that gives your pipeline the confidence to deploy any change - even when dependencies outside your control are unavailable. The child pages cover each test type in detail.

Beyond the Test Pyramid

The test pyramid - many unit tests at the base, fewer integration tests in the middle, a handful of end-to-end tests at the top - has been the dominant mental model for test strategy since Mike Cohn introduced it. The core insight is sound: push testing as low as possible. Lower-level tests are faster, more deterministic, and cheaper to maintain. Higher-level tests are slower, more brittle, and more expensive.

But as a prescriptive model, the pyramid is overly simplistic. Teams that treat it as a rigid ratio end up in unproductive debates about whether they have “too many” integration tests or “not enough” unit tests. The shape of your test distribution matters far less than whether your tests, taken together, give you the confidence to deploy.

What actually matters

The pyramid’s principle - write tests at different levels of granularity - remains correct. But for CD, the question is not “do we have the right pyramid shape?” The question is:

Can our pipeline determine that a change is safe to deploy without depending on any system we do not control?

This reframes the testing conversation. Instead of counting tests by type and trying to match a diagram, you design a test architecture where:

  1. Fast, deterministic tests catch the vast majority of defects and run on every commit. These tests use test doubles for anything outside the team’s control. They give you a reliable go/no-go signal in minutes.

  2. Contract tests verify that your test doubles still match reality. They run asynchronously and catch drift between your assumptions and the real world - without blocking your pipeline.

  3. A small number of non-deterministic tests validate that the fully integrated system works. These run post-deployment and provide monitoring, not gating.

This structure means your pipeline can confidently say “yes, deploy this” even if a downstream API is having an outage, a third-party service is slow, or a partner team hasn’t deployed their latest changes yet. Your ability to deliver is decoupled from the reliability of systems you do not own.
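A minimal sketch of the first layer's idea, using hypothetical names (`charge_order`, a payment `gateway`): the external dependency is replaced with a test double so the test is fast, deterministic, and runs with no network access.

```python
# Hypothetical example: a deterministic test that stubs a third-party
# payment gateway so the pipeline never depends on its uptime.
from unittest.mock import Mock


def charge_order(gateway, order_total_cents):
    """Charge an order via the gateway; True if the charge was approved."""
    response = gateway.charge(amount=order_total_cents)
    return response["status"] == "approved"


def test_charge_order_succeeds_when_gateway_approves():
    # The Mock stands in for the real gateway client.
    gateway = Mock()
    gateway.charge.return_value = {"status": "approved"}

    assert charge_order(gateway, 1999) is True
    gateway.charge.assert_called_once_with(amount=1999)
```

Because nothing here touches the real gateway, this test gives the same answer whether the third party is up, down, or slow - which is exactly the property the deploy gate needs.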

The anti-pattern: the ice cream cone

Most teams that struggle with CD have an inverted test distribution - too many slow, expensive end-to-end tests and too few fast, focused tests.

[Figure: the ice cream cone anti-pattern - an inverted test distribution where most testing effort goes to manual and end-to-end tests at the top, with too few fast unit tests at the bottom]

The ice cream cone makes CD impossible. Manual testing gates block every release. End-to-end tests take hours, fail randomly, and depend on external systems being healthy. The pipeline cannot give a fast, reliable answer about deployability, so deployments become high-ceremony events.

Test Architecture

A test architecture is the deliberate structure of how different test types work together across your pipeline to give you deployment confidence. Each layer has a specific role, and the layers reinforce each other.

| Layer | Test Type | Role | Deterministic? | Details |
|-------|-----------|------|----------------|---------|
| 1 | Unit Tests | Verify behavior in isolation - catch logic errors, regressions, and edge cases instantly | Yes | Fastest feedback loop; use test doubles for external dependencies |
| 2 | Integration Tests | Verify boundaries - catch mismatched interfaces, serialization errors, query bugs | Yes | Fast enough to run on every commit |
| 3 | Functional Tests | Verify your system works as a complete unit in isolation | Yes | Proves the system handles interactions correctly with all external dependencies stubbed |
| 4 | Contract Tests | Verify your test doubles still match reality | No | Runs asynchronously; failures trigger review, not pipeline blocks |
| 5 | End-to-End Tests | Verify complete user journeys through the fully integrated system | No | Monitoring, not gating - runs post-deployment |

Static Analysis runs alongside layers 1-3, catching code quality, security, and style issues without executing the code. Test Doubles are used throughout layers 1-3 to isolate external dependencies.

How the layers work together

Test layers by pipeline stage
Pipeline stage    Test layer              Deterministic?   Blocks deploy?
─────────────────────────────────────────────────────────────────────────
On every commit   Unit tests              Yes              Yes
                  Integration tests       Yes              Yes
                  Functional tests        Yes              Yes

Asynchronous      Contract tests          No               No (triggers review)

Post-deployment   E2E smoke tests         No               Triggers rollback if critical
                  Synthetic monitoring    No               Triggers alerts

The critical insight: everything that blocks deployment is deterministic and under your control. Everything that involves external systems runs asynchronously or post-deployment. This is what gives you the independence to deploy any time, regardless of the state of the world around you.

Pre-merge vs post-merge

The table above maps to two distinct phases of your pipeline, each with different goals and constraints.

Pre-merge (before code lands on trunk): Run unit, integration, and functional tests. These must all be deterministic and fast. Target: under 10 minutes total. This is the quality gate that every change must pass. If pre-merge tests are slow, developers batch up changes or skip local runs, both of which undermine continuous integration.

Post-merge (after code lands on trunk, before or after deployment): Re-run the full deterministic suite against the integrated trunk to catch merge-order interactions. Run contract tests, E2E smoke tests, and synthetic monitoring. Target: under 30 minutes for the full post-merge cycle.

Why re-run pre-merge tests post-merge? Two changes can each pass pre-merge independently but conflict when combined on trunk. The post-merge run catches these integration effects. If a post-merge failure occurs, the team fixes it immediately - trunk must always be releasable.

Testing Matrix

Use this reference to decide what type of test to write and where it runs in your pipeline.

| What You Need to Verify | Test Type | Speed | Deterministic? | Blocks Deploy? |
|-------------------------|-----------|-------|----------------|----------------|
| A function or method behaves correctly | Unit | Milliseconds | Yes | Yes |
| Components interact correctly at a boundary | Integration | Milliseconds to seconds | Yes | Yes |
| Your whole service works in isolation | Functional | Seconds | Yes | Yes |
| Your test doubles match reality | Contract | Seconds | No | No |
| A critical user journey works end-to-end | E2E | Minutes | No | No |
| Code quality, security, and style compliance | Static Analysis | Seconds | Yes | Yes |
| UI meets WCAG accessibility standards | Static Analysis + Functional | Seconds | Yes | Yes |

Best Practices

Do

  • Run tests on every commit. If tests do not run automatically, they will be skipped.
  • Keep the deterministic suite under 10 minutes. If it is slower, developers will stop running it locally.
  • Fix broken tests immediately. A broken test is equivalent to a broken build.
  • Delete tests that do not provide value. A test that never fails and tests trivial behavior is maintenance cost with no benefit.
  • Test behavior, not implementation. Use a black box approach - verify what the code does, not how it does it. As Ham Vocke advises: “if I enter values x and y, will the result be z?” - not the sequence of internal calls that produce z. Avoid white box testing that asserts on internals.
  • Use test doubles for external dependencies. Your deterministic tests should run without network access to external systems.
  • Validate test doubles with contract tests. Test doubles that drift from reality give false confidence.
  • Treat test code as production code. Give it the same care, review, and refactoring attention.
  • Run automated accessibility checks on every commit. WCAG compliance scans are fast, deterministic, and catch violations that are invisible to sighted developers. Treat them like security scans: automate the detectable rules and reserve manual review for subjective judgment.
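The black box advice above can be made concrete with a small sketch (the function and values are hypothetical): the test asserts only on the observable result, never on internal call sequences.

```python
# Illustrative example of testing behavior, not implementation.


def apply_discount(price_cents, discount_pct):
    """Return the discounted price in whole cents, rounded down."""
    return price_cents * (100 - discount_pct) // 100


def test_apply_discount_behavior():
    # Black box: "if I enter values x and y, will the result be z?"
    assert apply_discount(2000, 25) == 1500

    # A white box test would instead assert that some internal helper
    # was called in a particular order - brittle, and avoided here.
```

Because the test knows nothing about how the discount is computed, the implementation can be refactored freely without touching the test.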

Do Not

  • Do not tolerate flaky tests. Quarantine or delete them immediately.
  • Do not gate your pipeline on non-deterministic tests. E2E and contract test failures should trigger review or alerts, not block deployment.
  • Do not couple your deployment to external system availability. If a third-party API being down prevents you from deploying, your test architecture has a critical gap.
  • Do not write tests after the fact as a checkbox exercise. Tests written without understanding the behavior they verify add noise, not value.
  • Do not test private methods directly. Test the public interface; private methods are tested indirectly.
  • Do not share mutable state between tests. Each test should set up and tear down its own state.
  • Do not use sleep/wait for timing-dependent tests. Use explicit waits, polling, or event-driven assertions.
  • Do not require a running database or external service for unit tests. That makes them integration tests - which is fine, but categorize them correctly.
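One way to replace sleep-based waits, sketched with a hypothetical helper: poll the condition on a short interval with an explicit deadline, so the test passes as soon as the condition holds instead of always paying a fixed delay.

```python
# A minimal polling helper (illustrative, not a library API) as an
# alternative to a bare sleep() in timing-dependent tests.
import time


def wait_for(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns True or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return condition()  # one final check at the deadline


def test_background_flag_is_set():
    state = {"done": False}
    state["done"] = True  # stands in for a background task completing
    assert wait_for(lambda: state["done"], timeout=1.0)
```

Test frameworks and browser drivers usually ship an equivalent explicit-wait mechanism; prefer the built-in one when it exists.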

Test Types

| Type | Purpose |
|------|---------|
| Unit Tests | Verify individual components in isolation |
| Integration Tests | Verify components work together |
| Functional Tests | Verify user-facing behavior |
| End-to-End Tests | Verify complete user workflows |
| Contract Tests | Verify API contracts between services |
| Static Analysis | Catch issues without running code |
| Test Doubles | Patterns for isolating dependencies in tests |
| Feedback Speed | Why test suite speed matters and the cognitive science behind the targets |

Content contributed by Dojo Consortium, licensed under CC BY 4.0. Additional concepts drawn from Ham Vocke, The Practical Test Pyramid, and Toby Clemson, Testing Strategies in a Microservice Architecture.


Unit Tests

Fast, deterministic tests that verify a unit of behavior through its public interface, asserting on what the code does rather than how it works.

Integration Tests

Deterministic tests that verify how units interact together or with external system boundaries using test doubles for non-deterministic dependencies.

Functional Tests

Deterministic tests that verify all modules of a sub-system work together from the actor’s perspective, using test doubles for external dependencies.

End-to-End Tests

Non-deterministic tests that validate the entire software system along with its integration with external interfaces and production-like scenarios.

Contract Tests

Non-deterministic tests that validate test doubles by verifying API contract format against live external systems.
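A rough sketch of the idea, with an assumed endpoint and field names chosen purely for illustration: fetch a sample response from the live system and check that it still carries every field the test double promises.

```python
# Hypothetical contract check: does the real API response still contain
# the fields our stub returns? Runs asynchronously, outside the deploy
# gate; a failure means "update the stub", not "block the release".
import json
from urllib.request import urlopen

STUBBED_FIELDS = {"id", "status", "amount"}  # fields our test double returns


def contract_drift(payload, expected_fields=STUBBED_FIELDS):
    """Return the stubbed fields missing from a real response payload."""
    return sorted(set(expected_fields) - set(payload))


def check_payment_contract(base_url):
    # The URL path is an assumption for this sketch.
    with urlopen(f"{base_url}/payments/sample") as resp:
        payload = json.load(resp)
    missing = contract_drift(payload)
    assert not missing, f"test double has drifted from reality: {missing}"
```

Purpose-built tools such as Pact formalize this pattern; the sketch only shows why the check can stay out of the deterministic gate.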

Static Analysis

Code analysis tools that evaluate non-running code for security vulnerabilities, complexity, and best practice violations.

Test Doubles

Patterns for isolating dependencies in tests: stubs, mocks, fakes, spies, and dummies.
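Three of those patterns can be sketched in a few lines (illustrative classes, not a library): a stub returns canned data, a fake provides a working in-memory substitute, and a spy records calls for later assertions.

```python
# Minimal test-double sketches (hypothetical names).


class StubClock:
    """Stub: returns a canned answer; no real time source involved."""
    def now(self):
        return 1_700_000_000


class FakeKeyValueStore:
    """Fake: a working in-memory substitute for a real database."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class SpyNotifier:
    """Spy: records every message so a test can assert on them afterward."""
    def __init__(self):
        self.sent = []

    def send(self, message):
        self.sent.append(message)
```

Mocks add expectation-checking on top of a spy, and dummies are placeholders that are passed but never used; the child page covers all five.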

Test Feedback Speed

Why test suite speed matters for developer effectiveness and how cognitive limits set the targets.

Testing Glossary

Definitions for testing terms as they are used on this site.