Architecting Tests for CD

Test architecture, types, and good practices for building confidence in your delivery pipeline.

A test architecture that lets your pipeline deploy confidently, regardless of external system availability, is a core CD capability. The child pages cover each test type.

A CD pipeline’s job is to force every artifact to prove it is worthy of delivery. That proof only works when test changes ship with the code they validate. If a developer adds a feature but the corresponding tests arrive in a later commit, the pipeline approved an artifact it never actually verified. That is not a CD pipeline. It is a CI pipeline with a deploy step. Tests and production code must always travel together through the pipeline as a single unit of change.

Beyond the Test Pyramid

The test pyramid: a triangle with Unit Tests at the wide base (fast, cheap, many), Integration/Component in the middle, and End-to-End at the narrow top (slow, expensive, few). Arrows on the sides indicate cost and speed increase toward the top.

The test pyramid says: write many fast unit tests at the base, fewer integration tests in the middle, and only a handful of end-to-end tests at the top. The underlying principle is sound - lower-level tests are faster, more deterministic, and cheaper to maintain.

The principle behind the shape

The pyramid’s shape communicates a principle: prefer fast, deterministic tests that you fully control. Tests at the base are cheap to write, fast to run, and reliable. Tests at the top are slow, expensive, and depend on systems outside your control. The more weight you put at the base, the faster and more reliable your pipeline becomes - to a point. We also have the engineering goal of achieving the most functional coverage with the fewest tests. Every test costs money to maintain and adds time to the pipeline.

The testing trophy

The testing trophy: a trophy-shaped diagram where Component Tests form the large diamond-shaped body, Unit Tests form the narrow stem, Static Analysis forms the base pedestal, and End-to-End tests form a small triangle at the peak.

The testing trophy, popularized by Kent C. Dodds, rebalances the pyramid by putting component tests at the center. Where the pyramid emphasizes unit tests at the base, the trophy argues that component tests give you the most confidence per test because they exercise realistic user behavior through a component’s public interface while still using test doubles for external dependencies.

The trophy also makes static analysis explicit as the foundation. Linting, type checking, and formatting catch entire categories of defects for free - no test code to write or maintain.

Both models agree on the principle: keep end-to-end tests few and focused, and maximize fast, deterministic coverage. The trophy simply shifts where that coverage concentrates. For teams building component-heavy applications, the trophy distribution often produces better results than a strict pyramid.

Teams often miss this underlying principle and treat either shape as a metric. They count tests by type and debate ratios - “do we have enough unit tests?” or “are our integration tests too many?” - when the real question is:

Can our pipeline determine that a change is safe to deploy without depending on any system we do not control?

A pipeline that answers yes can deploy at any time - even when a downstream service is down, a third-party API is slow, or a partner team hasn’t shipped yet. That independence is what CD requires, and it is the reason the pyramid favors the base.
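One way to picture that independence is a service that talks to a third party only through a narrow, injectable seam, so deterministic tests never touch the network. The sketch below uses hypothetical names (`CheckoutService`, `charge_fn`) purely for illustration:

```python
# Sketch of decoupling tests from an external system (all names are
# hypothetical, for illustration only).

class CheckoutService:
    """Depends on a payment gateway only through a narrow interface."""

    def __init__(self, charge_fn):
        # charge_fn: callable(amount_cents) -> bool. A real HTTP client in
        # production, a deterministic stub in tests.
        self._charge = charge_fn

    def place_order(self, amount_cents):
        if amount_cents <= 0:
            return "rejected"
        return "confirmed" if self._charge(amount_cents) else "payment_failed"


# Deterministic tests: no network, no third-party API, run in milliseconds.
def stub_charge_ok(amount_cents):
    return True

def stub_charge_declined(amount_cents):
    return False

assert CheckoutService(stub_charge_ok).place_order(500) == "confirmed"
assert CheckoutService(stub_charge_declined).place_order(500) == "payment_failed"
assert CheckoutService(stub_charge_ok).place_order(0) == "rejected"
```

Because the gateway is injected, the pipeline can give its go/no-go answer even while the real payment provider is down; a separate integration test later confirms the stub still matches reality.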

What this looks like in practice

A test architecture that achieves this has three responsibilities:

  1. Fast, deterministic tests - unit, component, and contract tests - run on every commit using test doubles for external dependencies. They give a reliable go/no-go signal in minutes.
  2. Acceptance tests validate that a deployed artifact is deliverable. Acceptance testing is not a single test type. It is a pipeline stage that can include component tests, load tests, chaos tests, resilience tests, and compliance tests. Any test that runs after CI to gate promotion to production is an acceptance test.
  3. Integration tests validate that contract test doubles still match the real external systems. They run in a dedicated test environment with versioned test data, on demand or on a schedule, providing monitoring rather than gating.

The anti-pattern: the ice cream cone

The ice cream cone anti-pattern: an inverted test distribution where most testing effort goes to manual and end-to-end tests at the top, with too few fast unit tests at the bottom

Most teams that struggle with CD have inverted the pyramid - too many slow, flaky end-to-end tests and too few fast, focused ones. Manual gates block every release. The pipeline cannot give a fast, reliable answer, so deployments become high-ceremony events.

Test Architecture

A test architecture is the deliberate structure of how different test types work together across your pipeline to give you deployment confidence. Use the table below to decide what type of test to write and where it runs. This is not a comprehensive list. It shows how common tests impact pipeline design and how teams should structure their suites. See the Pipeline Reference Architecture for a complete quality gate sequence.

Four-lane CD pipeline diagram. Pipeline lane: Commit triggers pre-merge and CI checks (Static Analysis, Unit Tests, Component Tests, Contract Tests - deterministic, blocks merge), then Build, Deploy to test environment, Acceptance Tests in test environment (Component, Load, Chaos, Resilience, Compliance - gates promotion to production), Deploy to production, and a green Live checkmark. Post-deploy lane: Production Verification (Health Checks, Real User Monitoring, SLO) triggered after production deploy - non-deterministic, triggers alerts, never blocks promotion. Async lane: Integration Tests validate contract test doubles against real systems - non-deterministic, post-deploy, failures trigger review. Continuous lane: Exploratory Testing and Usability Testing run continuously alongside delivery and never block.
| Pipeline Stage | What You Need to Verify | Test Type | Speed | Deterministic? | Blocks Deploy? |
| --- | --- | --- | --- | --- | --- |
| CI | A function or method behaves correctly | Unit | Milliseconds | Yes | Yes |
| CI | A complete component or service works through its public interface | Component | Milliseconds to seconds | Yes | Yes |
| CI | Your code correctly interacts with external system interfaces | Contract | Milliseconds to seconds | Yes | Yes |
| CI | Code quality, security, and style compliance | Static Analysis | Seconds | Yes | Yes |
| CI | UI meets WCAG accessibility standards | Static Analysis + Component | Seconds | Yes | Yes |
| Acceptance Testing | Deployed artifact meets acceptance criteria | Deploy, Smoke, Load, Resilience, Compliance, etc. | Minutes | No | Yes - gates production |
| Post-deploy (production) | Critical user journeys work in production | E2E smoke | Seconds to minutes | No | No - triggers rollback |
| Post-deploy (production) | Production health and SLOs | Synthetic monitoring | Continuous | No | No - triggers alerts |
| On demand/scheduled | Contract test doubles still match real external systems | Integration | Seconds to minutes | No | No - triggers review |
| Continuous | Unexpected behavior, edge cases, real-world workflows | Exploratory Testing | Varies | No | Never |
| Continuous | Real users can accomplish goals effectively | Usability Testing | Varies | No | Never |

The critical insight: everything that blocks merge is deterministic and under your control. Acceptance tests gate production promotion after verifying the deployed artifact. Everything that involves real external systems runs post-deployment. This is what gives you the independence to deploy any time, regardless of the state of the world around you.

Pre-merge vs post-merge

The table maps to two distinct phases of your pipeline, each with different goals and constraints.

Pre-merge (before code lands on trunk): Run unit, component, and contract tests. These must all be deterministic and fast. Target: under 10 minutes total. This is the quality gate that every change must pass. If pre-merge tests are slow, developers batch up changes or skip local runs, both of which undermine continuous integration.

Post-merge (after code lands on trunk, before or after deployment): Re-run the full deterministic suite against the integrated trunk. Then run acceptance tests, E2E smoke tests, and synthetic monitoring post-deploy. Integration tests run separately in a test environment, on demand or on a schedule. Target: under 60 minutes for the full post-merge cycle.

Why re-run pre-merge tests post-merge? Two changes can each pass pre-merge independently but conflict when combined on trunk. The post-merge run catches these integration effects.

If a post-merge failure occurs, the team fixes it immediately. Trunk must always be releasable.

This post-merge re-run is what teams traditionally call regression testing: running all previous tests against the current artifact to confirm that existing behavior still works after a change. In CD, regression testing is not a separate test type or a special suite. Every test in the pipeline is a regression test. The deterministic suite runs on every commit, and the full suite runs post-merge. If all tests pass, the artifact has been regression-tested.

Good Practices

Do

  • Run tests on every commit. If tests do not run automatically, they will be skipped.
  • Keep the deterministic suite under 10 minutes. If it is slower, developers will stop running it locally.
  • Fix broken tests immediately. A broken test is equivalent to a broken build.
  • Delete tests that do not provide value. A test that never fails and tests trivial behavior is maintenance cost with no benefit.
  • Test behavior, not implementation. Use a black box approach - verify what the code does, not how it does it. As Ham Vocke advises: “if I enter values x and y, will the result be z?” - not the sequence of internal calls that produce z. Avoid white box testing that asserts on internals.
  • Use test doubles for external dependencies. Your deterministic tests should run without network access to external systems.
  • Validate test doubles with contract tests. Test doubles that drift from reality give false confidence.
  • Treat test code as production code. Give it the same care, review, and refactoring attention.
  • Run automated accessibility checks on every commit. WCAG compliance scans are fast, deterministic, and catch violations that are invisible to sighted developers. Treat them like security scans: automate the detectable rules and reserve manual review for subjective judgment.
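"Test behavior, not implementation" is easiest to see in code. The sketch below (the `discount` function is hypothetical, for illustration) asserts only on inputs and outputs, never on internal call sequences:

```python
# Black-box test sketch: assert on observable behavior, not internals.
# discount() is a hypothetical function used only for illustration.

def discount(subtotal, customer_years):
    """5% off per year of loyalty, capped at 20%."""
    rate = min(customer_years * 0.05, 0.20)
    return round(subtotal * (1 - rate), 2)

# "If I enter values x and y, will the result be z?"
assert discount(100.0, 0) == 100.0
assert discount(100.0, 2) == 90.0
assert discount(100.0, 10) == 80.0   # the cap applies
# No assertions on how the cap is computed - the implementation is free
# to change as long as these behaviors hold.
```

Tests written this way survive refactoring; tests that assert on internal calls break every time the structure changes, even when behavior is unchanged.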

Do Not

  • Do not tolerate flaky tests. Quarantine or delete them immediately.
  • Do not gate your pipeline on non-deterministic tests. E2E and integration test failures should trigger review or alerts, not block deployment.
  • Do not couple your deployment to external system availability. If a third-party API being down prevents you from deploying, your test architecture has a critical gap.
  • Do not write tests after the fact as a checkbox exercise. Tests written without understanding the behavior they verify add noise, not value.
  • Do not test private methods directly. Test the public interface; private methods are tested indirectly.
  • Do not share mutable state between tests. Each test should set up and tear down its own state.
  • Do not use sleep/wait for timing-dependent tests. Use explicit waits, polling, or event-driven assertions.
  • Do not require a running database or external service for unit or component tests. That makes them integration or end-to-end tests - which is fine, but categorize them correctly and run them post-deployment, not as a pre-merge gate.
  • Do not make exploratory or usability testing a release gate. These activities are continuous and inform product direction; they are not a pass/fail checkpoint before deployment.
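As a concrete alternative to `sleep`-based timing, a small polling helper makes a test wait exactly as long as needed and no longer. This is a minimal sketch; the helper name `wait_until` is hypothetical:

```python
# Sketch of polling instead of a fixed sleep (helper name is hypothetical).
import time

def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll a zero-argument condition until it is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# Usage: instead of `time.sleep(3)` and hoping the work finished,
# poll for the observable result.
results = []
def background_work():
    results.append("done")

background_work()  # stand-in for an async task completing
assert wait_until(lambda: "done" in results, timeout=1.0)
```

A fixed sleep is either too short (flaky) or too long (slow); polling with a timeout is both fast on the happy path and tolerant of slow machines.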

Additional concepts drawn from Ham Vocke, The Practical Test Pyramid, and Toby Clemson, Testing Strategies in a Microservice Architecture.


Component Tests

Deterministic tests that verify a complete frontend component or backend service through its public interface, using test doubles for all external dependencies.
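A minimal sketch of the idea, using hypothetical names throughout: the service is driven only through its public methods, with an in-memory fake standing in for the real database:

```python
# Component-test sketch: drive a service through its public interface,
# replacing the real database with an in-memory fake (names hypothetical).

class InMemoryUserRepo:
    """Fake standing in for a real database client."""
    def __init__(self):
        self._users = {}
    def save(self, user_id, name):
        self._users[user_id] = name
    def find(self, user_id):
        return self._users.get(user_id)

class UserService:
    def __init__(self, repo):
        self._repo = repo
    def register(self, user_id, name):
        if self._repo.find(user_id) is not None:
            raise ValueError("duplicate user")
        self._repo.save(user_id, name)
    def display_name(self, user_id):
        name = self._repo.find(user_id)
        return name if name else "<unknown>"

# Exercise realistic user behavior within the component boundary.
svc = UserService(InMemoryUserRepo())
svc.register("u1", "Ada")
assert svc.display_name("u1") == "Ada"
assert svc.display_name("u2") == "<unknown>"
```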

Contract Tests

Deterministic tests that verify interface boundaries with external systems using test doubles. Also called narrow integration tests. Validated by integration tests running against real systems.
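One hand-rolled way to sketch the idea (real tooling such as Pact automates this; all field names and the endpoint here are hypothetical): pin the shape of the response your client depends on, check the test double against it, and reuse the same check later against the real provider.

```python
# Contract-test sketch: pin the shape of the external API response our
# client relies on (endpoint and fields are hypothetical).

CANNED_RESPONSE = {"id": "42", "status": "shipped", "eta_days": 3}

def parse_order_status(payload):
    """Our client's parsing logic for a GET /orders/{id} response."""
    return payload["status"], payload["eta_days"]

# The contract: the fields our code depends on, with expected types.
CONTRACT = {"id": str, "status": str, "eta_days": int}

def satisfies_contract(payload, contract):
    return all(isinstance(payload.get(k), t) for k, t in contract.items())

# 1. The test double honors the contract...
assert satisfies_contract(CANNED_RESPONSE, CONTRACT)
# 2. ...and our client works against it. An integration test later replays
#    the same contract check against the real provider's response.
assert parse_order_status(CANNED_RESPONSE) == ("shipped", 3)
```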

End-to-End Tests

Tests that exercise two or more real components up to the full system. Non-deterministic by nature; never a pre-merge gate.

Test Feedback Speed

Why test suite speed matters for developer effectiveness and how cognitive limits set the targets.

Integration Tests

Tests that exercise real external dependencies to validate that contract test doubles still match reality. Non-deterministic; never a pre-merge gate.

Static Analysis

Code analysis tools that evaluate non-running code for security vulnerabilities, complexity, and best practice violations.

Test Doubles

Patterns for isolating dependencies in tests: stubs, mocks, fakes, spies, and dummies.
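Three of these patterns can be sketched in a few lines each (all class names here are hypothetical, for illustration only):

```python
# Minimal sketches of three common test-double patterns (names hypothetical).

# Stub: returns canned answers.
class StubClock:
    def now(self):
        return 1700000000  # fixed timestamp for determinism

# Spy: records how it was called so the test can assert on interactions.
class SpyMailer:
    def __init__(self):
        self.sent = []
    def send(self, to, body):
        self.sent.append((to, body))

# Fake: a working but simplified implementation.
class FakeKeyValueStore:
    def __init__(self):
        self._data = {}
    def put(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data.get(key)

clock, mailer, store = StubClock(), SpyMailer(), FakeKeyValueStore()
mailer.send("a@example.com", "hi")
store.put("k", "v")
assert clock.now() == 1700000000
assert mailer.sent == [("a@example.com", "hi")]
assert store.get("k") == "v"
```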

Unit Tests

Fast, deterministic tests that verify a unit of behavior through its public interface, asserting on what the code does rather than how it works.

Testing Glossary

Definitions for testing terms as they are used on this site.