Architecture

Anti-patterns in system architecture and design that block continuous delivery.

These anti-patterns affect the structure of the software itself. They create coupling that makes independent deployment impossible, blast radii that make every change risky, and boundaries that force teams to coordinate instead of delivering independently.

1 - Untestable Architecture

Tightly coupled code with no dependency injection or seams makes writing tests require major refactoring first.

Category: Architecture | Quality Impact: Critical

What This Looks Like

A developer wants to write a unit test for a business rule in the order processing module. They open the class and find that it instantiates a database connection directly in the constructor, calls an external payment service with a hardcoded URL, and writes to a global logger that connects to a cloud logging service. There is no way to run this class in a test without a database, a payment sandbox account, and a live logging endpoint. Writing a test for the 10-line discount calculation buried inside this class requires either setting up all of that infrastructure or doing major surgery on the code first.

The team has tried. Some tests exist, but they are integration tests that depend on a shared test database. When the database is unavailable, the tests fail. When two developers run the suite simultaneously, tests interfere with each other. The suite is slow - 40 minutes for a full run - because every test touches real infrastructure. Developers have learned to run only the tests related to their specific change, because running the full suite is impractical. That selection is also unreliable, because they cannot know which tests cover the code they are changing.

Common variations:

  • Constructor-injected globals. Classes that call new DatabaseConnection(), new HttpClient(), or new Logger() inside constructors or methods. There is no way to substitute a test double without modifying the production code.
  • Static method chains. Business logic that calls static utility methods, which call other static methods, which eventually call external services. Static calls cannot be intercepted or mocked without bytecode manipulation.
  • Hardcoded external dependencies. Service URLs, API keys, and connection strings baked into source code rather than injected as configuration. The code is not just untestable - it is also not configurable across environments.
  • God classes with mixed concerns. A class that handles HTTP request parsing, business logic, database writes, and email sending in the same methods. You cannot test the business logic without triggering all the other concerns.
  • Framework entanglement. Business logic written directly inside framework callbacks or lifecycle hooks - a Rails before_action, a Spring @Scheduled method, a serverless function handler - with no extraction into a callable function or class.

The telltale sign: when a developer asks “how do I write a test for this?” and the honest answer is “you would have to refactor it first.”

Why This Is a Problem

Untestable architecture does not just make tests hard to write. It is a symptom that business logic is entangled with infrastructure, which makes every change harder and every defect costlier.

It reduces quality

A bug caught in a 30-second unit test costs minutes to fix. The same bug caught in production costs hours of debugging, a support incident, and a postmortem. Untestable code shifts that cost toward production. When code cannot be tested in isolation, the only way to verify behavior is end-to-end. End-to-end tests run slowly, are sensitive to environmental conditions, and often cannot cover all the branches and edge cases in business logic. A developer who cannot write a fast, isolated test for a discount calculation instead relies on deploying to a staging environment and manually walking through a checkout. This is slow, incomplete, and rarely catches all the edge cases.

The quality impact compounds over time. Without a fast test suite, developers do not run tests frequently. Without frequent test runs, bugs survive for longer before being caught. The further a bug travels from the code that caused it, the more expensive it is to diagnose and fix.

In testable code, dependencies are injected. The payment service is an interface. The database connection is passed in. A test can substitute a fast, predictable in-memory double for every external dependency. The business logic runs in milliseconds, covers every branch, and gives immediate feedback every time the code is changed.

It increases rework

A developer who cannot safely verify a change ships it and hopes. Bugs discovered later require returning to code the developer thought was done - often days or weeks after the context is gone. When a developer needs to modify behavior in a class that has no tests and cannot easily be tested, they make the change and then verify it by running the application manually or relying on end-to-end tests. They cannot be confident that the change did not break a code path they did not exercise.

Refactoring untestable code is doubly expensive. To refactor safely, you need tests. To write tests, you need to refactor. Teams caught in this loop often choose not to refactor at all, because both paths carry high risk. Complexity accumulates. Workarounds are added rather than fixing the underlying structure. The codebase grows harder to change with every feature added.

When dependencies are injected, refactoring is safe. Write the tests first, or write them alongside the refactor, or write them immediately after. Either way, the ability to substitute doubles means the refactor can be verified quickly and cheaply.

It makes delivery timelines unpredictable

A three-day estimate becomes seven when the module turns out to have no tests and deep coupling to external services. That hidden cost is structural, not exceptional. Every change carries unknown risk. The response is more process: more manual QA cycles, more sign-off steps, more careful coordination before releases. All of that process adds time, and the amount of time added is unpredictable because it depends on how many issues the manual process finds.

Testable code makes delivery predictable. The test suite tells you quickly whether a change is safe. Estimates can be more reliable because the cost of a change is proportional to its size, not to the hidden coupling in the code.

Impact on continuous delivery

Continuous delivery depends on a fast, reliable automated test suite. Without that suite, the pipeline cannot provide the safety signal that makes frequent deployment safe. If tests cannot run in isolation, the pipeline either skips them (dangerous) or depends on heavyweight infrastructure (slow and fragile). Either outcome makes continuous delivery impractical.

CD pipelines are designed to provide feedback in minutes, not hours. A test suite that requires a live database, external APIs, and environmental setup to run is incompatible with that requirement. The pipeline becomes the bottleneck that limits deployment frequency, rather than the automation that enables it. Teams cannot confidently deploy multiple times per day when every test run requires 30 minutes and a set of live external services.

Untestable architecture is often the root cause when teams say “we can’t go faster - we need more QA time.” The real constraint is not QA capacity. It is the absence of a test suite that can verify changes quickly and automatically.

How to Fix It

Making an untestable codebase testable is an incremental process. The goal is not to rewrite everything before writing the first test. The goal is to create seams - places where test doubles can be inserted - module by module, as code is touched.

Step 1: Identify the most-changed untestable code

Do not try to fix the entire codebase. Start where the pain is highest.

  1. Use version control history to identify the files changed most frequently in the last six months. High-change files with no test coverage are the highest priority.
  2. For each high-change file, answer: can I write a test for the core business logic without a running database or external service? If the answer is no, it is a candidate.
  3. Rank candidates by frequency of change and business criticality. The goal is to find the code where test coverage will prevent the most real bugs.

Document the list. It is your refactoring backlog. Treat each item as a first-class task, not something that happens “when we have time.”
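Point 1 of the step above can be scripted. A small sketch, assuming a Unix shell and git on the path; the six-month window mirrors the step and is adjustable:

```shell
# Prints the 20 most frequently changed files in the given repository
# over the last six months. Cross-reference the output with coverage
# reports: high-change, low-coverage files are the refactoring backlog.
hotspots() {
  git -C "${1:-.}" log --since="6 months ago" --name-only --pretty=format: \
    | grep -v '^$' \
    | sort | uniq -c | sort -rn | head -20
}
```

Run it as `hotspots /path/to/repo`; the count column is the number of commits that touched each file.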

Step 2: Introduce dependency injection at the seam (Weeks 2-3)

For each candidate class, apply the simplest refactor that creates a testable seam without changing behavior.

In Java:

OrderService before and after dependency injection (Java)
// Before: untestable - constructs dependency internally
public class OrderService {
    public void processOrder(Order order) {
        DatabaseConnection db = new DatabaseConnection();
        PaymentGateway pg = new PaymentGateway("https://payments.example.com");
        // business logic
    }
}

// After: testable - dependencies injected
public class OrderService {
    private final OrderRepository repository;
    private final PaymentGateway paymentGateway;

    public OrderService(OrderRepository repository, PaymentGateway paymentGateway) {
        this.repository = repository;
        this.paymentGateway = paymentGateway;
    }

    public void processOrder(Order order) {
        // business logic using the injected repository and gateway
    }
}

In JavaScript:

processOrder before and after dependency injection (JavaScript)
// Before: untestable
function processOrder(order) {
  const db = new DatabaseConnection();
  const pg = new PaymentGateway(process.env.PAYMENT_URL);
  // business logic
}

// After: testable
function processOrder(order, { repository, paymentGateway }) {
  // business logic using injected dependencies
}

The interface or abstraction is the key. Production code passes real implementations. Tests pass fast, in-memory doubles that return predictable results.

Step 3: Write the tests that are now possible (Weeks 2-3)

Immediately after creating a seam, write tests for the business logic that is now accessible. Do not defer this step.

  1. Write one test for the happy path.
  2. Write tests for the main error conditions.
  3. Write tests for the edge cases and branches that are hard to exercise end-to-end.

Use fast doubles - in-memory fakes or simple stubs - for every external dependency. The tests should run in milliseconds without any network or database access. If a test requires more than a second to run, something is still coupling it to real infrastructure.
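Continuing the JavaScript example from Step 2, a first test with in-memory doubles might look like the sketch below. The discount rule and the shapes of the doubles are illustrative assumptions, not an existing API:

```javascript
// The seam from Step 2, with illustrative business logic filled in.
function processOrder(order, { repository, paymentGateway }) {
  const discount = order.total > 100 ? order.total * 0.1 : 0;
  const charged = paymentGateway.charge(order.total - discount);
  repository.save({ ...order, discount, charged });
  return discount;
}

// Fast, in-memory doubles - no network, no database.
function makeFakes() {
  const saved = [];
  return {
    repository: { save: (o) => saved.push(o) },
    paymentGateway: { charge: (amount) => amount }, // always succeeds
    saved,
  };
}

// These run in milliseconds because nothing touches real infrastructure.
const fakes = makeFakes();
const discount = processOrder({ id: 1, total: 200 }, fakes);
if (discount !== 20) throw new Error("10% discount above 100");
if (fakes.saved[0].charged !== 180) throw new Error("charged total minus discount");

const noDiscount = processOrder({ id: 2, total: 50 }, makeFakes());
if (noDiscount !== 0) throw new Error("no discount at or below 100");
```

The same pattern covers error conditions: swap in a `paymentGateway` double whose `charge` throws, and assert on the behavior.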

Step 4: Extract business logic from framework boundaries (Weeks 3-5)

Framework entanglement requires a different approach. The fix is extraction: move business logic out of framework callbacks and into plain functions or classes that can be called from anywhere, including tests.

A serverless handler that does everything:

Extracting business logic from a serverless handler (JavaScript)
// Before: untestable
exports.handler = async (event) => {
  const db = new Database();
  const order = await db.getOrder(event.orderId);
  const discount = order.total > 100 ? order.total * 0.1 : 0;
  await db.updateOrder({ ...order, discount });
  return { statusCode: 200 };
};

// After: business logic is testable independently
function calculateDiscount(orderTotal) {
  return orderTotal > 100 ? orderTotal * 0.1 : 0;
}

// Factory: tests build the handler with a fake db; production wiring
// passes the real one. (A default parameter on the handler would not
// work - the serverless runtime always passes its own second argument.)
const makeHandler = ({ db }) => async (event) => {
  const order = await db.getOrder(event.orderId);
  const discount = calculateDiscount(order.total);
  await db.updateOrder({ ...order, discount });
  return { statusCode: 200 };
};

exports.handler = makeHandler({ db: new Database() });

The calculateDiscount function is now testable in complete isolation. The handler is thin and can be tested with a mock database.
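The tests this extraction enables can be sketched as below. The `makeHandler` factory is one wiring option (an assumption here, not the only way to inject the database), and the pieces are repeated so the block runs on its own:

```javascript
// Extracted business rule, repeated from the handler example above.
function calculateDiscount(orderTotal) {
  return orderTotal > 100 ? orderTotal * 0.1 : 0;
}

// Hypothetical factory: production passes the real database, tests
// pass an in-memory fake.
const makeHandler = ({ db }) => async (event) => {
  const order = await db.getOrder(event.orderId);
  const discount = calculateDiscount(order.total);
  await db.updateOrder({ ...order, discount });
  return { statusCode: 200 };
};

// Pure-function tests: no doubles needed at all.
if (calculateDiscount(300) !== 30) throw new Error("10% discount above 100");
if (calculateDiscount(100) !== 0) throw new Error("no discount at exactly 100");

// Handler test with a fake database - no network, no cloud runtime.
const updates = [];
const handler = makeHandler({
  db: {
    getOrder: async (id) => ({ id, total: 300 }),
    updateOrder: async (o) => { updates.push(o); },
  },
});

handler({ orderId: "o-1" }).then((res) => {
  if (res.statusCode !== 200) throw new Error("handler should return 200");
  if (updates[0].discount !== 30) throw new Error("discount should be persisted");
});
```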

Step 5: Add the linting and architectural rules that prevent backsliding

Once a module is testable, add controls that prevent it from becoming untestable again.

  1. Add a coverage threshold for testable modules. If coverage drops below the threshold, the build fails.
  2. Add an architectural fitness function - a test or lint rule that verifies no direct infrastructure instantiation appears in business logic classes.
  3. In code review, treat “this code is not testable” as a blocking issue, not a preference.

Apply the same process to each new module as it is touched. Over time, the proportion of testable code grows without requiring a big-bang rewrite.
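The fitness function in point 2 can start as an ordinary unit test that scans business-logic sources for direct infrastructure construction. A minimal sketch; the forbidden class names are placeholders for whatever your infrastructure classes are called:

```javascript
// Returns the forbidden constructor calls found in a source string.
// A real version would walk the business-logic directory and run this
// check on every file as part of the build.
function findDirectInstantiations(source, forbiddenClasses) {
  const violations = [];
  for (const name of forbiddenClasses) {
    // Matches "new DatabaseConnection(" and whitespace variants.
    const pattern = new RegExp(`\\bnew\\s+${name}\\s*\\(`);
    if (pattern.test(source)) violations.push(name);
  }
  return violations;
}

const forbidden = ["DatabaseConnection", "HttpClient", "PaymentGateway"];

// A business-logic file that backslides into direct instantiation...
const bad = "const db = new DatabaseConnection();";
// ...and one that stays behind the seam.
const good = "constructor(repository) { this.repository = repository; }";

if (findDirectInstantiations(bad, forbidden).length === 0)
  throw new Error("fitness function should flag direct instantiation");
if (findDirectInstantiations(good, forbidden).length !== 0)
  throw new Error("fitness function should pass injected dependencies");
```

A regex check is deliberately coarse; teams that outgrow it typically move to AST-based lint rules (a custom ESLint rule in JavaScript, or ArchUnit in Java) expressing the same constraint.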

Step 6: Track and retire the integration test workarounds (Ongoing)

As business logic becomes unit-testable, the integration tests that were previously the only coverage can be simplified or removed. Integration tests that verify business logic are slow and brittle - now that the logic has fast unit tests, the integration test can focus on the seam between components, not the business rules inside each one.

Objection: “Refactoring for testability is risky - we might break things”
Response: The refactor is a structural change, not a behavior change. Apply it in tiny steps, verify with the application running, and add tests as soon as each seam is created. The risk of not refactoring is ongoing: every untested change is a bet on nothing being broken.

Objection: “We don’t have time to refactor while delivering features”
Response: Apply the refactor as you touch code for feature work. The boy scout rule: leave code more testable than you found it. Over six months, the most-changed code becomes testable without a dedicated refactoring project.

Objection: “Dependency injection adds complexity”
Response: A constructor that accepts interfaces is not complex. The complexity it removes - hidden coupling to external systems, inability to test in isolation, cascading failures from unavailable services - far exceeds the added boilerplate.

Objection: “Our framework doesn’t support dependency injection”
Response: Every mainstream framework supports some form of injection. The extraction technique (move logic into plain functions) works for any framework. The framework boundary becomes a thin shell around testable business logic.

Measuring Progress

  • Unit test count: should increase as seams are created; more tests without infrastructure dependencies.
  • Build duration: should decrease as infrastructure-dependent tests are replaced with fast unit tests.
  • Test suite pass rate: should increase as flaky infrastructure-dependent tests are replaced with deterministic doubles.
  • Change fail rate: should decrease as test coverage catches regressions before deployment.
  • Development cycle time: should decrease as developers get faster feedback from the test suite.
  • Files with test coverage: should increase as refactoring progresses; track by module.

2 - Tightly Coupled Monolith

Changing one module breaks others. No clear boundaries. Every change is high-risk because blast radius is unpredictable.

Category: Architecture | Quality Impact: High

What This Looks Like

A developer changes a function in the order processing module. The test suite fails in the reporting module, the notification service, and a batch job that nobody knew existed. The developer did not touch any of those systems. They changed one function in one file, and three unrelated features broke.

The team has learned to be cautious. Before making any change, developers trace every caller, every import, and every database query that might be affected. A change that should take an hour takes a day because most of the time is spent figuring out what might break. Even after that analysis, surprises are common.

Common variations:

  • The web of shared state. Multiple modules read and write the same database tables directly. A schema change in one module breaks queries in five others. Nobody owns the tables because everybody uses them.
  • The god object. A single class or module that everything depends on. It handles authentication, logging, database access, and business logic. Changing it is terrifying because the entire application runs through it.
  • Transitive dependency chains. Module A depends on Module B, which depends on Module C. A change to Module C breaks Module A through a chain that nobody can trace without a debugger. The dependency graph is a tangle, not a tree.
  • Shared libraries with hidden contracts. Internal libraries used by multiple modules with no versioning or API stability guarantees. Updating the library for one consumer breaks another. Teams stop updating shared libraries because the risk is too high.
  • Everything deploys together. The application is a single deployable unit. Even if modules are logically separated in the source code, they compile and ship as one artifact. A one-line change to the login page requires deploying the entire system.

The telltale sign: developers regularly say “I don’t know what this change will affect” and mean it. Changes routinely break features that seem unrelated.

Why This Is a Problem

Tight coupling turns every change into a gamble. The cost of a change is not proportional to its size but to the number of hidden dependencies it touches. Small changes carry large risk, which slows everything down.

It reduces quality

When every change can break anything, developers cannot reason about the impact of their work. A well-bounded module lets a developer think locally: “I changed the discount calculation, so discount-related behavior might be affected.” A tightly coupled system offers no such guarantee. The discount calculation might share a database table with the shipping module, which triggers a notification workflow, which updates a dashboard.

This unpredictable blast radius makes code review less effective. Reviewers can verify that the code in the diff is correct, but they cannot verify that it is safe. The breakage happens in code that is not in the diff - code that neither the author nor the reviewer thought to check.

In a system with clear module boundaries, the blast radius of a change is bounded by the module’s interface. If the interface does not change, nothing outside the module can break. Developers and reviewers can focus on the module itself and trust the boundary.

It increases rework

Tight coupling causes rework in two ways. First, unexpected breakage from seemingly safe changes sends developers back to fix things they did not intend to touch. A one-line change that breaks the notification system means the developer now needs to understand and fix the notification system before their original change can ship.

Second, developers working in different parts of the codebase step on each other. Two developers changing different modules unknowingly modify the same shared state. Both changes work individually but conflict when merged. The merge succeeds at the code level but fails at runtime because the shared state cannot satisfy both changes simultaneously. These bugs are expensive to find because the failure only manifests when both changes are present.

Systems with clear boundaries minimize this interference. Each module owns its data and exposes it through explicit interfaces. Two developers working in different modules cannot create a hidden conflict because there is no shared mutable state to conflict on.

It makes delivery timelines unpredictable

In a coupled system, the time to deliver a change includes the time to understand the impact, make the change, fix the unexpected breakage, and retest everything that might be affected. The first and third steps are unpredictable because no one knows the full dependency graph.

A developer estimates a task at two days. On day one, the change is made and tests are passing. On day two, a failing test in another module reveals a hidden dependency. Fixing the dependency takes two more days. The task that was estimated at two days takes four. This happens often enough that the team stops trusting estimates, and stakeholders stop trusting timelines.

The testing cost is also unpredictable. In a modular system, changing Module A means running Module A’s tests. In a coupled system, changing anything might mean running everything. If the full test suite takes 30 minutes, every small change requires a 30-minute feedback cycle because there is no way to scope the impact.

It prevents independent team ownership

When the codebase is a tangle of dependencies, no team can own a module cleanly. Every change in one team’s area risks breaking another team’s area. Teams develop informal coordination rituals: “Let us know before you change the order table.” “Don’t touch the shared utils module without talking to Platform first.”

These coordination costs scale quadratically with the number of teams. Two teams need one communication channel. Five teams need ten. Ten teams need forty-five. The result is that adding developers makes the system slower to change, not faster.

In a system with well-defined module boundaries, each team owns their modules and their data. They deploy independently. They do not need to coordinate on internal changes because the boundaries prevent cross-module breakage. Communication focuses on interface changes, which are infrequent and explicit.

Impact on continuous delivery

Continuous delivery requires that any change can flow from commit to production safely and quickly. Tight coupling breaks this in multiple ways:

  • Blast radius prevents small, safe changes. If a one-line change can break unrelated features, no change is small from a risk perspective. The team compensates by batching changes and testing extensively, which is the opposite of continuous.
  • Testing scope is unbounded. Without module boundaries, there is no way to scope testing to the changed area. Every change requires running the full suite, which slows the pipeline and reduces deployment frequency.
  • Independent deployment is impossible. If everything must deploy together, deployment coordination is required. Teams queue up behind each other. Deployment frequency is limited by the slowest team.
  • Rollback is risky. Rolling back one change might break something else if other changes were deployed simultaneously. The tangle works in both directions.

A team with a tightly coupled monolith can still practice CD, but they must invest in decoupling first. Without boundaries, the feedback loops are too slow and the blast radius is too large for continuous deployment to be safe.

How to Fix It

Decoupling a monolith is a long-term effort. The goal is not to rewrite the system or extract microservices on day one. The goal is to create boundaries that limit blast radius and enable independent change. Start where the pain is greatest.

Step 1: Map the dependency hotspots

Identify the areas of the codebase where coupling causes the most pain:

  1. Use version control history to find the files that change together most frequently. Files that always change as a group are likely coupled.
  2. List the modules or components that are most often involved in unexpected test failures after changes to other areas.
  3. Identify shared database tables - tables that are read or written by more than one module.
  4. Draw the dependency graph. Tools like dependency-cruiser (JavaScript), jdepend (Java), or similar can automate this. Look for cycles and high fan-in nodes.

Rank the hotspots by pain: which coupling causes the most unexpected breakage, the most coordination overhead, or the most test failures?
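The first signal - files that change together - can be computed from version control history. A sketch of the counting step, taking each commit as a list of touched file paths (producing those lists is a `git log --name-only` exercise, omitted here):

```javascript
// Counts how often each pair of files appears in the same commit.
// Pairs with high counts are candidates for hidden coupling - or for
// belonging inside the same module boundary.
function coChangeCounts(commits) {
  const counts = new Map();
  for (const files of commits) {
    const sorted = [...files].sort();
    for (let i = 0; i < sorted.length; i++) {
      for (let j = i + 1; j < sorted.length; j++) {
        const key = `${sorted[i]} + ${sorted[j]}`;
        counts.set(key, (counts.get(key) || 0) + 1);
      }
    }
  }
  // Highest co-change counts first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Illustrative history: orders.js and reporting.js keep changing together.
const commits = [
  ["orders.js", "reporting.js"],
  ["orders.js", "reporting.js", "notifications.js"],
  ["payments.js"],
  ["orders.js", "reporting.js"],
];

const [topPair, topCount] = coChangeCounts(commits)[0];
if (topPair !== "orders.js + reporting.js" || topCount !== 3)
  throw new Error("expected orders.js and reporting.js as top pair");
```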

Step 2: Define module boundaries on paper

Before changing any code, define where boundaries should be:

  1. Group related functionality into candidate modules based on business domain, not technical layer. “Orders,” “Payments,” and “Notifications” are better boundaries than “Database,” “API,” and “UI.”
  2. For each boundary, define what the public interface would be: what data crosses the boundary and in what format?
  3. Identify shared state that would need to be split or accessed through interfaces.

This is a design exercise, not an implementation. The output is a diagram showing target module boundaries with their interfaces.

Step 3: Enforce one boundary (Weeks 3-6)

Pick the boundary with the best ratio of pain-reduced to effort-required and enforce it in code:

  1. Create an explicit interface (API, function contract, or event) for cross-module communication. All external callers must use the interface.
  2. Move shared database access behind the interface. If the payments module needs order data, it calls the orders module’s interface rather than querying the orders table directly.
  3. Add a build-time or lint-time check that enforces the boundary. Fail the build if code outside the module imports internal code directly.

This is the hardest step because it requires changing existing call sites. Use the Strangler Fig approach: create the new interface alongside the old coupling, migrate callers one at a time, and remove the old path when all callers have migrated.
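The build-time check in point 3 can start very small. A sketch that flags imports reaching into another module's internals; the `<module>/internal/` path convention is an assumption you would adapt to your own layout:

```javascript
// Returns true when `importerPath` may import `importedPath`.
// Convention assumed here: anything under "<module>/internal/" may only
// be imported from within that module; everything else is public surface.
function isImportAllowed(importerPath, importedPath) {
  const match = importedPath.match(/^([^/]+)\/internal\//);
  if (!match) return true; // not an internal path
  const owningModule = match[1];
  return importerPath.startsWith(`${owningModule}/`);
}

// The orders module may use its own internals...
if (!isImportAllowed("orders/service.js", "orders/internal/db.js"))
  throw new Error("same-module internal import should be allowed");
// ...but payments must go through the orders interface.
if (isImportAllowed("payments/refunds.js", "orders/internal/db.js"))
  throw new Error("cross-module internal import should be blocked");
if (!isImportAllowed("payments/refunds.js", "orders/api.js"))
  throw new Error("public interface import should be allowed");
```

Tools like dependency-cruiser (JavaScript) or ArchUnit (Java) express the same rule declaratively and fail the build on violations, which is the better long-term home for it.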

Step 4: Scope testing to module boundaries

Once a boundary exists, use it to scope testing:

  1. Write tests for the module’s public interface (contract tests and functional tests).
  2. Changes within the module only need to run the module’s own tests plus the interface tests. If the interface tests pass, nothing outside the module can break.
  3. Reserve the full integration suite for deployment validation, not developer feedback.

This immediately reduces pipeline duration for changes inside the bounded module. Developers get faster feedback. The pipeline is no longer “run everything for every change.”
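A contract test at the boundary pins down what the interface promises, so within-module changes can be verified against it alone. A sketch, assuming a hypothetical orders-module interface exposing `getOrderSummary`:

```javascript
// Hypothetical public interface of the orders module. Consumers depend
// on this shape, never on the orders tables or internals.
function makeOrdersModule(orders) {
  return {
    getOrderSummary(orderId) {
      const order = orders.get(orderId);
      if (!order) return null;
      // Only the fields in the contract cross the boundary.
      return { id: order.id, total: order.total, status: order.status };
    },
  };
}

// Contract test: verifies the promised shape, not the implementation.
const ordersModule = makeOrdersModule(
  new Map([["o-1", { id: "o-1", total: 120, status: "paid", internalFlag: true }]])
);

const summary = ordersModule.getOrderSummary("o-1");
for (const field of ["id", "total", "status"]) {
  if (!(field in summary)) throw new Error(`contract field missing: ${field}`);
}
// Internal details must not leak across the boundary.
if ("internalFlag" in summary) throw new Error("internal field leaked across boundary");
if (ordersModule.getOrderSummary("missing") !== null)
  throw new Error("unknown order returns null per contract");
```

As long as these tests pass, the orders module can be refactored freely without rerunning consumers' suites.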

Step 5: Repeat for the next boundary (Ongoing)

Each new boundary reduces blast radius, improves test scoping, and enables more independent ownership. Prioritize by pain:

  • Files that always change together across modules: coupling that forces coordinated changes.
  • Unexpected test failures after unrelated changes: hidden dependencies through shared state.
  • Multiple teams needing to coordinate on changes: ownership boundaries that do not match code boundaries.
  • Long pipeline duration from running all tests: no way to scope testing because boundaries do not exist.

Over months, the system evolves from a tangle into a set of modules with defined interfaces. This is not a rewrite. It is incremental boundary enforcement applied where it matters most.

Objection: “We should just rewrite it as microservices”
Response: A rewrite takes months or years and delivers zero value until it is finished. Enforcing boundaries in the existing codebase delivers value with each boundary and does not require a big-bang migration.

Objection: “We don’t have time to refactor”
Response: You are already paying the cost of coupling in unexpected breakage, slow testing, and coordination overhead. Each boundary you enforce reduces that ongoing cost.

Objection: “The coupling is too deep to untangle”
Response: Start with the easiest boundary, not the hardest. Even one well-enforced boundary reduces blast radius and proves the approach works.

Objection: “Module boundaries will slow us down”
Response: Boundaries add a small cost to cross-module changes and remove a large cost from within-module changes. Since most changes are within a module, the net effect is faster delivery.

Measuring Progress

  • Unexpected cross-module test failures: should decrease as boundaries are enforced.
  • Change fail rate: should decrease as blast radius shrinks.
  • Build duration: should decrease as testing can be scoped to affected modules.
  • Development cycle time: should decrease as developers spend less time tracing dependencies.
  • Cross-team coordination requests per sprint: should decrease as module ownership becomes clearer.
  • Files changed per commit: should decrease as changes become more localized.

Team Discussion

Use these questions in a retrospective to explore how this anti-pattern affects your team:

  • Which services or modules can we not change without coordinating with another team?
  • When was the last time a change in one area broke something unrelated? How long did it take to find the connection?
  • If we were to draw the dependency graph of our system today, where would we see the most coupling?

3 - Premature Microservices

The team adopted microservices without a problem that required them. The architecture may be correctly decomposed, but the operational cost far exceeds any benefit.

Category: Architecture | Quality Impact: High

What This Looks Like

The team split their application into services because “microservices are how you do DevOps.” The boundaries might even be reasonable. Each service owns its domain. Contracts are versioned. The architecture diagrams look clean. But the team is six developers, the application handles modest traffic, and nobody has ever needed to scale one component independently of the others.

The team now maintains a dozen repositories, a dozen pipelines, a dozen deployment configurations, and a service mesh. A feature that touches two domains requires changes in two repositories, two code reviews, two deployments, and careful contract coordination. A shared library update means twelve PRs. A security patch means twelve pipeline runs. The team spends more time on service infrastructure than on features.

Common variations:

  • The cargo cult. The team adopted microservices because a conference talk, blog post, or executive mandate said it was the right architecture. The decision was not based on a specific delivery problem. The application had no scaling bottleneck, no team autonomy constraint, and no deployment frequency goal that a monolith could not meet.
  • The resume-driven architecture. The technical lead chose microservices because they wanted experience with the pattern. The architecture serves the team’s learning goals, not the product’s delivery needs.
  • The premature split. A small team split a working monolith into services before the monolith caused delivery problems. The team now spends more time managing service infrastructure than building features. The monolith was delivering faster.
  • The infrastructure gap. The team adopted microservices but does not have centralized logging, distributed tracing, automated service discovery, or container orchestration. Debugging a production issue means SSH-ing into individual servers and correlating timestamps across log files manually. The operational maturity does not match the architectural complexity.

The telltale sign: the team spends more time on service infrastructure, cross-service debugging, and pipeline maintenance than on delivering features, and nobody can name the specific problem that microservices solved.

Why This Is a Problem

Microservices solve specific problems at specific scales: enabling independent deployment for large organizations, allowing components to scale independently under different load profiles, and letting autonomous teams own their domain end-to-end. When none of these problems exist, every service boundary is pure overhead.

It reduces quality

A distributed system introduces failure modes that do not exist in a monolith: network partitions, partial failures, message ordering issues, and data consistency challenges across service boundaries. Each requires deliberate engineering to handle correctly. A team that adopted microservices without distributed-systems experience will get these wrong. Services will fail silently when a dependency is slow. Data will become inconsistent because transactions do not span service boundaries. Retry logic will be missing or incorrect.

A well-structured monolith avoids all of these failure modes. Function calls within a process are reliable, fast, and transactional. The quality bar for a monolith is achievable by any team. The quality bar for a distributed system requires specific expertise.

It increases rework

The operational tax of microservices is proportional to the number of services. Updating a shared library means updating it in every repository. A framework upgrade requires running every pipeline. A cross-cutting concern (logging format change, authentication update, error handling convention) means touching every service. In a monolith, these are single changes. In a microservices architecture, they are multiplied by the service count.

This tax is worth paying when the benefits are real (independent scaling, team autonomy). When the benefits are theoretical, the tax is pure waste.

It makes delivery timelines unpredictable

Distributed-system problems are hard to diagnose. A latency spike in one service causes timeouts in three others. The developer investigating the issue traces the request across services, reads logs from multiple systems, and eventually finds a connection pool exhausted in a downstream service. This investigation takes hours. In a monolith, the same issue would have been a stack trace in a single process.

Feature delivery is also slower. A change that spans two services requires coordinating two PRs, two reviews, two deployments, and verifying that the contract between them is correct. In a monolith, the same change is a single PR with a single deployment.

It creates an operational maturity gap

Microservices require operational capabilities that monoliths do not: centralized logging, distributed tracing, service mesh or discovery, container orchestration, automated scaling, and health-check-based routing. Without these, the team cannot observe, debug, or operate their system reliably.

Teams that adopt microservices before building this operational foundation end up in a worse position than they were with the monolith. The monolith was at least observable: one application, one log stream, one deployment. The microservices architecture without operational tooling is a collection of black boxes.

Impact on continuous delivery

Microservices are often adopted in the name of CD, but premature adoption makes CD harder. CD requires fast, reliable pipelines. A team managing twelve service pipelines without automation or standardization spends its pipeline investment twelve times over. The same team with a well-structured monolith and one pipeline could be deploying to production multiple times per day.

The path to CD does not require microservices. It requires a well-tested, well-structured codebase with automated deployment. A modular monolith with clear internal boundaries and a single pipeline can achieve deployment frequencies that most premature microservices architectures struggle to match.

How to Fix It

Step 1: Assess whether microservices are solving a real problem

Answer these questions honestly:

  • Does the team have a scaling bottleneck that requires independent scaling of specific components? (Not theoretical future scale. An actual current bottleneck.)
  • Are there multiple autonomous teams that need to deploy independently? (Not a single team that split into “service teams” to match the architecture.)
  • Is the monolith’s deployment frequency limited by its size or coupling? (Not by process, testing gaps, or organizational constraints that would also limit microservices.)

If the answer to all three is no, the team does not need microservices. A modular monolith will deliver faster with less operational overhead.

Step 2: Consolidate services that do not need independence (Weeks 2-6)

Merge services that are always deployed together. If Service A and Service B have never been deployed independently, they are not independent services. They are modules that should share a deployment. This is not a failure. It is a course correction based on evidence.

Prioritize merging services owned by the same team. A single team running six services gets the same team autonomy benefit from one well-structured deployable.

Step 3: Build operational maturity for what remains (Weeks 4-8)

For services that genuinely benefit from separation, ensure the team has the operational capabilities to manage them:

  • Centralized logging across all services
  • Distributed tracing for cross-service requests
  • Health checks and automated rollback in every pipeline
  • Monitoring and alerting for each service
  • A standardized pipeline template that new services adopt by default

Each missing capability is a reason to pause and invest in the platform before adding more services.
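The first two capabilities can be illustrated with a small sketch. Everything here is an assumption for illustration, not a prescribed format: the JSON field names and the idea of propagating a `request_id` between services are one common convention, but they show the minimum structure a centralized log store needs to correlate a single request across services:

```python
import json
import logging
import uuid

# Attach a correlation ID to every record so a centralized log store can
# group entries from different services by request.
class CorrelationFilter(logging.Filter):
    def __init__(self, request_id: str):
        super().__init__()
        self.request_id = request_id

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = self.request_id
        return True

# Emit one JSON object per line -- the shape most log aggregators ingest directly.
class JsonFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "service": record.name,
            "level": record.levelname,
            "request_id": getattr(record, "request_id", None),
            "message": record.getMessage(),
        })

def make_logger(service: str, request_id: str) -> logging.Logger:
    logger = logging.getLogger(service)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.addFilter(CorrelationFilter(request_id))
    return logger

request_id = str(uuid.uuid4())  # generated at the edge, propagated via a header
orders_log = make_logger("orders", request_id)
orders_log.info("order accepted")  # the same request_id appears in every service's logs
```

In a real system the ID would arrive in an inbound header and be forwarded on every outbound call; distributed tracing tools generalize this same idea with spans and parent IDs.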

Step 4: Establish a service extraction checklist (Ongoing)

Before extracting any new service, require answers to:

  1. What specific problem does this service solve that a module cannot?
  2. Does the team have the operational tooling to observe and debug it?
  3. Will this service be deployed independently, or will it always deploy with others?
  4. Is there a team that will own it long-term?

If any answer is unsatisfactory, keep it as a module.

| Objection | Response |
| --- | --- |
| “Microservices are the industry standard” | Microservices are a tool for specific problems at specific scales. Netflix and Spotify adopted them because they had thousands of developers and needed team autonomy. A team of ten does not have that problem. |
| “We already invested in the split” | Sunk cost. If the architecture is making delivery slower, continuing to invest in it makes delivery even slower. Merging services back is cheaper than maintaining unnecessary complexity indefinitely. |
| “We need microservices for CD” | CD requires automated testing, a reliable pipeline, and small deployable changes. A modular monolith provides all three. Microservices are one way to achieve independent deployment, but they are not a prerequisite. |
| “But we might need to scale later” | Design for today’s constraints, not tomorrow’s speculation. If scaling demands emerge, extract the specific component that needs to scale. Premature decomposition solves problems you do not have while creating problems you do. |


Measuring Progress

| Metric | What to look for |
| --- | --- |
| Services that are always deployed together | Should be merged into a single deployable unit |
| Time spent on service infrastructure versus features | Should shift toward features as services are consolidated |
| Pipeline maintenance overhead | Should decrease as the number of pipelines decreases |
| Lead time | Should decrease as operational overhead shrinks |
| Change fail rate | Should decrease as distributed-system failure modes are eliminated |

4 - Shared Database Across Services

Multiple services read and write the same tables, making schema changes a multi-team coordination event.

Category: Architecture | Quality Impact: Medium

What This Looks Like

The orders service, the reporting service, the inventory service, and the notification service all connect to the same database. They each have their own credentials but they point at the same schema. The orders table is queried by all four services. Each service has its own assumptions about what columns exist, what values are valid, and what the foreign key relationships mean.

A developer on the orders team needs to rename a column. It is a minor cleanup - the column was named order_dt and should be ordered_at for consistency. Before making the change, they post to the team channel: “Anyone else using the order_dt column?” Three other teams respond. Two are using it in reporting queries. One is using it in a scheduled job that nobody is sure anyone owns anymore. The rename is shelved. The inconsistency stays because the cost of fixing it is too high.

Common variations:

  • The integration database. A database designed to be shared across systems from the start. Data is centralized by intent. Different teams add tables and columns as needed. Over time, it becomes the source of truth for the entire organization, and nobody can touch it without coordination.
  • The shared-by-accident database. Services were originally a monolith. When the team began splitting them into services, they kept the shared database because extracting data ownership seemed hard. The services are separate in name but coupled in storage.
  • The reporting exception. Services own their data in principle, but the reporting team has read access to all service databases directly. The reporting team becomes an invisible consumer of every schema, which makes schema changes require reporting-team approval before they can proceed.
  • The cross-service join. A service query that joins tables from conceptually different domains - orders joined to user preferences joined to inventory levels. The query works, but it means the service depends on the internal structure of two other domains.

The telltale sign: a developer must get a database schema change approved in a channel that includes people from three or more different teams, none of whom own the code being changed.

Why This Is a Problem

A shared database couples services together at the storage layer, where the coupling is invisible in service code and extremely difficult to untangle. Services that appear independent - separate codebases, separate deployments, separate teams - are actually a distributed monolith held together by shared mutable state.

It reduces quality

A column rename that takes one developer 20 minutes can break three other services in production before anyone realizes the change shipped. That is the normal cost of shared schema ownership. Each service that reads a table has implicit expectations about that table’s structure. When one service changes the schema, those expectations break in other services. The breaks are not caught at compile time or in code review - they surface at runtime, often in production, when a different service fails because a column it expected no longer exists or contains different values.

This makes schema changes high-risk regardless of how simple they appear. A column rename, a constraint addition, a data type change - all can cascade into failures across services that were never in the same deployment. The safest response is to never change anything, which leads to schemas that grow stale, accumulate technical debt, and eventually become incomprehensible.

When each service owns its own data, schema changes are internal to the owning service. Other services access data through the service’s API, not through the database. The API can maintain backward compatibility while the schema changes. The owning team controls the migration entirely, without coordinating with consumers who do not even know the schema exists.

It increases rework

A two-day schema change becomes a three-week coordination exercise when other teams must change their services before the old column can be removed. That overhead is not exceptional - it is the built-in cost of shared ownership. Database migrations in a shared-database system require a multi-phase process. The first phase deploys code that supports both the old and new schema simultaneously - the old column must stay while new code writes to both columns, because other services still read the old column. The second phase deploys all the consuming services to use the new column. The third phase removes the old column once all consumers have migrated.

Each phase is a separate deployment. Between phases, the system is running in a mixed state that requires extra production code to maintain. That extra code is rework - it exists only to bridge the transition and will be deleted later. Any bug in the bridge code is also rework, because it needs to be diagnosed and fixed in a context that will not exist once the migration is complete.

With service-owned data, the same migration is a single deployment. The service updates its schema and its internal logic simultaneously. No other service needs to change because no other service has direct access to the storage.
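To make the three phases concrete, here is a runnable sketch of the expand/contract sequence for the `order_dt` to `ordered_at` rename from the earlier example, using an in-memory SQLite database as a stand-in for the shared database. The trigger-based dual write is one possible bridge mechanism, not the only one, and `DROP COLUMN` requires SQLite 3.35 or later:

```python
import sqlite3

# Stand-in for the shared database with the old column still in place.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, order_dt TEXT)")
db.execute("INSERT INTO orders (order_dt) VALUES ('2024-01-15')")

# Phase 1 (expand): add the new column, backfill it, and dual-write so
# consumers still reading order_dt keep working.
db.execute("ALTER TABLE orders ADD COLUMN ordered_at TEXT")
db.execute("UPDATE orders SET ordered_at = order_dt")
db.execute("""
    CREATE TRIGGER orders_dual_write AFTER INSERT ON orders
    BEGIN
        UPDATE orders SET ordered_at = NEW.order_dt WHERE id = NEW.id;
    END
""")
db.execute("INSERT INTO orders (order_dt) VALUES ('2024-02-01')")  # bridge keeps both in sync

# Phase 2 happens in the consuming services: each one deploys a change
# that reads ordered_at instead of order_dt.

# Phase 3 (contract): once every consumer has migrated, delete the bridge
# code and the old column.
db.execute("DROP TRIGGER orders_dual_write")
db.execute("ALTER TABLE orders DROP COLUMN order_dt")

rows = db.execute("SELECT ordered_at FROM orders ORDER BY id").fetchall()
print(rows)  # both rows are now served by the new column
```

The trigger and the backfill exist only to bridge the transition, which is exactly the rework the section describes: code written to be deleted.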

It makes delivery timelines unpredictable

Coordinating a schema migration across three teams means aligning three independent deployment schedules. One team might be mid-sprint and unable to deploy a consuming-service change this week. Another team might have a release freeze in place. The migration sits in limbo, the bridge code stays in production, and the developer who initiated the change is blocked.

The dependencies are also invisible in planning. A developer estimates a task that includes a schema change at two days. They do not account for the four-person coordination meeting, the one-week wait for another team to schedule their consuming-service change, and the three-phase deployment sequence. The two-day task takes three weeks.

When schema changes are internal, the owning team deploys on their own schedule. The timeline depends on the complexity of the change, not on the availability of other teams.

It prevents independent deployment

Teams that try to increase deployment frequency hit a wall: the pipeline is fast but every schema change requires coordinating three other teams before shipping. The limiting factor is not the code - it is the shared data. Services cannot deploy independently when they share a database. If Service A deploys a schema change that removes a column Service B depends on, Service B breaks. The only safe deployment strategy is to coordinate all consuming services and deploy them simultaneously or in a carefully managed sequence. Simultaneous deployment eliminates independent release cycles. Managed sequences require orchestration and carry high risk if any service in the sequence fails.

Impact on continuous delivery

CD requires that each service can be built, tested, and deployed independently. A shared database breaks that independence at the most fundamental level: data ownership. Services that share a database cannot have independent pipelines in a meaningful sense, because a passing pipeline on Service A does not guarantee that Service A’s deployment is safe for Service B.

Contract testing and API versioning strategies - standard tools for managing service dependencies in CD - do not apply to a shared database, because there is no contract. Any service can read or write any column at any time. The database is a global mutable namespace shared across all services and all environments. That pattern is incompatible with the independent deployment cadences that CD requires.

How to Fix It

Eliminating a shared database is a long-term effort. The goal is data ownership: each service controls its own data and exposes it through explicit APIs. This does not happen overnight. The path is incremental, moving one domain at a time.

Step 1: Map what reads and writes what

Before changing anything, build a dependency map.

  1. List every table in the shared database.
  2. For each table, identify every service or codebase that reads it and every service that writes it. Use query logs, code search, and database monitoring to find all consumers.
  3. Mark tables that are written by more than one service. These require more careful migration because ownership is ambiguous.
  4. Identify which service has the strongest claim to each table - typically the service that created the data originally.

This map makes the coupling visible. Most teams are surprised by how many hidden consumers exist. The map also identifies the easiest starting points: tables with a single writer and one or two readers that can be migrated first.
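The map itself can be as simple as a dictionary, and the analysis in steps 3 and 4 then becomes a few lines of code. All table and service names below are hypothetical:

```python
# Hypothetical output of steps 1-2: for each table, which services write and read it.
table_map = {
    "orders":        {"writers": {"orders-svc"},
                      "readers": {"orders-svc", "reporting", "inventory", "notifications"}},
    "user_settings": {"writers": {"users-svc"},
                      "readers": {"users-svc", "notifications"}},
    "stock_levels":  {"writers": {"inventory", "warehouse-batch"},
                      "readers": {"inventory", "reporting"}},
}

def ambiguous_ownership(table_map):
    """Tables written by more than one service: ownership must be resolved first."""
    return sorted(t for t, use in table_map.items() if len(use["writers"]) > 1)

def easy_starting_points(table_map):
    """Single writer and at most two outside readers: good first migrations."""
    picks = []
    for table, use in table_map.items():
        if len(use["writers"]) == 1:
            owner = next(iter(use["writers"]))
            outside_readers = use["readers"] - {owner}
            if len(outside_readers) <= 2:
                picks.append((table, owner, sorted(outside_readers)))
    return sorted(picks)

print(ambiguous_ownership(table_map))   # stock_levels has two writers
print(easy_starting_points(table_map))  # user_settings is the pilot candidate
```

Even this toy version surfaces the two facts the migration plan needs: where ownership is contested and where the first safe extraction lives.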

Step 2: Identify the domain with the least shared read traffic

Pick the domain with the cleanest data ownership to pilot the migration. The criteria:

  • A clear owner team that writes most of the data.
  • Relatively few consumers (one or two other services).
  • Data that is accessed by consumers for a well-defined purpose that could be served by an API.

A domain like “notification preferences” or “user settings” is often a good candidate. A domain like “orders” that is read by everything is a poor starting point.

Step 3: Build the API for the chosen domain (Weeks 2-4)

Before removing any direct database access, add an API endpoint that provides the same data.

  1. Build the endpoint in the owning service. It should return the data that consuming services currently query for directly.
  2. Write contract tests: the owning service verifies the API response matches the contract, and consuming services verify their code works against the contract. See No Contract Testing for specifics.
  3. Deploy the endpoint but do not switch consumers yet. Run it alongside the direct database access.

This is the safest phase. If the API has a bug, consumers are still using the database directly. No service is broken.
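A minimal version of the contract-test idea might look like the sketch below. The endpoint, field names, and verification logic are illustrative assumptions; dedicated tools such as Pact formalize the same split between provider verification and consumer expectations:

```python
# A contract shared between the owning service and its consumers.
# Endpoint and field set are hypothetical.
CONTRACT = {
    "endpoint": "/v1/notification-preferences/{user_id}",
    "fields": {"user_id": int, "email_opt_in": bool, "sms_opt_in": bool},
}

# Stand-in for the owning service's handler; in reality this would be the
# real endpoint exercised through an HTTP test client.
def get_preferences(user_id: int) -> dict:
    return {"user_id": user_id, "email_opt_in": True, "sms_opt_in": False}

def verify_provider(response: dict, contract: dict) -> list:
    """Owning-service side: does the real response satisfy the contract?"""
    errors = []
    for field, ftype in contract["fields"].items():
        if field not in response:
            errors.append(f"missing field: {field}")
        elif not isinstance(response[field], ftype):
            errors.append(f"wrong type for {field}: {type(response[field]).__name__}")
    return errors

# Consumer side: build a fake response from the contract and run the
# consumer's own parsing code against it.
def consumer_parse(payload: dict) -> bool:
    return payload["email_opt_in"] or payload["sms_opt_in"]

fake = {"user_id": 7, "email_opt_in": False, "sms_opt_in": True}
assert verify_provider(get_preferences(7), CONTRACT) == []  # provider honors the contract
assert consumer_parse(fake) is True                         # consumer handles the contract shape
```

The point of the split is that neither side needs the other running: the provider tests against the contract, the consumer tests against a fake built from the same contract.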

Step 4: Migrate consumers one at a time (Weeks 4-8)

Switch consuming services from direct database queries to the new API, one service at a time.

  1. For the first consuming service, replace the direct query with an API call in a code change and deploy it.
  2. Verify in production that the consuming service is now using the API.
  3. Run both the old and new access patterns in parallel for a short period if possible, to catch any discrepancy.
  4. Once stable, move on to the next consuming service.

At the end of this step, no service other than the owner is accessing the database tables directly.
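The parallel-run check in step 3 can be a small shim in the consuming service: serve from the new API, compare against the old query, and log any discrepancy. Both access paths below are stubs standing in for the real database query and HTTP call:

```python
import logging

logging.basicConfig(level=logging.WARNING)

# Old path: the direct query the consumer used to run (stubbed here).
def read_via_database(user_id: int) -> dict:
    return {"email_opt_in": True, "sms_opt_in": False}

# New path: the owning service's API (stubbed here).
def read_via_api(user_id: int) -> dict:
    return {"email_opt_in": True, "sms_opt_in": False}

def read_preferences(user_id: int) -> dict:
    """Serve from the new API, but compare against the old path and log
    any discrepancy while the parallel run is in effect."""
    new = read_via_api(user_id)
    old = read_via_database(user_id)
    if new != old:
        logging.warning("preference mismatch for user %s: api=%r db=%r",
                        user_id, new, old)
    return new

result = read_preferences(42)
```

Once the mismatch log stays quiet for long enough, the old path and the shim are deleted, leaving only the API call.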

Step 5: Remove direct access grants and enforce the boundary

Once all consumers have migrated:

  1. Remove database credentials from consuming services. They can no longer connect to the owner’s database even if they wanted to.
  2. Add a monitoring alert for any new direct database connections from services that are not the owner.
  3. Update the architectural decision records and onboarding documentation to make the ownership rule explicit.

Removing access grants is the only enforcement that actually holds over time. A policy that says “don’t access other services’ databases” will be violated under pressure. Removing the credentials makes it a technical impossibility.

Step 6: Repeat for the next domain (Ongoing)

Apply the same pattern to the next domain, working from easiest to hardest. Domains with a single clear writer and few readers migrate quickly. Domains that are written by multiple services require first resolving the ownership question - typically by choosing one service as the canonical source and making others write through that service’s API.

| Objection | Response |
| --- | --- |
| “API calls are slower than direct database queries” | The latency difference is typically measured in single-digit milliseconds and can be addressed with caching. The coordination cost of a shared database - multi-team migrations, deployment sequencing, unexpected breakage - is measured in days and weeks. |
| “We’d have to rewrite everything” | No migration requires rewriting everything. Start with one domain, build confidence, and work incrementally. Most teams migrate one domain per quarter without disrupting normal delivery work. |
| “Our reporting needs cross-domain data” | Reporting is a legitimate cross-cutting concern. Build a dedicated reporting data store that receives data from each service via events or a replication mechanism. Reporting reads the reporting store, not production service databases. |
| “It’s too risky to change a working database” | The migration adds an API alongside the existing access - nothing is removed until consumers have moved over. The risk of each step is small. The risk of leaving the shared database in place is ongoing coordination overhead and surprise breakage. |

Measuring Progress

| Metric | What to look for |
| --- | --- |
| Tables with multiple-service write access | Should decrease toward zero as ownership is clarified |
| Schema change lead time | Should decrease as changes become internal to the owning service |
| Cross-team coordination events per deployment | Should decrease as services gain independent data ownership |
| Release frequency | Should increase as coordination overhead per release drops |
| Lead time | Should decrease as schema migrations stop blocking delivery |
| Failed deployments due to schema mismatch | Should decrease toward zero as direct cross-service database access is removed |

5 - Distributed Monolith

Services exist but the boundaries are wrong. Every business operation requires a synchronous chain across multiple services, and nothing can be deployed independently.

Category: Architecture | Quality Impact: High

What This Looks Like

The organization has services. The architecture diagram shows boxes with arrows between them. But deploying any one service without simultaneously deploying two others breaks production. A single user request passes through four services synchronously before returning a response. When one service in the chain is slow, the entire operation fails. The team has all the complexity of a distributed system and all the coupling of a monolith.

Common variations:

  • Technical-layer services. Services were decomposed along technical lines: an “auth service,” a “notification service,” a “data access layer,” a “validation service.” No single service can handle a complete business operation. Every user action requires orchestrating calls across multiple services because the business logic is scattered across technical boundaries.
  • The shared database. Services have separate codebases but read and write the same database tables. A schema change in one service breaks queries in others. The database is the hidden coupling that makes independent deployment impossible regardless of how clean the service APIs look.
  • The synchronous chain. Service A calls Service B, which calls Service C, which calls Service D. The response time of the user’s request is the sum of all four services plus network latency between them. If any service in the chain is deploying, the entire operation fails. The chain must be deployed as a unit.
  • The orchestrator service. One service acts as a central coordinator, calling all other services in sequence to fulfill a request. It contains the business logic for how services interact. Every new feature requires changes to the orchestrator and at least one downstream service. The orchestrator is a god object distributed across the network.

The telltale sign: services cannot be deployed, scaled, or failed independently. A problem in any one service cascades to all the others.

Why This Is a Problem

A distributed monolith combines the worst properties of both architectures. It has the operational complexity of microservices (network communication, partial failures, distributed debugging) with the coupling of a monolith (coordinated deployments, shared state, cascading failures). The team pays the cost of both and gets the benefits of neither.

It reduces quality

Incorrect service boundaries scatter related business logic across multiple services. A developer implementing a feature must understand how three or four services interact rather than reading one cohesive module. The mental model required to make a correct change is larger than it would be in either a well-structured monolith or a correctly decomposed service architecture.

Distributed failure modes compound this. Network calls between services can fail, time out, or return stale data. When business logic spans services, handling these failures correctly requires understanding the full chain. A developer who changes one service may not realize that a timeout in their service causes a cascade failure three services downstream.

It increases rework

Every feature that touches a business domain crosses service boundaries because the boundaries do not align with domains. A change to how orders are discounted requires modifying the pricing service, the order service, and the invoice service because the discount logic is split across all three. The developer opens three PRs, coordinates three reviews, and sequences three deployments.

When the team eventually recognizes the boundaries are wrong, correcting them is a second architectural migration. Data must move between databases. Contracts must be redrawn. Clients must be updated. The cost of redrawing boundaries after the fact is far higher than drawing them correctly the first time.

It makes delivery timelines unpredictable

Coordinated deployments are inherently riskier and slower than independent ones. The team must schedule release windows, write deployment runbooks, and plan rollback sequences. If one service fails during the coordinated release, the team must decide whether to roll back everything or push forward with a partial deployment. Neither option is safe.

Cross-service debugging also adds unpredictable time. A bug that manifests in Service A may originate in Service C’s response format. Tracing the issue requires reading logs from multiple services, correlating request IDs, and understanding the full call chain. What would be a 30-minute investigation in a monolith becomes a half-day effort.

It eliminates the benefits of services

The entire point of service decomposition is independent operation: deploy independently, scale independently, fail independently. A distributed monolith achieves none of these:

  • Cannot deploy independently. Deploying Service A without Service B breaks production because they share state or depend on matching contract versions without backward compatibility.
  • Cannot scale independently. The synchronous chain means scaling Service A is pointless if Service C (which Service A calls) cannot handle the increased load. The bottleneck moves but does not disappear.
  • Cannot fail independently. A failure in one service cascades through the chain. There are no circuit breakers, no fallbacks, and no graceful degradation because the services were not designed for partial failure.

Impact on continuous delivery

CD requires that every change can flow from commit to production independently. A distributed monolith makes this impossible because changes cannot be deployed independently. The deployment unit is not a single service but a coordinated set of services that must move together.

This forces the team back to batch releases: accumulate changes across services, test them together, deploy them together. The batch grows over time because each release window is expensive to coordinate. Larger batches mean higher risk, longer rollbacks, and less frequent delivery. The architecture that was supposed to enable faster delivery actively prevents it.

How to Fix It

Step 1: Map the actual dependencies

For each service, document:

  • What other services does it call synchronously?
  • What database tables does it share with other services?
  • What services must be deployed at the same time?

Draw the dependency graph. Services that form a cluster of mutual dependencies are candidates for consolidation or boundary correction.
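Finding those clusters can be automated once the call graph is written down. The sketch below uses a hypothetical graph and flags groups of services that can each reach the other, which is the mutual-dependency pattern worth consolidating:

```python
# Hypothetical synchronous-call graph from step 1: service -> services it calls.
calls = {
    "orders":    {"pricing", "inventory"},
    "pricing":   {"orders"},          # pricing calls back into orders
    "inventory": set(),
    "invoicing": {"orders"},
}

def reachable(graph, start):
    """All services transitively reachable from start via synchronous calls."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def mutual_dependency_clusters(graph):
    """Groups of services that can each reach the other: candidates for
    consolidation or boundary correction."""
    clusters, assigned = [], set()
    for svc in graph:
        if svc in assigned:
            continue
        cluster = {svc} | {other for other in graph
                           if other != svc
                           and svc in reachable(graph, other)
                           and other in reachable(graph, svc)}
        if len(cluster) > 1:
            clusters.append(sorted(cluster))
            assigned |= cluster
    return clusters

print(mutual_dependency_clusters(calls))  # orders and pricing form a coupled cluster
```

In this toy graph, `orders` and `pricing` call each other and therefore cannot be deployed, scaled, or failed independently; they are one deployable unit pretending to be two services.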

Step 2: Identify domain boundaries

Map business capabilities to services. For each business operation (place an order, process a payment, send a notification), trace which services are involved. If a single business operation touches four services, the boundaries are wrong.

Correct boundaries align with business domains: orders, payments, inventory, users. Each domain service can handle its business operations without synchronous calls to other domain services. Cross-domain communication happens through asynchronous events or well-versioned APIs with backward compatibility.

Step 3: Consolidate or redraw one boundary (Weeks 3-8)

Pick the cluster with the worst coupling and address it:

  • If the services are small and owned by the same team, merge them into one service. This is the fastest fix. A single service with clear internal modules is better than three coupled services that cannot operate independently.
  • If the services are large or owned by different teams, redraw the boundary along domain lines. Move the scattered business logic into the service that owns that domain. Extract shared database tables into the owning service and replace direct table access with API calls.

Step 4: Break synchronous chains (Weeks 6+)

For cross-domain communication that remains after boundary correction:

  • Replace synchronous calls with asynchronous events where the caller does not need an immediate response. Order placed? Publish an event. The notification service subscribes and sends the email without the order service waiting for it.
  • For calls that must be synchronous, add backward-compatible versioning to contracts so each service can deploy on its own schedule.
  • Add circuit breakers and timeouts so that a failure in one service does not cascade to callers.
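The circuit-breaker idea from the last point can be sketched in a few lines. The thresholds, the fallback value, and the failing downstream call are all illustrative assumptions; production systems would typically use an established resilience library rather than a hand-rolled breaker:

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: after repeated failures, stop calling the
    dependency and return a fallback until a cool-down period has passed."""

    def __init__(self, failure_threshold: int = 3, reset_after: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, fallback=None):
        # While open, short-circuit instead of hammering a failing dependency.
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                return fallback
            self.opened_at = None  # half-open: let one call through
            self.failures = 0
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()
            return fallback
        self.failures = 0
        return result

# Hypothetical downstream dependency that is currently timing out.
def flaky_notification_service(order_id):
    raise TimeoutError("downstream too slow")

breaker = CircuitBreaker(failure_threshold=2)
results = [breaker.call(flaky_notification_service, 1, fallback="queued-for-retry")
           for _ in range(5)]
print(results)  # after two failures the breaker opens; callers degrade gracefully
```

The caller always gets an answer - either the real result or a degraded fallback - instead of propagating the failure up the chain, which is what converts a cascading outage into a partial one.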

Step 5: Eliminate the shared database (Weeks 8+)

Each service should own its data. If two services need the same data, one of them owns the table and the other accesses it through an API. Shared database access is the most common source of hidden coupling and the most important to eliminate.

This is a gradual process: add the API, migrate one consumer at a time, and remove direct table access when all consumers have migrated.

| Objection | Response |
| --- | --- |
| “Merging services is going backward” | Merging poorly decomposed services is going forward. The goal is correct boundaries, not maximum service count. Fewer services with correct boundaries deliver faster than many services with wrong boundaries. |
| “Asynchronous communication is too complex” | Synchronous chains across services are already complex and fragile. Asynchronous events are more resilient and allow each service to operate independently. The complexity is different, not greater, and it pays for itself in deployment independence. |
| “We can’t change the database schema without breaking everything” | That is exactly the problem. The shared database is the coupling. Eliminating it is the fix, not an obstacle. Use the Strangler Fig pattern: add the API alongside the direct access, migrate consumers gradually, and remove the old path. |

Measuring Progress

| Metric | What to look for |
| --- | --- |
| Services that must deploy together | Should decrease as boundaries are corrected |
| Synchronous call chain depth | Should decrease as chains are broken with async events |
| Shared database tables | Should decrease toward zero as each service owns its data |
| Lead time | Should decrease as coordinated releases are replaced by independent deployments |
| Change fail rate | Should decrease as cascading failures are eliminated |
| Deployment coordination events per month | Should decrease toward zero |