1 - AI Tooling Slows You Down Instead of Speeding You Up

It takes longer to explain the task to the AI, review the output, and fix the mistakes than it would to write the code directly.

What you are seeing

A developer opens an AI chat window to implement a function. They spend ten minutes writing a prompt that describes the requirements, the constraints, the existing patterns in the codebase, and the edge cases. The AI generates code. The developer reads through it line by line because they have no acceptance criteria to verify against. They spot that it uses a different pattern than the rest of the codebase and misses a constraint they mentioned. They refine the prompt. The AI produces a second version. It is better but still wrong in a subtle way. The developer fixes it by hand. Total time: forty minutes. Writing it themselves would have taken fifteen.

This is not a one-time learning curve. It happens repeatedly, on different tasks, across the team. Developers report that AI tools help with boilerplate and unfamiliar syntax but actively slow them down on tasks that require domain knowledge, codebase-specific patterns, or non-obvious constraints. The promise of “10x productivity” collides with the reality that without clear acceptance criteria, reviewing AI output means auditing the implementation detail by detail - which is often harder than writing the code from scratch.

Common causes

Skipping Specification and Prompting Directly

The most common cause of AI slowdown is jumping straight to code generation without defining what the change should do. Instead of writing an intent description, BDD scenarios, and acceptance criteria first, the developer writes a long prompt that mixes requirements, constraints, and implementation hints into a single message. The AI guesses at the scope. The developer reviews line by line because they have no checklist of expected behaviors. The prompt-review-fix cycle repeats until the output is close enough.

The specification workflow from the Agent Delivery Contract exists to prevent this. When the developer defines the intent (what the change should accomplish), the BDD scenarios (observable behaviors), and the acceptance criteria (how to verify correctness) before generating code, the AI has a constrained target and the developer has a checklist. If the specification for a single change takes more than fifteen minutes, the change is too large - split it.

Agents can help with specification itself. The agent-assisted specification workflow uses agents to find gaps in your intent, draft BDD scenarios, and surface edge cases - all before any code is generated. This front-loads the work where it is cheapest: in conversation, not in implementation review.
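A minimal sketch of what such a checklist can look like once it exists. The function and criteria below are hypothetical, invented for illustration: the acceptance criteria are executable assertions, so reviewing the AI's output means running them rather than auditing line by line.

```python
# Hypothetical spec for a small change, written before any code is generated.
# Intent: normalize user-supplied tags (trim, lowercase, deduplicate).
# BDD scenario: given mixed-case tags with whitespace and repeats,
#   when normalized, then the result is trimmed, lowercased, and unique,
#   preserving first-seen order.

def normalize_tags(tags):
    """The implementation the acceptance checks below constrain."""
    seen, result = set(), []
    for tag in tags:
        cleaned = tag.strip().lower()
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result

# Acceptance criteria as a checklist the reviewer runs, not re-derives:
assert normalize_tags([" Python", "python", "AI "]) == ["python", "ai"]
assert normalize_tags([]) == []
assert normalize_tags(["  "]) == []
```

Whether the implementation came from an agent or a human, review now means running three assertions instead of reverse-engineering intent from the diff.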

Read more: Agent-Assisted Specification

Missing Working Agreements on AI Usage

When the team has no shared understanding of which tasks benefit from AI and which do not, developers default to using AI on everything. Some tasks - writing a parser for a well-defined format, generating test fixtures, scaffolding boilerplate - are good AI targets. Other tasks - implementing complex business rules, debugging production issues, refactoring code with implicit constraints - are poor AI targets because the context transfer cost exceeds the implementation cost.

Without a shared agreement, each developer discovers this boundary independently through wasted time.

Read more: No Shared Workflow Expectations

Knowledge Silos

When domain knowledge is concentrated in a few people, the acceptance criteria for domain-heavy work exist only in those people’s heads. They can implement the feature faster than they can articulate the criteria for an AI prompt. For developers who do not have the domain knowledge, using AI is equally slow because they lack the criteria to validate the output against. Both situations produce slowdowns for different reasons - and both trace back to domain knowledge that has not been made explicit.

Read more: Knowledge Silos

How to narrow it down

  1. Are developers jumping straight to code generation without defining intent, scenarios, and acceptance criteria first? If the prompting-reviewing-fixing cycle consistently takes longer than direct implementation, the problem is usually skipped specification, not the AI tool. Start with Agent-Assisted Specification to define what the change should do before generating code.
  2. Does the team have a shared understanding of which tasks are good AI targets? If individual developers are discovering this through trial and error, the team needs working agreements. Start with the AI Adoption Roadmap to identify appropriate use cases.
  3. Are the slowest AI interactions on tasks that require deep domain knowledge? If AI struggles most where implicit business rules govern the implementation, the problem is not the AI tool but the knowledge distribution. Start with Knowledge Silos.

Ready to fix this? Start with Agent-Assisted Specification to learn the specification workflow that front-loads clarity before code generation.

2 - AI Is Generating Technical Debt Faster Than the Team Can Absorb It

AI tools produce working code quickly, but the codebase is accumulating duplication, inconsistent patterns, and structural problems faster than the team can address them.

What you are seeing

The team adopted AI coding tools six months ago. Feature velocity increased. But the codebase is getting harder to work in. Each AI-assisted session produces code that works - it passes tests, it satisfies the acceptance criteria - but it does not account for what already exists. The AI generates a new utility function that duplicates one three files away. It introduces a third pattern for error handling in a module that already has two. It copies a data access approach that the team decided to move away from last quarter.

Nobody catches these issues in review because the review standard is “does it do what it should and how do we validate it” - which is the right standard for correctness, but it does not address structural fitness. The acceptance criteria say what the change should do. They do not say “and it should use the existing error handling pattern” or “and it should not duplicate the date formatting utility.”

The debt is invisible in metrics. Test coverage is stable or improving. Change failure rate is flat. But development cycle time is creeping up because every new change must navigate around the inconsistencies the previous changes introduced. Refactoring is harder because the AI generated code in patterns the team did not choose and would not have written.

Common causes

No Scheduled Refactoring Sessions

AI generates code faster than humans refactor it. Without deliberate maintenance sessions scoped to cleaning up recently touched files, the codebase drifts toward entropy faster than it would with human-paced development. The team treats refactoring as something that happens organically during feature work, but AI-assisted feature sessions are scoped to their acceptance criteria and do not include cleanup.

The fix is not to allow AI to refactor during feature sessions - that mixes concerns and makes commits unreviewable. It is to schedule explicit refactoring sessions with their own intent, constraints, and acceptance criteria (all existing tests still pass, no behavior changes).

Read more: Pitfalls and Metrics - Schedule refactoring as explicit sessions

No Review Gate for Structural Quality

The team’s review process validates correctness (does it satisfy acceptance criteria?) and security (does it introduce vulnerabilities?) but not structural fitness (does it fit the existing codebase?). Standard review agents check for logic errors, security defects, and performance issues. None of them check whether the change duplicates existing code, introduces a third pattern where one already exists, or violates the team’s architectural decisions.

Automating structural quality checks requires two layers in the pre-commit gate sequence.

Layer 1: Deterministic tools

Deterministic tools run before any AI review and catch mechanical structural problems without token cost. These run in milliseconds and cannot be confused by plausible-looking but incorrect code. Add them to the pre-commit hook sequence alongside lint and type checking:

  • Duplication detection (e.g., jscpd) - flags when the same code block already exists elsewhere in the codebase. When AI generates a utility that already exists three files away, this catches it before review.
  • Complexity thresholds (e.g., ESLint complexity rule, lizard) - flags functions that exceed a cyclomatic complexity limit. AI-generated code tends toward deeply nested conditionals when the prompt does not specify a complexity budget.
  • Dependency and architecture rules (e.g., dependency-cruiser, ArchUnit) - encode module boundary constraints as code. When the team decided to move away from a direct database access pattern, architecture rules make violations a build failure rather than a code review comment.

These tools encode decisions the team has already made. Each one removes a category of structural drift from the review queue entirely.
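As an illustration, the deterministic layer can be wired as an ordered gate sequence that fails fast on the first violation. This is a sketch, not a drop-in hook: the tool invocations are examples, and the exact flags for jscpd, lizard, or dependency-cruiser should be taken from their own documentation.

```python
import subprocess

# Example gate sequence for a pre-commit hook. Order matters: these
# deterministic checks run before any (token-costing) AI review step.
# The commands are illustrative; substitute your stack's tools and flags.
EXAMPLE_GATES = [
    ("lint", ["eslint", "."]),
    ("types", ["tsc", "--noEmit"]),
    ("duplication", ["jscpd", "src/"]),
    ("complexity", ["lizard", "--CCN", "10", "src/"]),
    ("architecture", ["depcruise", "--validate", "src/"]),
]

def run_gates(gates):
    """Run each gate in order; return the name of the first failing gate,
    or None if every gate passed."""
    for name, cmd in gates:
        if subprocess.run(cmd).returncode != 0:
            return name
    return None
```

Because each gate is a plain command, the same sequence runs in the pre-commit hook and in CI without divergence.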

Layer 2: Semantic review agent with architectural constraints

The semantic review agent can catch structural drift that deterministic tools cannot detect - like a third error-handling approach in a module that already has two - but only if the feature description includes architectural constraints. If the feature description covers only functional requirements, the agent has no basis for evaluating structural fit.

Add a constraints section to the feature description for every change:

  • “Use the existing UserRepository pattern - do not introduce new data access approaches”
  • “Error handling in this module follows the Result type pattern - do not introduce exceptions”
  • “New utilities belong in the shared/utils directory - do not create module-local utilities”

When the coding agent generates code that violates a stated constraint, the semantic review agent flags it. Without stated constraints, the review agent cannot distinguish deliberate new patterns from drift.

The two layers are complementary. Deterministic tools handle mechanical violations fast and cheaply. The semantic review agent handles intent alignment and pattern consistency, but only where the feature description defines what those patterns are.

Read more: Coding and Review Agent Configuration - Semantic Review Agent

Rubber-Stamping AI-Generated Code

When developers do not own the change - cannot articulate what it does, what criteria they verified, or how they would detect a failure - they also do not evaluate whether the change fits the codebase. Structural quality requires someone to notice that the AI reinvented something that already exists. That noticing only happens when a human is engaged enough with the change to compare it against their knowledge of the existing system.

Read more: Rubber-Stamping AI-Generated Code

How to narrow it down

  1. Does the pre-commit gate include duplication detection, complexity limits, and architecture rules? If the only automated structural check is lint, the gate catches style violations but not structural drift. Add deterministic structural tools to the hook sequence described in Coding and Review Agent Configuration.
  2. Do feature descriptions include architectural constraints, not just functional requirements? If the feature description only says what the change should do but not how it should fit structurally, the semantic review agent has no basis for checking pattern conformance. Start by adding constraints to the Agent Delivery Contract.
  3. Is the team scheduling explicit refactoring sessions after feature work? If cleanup only happens incidentally during feature sessions, debt accumulates with every AI-assisted change. Start with the Pitfalls and Metrics guidance on scheduling maintenance sessions after every three to five feature sessions.
  4. Can developers identify where a new change duplicates existing code? If nobody in the review process is comparing the AI’s output against existing utilities and patterns, the team is not engaged enough with the change to catch structural drift. Start with Rubber-Stamping AI-Generated Code.

Ready to fix this? Start with the pre-commit gate. Add duplication detection and architecture rules to the hook sequence from Coding and Review Agent Configuration, then add architectural constraints to your feature description template. These two changes automate detection of the most common structural drift patterns on every change.

3 - Data Pipelines and ML Models Have No Deployment Automation

Application code has a CI/CD pipeline, but ML models and data pipelines are deployed manually or on an ad hoc schedule.

What you are seeing

ML models and data pipelines are deployed manually while application code has a full CI/CD pipeline. When a developer pushes a change to the application, tests run, an artifact is built, and deployment promotes automatically through environments. But the ML model that drives the product’s recommendations was trained two months ago and deployed by a data scientist who ran a Python script from their laptop. Nobody knows which version of the model is in production or what training data it was built on.

Data pipelines have a similar problem. The ETL job that populates the feature store was written in a Jupyter notebook, runs on a schedule via a cron job on a single server, and is updated by manually copying a new version to the server when it changes. There is no version control for the notebook, no automated tests for the pipeline logic, and no staging environment where the pipeline can be validated before it runs against production data.

Common causes

Missing deployment pipeline

The pipeline infrastructure that handles application deployments was not extended to cover model artifacts and data pipelines. Extending it requires ML-aware tooling - model registries, data versioning, training pipelines - that must be built or configured separately from standard application pipeline tools.

Establishing basic practices first - version control for pipeline code, a model registry with version tracking, automated tests for pipeline logic - creates the foundation. A minimal pipeline that validates data pipeline changes before production deployment closes the gap between how application code and model artifacts are treated, removing the dual delivery standard.
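A sketch of the minimal version-tracking piece, assuming nothing about your ML stack: each deployment-relevant fact (artifact hash, training data reference, timestamp) is recorded as an append-only registry entry, so "which model is in production and what was it trained on" always has an answer. The field names are illustrative.

```python
import datetime
import hashlib

def register_model(registry, name, artifact_bytes, training_data_ref):
    """Append an immutable entry for a newly trained model.

    `registry` is any append-only store; a plain list stands in for it
    here. `training_data_ref` points at the versioned training data
    (e.g. a dataset snapshot ID) - both names are hypothetical.
    """
    version = sum(1 for e in registry if e["name"] == name) + 1
    entry = {
        "name": name,
        "version": version,
        "artifact_sha256": hashlib.sha256(artifact_bytes).hexdigest(),
        "training_data_ref": training_data_ref,
        "registered_at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    }
    registry.append(entry)
    return entry
```

Even this much, backed by a real store, replaces "a data scientist ran a script from their laptop" with an auditable record.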

Read more: Missing deployment pipeline

Manual deployments

The default for ML work is manual because the discipline of ML operations is younger than software deployment automation. Without deliberate investment in automating model deployment, that default persists: a data scientist deploys a model by running a script, updating a config file, or copying files to a server.

Applying the same deployment automation principles to model deployment - versioned artifacts, automated promotion, health checks after deployment - closes the gap between ML and application delivery standards.
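Those three principles can be sketched as one promotion routine. The `deploy` and `healthy` callables are placeholders for whatever your platform provides; the point is the shape: deploy a specific versioned artifact, verify health, roll back automatically on failure.

```python
def promote_model(deploy, healthy, current_version, new_version):
    """Deploy `new_version`; if the post-deployment health check fails,
    roll back to `current_version` and report what happened.

    `deploy(version)` and `healthy()` are stand-ins for platform calls -
    this is a sketch of the control flow, not a real deployment client.
    """
    deploy(new_version)
    if healthy():
        return ("promoted", new_version)
    deploy(current_version)  # automated rollback, no human in the loop
    return ("rolled_back", current_version)
```

Because the routine only ever deploys named versions, the rollback path is exercised with the same mechanism as the promotion path.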

Read more: Manual deployments

Knowledge silos

Model deployment and data pipeline operations often live with specific individuals who have the expertise and the access to execute them. When those people are unavailable, model retraining, pipeline updates, and deployment operations cannot happen. The knowledge of how the ML infrastructure works is not distributed.

Documenting deployment procedures, building runbooks for model rollback, and cross-training team members on data infrastructure operations distributes the knowledge before automation is in place.

Read more: Knowledge silos

How to narrow it down

  1. Is the currently deployed model version tracked in version control with a record of when it was deployed? If not, there is no audit trail for model deployments. Start with Missing deployment pipeline.
  2. Can any engineer deploy an updated model or data pipeline, or does it require a specific person? If specific expertise is required, the knowledge is siloed. Start with Knowledge silos.
  3. Are data pipeline changes validated in a non-production environment before running against production data? If not, data pipeline changes go directly to production without validation. Start with Manual deployments.

Ready to fix this? The most common cause is Missing deployment pipeline. Start with its How to Fix It section for week-by-week steps.

4 - The Codebase No Longer Reflects the Business Domain

Business terms are used inconsistently. Domain rules are duplicated, contradicted, or implicit. No one can explain all the invariants the system is supposed to enforce.

What you are seeing

The same business concept goes by three different names in three different modules. A rule about how orders are validated exists in the API layer, partially in a service, and also in the database - with slight differences between them. A developer making a change to the payments flow discovers undocumented assumptions mid-implementation and is not sure whether they are intentional constraints or historical accidents.

New developers cannot form a coherent mental model of the domain from the code alone. They learn by asking colleagues, but colleagues often disagree or are uncertain. The system works, mostly, but nobody can fully explain why it is structured the way it is or what would break if a particular constraint were removed.

Common causes

Thin-Spread Teams

When engineers rotate through a domain without staying long enough to understand its business rules deeply, each rotation leaves its own layer of interpretation on the codebase. One team names a concept one way. The next team introduces a parallel concept with a different name because they did not recognize the existing one. A third team adds a validation rule without knowing an equivalent rule already existed elsewhere. Over time the code reflects the sequence of teams that worked in it rather than the business domain it is supposed to model.

Read more: Thin-Spread Teams

Knowledge Silos

When the canonical understanding of the domain lives in a few individuals, the code drifts from that understanding whenever those individuals are not involved in a change. Developers without deep domain knowledge make reasonable-seeming implementation choices that violate rules they were never told about. The gap between what the domain expert knows and what the code expresses widens with each change made without them.

Read more: Knowledge Silos

How to narrow it down

  1. Are the same business concepts named differently in different parts of the codebase? If a developer must learn multiple synonyms for the same thing to navigate the code, the domain model has been interpreted independently by multiple teams. Start with Thin-Spread Teams.
  2. Can team members explain all the validation rules the system enforces, and do their explanations agree? If there is disagreement or uncertainty, domain knowledge is not shared or externalized. Start with Knowledge Silos.

Ready to fix this? The most common cause is Knowledge Silos. Start with its How to Fix It section for week-by-week steps.


5 - The Development Workflow Has Friction at Every Step

Slow CI servers, poor CLI tools, and no IDE integration. Every step in the development process takes longer than it should.

What you are seeing

The CI servers are slow. A build that should take 5 minutes takes 25 because the build agents are undersized and the queue is long. The IDE has no integration with the team’s testing framework, so running a specific test requires dropping to the command line and remembering the exact invocation syntax. The deployment CLI has no tab completion and cryptic error messages. The local development environment requires a 12-step ritual to restart after any configuration change.

Individual friction points seem minor in isolation. A 20-second wait is a slight inconvenience. A missing IDE shortcut is a small annoyance. But friction compounds. A developer who waits 20 seconds, remembers a command, waits 20 more seconds, then navigates an opaque error message has spent a minute on a task that should take 5 seconds. Across ten such interactions per day, across an entire team, this is a meaningful tax on throughput.

The larger cost is attentional, not temporal. Friction interrupts flow. When a developer has to stop thinking about the problem they are solving to remember a command syntax, context-switch to a different tool, or wait for an operation to complete, they lose the thread. Flow states that make complex problems tractable are incompatible with constant context switches caused by tooling friction.

Common causes

Missing deployment pipeline

Investment in pipeline tooling - build caching, parallelized test execution, automated deployment scripts with good error messages - directly reduces the friction of getting changes to production. Teams without this investment accumulate tooling debt. Each year that passes without improving the pipeline leaves a more elaborate set of workarounds in place.

A team that treats the pipeline as a first-class product, maintained and improved the same way they maintain production code, eliminates friction points incrementally. The slow CI queue, the missing IDE integration, the opaque deployment errors - each one is a bug in the pipeline product, and bugs get fixed when someone owns the product.

Read more: Missing deployment pipeline

Manual deployments

When the deployment process is manual, there is no pressure to make the tooling ergonomic. The person doing the deployment learns the steps and adapts. Automation forces the deployment process to be scripted, which creates an interface that can be improved, tested, and measured. A deployment script with good error messages and clear output is a better tool than a deployment runbook, and it can be improved as a piece of software.

Read more: Manual deployments

How to narrow it down

  1. How long does a full pipeline run take? If builds take more than 10 minutes, build caching and parallelization are likely available but not implemented. Start with Missing deployment pipeline.
  2. Can a developer deploy with a single command that provides clear output? If deployment requires multiple manual steps with opaque error messages, the tooling has not been invested in. Start with Manual deployments.
  3. Are builds getting faster over time? If build time is stable or increasing, nobody is actively working on pipeline performance. Start with Missing deployment pipeline.

Ready to fix this? The most common cause is Missing deployment pipeline. Start with its How to Fix It section for week-by-week steps.

6 - Getting a Test Environment Requires Filing a Ticket

Test environments are a scarce, contended resource. Provisioning takes days and requires another team’s involvement.

What you are seeing

A developer needs a clean environment to reproduce a bug. They file a ticket with the infrastructure team requesting environment access. The ticket enters a queue. Two days later, the environment is provisioned. By that time the developer has moved on to other work, the context for the bug is cold, and the urgency has faded.

Test environments are scarce because they are expensive to create manually. The infrastructure team provisions each one by hand: configuring servers, installing dependencies, seeding databases, updating DNS. The process takes hours of skilled work. Because it takes hours, environments are treated as long-lived shared resources rather than disposable per-task resources. Multiple teams share the same staging environment, which creates contention, coordination overhead, and mysterious failures when two teams’ work interacts unexpectedly.

The team has adapted by scheduling environment usage in advance and batching testing work. These adaptations work until there is a deadline, at which point contention over shared environments becomes a delivery risk.

Common causes

Snowflake environments

When environments are configured by hand, they cannot be created on demand. The cost of creating a new environment is the same as the cost of the initial configuration: hours of skilled work. This cost makes environments permanent rather than ephemeral. Infrastructure as code and containerization make environment creation a fast, automated operation that any team member can trigger.

When environments can be created in minutes from code, they stop being scarce. A developer who needs an environment can create one, use it, and destroy it. Two teams working on conflicting features each have their own environment. Contention disappears.
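The create-use-destroy lifecycle can be made explicit in code. A sketch, assuming the create and destroy commands are thin wrappers around your infrastructure-as-code tooling (the commands themselves are placeholders supplied by the caller):

```python
import contextlib
import subprocess
import uuid

@contextlib.contextmanager
def ephemeral_environment(create_cmd, destroy_cmd):
    """Create an isolated environment with a unique name, yield that
    name, and always destroy the environment afterwards - even if the
    work inside the block fails. Both commands receive the environment
    name as their final argument.
    """
    name = f"env-{uuid.uuid4().hex[:8]}"
    subprocess.run(create_cmd + [name], check=True)
    try:
        yield name
    finally:
        subprocess.run(destroy_cmd + [name], check=True)
```

The `finally` clause is the whole point: environments cannot leak and become long-lived shared resources by accident.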

Read more: Snowflake environments

Missing deployment pipeline

Pipelines that include environment provisioning steps can spin up, run tests against, and tear down ephemeral environments as part of every run. The environment is created fresh for each test run and destroyed when the run completes. Without this capability, environments are managed manually outside the pipeline and must be shared.

A pipeline with environment provisioning gives every commit its own isolated environment. There is no ticket to file, no queue to wait in, no contention with other teams - the environment exists for the duration of the run and is gone when the run completes.

Read more: Missing deployment pipeline

Knowledge silos

The knowledge of how to provision an environment lives in the infrastructure team. Until that knowledge is codified as scripts or infrastructure code, environment creation requires a human from that team. The infrastructure team becomes a bottleneck even when they are working as fast as they can.

Externalizing environment provisioning knowledge into code - reproducible, runnable by anyone - removes the dependency on the infrastructure team for routine environment needs.

Read more: Knowledge silos

How to narrow it down

  1. Can a developer create a new isolated test environment without filing a ticket? If not, environment creation is not self-service. Start with Snowflake environments.
  2. Do multiple teams share a single staging environment? Shared environments create contention and interference. Start with Missing deployment pipeline.
  3. Is environment provisioning knowledge documented as runnable code? If provisioning requires knowing undocumented manual steps, the knowledge is siloed. Start with Knowledge silos.

Ready to fix this? The most common cause is Snowflake environments. Start with its How to Fix It section for week-by-week steps.

7 - The Deployment Target Does Not Support Modern CI/CD Tooling

Mainframes or proprietary platforms require custom integration or manual steps. CD practices stop at the boundary of the legacy stack.

What you are seeing

The deployment target is a z/OS mainframe, an AS/400, an embedded device firmware platform, or a proprietary industrial control system. The standard CI/CD tools the rest of the organization uses do not support this target. The vendor’s deployment tooling is command-line based, requires a licensed runtime, and was designed around a workflow that predates modern software delivery practices.

The team’s modern application code lives in a standard git repository with a standard pipeline for the web tier. But the batch processing layer, the financial calculation engine, or the device firmware is deployed through a completely separate process involving FTP, JCL job cards, and a deployment checklist that exists as a Word document on a shared drive.

The organization’s CD practices stop at the boundary of the modern stack. The legacy platform exists in a different operational world with different tooling, different skills, different deployment cadence, and different risk models. Bridging the two worlds requires custom integration work that is unglamorous, expensive, and consistently deprioritized.

Common causes

Manual deployments

Legacy platform deployments are almost always manual. The platform predates modern deployment automation. The deployment procedure exists in documentation and in the heads of the people who have done it. Without investment in custom tooling, mainframe deployments remain manual indefinitely.

Building automation for a mainframe or proprietary platform requires understanding both the platform’s native tools and modern automation principles. The result may not look like a standard pipeline, but it can provide the same benefits: consistent, repeatable, auditable deployments that do not require a specific person.
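A sketch of what "automation that does not look like a standard pipeline" can mean in practice: each item on the deployment checklist becomes a logged, fail-fast function call, so the deployment is repeatable and auditable even though the underlying commands stay platform-native. The step names and commands are invented placeholders.

```python
import subprocess

def run_step(name, cmd, audit_log):
    """Run one deployment step, record the outcome, stop on failure.

    `cmd` wraps whatever the platform needs - an FTP transfer, a JCL
    job submission via the vendor CLI - the examples in any real
    sequence would be platform-specific.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    audit_log.append({"step": name, "returncode": result.returncode})
    if result.returncode != 0:
        raise RuntimeError(f"deployment step {name!r} failed "
                           f"(rc={result.returncode}): {result.stderr.strip()}")
    return result
```

Wrapping even one manual step this way produces an audit trail and a clear error message where a Word-document checklist produced neither.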

Read more: Manual deployments

Missing deployment pipeline

A pipeline that covers the full deployment surface - modern application code, database changes, and legacy platform components - requires platform-specific extensions. Standard pipeline tools do not ship with mainframe support, but they can be extended with custom steps that invoke platform-native tools. Without this investment, the pipeline covers only the modern stack.

Building coverage incrementally - wrapping the most common deployment operations first, then expanding - is more achievable than trying to fully automate a complex legacy deployment in one effort.

Read more: Missing deployment pipeline

Knowledge silos

Mainframe and proprietary platform skills are rare and increasingly concentrated. Teams typically have one or two people who understand the platform deeply. When those people leave, the deployment process becomes opaque to everyone remaining. The knowledge that enables manual deployments is not distributed and not documented in a form anyone else can use.

Deliberately distributing platform knowledge - pair deployments, written procedures, runbooks that reflect the actual current process - reduces single-person dependency even before automation is available.

Read more: Knowledge silos

How to narrow it down

  1. Is there anyone on the team other than one or two people who can deploy to the legacy platform? If not, knowledge concentration is the immediate risk. Start with Knowledge silos.
  2. Is the legacy platform deployment automated in any way? If completely manual, automation of even one step is a starting point. Start with Manual deployments.
  3. Is the legacy platform deployment included in the same pipeline as modern services? If it is managed outside the pipeline, it lacks all the pipeline’s safety properties. Start with Missing deployment pipeline.

Ready to fix this? The most common cause is Manual deployments. Start with its How to Fix It section for week-by-week steps.

8 - Developers Cannot Run the Pipeline Locally

The only way to know if a change passes CI is to push it and wait. Broken builds are discovered after commit, not before.

What you are seeing

A developer makes a change, commits, and pushes to CI. Thirty minutes later, the build is red. A linting rule was violated. Or a test file was missing from the commit. Or the build script uses a different version of a dependency than the developer’s local machine. The developer fixes the issue and pushes again. Another wait. Another failure - this time a test that only runs in CI and not in the local test suite.

This cycle destroys focus. The developer cannot stay in flow waiting for CI results. They switch to something else, then switch back when the notification arrives. Each context switch adds recovery time. A change that took thirty minutes to write takes two hours from first commit to green build, and the developer was not thinking about it for most of that time.

The deeper issue is that CI and local development are different environments. Tests that pass locally fail in CI because of dependency version differences, missing environment variables, or test execution order differences. The developer cannot reproduce CI failures locally, which makes them much harder to debug and creates a pattern of “push and hope” rather than “validate locally and push with confidence.”

Common causes

Missing deployment pipeline

Pipelines designed for cloud-only execution - pulling from private artifact repositories, requiring CI-specific secrets, using platform-specific compute resources - cannot run locally by construction. The pipeline was designed for the CI environment and only the CI environment.

Pipelines designed with local execution in mind use tools that run identically in any environment: containerized build steps, locally runnable test commands, shared dependency resolution. A developer running the same commands locally that the pipeline runs in CI gets the same results. The feedback loop shrinks from thirty minutes to seconds.
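
As a sketch of what “locally runnable” means in practice: the pipeline steps are defined once, in code, and the same driver runs them on a laptop and in CI. The step names, commands, and tools (ruff, pytest) below are hypothetical stand-ins, not a prescription:

```python
# Sketch: one source of truth for pipeline steps, assuming hypothetical
# lint/test/build commands. Both the CI job and a local "ci" script call
# run_pipeline(), so a change is validated before it is ever pushed.
import subprocess

PIPELINE_STEPS = [
    ("lint", ["ruff", "check", "."]),
    ("test", ["pytest", "-q"]),
    ("build", ["docker", "build", "-t", "app:dev", "."]),
]

def run_pipeline(runner=subprocess.run, steps=PIPELINE_STEPS):
    """Run each step in order and stop at the first failure.

    `runner` is injectable so the same driver works locally, in CI,
    or under test with a fake runner.
    """
    for name, cmd in steps:
        result = runner(cmd)
        if result.returncode != 0:
            return name  # the step that failed
    return None  # all steps passed
```

Because CI executes the same list, “passes locally” and “passes in CI” stop being separate claims.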

Read more: Missing deployment pipeline

Snowflake environments

When the CI environment differs from the developer’s local environment in ways that affect test outcomes, local and CI results diverge. Different OS versions, different dependency caches, different environment variables, different file system behaviors - any of these can cause tests to pass locally and fail in CI.

Standardized, code-defined environments that run identically locally and in CI eliminate the divergence. If the build step runs inside the same container image locally and in CI, the results are the same.
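One way to get that guarantee, sketched below, is to wrap every build step in the same pinned container image. The registry, image name, tag, and mount paths are hypothetical; the point is that CI and the developer’s wrapper script construct the identical `docker run` invocation:

```python
# Sketch: every build step runs inside one pinned image, assuming a
# hypothetical team registry. Pinning the tag (never "latest") is what
# keeps local and CI environments from drifting apart.
BUILD_IMAGE = "registry.example.com/team/build-env:1.4.2"

def containerized(cmd, image=BUILD_IMAGE, workdir="/src"):
    """Return the docker invocation that runs `cmd` inside the shared image."""
    return [
        "docker", "run", "--rm",
        "-v", ".:" + workdir,   # mount the checkout into the container
        "-w", workdir,          # run from the project root
        image,
    ] + cmd

# Same command whether a developer types it or CI schedules it:
local_test = containerized(["pytest", "-q"])
```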

Read more: Snowflake environments

How to narrow it down

  1. Can a developer run every pipeline step locally? If any step requires CI-specific infrastructure, secrets, or platform features, that step cannot be validated before pushing. Start with Missing deployment pipeline.
  2. Do tests produce different results locally versus in CI? If yes, the environments differ in ways that affect test outcomes. Start with Snowflake environments.
  3. How long does a developer wait between push and feedback? If feedback takes more than a few minutes, the incentive is to batch pushes and work on something else while waiting. Start with Missing deployment pipeline.

Ready to fix this? The most common cause is Missing deployment pipeline. Start with its How to Fix It section for week-by-week steps.

9 - Setting Up a Development Environment Takes Days

New team members are unproductive for their first week. The setup guide is 50 steps long and always out of date.

What you are seeing

A new developer spends two days troubleshooting before the system runs locally. The wiki setup page was last updated 18 months ago. Step 7 refers to a tool that has been replaced. Step 12 requires access to a system that needs a separate ticket to provision. Step 19 assumes an operating system version that is three versions behind. Getting unstuck requires finding a teammate who has memorized the real procedure from experience.

The setup problem is not just a new-hire experience. It affects the entire team whenever someone gets a new machine, switches between projects, or tries to set up a second environment for a specific debugging purpose. The environment is fragile because it was assembled by hand and the assembly process was never made reproducible.

The business cost is usually invisible. Two days of new-hire setup is charged to onboarding. A senior engineer spending half a day unblocking a new hire is charged to sprint work. The workarounds developers adopt to avoid setting up new environments are charged to productivity. None of these costs appear on a dashboard that anyone monitors.

Common causes

Snowflake environments

When development environments are not reproducible from code, the assembly process exists only in documentation (which drifts) and in the heads of people who have done it before (who are not always available). Each environment is assembled slightly differently, which means the “how to set up a development environment” question has as many answers as there are developers on the team.

When the environment definition is versioned alongside the code, setup becomes a single command. A new developer who runs that command gets the same working environment as everyone else on the team - no 18-month-old wiki page, no tribal knowledge required, no two-day troubleshooting session. When the code changes in ways that require environment changes, the environment definition is updated at the same time.
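
A minimal sketch of what “versioned alongside the code” can mean, assuming a hypothetical JSON definition checked into the repo and a hypothetical tool-version manager (`mise`) - the bootstrap command just translates the definition into setup steps:

```python
# Sketch: a checked-in environment definition (here inlined as JSON for
# illustration; in the repo it would be a file like dev-env.json, updated
# in the same commit as code changes that need new tools).
import json

ENV_DEFINITION = json.loads("""
{
  "base_image": "team/dev-env:2.1.0",
  "tools": {"node": "20.11", "postgres": "16"},
  "env_vars": {"APP_ENV": "development"}
}
""")

def setup_commands(definition):
    """Translate the definition into the commands a bootstrap script runs."""
    cmds = [["docker", "pull", definition["base_image"]]]
    for tool, version in sorted(definition["tools"].items()):
        # "mise" is a stand-in for whatever tool manager the team uses.
        cmds.append(["mise", "install", f"{tool}@{version}"])
    return cmds
```

A new developer runs one bootstrap command; everything else is derived from the file, so the wiki page has nothing left to be out of date about.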

Read more: Snowflake environments

Knowledge silos

The real setup procedure exists in the heads of specific team members who have run it enough times to know which steps to skip and which to do differently on which operating systems. When those people are unavailable, setup fails. The knowledge gap is only visible when someone needs it.

When environment setup is codified as runnable scripts and containers, the knowledge is distributed to everyone who can read the code. A new developer no longer has to find the one person who remembers which steps to skip - they run the script, and it works.

Read more: Knowledge silos

Tightly coupled monolith

When running any part of the application requires running the full monolith - including all its dependencies, services, and backing infrastructure - local setup is inherently complex. A developer who only needs to work on the notification service must stand up the entire application, all its databases, and all the services the notification service depends on, which is everything.

Decomposed services with stable interfaces can be developed in isolation. A developer working on the notification service stubs the services it calls and focuses on the piece they are changing. Setup is proportional to scope.
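
As a sketch, with hypothetical service and interface names: when the notification service takes its dependency through its constructor, local development can swap in an in-process stub for the real user service, so setup is just the code under change:

```python
# Sketch: developing one service in isolation against a stub, assuming a
# hypothetical stable interface (get_user) between the notification
# service and the user service.
from dataclasses import dataclass

@dataclass
class User:
    id: str
    email: str

class UserServiceStub:
    """Stands in for the real user service during local development."""
    def __init__(self, users):
        self._users = {u.id: u for u in users}

    def get_user(self, user_id):        # same interface as the real client
        return self._users[user_id]

class NotificationService:
    def __init__(self, user_service):   # injected: real client or stub
        self.user_service = user_service

    def recipient_for(self, user_id):
        return self.user_service.get_user(user_id).email

# Local setup: the code being changed plus a stub - no databases,
# no other services, nothing else from the monolith.
svc = NotificationService(UserServiceStub([User("u1", "dev@example.com")]))
```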

Read more: Tightly coupled monolith

How to narrow it down

  1. Can a new team member set up a working development environment without help? If not, the setup process is not self-contained. Start with Snowflake environments.
  2. Does setup require tribal knowledge that is not captured in the documented procedure? If team members need to “fill in the gaps” from memory, that knowledge needs to be externalized. Start with Knowledge silos.
  3. Does running a single service require running the entire application? If so, local development is inherently complex. Start with Tightly coupled monolith.

Ready to fix this? The most common cause is Snowflake environments. Start with its How to Fix It section for week-by-week steps.

10 - Bugs in Familiar Areas Take Disproportionately Long to Fix

Defects that should be straightforward take days to resolve because the people debugging them are learning the domain as they go. Fixes sometimes introduce new bugs in the same area.

What you are seeing

A bug is filed against the billing module. It looks simple from the outside - a calculation is off by a small percentage under certain conditions. The developer assigned to it spends a day reading code before they can even reproduce the problem reliably. The fix takes another day. Two weeks later, a related bug appears: the fix was correct for the case it addressed but violated an assumption elsewhere in the module that nobody told the developer about.

Defect resolution time in specific areas of the system is consistently longer than in others. Post-mortems note that the fix was made by someone unfamiliar with the domain. Bugs cluster in the same modules, with fixes that address the symptom rather than the underlying rule that was violated.

Common causes

Knowledge silos

When only a few people understand a domain deeply, defects in that domain can only be resolved quickly by those people. When they are unavailable - on leave, on another team, or gone - the bug sits or gets assigned to someone who must reconstruct context before they can make progress. The reconstruction is slow, incomplete, and prone to introducing new violations of rules the developer discovers only after the fact.

Read more: Knowledge silos

Thin-spread teams

When engineers are rotated through a domain based on capacity, the person available to fix a bug is often not the person who knows the domain. They are familiar with the tech stack but not with the business rules, edge cases, and historical decisions that make the module behave the way it does. Debugging becomes an exercise in reverse-engineering domain knowledge from code that may not accurately reflect the original intent.

Read more: Thin-spread teams

How to narrow it down

  1. Are defect resolution times consistently longer in specific modules than in others? If certain areas of the system take significantly longer to debug regardless of defect severity, those areas have a knowledge concentration problem. Start with Knowledge silos.
  2. Do fixes in certain areas frequently introduce new bugs in the same area? If corrections create new violations, the developer fixing the bug lacks the domain knowledge to understand the full set of constraints they are working within. Start with Thin-spread teams.

Ready to fix this? The most common cause is Knowledge silos. Start with its How to Fix It section for week-by-week steps.