Getting Started

Agent configuration, learning path, prompting skills, and organizational readiness for agentic continuous delivery.

Start here. These pages cover the configuration, skills, and prerequisites teams need before agents can safely contribute to the delivery pipeline.

1 - Getting Started: Where to Put What

How to structure agent configuration across the project context file, rules, skills, and hooks - mapped to their purpose and time horizon for effective context management.

Each configuration mechanism serves a different purpose. Placing information in the right mechanism controls context cost: it determines what every agent pays on every invocation, and what must be loaded only when needed.

Configuration Mechanisms

| Mechanism | Purpose | When loaded |
|---|---|---|
| Project context file | Project facts every agent always needs | Every session |
| Rules (system prompts) | Per-agent behavior constraints | Every agent invocation |
| Skills | Named session procedures - the specification | On explicit invocation |
| Commands | Named invocations - trigger a skill or a direct action | On user or agent call |
| Hooks | Automated, deterministic actions | On trigger event - no agent involved |

Project Context File

The project context file is a markdown document that every agent reads at the start of every session. Put here anything that every agent always needs to know about the project. The filename differs by tool - Claude Code uses CLAUDE.md, Gemini CLI uses GEMINI.md, OpenAI Codex uses AGENTS.md, and GitHub Copilot uses .github/copilot-instructions.md - but the purpose does not.

Put in the project context file:

  • Language, framework, and toolchain versions
  • Repository structure - key directories and what lives where
  • Architecture decisions that constrain all changes (example: “this service must not make synchronous external calls in the request path”)
  • Non-obvious conventions that agents would otherwise violate (example: “all database access goes through the repository layer; never access the ORM directly from handlers”)
  • Where tests live and naming conventions for test files
  • Non-obvious business rules that govern all changes

Do not put in the project context file:

  • Task instructions - those go in rules or skills
  • File contents - load those dynamically per session
  • Context specific to one agent - that goes in that agent’s rules
  • Anything an agent only needs occasionally - load it when needed, not always

Because the project context file loads on every session, every line is a token cost on every invocation. Keep it to stable facts, not procedures. A bloated project context file is an invisible per-session tax.

# Language and toolchain
Language: Java 21, Spring Boot 3.2

# Repository structure
services/   bounded contexts - one service per domain
shared/     cross-cutting concerns - no domain logic here

# Architecture constraints
- No direct database access from handlers; all access through the repository layer
- All external calls go through a port interface; never instantiate adapters from handlers
- Payment processing is synchronous; fulfillment is always async via the event bus

# Test layout
src/test/unit/         fast, no I/O
src/test/integration/  requires running dependencies
Test class names mirror source class names with a Test suffix
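
To make the per-session cost concrete, the sketch below estimates how many input tokens a project context file adds across a number of sessions. The four-characters-per-token ratio is a rough rule of thumb assumed for illustration, not a real tokenizer, and the function names are hypothetical. The estimate also ignores any discount from server-side caching.

```python
import math

# Assumption: roughly 4 characters per token. Real tokenizers vary by model and content.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate for a block of text (heuristic, not a tokenizer)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def context_file_tax(text: str, sessions: int) -> int:
    """Estimated input tokens the context file adds across `sessions` sessions."""
    return estimate_tokens(text) * sessions


if __name__ == "__main__":
    # Hypothetical file content; in practice, read the real context file from disk.
    sample = "- No direct database access from handlers\n" * 50
    print(context_file_tax(sample, sessions=500))
```

Running the same estimate before and after trimming the file gives a quick sense of whether a given section earns its place.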

Rules (System Prompts)

Rules define how a specific agent behaves. Each agent has its own rules document, injected at the top of that agent’s context on every invocation. Rules are stable across sessions - they define the agent’s operating constraints, not what it is doing right now.

Put in rules:

  • Agent scope: what the agent is responsible for, and explicitly what it is not
  • Output format requirements - especially for agents whose output feeds another agent (use structured JSON at these boundaries)
  • Explicit prohibitions (“do not modify files not in your context”)
  • Early-exit conditions to minimize cost (“if the diff contains no logic changes, return {"decision": "pass"} immediately without analysis”)
  • Verbosity constraints (“return code only; no explanation unless explicitly requested”)

Do not put in rules:

  • Project facts - those go in the project context file
  • Session-specific information - that is loaded dynamically by the orchestrator
  • Multi-step procedures - those go in skills

Rules are placed first in every agent’s context. This placement is a caching decision, not just convention. Stable content at the top of context allows the model’s server to cache the rules prefix and reuse it across calls, which reduces the effective input cost of every invocation. See Tokenomics for how caching interacts with context order.

Rules are plain markdown, injected at session start. The content is the same regardless of tool; where it lives differs.

## Implementation Rules

Implement exactly one BDD scenario per session.
Output: return code changes only. No explanation, no rationale, no alternatives.
Flag a concern as: CONCERN: [one sentence]. The orchestrator decides what to do with it.

Context: modify only files provided in your context.
If you need a file not provided, request it as:
  CONTEXT_NEEDED: [filename] - [one sentence why]
Do not infer or reproduce the contents of files not in your context.

Done when: the acceptance test for this scenario passes and all prior tests still pass.
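
The CONCERN: and CONTEXT_NEEDED: markers in the rules above only help if the orchestrator can pick them out of the agent's output. A minimal sketch of that extraction follows, assuming each marker sits on its own line and that filenames contain no spaces. The class and function names are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Patterns mirror the marker formats given in the example rules.
CONCERN_RE = re.compile(r"^\s*CONCERN:\s*(.+)$")
CONTEXT_NEEDED_RE = re.compile(r"^\s*CONTEXT_NEEDED:\s*(\S+)\s+-\s+(.+)$")


@dataclass
class AgentSignals:
    concerns: list[str] = field(default_factory=list)
    # Each request is a (filename, reason) pair.
    context_requests: list[tuple[str, str]] = field(default_factory=list)


def extract_signals(output: str) -> AgentSignals:
    """Collect CONCERN and CONTEXT_NEEDED lines from agent output."""
    signals = AgentSignals()
    for line in output.splitlines():
        if m := CONTEXT_NEEDED_RE.match(line):
            signals.context_requests.append((m.group(1), m.group(2).strip()))
        elif m := CONCERN_RE.match(line):
            signals.concerns.append(m.group(1).strip())
    return signals
```

The orchestrator can then decide whether to load the requested files and retry, or surface the concerns to a human.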

Skills

A skill is a named session procedure - a markdown document describing a multi-step workflow that an agent invokes by name. The agent reads the skill document, follows its instructions, and returns a result. A skill has no runtime; it is pure specification in text. Claude Code calls these commands and stores them in .claude/commands/; Gemini CLI uses .gemini/skills/; OpenAI Codex supports procedure definitions in AGENTS.md; GitHub Copilot reads procedure markdown from .github/.

Put in skills:

  • Session lifecycle procedures: how to start a session, how to run the pre-commit review gate, how to close a session and write the summary
  • Pipeline-restore procedures for when the pipeline fails mid-session
  • Any multi-step workflow the agent should execute consistently and reproducibly

Do not put in skills:

  • One-time instructions - write those inline
  • Anything that should run automatically without agent involvement - that belongs in a hook
  • Project facts - those go in the project context file
  • Per-agent behavior constraints - those go in rules

Each skill should do one thing. A skill named review-and-commit is doing two things. Split it. When a procedure fails mid-execution, a single-responsibility skill makes it obvious which step failed and where to look.

A normal session runs three skills in sequence: /start-session (assembles context and prepares the implementation agent), /review (invokes the pre-commit review gate), and /end-session (validates all gates, writes the session summary, and commits). Add /fix for pipeline-restore mode. See Coding & Review Setup for the complete definition of each skill.

The skill text is identical across tools. Where the file lives differs:

| Tool | Skill location |
|---|---|
| Claude Code | .claude/commands/start-session.md |
| Gemini CLI | .gemini/skills/start-session.md |
| OpenAI Codex | Named ## Task: section in AGENTS.md |
| GitHub Copilot | .github/start-session.md |

Commands

A command is a named invocation - it is how you or the agent triggers a skill. Skills define what to do; commands are how you call them. In Claude Code, a file named start-session.md in .claude/commands/ creates the /start-session command automatically. In Gemini CLI, skills in .gemini/skills/ are invoked by name in the same way. The command name and the skill document are one-to-one: one file, one command.

Put in commands:

  • Short-form aliases for frequently used skills (example: /review instead of “run the pre-commit review gate”)
  • Direct one-line instructions that do not need a full skill document (“summarize the session”, “list open scenarios”)
  • Agent actions you want to invoke consistently by name without retyping the instruction

Do not put in commands:

  • Multi-step procedures - those belong in a skill document that the command references
  • Anything that should run without being called - that belongs in a hook
  • Project facts or behavior constraints - those go in the project context file or rules

A command that runs a multi-step procedure should invoke the skill document by name, not inline the steps. This keeps the command short and the procedure in one place.

Claude Code:

# .claude/commands/review.md
# Invoked as: /review

Run the pre-commit review gate against all staged changes.
Pass staged diff, current BDD scenario, and feature description to the review orchestrator.
Parse the JSON result directly. If "decision" is "block", return findings to the implementation agent.
Do not commit until /review returns {"decision": "pass"}.

Gemini CLI:

# .gemini/skills/review.md
# Invoked as: /review

Run the pre-commit review gate against all staged changes.
Pass staged diff, current BDD scenario, and feature description to the review orchestrator.
Parse the JSON result directly. If "decision" is "block", return findings to the implementation agent.
Do not commit until /review returns {"decision": "pass"}.

OpenAI Codex:

# Defined as a named task section in AGENTS.md
# Invoked by name in the session prompt

## Task: review

Run the pre-commit review gate against all staged changes.
Pass staged diff, current BDD scenario, and feature description to the review orchestrator.
Parse the JSON result directly. If "decision" is "block", return findings to the implementation agent.
Do not commit until review returns {"decision": "pass"}.

GitHub Copilot:

# .github/review.md
# Referenced by name in the session prompt

Run the pre-commit review gate against all staged changes.
Pass staged diff, current BDD scenario, and feature description to the review orchestrator.
Parse the JSON result directly. If "decision" is "block", return findings to the implementation agent.
Do not commit until review returns {"decision": "pass"}.
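
The review command relies on a machine-readable result. The sketch below shows one way calling code might interpret it, assuming the result is a JSON object. Only the "decision" key appears in the text above; the "findings" key and the function name are hypothetical. Output that cannot be parsed is treated as a block, so a malformed result never permits a commit.

```python
import json


def review_allows_commit(raw_result: str) -> tuple[bool, list]:
    """Return (allowed, findings) for a review orchestrator result."""
    try:
        result = json.loads(raw_result)
    except json.JSONDecodeError:
        # Fail closed: unparseable output must not permit a commit.
        return False, ["review result was not valid JSON"]
    if not isinstance(result, dict):
        return False, ["review result was not a JSON object"]
    if result.get("decision") == "pass":
        return True, []
    # "findings" is an assumed field name; normalize it to a list.
    findings = result.get("findings", [])
    if not isinstance(findings, list):
        findings = [findings]
    return False, findings
```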

Hooks

Hooks are automated actions triggered by events - pre-commit, file-save, post-test. Hooks run deterministic tooling: linters, type checkers, secret scanners, static analysis. No agent decision is involved; the tool either passes or blocks.

Put in hooks:

  • Linting and formatting checks
  • Type checking
  • Secret scanning
  • Static analysis (SAST)
  • Any check that is fast, deterministic, and should block on failure without requiring judgment

Do not put in hooks:

  • Semantic review - that requires an agent; invoke the review orchestrator via a skill
  • Checks that require judgment - agents decide, hooks enforce
  • Steps that depend on session context - hooks operate without session awareness

Hooks run before the review agent. If the linter fails, there is no reason to invoke the review orchestrator. Deterministic checks fail fast; the AI review gate runs only on changes that pass the baseline mechanical checks.

Git pre-commit hooks are independent of the AI tool - they run via git regardless of which model you use. Claude Code and Gemini CLI additionally support tool-use hooks in their settings.json, which trigger shell commands in response to agent events (for example, running linters automatically when the agent stops). OpenAI Codex and GitHub Copilot do not have an equivalent built-in hook system; use git hooks directly with those tools.

Git hooks (all tools):

# .pre-commit-config.yaml - runs on git commit, before AI review
repos:
  - repo: local
    hooks:
      - id: lint
        name: Lint
        entry: npm run lint -- --check
        language: system
        pass_filenames: false

      - id: type-check
        name: Type check
        entry: npm run type-check
        language: system
        pass_filenames: false

      - id: secret-scan
        name: Secret scan
        entry: detect-secrets-hook
        language: system
        # detect-secrets-hook scans the file paths it receives, so staged filenames must be passed
        pass_filenames: true

      - id: sast
        name: Static analysis
        # --error makes semgrep exit non-zero on findings so the hook blocks
        entry: semgrep --config auto --error
        language: system
        pass_filenames: false

Claude Code (settings.json):

{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "npm run lint -- --check && npm run type-check"
          }
        ]
      }
    ]
  }
}

Gemini CLI (settings.json):

{
  "hooks": {
    "afterResponse": [
      {
        "command": "npm run lint -- --check && npm run type-check"
      }
    ]
  }
}

OpenAI Codex and GitHub Copilot:

No built-in tool-use hook system. Use git hooks (.pre-commit-config.yaml)
alongside these tools - see the "Git hooks (all tools)" example above.

The AI review step (/review) runs after these pass. It is invoked by the agent as part of the session workflow, not by the hook sequence directly.
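
The ordering can be sketched as a short-circuiting sequence: deterministic checks run first, and the review gate is reached only if all of them pass. This is an illustration of the ordering only. In practice git runs the hooks and the agent invokes /review, so no single function owns both. The names below are hypothetical.

```python
from typing import Callable, Iterable, Tuple

# A check is a (name, callable) pair; the callable returns True on success.
Check = Tuple[str, Callable[[], bool]]


def run_gates(checks: Iterable[Check], review: Callable[[], bool]) -> str:
    """Run deterministic checks first; call the review gate only if all pass."""
    for name, check in checks:
        if not check():
            # Fail fast: the review gate is never invoked for this change.
            return f"blocked by hook: {name}"
    return "pass" if review() else "blocked by review"
```

Because the review callable is never reached when a hook fails, no review tokens are spent on changes that a linter would have rejected anyway.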


Decision Framework

For any piece of information or procedure, apply this sequence:

  1. Does every agent always need this? - Project context file
  2. Does this constrain how one specific agent behaves? - That agent’s rules
  3. Is this a multi-step procedure invoked by name? - A skill
  4. Is this a short invocation that triggers a skill or a direct action? - A command
  5. Should this run automatically without any agent decision? - A hook
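
The sequence above is an ordered set of questions where the first "yes" wins. A minimal sketch of that logic, with hypothetical field and function names:

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Answers to the five questions for one piece of information or procedure."""
    needed_by_every_agent: bool = False
    constrains_one_agent: bool = False
    multi_step_procedure: bool = False
    short_invocation: bool = False
    runs_without_agent_decision: bool = False


def choose_mechanism(item: Item) -> str:
    """Apply the questions in order and return the first matching mechanism."""
    if item.needed_by_every_agent:
        return "project context file"
    if item.constrains_one_agent:
        return "rules"
    if item.multi_step_procedure:
        return "skill"
    if item.short_invocation:
        return "command"
    if item.runs_without_agent_decision:
        return "hook"
    # Nothing matched: the item may not belong in agent configuration at all.
    return "unclassified"
```

For example, `choose_mechanism(Item(multi_step_procedure=True))` returns `"skill"`.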

Context Loading Order

Within each agent invocation, load context in this order:

  1. Agent rules (stable - cached across every invocation)
  2. Project context file (stable - cached across every invocation)
  3. Feature description (stable within a feature - often cached)
  4. BDD scenario for this session (changes per session)
  5. Relevant existing files (changes per session)
  6. Prior session summary (changes per session)
  7. Staged diff or current task context (changes per invocation)

Stable content at the top. Volatile content at the bottom. Rules and the project context file belong at the top because they are constant across invocations and benefit from server-side caching. Staged diffs and current files change on every call and provide no caching benefit regardless of where they appear.
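
The effect of ordering can be illustrated by assembling two invocations and measuring how much of the start they share. The sketch below compares characters, which is a simplification: provider caches work on token prefixes and have their own rules and minimum sizes. The key names in LOAD_ORDER are hypothetical labels for the seven items listed above.

```python
# Stable-first load order, matching the numbered list above.
LOAD_ORDER = [
    "agent_rules",
    "project_context",
    "feature_description",
    "bdd_scenario",
    "relevant_files",
    "prior_session_summary",
    "current_task",
]


def assemble_context(parts: dict[str, str]) -> str:
    """Join the provided parts in stable-first order, skipping absent ones."""
    return "\n\n".join(parts[key] for key in LOAD_ORDER if key in parts)


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters two contexts have in common."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count
```

Two invocations that differ only in `current_task` share everything up to that final part. If the volatile part were placed first, the shared prefix would shrink to almost nothing.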


File Layout

The examples below show how the configuration mechanisms map to Claude Code, Gemini CLI, OpenAI Codex CLI, and GitHub Copilot. The file names and locations differ; the purpose of each mechanism does not.

Claude Code:

.claude/
  agents/
    orchestrator.md     # sub-agent definition: system prompt + model for the orchestrator
    implementation.md   # sub-agent definition: system prompt + model for code generation
    review.md           # sub-agent definition: system prompt + model for review coordination
  commands/
    start-session.md    # skill + command: /start-session - session initialization
    review.md           # skill + command: /review - pre-commit gate
    end-session.md      # skill + command: /end-session - writes summary and commits
    fix.md              # skill + command: /fix - pipeline-restore mode
  settings.json         # hooks - tool-use event triggers (Stop, PreToolUse, etc.)
CLAUDE.md               # project context file - facts for all agents

Gemini CLI:

.gemini/
  skills/
    start-session.md   # skill document - invoked as /start-session
    review.md          # skill document - invoked as /review
    end-session.md     # skill document - invoked as /end-session
    fix.md             # skill document - invoked as /fix
  settings.json        # hooks - afterResponse and other event triggers
GEMINI.md              # project context file - facts for all agents
                       # agent configurations injected programmatically at session start

OpenAI Codex:

AGENTS.md   # project context file and named task definitions
            # skills and commands defined as ## Task: name sections
            # agent configurations injected programmatically at session start
            # git hooks handle pre-commit checks (.pre-commit-config.yaml)

GitHub Copilot:

.github/
  copilot-instructions.md   # project context file - facts for all agents
  start-session.md           # skill document - referenced by name in the session
  review.md                  # skill document - referenced by name in the session
  end-session.md             # skill document - referenced by name in the session
  fix.md                     # skill document - referenced by name in the session
                             # agent configurations injected via VS Code extension settings
                             # git hooks handle pre-commit checks (.pre-commit-config.yaml)

The skill and command documents are plain markdown in all cases - the same procedure text works across tools because skills are specifications, not code. In Claude Code, the commands directory unifies both: each file in .claude/commands/ is a skill document and creates a slash command of the same name. The .claude/agents/ directory is specific to Claude Code - it defines named sub-agents with their own system prompt and model tier, invocable by the orchestrator. Other tools handle agent configuration programmatically rather than via files. For multi-agent architectures and advanced agent composition, see Agentic Architecture Patterns.


Decomposed Context by Code Area

A single project context file at the repo root works for small codebases. For larger ones with distinct bounded contexts, split the project context file by code area. Claude Code, Gemini CLI, and OpenAI Codex load context files hierarchically: when an agent works in a subdirectory, it reads the context file there in addition to the root-level file. Area-specific facts stay out of the root file and load only when relevant, which reduces per-session token cost for agents working in unrelated areas.

Claude Code:

CLAUDE.md       # repo-wide: language, toolchain, top-level architecture
src/
  payments/
    CLAUDE.md   # payments context: domain rules, payment processor contracts
  inventory/
    CLAUDE.md   # inventory context: stock rules, warehouse integrations
  api/
    CLAUDE.md   # API layer: auth patterns, rate limiting conventions

Gemini CLI:

GEMINI.md       # repo-wide: language, toolchain, top-level architecture
src/
  payments/
    GEMINI.md   # payments context: domain rules, payment processor contracts
  inventory/
    GEMINI.md   # inventory context: stock rules, warehouse integrations
  api/
    GEMINI.md   # API layer: auth patterns, rate limiting conventions

OpenAI Codex:

AGENTS.md       # repo-wide: language, toolchain, top-level architecture
src/
  payments/
    AGENTS.md   # payments context: domain rules, payment processor contracts
  inventory/
    AGENTS.md   # inventory context: stock rules, warehouse integrations
  api/
    AGENTS.md   # API layer: auth patterns, rate limiting conventions

GitHub Copilot:

# GitHub Copilot uses a single .github/copilot-instructions.md
# Decompose by area using sections within that file

.github/
  copilot-instructions.md   # repo-wide facts at the top; area sections below

# Inside copilot-instructions.md:
#
# ## Payments
# Domain rules and payment processor contracts
#
# ## Inventory
# Stock rules and warehouse integrations
#
# ## API layer
# Auth patterns and rate limiting conventions

What goes in area-specific files: Facts that apply only to that area - domain rules, local naming conventions, area-specific architecture constraints, and non-obvious business rules that govern changes in that part of the codebase. Do not repeat content already in the root file.
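
The hierarchical idea can be sketched as a walk from the repo root down to the directory the agent is working in, collecting any context file found along the way. This illustrates the root-to-leaf principle only. Each tool has its own discovery rules, and the function name and default filename here are assumptions.

```python
from pathlib import Path


def context_files_for(
    repo_root: Path, working_dir: Path, filename: str = "CLAUDE.md"
) -> list[Path]:
    """Context files from the repo root down to working_dir, root first."""
    repo_root = repo_root.resolve()
    working_dir = working_dir.resolve()
    # Raises ValueError if working_dir is not inside repo_root.
    relative = working_dir.relative_to(repo_root)

    directories = [repo_root]
    current = repo_root
    for part in relative.parts:
        current = current / part
        directories.append(current)

    # Keep only directories that actually contain a context file.
    return [d / filename for d in directories if (d / filename).is_file()]
```

With the layout above, an agent working in src/payments would pick up the root file and the payments file, but not the inventory or API files.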


2 - The Agentic Development Learning Curve

The stages developers normally experience as they learn to work with AI - why many stay stuck at Stage 1 or 2, and what information is needed to progress.

Many developers using AI coding tools today are at Stage 1 or Stage 2. Many conclude from that experience that AI is only useful for boilerplate, or that it cannot handle real work. That conclusion is not wrong given their experience - it is wrong about the ceiling. The ceiling they hit is the ceiling of that stage, not of AI-assisted development. Every stage above has a higher ceiling, but the path up is not obvious without exposure to better practices.

The progression below describes the stages developers generally experience when learning AI-assisted development. At each stage, a specific bottleneck limits how much value AI actually delivers. Solving that constraint opens the next stage. Ignoring it means productivity gains plateau - or reverse - and developers conclude AI is not worth the effort.

Progress through these stages does not happen naturally or automatically. It requires intentional practice changes and, most importantly, exposure to what the next stage looks like. Many developers never see Stages 4 through 6 demonstrated. They optimize within the stage they are at and assume that is the limit of the technology.

Stage 1: Autocomplete

Stage 1 workflow: Developer types code, AI inline suggestion appears, developer accepts or rejects, code committed. Bottleneck: model infers intent from surrounding code, not from what you mean.

What it looks like: AI suggests the next line or block of code as you type. You accept, reject, or modify the suggestion and keep typing. GitHub Copilot tab completion, Cursor tab, and similar tools operate in this mode.

Where it breaks down: Suggestions are generated from context the model infers, not from what you intend. For non-trivial logic, suggestions are plausible-looking but wrong - they compile, pass surface review, and fail at runtime or in edge cases. Teams that stop reviewing suggestions carefully discover this months later when debugging code they do not remember writing.

What works: Low friction, no context management, passive. Excellent for boilerplate, repetitive patterns, argument completion, and common idioms. Speed gains are real, especially for code that follows well-known patterns.

Why developers stay here: The gains at Stage 1 are real and visible. Autocomplete is faster than typing, requires no workflow change, and integrates invisibly into existing habits. There is no obvious failure that signals a ceiling has been hit - developers just accept that AI is useful for simple things and not for complex ones. Without seeing what Stage 4 or Stage 5 looks like, there is no reason to assume a better approach exists.

What drives the move forward: Deliberate curiosity, or an incident traced to an accepted suggestion the developer did not scrutinize. Developers who move forward are usually ones who encountered a demonstration of a higher stage and wanted to replicate it - not ones who naturally outgrew autocomplete.

Stage 2: Prompted Function Generation

Stage 2 workflow: Developer describes task, LLM generates function, developer manually integrates output into codebase. Bottleneck: scope ceiling and manual integration errors.

What it looks like: The developer describes what a function or module should do, pastes the description into a chat interface, and integrates the result. This is single-turn: one request, one response, manual integration.

Where it breaks down: Scope creep. As requests grow beyond a single function, integration errors accumulate: the generated code does not match the surrounding codebase’s patterns, imports are wrong, naming conflicts emerge. The developer rewrites more than half the output, and the AI saves little time. Larger requests also produce confidently incorrect code - in a single-turn exchange the model cannot ask clarifying questions, so it fills the gaps with assumptions.

What works: Bounded, well-scoped tasks with clear inputs and outputs. Writing a parser, formatting utility, or data transformation that can be fully described in a few sentences. The developer reviews a self-contained unit of work.

Why developers abandon here: Stage 2 is where many developers decide AI “cannot write real code.” They try a larger task, receive confidently wrong output, spend an hour correcting it, and conclude the tool is not worth the effort for anything non-trivial. That conclusion is accurate at Stage 2. The problem is not the technology - it is the workflow. A single-turn prompt with no context, no surrounding code, and no specified constraints will produce plausible-looking guesses for anything beyond simple functions. Developers who abandon here never discover that the same model, given different inputs through a different workflow, produces dramatically better output.

What drives the move forward: Frustration that AI is only useful for small tasks, combined with exposure to someone using it for larger ones. The realization that giving the AI more context - the surrounding files, the calling code, the data structures - would produce better output. This realization is the entry point to context engineering.

Stage 3: Chat-Driven Development

Stage 3 workflow: Developer and LLM exchange prompts and responses across many turns, context fills up, developer manually pastes output into editor. Bottleneck: context degradation and manual integration.

What it looks like: Multi-turn back-and-forth with the model. Developer pastes relevant code, describes the problem, asks for changes, reviews output, pastes it back with follow-up questions. The conversation itself becomes the working context.

Where it breaks down: Context accumulates. Long conversations degrade model performance as the relevant information gets buried. The model loses track of constraints stated early in the conversation. Developers start seeing contradictions between what the model said in turn 3 and what it generates in turn 15. Integration is still manual - copying from chat into the editor introduces transcription errors. The history of what changed and why lives in a chat window, not in version control.

What works: Exploration and learning. Asking “why does this fail” with a stack trace and getting a diagnosis. Iterating on a design by discussing trade-offs. For developers learning a new framework or language, this stage can be transformative.

What drives the move forward: The integration overhead and context degradation become obvious. Developers want the AI to work directly in the codebase, not through a chat buffer.

Stage 4: Agentic Task Completion

Stage 4 workflow: Developer gives vague task to agent, agent reads and edits multiple files, produces a large diff, developer manually reviews before merging. Bottleneck: vague requirements cause drift; reviewer must reconstruct intent.

What it looks like: The agent has tool access - it reads files, edits files, runs commands, and works across the codebase autonomously. The developer describes a task and the agent executes it, producing diffs across multiple files.

Where it breaks down: Vague requirements. An agent given a fuzzy description makes reasonable-but-wrong architectural decisions, names things inconsistently, misses edge cases it cannot infer from the existing code, and produces changes that look correct locally but break something upstream. Review becomes hard because the diff spans many files and the reviewer must reconstruct the intent from the code rather than from a stated specification. Hallucinated APIs, missing error handling, and subtle correctness errors accumulate because each small decision builds on the one before it.

What works: Larger-scoped tasks with clear intent. Refactoring a module to match a new interface, generating tests for existing code, migrating a dependency. The agent navigates the codebase rather than receiving pasted excerpts.

What drives the move forward: Review burden. The developer spends more time validating the agent’s output than they would have spent writing the code. The insight that emerges: the agent needs the same thing a new team member needs - explicit requirements, not vague descriptions.

Stage 5: Spec-First Agentic Development

Stage 5 workflow: Human writes spec, agent generates tests, agent generates implementation, pipeline enforces correctness. All output still routes to human review. Bottleneck: human review throughput cannot keep pace with generation rate.

What it looks like: The developer writes a specification before the agent writes any code. The specification includes intent (why), behavior scenarios (what users experience), and constraints (performance budgets, architectural boundaries, edge case handling). The agent generates test code from the specification first. Tests pass when the behavior is correct. Implementation follows. The Agent Delivery Contract defines the artifact structure. Agent-Assisted Specification describes how to produce specifications at a pace that does not bottleneck the development cycle.
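
The three parts of a specification named above can be sketched as a small data structure with a completeness check that runs before any agent starts work. This is a hypothetical shape for illustration, not the artifact structure defined by the Agent Delivery Contract.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Hypothetical specification: intent, behavior scenarios, constraints."""
    intent: str = ""
    scenarios: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)


def missing_sections(spec: Spec) -> list[str]:
    """Names of required sections that are empty; an empty list means ready."""
    missing = []
    if not spec.intent.strip():
        missing.append("intent")
    if not spec.scenarios:
        missing.append("scenarios")
    if not spec.constraints:
        missing.append("constraints")
    return missing
```

A check like this is deterministic, so it can run as a gate before the agent is invoked rather than relying on the agent to notice a gap.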

Where it breaks down: Review volume. A fast agent with a spec-first workflow generates changes faster than a human reviewer can validate them. The bottleneck shifts from code generation quality to human review throughput. The developer is now a reviewer of machine output, which is not where they deliver the most value.

What works: Outcomes become predictable. The agent has bounded, unambiguous requirements. Tests make failures deterministic rather than subjective. Code review focuses on whether the implementation is reasonable, not on reconstructing what the developer meant. The specification becomes the record of why a change exists.

What drives the move forward: The review queue. Agents generate changes at a pace that exceeds human review bandwidth. The next stage is not about the developer working harder - it is about replacing the human at the review stages that do not require human judgment.

Stage 6: Multi-Agent Architecture

Stage 6 workflow: Human defines spec, orchestrator routes work to coding agent, parallel reviewer agents validate test fidelity, architecture, and intent, pipeline enforces gates, human reviews only flagged exceptions.

What it looks like: Separate specialized agents handle distinct stages of the workflow. A coding agent implements behavior from specifications. Reviewer agents run in parallel to validate test fidelity, architectural conformance, and intent alignment. An orchestrator routes work and manages context boundaries. Humans define specifications and review what agents flag - they do not review every generated line.
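The parallel review step can be sketched with `asyncio`. This is an illustration under stated assumptions, not the ACD implementation: the `review` function is a stand-in for a real reviewer agent call, and the reviewer names, artifact strings, and the "TODO" check are hypothetical.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Finding:
    reviewer: str
    passed: bool
    note: str = ""


async def review(reviewer: str, artifact: str) -> Finding:
    """Stand-in for a reviewer agent call; a real one would invoke a model."""
    await asyncio.sleep(0)  # placeholder for network latency
    passed = "TODO" not in artifact  # hypothetical check, for illustration only
    return Finding(reviewer, passed, "" if passed else "unfinished work found")


async def run_reviews(artifacts: dict[str, str]) -> list[Finding]:
    """Run every reviewer concurrently, each on only its own artifact."""
    tasks = [review(name, artifact) for name, artifact in artifacts.items()]
    return await asyncio.gather(*tasks)


def needs_human(findings: list[Finding]) -> list[Finding]:
    """Humans see only what the reviewers flagged."""
    return [f for f in findings if not f.passed]


findings = asyncio.run(run_reviews({
    "test_fidelity": "tests cover all three scenarios",
    "architecture": "handler calls repository layer",
    "intent_alignment": "TODO: confirm rate limit",
}))
flagged = needs_human(findings)
```

Each reviewer receives only its own artifact, which is what keeps per-reviewer context small.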

What works: The throughput constraint from Stage 5 is resolved. Expert review agents run at pipeline speed, not human reading speed. Each agent is optimized for its task - the reviewer agents receive only the artifacts relevant to their review, keeping context small and costs bounded. Token costs are an architectural concern, not a billing surprise.

What the architecture requires:

  • Explicit, machine-readable specifications that agent reviewers can validate against
  • Structured inter-agent communication (not prose) so outputs transfer efficiently
  • Model routing by task: smaller models for classification and routing, frontier models for complex reasoning
  • Per-workflow token cost measurement, not per-call measurement
  • A pipeline that can run multiple agents in parallel and collect results before promotion
  • Human ownership of specifications - the stages that require judgment about what matters to the business
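Two of the requirements above, model routing by task and per-workflow cost measurement, can be sketched together. The tier names, task kinds, and prices below are hypothetical placeholders, not real model names or rates.

```python
from collections import defaultdict

# Hypothetical tiers and prices (USD per 1,000 tokens); substitute real values.
MODEL_FOR_TASK = {
    "classify": "small",
    "route": "small",
    "implement": "frontier",
    "review": "frontier",
}
PRICE_PER_1K = {"small": 0.001, "frontier": 0.015}


def pick_model(task_kind: str) -> str:
    """Route by task; unknown task kinds default to the frontier tier."""
    return MODEL_FOR_TASK.get(task_kind, "frontier")


def cost_per_workflow(calls: list[tuple[str, str, int]]) -> dict[str, float]:
    """Sum cost over every call in a workflow: (workflow_id, task_kind, tokens)."""
    totals: dict[str, float] = defaultdict(float)
    for workflow_id, task_kind, tokens in calls:
        totals[workflow_id] += tokens / 1000 * PRICE_PER_1K[pick_model(task_kind)]
    return dict(totals)


calls = [
    ("feature-42", "classify", 2_000),
    ("feature-42", "implement", 10_000),
    ("feature-42", "review", 4_000),
]
totals = cost_per_workflow(calls)
```

Grouping by workflow rather than by call shows what one delivered change costs end to end, which is the figure to compare against the value of that change.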

This is the ACD destination. The ACD workflow defines the complete sequence. The agent delivery contract artifacts are the structured documents the workflow runs on. Tokenomics covers how to architect agents to keep costs in proportion to value. Coding & Review Setup shows a recommended orchestrator, coder, and reviewer configuration.

Why Progress Stalls

Many developers do not advance past Stage 2 because the path forward is not visible from within Stage 1 or 2. The information gap is the dominant constraint, not motivation or skill.

The problem at Stage 1: Autocomplete delivers real, immediate value. There is no pressing failure, no visible ceiling, no obvious reason to change the workflow. Developers optimize their Stage 1 usage - learning which suggestions to trust, which to skip - and reach a stable equilibrium. That equilibrium is far below what is possible.

The problem at Stage 2: The first serious failure at Stage 2 - an hour spent correcting hallucinated output - produces a lasting conclusion: AI is only for simple things. This conclusion comes from a single data point that is entirely valid for that workflow. The developer does not know the problem is the workflow.

The problem at Stages 3-4: Developers who push past Stage 2 often hit Stage 3 or 4 and run into context degradation or vague-requirements drift. Without spec-first discipline, agentic task completion produces hard-to-review diffs and subtle correctness errors. The failure mode looks like “AI makes more work than it saves” - which is true for that approach. Many developers loop back to Stage 2 and conclude they are not missing much.

What breaks the pattern: Seeing a demonstration of Stage 5 or Stage 6 in practice. Watching someone write a specification, have an agent generate tests from it, implement against those tests, and commit a clean diff is a qualitatively different experience from struggling with a chat window. Many developers have not seen this. Most resources on “how to use AI for coding” describe Stage 2 or Stage 3 workflows.

This guide exists to close that gap. The four prompting disciplines describe the skill layers that correspond to these stages and what shifts when agents run autonomously.

How the Bottleneck Shifts Across Stages

| Stage | Where value is generated | What limits it |
| --- | --- | --- |
| Autocomplete | Boilerplate speed | Model cannot infer intent for complex logic |
| Function generation | Self-contained tasks | Manual integration; scope ceiling |
| Chat-driven development | Exploration, diagnosis | Context degradation; manual integration |
| Agentic task completion | Multi-file execution | Vague requirements cause drift; review is hard |
| Spec-first agentic | Predictable, testable output | Human review cannot keep up with generation rate |
| Multi-agent architecture | Full pipeline throughput | Specification quality; agent orchestration design |

Each stage resolves the previous stage’s bottleneck and reveals the next one. Developers who skip stages - for example, moving straight from function generation to multi-agent architecture without spec-first discipline - find that automation amplifies the problems they skipped. An agent generating changes faster than specs can be written, or a reviewer agent validating against specifications that were never written, produces worse outcomes than a slower, more manual process. Skipping is tempting because the later tooling looks impressive. It does not work without the earlier discipline.

Starting from Where You Are

Three questions locate you on the curve:

  1. What does agent output require before it can be committed? Minimal cleanup (Stage 1-2), significant rework (Stage 3-4), or the pipeline decides (Stage 5-6)?
  2. Does every agent task start from a written specification? If not, you are at Stage 4 or below regardless of what tools you use.
  3. Who reviews agent-generated changes? If the answer is always a human reading every diff, you have not yet addressed the Stage 5 throughput ceiling.

Many developers using AI coding tools are at Stage 1 or 2. Many concluded from an early Stage 2 failure that the ceiling is low and moved on. If you are at Stage 1 or 2 and feel like AI is only useful for simple work, the problem is almost certainly the workflow, not the technology.

If you are at Stage 1 or 2: The highest-leverage move is hands-on exposure to an agentic tool at Stage 4. Give the agent access to your codebase - let it read files, run tests, and produce a diff for a small task. The experience of watching an agent navigate a codebase is qualitatively different from receiving function output in a chat window. See Small-Batch Sessions for how to structure small, low-risk tasks that demonstrate what is possible without exposing the full codebase to an unguided agent.

If you are at Stage 3 or 4: The highest-leverage move is writing a specification before giving any task to an agent. One paragraph describing intent, one scenario describing the expected behavior, and one constraint listing what must not change. Even an informal spec at this level produces dramatically better output and easier review than a vague task description.

If you are at Stage 5: Measure your review queue. If agent-generated changes accumulate faster than they are reviewed, you have hit the throughput ceiling. Expert reviewer agents are the next step.
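Measuring the review queue comes down to comparing two rates. The sketch below assumes you can count changes generated and changes reviewed over the same window; the function names and sample numbers are hypothetical.

```python
def queue_growth_per_day(changes_generated: int, changes_reviewed: int, days: int) -> float:
    """Average net change in the review queue per day over the window."""
    if days <= 0:
        raise ValueError("days must be positive")
    return (changes_generated - changes_reviewed) / days


def at_throughput_ceiling(changes_generated: int, changes_reviewed: int, days: int) -> bool:
    """True when changes arrive faster than they are reviewed."""
    return queue_growth_per_day(changes_generated, changes_reviewed, days) > 0


# Invented sample: 60 changes generated and 45 reviewed over 5 working days.
growth = queue_growth_per_day(changes_generated=60, changes_reviewed=45, days=5)
```

A positive value sustained over several windows indicates the queue will keep growing without a change in how review is done.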

The AI Adoption Roadmap covers the organizational prerequisites that must be in place before accelerating through the later stages. The curve above describes an individual developer’s progression; the roadmap describes what the team and pipeline need to support it.


Content contributed by Bryan Finster

3 - The Four Prompting Disciplines

Four layers of skill that developers must master as AI moves from a chat partner to a long-running worker - and what changes when agents run autonomously.

Most guidance on “prompting” describes Discipline 1: writing clear instructions in a chat window. That is table stakes. Developers working at Stage 5 or 6 of the agentic learning curve operate across all four disciplines simultaneously. Each discipline builds on the one below it.

1. Prompt Craft (The Foundation)

Synchronous, session-based instructions used in a chat window.

Prompt craft is now considered table stakes, the equivalent of fluent typing. It does not differentiate. Every developer using AI tools will reach baseline proficiency here. The skill is necessary but insufficient for agentic workflows.

Key skills:

  • Writing clear, structured instructions
  • Including examples and counter-examples
  • Setting explicit output formats and guardrails
  • Defining how to resolve ambiguity so the model does not guess

Where it maps on the learning curve: Stages 1-2. Developers at these stages optimize prompt craft and assume that is the ceiling. It is not.

2. Context Engineering

Curating the entire information environment (the tokens) the agent operates within.

Context engineering is the difference between a developer who writes better prompts and a developer who builds better scaffolding so the agent starts with everything it needs. The 10x performers are not writing cleverer instructions. They are assembling better context.

Key skills:

Where it maps on the learning curve: Stage 3-4. The transition from chat-driven development to agentic task completion is driven by context engineering. The agent that navigates the codebase with the right context outperforms the agent that receives pasted excerpts in a chat window.

Where it shows up in ACD: The orchestrator assembles context for each session (Coding & Review Setup). The /start-session skill encodes context assembly order. Prompt caching depends on placing stable context before dynamic content (Tokenomics).
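The ordering rule for prompt caching (stable context first, dynamic content last) can be illustrated with a small assembly function. This is a sketch only: `ContextBlock`, `assemble_context`, and the example text are hypothetical, and the actual assembly order is defined by the /start-session skill.

```python
from dataclasses import dataclass


@dataclass
class ContextBlock:
    name: str
    text: str
    stable: bool  # True if the text is identical across sessions


def assemble_context(blocks: list[ContextBlock]) -> str:
    """Place stable blocks first so the shared prefix stays identical between sessions."""
    # sorted() is a stable sort, so blocks keep their relative order within each group.
    ordered = sorted(blocks, key=lambda b: not b.stable)
    return "\n\n".join(b.text for b in ordered)


prompt = assemble_context([
    ContextBlock("task", "Task: add retry to the payment client.", stable=False),
    ContextBlock("project", "Project: repository-layer data access only.", stable=True),
    ContextBlock("rules", "Rules: never call the ORM from handlers.", stable=True),
])
```

Because the task text changes every session, placing it last keeps everything before it byte-for-byte identical across sessions.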

3. Intent Engineering

Encoding organizational purpose, values, and trade-off hierarchies into the agent’s operating environment.

Intent engineering tells the agent what to want, not just what to know. An agent given context but no intent will make technically defensible decisions that miss the point. Intent engineering defines the decision boundaries the agent operates within.

Key skills:

  • Telling the agent what to optimize for, not just what to build
  • Defining decision boundaries (for example: “Optimize for customer satisfaction over resolution speed”)
  • Establishing escalation triggers: conditions under which the agent must stop and ask a human instead of deciding autonomously
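Escalation triggers are easiest to reason about when they are written down as explicit conditions. The sketch below assumes two invented triggers; the `Decision` fields, trigger names, and the threshold of 20 files are hypothetical examples, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    description: str
    touches_public_api: bool = False
    estimated_files_changed: int = 0


# Hypothetical triggers: each returns True when a human must decide.
ESCALATION_TRIGGERS = {
    "public API change": lambda d: d.touches_public_api,
    "large blast radius": lambda d: d.estimated_files_changed > 20,
}


def escalations(decision: Decision) -> list[str]:
    """Names of every trigger the decision trips; an empty list means proceed autonomously."""
    return [name for name, trips in ESCALATION_TRIGGERS.items() if trips(decision)]


tripped = escalations(
    Decision("rename response field", touches_public_api=True, estimated_files_changed=3)
)
```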

Where it maps on the learning curve: The transition from Stage 4 to Stage 5. At Stage 4, vague requirements cause drift because the agent fills in intent from its own assumptions. Intent engineering makes those assumptions explicit.

Where it shows up in ACD: The Intent Description artifact is the formalized version of intent engineering. It sits at the top of the artifact authority hierarchy because intent governs every downstream decision.

4. Specification Engineering (The New Ceiling)

Writing structured documents that agents can execute against over extended timelines.

Specification engineering is the skill that separates Stage 5-6 developers from everyone else. When agents run autonomously for hours, you cannot course-correct in real time. The specification must be complete enough that an independent executor can reach the right outcome without asking questions.

Key skills:

  • Self-contained problem statements: Can the task be solved without the agent fetching additional information?
  • Acceptance criteria: Writing three sentences that an independent observer could use to verify “done”
  • Decomposition: Breaking a multi-day project into small subtasks with clear boundaries (see Work Decomposition)
  • Evaluation design: Creating test cases with known-good outputs to catch model regressions
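Evaluation design can be as simple as a table of inputs with known-good outputs that any candidate implementation is run against. The sketch below uses an invented slug-generation task; `KNOWN_GOOD`, `slugify`, and `regressions` are hypothetical names.

```python
from typing import Callable

# Hypothetical known-good cases for a slug-generating task: (input, expected output).
KNOWN_GOOD = [
    ("Hello World", "hello-world"),
    ("  Trim me  ", "trim-me"),
    ("Already-slugged", "already-slugged"),
]


def slugify(title: str) -> str:
    """Reference behavior the cases describe: lowercase, hyphen-separated words."""
    return "-".join(title.lower().split())


def regressions(candidate: Callable[[str], str]) -> list[tuple[str, str, str]]:
    """Return (input, expected, actual) for every case the candidate gets wrong."""
    results = [(given, expected, candidate(given)) for given, expected in KNOWN_GOOD]
    return [r for r in results if r[1] != r[2]]


failures = regressions(slugify)
```

Running the same cases after a model or prompt change turns "the output seems worse" into a concrete list of failing inputs.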

Where it maps on the learning curve: Stage 5-6. Specification engineering is what makes spec-first agentic development and multi-agent architecture possible.

Where it shows up in ACD: The agent delivery contract artifacts are the output of specification engineering. The agent-assisted specification workflow is how agents help produce them. The discovery loop shows how to get from a vague idea to a structured specification through conversation, and the complete specification example shows what the finished output looks like.

From Synchronous to Autonomous

Because you cannot course-correct an agent running for hours in real time, you must front-load your oversight. The skill shift looks like this:

| Synchronous skills (Stages 1-3) | Autonomous skills (Stages 5-6) |
| --- | --- |
| Catching mistakes in real time | Encoding guardrails before the session starts |
| Providing context when asked | Self-contained problem statements |
| Verbal fluency and quick iteration | Completeness of thinking and edge-case anticipation |
| Fixing it in the next chat turn | Structured specifications with acceptance criteria |

This is not a different toolset. It is the same work, front-loaded. Every minute spent on specification saves multiples in review and rework.

The Self-Containment Test

To practice the shift, take a request like “Update the dashboard” and rewrite it as if the recipient:

  1. Has never seen your dashboard
  2. Does not know your company’s internal acronyms
  3. Has zero access to information outside that specific text

If the rewritten request still makes sense and can be acted on, it is ready for an autonomous agent. If it cannot, the missing information is the gap between your current prompt and a specification. This is the same test agent-assisted specification applies: can the agent implement this without asking a clarifying question?

The Planner-Worker Architecture

Modern agents use a planner model to decompose your specification into a task log, and worker models to execute each task. Your job is to provide the decomposition logic - the rules for how to split work - so the planner can function reliably. This is the orchestrator pattern at its core: the orchestrator routes work to specialized agents, but it can only route well when the specification is structured enough to decompose.
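A toy version of the planner-worker split shows where the decomposition logic lives. Everything here is a stand-in: the rule "one task per behavior scenario" is a hypothetical decomposition rule, and `plan` and `work` would call models in a real system.

```python
from dataclasses import dataclass


@dataclass
class Task:
    scenario: str
    status: str = "pending"


def plan(scenarios: list[str]) -> list[Task]:
    """Hypothetical decomposition rule: one task per behavior scenario."""
    return [Task(scenario) for scenario in scenarios]


def work(task: Task) -> Task:
    """Stand-in worker; a real one would call a model and run tests."""
    task.status = "done"
    return task


task_log = [work(task) for task in plan([
    "user requests reset link",
    "expired link is rejected",
])]
```

The planner can only split work this cleanly when the specification already lists its scenarios separately, which is the structural requirement described above.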

Organizational Impact

Practicing specification engineering has effects beyond agent workflows:

  • Tighter communication. Writing self-contained specifications forces you to surface hidden assumptions and unstated disagreements. Memos get clearer. Decision frameworks get sharper.
  • Reduced alignment issues. When specifications are explicit enough for an agent to execute, they are explicit enough for human team members to align on. Ambiguity that would surface as a week-long misunderstanding surfaces during the specification review instead.
  • Agent-readable documentation. Documentation that is structured enough for an AI agent to consume is also more useful for human onboarding. Making your knowledge base agent-readable improves it for everyone.

4 - AI Adoption Roadmap

A guide for incorporating AI into your delivery process safely - remove friction and add safety before accelerating with AI coding.

AI adoption stress-tests your organization. AI does not create new problems. It reveals existing ones faster. Teams that try to accelerate with AI before fixing their delivery process get the same result as putting a bigger engine in a car with no brakes. This page provides the recommended sequence for incorporating AI safely, mirroring the brownfield migration phases.

Before You Add AI: A Decision Framework

Not every problem warrants an AI-based solution. The decision tree below is a gate, not a funnel. Work through each question in order. If you can resolve the need at an earlier step, stop there.

graph TD
    A["New capability or automation need"] --> B{"Is the process as simple as possible?"}
    B -->|No| C["Optimize the process first"]
    B -->|Yes| D{"Can existing system capabilities do it?"}
    D -->|Yes| E["Use them"]
    D -->|No| F{"Can a deterministic component do it?"}
    F -->|Yes| G["Build it"]
    F -->|No| H{"Does the benefit of AI exceed its risk and cost?"}
    H -->|Yes| I["Try an AI-based solution"]
    H -->|No| J["Do not automate this yet"]

If steps 1-3 were skipped, step 4 is not available. An AI solution applied to a process that could be simplified, handled by existing capabilities, or replaced by a deterministic component is complexity in place of clarity.
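The gate can also be read as a short function that returns at the first step that resolves the need. This is a sketch of the decision tree above; the function and parameter names are hypothetical, and each boolean stands for a judgment the team has to make.

```python
def automation_decision(
    process_is_simple: bool,
    existing_capabilities_suffice: bool,
    deterministic_component_suffices: bool,
    ai_benefit_exceeds_risk_and_cost: bool,
) -> str:
    """Walk the gate in order and stop at the first step that resolves the need."""
    if not process_is_simple:
        return "Optimize the process first"
    if existing_capabilities_suffice:
        return "Use existing capabilities"
    if deterministic_component_suffices:
        return "Build a deterministic component"
    if ai_benefit_exceeds_risk_and_cost:
        return "Try an AI-based solution"
    return "Do not automate this yet"


# A simple process that a deterministic component can handle never reaches the AI question.
decision = automation_decision(True, False, True, True)
```

The early returns encode the rule that the AI question is only asked after the first three have been answered.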

The Key Insight

The sequence matters: remove friction and add safety before you accelerate. AI amplifies whatever system it is applied to - strong process gets faster, broken process gets more broken, faster.

The Progression

graph LR
    P1["Quality Tools"] --> P2["Clarify Work"]
    P2 --> P3["Harden Guardrails"]
    P3 --> P4["Reduce Delivery Friction"]
    P4 --> P5["Accelerate with AI"]

    style P1 fill:#e8f4fd,stroke:#1a73e8
    style P2 fill:#e8f4fd,stroke:#1a73e8
    style P3 fill:#fce8e6,stroke:#d93025
    style P4 fill:#fce8e6,stroke:#d93025
    style P5 fill:#e6f4ea,stroke:#137333

Quality Tools, Clarify Work, Harden Guardrails, Reduce Delivery Friction, then Accelerate with AI.

Quality Tools

Brownfield phase: Assess

Before using AI for anything, choose models and tools that minimize hallucination and rework. Not all AI tools are equal. A model that generates plausible-looking but incorrect code creates more work than it saves.

What to do:

  • Choose based on accuracy, not speed. A tool with a 20% error rate carries a hidden rework tax on every use. If rework exceeds 20% of generated output, the tool is a net negative.
  • Use models with strong reasoning capabilities for code generation. Smaller, faster models are appropriate for autocomplete and suggestions, not for generating business logic.
  • Establish a baseline: measure how much rework AI-generated code requires before and after changing tools.
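The baseline can be a single ratio tracked before and after a tool change. The sketch below assumes rework is measured as reworked lines divided by generated lines, which is one possible definition among several; the function names and sample numbers are hypothetical, and the 20% threshold is the one stated above.

```python
NET_NEGATIVE_THRESHOLD = 0.20  # from the guidance above: rework above 20% is a net negative


def rework_rate(lines_generated: int, lines_reworked: int) -> float:
    """Fraction of AI-generated lines that had to be rewritten before commit."""
    if lines_generated <= 0:
        raise ValueError("lines_generated must be positive")
    return lines_reworked / lines_generated


def is_net_negative(rate: float) -> bool:
    """True when the rework rate exceeds the threshold."""
    return rate > NET_NEGATIVE_THRESHOLD


# Invented sample measurements taken before and after switching tools.
before = rework_rate(lines_generated=1000, lines_reworked=260)
after = rework_rate(lines_generated=1200, lines_reworked=150)
```

Whatever definition of rework you choose, apply the same one to both measurements so the comparison is meaningful.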

What this enables: AI tooling that generates correct output more often than not. Subsequent steps build on working code rather than compensating for broken code.

Clarify Work

Brownfield phase: Assess / Foundations

Use AI to improve requirements before code is written, not to write code from vague requirements. Ambiguous requirements are the single largest source of defects (see Systemic Defect Fixes), and AI can detect ambiguity faster than manual review.

What to do:

  • Use AI to review tickets, user stories, and acceptance criteria before development begins. Prompt it to identify gaps, contradictions, untestable statements, and missing edge cases.
  • Use AI to generate test scenarios from requirements. If the AI cannot generate clear test cases, the requirements are not clear enough for a human either.
  • Use AI to analyze support tickets and incident reports for patterns that should inform the backlog.

What this enables: Higher-quality inputs to the development process. Developers (human or AI) start with clear, testable specifications rather than ambiguous descriptions that produce ambiguous code. The four prompting disciplines describe the skill progression that makes this work at scale.

Harden Guardrails

Brownfield phase: Foundations / Pipeline

Before accelerating code generation, strengthen the safety net that catches mistakes. This means both product guardrails (does the code work?) and development guardrails (is the code maintainable?).

Product and operational guardrails:

  • Automated test suites with meaningful coverage of critical paths
  • Deterministic CD pipelines that run on every commit
  • Deployment validation (smoke tests, health checks, canary analysis)

Development guardrails:

  • Code style enforcement (linters, formatters) that runs automatically
  • Architecture rules (dependency constraints, module boundaries) enforced in the pipeline
  • Security scanning (SAST, dependency vulnerability checks) on every commit

What to do:

  • Audit your current guardrails. For each one, ask: “If AI generated code that violated this, would our pipeline catch it?” If the answer is no, fix the guardrail before expanding AI use.
  • Add contract tests at service boundaries. AI-generated code is particularly prone to breaking implicit contracts between services.
  • Ensure test suites run in under ten minutes. Slow tests create pressure to skip them, which is dangerous when code is generated faster.

What this enables: A safety net that catches mistakes regardless of who (or what) made them. The pipeline becomes the authority on code quality, not human reviewers. See Pipeline Enforcement and Expert Agents for how these guardrails extend to ACD.

Reduce Delivery Friction

Brownfield phase: Pipeline / Optimize

Remove the manual steps, slow processes, and fragile environments that limit how fast you can safely deliver. These bottlenecks exist in every brownfield system and they become acute when AI accelerates the code generation phase.

What to do:

  • Remove manual approval gates that add wait time without adding safety (see Replacing Manual Validations).
  • Fix fragile test and staging environments that cause intermittent failures.
  • Shorten branch lifetimes. If branches live longer than a day, integration pain will increase as AI accelerates code generation.
  • Automate deployment. If deploying requires a runbook or a specific person, it is a bottleneck that will be exposed when code moves faster.

What this enables: A delivery pipeline where the time from “code complete” to “running in production” is measured in minutes, not days. AI-generated code flows through the same pipeline as human-generated code with the same safety guarantees.

Accelerate with AI

Brownfield phase: Optimize / Continuous Deployment

Now - and only now - expand AI use to code generation, refactoring, and autonomous contributions. The guardrails are in place. The pipeline is fast. Requirements are clear. The outcome of every change is deterministic regardless of whether a human or an AI wrote it.

What to do:

  • Use AI for code generation with the specification-first workflow described in the ACD workflow. Define test scenarios first, let AI generate the test code (validated for behavior focus and spec fidelity), then let AI generate the implementation.
  • Use AI for refactoring: extracting interfaces, reducing complexity, improving test coverage. These are high-value, low-risk tasks where AI excels. Well-structured, well-named code also reduces the token cost of every subsequent AI interaction - see Tokenomics: Code Quality as a Token Cost Driver.
  • Use AI to analyze incidents and suggest fixes, with the same pipeline validation applied to any change.

What this enables: AI-accelerated development where the speed increase translates to faster delivery, not faster defect generation. The pipeline enforces the same quality bar regardless of the author. See Pitfalls and Metrics for what to watch for and how to measure progress.

Mapping to Brownfield Phases

| AI Adoption Stage | Brownfield Phase | Key Connection |
| --- | --- | --- |
| Quality Tools | Assess | Use the current-state assessment to evaluate AI tooling alongside delivery process gaps |
| Clarify Work | Assess / Foundations | AI-generated test scenarios from requirements feed directly into work decomposition |
| Harden Guardrails | Foundations / Pipeline | The testing fundamentals and pipeline gates are the same work, with AI-readiness as additional motivation |
| Reduce Delivery Friction | Pipeline / Optimize | Replacing manual validations unblocks AI-speed delivery |
| Accelerate with AI | Optimize / CD | The agent delivery contract artifacts become the delivery contract once the pipeline is deterministic and fast |

Content contributed by Bryan Finster.