Agent-Assisted Specification
The specification stages of the ACD workflow (Intent Description, User-Facing Behavior, Feature Description, and Acceptance Criteria) ask humans to define intent, behavior, constraints, and acceptance criteria before any code generation begins. This page explains how agents accelerate that work and why the effort stays small.
The Pattern
Every use of an agent in the specification stages follows the same four-step cycle:
- Human drafts - write the first version based on your understanding
- Agent critiques - ask the agent to find gaps, ambiguity, or inconsistency
- Human decides - accept, reject, or modify the agent’s suggestions
- Agent refines - generate an updated version incorporating your decisions
This is not the agent doing specification for you. It is the agent making your specification more thorough than it would be without help, in less time than it would take without help. The sections below show how this cycle applies at each specification stage.
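The four steps can be sketched as a small loop. This is an illustrative sketch only: `ask_agent` is a placeholder for whatever agent call you actually use, and the `accept` callback stands in for the human decision step.

```python
def ask_agent(prompt: str) -> str:
    """Placeholder for a real agent call (an LLM API, a CLI tool, etc.)."""
    return f"agent response to: {prompt.splitlines()[0]}"

def specification_cycle(draft: str, accept) -> str:
    """One pass of the draft -> critique -> decide -> refine cycle.

    `accept` is the human decision step: it receives the agent's critique
    and returns only the suggestions the human chooses to keep.
    """
    # Agent critiques: find gaps, ambiguity, or inconsistency.
    critique = ask_agent(f"Find gaps, ambiguity, or inconsistency in:\n{draft}")
    # Human decides: accept, reject, or modify the suggestions.
    decisions = accept(critique)
    # Agent refines: produce an updated draft incorporating the decisions.
    return ask_agent(f"Apply these decisions:\n{decisions}\n\nTo this draft:\n{draft}")
```

The human stays in the middle of the loop by construction: nothing reaches the refined draft without passing through `accept`.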
This Is Not Big Upfront Design
The specification stages look heavy if you imagine writing them for an entire feature set. That is not what happens.
You specify the next single unit of work. One thin vertical slice of functionality - a single scenario, a single behavior. A user story may decompose into multiple such units worked in parallel across services. The scope of each unit stays small because continuous delivery requires it: every change must be small enough to deploy safely and frequently. A detailed specification for three months of work does not reduce risk - it amplifies it. Small-scope specification front-loads clarity on one change and gets production feedback before specifying the next.
If your specification effort for a single change takes more than 15 minutes, the change is too large. Split it.
How Agents Help with the Intent Description
The intent description does not need to be perfect on the first draft. Write a rough version and use an agent to sharpen it.
Ask the agent to find ambiguity. Give it your draft intent and ask it to identify anything vague, any assumption that a developer might interpret differently than you intended, or any unstated constraint.
Example prompt:
```
Here is the intent description for my next change. Identify any
ambiguity, unstated assumptions, or missing context that could
lead to an implementation that technically satisfies this description
but does not match what I actually want.

[paste intent description]
```

Ask the agent to suggest edge cases. Agents are good at generating boundary conditions you might not think of, because they can quickly reason through combinations.
Ask the agent to simplify. If the intent covers too much ground, ask the agent to suggest how to split it into smaller, independently deliverable changes.
Ask the agent to sharpen the hypothesis. If the intent includes a hypothesis (“We believe X will produce Y because Z”), the agent can pressure-test it before any code is written.
Example prompt:
```
Review this hypothesis. Is the expected outcome measurable with data
we currently collect? Is the causal reasoning plausible? What
alternative explanations could produce the same outcome without this
change being the cause?

[paste intent description with hypothesis]
```

A weak hypothesis - one with an unmeasurable outcome or implausible causal link - will not produce useful feedback after deployment. Catching that now costs a prompt. Catching it after implementation costs a cycle.
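The "We believe X will produce Y because Z" shape can also be captured as a small template, which makes the measurability check mechanical. A hedged sketch - the `Hypothesis` type and its field names are illustrative, not part of the workflow:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    change: str     # X: the change we are making
    outcome: str    # Y: the expected, measurable outcome
    rationale: str  # Z: the causal reasoning linking X to Y

    def statement(self) -> str:
        return (f"We believe {self.change} will produce {self.outcome} "
                f"because {self.rationale}.")

    def is_testable(self, metrics_collected: set) -> bool:
        """Rough check: the outcome must name a metric we already collect."""
        return any(m in self.outcome for m in metrics_collected)
```

If `is_testable` fails against the metrics you currently collect, you have found the unmeasurable-outcome problem before writing any code.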
The human still owns the intent. The agent is a sounding board that catches gaps before they become defects.
How Agents Help with User-Facing Behavior
Writing BDD scenarios from scratch is slow. Agents can draft them and surface gaps you would otherwise miss.
Generate initial scenarios from the intent. Give the agent your intent description and ask it to produce Gherkin scenarios covering the expected behavior.
Example prompt:
```
Based on this intent description, generate BDD scenarios in Gherkin
format. Cover the primary success path, key error paths, and edge
cases. For each scenario, explain why it matters.

[paste intent description]
```

Review for completeness, not perfection. The agent’s first draft will cover the obvious paths. Your job is to read through them and ask: “What is missing?” The agent handles volume. You handle judgment.
Ask the agent to find gaps. After reviewing the initial scenarios, ask the agent explicitly what scenarios are missing.
Example prompt:
```
Here are the BDD scenarios for this feature. What scenarios are
missing? Consider boundary conditions, concurrent access, failure
modes, and interactions with existing behavior.

[paste scenarios]
```

Ask the agent to challenge weak scenarios. Some scenarios may be too vague to constrain an implementation. Ask the agent to identify any scenario where two different implementations could both pass while producing different user-visible behavior.
The human decides which scenarios to keep. The agent ensures you considered more scenarios than you would have on your own.
How Agents Help with the Feature Description and Acceptance Criteria
The Feature Description and Acceptance Criteria stages define the technical boundaries: where the change fits in the system, what constraints apply, and what non-functional requirements must be met.
Ask the agent to suggest architectural considerations. Give it the intent, the BDD scenarios, and a description of the current system architecture. Ask what integration points, dependencies, or constraints you should document.
Example prompt:
```
Given this intent and these BDD scenarios, what architectural
decisions should I document before implementation begins? Consider
where this change fits in the existing system, what components it
touches, and what constraints an implementer needs to know.

Current system context: [brief architecture description]
```

Ask the agent to draft non-functional acceptance criteria. Agents can suggest performance thresholds, security requirements, and resource limits based on the type of change and its context.
Example prompt:
```
Based on this feature description, suggest non-functional acceptance
criteria I should define. Consider latency, throughput, security,
resource usage, and operational requirements. For each criterion,
explain why it matters for this specific change.

[paste feature description]
```

Ask the agent to check consistency. Once you have the intent, BDD scenarios, feature description, and acceptance criteria, ask the agent to identify any contradictions or gaps between them.
The human makes the architectural decisions and sets the thresholds. The agent makes sure you did not leave anything out.
Validating the Complete Specification Set
The four specification stages produce four artifacts: intent description, user-facing behavior (BDD scenarios), feature description (constraint architecture), and acceptance criteria. Each can look reasonable in isolation but still conflict with the others. Before moving to test generation and implementation, validate them as a set.
Use an agent as a specification reviewer. Give it all four artifacts and ask it to check for internal consistency.
```
Review these four specification artifacts for internal consistency
before implementation begins. Check:

- Clarity: is the intent unambiguous? Could it be read differently by two developers?
- Testability: does every BDD scenario have clear, observable outcomes?
- Scope: does the feature description constrain the implementation to what the intent requires, without over-engineering?
- Terminology: are the same concepts named consistently across all four artifacts?
- Completeness: are there behaviors implied by the intent that have no corresponding BDD scenario?
- Conflict: does anything in one artifact contradict anything in another?
- Hypothesis: if the intent includes a hypothesis, is there a corresponding validation path? Can the predicted outcome be measured after deployment?

[paste all four artifacts]
```

The human gates on this review before implementation begins. If the review agent identifies issues, resolve them before generating any test code or implementation. A conflict caught in specification costs minutes. The same conflict caught during implementation costs a session.
This review is not a bureaucratic checkpoint. It is the last moment where the cost of a change is near zero. After this gate, every issue becomes more expensive to fix.
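One way to make the gate mechanical is to refuse to even build the review prompt until all four artifacts exist. A sketch under stated assumptions: the artifact keys (`intent`, `behavior`, `feature`, `acceptance`) and the check names are illustrative, not a fixed schema.

```python
REVIEW_CHECKS = ["clarity", "testability", "scope", "terminology",
                 "completeness", "conflict", "hypothesis"]

def build_review_prompt(artifacts: dict) -> str:
    """Assemble the consistency-review prompt from the four artifacts.

    Raises if the specification set is incomplete, enforcing the gate.
    """
    missing = {"intent", "behavior", "feature", "acceptance"} - artifacts.keys()
    if missing:
        raise ValueError(f"specification set incomplete: {sorted(missing)}")
    sections = "\n\n".join(f"## {name}\n{text}" for name, text in artifacts.items())
    checks = "\n".join(f"- {c}" for c in REVIEW_CHECKS)
    return ("Review these four specification artifacts for internal consistency "
            f"before implementation begins. Check:\n{checks}\n\n{sections}")
```

Failing fast on a missing artifact is cheaper than letting the review agent silently review an incomplete set.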
The Discovery Loop: From Conversation to Specification
The prompts above work well when you already know what to specify. When you do not, you need a different starting point. Instead of writing a draft and asking the agent to critique it, treat the agent as a principal architect who interviews you to extract context you did not know was missing.
This is the shift from “order taker” to “architectural interview.” The sections above describe what to do at each specification stage. The discovery loop describes how to get there through conversation when you are starting from a vague idea.
Phase 1: Initial Framing (Intent)
Describe the outcome, not the application. Set the agent’s role and the goal of the conversation explicitly.
```
I want to build a Software Value Stream Mapping application. Before we
write a single line of code, I want you to act as a Principal Architect.
Your goal is to help me write a self-contained specification that an
autonomous agent can execute. Do not start writing the spec yet. First,
interview me to uncover the technical implementation details, edge cases,
and trade-offs I have not considered.
```

This prompt does three things: it states intent, it assigns a role that produces the right kind of questions, and it prevents the agent from jumping to implementation.
Even at this early stage, include a rough hypothesis about what outcome you expect: “I believe this tool will reduce the time teams spend on manual value stream analysis by 80%.” The hypothesis does not need to be precise yet - the discovery interview will sharpen it - but stating one early forces you to think about measurable outcomes from the start.
Phase 2: Deep-Dive Interview (Context)
Let the agent ask three to five high-signal questions at a time. The goal is to surface the implicit knowledge in your head: domain definitions, data schemas, failure modes, and trade-off preferences.
What the agent should ask: “How are we defining Lead Time versus Cycle Time for this specific organization? What is the schema of the incoming JSON? How should the system handle missing data points?”
Your role: Answer with as much raw context as possible. Do not worry about formatting. Get the “why” and “how” out. The agent will structure it later.
This is context engineering in practice: you are building the information environment the specification will formalize.
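The interview phase can be sketched as a loop. This is a minimal sketch, assuming hypothetical `ask_agent` and `answer` callables (the real agent call and the human's reply) and an illustrative `ENOUGH_CONTEXT` stop signal:

```python
def discovery_interview(ask_agent, answer, max_rounds: int = 5) -> list:
    """Deep-dive interview loop: the agent asks 3-5 high-signal questions
    per round; the human answers with raw context. The loop ends when the
    agent signals it has enough context, or after max_rounds."""
    context = []
    for _ in range(max_rounds):
        questions = ask_agent(
            "Ask 3-5 high-signal questions about domain definitions, data "
            "schemas, failure modes, and trade-offs. Reply ENOUGH_CONTEXT "
            "when you can draft the specification.\n" + "\n".join(context)
        )
        if "ENOUGH_CONTEXT" in questions:
            break
        context.append(answer(questions))  # raw, unformatted human context
    return context
```

The accumulated `context` list is exactly the information environment the drafting phase will formalize.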
Phase 3: Drafting (Specification)
Once the agent has enough context, ask it to synthesize the conversation into a structured specification.
```
Based on our discussion, generate the first draft of the specification
document. Structure it as: Intent Description, User-Facing Behavior
(BDD scenarios), Feature Description (architectural constraints),
Task Decomposition, and Acceptance Criteria (including evaluation
design with test cases). Ensure the Task Decomposition follows a
planner-worker pattern where tasks are broken into sub-two-hour chunks.
```

The sections map to the agent delivery contract and the specification engineering skill set. The agent drafts. You review using the same four-step cycle described at the top of this page.
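The sub-two-hour constraint on the task decomposition is easy to check programmatically. A hedged sketch - the section names and the task shape (`name`, `estimate_hours`) are illustrative, not a required schema:

```python
SPEC_SECTIONS = ["intent_description", "user_facing_behavior",
                 "feature_description", "task_decomposition",
                 "acceptance_criteria"]

def decomposition_ok(tasks: list, max_hours: float = 2.0) -> bool:
    """Planner-worker constraint: every task is a sub-two-hour chunk.

    Tasks without an estimate fail the check, since an unestimated
    task cannot be shown to fit the batch-size limit.
    """
    return all(t.get("estimate_hours", float("inf")) <= max_hours for t in tasks)
```

Treating a missing estimate as a failure keeps the check honest: the decomposition is only done when every chunk is both named and sized.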
Phase 4: Stress-Test Review
Before finalizing, ask the agent to find gaps in its own output.
```
Critique this specification. Where would a junior developer or an
autonomous agent get confused? What constraints are still too vague?
What edge cases are missing from the evaluation design?
```

This is the same validation step as the specification consistency check, applied to the discovery loop’s output.
How This Differs from Turn-by-Turn Prompting
| Step | Turn-by-turn prompting | Discovery loop |
|---|---|---|
| Beginning | Write a long prompt and hope for the best | State a high-level goal and ask to be interviewed |
| Development | Fix the agent’s code mistakes turn by turn | Fix the specification until it is agent-proof |
| Quality | Eyeball the result | Define evaluation design (test cases) up front |
| Hand-off | Copy-paste code into the editor | Hand the specification to a long-running worker |
The discovery loop front-loads the work where it is cheapest: in conversation, before any code exists.
During long discovery conversations, ask the agent to maintain a running context log of key decisions. This prevents core decisions from getting lost in the middle of the context window as the conversation grows. The context log becomes the raw material for Phase 3.
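The running context log needs nothing more elaborate than an append-only list that can be re-rendered into the conversation. A minimal sketch; the class and method names are illustrative:

```python
class ContextLog:
    """Append-only log of key decisions made during a long discovery
    conversation, so they are not lost in the middle of the context
    window as the conversation grows."""

    def __init__(self):
        self.entries = []

    def record(self, decision: str) -> None:
        self.entries.append(decision)

    def as_prompt_section(self) -> str:
        """Render the log for re-injection into the conversation, and
        later as raw material for Phase 3 drafting."""
        return "Key decisions so far:\n" + "\n".join(f"- {d}" for d in self.entries)
```

Periodically pasting `as_prompt_section()` back into the conversation keeps the core decisions near the top of the agent's working context.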
The complete specification example below shows the output this workflow produces.
Complete Specification Example
The four specification stages produce concise, structured documents. The example below shows what a complete specification looks like when all four disciplines from The Four Prompting Disciplines are applied. This is a real-scale example, not a simplified illustration.
Notice what makes this specification agent-executable: every section is self-contained, acceptance criteria are verifiable by an independent observer, the decomposition defines clear module boundaries, and test cases include known-good outputs.
What to notice:
- Self-contained: An agent receiving only this document can implement without asking clarifying questions. That is the self-containment test.
- Decomposed with boundaries: Each module has explicit inputs and outputs. An orchestrator can route each module to a separate agent session (see Small-Batch Sessions).
- Acceptance criteria are observable: Each criterion describes a user-visible outcome, not an internal implementation detail. These map directly to Acceptance Criteria.
- Test cases include expected outputs: The evaluation design gives the agent known-good results to verify against, which is the specification engineering skill of evaluation design.
Related Content
- The ACD Workflow - the full workflow these tips support
- Agent Delivery Contract - detailed definitions of each artifact
- The Four Prompting Disciplines - the skill framework that produces specifications like the example above
- Small Batches - why changes must stay small enough for frequent, safe deployment
- Hypothesis-Driven Development - the lifecycle for forming, testing, and validating hypotheses