stonewall.dev
AI · claude-code · cursor · specs · vibe-coding

AI Agents Need Better Specs, Not Better Models

Stonewall · 8 min read

Your Agent Is Smart. Your Spec Is the Problem.

You gave Claude Code a one-paragraph description of a feature. It wrote 400 lines of working code in three minutes. The code compiles. The tests pass. The feature doesn't do what you wanted.

Not because the model is bad. Because you gave it bad instructions.

This happens constantly. Engineers blame the model — "it hallucinated," "it went off track," "it didn't understand the requirement." And sometimes that's true. But more often, the real failure is upstream: the spec was vague, the context was missing, and the agent filled in the gaps with reasonable assumptions that happened to be wrong.

The model is the most capable part of your stack. The spec — the thing that tells it what to build — is the weakest.

The Vibe Coding Trap

"Vibe coding" — Andrej Karpathy's term for building software by describing what you want conversationally — captured something real. It IS possible to build working software by talking to an AI agent. For prototypes, side projects, and exploratory work, it's remarkable.

But vibe coding has a failure mode that nobody talks about: it works brilliantly for the first 80% and catastrophically for the last 20%.

The vibe coding failure curve: First 80% feels like magic. The agent builds fast, fills gaps, makes good guesses. Last 20% is where ambiguity compounds — edge cases, integration points, error states. The agent guesses wrong, and you spend more time correcting than you saved.

The first 80% works because the agent's assumptions align with your intent. You say "add a user profile page" and it builds something reasonable. The data model is close enough. The API shape is close enough. The UI is close enough.

The last 20% is where the ambiguity lives. What happens when the user hasn't uploaded a photo? Which fields are required? Can the user change their email or only their display name? What's the error state when the API fails? Does this page respect role permissions?

Each of these questions has a correct answer that the agent can't derive from "add a user profile page." So it guesses. And every guess is a coin flip between "exactly what you wanted" and "30 minutes of debugging to understand why it did that."

Context Is the Real Input, Not the Prompt

The AI community spent 2024-2025 obsessing over prompt engineering. Better prompts, longer prompts, chain-of-thought prompts. And prompt quality matters. But the real leverage isn't in how you phrase the request — it's in what information the agent has access to when it processes the request.

An agent with a vague prompt and full codebase context will outperform an agent with a perfect prompt and no context. Every time.

41% of new code is AI-generated. 84% of developers use AI tools. Roughly 0% give their agents structured spec context.

Here's what your AI coding agent actually needs to build the right thing:

What exists. The current data models, API surface, component library, and architectural patterns. Not "we use React" — the specific components, the specific hooks, the specific patterns your team follows.

What's intended. The acceptance criteria, edge cases, error states, and scoping decisions. Not "add notifications" — which notification channels, what triggers them, who receives them, what the fallback is when delivery fails.

What's decided. The questions that came up during planning and how they were resolved. Not just the current answer, but the reasoning — so the agent can make consistent decisions when it encounters adjacent ambiguity.

What's adjacent. The other features in progress that might conflict, the shared infrastructure being modified, the upcoming changes that affect the same code paths.

No current workflow gives the agent all four. Most give it one — the code — and expect it to infer the rest.
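To make the four categories concrete, here is a minimal sketch of what a combined context payload could look like. The `build_agent_context` helper, its field names, and every value in it are invented for illustration; no real tool exposes this exact schema.

```python
# Hypothetical sketch: bundling the four context categories into one
# payload an agent could receive before starting a task. The structure
# and field names are illustrative assumptions, not a real API.

def build_agent_context(task_id: str) -> dict:
    """Assemble codebase, spec, decision, and adjacency context."""
    return {
        "task": task_id,
        "what_exists": {          # current code reality
            "components": ["ProfileCard", "AvatarUploader"],
            "patterns": ["hooks for data fetching", "zod for validation"],
        },
        "what_is_intended": {     # acceptance criteria and edge cases
            "criteria": ["profile page shows name, email, picture"],
            "edge_cases": ["no photo uploaded -> show default avatar"],
        },
        "what_is_decided": [      # resolved questions, with reasoning
            {"q": "Can users change their email?",
             "a": "No, display name only",
             "why": "email is the auth identity"},
        ],
        "what_is_adjacent": [     # in-flight work touching the same code
            "notifications feature modifies the users table",
        ],
    }

ctx = build_agent_context("user-profile-page")
```

The point isn't the schema. It's that all four categories travel together, so the agent never has to infer one from another.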

The CLAUDE.md Pattern (And Its Limits)

Smart teams have started writing extensive context files for their AI agents. A Meta staff engineer maintains a 300-line CLAUDE.md as institutional memory. Teams commit coding standards, architectural decisions, and project context into files that agents read at the start of every session.
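An invented excerpt gives the flavor of what these files contain; no real team's file is quoted here:

```markdown
# CLAUDE.md (illustrative excerpt)

## Conventions
- React components live in src/components/, one per file, named exports only.
- Data fetching goes through query hooks; never call fetch() inside a component.

## Architecture decisions
- Postgres row-level security enforces permissions; do not duplicate checks in app code.
```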

This is a step in the right direction. It's also hitting its limits.

A static context file can describe your codebase conventions. It can't describe what you're building this week. It can't tell the agent which spec section is relevant to the current task. It can't update when the spec changes mid-implementation. It's a photograph of your project — useful, but frozen in time.

A CLAUDE.md file is a photograph of your project. A living spec is a live feed. The agent needs both — but only one stays current.

The teams getting the best results from AI agents are the ones that bridge this gap. They don't just give the agent code context — they give it spec context. The requirements, the acceptance criteria, the open questions, the decisions made since the spec was written.

MCP: The Spec-to-Agent Bridge

Model Context Protocol (MCP) is changing how agents access context. Instead of pasting specs into chat windows or committing them as markdown files, MCP lets an agent query a structured knowledge base in real time.

What this looks like in practice: your AI coding agent starts a task. Instead of reading a stale spec file, it queries an MCP server that returns the relevant spec sections, the referenced codebase files, the open Q&A threads, and the dependency context. The spec is live — if someone updated a requirement ten minutes ago, the agent sees the update.

This is the difference between giving someone a printed map and giving them GPS. Both contain spatial information. One updates as reality changes.

The spec becomes the agent's product brain. Not a document it read once — a live system it queries as questions arise during implementation. "Should this endpoint require authentication?" → check the spec. "What's the error response format?" → check the spec. "Are there known edge cases for this input?" → check the spec.
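Those "check the spec" lookups can be sketched as queries against a structured spec. The schema and the `check_spec` helper below are invented for this example; a real MCP server would answer similar questions over the protocol rather than from an in-memory dict.

```python
# Illustrative sketch of "check the spec" as a queryable structure.
# The spec schema and lookup helper are assumptions for this example.

SPEC = {
    "endpoints": {
        "GET /api/users/:id": {
            "auth_required": True,
            "error_format": {"error": "string", "code": "int"},
            "edge_cases": ["unknown id -> 404", "unauthenticated -> 401"],
        },
    },
}

def check_spec(endpoint: str, question: str):
    """Answer a narrow implementation question from the live spec."""
    entry = SPEC["endpoints"].get(endpoint, {})
    return entry.get(question)

# "Should this endpoint require authentication?" -> check the spec.
check_spec("GET /api/users/:id", "auth_required")  # True
```

Because the agent queries rather than memorizes, a requirement updated mid-task is reflected in the next lookup.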

What a Spec-Ready Codebase Looks Like

If you want your AI agents to build the right thing, the spec needs to be structured, connected, and queryable. Here's what that means in practice:

Structured Requirements
Each requirement has an ID, acceptance criteria, priority, and status. The agent can reference specific requirements, not paragraphs.
Code References
The spec knows which files and modules are relevant. The agent doesn't guess where to make changes — the spec tells it.
Resolved Questions
Questions raised during implementation are captured with answers. The agent inherits decisions, not ambiguity.

Acceptance criteria are testable assertions. Not "the user should be able to see their profile" — "GET /api/users/:id returns a 200 with { name, email, picture, createdAt } for authenticated users and 401 for unauthenticated requests." The agent can implement AND verify against this.
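That criterion maps directly onto an executable contract test. The sketch below uses a toy stand-in handler rather than a real server; `get_user` and its return shape are assumptions made for illustration.

```python
# Sketch: the acceptance criterion above written as assertions.
# `get_user` is a hypothetical stand-in for the real endpoint.

def get_user(user_id: str, authenticated: bool) -> tuple[int, dict]:
    """Toy handler satisfying the spec's contract."""
    if not authenticated:
        return 401, {}
    return 200, {"name": "Ada", "email": "ada@example.com",
                 "picture": None, "createdAt": "2025-01-01"}

def test_profile_contract():
    status, body = get_user("u1", authenticated=True)
    assert status == 200
    assert set(body) == {"name", "email", "picture", "createdAt"}
    status, _ = get_user("u1", authenticated=False)
    assert status == 401

test_profile_contract()
```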

Scope is explicit. What's in and what's out. "This feature handles email notifications only. Push notifications are out of scope for this spec." Without explicit scope, the agent defaults to building everything — and you get a 2,000-line PR that includes push notification infrastructure nobody asked for.

Edge cases are enumerated. Not "handle errors gracefully" — "when the email provider returns a 429, retry with exponential backoff up to 3 attempts, then mark the notification as failed and surface it in the admin dashboard." The agent needs this level of specificity for the 20% that isn't obvious.
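The retry rule in that example is specific enough to code directly. A minimal sketch, assuming a hypothetical `send_email` provider call that returns an HTTP-style status code:

```python
import time

# Sketch of the edge-case rule above: on a 429 from the email provider,
# retry with exponential backoff up to 3 attempts, then mark the
# notification failed. `send_email` is a hypothetical provider call.

def deliver(send_email, notification, base_delay=1.0, max_attempts=3):
    for attempt in range(max_attempts):
        status = send_email(notification)
        if status != 429:
            return "sent" if status == 200 else "failed"
        time.sleep(base_delay * 2 ** attempt)  # backs off 1s, 2s, 4s
    return "failed"  # caller surfaces this in the admin dashboard
```

An agent handed the prose rule can produce exactly this; handed "handle errors gracefully," it has to guess whether to retry at all.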

Stop Upgrading Models. Upgrade Your Specs.

Every quarter, a new model claims the crown. GPT-5. Claude 4.5. Gemini Ultra. The benchmarks improve. The context windows expand. The reasoning gets sharper.

And your agent still builds the wrong feature — because the bottleneck was never the model. The bottleneck is the specification. The quality of the agent's output is bounded by the quality of the agent's input. Always has been, always will be.

The most impactful thing you can do for your AI-assisted development workflow isn't switching models. It's giving the model structured, connected, live specifications that tell it exactly what to build, where to build it, and how to verify it built the right thing.

That's not a model problem. It's a tooling problem. And the tooling is finally catching up.

Give your agent specs it can think with.
Stonewall's MCP server exposes living specs as structured context for Claude Code and Cursor. Relevant sections, referenced files, open questions, dependency context — surgical retrieval, minimal tokens.
Try the free AI PRD writer
