Intent Drift: Why AI and Engineers Build the Wrong Thing

The ticket said one thing. What shipped said another.

The ticket said: let enterprise admins bulk-invite users by uploading a CSV. What shipped was a bulk-invite form with a textarea — paste emails, one per line. It works. It passed review. It passed QA. It shipped.

Three weeks later the deal that drove the ticket stalled. The customer's IT team didn't have a list of emails to paste — they had an Active Directory export. A CSV. The one word in the ticket that mattered was the one the implementation quietly dropped.

Nobody did anything wrong, exactly. The engineer built a working bulk-invite. The reviewer checked working code. QA checked the headline behavior. Every gate the change passed through was doing its job. The job just wasn't this one: checking whether what got built still matched what was asked for.

That gap has a name. We call it intent drift — the distance between what a customer or PM asked for and what an engineer, or an engineer's AI, actually builds. It is not a bug. Bugs are code that fails its own intent. Intent drift is code that succeeds at the wrong intent: the tests are green, the feature works, and the thing it was built to do quietly stopped matching the request.

It opens two ways. Sometimes the engineer doesn't listen. Increasingly, the AI doesn't either.

Failure mode one: the engineer who doesn't listen

Search any forum where engineers and product managers talk honestly — Hacker News, the requirements-engineering literature, the long tail of "how to work with difficult engineers" posts — and the same complaint comes back from both sides of the table. PMs say engineers don't read the ticket. Engineers say the ticket was vague, or wrong, or invented, so they built what made sense instead.

Both are describing one thing. A widely upvoted Hacker News comment puts the engineer's view bluntly: requirements "aren't gathered, they're invented." If the spec is just one person's guess, the reasoning goes, then deviating from it isn't insubordination; it's judgment. And sometimes it genuinely is. The engineer saw a better path and took it.

But "I built something better" and "I built the wrong thing" produce the identical artifact: a diff that no longer matches the spec. The only way to tell them apart is to check — deliberately, every time — and almost no team does.

The quiet rule: A spec that isn't enforced is not a spec. It's a suggestion — and a suggestion degrades the moment it meets a keyboard.

The cost is measurable. A 2025 analysis of engineering rework found that 30–50% of engineering effort goes to avoidable rework from misunderstood or misaligned requirements, and that 60–80% of shipped features are rarely or never used after release. That is not a code-quality problem. Every one of those features compiled, passed tests, and shipped. They were faithful builds of an intent that was already wrong — or had quietly gone stale — by the time the code was written.

Failure mode two: the AI that looks done but isn't

For most of software's history, intent drift moved at human speed: one engineer, one ticket, one misread requirement at a time. AI coding tools removed that speed limit.

The pattern engineers describe in 2026 is strikingly consistent, and it has earned its own vocabulary. There is the 80% problem — the agent produces roughly 80% of a working solution and confidently presents it as 100%, omitting the unglamorous 20%: error handling, edge cases, the acceptance criteria that were never going to surface from the happy path. There are AI slop PRs — large, plausible-looking diffs that read as if the author didn't quite grasp the problem. And there is the most unsettling pattern of all: agents that declare the work done when it isn't, writing "tests passing" into a response while the suite has syntax errors.

66%: developers frustrated by AI that's "almost right, but not quite"

96%: developers who don't fully trust AI-generated code

<50%: developers who review that code before committing it

Almost right is precisely the texture of intent drift — and in Stack Overflow's 2025 developer survey, two in three developers named it their top frustration with AI. The deeper problem is the second pair of numbers. The distrust is real; the verification is not happening. AI writes the code faster than anyone is checking whether it's the right code.

"Looks done" is not "is done." AI is extraordinarily good at the first claim. The Definition of Done — every acceptance criterion, every edge case, every "must" in the spec — is exactly the part it's most likely to skip, because it's the part that doesn't show up in a quick read of the diff.

This is where the Definition of Done quietly fails as a safeguard. A DoD is a checklist — tests written, docs updated, acceptance criteria met — and it assumes a human is honestly checking each box against the actual requirement. When an AI agent generates the code and the box-checking happens at a glance, the checklist starts measuring confidence instead of completion. The work looks done because the agent is fluent, not because it's finished.
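
One way to picture the difference between confidence and completion is a checklist where a ticked box counts only when something backs it. The sketch below is illustrative, not a standard Definition of Done format: the `Criterion` type, its `evidence` field, and the test name are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Criterion:
    text: str                       # acceptance criterion, as written in the ticket
    claimed_done: bool              # what the author or agent says
    evidence: Optional[str] = None  # hypothetical: id of a passing test that exercises it


def unverified(criteria: list[Criterion]) -> list[str]:
    """Return criteria that are claimed done with nothing attached to back the claim."""
    return [c.text for c in criteria if c.claimed_done and c.evidence is None]


dod = [
    Criterion("Admin can bulk-invite users", True, "test_bulk_invite_sends_invites"),
    Criterion("Input is an uploaded CSV file", True),  # box ticked, no evidence
]
print(unverified(dod))
```

In this toy version the CSV criterion is reported as unverified even though its box is ticked, which is the gap a glance at the checklist would miss.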

One gap, one name

Both failure modes — the engineer who substitutes judgment for the spec, the AI that ships 80% and calls it whole — produce the same result: a change that no longer matches the request that justified it. It's worth naming that result precisely, because the industry has a habit of describing the symptoms and never the disease.

Intent drift is the gap between what was asked for and what is being built. And it opens at two distinct seams in the path from a customer's words to shipped code.

It also helps to separate intent drift from four things it gets confused with:

Not a bug

A bug is code that fails its own intent. Intent drift is code that succeeds at the wrong intent. The tests are green — that's what makes it invisible.

Not scope creep

Scope creep adds work nobody asked for. Intent drift can shrink scope too — the CSV that became a textarea. It's about fidelity, not volume.

Not spec drift alone

Spec drift is code diverging from the written spec. Intent drift also covers the spec itself drifting from the customer — the ticket can be the thing that's wrong.

Not model drift

Model drift is an ML model decaying as the world shifts. Intent drift is a team drifting away from a request that keeps evolving in Slack.

That third distinction is the one that matters most. Spec drift — code diverging from the written spec — is real and worth catching, but catching it assumes the spec is correct. Intent drift doesn't. The customer kept talking after the spec was written — in Slack, in support tickets, on calls — and the spec stopped listening. By the time the feature lands, it can be a flawless build of a spec that itself drifted from what the customer actually needs.

Why every gate misses it

Run through the gates a change passes on its way to production, and notice that not one of them is looking for this.

Tests check the code against itself: does it do what it says it does? Code review checks the diff against the reviewer's memory of the ticket, against the clock, while also checking style, structure, and bugs. CI checks that nothing broke. QA checks the headline behavior. Each gate is competent. None of them holds the original customer request in one hand and the diff in the other and asks the one question that catches drift: do these still match?

Code review asks "is this code good?" Almost nothing in your pipeline asks "is this still the thing the customer asked for?"

There's a structural reason for the blind spot. The customer request lives in Slack. The spec lives in Notion. The code lives in GitHub. The three systems don't talk to each other, so the comparison can only happen inside a human head — a reviewer expected to remember a three-week-old Slack thread while reading a 400-line diff. That memory is the only safeguard, and it fails quietly and constantly.

AI coding made the blind spot worse in the most direct way possible: it multiplied the number of diffs flowing through it. When a team's throughput goes from one or two PRs a week per engineer to five or ten, the human-memory safeguard doesn't scale with it. It just fails more often, and later.

Catching drift while it's still cheap

Intent drift is cheapest to fix at the moment it appears — in the diff, before merge. It is most expensive after a customer hits it, when it arrives disguised as a bug report and a stalled deal. The entire goal is to move the catch as far left as possible.

That requires something none of the existing gates have: a system that holds the customer signal itself. Not a summary of the request — the actual messages, tickets, and call notes — placed next to the spec and the diff, so the comparison stops depending on whether a reviewer happens to remember. When all three sit in one place, drift becomes detectable mechanically: this spec line, these customer messages, and this changed file no longer agree.
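
As a rough illustration of what "detectable mechanically" could mean, the sketch below holds the three artifacts together and reports key terms the customer used that the spec or the diff no longer mentions. Everything in it is an assumption made for illustration: the `Evidence` type, the `missing_terms` helper, the sample customer message, and especially the naive substring matching, which a real detector would need to replace with something far more robust.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    """The three artifacts the comparison needs, held side by side."""
    customer_messages: list[str]  # raw messages, not a summary
    spec_line: str                # the ticket or spec sentence
    diff_text: str                # added lines from the change under review


def missing_terms(evidence: Evidence, key_terms: list[str]) -> dict[str, list[str]]:
    """Report terms the customer used that the spec or the diff does not mention."""
    customer_text = " ".join(evidence.customer_messages).lower()
    # Only terms the customer actually used count as signal.
    asked_for = [t for t in key_terms if t.lower() in customer_text]
    spec = evidence.spec_line.lower()
    diff = evidence.diff_text.lower()
    return {
        "missing_from_spec": [t for t in asked_for if t.lower() not in spec],
        "missing_from_diff": [t for t in asked_for if t.lower() not in diff],
    }


evidence = Evidence(
    customer_messages=["IT will send an Active Directory export, it's a CSV"],  # hypothetical
    spec_line="Let enterprise admins bulk-invite users by uploading a CSV",
    diff_text='<textarea name="emails" placeholder="one email per line"></textarea>',
)
report = missing_terms(evidence, key_terms=["CSV", "upload"])
print(report)
```

On the bulk-invite example, the spec still mentions the CSV but the diff does not, so the term shows up under `missing_from_diff`. The check is crude, but it no longer depends on anyone remembering the thread.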

The output that matters is not a confidence score. It's provenance — the exact customer messages, the exact spec line, the exact files — so a human can look at the flag and make the call in seconds: update the code, or update the spec. Both are valid resolutions. The point was never to gatekeep. It was to make the divergence visible before it became shipped reality.
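
A flag built around provenance rather than a score might look like the sketch below. The `DriftFlag` and `Resolution` names, the sample message, and the file path are hypothetical; the only things taken from the text above are the three pieces of provenance and the two valid resolutions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Resolution(Enum):
    UPDATE_CODE = "update the code"
    UPDATE_SPEC = "update the spec"


@dataclass
class DriftFlag:
    customer_messages: list[str]             # the exact messages
    spec_line: str                           # the exact spec line
    files: list[str]                         # the exact changed files
    resolution: Optional[Resolution] = None  # unset until a human decides

    def render(self) -> str:
        """Format the flag so a reviewer sees the sources, not a score."""
        lines = ["Possible intent drift"]
        lines += [f"  customer: {m}" for m in self.customer_messages]
        lines.append(f"  spec:     {self.spec_line}")
        lines += [f"  file:     {f}" for f in self.files]
        status = self.resolution.value if self.resolution else "undecided"
        lines.append(f"  decision: {status}")
        return "\n".join(lines)


flag = DriftFlag(
    customer_messages=["IT will send an Active Directory export, it's a CSV"],  # hypothetical
    spec_line="Let enterprise admins bulk-invite users by uploading a CSV",
    files=["web/admin/BulkInviteForm.tsx"],  # hypothetical path
)
print(flag.render())

# A human reads the provenance and makes the call; either resolution is valid.
flag.resolution = Resolution.UPDATE_CODE
print(flag.render())
```

There is deliberately no confidence field in this sketch: the flag carries its sources and leaves the decision to the person reading it.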

The whole game: Drift you can see is a five-minute decision. Drift you can't see is a three-week-old bug report and a deal that went quiet.

Engineers will always exercise judgment, and they should. AI will keep writing most of the code, and it should. Neither of those is the problem. The problem is that the request — the actual reason the work exists — drops out of the process the moment the ticket is written, and nothing checks it back in. Name that gap, watch it, and it stops being the thing you discover from a customer. It becomes the thing you decide on, on purpose, before it ships.

Catch intent drift before it ships.

Stonewall is the intent drift detector. It holds your customer signal — Slack, tickets, calls — next to your codebase, and flags where what you're building has drifted from what customers actually asked for. Every flag carries its provenance: the exact messages, the exact spec line, the exact files.

Join the waitlist at stonewall.dev
