stonewall.dev
Tags: drift-detection · spec-driven-development · GitHub · code-quality

What Is Drift Detection? Spec-to-Code Alignment

Stonewall · 10 min read

The Bug That Isn't a Bug

Your spec says the API returns { userId, name, email }. The PR implements { id, displayName, emailAddress }. The tests pass. The code compiles. The reviewer approves. The PR merges.

Nobody notices for three weeks — until the frontend team builds against the spec and everything breaks. The "bug" was never a bug. It was drift: the code silently diverged from the specification, and nothing in your workflow caught it.

This happens constantly. Not because engineers are careless — because the spec and the code live in different systems with no connection between them. The spec is in Notion. The code is in GitHub. The only bridge is a human who's supposed to remember what the spec said while reviewing a 400-line diff. That bridge fails every time.

Drift detection is the automated layer that catches this divergence. And until recently, it didn't exist for product specifications.

Infrastructure Drift vs. Spec Drift

If you've worked with Terraform or CloudFormation, you know infrastructure drift detection. Your infrastructure-as-code defines what should exist. The drift detector compares that definition against what actually exists. When reality diverges from the declaration, it flags the difference.

Spec drift is the same concept applied to product specifications. Your spec defines what should be built. Your code is what actually gets built. Drift detection compares the two and flags divergence — not after deployment, but at the pull request level, before code merges.

The analogy: Infrastructure drift detection compares Terraform files against cloud resources. Spec drift detection compares product specs against pull requests. Same principle, different layer — and the product layer has been unprotected until now.

The difference matters. Infrastructure drift is binary — a resource either exists or it doesn't, a config value is either correct or wrong. Spec drift is semantic — the PR might implement the spirit of the requirement differently than the letter. A good drift detector doesn't just compare strings. It understands intent.

Three Types of Spec Drift

Not all drift is equal. Understanding the types helps you decide what to flag and what to let through.

Contradictory drift: the code does the opposite of what the spec says. The spec requires auth on an endpoint; the PR ships it without auth. This is always a flag.

Missing drift: the spec requires something the PR doesn't implement. An acceptance criterion was skipped, an edge case wasn't handled, a validation was omitted.

Additive drift: the PR implements something the spec didn't mention. New endpoints, extra fields, bonus features. Sometimes good engineering, sometimes scope creep.

Contradictory drift is the most dangerous — the code actively violates the spec. This is rare but critical when it happens. It usually indicates a miscommunication or an undocumented decision.

Missing drift is the most common. Specs lose about half their requirements within 48 hours of implementation starting. Edge cases get skipped. "Nice to have" acceptance criteria get deprioritized silently. Without drift detection, these omissions are invisible until a customer finds them.

Additive drift is the most nuanced. The engineer saw a better approach and implemented it. The question isn't whether the addition is good — it often is — but whether the spec should be updated to reflect reality. Untracked additions become invisible product decisions that nobody knows about.
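The three categories can be captured in a small data model. A minimal sketch — the `DriftType` and `DriftFlag` names and fields here are illustrative, not Stonewall's actual schema:

```python
from dataclasses import dataclass
from enum import Enum

class DriftType(Enum):
    CONTRADICTORY = "contradictory"  # code actively violates the spec
    MISSING = "missing"              # spec requirement not implemented
    ADDITIVE = "additive"            # code adds something the spec never mentioned

@dataclass
class DriftFlag:
    drift_type: DriftType
    spec_section: str  # which part of the spec is affected
    detail: str        # human-readable explanation for the reviewer

    @property
    def always_flag(self) -> bool:
        # Contradictory drift is always surfaced; missing and additive
        # drift are judgment calls for the team.
        return self.drift_type is DriftType.CONTRADICTORY

flag = DriftFlag(DriftType.CONTRADICTORY, "API security",
                 "Spec requires auth on POST /api/orders; PR ships it unauthenticated")
```

Keeping the type on every flag lets downstream tooling treat the categories differently — for example, sorting contradictory flags to the top of a PR comment.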

How Drift Detection Works

A drift detection system operates at the pull request level. Here's the flow:

PR opened → Find spec → Compare diff → Flag drift → Update spec or approve

Step 1: PR opened. A GitHub App or webhook triggers when a new pull request is created or updated.
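A minimal sketch of that trigger, assuming a standard GitHub webhook (the helper names are illustrative). GitHub signs each delivery with HMAC-SHA256 in the `X-Hub-Signature-256` header, so the handler verifies the signature and filters to the PR events it cares about:

```python
import hashlib
import hmac

def verify_signature(payload: bytes, signature_header: str, secret: bytes) -> bool:
    # GitHub sends "sha256=<hex digest>" in the X-Hub-Signature-256 header.
    expected = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)

def should_run_drift_check(event: str, action: str) -> bool:
    # React when a PR is opened, reopened, or its head commits change.
    return event == "pull_request" and action in {"opened", "synchronize", "reopened"}
```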

Step 2: Find the spec. The system identifies which spec the PR implements. This could be through PR description references, branch naming conventions, or linked tickets that map back to spec sections.
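One way to implement the lookup, assuming specs are referenced by an ID like `SPEC-123` in the PR description or branch name — the naming convention here is an example, not a requirement:

```python
import re

SPEC_REF = re.compile(r"\bSPEC-(\d+)\b", re.IGNORECASE)

def find_spec_ids(pr_description: str, branch_name: str) -> list[str]:
    """Collect spec IDs from the PR body and branch name, de-duplicated in order."""
    seen: dict[str, None] = {}
    for text in (pr_description, branch_name):
        for match in SPEC_REF.finditer(text):
            seen[f"SPEC-{match.group(1)}"] = None
    return list(seen)

find_spec_ids("Implements SPEC-42, see also spec-7.", "feature/spec-42-notifications")
# → ["SPEC-42", "SPEC-7"]
```

A real system would fall back to linked tickets when no inline reference is found, but the principle is the same: the PR must be mappable to a spec before anything can be compared.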

Step 3: Compare the diff. AI reads the PR diff and the relevant spec sections. It compares what was specced against what's being implemented — not at a string level, but at a semantic level. "POST /api/orders" in the spec and "POST /api/v2/orders" in the code is drift. "Returns user object" in the spec and "returns user DTO with additional metadata" might or might not be drift, depending on the acceptance criteria.
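The semantic comparison is an LLM call. A sketch of how the prompt might be assembled — the instruction wording is a placeholder, not Stonewall's actual prompt:

```python
def build_drift_prompt(spec_sections: list[str], pr_diff: str) -> str:
    """Assemble an LLM prompt asking for a semantic spec-vs-diff comparison."""
    spec_text = "\n\n".join(spec_sections)
    return (
        "You are checking a pull request against its product spec.\n"
        "Compare SEMANTICALLY, not string-by-string: a renamed route or field "
        "that changes the contract is drift; an equivalent rephrasing is not.\n\n"
        "Classify each discrepancy as contradictory, missing, or additive, "
        "and cite the spec section it affects.\n\n"
        f"=== SPEC ===\n{spec_text}\n\n"
        f"=== PR DIFF ===\n{pr_diff}\n"
    )

# The resulting string goes to whatever model the detector uses; the
# response is then parsed into structured drift flags.
```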

Step 4: Flag drift. Discrepancies are flagged as PR comments or check annotations. Each flag includes the spec section that's affected, the nature of the drift (contradictory, missing, or additive), and context for the reviewer.
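A sketch of rendering one finding as a markdown comment body — posting it would go through GitHub's REST API issue-comments endpoint; the layout and badge choices here are illustrative:

```python
def format_drift_comment(drift_type: str, spec_section: str, detail: str) -> str:
    """Render a single drift finding as a markdown comment for the PR."""
    badge = {"contradictory": "🔴", "missing": "🟡", "additive": "🔵"}.get(drift_type, "⚪")
    return (
        f"{badge} **{drift_type.title()} drift** in spec section *{spec_section}*\n\n"
        f"{detail}\n\n"
        "_Resolve by updating the code to match the spec, "
        "or the spec to match the code._"
    )
```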

Step 5: Resolve. The team decides: update the code to match the spec, or update the spec to match the code. Either is valid. The point isn't to prevent changes — it's to make changes visible and deliberate.

Why Code Review Doesn't Catch This

"We do code review" is the most common objection to drift detection. And code review IS valuable. But code review operates at the code level. Reviewers check: does this code work? Is it well-structured? Does it follow patterns? Are there bugs?

Reviewers almost never check: does this code match the spec? Because the spec is in a different system. Opening the Notion doc, finding the right section, and comparing it against a 400-line diff while also checking code quality is cognitive overload. So reviewers focus on the code — and the spec comparison gets skipped.

Code review answers "is this code good?" Drift detection answers "is this the right code?" They're different questions, and your workflow needs both.

The data backs this up. Only 34% of projects complete on time. The most commonly cited cause isn't bad code — it's misalignment between intent and implementation. Features that technically work but don't match what was specified. Edge cases that were discussed in the spec review but never implemented. Scope changes that were decided in Slack and never reflected in the spec.

Code review catches bugs. Drift detection catches misalignment. You need both.

What Drift Detection Is NOT

It's not a linter. Drift detection doesn't check code quality, formatting, or style. It checks whether the code implements what the spec describes.

It's not a blocking gate. Drift flags are informational, not blocking. The engineer might be right and the spec might need updating. The goal is visibility, not gatekeeping.

It's not a substitute for communication. Drift detection surfaces misalignment — it doesn't resolve it. The team still needs to decide whether to update the code or the spec. But at least now they know the misalignment exists.

It's not only for large teams. A solo developer drifts from their own specs. A two-person team drifts faster than a ten-person team because there's less review overhead to catch it. Drift detection is most valuable for small teams where one person is both the spec author and the implementer.

The Spec Must Be Machine-Readable

Here's the practical requirement that most teams miss: drift detection only works if the spec is structured enough for a machine to compare against.

A wall of prose in a Google Doc isn't machine-readable. "The notification system should be intuitive and user-friendly" can't be compared against a PR. But "POST /api/notifications creates a notification with { userId, type, message } and returns 201" — that's a testable assertion a drift detector can validate.

This is why living specs matter. A spec with structured acceptance criteria — testable assertions about API shapes, data models, error codes, and business rules — is a spec a drift detector can work with. A spec without them is a document that was interesting to write and useless to enforce.
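A structured criterion like the notification example above can be expressed as data and checked mechanically. A minimal sketch, with an invented criterion schema:

```python
# Illustrative machine-readable acceptance criterion.
criterion = {
    "method": "POST",
    "path": "/api/notifications",
    "request_fields": {"userId", "type", "message"},
    "expected_status": 201,
}

def check_contract(criterion: dict, observed_status: int,
                   observed_fields: set[str]) -> list[str]:
    """Return a list of drift findings; an empty list means the contract holds."""
    findings = []
    if observed_status != criterion["expected_status"]:
        findings.append(f"expected {criterion['expected_status']}, got {observed_status}")
    missing = criterion["request_fields"] - observed_fields
    if missing:
        findings.append(f"missing fields: {sorted(missing)}")
    return findings

check_contract(criterion, 200, {"userId", "type"})
# → ["expected 201, got 200", "missing fields: ['message']"]
```

The prose version ("intuitive and user-friendly") offers nothing a function like this could evaluate; the structured version does.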

By the numbers: 34% of projects delivered on time · 48-hour PRD half-life · 0 PM tools with drift detection.

The Post-Ship Feedback Loop

Drift detection doesn't stop at merge. The most powerful application is post-ship: comparing what was built against the customer feedback that motivated it.

The spec was created because customers reported a problem. The feature was built to solve that problem. After shipping, new feedback comes in. Does the new feedback indicate the problem was solved? Or are customers still reporting the same issue — meaning the implementation drifted from the intent, even if it matched the spec?

This closes the loop: customer feedback → spec → implementation → drift check → ship → new feedback → did it work? The spec becomes a traceable chain from customer problem to shipped solution, with drift detection at every link.

The Spec Layer Gets Enforcement

Every other layer of software development has automated validation. Code has tests. APIs have schema validation. Infrastructure has drift detection. Types have compile-time checks.

The spec layer — the place where "what to build and why" lives — has nothing. It's the last unvalidated layer in the stack. Drift detection changes that. It gives the spec the same enforcement mechanism that every other artifact has had for years.

The spec is infrastructure, not a deliverable. And infrastructure without validation is just documentation — which is to say, it's a suggestion that degrades over time. Drift detection makes the spec a contract that holds.

Drift detection for your specs.
Stonewall's GitHub App watches every PR and flags when code diverges from your spec — before merge, not after launch. Contradictions, omissions, and undocumented additions, surfaced automatically.
Join the waitlist at stonewall.dev
