
How Product Managers Keep Up With Engineers in the AI Era

Stonewall · 11 min read

Engineers got roughly 10x faster. Product managers got roughly 1.3x faster. That gap is the entire story of product management in 2026.

The 10x is real. GitHub's controlled study clocked Copilot users completing tasks 55% faster (P=.0017). DORA's 2025 report found 90% of developers using AI for a median of two hours a day. The Pragmatic Engineer's Feb 2026 survey put weekly AI usage at 95% of engineers, with 75% using it for at least half their work. Satya Nadella said in April 2025 that up to 30% of Microsoft code is now AI-written; Sundar Pichai followed in July with "well over 30%" for Google. METR's Time Horizon 1.1 report (Jan 29, 2026) found agent task length is now doubling every 130.8 days: Claude Opus 4.5 can autonomously complete tasks estimated at 320 minutes of human work.

The 1.3x is also real, but in a way that quietly undermines the role. Lenny and Noam Segal's AI Productivity Survey (Dec 2025, n=1,750) found 63% of PMs save four or more hours a week with AI. Where do those hours come from? PRDs (21.5%), mockups and prototypes (19.8%), and internal comms (18.5%). Almost none come from discovery, validation, GTM, user research, or strategy.

PMs sped up at typing. Engineers sped up at building. Output got cheap on both sides, but the engineer's "build" is now closer to "ship" while the PM's "type" still produces the same draft document. That asymmetry is why Andrew Ng's tweet from July 2025 ("we are more constrained by deciding what to build rather than the actual building") landed so hard.

This piece isn't about keeping up with AI itself — for that, see the sister post on the weekly operating system. This one is about the harder, more uncomfortable problem: how to keep up with the engineers on your team now that they have agents.

Why "keep up with engineers" got harder

Three things broke the old PM operating model at once.

The bottleneck moved upstream. Implementation got 3–5x faster. The constraint is now spec quality, not build speed. We've argued this before; Reforge's Moving To Higher Ground says the same thing more politely. The PMs who notice the shift are fine. The ones who don't are now the slow part.

The context war flipped. The old PM advantage was more context than the engineer: you read the customer tickets, you sat in sales calls, you built the doc nobody else built. In 2026, your engineers and their agents have semantic search over the entire codebase, every PR since the company started, every customer ticket via MCP servers, and a Claude Code session that already read all of it. The MCP server registry grew from 1,200 to 9,400+ servers in twelve months. The engineer who walks into your standup has more situational awareness than you do, and they got it in fifteen minutes.

The output moat collapsed. When PRD writing was hard, the PRD was the artifact. Now Cursor or Claude Code can draft a PRD in two minutes that passes most quality bars. Dennis Yang from Chime wrote bluntly in Lenny's newsletter that "Cursor is a much better product manager than I ever was." When the artifact is free, the differentiator is the thinking that goes into the artifact — and that has to show up somewhere your team can see it.

"PMs worry about becoming a 'Jira manager' while others drive AI. This isn't impostor syndrome — it's a wake-up call." — Reforge, Moving To Higher Ground (Aug 2025)

EY's October 2025 workforce survey found 54% of employees feel they're falling behind peers on agentic AI. The number for non-managers is 61%. The feeling is real. The fix is operational.

Three shifts that close the gap

The advice you've already read — be more curious, build trust with eng, learn the basics of how systems work — was good advice in 2018. It's not load-bearing in 2026. Three concrete shifts are.

Shift 1: Read the diff, not the standup notes

The single highest-leverage thing a PM can change tomorrow is where they get their daily product context.

The wrong answer: a written standup, a Notion update, or a Slack channel summary. By the time you read it, the engineer has already merged the PR, the agent has already picked up the next ticket, and the customer-facing behavior has already shifted.

The right answer: spend fifteen minutes a day reading the actual pull requests merged into your product. Not to review them as code — you're not the reviewer of record — but to understand what just changed. You're looking for four things:

  1. What user-visible behavior moved. Was a button renamed? A flow short-circuited? An error message changed?
  2. What's now possible that wasn't yesterday. New endpoints, new fields on entities, new background jobs — the surface area for product decisions just expanded.
  3. What got deleted. Removals are higher signal than additions. Anything an engineer ripped out is something you no longer have to spec around.
  4. What the agent struggled with. If a PR has six rounds of review or three force-pushes, that file is fragile. Spec your next feature accordingly.

This is what PostHog's "Engineer's Guide to Product Management" argued from the engineer's side: the PM who reads the diff has all of the context the engineer has, plus the customer context the engineer doesn't. That's the new moat. PostHog's framing is "the codebase is the source of truth" — your job is to make sure the truth in the codebase still matches the strategy in your head.

The tactic: every morning, before any meeting, scan yesterday's merged PRs. Not all of them — your repo's product-surface ones. Note three things you didn't know. If you can't note three things, the rest of your day is already misaligned.
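The morning scan can be made mechanical with a small filter. What follows is a minimal sketch, assuming yesterday's PR data has already been exported into plain records. The `MergedPR` fields, the `PRODUCT_PATHS` prefixes, and the sample data are all hypothetical, and the fragility thresholds simply mirror the "six rounds of review or three force-pushes" heuristic from the list above.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical path prefixes that count as "product surface" in this repo.
PRODUCT_PATHS = ("web/", "api/routes/")


@dataclass
class MergedPR:
    # Hypothetical record shape; fill it from whatever export you have.
    number: int
    title: str
    merged_on: date
    changed_files: list[str]
    review_rounds: int = 0
    force_pushes: int = 0


def touches_product_surface(pr: MergedPR) -> bool:
    """True if any changed file sits under a product-surface prefix."""
    return any(path.startswith(PRODUCT_PATHS) for path in pr.changed_files)


def looks_fragile(pr: MergedPR) -> bool:
    """Heuristic from the list above: many review rounds or force-pushes."""
    return pr.review_rounds >= 6 or pr.force_pushes >= 3


def morning_scan(prs: list[MergedPR], today: date) -> list[MergedPR]:
    """Return yesterday's merged PRs that touched the product surface."""
    yesterday = today - timedelta(days=1)
    return [
        pr for pr in prs
        if pr.merged_on == yesterday and touches_product_surface(pr)
    ]


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        MergedPR(101, "Rename export button", date(2026, 3, 9),
                 ["web/export.tsx"], review_rounds=6),
        MergedPR(102, "Bump CI image", date(2026, 3, 9), ["infra/ci.yml"]),
    ]
    for pr in morning_scan(sample, today=date(2026, 3, 10)):
        flag = " (fragile)" if looks_fragile(pr) else ""
        print(f"#{pr.number} {pr.title}{flag}")
```

The filter only narrows the reading list. Noting the three things you didn't know still takes a human read of each diff.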

Shift 2: Write specs agents can run, not PRDs humans rewrite

A PRD is for humans to debate. A spec is for agents and engineers to execute. They are no longer the same artifact, and pretending they are is why your engineers stopped reading them.

Thoughtworks coined the working definition in their 2025 piece unpacking spec-driven development: a spec is a precise, testable description of intent that an agent or engineer can execute against. We've covered the spec-driven development pattern and why your PRD stops working after 48 hours. The deeper problem is that PMs are still optimizing for the wrong reader.

Three rules for a spec that closes the velocity gap:

  • Acceptance criteria are evals, not bullet points. "User can sort the table" is a 2018 acceptance criterion. "When user clicks the column header, the API call to /entries returns sorted results, the URL state reflects sort, and the back button restores prior order — verified against three test cases including pagination" is an eval. The latter is what an agent can actually code against and what an engineer can actually challenge.
  • Constraints belong in the spec, not the meeting. "Don't break the existing webhook signature" is a meeting comment in 2025. In 2026 it goes in the spec — because the agent will not attend the meeting, and the engineer is now reviewing the agent's PR, not writing the code from scratch.
  • The spec lives next to the code. A PRD in Notion that nobody links to is dead the moment a model context window forgets it. Stonewall's whole product thesis is that specs belong in the same retrieval surface as the code; whatever tool you use, the spec needs to be discoverable from the codebase, not the other way around.
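To make "acceptance criteria are evals" concrete, here is a rough sketch of the table-sort criterion from the first rule written as three executable checks. Everything in it is an assumption made for illustration: `fetch_entries` is an in-memory stand-in for the `/entries` call, `TableView` is a toy model of URL state and browser history, and the sample rows are invented. A real eval would run against the actual endpoint and UI.

```python
from dataclasses import dataclass, field

# In-memory stand-in for the data behind GET /entries (illustrative only).
ENTRIES = [
    {"id": 3, "name": "cedar"},
    {"id": 1, "name": "ash"},
    {"id": 2, "name": "birch"},
    {"id": 4, "name": "alder"},
]


def fetch_entries(sort=None, page=1, page_size=2):
    """Hypothetical stand-in for GET /entries?sort=...&page=..."""
    rows = sorted(ENTRIES, key=lambda r: r[sort]) if sort else list(ENTRIES)
    start = (page - 1) * page_size
    return rows[start:start + page_size]


@dataclass
class TableView:
    """Toy model of the table's URL state and browser history."""
    url: str = "/entries"
    history: list = field(default_factory=list)

    def click_header(self, column):
        self.history.append(self.url)       # remember the prior state
        self.url = f"/entries?sort={column}"
        return fetch_entries(sort=column)

    def back(self):
        self.url = self.history.pop()       # restore the prior state


def eval_sorted_results():
    rows = TableView().click_header("name")
    assert [r["name"] for r in rows] == ["alder", "ash"]


def eval_url_and_back_button():
    view = TableView()
    view.click_header("name")
    assert view.url == "/entries?sort=name"
    view.back()
    assert view.url == "/entries"


def eval_sort_spans_pagination():
    page_two = fetch_entries(sort="name", page=2)
    assert [r["name"] for r in page_two] == ["birch", "cedar"]


EVALS = [eval_sorted_results, eval_url_and_back_button, eval_sort_spans_pagination]


def run_evals():
    """Run every eval and return the names of the ones that fail."""
    failed = []
    for check in EVALS:
        try:
            check()
        except AssertionError:
            failed.append(check.__name__)
    return failed
```

Each check maps to one clause of the criterion, so a failing name tells the agent or the engineer which part of the intent was missed.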

If your spec can't be handed to an agent and produce a working draft of the feature, the engineer will rewrite the spec on the way to building it — and that rewrite is now the actual spec. You just lost authorship of your own product. We've written the engineer's-eye view of this; the PM-side discipline is to skip the round trip.

Shift 3: Review agent PRs like an eng manager, not a PM

The final shift is the most counterintuitive. As agent-generated PRs become a larger share of your team's throughput, the PM's role bends toward something engineering managers used to do alone: reviewing the diff against the intent.

Not for syntax. Not for tests. The engineer of record handles those. Your read is different. Thoughtbot's guide to reviewing AI-generated PRs flagged the specific failure mode: agents skip clarifying questions, so they hallucinate decisions that should have been escalations. That's exactly where the PM should be reading.

Four things to ask of every agent PR you scan:

  1. Did it answer a question that should have come back to me? Most product damage from agents isn't bad code; it's silently resolved ambiguity. If the agent picked between two reasonable behaviors, you needed to make that choice.
  2. Does the change reflect the spec, or does it reflect the easy interpretation of the spec? Specs that say "the user should be notified" get implemented as a toast in 70% of cases and as an email in 30% — even when the right answer is in-app + email. Read for it.
  3. Is the surface area larger than the spec implied? Agents over-build. They add fields "for completeness." Every speculative field is a future refactor and a future product debate. Ship intent, not optionality.
  4. What did it not ship? Look for the missing edge case. Empty states, error paths, the loading state on a 3G connection: these are the ones the agent quietly dropped because the spec didn't insist.
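Questions 3 and 4 reduce to two set differences once you have written down what the spec asked for and what the PR shipped. The sketch below assumes you have collected those lists by hand while reading the diff. The function name, the field names, and the state names are hypothetical.

```python
def surface_report(spec_fields, shipped_fields, required_states, shipped_states):
    """Compare what the spec asked for with what the PR actually shipped."""
    return {
        # Shipped but never specified: candidates for "added for completeness".
        "speculative_fields": sorted(set(shipped_fields) - set(spec_fields)),
        # Specified but absent from the diff: the quietly dropped edge cases.
        "missing_states": sorted(set(required_states) - set(shipped_states)),
    }


if __name__ == "__main__":
    report = surface_report(
        spec_fields={"email", "channel"},
        shipped_fields={"email", "channel", "priority", "locale"},
        required_states={"empty", "error", "loading"},
        shipped_states={"loading"},
    )
    print(report)
```

An empty report does not clear the PR. It only means the disagreement, if there is one, lives in behavior rather than in surface area.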

This isn't engineering work. It's product work that happens to live in the diff. PostHog's posture, Anthropic's PM-on-the-AI-exponential post, and the way Cat Wu describes Claude Code's own product team in Lenny's interview all point to the same conclusion: PMs at AI-native companies live in the same surface as the engineers, including the PR.

If your team's agent PR review process doesn't include a PM read, you are auto-shipping product decisions you didn't make. We'd argue that's how LinkedIn ended up replacing PMs with full-stack builders — once the PM stopped showing up in the diff, the engineer just took the role.

How to know it's working

The whole point of an operating model is that it's falsifiable. Three measurements:

  • Spec-to-merged-PR cycle time. If specs were the actual bottleneck, this number drops by 30–50% within a quarter once the spec quality lifts. If it doesn't, the spec isn't the constraint and you should look upstream — usually at discovery cadence, not spec template.
  • Engineer-initiated rework rate. Count the percentage of PRs where the engineer or agent had to revisit the original spec mid-implementation. Anything over 25% means the specs are fiction. Anything under 10% means specs are doing their job.
  • PM context lag. Pick three product surface areas. Once a week, ask yourself: do I know what shipped there in the last seven days? If you can't name a change in two of three, you've drifted out of the diff and back into the standup.
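The three measurements are simple enough to compute from a spreadsheet export. Below is a minimal sketch using the thresholds stated above (25% and 10% for rework, two of three surfaces for context lag). The function names and input shapes are assumptions, as is the reading of "two of three" as "you failed to name a change on at least two surfaces".

```python
from datetime import date
from statistics import median


def median_cycle_days(spec_and_merge_dates):
    """Median days from spec-ready date to merge date, over (ready, merged) pairs."""
    return median((merged - ready).days for ready, merged in spec_and_merge_dates)


def rework_rate(reworked_prs, total_prs):
    """Share of PRs where the spec had to be revisited mid-implementation."""
    if total_prs == 0:
        raise ValueError("no PRs to measure")
    return reworked_prs / total_prs


def rework_verdict(rate):
    """Apply the 25% and 10% thresholds from the text."""
    if rate > 0.25:
        return "specs are fiction"
    if rate < 0.10:
        return "specs are doing their job"
    return "in between"


def has_drifted(named_a_change):
    """named_a_change maps each surface area to whether you could name a change.

    Assumption: drifting means missing on at least two of the surfaces.
    """
    misses = sum(1 for known in named_a_change.values() if not known)
    return misses >= 2


if __name__ == "__main__":
    # Made-up dates and counts, for illustration only.
    pairs = [
        (date(2026, 1, 1), date(2026, 1, 4)),
        (date(2026, 1, 2), date(2026, 1, 10)),
        (date(2026, 1, 5), date(2026, 1, 10)),
    ]
    print(median_cycle_days(pairs))
    print(rework_verdict(rework_rate(3, 10)))
    print(has_drifted({"billing": True, "onboarding": False, "search": False}))
```

Tracking the median rather than the mean keeps one stalled spec from hiding an otherwise real improvement in cycle time.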

The targets are not aspirational. The PMs at companies hiring more PMs in 2026 — and per Lenny's state-of-the-job-market report, there are over 7,300 open PM roles globally, 75% above the 2023 low — are hitting them. The PMs at the companies cutting PM headcount are not.

The uncomfortable conclusion

The engineering keep-up problem is not solved by learning to vibe-code. Vibe coding is a useful party trick; it doesn't fix the asymmetry. Cursor will not make you a better product manager — it will let you ship one PRD an hour, which is exactly the wrong solution to a problem caused by typing being too cheap.

The fix is operating closer to the code. Read the diff. Write specs that an agent can run. Review the PRs the agents send back. The PMs who do this look more like staff engineers than the PMs of 2018, and that's the point. The role didn't get easier — it got more technical, more specific, and more leveraged.

Claire Vo, on stage at Lenny's Summit in 2024, put it as bluntly as anyone has: "You're going to be in a world of pain if you're not prepared for a shift in the next 18 months." It's been eighteen months. The shift happened. The PMs who closed the gap are still here. The bottleneck is no longer the bottleneck — for them.

For everyone else, the engineers are still shipping. They just stopped waiting.
