I've been running multiple AI coding agents in parallel — five, six, sometimes eight workspaces at once, each tackling a different feature or fix on the same codebase. It's productive in bursts. You feel like you've hired a small team. Then you stop and look at what you've actually produced, and things get weird.
One agent added dynamic model discovery. Another agent, solving a different problem in a different workspace, also added dynamic model discovery — a slightly different version with a different class name. A third agent needed model listing as part of its feature, saw neither of the other two, and inlined its own implementation. I now had three versions of the same concept across three branches, none of which knew about the others.
This is what I'm calling agentic drift: the gradual, invisible divergence that happens when parallel autonomous agents work on related parts of a codebase without coordination. It's not a merge conflict in the git sense — your files might merge cleanly. It's a semantic conflict. The code compiles, the tests pass, but you've built the same thing three times and each version encodes slightly different assumptions about how it should work.
How it happens
The workflow that creates this is seductive because the beginning feels so good. You identify six things that need doing. You spin up six agents. Each gets a workspace — a clean branch, a focused task, full autonomy. You check in an hour later and each one has made real progress. Pull requests start appearing. You feel like a CTO.
The problem starts when the tasks aren't truly independent. And they almost never are. Software is a graph, not a list. Feature A needs a utility. Feature B needs a similar utility. Feature C refactors the module where that utility should live. None of these agents talk to each other. They each make locally reasonable decisions that are globally incoherent.
What you get looks like this:
- Duplicate implementations — the same concept built multiple ways, sometimes with the same name, sometimes not
- Architectural divergence — one branch simplifies a system another branch extends. Both are reasonable in isolation. Together they're contradictory
- Cross-pollination artifacts — an agent working on feature X notices a bug in module Y, fixes it as part of its branch. Another agent working on feature Z also fixes the same bug, differently. Now you have two fixes for the same bug in two unrelated PRs
- Phantom dependencies — you think a feature was built because you remember seeing it, but it was in a different workspace. The branch you're merging doesn't have it. Things break in ways that make no sense until you realize your mental model is a composite of six different realities
The longer you wait to integrate, the worse it gets. Each workspace drifts further from the others. The merge at the end isn't additive — it's archaeological. You're reconstructing intent from divergent timelines.
The integration tax
I just went through this on a side project: a Dart-based CLI/TUI coding agent I've been building as my own take on tools like Claude Code and opencode — picking the parts I like most and experimenting with some ideas of my own. After a stretch of parallel work using Conductor (which makes spinning up parallel agents dangerously easy), I had:
- 4 open PRs, two with merge conflicts
- 10+ feature branches without PRs, each with real work
- Uncommitted changes in a separate workspace on a branch that already had a PR
- 3 empty branches where work was never started
- Overlapping implementations of Ollama model discovery, skill loading, and session replay
- One PR that removed a caching system another PR depended on
Figuring out what to merge, in what order, and how to reconcile the contradictions took longer than building any individual feature. This is the integration tax. It's the cost you pay for the parallelism, and it's nonlinear — two parallel agents are maybe 1.5x the integration work; eight are closer to 5x.
The nasty part is that each individual PR looks fine. It has tests. It has a clear description. The code is clean. It's only when you lay them all out and trace the shared surfaces that you see the mess. Feature B assumes feature A was never built. Feature D removes something feature E extends. The model registry was refactored by one agent and kept intact by three others.
A prompting experiment: idealized diffing
Separately from the drift problem, I've been experimenting with a prompting technique for code improvement that I think might help with the integration step. The technique is simple:
Look at this code. Now imagine it was actually excellent — well-structured, handles edge cases elegantly, has clean data flow, clear abstractions. Describe that imaginary version in detail. Then compare it to what we actually have.
I'm calling this idealized diffing. Instead of asking "what's wrong with this code" (which tends to produce surface-level nitpicks) or "refactor this" (which tends to produce incremental changes), you ask the model to construct a complete mental image of the ideal version first, then use the gap between ideal and actual as a structured improvement plan.
The hypothesis: when you give the model a concrete codebase as reference, the "imagined better version" stays grounded. It can see the actual constraints — this is a TUI that needs to handle pasting, that's a session store with backward compatibility requirements. The idealized version respects those constraints while improving the architecture. Without a codebase as reference, the model hallucinates details or produces something generic.
Early results are promising. When I apply this to a module after merging conflicting branches, it tends to surface the right questions: "these two implementations serve the same purpose but encode different assumptions about X — here's how they should be unified." It's essentially using imagination as a form of code review, but one that produces a target state rather than a list of complaints.
The technique works as pre-work for refactoring. You don't execute the idealized version directly — it's a north star that helps you figure out what the merged code should look like before you start editing. Think of it as the architectural equivalent of writing tests before code: you define the desired shape before you start cutting.
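The two-stage flow is simple enough to sketch in code. Here's a minimal version, assuming a generic `call_model` function as a stand-in for whatever LLM client you use; the prompt wording is paraphrased from the technique above, not a tested incantation:

```python
# Sketch of idealized diffing as a two-stage prompt chain.
# `call_model` is a placeholder for your LLM client (any callable
# that takes a prompt string and returns the model's text).

IDEALIZE = """Look at this code. Now imagine it was actually excellent --
well-structured, handles edge cases elegantly, has clean data flow,
clear abstractions. Describe that imaginary version in detail.

{code}
"""

DIFF = """Here is the idealized description you produced:

{ideal}

Compare it to the actual code below. List the concrete gaps between
ideal and actual as a structured improvement plan.

{code}
"""

def idealized_diff(code: str, call_model) -> str:
    # Stage 1: construct the ideal version while grounded in the real code.
    ideal = call_model(IDEALIZE.format(code=code))
    # Stage 2: diff ideal against actual to get the improvement plan.
    return call_model(DIFF.format(ideal=ideal, code=code))
```

The point of keeping the stages separate is that the model commits to the ideal before it starts criticizing; collapsing both into one prompt tends to fall back into nitpick mode.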
Others are hitting this too
I'm not the only one running into this. The problem is emerging wherever people scale up parallel agent work:
- Clash is a CLI tool that detects merge conflicts between git worktrees before they become problems, using three-way merge simulation. It exists specifically because "agents work blind to each other's changes" and conflicts only surface after significant effort is wasted.
- The multi-agent coordination framework project documents a methodology proven on 5,100+ tests by coordinating Claude and GPT agents with zero shared memory across 100+ sessions. Their approach: protocols, handoff checklists, consistency gates, and structured memos instead of shared state.
- Ed Lyons at EQengineered writes about the same fear: "ugly conflicts due to agents all modifying the same files in different ways" plus an unmanageable review workload. His conclusion: restrict agents to compartmentalized, well-understood assignments.
- Google's 2025 DORA Report found that with AI adoption at 90%, teams saw 9% more bugs, 91% more code review time, and 154% larger PRs. The throughput is real but so is the integration cost.
There's also MCP Agent Mail, which gives agents identities, inboxes, and file reservation leases — essentially Gmail for coding agents, backed by Git and SQLite. Agents can claim exclusive locks on files before editing and send messages to coordinate. On paper it solves the coordination problem. In practice, it feels like ceremony — another system to set up, another protocol for agents to follow, another thing that can break. I haven't used it extensively enough to say it's not worth it, but my instinct says the overhead of teaching every agent to check its mail before writing code might eat the gains from the coordination it provides. Similar vibes to Beads — thoughtful design, but the setup cost might exceed the problem cost for most workflows.
The tooling is catching up. But right now, the coordination problem is mostly unsolved — the tools detect conflicts earlier or add coordination protocols, but don't prevent the semantic drift that causes them.
Mitigations I'm thinking about
Agentic drift probably can't be eliminated. Parallelism is too useful, and the cost of full coordination between agents would eat the productivity gains. But it can be managed:
Shorter integration cycles. The single biggest lever. Merge early, merge often. Don't let five branches run for a day — integrate every few hours. The integration tax compounds.
Shared context files. Give all agents a living document that describes the current architecture, recent decisions, and in-progress work. Something like an AGENTS.md or CLAUDE.md that every workspace reads. This doesn't prevent drift, but it reduces the blast radius.
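A minimal version of such a file might look like this; the sections and paths are illustrative, not a standard:

```markdown
# AGENTS.md — shared context for every workspace

## Architecture snapshot
- Model discovery lives in lib/models/discovery.dart (hypothetical path).
  Do not reimplement it; extend it.

## Recent decisions
- Session replay reads the event log, not snapshots.
- The caching layer in the provider module is being removed; don't depend on it.

## In progress (check before touching)
- feature/skill-loading — refactoring the skills module
- feature/session-replay — touches the session store
```

The "in progress" section is the part that matters most for drift: it's the only place an agent can learn that another reality exists.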
Early conflict detection. Tools like Clash can hook into your agent workflow and warn before a write happens that would conflict with another worktree. This doesn't solve drift, but it catches the mechanical conflicts early enough to redirect.
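Git can actually run this simulation natively. A minimal sketch, assuming Git 2.38 or newer for `git merge-tree --write-tree`, which merges two branches in memory without touching any working tree. The demo builds a throwaway repo where two "agent" branches rewrite the same line:

```shell
# Simulate a merge between two agent branches without checking anything out.
# Requires Git >= 2.38; merge-tree exits 0 on a clean merge, 1 on conflict.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.email agent@example.com
git config user.name agent
echo "v1" > util.txt
git add util.txt && git commit -qm base

git checkout -qb agent-a
echo "agent A's version" > util.txt
git commit -qam "agent A rewrite"

git checkout -q main
git checkout -qb agent-b
echo "agent B's version" > util.txt
git commit -qam "agent B rewrite"

# The actual simulation: no checkout, no dirty working tree.
if git merge-tree --write-tree agent-a agent-b > /dev/null; then
  result=clean
else
  result=conflict   # both agents rewrote the same line from the same base
fi
echo "$result"
```

A wrapper like this can run on every agent commit and flag the conflicting pair hours before anyone opens a PR.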
Trunk-based development with agents. Instead of long-lived feature branches, have agents work in short-lived branches that merge to main quickly. One feature per branch, one branch per hour. This conflicts with the "spin up six agents" workflow but it might be net positive.
Post-merge idealized diffing. After merging a batch of branches, run the idealization prompt on each module that was touched by multiple branches. Let the model identify where the merged code has contradictions or redundancies, then clean up deliberately.
Architectural boundaries. The less shared surface area between tasks, the less drift. If agent A works on the CLI entry point and agent B works on observability, they mostly won't step on each other. If they both touch app.dart — and they will, because god classes are drift magnets — you have a problem.
It's still worth it
I don't want to be too down on parallel agents. The throughput is real. Features that would take a week of focused solo work can ship in a day. The quality is often surprisingly good — each individual agent does careful, tested work. The problem is purely at the integration layer.
It's the same tradeoff that real engineering teams face, just compressed into hours instead of sprints. Brooks's Law says adding people to a late project makes it later. The agentic version might be: adding agents to a coupled codebase makes the merge harder. The agents are fast, but the merge is still manual, still requires understanding the full picture, and still falls on you.
The answer isn't fewer agents. It's better integration discipline, better shared context, and maybe — if the idealized diffing technique holds up — better tools for reasoning about what the combined output should look like before you start stitching it together.
The uncomfortable question: what if isolation is the problem?
There's a possibility I keep circling back to: maybe the entire worktree-per-agent model is wrong, and the answer is just... don't isolate them.
If all agents work in the same directory on the same branch, there's no merge step. Agent A writes a utility, agent B sees it immediately, agent C builds on it. No divergence, no phantom dependencies, no archaeological merge at the end. The drift problem disappears because there's only one reality.
I've done this too, and it works — sort of. The agents step on each other less than you'd expect. They can commit their own changes in logical chunks. There's no integration tax because there's nothing to integrate.
But you lose things. For compiled languages, you get half-built broken states while agents are mid-feature. If two agents touch the same screen or module, one of them is working against a moving target. You can't preview agent A's work without also seeing agent B's half-finished changes. And the commit history becomes a mess — interleaved changes from different features, hard to revert cleanly if one feature turns out wrong.
The worktree model gives you clean isolation and clean commits at the cost of drift. The shared model gives you coherence at the cost of messy intermediate states and tangled history. Neither is obviously better. It might depend on the language (interpreted vs compiled), the codebase size, and how much the tasks overlap.
I suspect the real answer is somewhere in between — maybe two or three agents sharing one workspace, with a fourth working in isolation on something truly independent. But I haven't found that sweet spot yet. If you have, I'd like to hear about it.
For now, I'm going back to merging eight branches that all modified the same file.
