D2 intermediate ~45 min

One Repo, Many Hands: Worktrees

Absorbs: the lab whose members don't overwrite each other

Advances D2

The Pain

You had two clean ideas and one afternoon, so you ran them at once. One agent was refining the demand-elasticity prep; the other was building the trip-duration robustness checks. Same repository, because it was faster, and what could collide — they were working on different parts of the analysis. For twenty minutes nothing did.

Then both reached for src/build_panel.py, because both needed the panel and neither knew the other existed. The first agent rewrote the zone join; the second, a beat later, rewrote the same function for duration weighting and saved over it. Git, asked to hold two incompatible edits to one file from two writers who never spoke, did the only thing it could: it thrashed. You faced a merge conflict you did not author, in a file neither agent had finished, while a third write (a results file from the first run) landed on top of the second run’s half-written output. You spent the rest of the afternoon not doing either analysis but disentangling them, reading diffs to reconstruct which agent meant what, and you got it wrong once and had to do it twice.

A real lab does not seat two researchers at one desk and one keyboard. Each gets their own workspace, their own copy of the shared materials, and the work is combined deliberately, by someone whose job is to combine it — not by collision. The parallelism was never the problem. The shared desk was.

Why / When

The moment two agents work the same repository at once, they trample each other: half-written transforms collide, results overwrite, git thrashes on edits no single author made. The mechanism a lab uses to prevent this is the same one a version-control system already offers — a separate working directory per worker, backed by the same shared history, combined through ordinary review. Each agent gets its own checkout; nobody writes where anybody else is writing; merges happen on purpose, by a human acting as referee.

This is the load-bearing mechanic for everything at scale that follows: the overnight runs in D3 each want their own tree so a 3 a.m. failure is a deletable directory, and the fleets in D4 want isolation per code-touching variant. It earns its own lesson because getting it wrong is not a style problem — it is the lost afternoon in the Pain vignette, and it scales with the number of hands. The discipline accelerates nothing on its own; what it does is make parallelism safe, which is the only thing that makes parallelism worth doing.

Contrary winds

Not for: agents that only ever write results — never code — under a shared contract: a manifest with one-file-per-run rules lets them share one tree safely, and a worktree each is then just ceremony (the D4 pattern).

Mechanics

Field note

There is nothing language-specific here: worktrees are a git mechanic, and the agents inside them may write Python or R without changing a word of this page. That is why it declares no R variants.

The mechanic

A git worktree is a second working directory attached to the same repository: one shared object store and history, but each worktree has its own checked-out files, its own index, and its own branch (by default git refuses to check out one branch in two worktrees at once). The isolation is by directory, not by sandbox: an agent (or session) rooted in one worktree has no reason to write into another worktree’s files, because they live in a different directory. You create a worktree with a single command:

# from the main checkout, give each workstream its own tree + branch
git worktree add ../weather-mobility-w1 -b w1-elasticity
git worktree add ../weather-mobility-w2 -b w2-duration

git worktree list        # the main tree plus the two new desks
# … work happens in each independently; combine through review:
git switch main && git merge w1-elasticity   # deliberate, reviewed

This beats the obvious alternative, copying the whole folder twice, on every axis that matters: the worktrees share history, so a commit in one is visible to all and there is no re-syncing; they are cheap, sharing the object store rather than duplicating it; and they force a disciplined merge, because combining work means a real git merge a human reviews, not a file-copy nobody audited. The one case where you do not need them is the “Not for” case above: agents that only write results under a contract never touch shared code, so they can share one tree. There the manifest, not the worktree, is doing the isolation (D4).
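If you want to confirm that the desks really are separate, git worktree list --porcelain prints one record per worktree in a machine-readable form. The sketch below, written in Python because that is what the project's own code uses, parses that text and maps each directory to its branch. The sample output, its paths, and its hashes are invented for illustration, and the helper names are hypothetical rather than part of git or of this course's tooling.

```python
from typing import Dict, List


def parse_worktree_porcelain(text: str) -> List[Dict[str, str]]:
    """Parse `git worktree list --porcelain` output into one dict per worktree."""
    worktrees: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes one worktree record.
            if current:
                worktrees.append(current)
                current = {}
            continue
        key, _, value = line.partition(" ")
        current[key] = value  # flag lines such as "detached" get an empty value
    if current:
        worktrees.append(current)
    return worktrees


def branches_by_directory(text: str) -> Dict[str, str]:
    """Map each worktree directory to its short branch name ('' if none)."""
    prefix = "refs/heads/"
    result: Dict[str, str] = {}
    for wt in parse_worktree_porcelain(text):
        ref = wt.get("branch", "")
        result[wt["worktree"]] = ref[len(prefix):] if ref.startswith(prefix) else ref
    return result


# Illustrative output only: these paths and hashes are made up.
SAMPLE = """\
worktree /home/you/weather-mobility
HEAD 1111111111111111111111111111111111111111
branch refs/heads/main

worktree /home/you/weather-mobility-w1
HEAD 2222222222222222222222222222222222222222
branch refs/heads/w1-elasticity

worktree /home/you/weather-mobility-w2
HEAD 3333333333333333333333333333333333333333
branch refs/heads/w2-duration
"""

desks = branches_by_directory(SAMPLE)
for directory, branch in desks.items():
    print(f"{branch:15s} {directory}")
```

Three directories on three distinct branches is the shape you are looking for: the main tree plus one desk per workstream.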

System Player film — Worktree Collision Counterfactual

Step 1 of 7.

You want two agents working at once — A on the cleaning transforms, B on the figures. The obvious move is to point both at the same checkout and let them go. One repo, one working tree, two sets of hands.

The two tools reach the same isolation by different routes — one through local primitives you compose, one through a managed multitasking model — so this is a dual treatment, not a tab. Neither hides; the contrast is instructive.

Claude Code

A session per worktree, and agents spawned into one

The recommended pattern is one claude session per worktree: open the elasticity tree in one, the duration tree in another, and they cannot collide because each is rooted in a different directory. The desktop app runs these as parallel sessions across worktrees side by side.

Beyond hand-driven sessions, a subagent or workflow can be spawned directly into a fresh worktree — the orchestrator creates the tree, runs the agent there, and auto-cleans the worktree if the agent left it unchanged, so a survey that produced nothing leaves no litter. This is the primitive D4’s fleet stands on: each code-touching variant gets its own worktree, created and disposed of by the workflow, isolated from the others by construction. The composition — worktree, plus agent, plus auto-cleanup — is something you assemble from pieces, which is the Claude Code shape throughout.

Codex

Parallel threads natively, and cloud tasks as the managed worktree

The desktop app’s core model is multitasking: it runs parallel threads, each backed by its own worktree natively — starting a second thread on a second workstream gives it an isolated checkout without your asking, because that is the app’s central abstraction rather than a pattern you assemble. CLI users get the same isolation the ordinary way: a plain git worktree per thread.

The managed analogue goes one step further. A cloud task runs in an isolated cloud environment — its own container, its own branch — and returns a diff when it finishes. That is a worktree you never have to create, clean, or even keep on your laptop: the isolation is the service’s, and what comes back is reviewable exactly like a pull request. The trade is that the isolation is real but opaque — you review the returned diff, you do not watch the desk — which is the managed-delegation shape throughout.

Translation guide
Intent	Claude Code	Codex
two workstreams at once, locally	one session per worktree (desktop: parallel sessions across worktrees)	parallel threads, each its own worktree natively (CLI: git worktree per thread)
an agent isolated for a code-touching task	subagent/workflow spawned into a fresh worktree, auto-cleaned if unchanged	a cloud task — isolated container + branch, returns a reviewable diff
combining the work	deliberate git merge, human as referee	review the returned diff like a PR, then merge

Worktree discipline for analysis projects

The mechanic is cheap; the discipline is what keeps it honest, and it is the same in either tool:

Branch by workstream. w1-elasticity, w2-duration — the branch name says which analysis it carries, so the merge referee knows what they are combining before they read a line.
Know what merges and what never does. Code and specs merge — they are the shared methodology. Scratch outputs do not: a results file is owned by a contract (D4), not reconciled by a git merge. Merging two agents’ results/ is how you get a file that is neither run, and it is the second collision in the Pain vignette.
Keep results/ out of worktree merges. The D4 results contract governs result files; the git merge governs code. Conflating them re-creates exactly the overwrite you used worktrees to prevent.
The human is the merge referee. Parallelism is safe only because someone deliberately decides what combines. The tool isolates; you reconcile. Never let a merge happen by collision instead of by decision.
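The rule that code and specs merge while results/ does not can be checked mechanically before the referee merges. This is a minimal sketch, assuming you feed it the file list printed by git diff --name-only main...w1-elasticity (the files the branch changed since it left main). It separates paths under a top-level results/ directory from everything else. The function name and the example file names other than src/build_panel.py are hypothetical, and treating every top-level results/ path as contract-owned is an assumption made for illustration; the contract itself is defined in D4, not here.

```python
from pathlib import PurePosixPath
from typing import Iterable, List, Tuple


def split_merge_paths(
    changed: Iterable[str], contract_dir: str = "results"
) -> Tuple[List[str], List[str]]:
    """Split changed paths into (mergeable, contract_owned).

    A path is contract-owned when its first component is `contract_dir`.
    Git prints repository-relative paths with forward slashes, so
    PurePosixPath is used regardless of the host platform.
    """
    mergeable: List[str] = []
    contract_owned: List[str] = []
    for raw in changed:
        path = raw.strip()
        if not path:
            continue  # skip blank lines from the command output
        parts = PurePosixPath(path).parts
        if parts and parts[0] == contract_dir:
            contract_owned.append(path)
        else:
            mergeable.append(path)
    return mergeable, contract_owned


# Hypothetical diff listing for one workstream branch.
changed = [
    "src/build_panel.py",
    "specs/elasticity.md",
    "results/w1_run.csv",
]

to_merge, owned_by_contract = split_merge_paths(changed)
print("merge through review:", to_merge)
print("leave to the contract:", owned_by_contract)
```

If the second list is not empty, the branch is carrying outputs into the merge, and the referee can stop there before anything is reconciled.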

Guided Run — One Repo, Many Hands: a desk per workstream

Field Terminal — session: d2-worktrees Claude Code

git worktree add ../weather-mobility-w1 -b w1-elasticity

Guided Run — One Repo, Many Hands: a thread per workstream

Field Terminal — session: d2-worktrees Claude Code

git worktree add ../weather-mobility-w1 -b w1-elasticity

Field Assignment

Artifact: make check-d2 passes (both branches merged, history linear per workstream, zero collisions)

Run the project’s two workstreams concurrently — and prove they never touched each other.

Give each workstream its own worktree and branch: w1-elasticity for the demand-elasticity prep, w2-duration for the trip-duration robustness prep.
Run both at the same time — per your tool below — one agent refining the elasticity prep, one building the duration prep, each rooted in its own tree.
Merge both branches back to main deliberately, as the referee: code and specs merge; no results/ file is reconciled by the merge.
Demonstrate zero collisions: each workstream’s history is linear, and no file was overwritten across trees.
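Zero collisions means no file was overwritten across trees. Two branches may still legitimately edit the same file, as both agents did with src/build_panel.py in the Pain vignette, and those files are where the referee has to make a decision at merge time. A small sketch for listing them, assuming each input is the output of git diff --name-only main...BRANCH for one workstream. The function name and the example file names other than src/build_panel.py are hypothetical.

```python
from typing import Iterable, List


def files_touched_by_both(
    w1_changed: Iterable[str], w2_changed: Iterable[str]
) -> List[str]:
    """Return the sorted paths that appear in both branches' changed-file lists."""
    first = {p.strip() for p in w1_changed if p.strip()}
    second = {p.strip() for p in w2_changed if p.strip()}
    return sorted(first & second)


# Hypothetical changed-file lists for the two workstream branches.
w1 = ["src/build_panel.py", "src/elasticity_prep.py"]
w2 = ["src/build_panel.py", "src/duration_checks.py"]

for path in files_touched_by_both(w1, w2):
    print("needs a referee decision:", path)
```

An empty result means the two merges cannot conflict with each other; a non-empty one tells you which diffs to read side by side before the second merge.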

Claude Code

Open one claude session per worktree (or spawn a worktree-isolated agent per workstream). Let them run simultaneously; confirm neither session can see the other’s working files. Merge w1-elasticity then w2-duration into main, reviewing each diff as the referee.

Codex

Run the two workstreams as parallel threads (each its own worktree), or hand one to a cloud task and review its returned diff. Confirm the threads never share a working directory. Merge both branches into main, reviewing each diff — the cloud task’s exactly like a PR from a new student.

make check-d2 verifies both branches merged cleanly and that each workstream’s history is linear — no cross-tree overwrite, no merge you did not author. This is the mechanic D3 runs its overnight jobs inside and D4 fans its fleet across.
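The lesson does not show what make check-d2 runs, so the following is not its implementation. It is one way to check by hand one reading of “linear”: a workstream branch that contains no merge commits of its own. The sketch assumes you pass it the output of git rev-list --parents main..w1-elasticity, captured before you merge the branch; each line is a commit hash followed by its parent hashes, and a commit with two or more parents is a merge. The function names and the short hashes in the sample are invented for illustration.

```python
from typing import List


def merge_commits(rev_list_parents: str) -> List[str]:
    """Return the hashes of commits that have more than one parent."""
    merges: List[str] = []
    for line in rev_list_parents.splitlines():
        fields = line.split()
        # fields[0] is the commit; the rest are its parents.
        if len(fields) > 2:
            merges.append(fields[0])
    return merges


def is_linear(rev_list_parents: str) -> bool:
    """True when no commit in the listing is a merge commit."""
    return not merge_commits(rev_list_parents)


# Made-up listings: a straight line of commits, then one with a merge added.
STRAIGHT = """\
c3c3c3c b2b2b2b
b2b2b2b a1a1a1a
"""
WITH_MERGE = STRAIGHT + "d4d4d4d c3c3c3c e5e5e5e\n"

print("straight branch linear:", is_linear(STRAIGHT))
print("merge commits found:", merge_commits(WITH_MERGE))
```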

Milestone gate · make check-d2 · advances D2

Each workstream has its own worktree and branch — w1-elasticity and w2-duration
Branch by workstream so the merge referee knows what they're combining.
Both workstreams ran concurrently, each rooted in its own working directory
Both branches merged cleanly into main; history is linear per workstream
Zero collisions: no file was overwritten across trees, and no merge happened that you did not author
No results/ file was reconciled by a git merge — outputs are owned by the D4 contract, not by the merge
Merge code and specs; let the contract own the outputs.

Check each item only once it is true of YOUR repo — the gate is self-certified, like the rest of your methodology.

Pitfalls & Gotchas

[both]

“Two agents in one tree, just this once.” The collision does not happen the afternoon you decide it is fine — it happens the afternoon it matters, on the file you cared about, and you spend the day reconstructing which writer meant what. For a result you intend to publish, an unaudited overwrite is not an inconvenience; it is a number you can no longer explain. The worktree is one command and it is the difference.
[both]

Merging scratch outputs. Results belong to contracts (D4), not to git merges: reconciling two agents’ results/ produces a file that is neither run. Merge code and specs; let the contract own the outputs.
[CC]

Worktree-spawned agents that mutate global state — installed packages, shared caches, a global config — escape the isolation the worktree gave them, because that state lives outside any tree. Keep environments per-worktree (B2’s pinned lockfile, restored inside each tree) so the isolation is real and not just file-deep.
[CX]

Cloud-task isolation is real but opaque: you do not watch the work, you receive a diff. Review that diff like a pull request from a new student — line by line, asking what it touched and why — not like a trusted teammate’s. Opaque isolation only protects you if you read what comes back.

Check Your Bearings

D2 · 4 questions · unlimited retries, no timer

This check opens when the guided simulation above is complete — the questions assume you have seen the run.

Field journal

Record the two workstreams you ran in parallel, the branch each lived on, and the moment you merged — and confirm no results/ file was ever reconciled by a git merge.

as of June 2026

This is a Tier-2 split: both tools deliver true per-worker isolation, but by different primitives. Claude Code composes it from local pieces — a session or a spawned agent per git worktree, with auto-cleanup of unchanged trees — the assemble-it-yourself shape. Codex makes parallel worktree-backed threads the desktop app’s native model and offers cloud tasks as a managed worktree you never create, clean, or hold locally, returning a reviewable diff — the managed-delegation shape. The underlying git worktree is identical and available to both via the CLI; the asymmetry is in how much the tool manages for you, and it is the same design philosophy that runs through D3 and D4.

The Lab Roster

Engraved positions, not portraits. A seat fills itself when its lesson is complete.

Your position

Positions

the data manager

Position vacant — engaged at C2

write-time contract hooks (PreToolUse/PostToolUse + the validation suite)

est. human-RA: permanent vigilance, est. 2 weeks/year of load-checking and release-note reading
agent: half a day to install and test the 9-line block; ~20 s per run thereafter

the methodologist

Position vacant — engaged at C1

the researcher skill library v1 (/clean-trips, /paper-summary, /demanding-adviser) — codified methodology, not macros

est. human-RA: the judgment lives in one head; transferring it to a new RA costs weeks of shadowing, and leaves when they do
agent: an afternoon to author three SKILL.md files in both dialects; zero cost per session until invoked

the data engineer

Position vacant — engaged at C3

MCP connections + the DuckDB warehouse, enrichment joins (weather/events/holidays), and the zone-hour analysis panel

est. human-RA: days of bespoke glue per source (credentials, retries, schema spelunking, timezone forensics), re-debugged every time a source changes
agent: register the server once; the agent explores INFORMATION_SCHEMA and builds the panel in a guided session, raw cached for replication

the RA pool

Position vacant — engaged at D1

parallel subagents with report contracts (EDA + scholarship fleets) + the isolated adviser

est. human-RA: a week of breadth EDA across boroughs and slices, plus a literature pass, and no honest outside critic you can summon at will
agent: ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes

the overnight RA

Position vacant — engaged at D3

/loop supervision + Goal Mode runs over background estimation

est. human-RA: one night shift per estimation batch, and the course runs several batches
agent: ~10 min to write the check or the objective; the night itself belongs to the machine

the adviser

Position vacant — engaged at D1

parallel subagents with report contracts (EDA + scholarship fleets) + the isolated adviser

est. human-RA: a week of breadth EDA across boroughs and slices, plus a literature pass, and no honest outside critic you can summon at will
agent: ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes

the referee

Position vacant — engaged at D4

contracted fleet fan-out (results contract + provenance) and an isolated adversarial referee

est. human-RA: the curve is ~2 days of serialized edit-and-fit; the suspicious read of the robustness table is the rarer, senior hour nobody has time for
agent: 13 lanes fanned out under the cap finish in an afternoon; the referee files its evidenced finding in one isolated pass

the lab manager

Position vacant — engaged at E2

scheduled/cloud agents — the monthly-ingest routine, stopping at a human-approved PR

est. human-RA: a recurring monthly chore nobody owns (check the CDN, pull, contract, append, re-estimate), reliably skipped
agent: ~30 min to define the routine + guardrails once; each month runs unattended and stops at the approval gate

the reproducibility checker

Position vacant — engaged at E1

headless invocation + the fresh-clone replication self-test + CI gates

est. human-RA: a clean-room rebuild every few weeks: dull, exacting, and the first thing dropped at submission
agent: ~20 min to wire scripts/replicate.sh and the gate workflow; the verdict returns in one headless run thereafter

the wall: the unstaffed midnight hours between a raw file and a first plot

Position vacant — engaged at A1

the bare agent loop (prompt → act → observe → fix), zero configuration

est. human-RA: an evening or two per messy file: defensive parsing rewritten from scratch each project, rules forgotten by the time they work
agent: ~10 minutes for the quick win, plus the same task re-run in the other language for free

you, working an order of magnitude faster, but only if you direct the work

Position vacant — engaged at A2

the command surface + five prompting patterns + context hygiene

est. human-RA: the slow tax of an undriven session: drifted answers on long investigations, re-runs to find where it went wrong
agent: ~30 min to learn; thereafter a first-look on one month (3.5M rows) in minutes, with receipts

the lab manual nobody writes: the institutional knowledge that lives in your head

Position vacant — engaged at B1

instruction files (CLAUDE.md / AGENTS.md) + auto-memory + the A/B demonstration

est. human-RA: ~30 min re-onboarding every new RA, every time, plus the afternoons lost to landmines no one wrote down
agent: written once in an hour; reloaded free at the start of every session thereafter

the careful senior who plans before touching data

Position vacant — engaged at B2

repo scaffold + pinned environments + read-only Plan mode reconnaissance

est. human-RA: ~1 week at project start (setup, download babysitting, plan review) + the joins redone when structure rots
agent: an afternoon, most of it download wall-clock, not attention

the lab whose members don't overwrite each other

Position vacant — engaged at D2

git worktrees — one isolated checkout per agent/session/thread, combined through a deliberate merge

est. human-RA: the lost afternoon disentangling two agents' colliding edits, and the redo when you reconstruct it wrong the first time
agent: two commands to create the worktrees; the parallelism runs free; one reviewed merge at the end

the onboarding the lab never has to repeat

Position vacant — engaged at E3

lab-kit — the whole methodology packaged as a one-command install

est. human-RA: six weeks of per-member onboarding, rediscovered from scratch every time the lab turns over
agent: ~half a day to package and smoke-test the kit once; each new member is one install and one prompt

the whole lab, orchestrated: the PI who designs the system instead of doing the work

Position vacant — engaged at F1

the research loop (/loop ↔ Goal Mode / @codex) orchestrating fleet → referee → headless re-run → regenerated report, under report-don't-act guardrails, a hard budget cap, and a human gate on substantive decisions only

est. human-RA: each revision is a serialized chain (re-spec, re-estimate, re-table, rewrite the paragraph, re-read the abstract), correct only as of the last manual pass, on a Sunday; a real reviewer round is days of hand-carried edits
agent: the loop runs two iterations to convergence in one supervised sitting; the human stands at exactly one gate (approve dropping the post-treatment control) while the mechanical fixes proceed unattended

Running Totals

Lesson	Role	Est. human-RA	Agent (yours when measured)
A1	the wall — the unstaffed midnight hours between a raw file and a first plot	an evening or two per messy file — defensive parsing rewritten from scratch each project, rules forgotten by the time they work	~10 minutes for the quick win, plus the same task re-run in the other language for free
A2	you, working an order of magnitude faster — but only if you direct the work	the slow tax of an undriven session — drifted answers on long investigations, re-runs to find where it went wrong	~30 min to learn; thereafter a first-look on one month (3.5M rows) in minutes, with receipts
B1	the lab manual nobody writes — the institutional knowledge that lives in your head	~30 min re-onboarding every new RA, every time — plus the afternoons lost to landmines no one wrote down	written once in an hour; reloaded free at the start of every session thereafter
B2	careful senior who plans before touching data	~1 week at project start (setup, download babysitting, plan review) + the joins redone when structure rots	an afternoon — most of it download wall-clock, not attention
B3	the data manager who guards the raw files — the person who says no near the master copies	permanent vigilance you cannot staff — one lapse at machine speed costs a month of re-downloads	two profiles configured once in minutes; the fence then holds every session, tired or not
C1	the methodologist — the one person who knows how the lab actually decides	the judgment lives in one head; transferring it to a new RA costs weeks of shadowing, and leaves when they do	an afternoon to author three SKILL.md files in both dialects; zero cost per session until invoked
C2	data manager / QA who never sleeps	permanent vigilance — est. 2 weeks/year of load-checking and release-note reading	half a day to install and test the 9-line block; ~20 s per run thereafter
C3	the data engineer who wires the lab to its systems	days of bespoke glue per source — credentials, retries, schema spelunking, timezone forensics — re-debugged every time a source changes	register the server once; the agent explores INFORMATION_SCHEMA and builds the panel in a guided session, raw cached for replication
D1	the RA pool — and the adviser who critiques from outside	a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will	~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes
D2	the lab whose members don't overwrite each other	the lost afternoon disentangling two agents' colliding edits — and the redo when you reconstruct it wrong the first time	two commands to create the worktrees; the parallelism runs free; one reviewed merge at the end
D3	overnight RA	one night shift per estimation batch — and the course runs several batches	~10 min to write the check or the objective; the night itself belongs to the machine
D4	an RA bench and the PI who keeps their results comparable	the curve is ~2 days of serialized edit-and-fit; the suspicious read of the robustness table is the rarer, senior hour nobody has time for	13 lanes fanned out under the cap finish in an afternoon; the referee files its evidenced finding in one isolated pass
E1	reproducibility checker	a clean-room rebuild every few weeks — dull, exacting, and the first thing dropped at submission	~20 min to wire scripts/replicate.sh and the gate workflow; the verdict returns in one headless run thereafter
E2	lab manager's standing chores	a recurring monthly chore nobody owns — check the CDN, pull, contract, append, re-estimate — reliably skipped	~30 min to define the routine + guardrails once; each month runs unattended and stops at the approval gate
E3	the onboarding the lab never has to repeat	six weeks of per-member onboarding, rediscovered from scratch every time the lab turns over	~half a day to package and smoke-test the kit once; each new member is one install and one prompt
F1	the whole lab, orchestrated — the PI who designs the system instead of doing the work	each revision is a serialized chain — re-spec, re-estimate, re-table, rewrite the paragraph, re-read the abstract — correct only as of the last manual pass, on a Sunday; a real reviewer round is days of hand-carried edits	the loop runs two iterations to convergence in one supervised sitting; the human stands at exactly one gate (approve dropping the post-treatment control) while the mechanical fixes proceed unattended
Positions absorbed		0 of 16

The honest column: every place a human had to step in lives in the Field Journal’s failure log. Your measured hours there override these estimates here.
