D1 intermediate ~75 min

Research Assistants: Notebooks & Subagents

Absorbs: the RA pool — and the adviser who critiques from outside

Advances D1

The Pain

The weather signal is real — you saw it in the first scatter, demand rising with the rain — but it is one line through a cloud of points, and a cloud is not a finding. Before any of it survives a referee you owe the data a proper look: each borough on its own, demand against precipitation hour by hour, the obvious confounds named and ruled out or in. Manhattan, then Brooklyn, then Queens, then the Bronx, then the near-empty panel of Staten Island; then the same five against temperature, against the hour of day, against the day of the week. It is the same disciplined look, twenty times over, and it is exactly the work a lab hands to a roomful of research assistants — breadth work, each slice shallow, the whole deep only in aggregate.

You have no roomful. You have you, and you have already spent the morning on Manhattan alone, and the figure you produced is outlier-dominated and you know it, and the honest thing would be to redo it on a log scale and then redo the other four the same way. By borough three your eye has stopped being careful. By borough five you are pattern-matching against borough one instead of looking. And the worst part is the part no RA fixes: somewhere in this you will talk yourself into the result you want, because you are the one who wants it, and there is no one in the room whose job is to disagree with you from the outside. The breadth you can fake by staying up late. The cold second opinion you cannot give yourself.

Why / When

Two ideas arrive together because a lab uses them together.

The first is delegation through subagents. A subagent is a research assistant the tool spawns into a fresh, focused context with a written brief and a required report format — it does one slice of breadth work, files its report, and disappears, never spending its context (or yours) on the other nineteen slices. Many of them run at once: a fleet. This is EDA and literature review the way a lab does it, the unglamorous breadth that the analysis stage stands on.

The second, and the more important, is isolation — the pattern this whole course turns on. A critic that lives inside your working session inherits your reasoning, your hopes, and your blind spots; it will nod. A critic in a separate context, handed only the design memo and the findings and never the conversation that produced them, cannot inherit your rationalizations. Its critique is colder and better. The isolated adviser you meet here is the same mechanism that becomes the referee in D4 and the report reviewer in F1. The pattern is the lesson; the fleet is just where it first earns its keep.

Contrary winds

Not for: a single slice you will read with your own eyes in five minutes — dispatching an RA to do what you can see directly is overhead, not leverage.

Mechanics

Field note

The EDA scripts a fleet runs — borough demand against precipitation, within-hour-of-day correlations — are ordinary statistical code and can be Python or R; the orchestration around them (how RAs are spawned, briefed, and isolated) is identical either way, which is why this page declares no R variants. The dispatch is the subject; the regression inside each slice is C-unit work.

This is the breadth the fleet exists to produce — five boroughs, one RA each, rendered on identical axes so the slices are comparable at a glance:

The EDA fleet: one borough per RA. Five boroughs' hourly demand against precipitation, rendered on identical axes (the parallel-RA EDA output). Manhattan's raw correlation is +0.10; Queens's raw +0.00 flips to -0.11 once hour-of-day is partialled out, which is the timing confound the isolated adviser catches. Real query results.

Table: borough_hourly (10,795 rows)

borough	n	mean_pickups	r_raw	r_within_hod
Manhattan	2,159	4,122.02	0.10	0.12
Queens	2,159	402.27	0.00	-0.11
Brooklyn	2,159	58.73	0.05	0.06
Bronx	2,159	13.34	-0.01	-0.00
Staten Island	2,159	0.14	0.05	0.04

Read it the way the isolated adviser will. Manhattan’s raw correlation with precipitation is +0.10; Queens’s raw +0.00 flips to −0.11 once hour-of-day is partialled out. That sign flip is the timing confound — demand and rain both peak at commute hours — and it is exactly the kind of thing a tired analyst on borough three reads past and a cold critic on the outside does not. Hold that flip; the adviser will come back to it.
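
To make "partialled out" concrete, here is a minimal sketch on made-up numbers, not the course data. It assumes the within-hour-of-day column is computed by demeaning both variables within each hour and correlating the residuals, which is one common way to remove a categorical confound; the lesson does not state the exact method. The function names and the toy rows are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def demean_by(groups, values):
    """Subtract each group's mean from its members (the assumed 'within' transform)."""
    sums, counts = defaultdict(float), defaultdict(int)
    for g, v in zip(groups, values):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for g, v in zip(groups, values)]


# Toy rows (hypothetical): the commute hour is both wetter and busier,
# but within each hour more rain goes with fewer pickups.
hour = [8, 8, 8, 3, 3, 3]
precip = [2.0, 3.0, 4.0, 0.0, 1.0, 2.0]
demand = [110, 105, 100, 20, 15, 10]

r_raw = pearson(precip, demand)
r_within_hod = pearson(demean_by(hour, precip), demean_by(hour, demand))

print(f"r_raw={r_raw:+.2f}  r_within_hod={r_within_hod:+.2f}")
```

In this toy panel the raw correlation is positive while the within-hour correlation is negative: the pooled line is picking up the difference between hours, not the effect of rain inside any one hour.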

Agents in notebooks

The first surface is the notebook, and here the tools genuinely differ — so this is a spotlight, not a tab.

Claude Code Your tool

NotebookEdit + reading the chart it just drew

The distinctive capability is a closed loop: the agent edits a cell with NotebookEdit, executes it, looks at the figure it produced, and iterates on what it sees — not on what it predicted it would see.

> Open notebooks/eda_manhattan.ipynb. Plot hourly demand against
  precipitation for Manhattan, then read your own figure and fix what is
  wrong with it before moving on.

… edits cell 4, runs it, inspects the PNG output …

> The histogram is outlier-dominated — three storm hours compress
  everything else against the axis. Switching to a log scale on the
  count axis and binning precipitation; re-running.

The agent reading its own chart output is what closes the loop a blind script cannot: “outlier-dominated, switch to log scale” is a judgment made from the rendered image, the move you would make over the analyst’s shoulder. Keep one rule on it, though — see the pitfalls — any claim drawn from a figure must be re-stated as the number behind it, or it is an impression, not a finding.
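
The rule that a figure-based claim must be restated as a number can be sketched as follows. This is an illustration, not the agent's internal logic: the helper name, the choice of statistics, and the sample values are all hypothetical. It turns "outlier-dominated" into two numbers, the ratio of the maximum to the median and the share of points that fall in the lowest tenth of a linear axis.

```python
from statistics import median


def outlier_summary(values):
    """Return numbers that back (or refute) an 'outlier-dominated' impression."""
    lo, hi = min(values), max(values)
    med = median(values)
    cutoff = lo + 0.10 * (hi - lo)  # upper edge of the lowest tenth of a linear axis
    share_low = sum(v <= cutoff for v in values) / len(values)
    return {
        "n": len(values),
        "median": med,
        "max": hi,
        "max_over_median": hi / med if med else float("inf"),
        "share_in_lowest_tenth_of_axis": share_low,
    }


# Hypothetical hourly precipitation (mm): mostly light rain, three storm hours.
precip_mm = [0.2, 0.4, 0.3, 0.5, 0.1, 0.6, 0.3, 0.2, 0.4, 0.5, 18.0, 22.0, 25.0]

summary = outlier_summary(precip_mm)
print(summary)
```

With numbers like these, the log-scale decision can go into the report as a claim ("10 of 13 hours sit in the lowest tenth of the axis") rather than as an impression read off a picture.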

Nearest equivalent — Codex

The same discipline assembles from two pieces and is, for analysis work, the better default anyway: write the analysis as a script (the script-first lane both tools do well — jupytext round-trips it to a notebook for humans), have it save the figure to a file, then hand that image back to the agent as image input so it can read the chart and iterate. You lose the single fluid edit-run-look cell; you gain a figure that lives in version control and an analysis that reproduces from the command line. The loop is the same — draw, look, fix — with one more explicit hop.

Watch this space (as of 2026-06): notebook-native agentic editing is on both roadmaps; recheck quarterly.

Field note

Honest sidebar on notebooks and agents: hidden state and out-of-order execution are exactly the failure modes an agent amplifies — a cell that “works” only because cell 9 ran before cell 4 is a result that evaporates on a clean rerun. The moment an exploration matters it graduates to src/ (B2’s quarantine rule). The script-first lane both tools do well is not a Codex consolation prize; it is the lane a fleet should run in regardless of tool.

The subagent model

A subagent is the same delegation in both tools — a fresh context, a written brief, a required report format — and the same A2 lesson resolved: context isolation means the survey’s noise never reaches your main session. What differs is only how you define one, so this is a tab.

System Player film — Subagent Isolation

You have been in this conversation for an hour. The working agent has read your files, run your transforms, and — turn by turn — built up a context: every choice, every excuse, every "good enough for now" you both agreed to along the way.

Claude Code

A subagent is a markdown file under .claude/agents/<name>.md — frontmatter naming the agent and its tools, a body that is the brief and the report contract:

---
name: eda-borough
description: EDA on one borough's demand–weather relationship
tools: [Read, Bash, NotebookEdit]
---
You analyze ONE borough, named in the prompt. Compute hourly demand
against precipitation and against temperature, both raw and within
hour-of-day. Report to reports/eda/<borough>.md in this structure:
each claim on its own line, followed by the exact query and the row
count it rests on. No claim without a number behind it.

Three built-ins do the generic work without a file: Explore for read-only breadth (schemas, file surveys), Plan for assembling a reviewed plan, and general-purpose for tasks that fit neither. Dispatch the custom agent across all five boroughs at once and they run in parallel, each in its own context, each filing its own report: the fleet.

Codex

A subagent is a TOML file under .codex/agents/<name>.toml — the same two parts, a brief and a report contract, in TOML:

name = "eda-borough"
description = "EDA on one borough's demand–weather relationship"
tools = ["read", "shell"]
instructions = """
You analyze ONE borough, named in the prompt. Compute hourly demand
against precipitation and against temperature, both raw and within
hour-of-day. Report to reports/eda/<borough>.md in this structure:
each claim on its own line, followed by the exact query and the row
count it rests on. No claim without a number behind it.
"""

Three built-ins cover the generic work: default for general tasks, worker for scoped execution, explorer for read-only breadth. Inspect any running thread’s context with the agent inspector, and note that a dispatched agent inherits the sandbox profile of its parent (the B3 callback) — a fleet launched under the pipeline profile writes only where the parent could. Dispatch the custom agent across all five boroughs and they run as parallel threads, each its own context, each filing its own report.

The report contract is what makes a fleet usable rather than a pile of prose — and it is the same contract in either dialect: every claim carries its query and its row count, so a finding is auditable and two agents that disagree mark a fragile claim for you automatically.
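
A contract is only useful if it can be checked mechanically. The sketch below assumes a hypothetical one-line-per-claim layout, `claim | query | rows=N`. The lesson mandates claim, query, and row count but does not fix a delimiter, so treat the format, the function name, and the sample query as illustrative only.

```python
import re

ROWS = re.compile(r"^rows=(\d[\d,]*)$")


def check_report(text):
    """Return a list of (line_number, problem) for lines that break the contract."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and headings
        # Assumed delimiter; a query containing "|" would need a sturdier format.
        parts = [p.strip() for p in line.split("|")]
        if len(parts) != 3:
            problems.append((lineno, "expected claim | query | rows=N"))
            continue
        claim, query, rows = parts
        if not claim:
            problems.append((lineno, "empty claim"))
        if not query:
            problems.append((lineno, "claim has no query"))
        if not ROWS.match(rows):
            problems.append((lineno, "claim has no row count"))
    return problems


# Hypothetical report body; the query text is a placeholder, not the course's SQL.
report = """# Queens
raw r(demand, precip) = 0.00 | SELECT corr(pickups, precip) FROM borough_hourly WHERE borough = 'Queens' | rows=2,159
demand falls in rain once hour is controlled
"""

for lineno, problem in check_report(report):
    print(f"line {lineno}: {problem}")
```

The second claim is the kind of sentence a contract exists to catch: plausible, unsupported, and impossible to audit until a query and a count sit beside it.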

Two fleets and one critic

Claude Code

The EDA fleet. One eda-borough agent per slice — five boroughs, plus a weather slice and an events slice — dispatched together. Each files reports/eda/<slice>.md in the mandated structure; where two agents’ claims about the same relationship disagree, you have found a fragile claim before the referee does.

The scholarship fleet. A custom /paper-summary skill run across the ten papers nearest your question, each summary in the same fixed structure, synthesized into a related-work draft that F1 will inherit.

The isolated adviser. A custom /demanding-adviser skill, run as a separate subagent that reads only the design memo and the fleet’s findings — never your working conversation. Ask it about the Queens sign flip and it does not know which answer you were hoping for. Its critique is the cold one.

Codex

The EDA fleet. One eda-borough agent per slice — five boroughs, plus a weather slice and an events slice — dispatched as parallel threads. Each files reports/eda/<slice>.md in the mandated structure; where two agents’ claims about the same relationship disagree, you have found a fragile claim before the referee does.

The scholarship fleet. A custom $paper-summary skill run across the ten papers nearest your question, each summary in the same fixed structure, synthesized into a related-work draft that F1 will inherit.

The isolated adviser. A custom $demanding-adviser skill, run as a separate subagent (a fresh thread that never sees the working one) reading only the design memo and the fleet’s findings. Ask it about the Queens sign flip and it does not know which answer you were hoping for. Its critique is the cold one.

The adviser is the load-bearing idea, not the fleet. A critic prompted inside your working session has read every rationalization you typed on the way to your preferred answer; it argues your side. The same critic in a separate context, fed only the memo and the findings, has nothing to inherit — and is the only one that asks why Queens flips sign. That isolation is the mechanism this course returns to: the referee in D4, the report reviewer in F1, both this pattern at higher stakes.
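
Isolation is easy to state and easy to violate by accident, for example by pasting a chat excerpt into the brief. Here is a minimal sketch of the discipline, assuming hypothetical file names (`design_memo.md` for the memo, `session_transcript.md` for the conversation that must stay out). The adviser's input is built from an allowlist of the memo and the filed reports, so the transcript never enters it. This models the discipline only; it is not how either tool constructs a subagent's context.

```python
from pathlib import Path
import tempfile


def build_adviser_input(root, memo="design_memo.md", reports_dir="reports/eda"):
    """Concatenate ONLY the memo and the filed reports; nothing from the session."""
    root = Path(root)
    sources = [root / memo] + sorted((root / reports_dir).glob("*.md"))
    sections = []
    for path in sources:
        sections.append(f"## {path.relative_to(root)}\n{path.read_text()}")
    return "\n\n".join(sections)


# Demo in a throwaway directory with placeholder contents.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "reports" / "eda").mkdir(parents=True)
    (root / "design_memo.md").write_text("Question: does rain raise demand?")
    (root / "reports" / "eda" / "queens.md").write_text("r_raw = 0.00; r_within_hod = -0.11")
    (root / "session_transcript.md").write_text("I really hope the effect is positive.")

    adviser_input = build_adviser_input(root)

print(adviser_input)
```

The design choice is the allowlist: the function names what may go in rather than what must stay out, so a new file in the working directory cannot leak into the critique by default.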

Guided Run — The RA Pool: one borough per agent

Field Terminal — session: d1-eda-fleet Claude Code

Define an eda-borough subagent with a report contract: claim, query, row count, one per line

The simulator needs JavaScript. The full transcript of this run is described in the lesson text above — nothing below is required reading.

Guided Run — The RA Pool: one borough per thread

Field Terminal — session: d1-eda-fleet Claude Code

Define an eda-borough subagent with a report contract: claim, query, row count, one per line

Guided Run — The Adviser From Outside: isolation as a method

Field Terminal — session: d1-adviser Claude Code

In this session, critique my weather finding — does it hold up?

Field Assignment

Artifact: make check-d1 passes, with both fleets filed in structure, syntheses written, and the isolated adviser's critique logged

Run the lab the way a lab runs: dispatch the breadth, synthesize it, then submit it to a critic who was never in the room.

1. Define the eda-borough subagent with its report contract, and dispatch the EDA fleet across the five boroughs plus weather and events, in parallel, each filing reports/eda/<slice>.md with every claim backed by its query and row count.
2. Run the scholarship fleet (the paper-summary skill) across the ten nearest papers; synthesize the summaries into a related-work draft for F1.
3. Read the fleet’s reports for disagreement: two slices that describe the same relationship differently are your fragile claims; list them.
4. Run the isolated adviser as a separate subagent over the design memo and the findings only, not your session, and log its critique, especially anything it says about the Queens sign flip.
5. File the syntheses and the critique in journal/. Then make check-d1.
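
The third step above, reading the reports for disagreement, can be partly mechanized. The sketch assumes each report has already been parsed into (slice, relationship, estimate) records. The record shape, the relationship labels, the sign-based rule, and the weather slice's estimate are all hypothetical. It flags a relationship as fragile when slices report estimates of opposite sign.

```python
from collections import defaultdict


def fragile_claims(records):
    """Group estimates by relationship; flag those whose signs disagree."""
    by_relationship = defaultdict(list)
    for slice_name, relationship, estimate in records:
        by_relationship[relationship].append((slice_name, estimate))

    fragile = {}
    for relationship, items in by_relationship.items():
        has_pos = any(est > 0 for _, est in items)
        has_neg = any(est < 0 for _, est in items)
        if has_pos and has_neg:
            fragile[relationship] = items
    return fragile


# Hypothetical parsed claims: a borough slice and the weather slice both
# describe Queens demand against precipitation, and they disagree in sign.
records = [
    ("queens", "queens demand ~ precipitation", -0.11),
    ("weather", "queens demand ~ precipitation", 0.04),
    ("bronx", "bronx demand ~ precipitation", -0.01),
]

for relationship, items in fragile_claims(records).items():
    print(relationship, items)
```

A sign test is deliberately crude. It will miss two slices that agree in sign but differ wildly in size, so the flagged list is a starting point for reading the reports, not a substitute for it.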

make check-d1 verifies the report structures (every claim has a query and a count), the two syntheses exist, and the isolated adviser’s critique is logged separately from your own notes. This feeds D2 (the fleet needs worktrees the moment its agents write code, not just reports) and D4 (the adviser becomes the referee).

Milestone gate · make check-d1 advances D1

An eda-borough subagent is defined with a report contract (claim, query, row count, one per line)
The contract lives in the agent definition — markdown frontmatter (CC) or TOML (CX).
The EDA fleet ran in parallel across five boroughs plus weather and events, each filing reports/eda/<slice>.md in the mandated structure
The scholarship fleet ran across the ten nearest papers and the summaries were synthesized into a related-work draft for F1
Fleet disagreement was read for fragile claims — the Queens precipitation sign flip flagged
Two slices describing the same relationship differently is a fragile claim.
The isolated adviser ran as a separate subagent over the memo and findings only, and its critique is logged separately from your own notes
Isolation is the method: the critic must never read the working conversation.

Check each item only once it is true of YOUR repo — the gate is self-certified, like the rest of your methodology.

Pitfalls & Gotchas

[both]

Subagents without a report contract return prose soup — five differently-shaped essays you cannot compare or audit. Mandate the structure in the agent definition itself: claim, query, row count, one per line. The contract is what turns a fleet’s output into evidence instead of opinions.
[both]

A critic prompted inside your working session inherits its blind spots. It has read every step of your reasoning toward the answer you wanted and will defend it with you. Isolation is load-bearing, not stylistic: the adviser must read the memo and the findings and never the conversation, or it is not a second opinion — it is your first opinion in a different font.
[CC]

Chart-reading is not magic. “The relationship looks weak” read off a figure is an impression; the agent can hallucinate a slope as easily as see one. Require the number behind every claim made from an image — the correlation, the row count — and the figure becomes evidence instead of a vibe.
[both]

Fan-out multiplies token cost: a seven-agent fleet spends roughly seven times one agent, and a careless re-run doubles it. The breadth is worth it for genuine breadth work; it is waste on a slice you could read yourself. Cost budgeting arrives properly in D4 — until then, count your slices before you dispatch them.
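
The fan-out arithmetic is worth doing before dispatch. Below is a minimal sketch, assuming a flat per-agent token cost and ignoring orchestration overhead in the parent session. The 40,000 figure is a made-up placeholder, not a measurement.

```python
def fleet_tokens(n_slices, tokens_per_agent, runs=1):
    """Rough fleet cost: every slice pays the full per-agent cost on every run."""
    return n_slices * tokens_per_agent * runs


PER_AGENT = 40_000  # hypothetical tokens for one slice, from brief to report

one_agent = fleet_tokens(1, PER_AGENT)
fleet = fleet_tokens(7, PER_AGENT)  # five boroughs + weather + events
rerun = fleet_tokens(7, PER_AGENT, runs=2)  # the fleet plus one careless re-run

print(f"one agent: {one_agent:,}")
print(f"fleet of 7: {fleet:,} ({fleet // one_agent}x)")
print(f"fleet run twice: {rerun:,} ({rerun // one_agent}x)")
```

Replacing the placeholder with a figure measured from one real slice turns this into a usable budget check before the full fleet goes out.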

Check Your Bearings

D1 · 4 questions · unlimited retries, no timer

This check opens when the guided simulation above is complete — the questions assume you have seen the run.

The interactive check needs JavaScript — without it this section shows only the quiz cover. The lesson text above is complete without the quiz; answers and journal recording require JavaScript.

Field journal

Record one fragile claim the fleet’s disagreement surfaced, and one thing the isolated adviser saw that you had read past — and whether you would have caught it yourself by borough five.

as of June 2026

The split here is honest. In-notebook chart-reading is genuinely Claude Code’s: NotebookEdit plus reading the rendered figure is a first-class loop, where Codex assembles the same discipline from a script-first workflow and image input — a hop longer, and arguably the better default for analysis that must reproduce. The subagent model, by contrast, is real parity: a fresh context, a written brief, a report contract, and parallel dispatch, differing only in whether the definition is markdown frontmatter or TOML and in the built-in roster (Explore/Plan/general-purpose versus default/worker/explorer). Isolation — the adviser who never sees the working conversation — is identical in both, because it is a discipline, not a feature.

Feature-parity matrix

The Lab Roster

Engraved positions, not portraits. A seat fills itself when its lesson is complete.

Your position

Positions

the data manager

Position vacant — engaged at C2

write-time contract hooks (PreToolUse/PostToolUse + the validation suite)

est. human-RA: permanent vigilance — est. 2 weeks/year of load-checking and release-note reading agent: half a day to install and test the 9-line block; ~20 s per run thereafter
the methodologist

Position vacant — engaged at C1

the researcher skill library v1 (/clean-trips, /paper-summary, /demanding-adviser) — codified methodology, not macros

est. human-RA: the judgment lives in one head; transferring it to a new RA costs weeks of shadowing, and leaves when they do agent: an afternoon to author three SKILL.md files in both dialects; zero cost per session until invoked
the data engineer

Position vacant — engaged at C3

MCP connections + the DuckDB warehouse, enrichment joins (weather/events/holidays), and the zone-hour analysis panel

est. human-RA: days of bespoke glue per source — credentials, retries, schema spelunking, timezone forensics — re-debugged every time a source changes agent: register the server once; the agent explores INFORMATION_SCHEMA and builds the panel in a guided session, raw cached for replication
the RA pool

Position vacant — engaged at D1

parallel subagents with report contracts (EDA + scholarship fleets) + the isolated adviser

est. human-RA: a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will agent: ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes
the overnight RA

Position vacant — engaged at D3

/loop supervision + Goal Mode runs over background estimation

est. human-RA: one night shift per estimation batch — and the course runs several batches agent: ~10 min to write the check or the objective; the night itself belongs to the machine
the adviser

Position vacant — engaged at D1

parallel subagents with report contracts (EDA + scholarship fleets) + the isolated adviser

est. human-RA: a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will agent: ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes
the referee

Position vacant — engaged at D4

contracted fleet fan-out (results contract + provenance) and an isolated adversarial referee

est. human-RA: the curve is ~2 days of serialized edit-and-fit; the suspicious read of the robustness table is the rarer, senior hour nobody has time for agent: 13 lanes fanned out under the cap finish in an afternoon; the referee files its evidenced finding in one isolated pass
the lab manager

Position vacant — engaged at E2

scheduled/cloud agents — the monthly-ingest routine, stopping at a human-approved PR

est. human-RA: a recurring monthly chore nobody owns — check the CDN, pull, contract, append, re-estimate — reliably skipped agent: ~30 min to define the routine + guardrails once; each month runs unattended and stops at the approval gate
the reproducibility checker

Position vacant — engaged at E1

headless invocation + the fresh-clone replication self-test + CI gates

est. human-RA: a clean-room rebuild every few weeks — dull, exacting, and the first thing dropped at submission agent: ~20 min to wire scripts/replicate.sh and the gate workflow; the verdict returns in one headless run thereafter
the wall: the unstaffed midnight hours between a raw file and a first plot

Position vacant — engaged at A1

the bare agent loop (prompt → act → observe → fix), zero configuration

est. human-RA: an evening or two per messy file — defensive parsing rewritten from scratch each project, rules forgotten by the time they work agent: ~10 minutes for the quick win, plus the same task re-run in the other language for free
you, working an order of magnitude faster, but only if you direct the work

Position vacant — engaged at A2

the command surface + five prompting patterns + context hygiene

est. human-RA: the slow tax of an undriven session — drifted answers on long investigations, re-runs to find where it went wrong agent: ~30 min to learn; thereafter a first-look on one month (3.5M rows) in minutes, with receipts
the lab manual nobody writes: the institutional knowledge that lives in your head

Position vacant — engaged at B1

instruction files (CLAUDE.md / AGENTS.md) + auto-memory + the A/B demonstration

est. human-RA: ~30 min re-onboarding every new RA, every time — plus the afternoons lost to landmines no one wrote down agent: written once in an hour; reloaded free at the start of every session thereafter
the careful senior who plans before touching data

Position vacant — engaged at B2

repo scaffold + pinned environments + read-only Plan mode reconnaissance

est. human-RA: ~1 week at project start (setup, download babysitting, plan review) + the joins redone when structure rots agent: an afternoon — most of it download wall-clock, not attention
the lab whose members don't overwrite each other

Position vacant — engaged at D2

git worktrees — one isolated checkout per agent/session/thread, combined through a deliberate merge

est. human-RA: the lost afternoon disentangling two agents' colliding edits — and the redo when you reconstruct it wrong the first time agent: two commands to create the worktrees; the parallelism runs free; one reviewed merge at the end
the onboarding the lab never has to repeat

Position vacant — engaged at E3

lab-kit — the whole methodology packaged as a one-command install

est. human-RA: six weeks of per-member onboarding, rediscovered from scratch every time the lab turns over agent: ~half a day to package and smoke-test the kit once; each new member is one install and one prompt
the whole lab, orchestrated: the PI who designs the system instead of doing the work

Position vacant — engaged at F1

the research loop (/loop ↔ Goal Mode / @codex) orchestrating fleet → referee → headless re-run → regenerated report, under report-don't-act guardrails, a hard budget cap, and a human gate on substantive decisions only

est. human-RA: each revision is a serialized chain — re-spec, re-estimate, re-table, rewrite the paragraph, re-read the abstract — correct only as of the last manual pass, on a Sunday; a real reviewer round is days of hand-carried edits agent: the loop runs two iterations to convergence in one supervised sitting; the human stands at exactly one gate (approve dropping the post-treatment control) while the mechanical fixes proceed unattended

Running Totals

Lesson	Role	Est. human-RA	Agent (yours when measured)
A1	the wall — the unstaffed midnight hours between a raw file and a first plot	an evening or two per messy file — defensive parsing rewritten from scratch each project, rules forgotten by the time they work	~10 minutes for the quick win, plus the same task re-run in the other language for free
A2	you, working an order of magnitude faster — but only if you direct the work	the slow tax of an undriven session — drifted answers on long investigations, re-runs to find where it went wrong	~30 min to learn; thereafter a first-look on one month (3.5M rows) in minutes, with receipts
B1	the lab manual nobody writes — the institutional knowledge that lives in your head	~30 min re-onboarding every new RA, every time — plus the afternoons lost to landmines no one wrote down	written once in an hour; reloaded free at the start of every session thereafter
B2	careful senior who plans before touching data	~1 week at project start (setup, download babysitting, plan review) + the joins redone when structure rots	an afternoon — most of it download wall-clock, not attention
B3	the data manager who guards the raw files — the person who says no near the master copies	permanent vigilance you cannot staff — one lapse at machine speed costs a month of re-downloads	two profiles configured once in minutes; the fence then holds every session, tired or not
C1	the methodologist — the one person who knows how the lab actually decides	the judgment lives in one head; transferring it to a new RA costs weeks of shadowing, and leaves when they do	an afternoon to author three SKILL.md files in both dialects; zero cost per session until invoked
C2	data manager / QA who never sleeps	permanent vigilance — est. 2 weeks/year of load-checking and release-note reading	half a day to install and test the 9-line block; ~20 s per run thereafter
C3	the data engineer who wires the lab to its systems	days of bespoke glue per source — credentials, retries, schema spelunking, timezone forensics — re-debugged every time a source changes	register the server once; the agent explores INFORMATION_SCHEMA and builds the panel in a guided session, raw cached for replication
D1	the RA pool — and the adviser who critiques from outside	a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will	~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes
D2	the lab whose members don't overwrite each other	the lost afternoon disentangling two agents' colliding edits — and the redo when you reconstruct it wrong the first time	two commands to create the worktrees; the parallelism runs free; one reviewed merge at the end
D3	overnight RA	one night shift per estimation batch — and the course runs several batches	~10 min to write the check or the objective; the night itself belongs to the machine
D4	an RA bench and the PI who keeps their results comparable	the curve is ~2 days of serialized edit-and-fit; the suspicious read of the robustness table is the rarer, senior hour nobody has time for	13 lanes fanned out under the cap finish in an afternoon; the referee files its evidenced finding in one isolated pass
E1	reproducibility checker	a clean-room rebuild every few weeks — dull, exacting, and the first thing dropped at submission	~20 min to wire scripts/replicate.sh and the gate workflow; the verdict returns in one headless run thereafter
E2	lab manager's standing chores	a recurring monthly chore nobody owns — check the CDN, pull, contract, append, re-estimate — reliably skipped	~30 min to define the routine + guardrails once; each month runs unattended and stops at the approval gate
E3	the onboarding the lab never has to repeat	six weeks of per-member onboarding, rediscovered from scratch every time the lab turns over	~half a day to package and smoke-test the kit once; each new member is one install and one prompt
F1	the whole lab, orchestrated — the PI who designs the system instead of doing the work	each revision is a serialized chain — re-spec, re-estimate, re-table, rewrite the paragraph, re-read the abstract — correct only as of the last manual pass, on a Sunday; a real reviewer round is days of hand-carried edits	the loop runs two iterations to convergence in one supervised sitting; the human stands at exactly one gate (approve dropping the post-treatment control) while the mechanical fixes proceed unattended
Positions absorbed		0 of 16

The honest column: every place a human had to step in lives in the Field Journal’s failure log. Your measured hours there override these estimates here.
