B1 · beginner · ~45 min

The Lab Manual

Absorbs: the lab manual nobody writes

Advances B1

The Pain

The new RA is bright, fast, and completely uninformed. She does not know that data/raw/ is never to be edited, that the borough of a pickup comes from the lookup table and not the shapefile, that the zone ids run two past the polygons, that everything in the warehouse is stored in local time and the raw files are in UTC. She does not know that the February file, the one everyone trusts, quietly carries voided meters and refunds that have to be filtered before a single number is reported. None of this is written down. It lives in your head and in the scar tissue of three afternoons you would rather not relive.

So you explain it. You walk her through the directory, you flag the expensive query that once ran for forty minutes, and you say “never load a whole Parquet into memory” twice, because it matters. By Thursday she is useful. The following Monday a different RA arrives, equally bright, equally uninformed, and you explain it again. And then again. The explaining never compounds, because the person you are onboarding forgets everything the moment they leave, not from carelessness but by construction. They are new every morning. You have hired, without quite noticing, a colleague with no memory of yesterday, and you are paying for that absence in the most expensive currency you own: your own repeated attention.

Why / When

An instruction file is the onboarding document you write once and the agent reads at the start of every session: where data lives, what columns mean, which operations are expensive, what must never be done. It is the lab manual nobody writes, because in a human lab the manual lives in people’s heads and walks out the door at five o’clock. With an agent, the manual is a file, and the file does not forget.

This is the cheapest, highest-leverage thing in the entire course, and it is also the least glamorous: prose in a text file. It sits at the very front of the project — before the first transform, before the first plot — and it pays out on every session that follows. The role it absorbs is the institutional knowledge that usually exists only as folklore; the agent stops being a permanently-new RA and starts every session already briefed.

It is not a place for aspirations. “We value clean, well-tested code” is a sentence an agent cannot act on. “Run ruff check before every commit; never load a full Parquet — query through DuckDB” is. An instruction file that reads like a mission statement is worse than no file at all, because it consumes context and teaches nothing.

Contrary winds

Not for: a throwaway script in a fresh directory you will delete by lunch — onboarding earns its keep across sessions, and a one-shot has none.

Mechanics

Both tools read a plain-text manual at session start. The concept is shared; the filenames, scoping rules, and budget differ, and that is the tab split below. After the tabs, one decision table both tools share, and then the demonstration that makes the whole argument.

What goes in the manual

Independent of tool, a good lab manual is a list of checkable facts and prohibitions, ordered by how much damage getting them wrong does:

Data dictionary pointers — where the real schema lives (docs/data-dictionary.md), not the schema restated and left to rot.
Naming and layout conventions — data/raw/ is append-only; data/processed/ is disposable; results land in results/.
Expensive-operation warnings — never load a full Parquet into memory; query through DuckDB — the one rule whose violation you watched cost forty minutes.
Domain landmines — borough comes from taxi_zone_lookup.csv via PULocationID; the shapefile is for maps; zone ids 264–265 exist in the lookup and not the polygons; timestamps are America/New_York.
The cleaning contract — counts come from the cleaned panel, not raw files, because raw carries refunds and voided meters.

Every line is something a wrong answer would corrupt. None of it is encouragement.
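The timezone landmine is the easiest of these to state and the easiest to get wrong at a month boundary. The following is a minimal sketch, assuming that February 2024 in New York sits entirely on Eastern Standard Time (UTC-5); the helper name `to_local` is hypothetical, and real code would more likely use `zoneinfo.ZoneInfo("America/New_York")` so that daylight-saving months are handled as well.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC-5 offset, which is only valid while New York is on EST
# (true for all of February 2024). Not a general America/New_York conversion.
EST = timezone(timedelta(hours=-5), "EST")


def to_local(ts_utc: datetime) -> datetime:
    """Convert a timezone-aware UTC timestamp to the warehouse's local time."""
    if ts_utc.tzinfo is None:
        raise ValueError("naive timestamp: state the zone before converting")
    return ts_utc.astimezone(EST)


# A raw pickup stamped in the early hours of 1 March, in UTC...
raw = datetime(2024, 3, 1, 3, 30, tzinfo=timezone.utc)
local = to_local(raw)

# ...belongs to February once it is expressed in local time.
print(raw.strftime("%Y-%m"), "->", local.strftime("%Y-%m"))  # 2024-03 -> 2024-02
```

A monthly count that filters on the raw UTC stamp would put this pickup in the wrong month, which is why the manual names the zone of each store rather than leaving it to be inferred.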

Two dialects for the same manual

Claude Code

The manual is CLAUDE.md, read at the start of every session. It has two scopes that compose:

Project scope — CLAUDE.md at the repo root, committed, so the manual arrives with the clone. This is the lab’s shared knowledge.
User scope — ~/.claude/CLAUDE.md, your personal preferences across every project (your shell, your editor, how terse you like summaries). It is not committed and not the place for project facts.

For rules that should bite only inside a subtree, path-scoped rules live in .claude/rules/: a stricter rule that applies only when the agent is working under data/ — under this path, treat every file as append-only; propose, never write — sits in a rule file scoped to that directory, rather than as a sentence in the root manual that the agent must remember to apply selectively. Scope the prohibition to where the danger is.

Quick-adds keep the manual honest. When you catch yourself explaining the same fact twice in a session, the # prefix appends it straight to CLAUDE.md mid-conversation, so the manual grows from the moments that actually required it rather than from a doomed up-front attempt to think of everything.

# weather-mobility — lab manual

## Data
- `data/raw/` is APPEND-ONLY. Never edit a raw file. (Enforced in B3.)
- Counts come from `panel_zone_hour` (cleaned), NOT raw Parquet —
  raw carries refunds and voided meters.
- Never load a full Parquet into memory. Query through DuckDB.
- Schema of record: `docs/data-dictionary.md`. Trust it over your memory.

## Joins
- Borough = `taxi_zone_lookup.csv` via `PULocationID`. The shapefile is
  for maps, not joins. Zone ids 264–265 exist in the lookup only.

## Time
- Warehouse is `America/New_York`. Raw Parquet timestamps are UTC.

Codex

The manual is AGENTS.md, read at session start, and it is hierarchical by design. Files merge from general to specific, so a fact stated once at the right level reaches everywhere below it:

Global — ~/.codex/AGENTS.md, your cross-project preferences.
Repo root — AGENTS.md, committed, the lab’s shared manual.
Nested — an AGENTS.md inside data/ carries the stricter rules that apply only there (append-only; propose, never write), so the prohibition lives next to the danger instead of as a clause the agent must remember to apply selectively.
Override — AGENTS.override.md at a level wins over the inherited text at that level, for the rare case where a subtree must contradict its parent rather than extend it.

The forcing function is the 32 KiB budget: the merged manual is capped, and that cap is a feature. It makes verbosity expensive and pushes you toward terse, checkable rules. When the root file grows bloated, the fix is not a bigger budget — it is moving directory-specific detail down into nested files where it is only loaded when the agent is actually working there.

# weather-mobility — lab manual

## Data
- `data/raw/` is APPEND-ONLY. Never edit a raw file. (Enforced in B3.)
- Counts come from `panel_zone_hour` (cleaned), NOT raw Parquet —
  raw carries refunds and voided meters.
- Never load a full Parquet into memory. Query through DuckDB.
- Schema of record: `docs/data-dictionary.md`. Trust it over your memory.

## Joins
- Borough = `taxi_zone_lookup.csv` via `PULocationID`. The shapefile is
  for maps, not joins. Zone ids 264–265 exist in the lookup only.

## Time
- Warehouse is `America/New_York`. Raw Parquet timestamps are UTC.

(Directory-specific rules for data/ move into data/AGENTS.md — that is the budget working as intended.)
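One way to keep an eye on the budget is to total the instruction files that apply to a given working directory. This is a rough sketch under stated assumptions: the function names are hypothetical, it simply sums file sizes from the repo root down to the working directory, and it does not model the global file, override files, or however Codex itself measures the merged text.

```python
from pathlib import Path

BUDGET_BYTES = 32 * 1024  # the 32 KiB cap named in the text


def manual_files(repo_root: Path, workdir: Path, name: str = "AGENTS.md") -> list[Path]:
    """List instruction files from the repo root down to workdir, general first."""
    repo_root, workdir = repo_root.resolve(), workdir.resolve()
    rel = workdir.relative_to(repo_root)  # raises ValueError if workdir is outside the repo
    dirs = [repo_root]
    for part in rel.parts:
        dirs.append(dirs[-1] / part)
    return [d / name for d in dirs if (d / name).is_file()]


def merged_size(repo_root: Path, workdir: Path) -> int:
    """Total bytes of every instruction file that applies inside workdir."""
    return sum(p.stat().st_size for p in manual_files(repo_root, workdir))


if __name__ == "__main__":
    root = Path(".")
    size = merged_size(root, root / "data")
    status = "over" if size > BUDGET_BYTES else "within"
    print(f"{size} bytes when working in data/ ({status} budget)")
```

Comparing the total at the root with the total inside `data/` shows how much detail has actually been pushed down into the nested file.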

What belongs where

The manual is one of three places a fact can live, and putting a fact in the wrong one is its own failure mode. The full decision table is completed in C1, once skills exist; the part that matters now is the line between the manual and a README.

If the fact is…	It belongs in…	Because
always-relevant context for the agent	the instruction file	read every session, costs context budget — keep it terse
documentation for a human reader	`README.md` / `docs/`	the agent reads it only when pointed at it; length is cheap
a repeatable procedure with steps	a skill (C1)	loaded on demand when invoked, not always-on

The instruction file is expensive real estate — it is read in full on every session and counts against context every time. A paragraph of prose explaining the project’s history belongs in the README; the manual points at the README. The test is brutal and simple: if the agent does not need this fact to avoid a mistake on a typical session, it does not go in the manual.

Auto-memory

A lab manual you write is one thing; a notebook the assistant keeps for itself is another. Here the tools genuinely diverge, so this is a spotlight, not a tab.

Claude Code

Auto-memory — the agent's own notebook

Claude Code maintains a memory that persists across sessions: facts it decides are worth keeping — a path it discovered, a convention it inferred, a correction you made — written to its own store and reloaded next time. It is the closest thing the agent has to learning from yesterday without you writing anything down, and on a long project it quietly accumulates the texture of how you actually work.

That same automaticity is the hazard. The memory grows without your review, and an agent’s inferred fact can be wrong: it decides, from one unlucky session, that “the green-taxi files are unreliable” or that “the 2024-02 zone join needs a manual fix,” and then it carries that conclusion — stated with the same confidence as a fact you taught it — into every session after. Unaudited auto-memory is an RA’s private notes that no one reads: useful right up until a wrong note becomes doctrine.

The discipline is to treat it like an RA’s notebook you do periodically read. Audit the memory the way you would audit a contract suite: open it, prune the conclusions that were true once and are not invariants, and promote the ones that are invariants into CLAUDE.md, where they are committed, visible, and reviewed in pull requests rather than living in a store only the agent sees. Memory is for discovery; the manual is for truth.

Nearest equivalent — Codex

Codex does not keep a private cross-session notebook; each session starts from the committed AGENTS.md and nothing the agent inferred on its own. That sounds like a loss and is partly a discipline: there is no unaudited store to accumulate wrong conclusions, because there is no store.

The substitute is the deliberate end-of-session update. When a session surfaces a fact worth keeping — a convention you settled, a landmine you hit — you (or the agent, at your instruction) append it to AGENTS.md before closing out, as an explicit, reviewable edit. What Claude Code does automatically and you must audit after, Codex makes you do by hand before it counts. The cost is a habit you must hold; the benefit is that every remembered fact passed through a human and a diff on its way into the manual. Write “update the manual” into your session-close checklist and the discipline holds.
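The end-of-session update can be as small as appending one bullet to the file. The helper below is a hypothetical sketch, not a Codex feature: `append_fact` is an invented name, it appends to the end of the file rather than under a chosen section, and the review still happens in the diff you commit.

```python
from pathlib import Path


def append_fact(manual: Path, fact: str) -> bool:
    """Append one bullet to the manual unless an identical bullet already exists.

    Returns True when the file changed, so the caller knows there is a diff to review.
    """
    bullet = f"- {fact.strip()}"
    text = manual.read_text(encoding="utf-8") if manual.exists() else ""
    if bullet in text.splitlines():
        return False  # already recorded; do not duplicate the rule
    if text and not text.endswith("\n"):
        text += "\n"
    manual.write_text(text + bullet + "\n", encoding="utf-8")
    return True
```

Calling `append_fact(Path("AGENTS.md"), "Raw Parquet timestamps are UTC.")` twice changes the file only once, which keeps the manual from bloating with repeats.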

Watch this space (as of 2026-06): persistent agent memory is on both roadmaps and converging; recheck quarterly.

The A/B demonstration

Argument is cheap; demonstration is not. Below is one prompt — summarize February 2024 pickups by borough — run twice against the same repo and the same data. On the left, the agent opens with the lab manual loaded. On the right, the manual is absent: a permanently-new RA, started cold. Step through the divergence; the lesson teaches itself.

One prompt, two labs: “Summarize February 2024 pickups by borough.”

A. With the lab manual: CLAUDE.md present, the agent opens the session briefed (7 lines)

claude ❯ Summarize February 2024 pickups by borough.
Divergence 1: Loaded CLAUDE.md — weather-mobility lab manual: data dictionary in docs/ · never load a full Parquet, query through DuckDB · borough comes from taxi_zone_lookup.csv via PULocationID (the shapefile is for maps, not joins) · counts come from the cleaned panel — raw files carry refunds and voided meters · timezone America/New_York
Divergence 2: The manual routes counting through panel_zone_hour — the cleaned panel already joins pickups to boroughs. One aggregate query.
Divergence 2: Bash(duckdb warehouse.duckdb -c "SELECT borough, sum(pickups) AS pickups FROM panel_zone_hour WHERE strftime(ts_local, '%Y-%m') = '2024-02' GROUP BY 1 ORDER BY 2 DESC")
Divergence 3: Manhattan 2,675,686 · Queens 243,807 · Brooklyn 27,555 · Unknown 9,456 · Bronx 6,884 · N/A 1,316 · EWR 270 · Staten Island 54 (8 rows, 0.4 s)
Divergence 3: February 2024, cleaned panel: 2,965,028 pickups. Manhattan dominates with 2,675,686 (90.2%); Queens 243,807 — that's JFK and LaGuardia; Brooklyn 27,555; Bronx 6,884; EWR 270; Staten Island 54. 'Unknown' (9,456) and 'N/A' (1,316) are the lookup table's own honesty rows — zones 264 and 265, reported rather than dropped.
Divergence 4: One query, six lines of SQL. The Parquet never left disk.

B. Without the lab manual: no instruction file, a permanently new RA, un-onboarded (16 lines)

claude ❯ Summarize February 2024 pickups by borough.
Divergence 1: No CLAUDE.md found — session opens cold: no data dictionary, no conventions, no warnings.
Divergence 2: I'll load the February file into pandas and aggregate by borough.
Divergence 2: Bash(python -c "import pandas as pd; df = pd.read_parquet('data/raw/yellow_2024-02.parquet'); print(len(df), 'rows')")
Divergence 2: 3007526 rows (full file deserialized into memory; ~28 s)
Divergence 2: Loaded 3,007,526 rows — 19 columns, and none of them is a borough. I need a zone table; trying the obvious filename.
Divergence 3: Bash(python -c "... df.merge(pd.read_csv('data/raw/zones.csv'), left_on='PULocationID', right_on='LocationID') ...")
Divergence 3: FileNotFoundError: 'data/raw/zones.csv'
Divergence 3: There's a data/raw/taxi_zones.zip — a shapefile. I'll install geopandas and read the attribute table.
Divergence 3: Bash(pip install geopandas)
Divergence 3: Successfully installed geopandas pyproj shapely fiona ... (a GIS stack, to do one CSV join)
Divergence 3: Bash(python -c "... zones = gpd.read_file('taxi_zones.shp'); df.merge(zones, left_on='PULocationID', right_on='LocationID').groupby('Borough').size() ...")
Divergence 3: KeyError: 'Borough' — column is spelled 'borough' in the shapefile
Divergence 3: Fixed the casing and the join runs — but two flags before I report numbers. First, these are raw rows: no filter for refunds, voided meters, or out-of-month timestamps. Second, the shapefile has 263 polygons and the data references zone ids past that — every pickup in zones 264 and 265 just vanished from the inner join without an error.
Divergence 4: Do you have a canonical zone lookup and a documented cleaning procedure I should be using?
Divergence 4: Four tool calls, one GIS install, the whole Parquet resident in memory — and the run ends on the question an onboarding file answers.

The without-manual run is not incompetent; it is uninformed. It deserializes the entire 3,007,526-row Parquet into memory (the expensive operation the manual forbids outright), guesses a filename for the zone table, installs a GIS stack to do a CSV join, trips on column casing, and silently loses zones 264 and 265 in an inner join against the shapefile. Then, to its credit, it ends by asking the exact question the manual exists to pre-answer. Every one of those detours is a fact that lived in your head and not in a file. The left-hand run is what your head looks like, written down.
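The silent loss in that inner join is worth seeing in isolation. The sketch below uses toy rows and the standard-library `sqlite3` module as a stand-in for the DuckDB warehouse; the table names `trips`, `lookup`, and `polygons` and the helper `count_joined` are illustrative assumptions, not the project's schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE trips (PULocationID INTEGER);
    CREATE TABLE lookup (LocationID INTEGER, borough TEXT);    -- stands in for taxi_zone_lookup.csv
    CREATE TABLE polygons (LocationID INTEGER, borough TEXT);  -- stands in for the shapefile
    INSERT INTO trips VALUES (161), (161), (132), (264), (265);
    INSERT INTO lookup VALUES (161, 'Manhattan'), (132, 'Queens'), (264, 'Unknown'), (265, 'N/A');
    INSERT INTO polygons VALUES (161, 'Manhattan'), (132, 'Queens');
    """
)


def count_joined(table: str) -> int:
    """Number of trips that survive an inner join against the given zone table."""
    # `table` is one of the two hard-coded names above, never user input.
    query = f"SELECT count(*) FROM trips JOIN {table} z ON trips.PULocationID = z.LocationID"
    return con.execute(query).fetchone()[0]


total = con.execute("SELECT count(*) FROM trips").fetchone()[0]
for table in ("lookup", "polygons"):
    kept = count_joined(table)
    print(f"{table}: kept {kept} of {total}, dropped {total - kept}")
```

Against the lookup every trip survives; against the polygons two trips disappear and nothing raises an error. Comparing the joined row count with the input row count is the cheap check that would have caught it.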

(The curriculum’s draft phrased this as a January summary; the project’s pinned study window is February, March, and June 2024, and the answer key carries only those months — so the demonstration summarizes February 2024, and every number on the left is the cleaned panel’s 2024-02 column.)

Field Assignment

Artifact: CLAUDE.md and AGENTS.md committed; the A/B diff logged

Write the lab manual once, in both dialects, from a single set of source notes — then run the demonstration and watch the difference you just wrote into existence.

Start from the project’s source notes: the data dictionary, the directory contract from B2, and the three landmines (the zone join, the timezone split, raw-versus-cleaned counts).
Write the manual in both dialects from those same notes — every line a checkable fact or prohibition, nothing aspirational. Keep it terse; if it would not prevent a mistake on a typical session, cut it.

Claude Code

Author CLAUDE.md at the repo root and commit it. Move the data/-specific append-only rule into .claude/rules/ scoped to that path, so the strict rule lives where the danger is. Confirm the manual loads by opening a fresh session — the briefing line names the manual.

Codex

Author AGENTS.md at the repo root and commit it. Move the data/-specific append-only rule into a nested data/AGENTS.md, keeping the root file under the 32 KiB budget. Confirm the merged manual loads by opening a fresh session in the repo.

Run the A/B demo: the same borough-summary prompt with the manual present, then with it renamed away. Diff the two transcripts and log the divergence in journal/b1-ab.md — the moment the manual paid for itself.

The committed manual is the substrate every later lesson assumes: B3 fences what it names, C2 enforces it, and every session from here opens already briefed.
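For the diff-and-log step of the A/B demo, a small script is enough. This is a sketch under assumptions: how you export the two transcripts is up to you, the function names are hypothetical, and only `journal/b1-ab.md` comes from the assignment.

```python
import difflib
from pathlib import Path


def diff_transcripts(with_manual: str, without_manual: str) -> str:
    """Unified diff of two session transcripts, returned as one string."""
    lines = difflib.unified_diff(
        with_manual.splitlines(),
        without_manual.splitlines(),
        fromfile="with-manual",
        tofile="without-manual",
        lineterm="",
    )
    return "\n".join(lines)


def log_divergence(diff_text: str, journal: Path) -> None:
    """Append the diff to the journal file, creating parent directories as needed."""
    journal.parent.mkdir(parents=True, exist_ok=True)
    with journal.open("a", encoding="utf-8") as fh:
        fh.write("## A/B demo divergence\n\n")
        fh.write(diff_text + "\n\n")
```

With both transcripts read into strings, `log_divergence(diff_transcripts(a, b), Path("journal/b1-ab.md"))` appends the divergence rather than overwriting earlier entries, so repeated runs stay comparable.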

Milestone gate · make check-b1 · advances B1

Instruction file written at the repo root and committed — every line a checkable fact or prohibition, nothing aspirational
If a line would not prevent a mistake on a typical session, cut it. The manual is expensive real estate, read in full every session.
Both dialects authored from the same source notes (CLAUDE.md and AGENTS.md)
Same facts, two surfaces — the lab manual does not change because the tool did.
The data/-specific append-only rule scoped to where the danger is (path-scoped rule / nested file), not as a clause in the root manual
AGENTS.md kept under the 32 KiB budget by pushing directory-specific detail into nested files
The cap is a forcing function. A manual that no longer fits stopped being terse.
A/B demo run and the divergence logged in journal/b1-ab.md — the same borough-summary prompt with and without the manual
The without-manual run loads the full Parquet and fumbles the zone join — the lesson teaches itself.

Check each item only once it is true of YOUR repo — the gate is self-certified, like the rest of your methodology.

Pitfalls & Gotchas

[both]

Aspirations instead of checkable rules. “We value clean code” is a sentence an agent cannot act on; it consumes context and changes no behavior. Every line of the manual should be a fact a wrong answer would corrupt or a prohibition with a clear test. If you cannot write the check, it is not a rule — it is a mood.
[both]

Stale data dictionaries are worse than none. The agent trusts the manual completely, so a dictionary that says airport_fee when the file now says Airport_fee does not merely fail to help — it actively misleads, and the agent will defend the wrong answer with your own document. Point at the schema of record and keep the pointer fresh; do not restate a schema you will forget to update.
[CX]

Blowing the 32 KiB budget with prose. The cap is a forcing function, not an obstacle: when the root manual bloats, the fix is to push directory-specific detail into nested files, not to mourn the limit. A manual that no longer fits is a manual that stopped being terse.
[CC]

Unaudited auto-memory accumulates wrong conclusions. The agent’s private notebook records inferences with the same confidence as facts, and a conclusion that was true for one session becomes doctrine for fifty. Read it like an RA’s notes: prune what was situational, promote what is invariant into the committed manual.
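The stale-dictionary pitfall above lends itself to a mechanical check. The sketch below assumes you can obtain two lists of column names, one from the data dictionary and one from the file's actual schema; how you extract them is left open, and `schema_drift` is a hypothetical helper, not part of the project.

```python
def schema_drift(documented: list[str], actual: list[str]) -> dict[str, list]:
    """Compare column names in the dictionary against the file's real columns."""
    doc_set, act_set = set(documented), set(actual)
    actual_by_lower = {name.lower(): name for name in actual}

    # Names that differ only in casing, e.g. airport_fee vs Airport_fee.
    case_only = [
        (name, actual_by_lower[name.lower()])
        for name in documented
        if name not in act_set and name.lower() in actual_by_lower
    ]
    renamed_from = {doc for doc, _ in case_only}
    renamed_to = {act for _, act in case_only}

    return {
        "case_only": case_only,
        "missing": sorted(doc_set - act_set - renamed_from),
        "undocumented": sorted(act_set - doc_set - renamed_to),
    }


report = schema_drift(
    documented=["PULocationID", "airport_fee", "fare_amount"],
    actual=["PULocationID", "Airport_fee", "total_amount"],
)
print(report)
```

Any non-empty entry in the report means the dictionary and the data have drifted apart, and the pointer in the manual is only as good as the document it points at.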

Check Your Bearings

B1 · 4 questions · unlimited retries, no timer

Question 1 · Choose one
Two candidate lines for the project's instruction file. Which one earns its place, and why?
"We value clean, well-tested code."
"Never load a full Parquet into memory — query through DuckDB."
A three-paragraph history of why the project chose DuckDB
Question 2 · Match the dialects
A fact can live in three places. Match each fact to where it belongs.
data/raw/ is append-only — never edit a raw file
The narrative of how the project's sampling frame was chosen
The full multi-step trip-cleaning procedure
Question 3 · What happens next? · dialect check: Claude Code
Auditing the agent's auto-memory, you find an entry it wrote for itself three weeks ago. What should you do with it?
```
auto-memory (session 2024-06-03):
  "The 2024-02 zone join needs a manual casing fix every time."
```
Leave it — the agent wrote it, so it must be true
If it is a real invariant, promote it into the committed instruction file; if it was situational, prune it
Delete the entire auto-memory store to be safe
Question 4 · Read the config · dialect check: Codex
The repo-root instruction file has grown past the 32 KiB budget, bloated with rules that only matter inside data/. What is the right fix?
```
AGENTS.md          (repo root) — 41 KiB, over budget
  ## Data
  ## data/raw/ append-only rules (long)
  ## data/processed/ rebuild rules (long)
  ## Joins
  ## Time
```
Move the data/-specific rules into a nested data/AGENTS.md, leaving the root terse
Request a larger budget so the whole manual fits at the root
Delete the data/ rules entirely — they are too long to keep


Field journal

Record the one rule in your manual whose absence the A/B demo most visibly punished — the fact the un-onboarded run had to rediscover the expensive way.

Parity note · as of June 2026

Instruction files are a genuine parity feature: both tools read a plain-text manual at session start, differing in surface — CLAUDE.md with user/project scopes and .claude/rules/ path-scoping on one side, the hierarchical AGENTS.md with AGENTS.override.md and a 32 KiB budget on the other. The real asymmetry is auto-memory: Claude Code keeps a private cross-session notebook the agent writes for itself (powerful and in need of auditing), while Codex has no equivalent and substitutes the discipline of a deliberate end-of-session manual update. One tool makes you audit after; the other makes you commit before.

Feature-parity matrix

The Lab Roster

Engraved positions, not portraits. A seat fills itself when its lesson is complete.

Your position

Positions

the data manager

Position vacant — engaged at C2

write-time contract hooks (PreToolUse/PostToolUse + the validation suite)

est. human-RA: permanent vigilance — est. 2 weeks/year of load-checking and release-note reading agent: half a day to install and test the 9-line block; ~20 s per run thereafter
the methodologist

Position vacant — engaged at C1

the researcher skill library v1 (/clean-trips, /paper-summary, /demanding-adviser) — codified methodology, not macros

est. human-RA: the judgment lives in one head; transferring it to a new RA costs weeks of shadowing, and leaves when they do agent: an afternoon to author three SKILL.md files in both dialects; zero cost per session until invoked
the data engineer

Position vacant — engaged at C3

MCP connections + the DuckDB warehouse, enrichment joins (weather/events/holidays), and the zone-hour analysis panel

est. human-RA: days of bespoke glue per source — credentials, retries, schema spelunking, timezone forensics — re-debugged every time a source changes agent: register the server once; the agent explores INFORMATION_SCHEMA and builds the panel in a guided session, raw cached for replication
the RA pool

Position vacant — engaged at D1

parallel subagents with report contracts (EDA + scholarship fleets) + the isolated adviser

est. human-RA: a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will agent: ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes
the overnight RA

Position vacant — engaged at D3

/loop supervision + Goal Mode runs over background estimation

est. human-RA: one night shift per estimation batch — and the course runs several batches agent: ~10 min to write the check or the objective; the night itself belongs to the machine
the adviser

Position vacant — engaged at D1

parallel subagents with report contracts (EDA + scholarship fleets) + the isolated adviser

est. human-RA: a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will agent: ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes
the referee

Position vacant — engaged at D4

contracted fleet fan-out (results contract + provenance) and an isolated adversarial referee

est. human-RA: the curve is ~2 days of serialized edit-and-fit; the suspicious read of the robustness table is the rarer, senior hour nobody has time for agent: 13 lanes fanned out under the cap finish in an afternoon; the referee files its evidenced finding in one isolated pass
the lab manager

Position vacant — engaged at E2

scheduled/cloud agents — the monthly-ingest routine, stopping at a human-approved PR

est. human-RA: a recurring monthly chore nobody owns — check the CDN, pull, contract, append, re-estimate — reliably skipped agent: ~30 min to define the routine + guardrails once; each month runs unattended and stops at the approval gate
the reproducibility checker

Position vacant — engaged at E1

headless invocation + the fresh-clone replication self-test + CI gates

est. human-RA: a clean-room rebuild every few weeks — dull, exacting, and the first thing dropped at submission agent: ~20 min to wire scripts/replicate.sh and the gate workflow; the verdict returns in one headless run thereafter
the wall: the unstaffed midnight hours between a raw file and a first plot

Position vacant — engaged at A1

the bare agent loop (prompt → act → observe → fix), zero configuration

est. human-RA: an evening or two per messy file — defensive parsing rewritten from scratch each project, rules forgotten by the time they work agent: ~10 minutes for the quick win, plus the same task re-run in the other language for free
you, working an order of magnitude faster, but only if you direct the work

Position vacant — engaged at A2

the command surface + five prompting patterns + context hygiene

est. human-RA: the slow tax of an undriven session — drifted answers on long investigations, re-runs to find where it went wrong agent: ~30 min to learn; thereafter a first-look on one month (3.5M rows) in minutes, with receipts
the lab manual nobody writes: the institutional knowledge that lives in your head

Position vacant — engaged at B1

instruction files (CLAUDE.md / AGENTS.md) + auto-memory + the A/B demonstration

est. human-RA: ~30 min re-onboarding every new RA, every time — plus the afternoons lost to landmines no one wrote down agent: written once in an hour; reloaded free at the start of every session thereafter
the careful senior who plans before touching data

Position vacant — engaged at B2

repo scaffold + pinned environments + read-only Plan mode reconnaissance

est. human-RA: ~1 week at project start (setup, download babysitting, plan review) + the joins redone when structure rots agent: an afternoon — most of it download wall-clock, not attention
the lab whose members don't overwrite each other

Position vacant — engaged at D2

git worktrees — one isolated checkout per agent/session/thread, combined through a deliberate merge

est. human-RA: the lost afternoon disentangling two agents' colliding edits — and the redo when you reconstruct it wrong the first time agent: two commands to create the worktrees; the parallelism runs free; one reviewed merge at the end
the onboarding the lab never has to repeat

Position vacant — engaged at E3

lab-kit — the whole methodology packaged as a one-command install

est. human-RA: six weeks of per-member onboarding, rediscovered from scratch every time the lab turns over agent: ~half a day to package and smoke-test the kit once; each new member is one install and one prompt
the whole lab, orchestrated: the PI who designs the system instead of doing the work

Position vacant — engaged at F1

the research loop (/loop ↔ Goal Mode / @codex) orchestrating fleet → referee → headless re-run → regenerated report, under report-don't-act guardrails, a hard budget cap, and a human gate on substantive decisions only

est. human-RA: each revision is a serialized chain — re-spec, re-estimate, re-table, rewrite the paragraph, re-read the abstract — correct only as of the last manual pass, on a Sunday; a real reviewer round is days of hand-carried edits agent: the loop runs two iterations to convergence in one supervised sitting; the human stands at exactly one gate (approve dropping the post-treatment control) while the mechanical fixes proceed unattended

Running Totals

Lesson	Role	Est. human-RA	Agent (yours when measured)
A1	the wall — the unstaffed midnight hours between a raw file and a first plot	an evening or two per messy file — defensive parsing rewritten from scratch each project, rules forgotten by the time they work	~10 minutes for the quick win, plus the same task re-run in the other language for free
A2	you, working an order of magnitude faster — but only if you direct the work	the slow tax of an undriven session — drifted answers on long investigations, re-runs to find where it went wrong	~30 min to learn; thereafter a first-look on one month (3.5M rows) in minutes, with receipts
B1	the lab manual nobody writes — the institutional knowledge that lives in your head	~30 min re-onboarding every new RA, every time — plus the afternoons lost to landmines no one wrote down	written once in an hour; reloaded free at the start of every session thereafter
B2	careful senior who plans before touching data	~1 week at project start (setup, download babysitting, plan review) + the joins redone when structure rots	an afternoon — most of it download wall-clock, not attention
B3	the data manager who guards the raw files — the person who says no near the master copies	permanent vigilance you cannot staff — one lapse at machine speed costs a month of re-downloads	two profiles configured once in minutes; the fence then holds every session, tired or not
C1	the methodologist — the one person who knows how the lab actually decides	the judgment lives in one head; transferring it to a new RA costs weeks of shadowing, and leaves when they do	an afternoon to author three SKILL.md files in both dialects; zero cost per session until invoked
C2	data manager / QA who never sleeps	permanent vigilance — est. 2 weeks/year of load-checking and release-note reading	half a day to install and test the 9-line block; ~20 s per run thereafter
C3	the data engineer who wires the lab to its systems	days of bespoke glue per source — credentials, retries, schema spelunking, timezone forensics — re-debugged every time a source changes	register the server once; the agent explores INFORMATION_SCHEMA and builds the panel in a guided session, raw cached for replication
D1	the RA pool — and the adviser who critiques from outside	a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will	~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes
D2	the lab whose members don't overwrite each other	the lost afternoon disentangling two agents' colliding edits — and the redo when you reconstruct it wrong the first time	two commands to create the worktrees; the parallelism runs free; one reviewed merge at the end
D3	overnight RA	one night shift per estimation batch — and the course runs several batches	~10 min to write the check or the objective; the night itself belongs to the machine
D4	an RA bench and the PI who keeps their results comparable	the curve is ~2 days of serialized edit-and-fit; the suspicious read of the robustness table is the rarer, senior hour nobody has time for	13 lanes fanned out under the cap finish in an afternoon; the referee files its evidenced finding in one isolated pass
E1	reproducibility checker	a clean-room rebuild every few weeks — dull, exacting, and the first thing dropped at submission	~20 min to wire scripts/replicate.sh and the gate workflow; the verdict returns in one headless run thereafter
E2	lab manager's standing chores	a recurring monthly chore nobody owns — check the CDN, pull, contract, append, re-estimate — reliably skipped	~30 min to define the routine + guardrails once; each month runs unattended and stops at the approval gate
E3	the onboarding the lab never has to repeat	six weeks of per-member onboarding, rediscovered from scratch every time the lab turns over	~half a day to package and smoke-test the kit once; each new member is one install and one prompt
F1	the whole lab, orchestrated — the PI who designs the system instead of doing the work	each revision is a serialized chain — re-spec, re-estimate, re-table, rewrite the paragraph, re-read the abstract — correct only as of the last manual pass, on a Sunday; a real reviewer round is days of hand-carried edits	the loop runs two iterations to convergence in one supervised sitting; the human stands at exactly one gate (approve dropping the post-treatment control) while the mechanical fixes proceed unattended
Positions absorbed		0 of 16

The honest column: every place a human had to step in lives in the Field Journal’s failure log. Your measured hours there override these estimates here.
