
B1 · beginner · ~45 min

The Lab Manual

Absorbs: the lab manual nobody writes

Advances B1

The Pain

The new RA is bright, fast, and completely uninformed. She does not know that data/raw/ is never to be edited, that the borough of a pickup comes from the lookup table and not the shapefile, that the zone ids run two past the polygons, that everything in the warehouse is stored in local time and the raw files are in UTC. She does not know that the February file, the one everyone trusts, quietly carries voided meters and refunds that have to be filtered before a single number is reported. None of this is written down. It lives in your head and in the scar tissue of three afternoons you would rather not relive.

So you explain it. You walk her through the directory, you flag the expensive query that once ran for forty minutes, you say “never load a whole Parquet into memory” twice, because it matters. By Thursday she is useful. The following Monday a different RA arrives, equally bright, equally uninformed, and you explain it again. And then again. The explaining never compounds, because the person you are onboarding forgets everything the moment they leave, not from carelessness but by construction. They are new every morning. You have hired, without quite noticing, a colleague with no memory of yesterday, and you are paying for that absence in the most expensive currency you own, which is your own repeated attention.

Why / When

An instruction file is the onboarding document you write once and the agent reads at the start of every session: where data lives, what columns mean, which operations are expensive, what must never be done. It is the lab manual nobody writes, because in a human lab the manual lives in people’s heads and walks out the door at five o’clock. With an agent, the manual is a file, and the file does not forget.

This is the cheapest, highest-leverage thing in the entire course, and it is also the least glamorous: prose in a text file. It sits at the very front of the project — before the first transform, before the first plot — and it pays out on every session that follows. The role it absorbs is the institutional knowledge that usually exists only as folklore; the agent stops being a permanently-new RA and starts every session already briefed.

It is not a place for aspirations. “We value clean, well-tested code” is a sentence an agent cannot act on. “Run ruff check before every commit; never load a full Parquet — query through DuckDB” is. An instruction file that reads like a mission statement is worse than no file at all, because it consumes context and teaches nothing.

Contrary winds

Not for: a throwaway script in a fresh directory you will delete by lunch — onboarding earns its keep across sessions, and a one-shot has none.

Mechanics

Both tools read a plain-text manual at session start. The concept is shared; the filenames, scoping rules, and budget differ, and that is the tab split below. After the tabs, one decision table both tools share, and then the demonstration that makes the whole argument.

What goes in the manual

Independent of tool, a good lab manual is a list of checkable facts and prohibitions, ordered by how much damage getting them wrong does:

  • Data dictionary pointers: where the real schema lives (docs/data-dictionary.md), not the schema restated and left to rot.
  • Naming and layout conventions: data/raw/ is append-only; data/processed/ is disposable; results land in results/.
  • Expensive-operation warnings: never load a full Parquet into memory; query through DuckDB. This is the one rule whose violation you watched cost forty minutes.
  • Domain landmines: borough comes from taxi_zone_lookup.csv via PULocationID; the shapefile is for maps; zone ids 264–265 exist in the lookup and not the polygons; timestamps are America/New_York.
  • The cleaning contract: counts come from the cleaned panel, not raw files, because raw carries refunds and voided meters.

Every line is something a wrong answer would corrupt. None of it is encouragement.
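One of the landmines above, raw timestamps in UTC against a warehouse in America/New_York, is easy to state and easy to underestimate. The sketch below is illustrative only: `local_month` and `WAREHOUSE_TZ` are hypothetical names, and it assumes naive raw timestamps are UTC, as the manual says. It shows why the rule matters at a month boundary: a pickup stamped 1 March in UTC can still belong to February in local time.

```python
from datetime import datetime, timedelta, timezone

try:
    from zoneinfo import ZoneInfo
    WAREHOUSE_TZ = ZoneInfo("America/New_York")
except Exception:
    # Fallback when no tz database is installed: a fixed EST offset.
    # Only correct outside daylight saving time (fine for February).
    WAREHOUSE_TZ = timezone(timedelta(hours=-5))


def local_month(raw_utc: datetime) -> str:
    """Return the warehouse-local 'YYYY-MM' bucket for a raw UTC timestamp."""
    if raw_utc.tzinfo is None:
        # Assumption: naive raw timestamps are UTC, as the manual states.
        raw_utc = raw_utc.replace(tzinfo=timezone.utc)
    return raw_utc.astimezone(WAREHOUSE_TZ).strftime("%Y-%m")


late_feb = datetime(2024, 3, 1, 3, 30)   # 03:30 UTC on 1 March
early_feb = datetime(2024, 2, 1, 2, 0)   # 02:00 UTC on 1 February
print(local_month(late_feb))   # 2024-02 (22:30 local on 29 February)
print(local_month(early_feb))  # 2024-01 (21:00 local on 31 January)
```

Filtering raw rows by the UTC month would misfile both of these pickups, which is the kind of quiet error a one-line rule in the manual prevents.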

Two dialects for the same manual

Claude Code

The manual is CLAUDE.md, read at the start of every session. It has two scopes that compose:

  • Project scope: CLAUDE.md at the repo root, committed, so the manual arrives with the clone. This is the lab’s shared knowledge.
  • User scope: ~/.claude/CLAUDE.md, your personal preferences across every project (your shell, your editor, how terse you like summaries). It is not committed and not the place for project facts.

For rules that should bite only inside a subtree, path-scoped rules live in .claude/rules/. A stricter rule that applies only when the agent is working under data/ (“under this path, treat every file as append-only; propose, never write”) sits in a rule file scoped to that directory, rather than as a sentence in the root manual that the agent must remember to apply selectively. Scope the prohibition to where the danger is.

Quick-adds keep the manual honest. When you catch yourself explaining the same fact twice in a session, the # prefix appends it straight to CLAUDE.md mid-conversation, so the manual grows from the moments that actually required it rather than from a doomed up-front attempt to think of everything.

CLAUDE.md — project scope, committed
# weather-mobility — lab manual
## Data
- `data/raw/` is APPEND-ONLY. Never edit a raw file. (Enforced in B3.)
- Counts come from `panel_zone_hour` (cleaned), NOT raw Parquet —
raw carries refunds and voided meters.
- Never load a full Parquet into memory. Query through DuckDB.
- Schema of record: `docs/data-dictionary.md`. Trust it over your memory.
## Joins
- Borough = `taxi_zone_lookup.csv` via `PULocationID`. The shapefile is
for maps, not joins. Zone ids 264–265 exist in the lookup only.
## Time
- Warehouse is `America/New_York`. Raw Parquet timestamps are UTC.

Codex

The manual is AGENTS.md, read at session start, and it is hierarchical by design. Files merge from general to specific, so a fact stated once at the right level reaches everywhere below it:

  • Global: ~/.codex/AGENTS.md, your cross-project preferences.
  • Repo root: AGENTS.md, committed, the lab’s shared manual.
  • Nested: an AGENTS.md inside data/ carries the stricter rules that apply only there (append-only; propose, never write), so the prohibition lives next to the danger instead of as a clause the agent must remember to apply selectively.
  • Override: AGENTS.override.md at a level wins over the inherited text at that level, for the rare case where a subtree must contradict its parent rather than extend it.

The forcing function is the 32 KiB budget: the merged manual is capped, and that cap is a feature. It makes verbosity expensive and pushes you toward terse, checkable rules. When the root file grows bloated, the fix is not a bigger budget — it is moving directory-specific detail down into nested files where it is only loaded when the agent is actually working there.

AGENTS.md — repo root, committed
# weather-mobility — lab manual
## Data
- `data/raw/` is APPEND-ONLY. Never edit a raw file. (Enforced in B3.)
- Counts come from `panel_zone_hour` (cleaned), NOT raw Parquet —
raw carries refunds and voided meters.
- Never load a full Parquet into memory. Query through DuckDB.
- Schema of record: `docs/data-dictionary.md`. Trust it over your memory.
## Joins
- Borough = `taxi_zone_lookup.csv` via `PULocationID`. The shapefile is
for maps, not joins. Zone ids 264–265 exist in the lookup only.
## Time
- Warehouse is `America/New_York`. Raw Parquet timestamps are UTC.

(Directory-specific rules for data/ move into data/AGENTS.md — that is the budget working as intended.)
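To keep an eye on the budget before committing, a rough check along the following lines may help. It is a sketch under stated assumptions, not a description of how Codex itself merges or measures files: `manual_chain` and `check_budget` are hypothetical helpers, the check counts raw bytes on disk, and it ignores the global file and any AGENTS.override.md.

```python
from pathlib import Path

BUDGET_BYTES = 32 * 1024  # the 32 KiB cap described above


def manual_chain(repo_root: Path, workdir: Path, name: str = "AGENTS.md") -> list[Path]:
    """List instruction files from the repo root down to workdir, general to specific."""
    repo_root = repo_root.resolve()
    workdir = workdir.resolve()
    rel = workdir.relative_to(repo_root)  # raises ValueError if workdir is outside the repo
    dirs = [repo_root]
    for part in rel.parts:
        dirs.append(dirs[-1] / part)
    return [d / name for d in dirs if (d / name).is_file()]


def check_budget(repo_root: Path, workdir: Path) -> tuple[int, bool]:
    """Return (total bytes of the chain, whether it fits the budget)."""
    size = sum(f.stat().st_size for f in manual_chain(repo_root, workdir))
    return size, size <= BUDGET_BYTES


if __name__ == "__main__":
    # Approximate check for a session working under data/.
    size, ok = check_budget(Path("."), Path("data"))
    print(f"{size} bytes, {'within' if ok else 'over'} budget")
```

Running it once for the repo root and once for each directory that carries a nested file shows where the weight sits, which is usually enough to decide what to move down.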

What belongs where

The manual is one of three places a fact can live, and putting a fact in the wrong one is its own failure mode. The full decision table is completed in C1, once skills exist; the part that matters now is the line between the manual and a README.

| If the fact is… | It belongs in… | Because |
| --- | --- | --- |
| always-relevant context for the agent | the instruction file | read every session, costs context budget; keep it terse |
| documentation for a human reader | README.md / docs/ | the agent reads it only when pointed at it; length is cheap |
| a repeatable procedure with steps | a skill (C1) | loaded on demand when invoked, not always-on |

The instruction file is expensive real estate — it is read in full on every session and counts against context every time. A paragraph of prose explaining the project’s history belongs in the README; the manual points at the README. The test is brutal and simple: if the agent does not need this fact to avoid a mistake on a typical session, it does not go in the manual.

Auto-memory

A lab manual you write is one thing; a notebook the assistant keeps for itself is another. Here the tools genuinely diverge, so this is a spotlight, not a tab.

Claude Code

Auto-memory — the agent's own notebook

Claude Code maintains a memory that persists across sessions: facts it decides are worth keeping — a path it discovered, a convention it inferred, a correction you made — written to its own store and reloaded next time. It is the closest thing the agent has to learning from yesterday without you writing anything down, and on a long project it quietly accumulates the texture of how you actually work.

That same automaticity is the hazard. The memory grows without your review, and an agent’s inferred fact can be wrong: it decides, from one unlucky session, that “the green-taxi files are unreliable” or that “the 2024-02 zone join needs a manual fix,” and then it carries that conclusion — stated with the same confidence as a fact you taught it — into every session after. Unaudited auto-memory is an RA’s private notes that no one reads: useful right up until a wrong note becomes doctrine.

The discipline is to treat it like an RA’s notebook you do periodically read. Audit the memory the way you would audit a contract suite: open it, prune the conclusions that were true once and are not invariants, and promote the ones that are invariants into CLAUDE.md, where they are committed, visible, and reviewed in pull requests rather than living in a store only the agent sees. Memory is for discovery; the manual is for truth.

Codex has no cross-session auto-memory — the nearest equivalent is a deliberate habit, below.

Nearest equivalent — Codex

Codex does not keep a private cross-session notebook; each session starts from the committed AGENTS.md and nothing the agent inferred on its own. That sounds like a loss and is partly a discipline: there is no unaudited store to accumulate wrong conclusions, because there is no store.

The substitute is the deliberate end-of-session update. When a session surfaces a fact worth keeping — a convention you settled, a landmine you hit — you (or the agent, at your instruction) append it to AGENTS.md before closing out, as an explicit, reviewable edit. What Claude Code does automatically and you must audit after, Codex makes you do by hand before it counts. The cost is a habit you must hold; the benefit is that every remembered fact passed through a human and a diff on its way into the manual. Write “update the manual” into your session-close checklist and the discipline holds.
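The end-of-session update can be as small as one appended bullet. The helper below is a minimal sketch of that habit, not part of either tool: `append_fact` is a hypothetical name, and the default heading `## Session notes` is an assumption. It adds a dated bullet under a chosen heading and refuses to add a fact that is already in the file, so the change shows up as a short, reviewable diff.

```python
from datetime import date
from pathlib import Path


def append_fact(manual: Path, fact: str, heading: str = "## Session notes") -> bool:
    """Add `fact` as a dated bullet under `heading`; return False if already recorded."""
    lines = manual.read_text(encoding="utf-8").splitlines() if manual.exists() else []
    if any(fact in line for line in lines):
        return False  # already in the manual; keep it terse
    bullet = f"- {fact} (added {date.today().isoformat()})"
    if heading not in lines:
        # Start a new section at the end of the file.
        if lines and lines[-1].strip():
            lines.append("")
        lines += [heading, bullet]
    else:
        start = lines.index(heading) + 1
        # The section ends at the next second-level heading or at end of file.
        end = next(
            (i for i in range(start, len(lines)) if lines[i].startswith("## ")),
            len(lines),
        )
        while end > start and not lines[end - 1].strip():
            end -= 1  # insert before trailing blank lines in the section
        lines.insert(end, bullet)
    manual.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return True
```

A call such as `append_fact(Path("AGENTS.md"), "Zone ids 264-265 exist in the lookup only", heading="## Joins")` changes the file on disk, and the diff is what a human approves before the fact counts.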

Watch this space: persistent agent memory is on both roadmaps and converging; recheck quarterly.

The A/B demonstration

Argument is cheap; demonstration is not. Below is one prompt — summarize February 2024 pickups by borough — run twice against the same repo and the same data. On the left, the agent opens with the lab manual loaded. On the right, the manual is absent: a permanently-new RA, started cold. Step through the divergence; the lesson teaches itself.

The A/B demonstration: one prompt, two labs. Prompt: “Summarize February 2024 pickups by borough.”
A. With the lab manual: CLAUDE.md present, the agent opens the session briefed (7 lines)
  1. claude ❯ Summarize February 2024 pickups by borough.
  2. Divergence 1: Loaded CLAUDE.md — weather-mobility lab manual: data dictionary in docs/ · never load a full Parquet, query through DuckDB · borough comes from taxi_zone_lookup.csv via PULocationID (the shapefile is for maps, not joins) · counts come from the cleaned panel — raw files carry refunds and voided meters · timezone America/New_York
  3. Divergence 2: The manual routes counting through panel_zone_hour — the cleaned panel already joins pickups to boroughs. One aggregate query.
  4. Divergence 2: Bash(duckdb warehouse.duckdb -c "SELECT borough, sum(pickups) AS pickups FROM panel_zone_hour WHERE strftime(ts_local, '%Y-%m') = '2024-02' GROUP BY 1 ORDER BY 2 DESC")
  5. Divergence 3: Manhattan 2,675,686 · Queens 243,807 · Brooklyn 27,555 · Unknown 9,456 · Bronx 6,884 · N/A 1,316 · EWR 270 · Staten Island 54 (8 rows, 0.4 s)
  6. Divergence 3: February 2024, cleaned panel: 2,965,028 pickups. Manhattan dominates with 2,675,686 (90.2%); Queens 243,807 — that's JFK and LaGuardia; Brooklyn 27,555; Bronx 6,884; EWR 270; Staten Island 54. 'Unknown' (9,456) and 'N/A' (1,316) are the lookup table's own honesty rows — zones 264 and 265, reported rather than dropped.
  7. Divergence 4: One query, six lines of SQL. The Parquet never left disk.
B. Without the lab manual: no instruction file, a permanently new RA, un-onboarded (16 lines)
  1. claude ❯ Summarize February 2024 pickups by borough.
  2. Divergence 1: No CLAUDE.md found — session opens cold: no data dictionary, no conventions, no warnings.
  3. Divergence 2: I'll load the February file into pandas and aggregate by borough.
  4. Divergence 2: Bash(python -c "import pandas as pd; df = pd.read_parquet('data/raw/yellow_2024-02.parquet'); print(len(df), 'rows')")
  5. Divergence 2: 3007526 rows (full file deserialized into memory; ~28 s)
  6. Divergence 2: Loaded 3,007,526 rows — 19 columns, and none of them is a borough. I need a zone table; trying the obvious filename.
  7. Divergence 3: Bash(python -c "... df.merge(pd.read_csv('data/raw/zones.csv'), left_on='PULocationID', right_on='LocationID') ...")
  8. Divergence 3: FileNotFoundError: 'data/raw/zones.csv'
  9. Divergence 3: There's a data/raw/taxi_zones.zip — a shapefile. I'll install geopandas and read the attribute table.
  10. Divergence 3: Bash(pip install geopandas)
  11. Divergence 3: Successfully installed geopandas pyproj shapely fiona ... (a GIS stack, to do one CSV join)
  12. Divergence 3: Bash(python -c "... zones = gpd.read_file('taxi_zones.shp'); df.merge(zones, left_on='PULocationID', right_on='LocationID').groupby('Borough').size() ...")
  13. Divergence 3: KeyError: 'Borough' — column is spelled 'borough' in the shapefile
  14. Divergence 3: Fixed the casing and the join runs — but two flags before I report numbers. First, these are raw rows: no filter for refunds, voided meters, or out-of-month timestamps. Second, the shapefile has 263 polygons and the data references zone ids past that — every pickup in zones 264 and 265 just vanished from the inner join without an error.
  15. Divergence 4: Do you have a canonical zone lookup and a documented cleaning procedure I should be using?
  16. Divergence 4: Four tool calls, one GIS install, the whole Parquet resident in memory — and the run ends on the question an onboarding file answers.
Where they part ways
  1. The briefing. Same session start — one agent opens knowing the project's conventions, the other opens cold. Everything downstream follows from this line.
  2. The first move. Briefed: one aggregate query through DuckDB. Cold: the whole 3,007,526-row Parquet deserialized into memory — the exact thing the manual's one-line rule forbids.
  3. The zone join. The manual names the join (taxi_zone_lookup.csv via PULocationID); without it the agent guesses a filename, installs a GIS stack for a CSV join, trips on column casing, and silently loses zones 264–265 to an inner join.
  4. The close. Side A files cleaned, citable counts that match the answer key. Side B ends with raw, unfiltered rows and a question — honest, but it is the question CLAUDE.md exists to pre-answer.

The without-manual run is not incompetent; it is uninformed. It deserializes the entire 3,007,526-row Parquet into memory (the exact thing the manual's one-line rule forbids), guesses a filename for the zone table, installs a GIS stack to do a CSV join, trips on column casing, and silently drops zones 264 and 265 to an inner join. Then, to its credit, it ends by asking the exact question the manual exists to pre-answer. Every one of those detours is a fact that lived in your head and not in a file. The left-hand run is what your head looks like, written down.
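The silent loss in the zone join is worth seeing in miniature. The sketch below uses Python's built-in sqlite3 with invented toy rows; only the zone ids 264 and 265 and the column names come from the lesson, and the table names `lookup` and `polygons` are hypothetical stand-ins for taxi_zone_lookup.csv and the shapefile's attribute table. An inner join against the polygons returns fewer rows and raises no error; a left join names the ids that vanished.

```python
import sqlite3

# Toy data, invented for illustration.
con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE pickups  (PULocationID INTEGER);
    CREATE TABLE lookup   (LocationID INTEGER, Borough TEXT);  -- stand-in for the lookup CSV
    CREATE TABLE polygons (LocationID INTEGER, Borough TEXT);  -- stand-in for the shapefile
    INSERT INTO pickups  VALUES (161), (161), (132), (264), (265);
    INSERT INTO lookup   VALUES (161, 'Manhattan'), (132, 'Queens'),
                                (264, 'Unknown'), (265, 'N/A');
    INSERT INTO polygons VALUES (161, 'Manhattan'), (132, 'Queens');
    """
)

ZONE_TABLES = {"lookup", "polygons"}


def joined_rows(zone_table: str) -> int:
    """Count pickups that survive an inner join against the given zone table."""
    if zone_table not in ZONE_TABLES:
        raise ValueError(f"unknown zone table: {zone_table}")
    sql = (
        f"SELECT count(*) FROM pickups p "
        f"JOIN {zone_table} z ON p.PULocationID = z.LocationID"
    )
    return con.execute(sql).fetchone()[0]


def unmatched_ids(zone_table: str) -> list[int]:
    """List pickup zone ids that have no row in the given zone table."""
    if zone_table not in ZONE_TABLES:
        raise ValueError(f"unknown zone table: {zone_table}")
    sql = (
        f"SELECT DISTINCT p.PULocationID FROM pickups p "
        f"LEFT JOIN {zone_table} z ON p.PULocationID = z.LocationID "
        f"WHERE z.LocationID IS NULL ORDER BY p.PULocationID"
    )
    return [row[0] for row in con.execute(sql)]


print(joined_rows("lookup"), joined_rows("polygons"))  # 5 3
print(unmatched_ids("polygons"))  # [264, 265]
```

Comparing row counts before and after a join, or listing unmatched keys as here, is a cheap habit; the manual's one-line rule about which table to join makes the check unnecessary in the first place.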

(The curriculum’s draft phrased this as a January summary; the project’s pinned study window is February, March, and June 2024, and the answer key carries only those months — so the demonstration summarizes February 2024, and every number on the left is the cleaned panel’s 2024-02 column.)

Field Assignment

Artifact: CLAUDE.md and AGENTS.md committed; the A/B diff logged

Write the lab manual once, in both dialects, from a single set of source notes — then run the demonstration and watch the difference you just wrote into existence.

  1. Start from the project’s source notes: the data dictionary, the directory contract from B2, and the three landmines (the zone join, the timezone split, raw-versus-cleaned counts).
  2. Write the manual in both dialects from those same notes — every line a checkable fact or prohibition, nothing aspirational. Keep it terse; if it would not prevent a mistake on a typical session, cut it.

Claude Code

Author CLAUDE.md at the repo root and commit it. Move the data/-specific append-only rule into .claude/rules/ scoped to that path, so the strict rule lives where the danger is. Confirm the manual loads by opening a fresh session — the briefing line names the manual.

Codex

Author AGENTS.md at the repo root and commit it. Move the data/-specific append-only rule into a nested data/AGENTS.md, keeping the root file under the 32 KiB budget. Confirm the merged manual loads by opening a fresh session in the repo.

  3. Run the A/B demo: the same borough-summary prompt with the manual present, then with it renamed away. Diff the two transcripts and log the divergence in journal/b1-ab.md, the moment the manual paid for itself.
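For the diff step, a small script is enough. The sketch below is one possible way to do it with the standard library, assuming each transcript has been saved as a plain-text file, one line per step; the function names and the transcript file paths are hypothetical, and only journal/b1-ab.md comes from the assignment.

```python
import difflib
from pathlib import Path


def first_divergence(with_manual: list[str], without_manual: list[str]):
    """Return the 0-based index of the first differing line, or None if identical."""
    for i, (a, b) in enumerate(zip(with_manual, without_manual)):
        if a != b:
            return i
    if len(with_manual) != len(without_manual):
        return min(len(with_manual), len(without_manual))
    return None


def ab_report(with_manual: list[str], without_manual: list[str]) -> str:
    """Build a short report: where the runs part ways, then a unified diff."""
    idx = first_divergence(with_manual, without_manual)
    header = "identical transcripts" if idx is None else f"first divergence at line {idx + 1}"
    diff = difflib.unified_diff(
        with_manual, without_manual,
        fromfile="with-manual", tofile="without-manual", lineterm="",
    )
    return "\n".join([header, *diff]) + "\n"


def log_ab(journal: Path, with_path: Path, without_path: Path) -> None:
    """Write the report for two saved transcripts to the journal file."""
    a = with_path.read_text(encoding="utf-8").splitlines()
    b = without_path.read_text(encoding="utf-8").splitlines()
    journal.parent.mkdir(parents=True, exist_ok=True)
    journal.write_text(ab_report(a, b), encoding="utf-8")
```

With the two transcripts saved, `log_ab(Path("journal/b1-ab.md"), Path("a.txt"), Path("b.txt"))` writes the log; the first line of the file records where the two sessions stopped agreeing.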

The committed manual is the substrate every later lesson assumes: B3 fences what it names, C2 enforces it, and every session from here opens already briefed.

Milestone gate · make check-b1 · advances B1
  1. If a line would not prevent a mistake on a typical session, cut it. The manual is expensive real estate, read in full every session.

  2. Same facts, two surfaces — the lab manual does not change because the tool did.

  3. The cap is a forcing function. A manual that no longer fits stopped being terse.

  4. The without-manual run loads the full Parquet and fumbles the zone join — the lesson teaches itself.

Check each item only once it is true of YOUR repo — the gate is self-certified, like the rest of your methodology.

Pitfalls & Gotchas

  • [both]

    Aspirations instead of checkable rules. “We value clean code” is a sentence an agent cannot act on; it consumes context and changes no behavior. Every line of the manual should be a fact a wrong answer would corrupt or a prohibition with a clear test. If you cannot write the check, it is not a rule — it is a mood.

  • [both]

    Stale data dictionaries are worse than none. The agent trusts the manual completely, so a dictionary that says airport_fee when the file now says Airport_fee does not merely fail to help — it actively misleads, and the agent will defend the wrong answer with your own document. Point at the schema of record and keep the pointer fresh; do not restate a schema you will forget to update.

  • [CX]

    Blowing the 32 KiB budget with prose. The cap is a forcing function, not an obstacle: when the root manual bloats, the fix is to push directory-specific detail into nested files, not to mourn the limit. A manual that no longer fits is a manual that stopped being terse.

  • [CC]

    Unaudited auto-memory accumulates wrong conclusions. The agent’s private notebook records inferences with the same confidence as facts, and a conclusion that was true for one session becomes doctrine for fifty. Read it like an RA’s notes: prune what was situational, promote what is invariant into the committed manual.

Check Your Bearings

B1 · 4 questions · unlimited retries, no timer
  1. Question 1 · Choose one

    Two candidate lines for the project's instruction file. Which one earns its place, and why?

  2. Question 2 · Match the dialects

    A fact can live in three places. Match each fact to where it belongs.

    data/raw/ is append-only — never edit a raw file
    The narrative of how the project's sampling frame was chosen
    The full multi-step trip-cleaning procedure
  3. Question 3 · What happens next? · dialect check: Claude Code

    Auditing the agent's auto-memory, you find an entry it wrote for itself three weeks ago. What should you do with it?

    auto-memory (session 2024-06-03):
      "The 2024-02 zone join needs a manual casing fix every time."
  4. Question 4 · Read the config · dialect check: Codex

    The repo-root instruction file has grown past the 32 KiB budget, bloated with rules that only matter inside data/. What is the right fix?

    AGENTS.md          (repo root) — 41 KiB, over budget
      ## Data
      ## data/raw/ append-only rules (long)
      ## data/processed/ rebuild rules (long)
      ## Joins
      ## Time

Field journal

as of June 2026

Parity note

Instruction files are a genuine parity feature: both tools read a plain-text manual at session start, differing in surface — CLAUDE.md with user/project scopes and .claude/rules/ path-scoping on one side, the hierarchical AGENTS.md with AGENTS.override.md and a 32 KiB budget on the other. The real asymmetry is auto-memory: Claude Code keeps a private cross-session notebook the agent writes for itself (powerful and in need of auditing), while Codex has no equivalent and substitutes the discipline of a deliberate end-of-session manual update. One tool makes you audit after; the other makes you commit before.

Ledger — B1

The Lab Roster

Engraved positions, not portraits. A seat fills itself when its lesson is complete.

Your position


Positions

  • the data manager

    Position vacant — engaged at C2

    write-time contract hooks (PreToolUse/PostToolUse + the validation suite)

    est. human-RA: permanent vigilance — est. 2 weeks/year of load-checking and release-note reading
    agent: half a day to install and test the 9-line block; ~20 s per run thereafter

  • the methodologist

    Position vacant — engaged at C1

    the researcher skill library v1 (/clean-trips, /paper-summary, /demanding-adviser) — codified methodology, not macros

    est. human-RA: the judgment lives in one head; transferring it to a new RA costs weeks of shadowing, and leaves when they do
    agent: an afternoon to author three SKILL.md files in both dialects; zero cost per session until invoked

  • the data engineer

    Position vacant — engaged at C3

    MCP connections + the DuckDB warehouse, enrichment joins (weather/events/holidays), and the zone-hour analysis panel

    est. human-RA: days of bespoke glue per source — credentials, retries, schema spelunking, timezone forensics — re-debugged every time a source changes
    agent: register the server once; the agent explores INFORMATION_SCHEMA and builds the panel in a guided session, raw cached for replication

  • the RA pool

    Position vacant — engaged at D1

    parallel subagents with report contracts (EDA + scholarship fleets) + the isolated adviser

    est. human-RA: a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will
    agent: ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes

  • the overnight RA

    Position vacant — engaged at D3

    /loop supervision + Goal Mode runs over background estimation

    est. human-RA: one night shift per estimation batch — and the course runs several batches
    agent: ~10 min to write the check or the objective; the night itself belongs to the machine

  • the adviser

    Position vacant — engaged at D1

    parallel subagents with report contracts (EDA + scholarship fleets) + the isolated adviser

    est. human-RA: a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will
    agent: ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes

  • the referee

    Position vacant — engaged at D4

    contracted fleet fan-out (results contract + provenance) and an isolated adversarial referee

    est. human-RA: the curve is ~2 days of serialized edit-and-fit; the suspicious read of the robustness table is the rarer, senior hour nobody has time for
    agent: 13 lanes fanned out under the cap finish in an afternoon; the referee files its evidenced finding in one isolated pass

  • the lab manager

    Position vacant — engaged at E2

    scheduled/cloud agents — the monthly-ingest routine, stopping at a human-approved PR

    est. human-RA: a recurring monthly chore nobody owns — check the CDN, pull, contract, append, re-estimate — reliably skipped
    agent: ~30 min to define the routine + guardrails once; each month runs unattended and stops at the approval gate

  • the reproducibility checker

    Position vacant — engaged at E1

    headless invocation + the fresh-clone replication self-test + CI gates

    est. human-RA: a clean-room rebuild every few weeks — dull, exacting, and the first thing dropped at submission
    agent: ~20 min to wire scripts/replicate.sh and the gate workflow; the verdict returns in one headless run thereafter

  • the wall — the unstaffed midnight hours between a raw file and a first plot

    Position vacant — engaged at A1

    the bare agent loop (prompt → act → observe → fix), zero configuration

    est. human-RA: an evening or two per messy file — defensive parsing rewritten from scratch each project, rules forgotten by the time they work
    agent: ~10 minutes for the quick win, plus the same task re-run in the other language for free

  • you, working an order of magnitude faster — but only if you direct the work

    Position vacant — engaged at A2

    the command surface + five prompting patterns + context hygiene

    est. human-RA: the slow tax of an undriven session — drifted answers on long investigations, re-runs to find where it went wrong
    agent: ~30 min to learn; thereafter a first-look on one month (3.5M rows) in minutes, with receipts

  • the lab manual nobody writes — the institutional knowledge that lives in your head

    Position vacant — engaged at B1

    instruction files (CLAUDE.md / AGENTS.md) + auto-memory + the A/B demonstration

    est. human-RA: ~30 min re-onboarding every new RA, every time — plus the afternoons lost to landmines no one wrote down
    agent: written once in an hour; reloaded free at the start of every session thereafter

  • the careful senior who plans before touching data

    Position vacant — engaged at B2

    repo scaffold + pinned environments + read-only Plan mode reconnaissance

    est. human-RA: ~1 week at project start (setup, download babysitting, plan review) + the joins redone when structure rots
    agent: an afternoon — most of it download wall-clock, not attention

  • the lab whose members don't overwrite each other

    Position vacant — engaged at D2

    git worktrees — one isolated checkout per agent/session/thread, combined through a deliberate merge

    est. human-RA: the lost afternoon disentangling two agents' colliding edits — and the redo when you reconstruct it wrong the first time
    agent: two commands to create the worktrees; the parallelism runs free; one reviewed merge at the end

  • the onboarding the lab never has to repeat

    Position vacant — engaged at E3

    lab-kit — the whole methodology packaged as a one-command install

    est. human-RA: six weeks of per-member onboarding, rediscovered from scratch every time the lab turns over
    agent: ~half a day to package and smoke-test the kit once; each new member is one install and one prompt

  • the whole lab, orchestrated — the PI who designs the system instead of doing the work

    Position vacant — engaged at F1

    the research loop (/loop ↔ Goal Mode / @codex) orchestrating fleet → referee → headless re-run → regenerated report, under report-don't-act guardrails, a hard budget cap, and a human gate on substantive decisions only

    est. human-RA: each revision is a serialized chain — re-spec, re-estimate, re-table, rewrite the paragraph, re-read the abstract — correct only as of the last manual pass, on a Sunday; a real reviewer round is days of hand-carried edits
    agent: the loop runs two iterations to convergence in one supervised sitting; the human stands at exactly one gate (approve dropping the post-treatment control) while the mechanical fixes proceed unattended

Running Totals

| Lesson | Role | Est. human-RA | Agent (yours when measured) |
| --- | --- | --- | --- |
| A1 | the wall — the unstaffed midnight hours between a raw file and a first plot | an evening or two per messy file — defensive parsing rewritten from scratch each project, rules forgotten by the time they work | ~10 minutes for the quick win, plus the same task re-run in the other language for free |
| A2 | you, working an order of magnitude faster — but only if you direct the work | the slow tax of an undriven session — drifted answers on long investigations, re-runs to find where it went wrong | ~30 min to learn; thereafter a first-look on one month (3.5M rows) in minutes, with receipts |
| B1 | the lab manual nobody writes — the institutional knowledge that lives in your head | ~30 min re-onboarding every new RA, every time — plus the afternoons lost to landmines no one wrote down | written once in an hour; reloaded free at the start of every session thereafter |
| B2 | careful senior who plans before touching data | ~1 week at project start (setup, download babysitting, plan review) + the joins redone when structure rots | an afternoon — most of it download wall-clock, not attention |
| B3 | the data manager who guards the raw files — the person who says no near the master copies | permanent vigilance you cannot staff — one lapse at machine speed costs a month of re-downloads | two profiles configured once in minutes; the fence then holds every session, tired or not |
| C1 | the methodologist — the one person who knows how the lab actually decides | the judgment lives in one head; transferring it to a new RA costs weeks of shadowing, and leaves when they do | an afternoon to author three SKILL.md files in both dialects; zero cost per session until invoked |
| C2 | data manager / QA who never sleeps | permanent vigilance — est. 2 weeks/year of load-checking and release-note reading | half a day to install and test the 9-line block; ~20 s per run thereafter |
| C3 | the data engineer who wires the lab to its systems | days of bespoke glue per source — credentials, retries, schema spelunking, timezone forensics — re-debugged every time a source changes | register the server once; the agent explores INFORMATION_SCHEMA and builds the panel in a guided session, raw cached for replication |
| D1 | the RA pool — and the adviser who critiques from outside | a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will | ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes |
| D2 | the lab whose members don't overwrite each other | the lost afternoon disentangling two agents' colliding edits — and the redo when you reconstruct it wrong the first time | two commands to create the worktrees; the parallelism runs free; one reviewed merge at the end |
| D3 | overnight RA | one night shift per estimation batch — and the course runs several batches | ~10 min to write the check or the objective; the night itself belongs to the machine |
| D4 | an RA bench and the PI who keeps their results comparable | the curve is ~2 days of serialized edit-and-fit; the suspicious read of the robustness table is the rarer, senior hour nobody has time for | 13 lanes fanned out under the cap finish in an afternoon; the referee files its evidenced finding in one isolated pass |
| E1 | reproducibility checker | a clean-room rebuild every few weeks — dull, exacting, and the first thing dropped at submission | ~20 min to wire scripts/replicate.sh and the gate workflow; the verdict returns in one headless run thereafter |
| E2 | lab manager's standing chores | a recurring monthly chore nobody owns — check the CDN, pull, contract, append, re-estimate — reliably skipped | ~30 min to define the routine + guardrails once; each month runs unattended and stops at the approval gate |
| E3 | the onboarding the lab never has to repeat | six weeks of per-member onboarding, rediscovered from scratch every time the lab turns over | ~half a day to package and smoke-test the kit once; each new member is one install and one prompt |
| F1 | the whole lab, orchestrated — the PI who designs the system instead of doing the work | each revision is a serialized chain — re-spec, re-estimate, re-table, rewrite the paragraph, re-read the abstract — correct only as of the last manual pass, on a Sunday; a real reviewer round is days of hand-carried edits | the loop runs two iterations to convergence in one supervised sitting; the human stands at exactly one gate (approve dropping the post-treatment control) while the mechanical fixes proceed unattended |
Positions absorbed 0 of 16

The honest column: every place a human had to step in lives in the Field Journal’s failure log. Your measured hours there override these estimates here.