
A1 beginner ~20 min

Day One: Your First Agent

Advances A1

The Pain

The file came down at midnight, the way these things do, and by one in the morning you had it open and you understood the shape of the next three days. Three and a half million rows of yellow-cab trips, and somewhere in the column called fare there was a charge of negative eight hundred dollars. A trip that covered a hundred and seventy-six thousand miles in thirteen minutes. A pickup timestamped to the last day of 2002, nineteen years before the meter that recorded it existed. Whole rows where the passenger count was blank — not zero, blank — and you had no way yet to know whether that was four hundred thousand rows or forty.

You are the whole lab. You are the methodologist who will eventually estimate how weather moves the demand for rides, and you are also the person who has to sit here at one in the morning deciding, by hand, what counts as a real trip. Nobody hands a graduate student a clean dataset. The cleaning is the dissertation’s foundation, and it is unglamorous, and it is yours, and it is the wall every empirical project hits first: the days that vanish before the first honest plot, the work no methods section ever describes because everyone assumes it was done. You make coffee. You start writing the same defensive parsing code you have written for three previous projects, knowing that by the time it works you will have forgotten why each rule is there. The wall is not the analysis. The wall is everything between you and the analysis, and tonight, as usual, you are facing it alone.

Why / When

An agentic command-line tool is a language model given four things a chat window withholds: a filesystem it can read and write, a shell it can run commands in, persistence so it remembers the project across a session, and a set of tools it chooses among on its own. That combination is a difference in kind, not degree. A chat window can advise you about data it has never touched. An agent reads row 4,217, runs the cleaning script, sees the traceback, and fixes it — the loop this whole course is built on: prompt → act → observe → fix.

Two tools teach this course, and they share that loop while differing in temperament. One is built around a local session and small composable pieces you assemble yourself; the other leans toward delegating whole tasks, with first-class cloud runs. The concepts transfer; only the dialect changes — which is exactly why you learn both. And you learn the honest limits up front, because they are real: agents misread schemas, invent joins that look right, and will happily optimize a metric into meaninglessness if you let them. Unit C answers the first with enforcement; Unit D answers the last with adversarial review. Today is none of that. Today you watch the bare loop work, so you know what you are later making trustworthy.

In the research pipeline this is the very first stage — data cleaning and first contact — and the lab role it absorbs is no single person. It is the wall itself: the unstaffed midnight hours between a raw file and a defensible first plot.

Contrary winds

Not for: a number you can get from one line of SQL you already know — opening an agent to compute a single mean is ceremony, not leverage.

Mechanics

Today is deliberately configuration-free. You install one of the two tools, authenticate, and point it at a mangled file. No instruction files, no settings, no skills — those arrive across Units B and C. The bare loop first.

What these tools are

Both tools are the same animal: a model with hands. You type a request in plain language; the model plans, calls a tool (read a file, run a command, write a patch), reads what came back, and decides what to do next — looping until the work is done or it needs you. Two controls matter on day one and are common to both tools, under different names:

  • An approval prompt / approval mode — the agent pauses before a consequential action (writing a file, installing a package, running a destructive command) and waits for your y/n. This is your hand on the tiller. Deny anything you do not understand.
  • A model and reasoning-effort setting — which model drives, and how hard it thinks. The default is fine today; A2 makes this a daily habit.

Before you run anything, watch one full turn of the loop in slow motion. This is the “what just happened” view of everything you are about to see scroll past — the sentence becoming a tool call becoming an observation becoming the next decision.

One Agent Turn (figure): a user prompt reaches the agent; the agent calls a tool, reads the observation, and loops, tool call after tool call, until the turn ends in an artifact.

Step 1 of 6.

It starts as a sentence, not a script. You describe the artifact you want — clean the February file and plot the fare distribution — and press enter. That sentence is the whole program.

That single turn, repeated until the work is done, is the entire mechanism. Everything else in this course makes that turn safer, cheaper, or more trustworthy.
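A toy simulation makes that turn concrete. It is not how either tool works inside: the "model" here is a hypothetical scripted planner (`plan_next`) that ignores the wording of the prompt, and `read_file` and `write_file` are illustrative stand-ins for the real tools, operating on an in-memory dict rather than a disk. What it does show is the shape of the loop: choose a tool, run it, feed the observation into the next decision, and stop when the planner says the work is done.

```python
def read_file(path: str, files: dict) -> str:
    """Illustrative 'read' tool over an in-memory filesystem."""
    return files.get(path, f"ERROR: {path} not found")

def write_file(path: str, content: str, files: dict) -> str:
    """Illustrative 'write' tool; a real agent would pause for approval here."""
    files[path] = content
    return f"wrote {path}"

def plan_next(history: list) -> tuple:
    """Hypothetical stand-in for the model: choose the next action from past observations."""
    if not history:
        return "read", {"path": "data/messy.csv"}
    last_action, last_obs = history[-1]
    if last_action == "read" and not last_obs.startswith("ERROR"):
        rows = last_obs.strip().splitlines()[1:]          # skip the header
        neg = sum(1 for r in rows if float(r.split(",")[1]) < 0)
        return "write", {"path": "cleaned_summary.md",
                         "content": f"negative fares: {neg}\n"}
    return "done", {}                                      # nothing left to do (or a read failed)

def run_turn(files: dict, max_steps: int = 10) -> list:
    """Loop: act (call a tool), observe (record its output), decide again."""
    history = []
    for _ in range(max_steps):
        action, args = plan_next(history)
        if action == "done":
            break
        if action == "read":
            obs = read_file(args["path"], files)
        else:
            obs = write_file(args["path"], args["content"], files)
        history.append((action, obs))                      # the next decision sees this
    return history

files = {"data/messy.csv": "trip_id,fare\n1,12.5\n2,-800\n3,9.0\n"}
log = run_turn(files)
print([action for action, _ in log])        # ['read', 'write']
print(files["cleaned_summary.md"], end="")  # negative fares: 1
```

Replacing `plan_next` with a real model call is, roughly, the gap between this toy and the tools; the approval pause would sit in front of `write_file`.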

Install and authenticate

Pick the tool you will follow the course in — you can install the other later; the concepts are identical. Install, authenticate, and read the opening banner, because the banner tells you the two things that matter: that you are running unconfigured, and what the agent may do without asking.

Claude Code

install + first launch — Claude Code
npm install -g @anthropic-ai/claude-code
cd scratch/day-one # a throwaway folder holding only data/messy.csv
claude # opens a session; first run walks you through login

The first launch sends you to the browser to authenticate, then drops you at a prompt inside the current directory. The banner notes there is no CLAUDE.md here — no project instructions — so the agent is running on defaults. That is the point of day one. When the agent later wants to write a file, you will see an approval prompt like Apply edit to clean_taxi.py? (y/n); that pause is where you stay in control. claude --resume brings a past session back; you will not need it today.

Codex

install + first launch — Codex
npm install -g @openai/codex
cd scratch/day-one # a throwaway folder holding only data/messy.csv
codex # opens a session; first run walks you through sign-in

The first launch authenticates you (browser sign-in or an API key), then opens a session in the current directory. The banner reports its sandbox and approval mode — typically sandbox: workspace-write · approvals: on-request. Read that pair literally: the sandbox bounds where the agent may act (this directory tree, not your whole machine), and the approval mode sets when it consults you (on consequential actions, like writing clean_taxi.py). Two independent dials; B3 turns both into real safety profiles. codex exec resume continues a run headlessly; not needed today.

The dialect differs — login flow, banner wording, the resume command — but the session you are now sitting in front of is the same loop in both.

The quick win

Here is the file the agent is about to meet. These are not invented horrors; every row is verbatim from the raw 2024 yellow-cab files — the wall from the Pain vignette, made concrete.

What the meter actually wrote down
Seven verbatim rows from the raw 2024-02/2024-03 yellow files: a 2002 timestamp, a −$800 fare, a 0-second trip, a 176,836-mile odometer reading, a NULL passenger count, and a trip spanning the nonexistent 02:00 DST hour.
the numbers behind this figure

data window 2024-02, 2024-03, 2024-06 (yellow taxi; local time America/New_York)

generated by figures-pipeline/src/figures.py · a1-mangled-before

ok 1 row

SELECT tpep_pickup_datetime, tpep_dropoff_datetime, passenger_count, trip_distance, fare_amount, total_amount FROM trips_raw WHERE file_month='2024-02' AND tpep_pickup_datetime BETWEEN TIMESTAMP '2024-02-14 09:00' AND TIMESTAMP '2024-02-14 09:05' AND passenger_count = 1 AND fare_amount BETWEEN 5 AND 30 AND trip_distance BETWEEN 0.5 AND 5 ORDER BY tpep_pickup_datetime LIMIT 1

misdated 1 row

SELECT tpep_pickup_datetime, tpep_dropoff_datetime, passenger_count, trip_distance, fare_amount, total_amount FROM trips_raw WHERE file_month='2024-03' AND tpep_pickup_datetime < TIMESTAMP '2024-01-01' ORDER BY tpep_pickup_datetime LIMIT 1

neg_fare 1 row

SELECT tpep_pickup_datetime, tpep_dropoff_datetime, passenger_count, trip_distance, fare_amount, total_amount FROM trips_raw WHERE file_month='2024-03' AND fare_amount = -800 AND trip_distance = 0 LIMIT 1

zero_sec 1 row

SELECT tpep_pickup_datetime, tpep_dropoff_datetime, passenger_count, trip_distance, fare_amount, total_amount FROM trips_raw WHERE file_month='2024-03' AND tpep_dropoff_datetime = tpep_pickup_datetime AND trip_distance > 1 ORDER BY tpep_pickup_datetime LIMIT 1

speed 1 row

SELECT tpep_pickup_datetime, tpep_dropoff_datetime, passenger_count, trip_distance, fare_amount, total_amount FROM trips_raw WHERE file_month='2024-03' ORDER BY trip_distance DESC LIMIT 1

null_pass 1 row

SELECT tpep_pickup_datetime, tpep_dropoff_datetime, passenger_count, trip_distance, fare_amount, total_amount FROM trips_raw WHERE file_month='2024-03' AND passenger_count IS NULL AND fare_amount > 0 ORDER BY tpep_pickup_datetime LIMIT 1

dst 1 row

SELECT tpep_pickup_datetime, tpep_dropoff_datetime, passenger_count, trip_distance, fare_amount, total_amount FROM trips_raw WHERE file_month='2024-03' AND tpep_pickup_datetime BETWEEN TIMESTAMP '2024-03-10 01:00' AND TIMESTAMP '2024-03-10 01:59:59' AND tpep_dropoff_datetime >= TIMESTAMP '2024-03-10 03:00' ORDER BY tpep_pickup_datetime LIMIT 1

honesty note All rows verbatim from the raw files; nothing synthesized.
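Each query above isolates one kind of trouble. The following is a minimal standard-library sketch of the same checks expressed as per-row flags. The column names follow the queries. The function name `flag_row`, the 65 mph ceiling (borrowed from the cleaning cascade's `implied_mph <= 65` rule), and the generalization of `misdated` from "before 2024-01-01" to "outside the file's month" are illustrative assumptions, not the course's actual rules. Timestamps are treated as naive local times, as they appear in the raw files.

```python
from datetime import datetime, timedelta

# 2024 spring-forward in New York: local clocks jump from 02:00 to 03:00 on March 10.
DST_GAP_START = datetime(2024, 3, 10, 2, 0)
DST_GAP_END = datetime(2024, 3, 10, 3, 0)

def flag_row(row: dict, file_month: str, max_mph: float = 65.0) -> list:
    """Names of the (hypothetical) specimen rules a raw trip row trips."""
    pu, do = row["tpep_pickup_datetime"], row["tpep_dropoff_datetime"]
    flags = []
    if pu.strftime("%Y-%m") != file_month:
        flags.append("misdated")        # e.g. a 2002 pickup in the 2024-03 file
    if row["fare_amount"] < 0:
        flags.append("neg_fare")
    if do == pu and row["trip_distance"] > 1:
        flags.append("zero_sec")        # more than a mile in zero seconds
    hours = (do - pu).total_seconds() / 3600
    if hours > 0 and row["trip_distance"] / hours > max_mph:
        flags.append("speed")
    if row["passenger_count"] is None:
        flags.append("null_pass")
    if DST_GAP_START - timedelta(hours=1) <= pu < DST_GAP_START and do >= DST_GAP_END:
        flags.append("dst")             # naive subtraction overstates this trip by an hour
    return flags

t = datetime(2024, 3, 5, 9, 0)
toy_rows = [  # invented for illustration, not rows from the raw files
    {"tpep_pickup_datetime": t, "tpep_dropoff_datetime": t + timedelta(minutes=12),
     "passenger_count": 1, "trip_distance": 2.1, "fare_amount": 14.2},
    {"tpep_pickup_datetime": t, "tpep_dropoff_datetime": t,
     "passenger_count": None, "trip_distance": 3.0, "fare_amount": -5.0},
    {"tpep_pickup_datetime": datetime(2024, 3, 10, 1, 55),
     "tpep_dropoff_datetime": datetime(2024, 3, 10, 3, 5),
     "passenger_count": 2, "trip_distance": 1.5, "fare_amount": 10.0},
]
for r in toy_rows:
    print(flag_row(r, "2024-03"))
```

A row can trip several rules at once, which is why a counted cascade reports removals in a fixed order rather than simply summing flags.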

Now point the agent at it. The discipline, which A2 will name formally, is already visible in the prompt: you point at the file, you do not paste its rows; and you demand artifacts — a written summary and a plot on disk, not a verdict in the scrollback.

Claude Code

the whole prompt
> Clean data/messy.csv. Count every problem you find, drop rows by
documented rules, and write a cleaned summary to cleaned_summary.md
plus one plot of trips by hour to plot.png.

The agent reads the head of the file first (a few hundred tokens reveal the delimiters, the dtypes, and the first specimens of trouble), proposes a small cleaning script, and pauses for your approval before writing it. You approve; it runs; it reports counts per rule. The interactive run below is that turn by turn — drive it yourself.

Codex

the whole prompt
> Clean data/messy.csv. Count every problem you find, drop rows by
documented rules, and write a cleaned summary to cleaned_summary.md
plus one plot of trips by hour to plot.png.

The agent reads the head of the file first (a few hundred tokens reveal the delimiters, the dtypes, and the first specimens of trouble), proposes a small cleaning script, and, under the on-request approval mode, pauses before writing it. You approve; it runs; it reports counts per rule. The prompt is identical to the Claude Code tab: same destination, same artifacts; only the approval surface around the write differs.
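The first move in both tools is cheap reconnaissance: read a few lines, infer the delimiter and header, and only then plan. Here is a rough standard-library equivalent of that peek, built on `csv.Sniffer`. The function name `peek` and the five-line sample are assumptions, and the agents may read the head differently (for example with `head` in the shell).

```python
import csv
import io
from itertools import islice

def peek(stream, n_lines: int = 5) -> dict:
    """Read only the first n_lines of a CSV stream and infer its shape."""
    sample = "".join(islice(stream, n_lines))
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(sample, delimiters=",;\t|")
    has_header = sniffer.has_header(sample)
    rows = list(csv.reader(io.StringIO(sample), dialect))
    return {
        "delimiter": dialect.delimiter,
        "has_header": has_header,
        "columns": rows[0] if has_header else None,
        "sample_rows": rows[1:] if has_header else rows,
    }

toy = io.StringIO(  # invented rows, including a blank passenger_count
    "tpep_pickup_datetime,fare_amount,passenger_count\n"
    "2024-03-05 09:00:00,14.20,1\n"
    "2024-03-05 09:10:00,-800.00,\n"
    "2024-03-05 09:20:00,9.00,2\n"
    "2024-03-05 09:30:00,10.00,1\n"
    "2024-03-05 09:40:00,11.50,1\n"   # sixth line: never read with n_lines=5
)
info = peek(toy)
print(info["delimiter"], info["has_header"], info["columns"])
```

On the real file the call would be `with open("data/messy.csv", newline="") as f: peek(f)`. The point of the design is that nothing beyond the sample is ever loaded, which is also why pointing at the path beats pasting rows.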

What you get back is the same file, cleaned and counted — every removal named, the worst offenders quoted, the survivors plotted:

The same month, after the documented cascade
2024-03 cleaning ledger (3,582,628 → 3,521,703 rows, every removal counted) beside the cleaned Manhattan demand curve by hour of day.
the numbers behind this figure

data window 2024-02, 2024-03, 2024-06 (yellow taxi; local time America/New_York)

generated by figures-pipeline/src/figures.py · a1-cleaned-after

march_cascade

SELECT count(*) AS raw,
  count(*) FILTER (WHERE ok_month) AS s1,
  count(*) FILTER (WHERE ok_month AND ok_fare) AS s2,
  count(*) FILTER (WHERE ok_month AND ok_fare AND ok_duration) AS s3,
  count(*) FILTER (WHERE ok_month AND ok_fare AND ok_duration
                     AND implied_mph <= 65) AS s4
FROM trips_flagged WHERE file_month='2024-03'
raw 3,582,628
s1 3,582,605
s2 3,524,141
s3 3,523,019
s4 3,521,703
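The cascade is cumulative: each stage keeps only the rows that pass every earlier rule, so the rows removed by each rule are the differences between consecutive stage counts. A short Python check, using only the counts printed above, confirms that the per-stage removals add up to the 3,582,628 → 3,521,703 headline. The stage labels paraphrase the `ok_*` flags, whose exact definitions are not shown here.

```python
# Stage counts exactly as printed by the march_cascade query.
stages = [
    ("raw", 3_582_628),
    ("s1: ok_month", 3_582_605),
    ("s2: + ok_fare", 3_524_141),
    ("s3: + ok_duration", 3_523_019),
    ("s4: + implied_mph <= 65", 3_521_703),
]

# Each stage is a subset of the one before, so removals are consecutive differences.
removed = {later[0]: earlier[1] - later[1] for earlier, later in zip(stages, stages[1:])}

for name, n in removed.items():
    print(f"{name:<26} removed {n:>7,}")
total_removed = sum(removed.values())
assert stages[0][1] - total_removed == stages[-1][1]   # the ledger reconciles
print(f"total removed: {total_removed:,} ({total_removed / stages[0][1]:.2%} of raw)")
```

The fare rule dominates, which is exactly the kind of fact a receipt should surface before you accept the cleaned file.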

manhattan_hourly 24 rows · 3,148,474 trips total

SELECT hour(tpep_pickup_datetime) AS hh, count(*) AS trips
FROM trips_clean t JOIN zones z ON z.location_id = t.PULocationID
WHERE t.file_month = '2024-03' AND z.borough = 'Manhattan'
GROUP BY 1 ORDER BY 1
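The `manhattan_hourly` query combines a join, a filter, and a group-by. The sketch below expresses the same logic in plain Python on toy rows invented for illustration (not data from the course slice): it looks up each trip's pickup zone, keeps the Manhattan pickups, and counts them by hour. The field names mirror the SQL. The tiny zone table and its IDs are illustrative assumptions standing in for the real zone lookup.

```python
from collections import Counter
from datetime import datetime

# Toy stand-in for the zone lookup: location_id -> borough (IDs are illustrative).
zones = {1: "Manhattan", 2: "Brooklyn", 3: "Manhattan"}

# Toy cleaned trips: (pickup datetime, PULocationID).
trips_clean = [
    (datetime(2024, 3, 5, 8, 15), 1),
    (datetime(2024, 3, 5, 8, 40), 3),
    (datetime(2024, 3, 5, 8, 50), 2),    # Brooklyn: dropped by the borough filter
    (datetime(2024, 3, 5, 17, 5), 1),
    (datetime(2024, 3, 5, 17, 30), 99),  # no matching zone: an inner JOIN drops it too
]

def manhattan_hourly(trips, zone_borough):
    """JOIN on location id, keep Manhattan pickups, GROUP BY pickup hour."""
    counts = Counter(
        pickup.hour
        for pickup, loc in trips
        if zone_borough.get(loc) == "Manhattan"
    )
    return sorted(counts.items())        # ORDER BY hour

print(manhattan_hourly(trips_clean, zones))  # [(8, 2), (17, 1)]
```

Unlike the full month, which returned all 24 hours, this sketch only lists hours that actually have trips; empty hours would need to be filled in explicitly before plotting.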

The receipts matter more than the plot. The agent did not silently delete the −$800 fare; it reported “82 negative fares (worst: −$800.00 on a 0.00-mile trip)” and left a script you can read and defend. That is the difference between an edit and a finding.

Ask in Python, then in R

The lab is bilingual, and the agent does not care which language it works in. Re-ask for the same task in the other language and watch the verdicts come back identical — the same counts, rule for rule, with dplyr filtering where pandas masked.

Python

the same task, in Python
> Now do the same cleaning task again, this time in Python — same drop
rules, same counted summary.

This block is orchestration, not statistics — it’s the same in R. Ask the agent to translate (Lesson A1).

R

the same task, in R
> Now do the same cleaning task again, this time in R — same drop rules,
same counted summary.

This is the language policy, stated once for the whole course: your statistics live in Python or R, and the agentic skills you are learning transfer untouched between them. The R toggle in this site’s header works the same way — flip it and the statistical code rewrites; the lesson does not. This course teaches the tools, not the languages.
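Confirming that "the verdicts match" is itself mechanical: compare the per-rule counts each run reports and list any disagreement. The sketch below is in Python and assumes, hypothetically, that you asked each run to end with its counts in a small `rule,count` CSV. That file layout and the function names are illustrations, not something either tool produces by default. The 82 and 98 echo the receipts quoted in this lesson; the R line is deliberately perturbed to show a mismatch.

```python
import csv
import io

def read_counts(stream) -> dict:
    """Parse a hypothetical 'rule,count' summary into {rule: count}."""
    return {row["rule"]: int(row["count"]) for row in csv.DictReader(stream)}

def compare_runs(py_counts: dict, r_counts: dict) -> list:
    """One line per rule whose counts differ, or that only one run reports."""
    problems = []
    for rule in sorted(py_counts.keys() | r_counts.keys()):
        a, b = py_counts.get(rule), r_counts.get(rule)
        if a != b:
            problems.append(f"{rule}: python={a} r={b}")
    return problems

python_run = io.StringIO("rule,count\nneg_fare,82\nzero_dist_paid,98\n")
r_run = io.StringIO("rule,count\nneg_fare,82\nzero_dist_paid,97\n")  # perturbed on purpose
mismatches = compare_runs(read_counts(python_run), read_counts(r_run))
print(mismatches or "verdicts match")
```

An empty list is the parity you are looking for; any line it prints is a rule whose definition drifted in translation.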

The research project

Everything from here builds one project: Weather and the Demand for Urban Mobility. The question is plain — when the weather turns, who still rides? — and the answer is a report. The data is twenty-four months of New York yellow and green taxi trips joined to weather; the deliverable is a reproducible estimate of how rain, snow, and heat move demand across the city’s zones. The messy file you just cleaned is one sample month of it.

Clone the starter kit and you have the project’s skeleton — the directory contract every later lesson assumes, an empty journal/ for the receipts you are about to start keeping, and a Makefile whose make check-a1 … check-f1 targets are the milestones you will tick off one unit at a time. The one command python3 get_data.py then fetches the fixed course slice into ./data/: the 2024 yellow-taxi parquet months, the zone lookup, and the NYC hourly weather every later lesson is built on.

clone the kit, then fetch the data
git clone https://github.com/junwei-lu/agentic-datascience-course-kit.git
cd agentic-datascience-course-kit && python3 get_data.py

No clone needed? The same slice is available with zero install via DuckDB over HTTP, or by curling just the script; see “Get the data” for every path.

Guided Run — The Ten-Minute Quick Win


Field Assignment

Artifact: quick-win transcripts saved to journal/; starter repo cloned

Get hired. By the end you have both the muscle memory of one full loop and the project that the rest of the course advances.

Claude Code

  1. Install Claude Code, authenticate, and launch claude in a scratch folder holding only data/messy.csv. Confirm the banner reports no CLAUDE.md — you are running the bare loop on purpose.
  2. Run the quick win: clean data/messy.csv into cleaned_summary.md and plot.png, approving the cleaning script when prompted. Read the counts per rule before you accept them.
  3. Re-ask for the same task in the other language (Python ↔ R) and confirm the verdicts match.
  4. Ask the agent to save this run’s summary to journal/quick-win.md.
  5. Clone the starter repo for Weather and the Demand for Urban Mobility.

Codex

  1. Install Codex, authenticate, and launch codex in a scratch folder holding only data/messy.csv. Read the banner’s sandbox + approval line — you are running the bare loop on purpose.
  2. Run the quick win: clean data/messy.csv into cleaned_summary.md and plot.png, approving the cleaning patch on-request. Read the counts per rule before you accept them.
  3. Re-ask for the same task in the other language (Python ↔ R) and confirm the verdicts match.
  4. Ask the agent to save this run’s summary to journal/quick-win.md.
  5. Clone the starter repo for Weather and the Demand for Urban Mobility.

The artifact is the saved transcript and the cloned repo. It feeds A2, where you stop watching the loop and start directing it — and where the journal/ you just opened becomes a standing discipline.

Milestone gate · make check-a1 · advances A1
  1. Day one is the zero-configuration 'before' picture on purpose.

  2. Read the receipts — negative fares, zero-distance paid trips, NULL passenger counts — before accepting.

Check each item only once it is true of YOUR repo — the gate is self-certified, like the rest of your methodology.
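The gate is self-certified, but its file-existence half can be checked mechanically. Below is a hypothetical sketch of such a check. It is not the starter kit's `make check-a1` target, whose contents this lesson does not show. It looks for the artifacts this lesson asks for, relative to wherever you ran the quick win, and reports which ones are missing or empty.

```python
from pathlib import Path
import tempfile

# Artifacts this lesson names; treating them as the gate's file list is an assumption.
EXPECTED = ["cleaned_summary.md", "plot.png", "journal/quick-win.md"]

def missing_artifacts(root: Path, expected=EXPECTED) -> list:
    """Return expected paths that are absent or empty under root."""
    return [p for p in expected
            if not (root / p).is_file() or (root / p).stat().st_size == 0]

# Demo in a throwaway directory: only the summary exists so far.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "cleaned_summary.md").write_text("negative fares: 82\n")
    result = missing_artifacts(root)
    print(result)   # ['plot.png', 'journal/quick-win.md']
```

Existence is the easy half. Whether the counts in those files are ones you can defend is still yours to certify.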

Pitfalls & Gotchas

  • [both]

    Accepting the cleaned file without reading the counts. The whole value of the quick win is the receipt — “82 negative fares, 98 zero-distance paid trips” — not the tidy output. A cleaning you cannot describe rule-by-rule is a cleaning you cannot defend in a methods section, and the agent will produce a confident, plausible, undocumented one if you let it.

  • [both]

    Pasting rows of the CSV into the prompt instead of pointing at the file. It burns context, it loses provenance, and it caps the agent at whatever you happened to copy. Point at the path; let the agent read row 4,217 itself.

  • [both]

    Treating day one as proof the agent is trustworthy. It is not — it is proof the loop works. Agents misread schemas and invent joins; you watched a clean run, not a guaranteed one. The trust is built across the next five units, not asserted here.

  • [both]

    Clicking through approval prompts without reading them. The pause before a write is the only place you stay in control on day one. An approval you grant reflexively is a file you did not actually authorize — and on a real project that is how data/raw/ gets edited “just this once.”

Check Your Bearings

A1 · 4 questions · unlimited retries, no timer

This check opens when the guided simulation above is complete — the questions assume you have seen the run.


Field journal

as of June 2026

Parity note

Day one is genuine parity. Both tools install with one command, authenticate through the browser, open a session inside your working directory, and run the same prompt → act → observe → fix loop against the same mangled file to the same cleaned, counted result. The differences are dialect: the login flow, the banner wording, the resume command, and how the pre-write pause is framed — an approval prompt on one side, an approval mode layered over an OS sandbox on the other. Those surfaces diverge more as the course goes on; the loop underneath does not.

Ledger — A1

The Lab Roster

Engraved positions, not portraits. A seat fills itself when its lesson is complete.

Your position

Seats, one per lesson: A1 · A2 · B1 · B2 · B3 · C1 · C2 · C3 · D1 · D2 · D3 · D4 · E1 · E2 · E3 · F1 (units A–F)

Positions

  • the data manager

    Position vacant — engaged at C2

    write-time contract hooks (PreToolUse/PostToolUse + the validation suite)

    est. human-RA: permanent vigilance, est. 2 weeks/year of load-checking and release-note reading
    agent: half a day to install and test the 9-line block; ~20 s per run thereafter

  • the methodologist

    Position vacant — engaged at C1

    the researcher skill library v1 (/clean-trips, /paper-summary, /demanding-adviser) — codified methodology, not macros

    est. human-RA: the judgment lives in one head; transferring it to a new RA costs weeks of shadowing, and it leaves when they do
    agent: an afternoon to author three SKILL.md files in both dialects; zero cost per session until invoked

  • the data engineer

    Position vacant — engaged at C3

    MCP connections + the DuckDB warehouse, enrichment joins (weather/events/holidays), and the zone-hour analysis panel

    est. human-RA: days of bespoke glue per source (credentials, retries, schema spelunking, timezone forensics), re-debugged every time a source changes
    agent: register the server once; the agent explores INFORMATION_SCHEMA and builds the panel in a guided session, raw cached for replication

  • the RA pool

    Position vacant — engaged at D1

    parallel subagents with report contracts (EDA + scholarship fleets) + the isolated adviser

    est. human-RA: a week of breadth EDA across boroughs and slices, plus a literature pass, and no honest outside critic you can summon at will
    agent: ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes

  • the overnight RA

    Position vacant — engaged at D3

    /loop supervision + Goal Mode runs over background estimation

    est. human-RA: one night shift per estimation batch, and the course runs several batches
    agent: ~10 min to write the check or the objective; the night itself belongs to the machine

  • the adviser

    Position vacant — engaged at D1

    parallel subagents with report contracts (EDA + scholarship fleets) + the isolated adviser

    est. human-RA: a week of breadth EDA across boroughs and slices, plus a literature pass, and no honest outside critic you can summon at will
    agent: ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes

  • the referee

    Position vacant — engaged at D4

    contracted fleet fan-out (results contract + provenance) and an isolated adversarial referee

    est. human-RA: the curve is ~2 days of serialized edit-and-fit; the suspicious read of the robustness table is the rarer, senior hour nobody has time for
    agent: 13 lanes fanned out under the cap finish in an afternoon; the referee files its evidenced finding in one isolated pass

  • the lab manager

    Position vacant — engaged at E2

    scheduled/cloud agents — the monthly-ingest routine, stopping at a human-approved PR

    est. human-RA: a recurring monthly chore nobody owns (check the CDN, pull, contract, append, re-estimate), reliably skipped
    agent: ~30 min to define the routine + guardrails once; each month runs unattended and stops at the approval gate

  • the reproducibility checker

    Position vacant — engaged at E1

    headless invocation + the fresh-clone replication self-test + CI gates

    est. human-RA: a clean-room rebuild every few weeks: dull, exacting, and the first thing dropped at submission
    agent: ~20 min to wire scripts/replicate.sh and the gate workflow; the verdict returns in one headless run thereafter

  • the wall: the unstaffed midnight hours between a raw file and a first plot

    Position vacant — engaged at A1

    the bare agent loop (prompt → act → observe → fix), zero configuration

    est. human-RA: an evening or two per messy file: defensive parsing rewritten from scratch each project, rules forgotten by the time they work
    agent: ~10 minutes for the quick win, plus the same task re-run in the other language for free

  • you, working an order of magnitude faster, but only if you direct the work

    Position vacant — engaged at A2

    the command surface + five prompting patterns + context hygiene

    est. human-RA: the slow tax of an undriven session: drifted answers on long investigations, re-runs to find where it went wrong
    agent: ~30 min to learn; thereafter a first-look on one month (3.5M rows) in minutes, with receipts

  • the lab manual nobody writes: the institutional knowledge that lives in your head

    Position vacant — engaged at B1

    instruction files (CLAUDE.md / AGENTS.md) + auto-memory + the A/B demonstration

    est. human-RA: ~30 min re-onboarding every new RA, every time, plus the afternoons lost to landmines no one wrote down
    agent: written once in an hour; reloaded free at the start of every session thereafter

  • the careful senior who plans before touching data

    Position vacant — engaged at B2

    repo scaffold + pinned environments + read-only Plan mode reconnaissance

    est. human-RA: ~1 week at project start (setup, download babysitting, plan review) + the joins redone when structure rots
    agent: an afternoon, most of it download wall-clock, not attention

  • the lab whose members don't overwrite each other

    Position vacant — engaged at D2

    git worktrees — one isolated checkout per agent/session/thread, combined through a deliberate merge

    est. human-RA: the lost afternoon disentangling two agents' colliding edits, and the redo when you reconstruct it wrong the first time
    agent: two commands to create the worktrees; the parallelism runs free; one reviewed merge at the end

  • the onboarding the lab never has to repeat

    Position vacant — engaged at E3

    lab-kit — the whole methodology packaged as a one-command install

    est. human-RA: six weeks of per-member onboarding, rediscovered from scratch every time the lab turns over
    agent: ~half a day to package and smoke-test the kit once; each new member is one install and one prompt

  • the whole lab, orchestrated: the PI who designs the system instead of doing the work

    Position vacant — engaged at F1

    the research loop (/loop ↔ Goal Mode / @codex) orchestrating fleet → referee → headless re-run → regenerated report, under report-don't-act guardrails, a hard budget cap, and a human gate on substantive decisions only

    est. human-RA: each revision is a serialized chain (re-spec, re-estimate, re-table, rewrite the paragraph, re-read the abstract), correct only as of the last manual pass, on a Sunday; a real reviewer round is days of hand-carried edits
    agent: the loop runs two iterations to convergence in one supervised sitting; the human stands at exactly one gate (approve dropping the post-treatment control) while the mechanical fixes proceed unattended

Running Totals

Lesson | Role | Est. human-RA | Agent (yours when measured)
A1 | the wall: the unstaffed midnight hours between a raw file and a first plot | an evening or two per messy file; defensive parsing rewritten from scratch each project, rules forgotten by the time they work | ~10 minutes for the quick win, plus the same task re-run in the other language for free
A2 | you, working an order of magnitude faster, but only if you direct the work | the slow tax of an undriven session: drifted answers on long investigations, re-runs to find where it went wrong | ~30 min to learn; thereafter a first-look on one month (3.5M rows) in minutes, with receipts
B1 | the lab manual nobody writes: the institutional knowledge that lives in your head | ~30 min re-onboarding every new RA, every time, plus the afternoons lost to landmines no one wrote down | written once in an hour; reloaded free at the start of every session thereafter
B2 | careful senior who plans before touching data | ~1 week at project start (setup, download babysitting, plan review) + the joins redone when structure rots | an afternoon, most of it download wall-clock, not attention
B3 | the data manager who guards the raw files: the person who says no near the master copies | permanent vigilance you cannot staff; one lapse at machine speed costs a month of re-downloads | two profiles configured once in minutes; the fence then holds every session, tired or not
C1 | the methodologist: the one person who knows how the lab actually decides | the judgment lives in one head; transferring it to a new RA costs weeks of shadowing, and it leaves when they do | an afternoon to author three SKILL.md files in both dialects; zero cost per session until invoked
C2 | data manager / QA who never sleeps | permanent vigilance: est. 2 weeks/year of load-checking and release-note reading | half a day to install and test the 9-line block; ~20 s per run thereafter
C3 | the data engineer who wires the lab to its systems | days of bespoke glue per source (credentials, retries, schema spelunking, timezone forensics), re-debugged every time a source changes | register the server once; the agent explores INFORMATION_SCHEMA and builds the panel in a guided session, raw cached for replication
D1 | the RA pool, and the adviser who critiques from outside | a week of breadth EDA across boroughs and slices, plus a literature pass, and no honest outside critic you can summon at will | ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes
D2 | the lab whose members don't overwrite each other | the lost afternoon disentangling two agents' colliding edits, and the redo when you reconstruct it wrong the first time | two commands to create the worktrees; the parallelism runs free; one reviewed merge at the end
D3 | overnight RA | one night shift per estimation batch, and the course runs several batches | ~10 min to write the check or the objective; the night itself belongs to the machine
D4 | an RA bench and the PI who keeps their results comparable | the curve is ~2 days of serialized edit-and-fit; the suspicious read of the robustness table is the rarer, senior hour nobody has time for | 13 lanes fanned out under the cap finish in an afternoon; the referee files its evidenced finding in one isolated pass
E1 | reproducibility checker | a clean-room rebuild every few weeks: dull, exacting, and the first thing dropped at submission | ~20 min to wire scripts/replicate.sh and the gate workflow; the verdict returns in one headless run thereafter
E2 | lab manager's standing chores | a recurring monthly chore nobody owns (check the CDN, pull, contract, append, re-estimate), reliably skipped | ~30 min to define the routine + guardrails once; each month runs unattended and stops at the approval gate
E3 | the onboarding the lab never has to repeat | six weeks of per-member onboarding, rediscovered from scratch every time the lab turns over | ~half a day to package and smoke-test the kit once; each new member is one install and one prompt
F1 | the whole lab, orchestrated: the PI who designs the system instead of doing the work | each revision is a serialized chain (re-spec, re-estimate, re-table, rewrite the paragraph, re-read the abstract), correct only as of the last manual pass, on a Sunday; a real reviewer round is days of hand-carried edits | the loop runs two iterations to convergence in one supervised sitting; the human stands at exactly one gate (approve dropping the post-treatment control) while the mechanical fixes proceed unattended
Positions absorbed 0 of 16

The honest column: every place a human had to step in lives in the Field Journal’s failure log. Your measured hours there override these estimates here.