C3 intermediate ~60 min

Plugging Into Data Systems

Absorbs: the data engineer

Advances C3

The Pain

The research question has been answerable in principle since week two: does weather move demand for taxis? In practice it has been answerable by nobody, because the answer lives in five systems that do not speak to each other. The trips are a heap of monthly Parquet files. The weather is an API that returns hourly JSON if you phrase the date range in exactly the right dialect, and only the columns you remembered to ask for. The events are a city open-data portal with a query language someone designed in 2013. The holidays are a library. The zones are a lookup table and, when you want a map, a shapefile.

A lab would hand this to the data engineer — the person who knows that the weather archive caps at a date range, that the events portal paginates at a thousand rows, that the timestamps come back in three different timezones and one of them is lying. She would build the seams once, cache every raw response so the joins reproduce in February when the API has quietly changed its mind, and hand you a single table you could actually regress on. Then she would go on sabbatical, and the seams would rot, and the next person would rebuild them slightly wrong.

You have spent two days writing glue. The glue works until it doesn’t — a rate limit here, a renamed field there — and every time it breaks, the analysis stops while you re-learn a system you touch twice a year. The data is all there. It is simply not anywhere you can use it.

Why / When

Research data lives in systems — databases, APIs, the open web — and the obstacle is rarely the analysis; it is the reach. MCP (the Model Context Protocol) is a standard socket between an agent and those systems: you register a server once, and its operations appear to the agent as native tools, the way a built-in file-read does. A DuckDB server means the agent queries the warehouse directly instead of shelling out and parsing text; a database server means credentials are held by the connection, not pasted into a prompt.

The honest boundary matters as much as the capability. The agent can already run a CLI through the shell — B2 had it download 24 months with plain curl and query DuckDB at a shell prompt. MCP earns its keep exactly where the shell strains: persistent connections the agent returns to across many calls without re-establishing state, credentialed systems where you want the secret in the connection and not the transcript, and non-CLI services that have no command-line at all. Where a single shell line already reaches the system, a protocol around it is ceremony.

In the pipeline, this is the data-acquisition and integration stage, and the role it absorbs is the data engineer: the seams built once, the raw responses cached so the joins reproduce, the warehouse and the analysis panel that turn five systems into one table you can regress on.

Contrary winds

Not for: a system you already reach in one shell line — the agent can run `duckdb` or `curl` directly (B2's lesson); wiring a protocol around a CLI you already have is plumbing nobody asked for.

Mechanics

The shared model first, then registration in each dialect, then the three things the project actually builds: the warehouse, the enrichment joins, and the panel.

The MCP model, and the honest sidebar

An MCP server is a process that exposes a set of operations — query this database, fetch this URL, screenshot this page — over a uniform protocol. The agent discovers them at startup and calls them like any other tool; you register the server once and it is there every session. That is the whole idea, and the discipline is knowing when not to reach for it:

Reach for MCP when… | Stay with the shell when…
the connection is persistent — many calls, shared state | one call, one CLI line does it
the system is credentialed — keep the secret in the connection | the data is open, no auth
the service is non-CLI — a browser, a SaaS API, a live DB | a curl or duckdb invocation already reaches it

DuckDB sits right on the line, and it is instructive that it does. The agent can run duckdb warehouse.duckdb -c "…" at a shell prompt all day. A DuckDB MCP server earns its place only because the warehouse is a persistent connection the agent returns to dozens of times in a session — keeping one handle open beats re-opening the database on every query and re-parsing text output each time. If your interaction is one query, the shell is the right tool and the protocol is overhead.

Registering a server

The server you run is identical; the registration surface differs.

Claude Code

Register a server in .mcp.json at the repo root — committed, so it arrives with the clone — or add it from the command line with claude mcp add:

.mcp.json — the warehouse connection, committed
{
  "mcpServers": {
    "duckdb": {
      "command": "uvx",
      "args": ["mcp-server-motherduck", "--db-path", "warehouse.duckdb"]
    }
  }
}

A project-scoped server registers per project, which is the discipline that matters: a server added globally loads its tools into every session you ever run, paying context and startup cost on projects that never touch a warehouse. Register the DuckDB server in this repo’s .mcp.json, not your user config, and the next project starts clean.

Codex

Register a server under [mcp_servers] in config.toml — the project-layer team config, committed with the repo — or add it from the command line with codex mcp add:

config.toml — the warehouse connection, committed
[mcp_servers.duckdb]
command = "uvx"
args = ["mcp-server-motherduck", "--db-path", "warehouse.duckdb"]

Keep the registration in the project-layer config rather than your personal ~/.codex/config.toml: a server declared globally loads its tools into every session, paying context and startup cost on projects that never open a warehouse. Project-scoped registration keeps the next session you start somewhere else clean.

The warehouse

With the DuckDB server registered, the agent builds the warehouse by querying the connection directly. The first move is reconnaissance, not construction: have it explore INFORMATION_SCHEMA so it learns the real schema from the files instead of assuming one — the B2 plan-first habit, applied to a live system. Then it materializes the raw table straight from the TLC CDN with read_parquet — the same public slice the kit mirrors, which you can also read in place over DuckDB-over-HTTP (Get the data) — and registers C2’s cleaned table as a view, so the contracted cleaning SOP is the only path into analysis:

building trips_raw, then the cleaned view
-- 1. Learn the schema from the source, don't assume it.
DESCRIBE SELECT * FROM read_parquet('data/raw/yellow_2024-03.parquet');
-- 2. Materialize 24 months; union_by_name absorbs the schema drift C2 maps.
CREATE TABLE trips_raw AS
SELECT * FROM read_parquet('data/raw/yellow_*.parquet', union_by_name = true);
-- 3. Register C2's cleaning SOP as the only path into analysis.
CREATE VIEW trips_clean AS SELECT * FROM standardize(trips_raw);

union_by_name = true is the quiet hero: it aligns columns across 24 files that do not all agree on order or presence — a second layer under C2’s rename map against exactly the casing drift that cost eleven days in that lesson. Belt, and suspenders.
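To make the alignment concrete, here is a plain-Python analogy of what stacking by column name does. It is not DuckDB's implementation, only a sketch of the idea: the helper name and the two one-row "files" are invented for illustration, and real monthly files will differ.

```python
def union_by_name(*tables: list[dict]) -> list[dict]:
    """Stack row-dicts, aligning on column name and filling gaps with None."""
    columns: list[str] = []
    for table in tables:
        for row in table:
            for col in row:
                if col not in columns:
                    columns.append(col)
    # Every output row carries every column; absent values become None (NULL).
    return [{col: row.get(col) for col in columns} for table in tables for row in table]


# Two hypothetical monthly files that disagree on column order and presence.
jan = [{"VendorID": 1, "fare_amount": 12.5}]
feb = [{"fare_amount": 8.0, "VendorID": 2, "congestion_surcharge": 2.5}]

stacked = union_by_name(jan, feb)
print(stacked[0])  # {'VendorID': 1, 'fare_amount': 12.5, 'congestion_surcharge': None}
```

A positional union would have put February's fare into January's VendorID column without complaint; matching by name is what keeps that mistake from being silent.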

Enrichment joins, cached raw then derived

The trips are now one table; the question needs four more sources joined to them. Two come from live systems — Open-Meteo for hourly weather, the Socrata portal for permitted events — and the discipline for both is the same and non-negotiable: cache the raw response, then derive. Every API payload is checksummed into data/raw/api/ before any transform touches it, because the API you query in June is not the API that answers in February, and a join you cannot reproduce is a result you cannot defend — these joins get interrogated by the referee in D4.

Python

scripts/enrich/weather.py — cache raw, then derive
import hashlib, json, pathlib, httpx

RAW = pathlib.Path("data/raw/api")

def fetch_weather(start: str, end: str) -> dict:
    # One call per archive window; the response is the artifact of record.
    r = httpx.get("https://archive-api.open-meteo.com/v1/archive", params={
        "latitude": 40.71, "longitude": -74.01, "start_date": start,
        "end_date": end, "hourly": "temperature_2m,precipitation,snowfall,wind_speed_10m",
        "timezone": "America/New_York",  # ask for local time explicitly
    })
    r.raise_for_status()
    body = r.content
    # Checksum the raw bytes into data/raw/api/ BEFORE any parsing.
    RAW.mkdir(parents=True, exist_ok=True)  # the cache directory may not exist on a fresh clone
    digest = hashlib.sha256(body).hexdigest()[:16]
    (RAW / f"weather_{start}_{end}.{digest}.json").write_bytes(body)
    return json.loads(body)  # derive only from what we just cached

This block is orchestration, not statistics — it’s the same in R. Ask the agent to translate (Lesson A1).

R

R/enrich/weather.R — cache raw, then derive
library(httr2); library(jsonlite); library(digest)

fetch_weather <- function(start, end) {
  # One call per archive window; the response is the artifact of record.
  resp <- request("https://archive-api.open-meteo.com/v1/archive") |>
    req_url_query(latitude = 40.71, longitude = -74.01,
                  start_date = start, end_date = end,
                  hourly = "temperature_2m,precipitation,snowfall,wind_speed_10m",
                  timezone = "America/New_York") |>  # ask for local time explicitly
    req_perform()
  body <- resp_body_raw(resp)
  # Checksum the raw bytes into data/raw/api/ BEFORE any parsing.
  dir.create("data/raw/api", recursive = TRUE, showWarnings = FALSE)  # fresh clones lack the cache dir
  digest16 <- substr(digest(body, algo = "sha256", serialize = FALSE), 1, 16)
  writeBin(body, file.path("data/raw/api",
                           sprintf("weather_%s_%s.%s.json", start, end, digest16)))
  fromJSON(rawToChar(body))  # derive only from what we just cached
}

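The events join carries one more trap worth naming: Socrata paginates. A single request returns at most one page (a thousand rows on this portal), so an unpaged fetch silently truncates the events table; page until the portal returns a short page, and cache every page as it arrives. The holidays come from a library and the zones from a lookup table, but they get the same treatment: weather_hourly, events, holidays, and zones each land as a cached-raw-then-derived table before any of them touches the panel. The rule is uniform: nothing enters the warehouse that was not first written to data/raw/api/ under its checksum.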
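A paginated source applies the same cache-then-derive rule once per page. The sketch below is one way to structure that loop, not the course's script: fetch_all_pages and the injected fetch_page callable are hypothetical names, the filename pattern simply mirrors the weather example, and the real portal call (on Socrata, typically the $limit and $offset query parameters) is left to whatever fetch_page you supply.

```python
import hashlib
import json
import pathlib
from typing import Callable


def fetch_all_pages(
    fetch_page: Callable[[int, int], bytes],
    raw_dir: pathlib.Path,
    name: str,
    page_size: int = 1000,
) -> list[dict]:
    """Page through a source, caching each raw page before parsing it."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    rows: list[dict] = []
    offset = 0
    while True:
        body = fetch_page(offset, page_size)  # raw bytes, exactly as returned
        digest = hashlib.sha256(body).hexdigest()[:16]
        (raw_dir / f"{name}_{offset:08d}.{digest}.json").write_bytes(body)
        page = json.loads(body)  # derive only from what was just cached
        rows.extend(page)
        if len(page) < page_size:  # a short page means the source is exhausted
            return rows
        offset += page_size
```

Injecting fetch_page keeps the loop testable without the network: hand it a function that slices a local list and the paging and caching logic can be checked before it ever meets a rate limit.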

Building the analysis panel, and the DST that everyone gets wrong

The panel is a zone × hour grid: one row per zone per hour across the whole window, demand attached, weather and events and holidays joined on. Two properties make it honest, and both are easy to get wrong.

First, zero-demand cells are real data. An hour in which a zone saw no pickups is a 0, not a missing row — and a 3 a.m. outer-borough zone is zero far more often than not. Build the grid as a CROSS JOIN of zones and an hour spine, LEFT JOIN the demand onto it, and coalesce the nulls to zero; an inner join would silently drop exactly the low-demand cells a demand model most needs to see.
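A toy version of the grid shows the difference in row counts. This sketch uses Python's built-in sqlite3 rather than DuckDB so it runs anywhere; the table names and the two-zone, two-hour data are invented, but the CROSS JOIN, LEFT JOIN, and COALESCE pattern is the one described above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE zones(zone_id INTEGER);
    CREATE TABLE hours(ts TEXT);
    CREATE TABLE pickups(zone_id INTEGER, ts TEXT, n INTEGER);
    INSERT INTO zones VALUES (1), (2);
    INSERT INTO hours VALUES ('2024-03-01 02:00'), ('2024-03-01 03:00');
    -- Only one zone-hour cell saw any demand.
    INSERT INTO pickups VALUES (1, '2024-03-01 02:00', 7);
""")

# Grid first, demand left-joined, nulls coalesced to zero.
panel = con.execute("""
    SELECT z.zone_id, h.ts, COALESCE(p.n, 0) AS pickups
    FROM zones AS z
    CROSS JOIN hours AS h
    LEFT JOIN pickups AS p
           ON p.zone_id = z.zone_id AND p.ts = h.ts
    ORDER BY z.zone_id, h.ts
""").fetchall()

# The inner-join version keeps only cells that had trips.
inner = con.execute("""
    SELECT z.zone_id, h.ts, p.n
    FROM zones AS z
    CROSS JOIN hours AS h
    JOIN pickups AS p
      ON p.zone_id = z.zone_id AND p.ts = h.ts
""").fetchall()

print(len(panel), len(inner))  # 4 1
```

Three of the four cells are zeros, and the inner join drops all three. The total demand is identical in both results, which is why the loss is easy to miss until a model is fit on it.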

Second — DST, and not the trap you were warned about. The folklore says spring-forward creates phantom timestamps. It does not, at least not here: the TLC meters got the 2024-03-10 spring-forward right, and zero trips are stamped inside the 02:00 hour that the local clock skips. The real trap is two-headed:

  • The hour spine. America/New_York has 743 local hours in March 2024, not 744 — the 02:00 hour does not exist. A naive spine of 31 × 24 invents a phantom hour no trip can ever fill, and your panel carries a row of structural zeros that are not zeros, they are nothing. Build the spine in local time and it self-corrects: 743 hours, matching the data.

  • Gap-spanning durations. This is the one that bites the duration workstream. A trip from 01:03 to 03:01 on spring-forward night reads 118 clock-minutes by naive subtraction — but only 58 minutes actually elapsed; the 02:00 hour never happened. Compute durations from wall-clock fields and every gap-spanning trip is overstated by exactly sixty minutes. Compute them in UTC (or from epoch differences) and they are true. The panel’s spine lives in local time — hour-of-day effects are the research question — but durations live in UTC. Each timezone job gets its own correct answer; using one answer for both is the silent bug.
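Both heads of the trap can be reproduced with the standard library. The sketch below assumes Python 3.9+ with system timezone data available; the helper names are hypothetical. It also shows a Python-specific wrinkle: subtracting two aware datetimes that share the same tzinfo object ignores their UTC offsets, so the naive wall-clock answer appears even though both values are timezone-aware.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")


def local_hour_spine(year: int, month: int) -> list[datetime]:
    """Every local hour of a month, generated by stepping in UTC."""
    start = datetime(year, month, 1, tzinfo=NY).astimezone(timezone.utc)
    nxt = (year + 1, 1) if month == 12 else (year, month + 1)
    end = datetime(nxt[0], nxt[1], 1, tzinfo=NY).astimezone(timezone.utc)
    hours = []
    t = start
    while t < end:
        hours.append(t.astimezone(NY))  # label each real hour with its local time
        t += timedelta(hours=1)
    return hours


def elapsed_minutes(start_local: datetime, end_local: datetime) -> float:
    """True elapsed minutes: convert to UTC before subtracting."""
    delta = end_local.astimezone(timezone.utc) - start_local.astimezone(timezone.utc)
    return delta.total_seconds() / 60


spine = local_hour_spine(2024, 3)
print(len(spine))  # 743

pickup = datetime(2024, 3, 10, 1, 3, tzinfo=NY)
dropoff = datetime(2024, 3, 10, 3, 1, tzinfo=NY)
wall_clock = (dropoff - pickup).total_seconds() / 60  # same tzinfo: offsets ignored
true_elapsed = elapsed_minutes(pickup, dropoff)
print(wall_clock, true_elapsed)  # 118.0 58.0
```

Stepping the spine in UTC and labeling in local time is one way to get a spine that cannot contain the skipped 02:00 hour, since only hours that actually occurred are ever generated.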

The night an hour vanished
Citywide hourly pickups around the 2024-03-10 spring-forward: by the local clock there is a hole at 02:00 (the hour does not exist, and contains 0 trips); in UTC the same night is seamless.

The numbers behind this figure

Data window: 2024-02, 2024-03, 2024-06 (yellow taxi; local time America/New_York)

Generated by: figures-pipeline/src/figures.py · c3-dst

dst_window (35 rows)

SELECT ts_local, ts_utc, sum(pickups) AS pickups
FROM panel_zone_hour
WHERE ts_local >= TIMESTAMP '2024-03-09 12:00'
  AND ts_local <= TIMESTAMP '2024-03-10 23:00'
GROUP BY 1, 2 ORDER BY 1

phantom_hour_trips (count = 0)

SELECT count(*)
FROM trips_raw
WHERE tpep_pickup_datetime >= '2024-03-10 02:00'
  AND tpep_pickup_datetime <  '2024-03-10 03:00'

Honesty note: March has 743 panel hours, not 744. No trips were stamped inside the phantom hour; the meters got DST right even though the analyst often doesn't.

Watch the panel materialize, the snowstorm and all. The heatmap below is one week of the real panel — Manhattan, hour × day — straight from analysis_panel with the zero cells included. The trench cut through the middle of the week is the February 13 snowstorm, which dropped 19 cm and cut Manhattan demand by 45 percent. (Rain, counterintuitively, cuts the other way: heavy-rain hours run about 62 percent above baseline — when it pours, people who would have walked take a cab. The weather scatter in Unit D makes that case in full.)

One week of the panel: Manhattan, hour × day
Demand by hour × day for Manhattan, week of 2024-02-12, straight from panel_zone_hour (zero cells included). The Feb 13 snowstorm (19.1 cm) cuts a visible trench through the week.

The numbers behind this figure

Data window: 2024-02, 2024-03, 2024-06 (yellow taxi; local time America/New_York)

Generated by: figures-pipeline/src/figures.py · c3-panel-heat

heat (168 rows)

SELECT date_trunc('day', ts_local) AS day, hour(ts_local) AS hh,
       sum(pickups) AS pickups
FROM panel_zone_hour
WHERE borough = 'Manhattan'
  AND ts_local >= TIMESTAMP '2024-02-12'
  AND ts_local <  TIMESTAMP '2024-02-19'
GROUP BY 1, 2 ORDER BY 1, 2

snow_feb13_cm (value = 19.11)

Sources without an API — the in-app browser

Most of the project’s sources have an API or a CLI. The day one does not — a permit calendar that exists only as a rendered web page, a portal that hides its data behind a form — you need the agent to see a page, not fetch it. Here the tools genuinely diverge, so this is a spotlight, not a tab: one tool owns the capability and the other reaches it by a labeled workaround.

Codex

In-app browser / computer use

Codex can drive a real browser inside the session — load a page, read what renders, fill a form, click through, and capture what it sees (Appshots). For a source that publishes only as HTML — a borough permit calendar with no JSON endpoint — the agent navigates it the way you would, extracts the rows, and (the C3 discipline holds) writes the raw captured page into data/raw/api/ under a checksum before deriving an events supplement from it. The browsing is native: no separate server to register, the page is just another surface the agent can act on.

Claude Code has no built-in browser — the nearest equivalent is the Playwright MCP server.

Nearest equivalent — Claude Code

Claude Code reaches the same capability through MCP itself: register the Playwright MCP server and the agent gains tools to open a page, read the rendered DOM, screenshot it, and interact. It is the same job — see a page that has no API, extract the rows, cache the raw capture under a checksum, derive the events supplement — reached through a server you register rather than a capability built into the session. The seam is the registration: one more entry in .mcp.json, and the agent that builds your warehouse can also read the web pages your warehouse needs.

Watch this space Browser access is converging fast on both roadmaps; recheck quarterly.

Guided Run — Building the Warehouse

Field Assignment

Artifact: make check-c3 passes

Build the warehouse, wire the enrichment, and materialize the analysis panel — all of it under the B3 pipeline profile and the C2 contracts, so every write that builds the panel passes the gate that protects it.

Claude Code

  1. Register the DuckDB server in .mcp.json (project-scoped) and confirm its tools load at session start.
  2. Have the agent explore INFORMATION_SCHEMA, then build warehouse.duckdb from the TLC CDN via read_parquet and register C2’s trips_clean as a view.
  3. Build weather_hourly, events, holidays, and zones — each cached-raw-then-derived, every API response checksummed into data/raw/api/ before any transform.
  4. Build analysis_panel: a complete zone × hour grid, zero-cells included via LEFT JOIN + coalesce, with the DST handling above — a 743-hour local spine for March, durations computed in UTC.
  5. make check-c3.

Codex

  1. Register the DuckDB server under [mcp_servers] in config.toml (project-scoped) and confirm its tools load at session start.
  2. Have the agent explore INFORMATION_SCHEMA, then build warehouse.duckdb from the TLC CDN via read_parquet and register C2’s trips_clean as a view.
  3. Build weather_hourly, events, holidays, and zones — each cached-raw-then-derived, every API response checksummed into data/raw/api/ before any transform.
  4. Build analysis_panel: a complete zone × hour grid, zero-cells included via LEFT JOIN + coalesce, with the DST handling above — a 743-hour local spine for March, durations computed in UTC.
  5. make check-c3.

make check-c3 validates schemas, zero-cell completeness, the DST hour counts (743 for March, which loses an hour; 721 for November, which has 30 days and gains one), and that every enrichment table has a cached raw response under its checksum. This panel is what D1–D4 estimate on: every elasticity, every event study, every robustness spec reads from the table you build here, which is why its zero-cells and its DST handling have to be right exactly once.
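The cached-raw part of that validation is easy to approximate by hand. The sketch below is not the course's make check-c3; it is a minimal stand-in that assumes the filename convention used by fetch_weather (a label, then the first 16 hex characters of the SHA-256, then .json), and the function name stale_payloads is hypothetical.

```python
import hashlib
import pathlib


def stale_payloads(raw_dir: pathlib.Path) -> list[str]:
    """Names of cached files whose bytes no longer match the digest in their name."""
    bad = []
    for path in sorted(raw_dir.glob("*.json")):
        parts = path.name.split(".")
        # Expect <label>.<16-hex-digest>.json, the convention assumed above.
        if len(parts) < 3:
            bad.append(path.name)
            continue
        expected = parts[-2]
        actual = hashlib.sha256(path.read_bytes()).hexdigest()[:16]
        if actual != expected:
            bad.append(path.name)
    return bad
```

An empty list from stale_payloads(pathlib.Path("data/raw/api")) means every payload on disk is still the one that was checksummed at fetch time; anything else was edited, truncated, or saved without a digest.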

Milestone gate · make check-c3 · advances C3
  1. The agent explores INFORMATION_SCHEMA first — it should learn the schema, not assume it.

  2. Every API response checksummed into data/raw/api/ BEFORE any transform — cache-raw-then-derive.

  3. An hour with no pickups is a real 0, not a missing row; left-join the grid, coalesce to 0.

  4. March has 743 panel hours, not 744; a gap-spanning trip's clock minutes overstate its true minutes.

  5. C3 builds on C2's gate — the panel is only worth plumbing because every write passed it.

Check each item only once it is true of YOUR repo — the gate is self-certified, like the rest of your methodology.

Pitfalls & Gotchas

  • [both]

    Deriving from an API response without caching the raw payload first. The Open-Meteo archive and the Socrata portal both revise — silently, on their own schedule — and a join you built against last month’s response cannot be reproduced against this month’s. Checksum the raw bytes into data/raw/api/ before any parse; the cached payload is the artifact of record the referee will ask for in D4.

  • [both]

    Using one timezone answer for two timezone jobs. A local-hour spine is correct for hour-of-day demand and wrong for trip durations; a UTC duration is correct for elapsed time and wrong for “what hour was this.” The gap-spanning trip that reads 118 clock-minutes but elapsed 58 is the classic silent bug — the panel’s DST checks exist to catch exactly it.

  • [both]

    Building the panel with an inner join and losing the zeros. An hour a zone saw no pickups is a 0, and a demand model that never sees its zeros is estimating on a truncated sample without knowing it. Grid first, demand left-joined, nulls coalesced to zero — the completeness is the data, not a formatting nicety.

  • [both]

    MCP servers with write access to systems you care about. A server that can write your warehouse is a server that can corrupt it; scope its credentials per B3’s least-privilege profiles, and prefer a read-only connection for any server whose only job is to be queried. The protocol’s convenience is not a reason to hand it your keys.

  • [CC]

    A globally registered server piles its tools into every session. The DuckDB or Playwright server you add to your user config loads — and costs context and startup — on every project, including the ones with no warehouse and no web sources. Register servers per project in the repo’s .mcp.json, and the next project starts clean.

Check Your Bearings

C3 · 4 questions · unlimited retries, no timer

This check opens when the guided simulation above is complete — the questions assume you have seen the run.

as of June 2026

Parity note

MCP is a genuine parity feature: both tools speak the same protocol and register servers the same way in substance, differing only in surface — .mcp.json / claude mcp add on one side, [mcp_servers] in config.toml / codex mcp add on the other. The asymmetry is at the edge of the lesson, not its center: seeing a web page that has no API is native to Codex (its in-app browser and computer use) and reached in Claude Code through MCP itself (the Playwright server) — same job, one built in and one registered. Both converge quickly; the warehouse and the enrichment joins, which are the milestone, are identical either way.

Ledger — C3

The Lab Roster

Engraved positions, not portraits. A seat fills itself when its lesson is complete.

Positions

  • the data manager

    Position vacant — engaged at C2

    write-time contract hooks (PreToolUse/PostToolUse + the validation suite)

    est. human-RA: permanent vigilance — est. 2 weeks/year of load-checking and release-note reading agent: half a day to install and test the 9-line block; ~20 s per run thereafter

  • the methodologist

    Position vacant — engaged at C1

    the researcher skill library v1 (/clean-trips, /paper-summary, /demanding-adviser) — codified methodology, not macros

    est. human-RA: the judgment lives in one head; transferring it to a new RA costs weeks of shadowing, and leaves when they do agent: an afternoon to author three SKILL.md files in both dialects; zero cost per session until invoked

  • the data engineer

    Position vacant — engaged at C3

    MCP connections + the DuckDB warehouse, enrichment joins (weather/events/holidays), and the zone-hour analysis panel

    est. human-RA: days of bespoke glue per source — credentials, retries, schema spelunking, timezone forensics — re-debugged every time a source changes agent: register the server once; the agent explores INFORMATION_SCHEMA and builds the panel in a guided session, raw cached for replication

  • the RA pool

    Position vacant — engaged at D1

    parallel subagents with report contracts (EDA + scholarship fleets) + the isolated adviser

    est. human-RA: a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will agent: ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes

  • the overnight RA

    Position vacant — engaged at D3

    /loop supervision + Goal Mode runs over background estimation

    est. human-RA: one night shift per estimation batch — and the course runs several batches agent: ~10 min to write the check or the objective; the night itself belongs to the machine

  • the adviser

    Position vacant — engaged at D1

    parallel subagents with report contracts (EDA + scholarship fleets) + the isolated adviser

    est. human-RA: a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will agent: ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes

  • the referee

    Position vacant — engaged at D4

    contracted fleet fan-out (results contract + provenance) and an isolated adversarial referee

    est. human-RA: the curve is ~2 days of serialized edit-and-fit; the suspicious read of the robustness table is the rarer, senior hour nobody has time for agent: 13 lanes fanned out under the cap finish in an afternoon; the referee files its evidenced finding in one isolated pass

  • the lab manager

    Position vacant — engaged at E2

    scheduled/cloud agents — the monthly-ingest routine, stopping at a human-approved PR

    est. human-RA: a recurring monthly chore nobody owns — check the CDN, pull, contract, append, re-estimate — reliably skipped agent: ~30 min to define the routine + guardrails once; each month runs unattended and stops at the approval gate

  • the reproducibility checker

    Position vacant — engaged at E1

    headless invocation + the fresh-clone replication self-test + CI gates

    est. human-RA: a clean-room rebuild every few weeks — dull, exacting, and the first thing dropped at submission agent: ~20 min to wire scripts/replicate.sh and the gate workflow; the verdict returns in one headless run thereafter

  • the wall — the unstaffed midnight hours between a raw file and a first plot

    Position vacant — engaged at A1

    the bare agent loop (prompt → act → observe → fix), zero configuration

    est. human-RA: an evening or two per messy file — defensive parsing rewritten from scratch each project, rules forgotten by the time they work agent: ~10 minutes for the quick win, plus the same task re-run in the other language for free

  • you, working an order of magnitude faster — but only if you direct the work

    Position vacant — engaged at A2

    the command surface + five prompting patterns + context hygiene

    est. human-RA: the slow tax of an undriven session — drifted answers on long investigations, re-runs to find where it went wrong agent: ~30 min to learn; thereafter a first-look on one month (3.5M rows) in minutes, with receipts

  • the lab manual nobody writes — the institutional knowledge that lives in your head

    Position vacant — engaged at B1

    instruction files (CLAUDE.md / AGENTS.md) + auto-memory + the A/B demonstration

    est. human-RA: ~30 min re-onboarding every new RA, every time — plus the afternoons lost to landmines no one wrote down agent: written once in an hour; reloaded free at the start of every session thereafter

  • the careful senior who plans before touching data

    Position vacant — engaged at B2

    repo scaffold + pinned environments + read-only Plan mode reconnaissance

    est. human-RA: ~1 week at project start (setup, download babysitting, plan review) + the joins redone when structure rots agent: an afternoon — most of it download wall-clock, not attention

  • the lab whose members don't overwrite each other

    Position vacant — engaged at D2

    git worktrees — one isolated checkout per agent/session/thread, combined through a deliberate merge

    est. human-RA: the lost afternoon disentangling two agents' colliding edits — and the redo when you reconstruct it wrong the first time agent: two commands to create the worktrees; the parallelism runs free; one reviewed merge at the end

  • the onboarding the lab never has to repeat

    Position vacant — engaged at E3

    lab-kit — the whole methodology packaged as a one-command install

    est. human-RA: six weeks of per-member onboarding, rediscovered from scratch every time the lab turns over agent: ~half a day to package and smoke-test the kit once; each new member is one install and one prompt

  • the whole lab, orchestrated — the PI who designs the system instead of doing the work

    Position vacant — engaged at F1

    the research loop (/loop ↔ Goal Mode / @codex) orchestrating fleet → referee → headless re-run → regenerated report, under report-don't-act guardrails, a hard budget cap, and a human gate on substantive decisions only

    est. human-RA: each revision is a serialized chain — re-spec, re-estimate, re-table, rewrite the paragraph, re-read the abstract — correct only as of the last manual pass, on a Sunday; a real reviewer round is days of hand-carried edits agent: the loop runs two iterations to convergence in one supervised sitting; the human stands at exactly one gate (approve dropping the post-treatment control) while the mechanical fixes proceed unattended

Running Totals

Lesson | Role | Est. human-RA | Agent (yours when measured)
A1 | the wall — the unstaffed midnight hours between a raw file and a first plot | an evening or two per messy file — defensive parsing rewritten from scratch each project, rules forgotten by the time they work | ~10 minutes for the quick win, plus the same task re-run in the other language for free
A2 | you, working an order of magnitude faster — but only if you direct the work | the slow tax of an undriven session — drifted answers on long investigations, re-runs to find where it went wrong | ~30 min to learn; thereafter a first-look on one month (3.5M rows) in minutes, with receipts
B1 | the lab manual nobody writes — the institutional knowledge that lives in your head | ~30 min re-onboarding every new RA, every time — plus the afternoons lost to landmines no one wrote down | written once in an hour; reloaded free at the start of every session thereafter
B2 | careful senior who plans before touching data | ~1 week at project start (setup, download babysitting, plan review) + the joins redone when structure rots | an afternoon — most of it download wall-clock, not attention
B3 | the data manager who guards the raw files — the person who says no near the master copies | permanent vigilance you cannot staff — one lapse at machine speed costs a month of re-downloads | two profiles configured once in minutes; the fence then holds every session, tired or not
C1 | the methodologist — the one person who knows how the lab actually decides | the judgment lives in one head; transferring it to a new RA costs weeks of shadowing, and leaves when they do | an afternoon to author three SKILL.md files in both dialects; zero cost per session until invoked
C2 | data manager / QA who never sleeps | permanent vigilance — est. 2 weeks/year of load-checking and release-note reading | half a day to install and test the 9-line block; ~20 s per run thereafter
C3 | the data engineer who wires the lab to its systems | days of bespoke glue per source — credentials, retries, schema spelunking, timezone forensics — re-debugged every time a source changes | register the server once; the agent explores INFORMATION_SCHEMA and builds the panel in a guided session, raw cached for replication
D1 | the RA pool — and the adviser who critiques from outside | a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will | ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes
D2 | the lab whose members don't overwrite each other | the lost afternoon disentangling two agents' colliding edits — and the redo when you reconstruct it wrong the first time | two commands to create the worktrees; the parallelism runs free; one reviewed merge at the end
D3 | overnight RA | one night shift per estimation batch — and the course runs several batches | ~10 min to write the check or the objective; the night itself belongs to the machine
D4 | an RA bench and the PI who keeps their results comparable | the curve is ~2 days of serialized edit-and-fit; the suspicious read of the robustness table is the rarer, senior hour nobody has time for | 13 lanes fanned out under the cap finish in an afternoon; the referee files its evidenced finding in one isolated pass
E1 | reproducibility checker | a clean-room rebuild every few weeks — dull, exacting, and the first thing dropped at submission | ~20 min to wire scripts/replicate.sh and the gate workflow; the verdict returns in one headless run thereafter
E2 | lab manager's standing chores | a recurring monthly chore nobody owns — check the CDN, pull, contract, append, re-estimate — reliably skipped | ~30 min to define the routine + guardrails once; each month runs unattended and stops at the approval gate
E3 | the onboarding the lab never has to repeat | six weeks of per-member onboarding, rediscovered from scratch every time the lab turns over | ~half a day to package and smoke-test the kit once; each new member is one install and one prompt
F1 | the whole lab, orchestrated — the PI who designs the system instead of doing the work | each revision is a serialized chain — re-spec, re-estimate, re-table, rewrite the paragraph, re-read the abstract — correct only as of the last manual pass, on a Sunday; a real reviewer round is days of hand-carried edits | the loop runs two iterations to convergence in one supervised sitting; the human stands at exactly one gate (approve dropping the post-treatment control) while the mechanical fixes proceed unattended
Positions absorbed 0 of 16

The honest column: every place a human had to step in lives in the Field Journal’s failure log. Your measured hours there override these estimates here.