An expedition in sixteen lessons

Atlas of Agents

Claude Code & Codex for data-science research

Begin the survey ↓
Atlas of Agents — a surveyor's map of downtown Manhattan, the Battery up to Central Park, traced from the course's taxi-zone shapefile; six unit landmarks on a dotted expedition route with sixteen lesson stops.Six unit landmarks on a dotted expedition route down the island of Manhattan; the itinerary ledger after the map lists the same sixteen lessons in order.Unit A — Day One: The Harbor. Arrival; you hire your first hand at the dock.Lesson A1A1Lesson A2A2Unit B — Onboarding: The Library. The Lab Manual is written; the rare-books vault holds the guardrails.Lesson B1B1Lesson B2B2Lesson B3B3Unit C — The Methods: The Waterworks. SOPs codified; contracts filter at the pipe; plumbed into the data systems.Lesson C1C1Lesson C2C2Lesson C3C3Unit D — The Lab at Scale: Grand Central. RAs dispatched; parallel tracks; overnight trains; the fleet timetable and the referee.Lesson D1D1Lesson D2D2Lesson D3D3Lesson D4D4Unit E — Autopilot: The Lighthouse. The beacon mechanism; the keeper’s schedule; the lamp design, shipped.Lesson E1E1Lesson E2E2Lesson E3E3Unit F — The Payoff: The Printing House. The press that runs itself — research orchestrated on a loop, refereed, shipped as a report.Lesson F1F1
Skip the curiosities

Curiosities

Storm Front

A storm front off the Narrows. The research question begins here: when the weather turns, who still rides?

Weather Vane

The weather vane — Open-Meteo, the survey’s sky-watcher. Hourly observations, no key required.

Sea Serpent

HERE BE OUTLIERS — negative fares, 90-mph crossings, trips that end before they begin. The deep water of every taxi month.

Pigeon Post

The pigeon post — an MCP courier. Letters pass between tools that never met, and arrive in the same envelope format.

Ticker Tape

The ticker at the Printing House — results wired out as fast as the press can set them. Mind what you publish.

Survey Balloon

The Central Park station has watched the sky since 1869 — every storm in the demand panel passed over this meadow.

Barometer

The barometer falls before the storm — and that fall is the experiment. D3 runs an event study around each weather shock, reading demand the days before and after, the storm standing in for the trial nobody could run on purpose.

Pilot Chart

A pilot chart of the whole survey — sixteen stops, six landmarks, one expedition. The itinerary ledger below the map is the real route; this drawn world is the flourish laid over it. Follow either; arrive the same.

Bridge Toll

The toll-house on the bridge approach — every crossing pays a fare. The demand panel counts those fares: each East-River span is a column in the join, each trip a row that already settled up.

El Train

The elevated railway over the waterfront — the line that never touches the street. The agents run on rails like these: dispatched overnight, refereed at the platform, arriving while the city sleeps.

View the full map — or take the itinerary below; they are the same route.

The Preface

You are a lone researcher chartering a lab to survey how weather moves a city — when the sky turns, who still rides? The staff you cannot hire — the data manager, the methodologist, the RA pool, the referee — you will configure instead: coding agents, taught your project, bound by your rules, auditable to the last row count. The survey runs in order. You hire your first hand at the Harbor and feel the loop work (Unit A). You onboard it at the Library — the lab manual, a reproducible home, the vault rules (Unit B). At the Waterworks you codify the methods until advice becomes physics (Unit C). You scale to a fleet on Grand Central’s parallel tracks (Unit D), set the Lighthouse running unattended (Unit E), and bring it all together at the Printing House: a press that runs itself — research orchestrated on a loop, refereed, and shipped as a report (Unit F). Two tools taught side by side, in Python and R — and at the end, honest accounting of the lab you never had to hire.

The full expedition. Begin at the Harbor and take the route in order, A through F — each capability arrives the moment the growing lab needs it.

The working researcher’s ferry. Already fluent with an agent? Skim A–B and board at the Waterworks — start at C1, where the methods are codified. The dotted ferry line on the map marks the shortcut.

The Lab Roster

Engraved positions, not portraits. A seat fills itself when its lesson is complete.

The expedition so far

Lesson A1Lesson A2Lesson B1Lesson B2Lesson B3Lesson C1Lesson C2Lesson C3Lesson D1Lesson D2Lesson D3Lesson D4Lesson E1Lesson E2Lesson E3Lesson F1abcdef

Positions

  • the data manager

    Position vacant — engaged at C2

    write-time contract hooks (PreToolUse/PostToolUse + the validation suite)

    est. human-RA: permanent vigilance — est. 2 weeks/year of load-checking and release-note reading agent: half a day to install and test the 9-line block; ~20 s per run thereafter

  • the methodologist

    Position vacant — engaged at C1

    the researcher skill library v1 (/clean-trips, /paper-summary, /demanding-adviser) — codified methodology, not macros

    est. human-RA: the judgment lives in one head; transferring it to a new RA costs weeks of shadowing, and leaves when they do agent: an afternoon to author three SKILL.md files in both dialects; zero cost per session until invoked

  • the data engineer

    Position vacant — engaged at C3

    MCP connections + the DuckDB warehouse, enrichment joins (weather/events/holidays), and the zone-hour analysis panel

    est. human-RA: days of bespoke glue per source — credentials, retries, schema spelunking, timezone forensics — re-debugged every time a source changes agent: register the server once; the agent explores INFORMATION_SCHEMA and builds the panel in a guided session, raw cached for replication

  • the RA pool

    Position vacant — engaged at D1

    parallel subagents with report contracts (EDA + scholarship fleets) + the isolated adviser

    est. human-RA: a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will agent: ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes

  • the overnight RA

    Position vacant — engaged at D3

    /loop supervision + Goal Mode runs over background estimation

    est. human-RA: one night shift per estimation batch — and the course runs several batches agent: ~10 min to write the check or the objective; the night itself belongs to the machine

  • the adviser

    Position vacant — engaged at D1

    parallel subagents with report contracts (EDA + scholarship fleets) + the isolated adviser

    est. human-RA: a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will agent: ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes

  • the referee

    Position vacant — engaged at D4

    contracted fleet fan-out (results contract + provenance) and an isolated adversarial referee

    est. human-RA: the curve is ~2 days of serialized edit-and-fit; the suspicious read of the robustness table is the rarer, senior hour nobody has time for agent: 13 lanes fanned out under the cap finish in an afternoon; the referee files its evidenced finding in one isolated pass

  • the lab manager

    Position vacant — engaged at E2

    scheduled/cloud agents — the monthly-ingest routine, stopping at a human-approved PR

    est. human-RA: a recurring monthly chore nobody owns — check the CDN, pull, contract, append, re-estimate — reliably skipped agent: ~30 min to define the routine + guardrails once; each month runs unattended and stops at the approval gate

  • the reproducibility checker

    Position vacant — engaged at E1

    headless invocation + the fresh-clone replication self-test + CI gates

    est. human-RA: a clean-room rebuild every few weeks — dull, exacting, and the first thing dropped at submission agent: ~20 min to wire scripts/replicate.sh and the gate workflow; the verdict returns in one headless run thereafter

  • the the wall — the unstaffed midnight hours between a raw file and a first plot

    Position vacant — engaged at A1

    the bare agent loop (prompt → act → observe → fix), zero configuration

    est. human-RA: an evening or two per messy file — defensive parsing rewritten from scratch each project, rules forgotten by the time they work agent: ~10 minutes for the quick win, plus the same task re-run in the other language for free

  • the you, working an order of magnitude faster — but only if you direct the work

    Position vacant — engaged at A2

    the command surface + five prompting patterns + context hygiene

    est. human-RA: the slow tax of an undriven session — drifted answers on long investigations, re-runs to find where it went wrong agent: ~30 min to learn; thereafter a first-look on one month (3.5M rows) in minutes, with receipts

  • the the lab manual nobody writes — the institutional knowledge that lives in your head

    Position vacant — engaged at B1

    instruction files (CLAUDE.md / AGENTS.md) + auto-memory + the A/B demonstration

    est. human-RA: ~30 min re-onboarding every new RA, every time — plus the afternoons lost to landmines no one wrote down agent: written once in an hour; reloaded free at the start of every session thereafter

  • the careful senior who plans before touching data

    Position vacant — engaged at B2

    repo scaffold + pinned environments + read-only Plan mode reconnaissance

    est. human-RA: ~1 week at project start (setup, download babysitting, plan review) + the joins redone when structure rots agent: an afternoon — most of it download wall-clock, not attention

  • the the lab whose members don't overwrite each other

    Position vacant — engaged at D2

    git worktrees — one isolated checkout per agent/session/thread, combined through a deliberate merge

    est. human-RA: the lost afternoon disentangling two agents' colliding edits — and the redo when you reconstruct it wrong the first time agent: two commands to create the worktrees; the parallelism runs free; one reviewed merge at the end

  • the the onboarding the lab never has to repeat

    Position vacant — engaged at E3

    lab-kit — the whole methodology packaged as a one-command install

    est. human-RA: six weeks of per-member onboarding, rediscovered from scratch every time the lab turns over agent: ~half a day to package and smoke-test the kit once; each new member is one install and one prompt

  • the the whole lab, orchestrated — the PI who designs the system instead of doing the work

    Position vacant — engaged at F1

    the research loop (/loop ↔ Goal Mode / @codex) orchestrating fleet → referee → headless re-run → regenerated report, under report-don't-act guardrails, a hard budget cap, and a human gate on substantive decisions only

    est. human-RA: each revision is a serialized chain — re-spec, re-estimate, re-table, rewrite the paragraph, re-read the abstract — correct only as of the last manual pass, on a Sunday; a real reviewer round is days of hand-carried edits agent: the loop runs two iterations to convergence in one supervised sitting; the human stands at exactly one gate (approve dropping the post-treatment control) while the mechanical fixes proceed unattended

Running Totals

Lesson Role Est. human-RA Agent (yours when measured)
A1 the wall — the unstaffed midnight hours between a raw file and a first plot an evening or two per messy file — defensive parsing rewritten from scratch each project, rules forgotten by the time they work ~10 minutes for the quick win, plus the same task re-run in the other language for free
A2 you, working an order of magnitude faster — but only if you direct the work the slow tax of an undriven session — drifted answers on long investigations, re-runs to find where it went wrong ~30 min to learn; thereafter a first-look on one month (3.5M rows) in minutes, with receipts
B1 the lab manual nobody writes — the institutional knowledge that lives in your head ~30 min re-onboarding every new RA, every time — plus the afternoons lost to landmines no one wrote down written once in an hour; reloaded free at the start of every session thereafter
B2 careful senior who plans before touching data ~1 week at project start (setup, download babysitting, plan review) + the joins redone when structure rots an afternoon — most of it download wall-clock, not attention
B3 the data manager who guards the raw files — the person who says no near the master copies permanent vigilance you cannot staff — one lapse at machine speed costs a month of re-downloads two profiles configured once in minutes; the fence then holds every session, tired or not
C1 the methodologist — the one person who knows how the lab actually decides the judgment lives in one head; transferring it to a new RA costs weeks of shadowing, and leaves when they do an afternoon to author three SKILL.md files in both dialects; zero cost per session until invoked
C2 data manager / QA who never sleeps permanent vigilance — est. 2 weeks/year of load-checking and release-note reading half a day to install and test the 9-line block; ~20 s per run thereafter
C3 the data engineer who wires the lab to its systems days of bespoke glue per source — credentials, retries, schema spelunking, timezone forensics — re-debugged every time a source changes register the server once; the agent explores INFORMATION_SCHEMA and builds the panel in a guided session, raw cached for replication
D1 the RA pool — and the adviser who critiques from outside a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes
D2 the lab whose members don't overwrite each other the lost afternoon disentangling two agents' colliding edits — and the redo when you reconstruct it wrong the first time two commands to create the worktrees; the parallelism runs free; one reviewed merge at the end
D3 overnight RA one night shift per estimation batch — and the course runs several batches ~10 min to write the check or the objective; the night itself belongs to the machine
D4 an RA bench and the PI who keeps their results comparable the curve is ~2 days of serialized edit-and-fit; the suspicious read of the robustness table is the rarer, senior hour nobody has time for 13 lanes fanned out under the cap finish in an afternoon; the referee files its evidenced finding in one isolated pass
E1 reproducibility checker a clean-room rebuild every few weeks — dull, exacting, and the first thing dropped at submission ~20 min to wire scripts/replicate.sh and the gate workflow; the verdict returns in one headless run thereafter
E2 lab manager's standing chores a recurring monthly chore nobody owns — check the CDN, pull, contract, append, re-estimate — reliably skipped ~30 min to define the routine + guardrails once; each month runs unattended and stops at the approval gate
E3 the onboarding the lab never has to repeat six weeks of per-member onboarding, rediscovered from scratch every time the lab turns over ~half a day to package and smoke-test the kit once; each new member is one install and one prompt
F1 the whole lab, orchestrated — the PI who designs the system instead of doing the work each revision is a serialized chain — re-spec, re-estimate, re-table, rewrite the paragraph, re-read the abstract — correct only as of the last manual pass, on a Sunday; a real reviewer round is days of hand-carried edits the loop runs two iterations to convergence in one supervised sitting; the human stands at exactly one gate (approve dropping the post-treatment control) while the mechanical fixes proceed unattended
Positions absorbed 0 of 16

The honest column: every place a human had to step in lives in the Field Journal’s failure log. Your measured hours there override these estimates here.