E2 · advanced · ~45 min

The Lab That Runs Itself: Scheduled & Cloud Agents

Absorbs: the lab manager's standing chores

Advances E2

The Pain

The TLC publishes its taxi data on a calendar you do not control: a new month, every month, on a roughly two-month lag, dropped onto a CDN without an announcement that reaches you. Your analysis was current the day you ran it and has been quietly rotting ever since. There are three fresh months on the server right now that your panel does not know exist, and the elasticities you are about to present rest on a window that closed in the winter.

Keeping a study current is the least glamorous job in any lab and the one most reliably skipped. Someone has to remember the cadence, check the source on a schedule, pull the new file when it lands, run it through the same gauntlet every prior month passed, fold it into the warehouse, and re-estimate — and then, crucially, not just publish the new numbers, but bring them to a person who decides whether they belong. It is standing-chore work: low-judgment to perform, high-stakes to skip, and impossible to do on a cadence when the person responsible is also writing the paper, supervising the RAs, and teaching two sections. The data keeps arriving. Your attention does not keep pace. The gap between the latest drop and your latest estimate is the half-life of your study’s relevance, and right now nobody is watching the calendar.

Why / When

The standing chores of a lab share a shape: they recur on a cadence nobody owns, each instance is mechanical, and the cost is not in any one instance but in the forgetting. This lesson hands those chores to an agent that runs without your laptop open — scheduled in the cloud, triggered by a clock or an event or a person filing a ticket, doing the mechanical work and stopping at the one decision that needs a human.

The two tools reach this from opposite directions, and the difference is the lesson — the same split this unit keeps returning to. One composes a scheduled cloud agent you define declaratively: a trigger, a narrow profile, a task, on a cron. The other delegates to a cloud environment you assign work to like an RA — you open a ticket, the agent picks it up, investigates, and reports back. Schedule versus delegate; a clock that fires versus a colleague you hand a task. Both absorb the lab manager’s standing chores; both run somewhere other than your machine, which is the source of both their power and their distinct risk class. This serves the ingestion and maintenance stage — the work of keeping a live study live.

Contrary winds

Not for: a one-time backfill you'll run once and never again — scheduling has standing overhead and a standing risk surface, so don't put a single errand on a cron.

Mechanics

The chore made visible — the monthly cadence the rest of this lesson automates. Watch where it stops:

System Player film — A Month in the Life

Step 1 of 7.

The lab is no longer something you sit at. TLC publishes new taxi data every month — with a ~2-month lag — so a static analysis rots on contact with the next drop. The fix is a cadence: a routine set on a cron, waiting for a date that hasn't come yet.

The payoff is the stop. The routine does every mechanical step — checks the source, downloads, runs the C2 contracts, appends, re-estimates — and then opens a pull request and waits. It reports the updated estimates; it never publishes them. “Report, don’t act” on anything irreversible is the whole safety posture of an unattended agent, and the two tools below implement the same cadence through different primitives.

Claude Code Your tool

Routines — a scheduled cloud agent

A Routine is a cloud-hosted agent on a trigger: a cron schedule, a GitHub event, or an API call wakes it, it runs its task in a managed environment with no laptop involved, and it goes back to sleep. You define it declaratively — when it fires, what profile it runs under, what it does — and the cloud keeps the clock.

The project’s routine is the monthly ingest — it keeps current the same fixed slice you first fetched with the kit’s python3 get_data.py (Get the data), extending it as new months land. It fires after the TLC’s usual drop date and walks the cadence end to end:

Trigger: cron, the 5th of each month, 06:00 ET.
Profile: ingest-only (write data/raw/ and results/, nothing else).

1. Check the TLC CDN for a yellow-taxi month newer than the latest in
   data/raw/. If none, exit quietly — no PR, no noise.
2. Download it to a temp path, verify the checksum, move into data/raw/.
3. Run scripts/validate_contracts.py over the new month (the C2 gate).
   If it fails, STOP and open an issue with the contract output — do
   not append a month that breaks its contract.
4. Append to the warehouse; re-estimate the headline specs.
5. Open a PR titled "ingest: <month>" with the updated estimates and a
   diff of the elasticity table — and STOP. A human merges, or doesn't.
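Step 1 of the routine can be sketched as a small pure function: find the newest month already in data/raw/ and name the next one to look for. This is a minimal sketch, not the kit's implementation. The `yellow_tripdata_YYYY-MM.parquet` naming pattern and all function names are assumptions for illustration, and the actual CDN request is left out.

```python
from __future__ import annotations

import re
from typing import Iterable, Optional

# Assumed naming pattern for monthly yellow-taxi files, e.g.
# yellow_tripdata_2025-03.parquet. Adjust if your data/raw/ differs.
MONTH_FILE = re.compile(r"^yellow_tripdata_(\d{4})-(\d{2})\.parquet$")


def latest_local_month(filenames: Iterable[str]) -> Optional[tuple[int, int]]:
    """Return the newest (year, month) among matching filenames, or None."""
    months = []
    for name in filenames:
        m = MONTH_FILE.match(name)
        if m:
            months.append((int(m.group(1)), int(m.group(2))))
    return max(months) if months else None


def next_month(year: int, month: int) -> tuple[int, int]:
    """The calendar month after (year, month), rolling over December."""
    return (year + 1, 1) if month == 12 else (year, month + 1)


def candidate_filename(filenames: Iterable[str]) -> Optional[str]:
    """Name of the first month the routine should look for on the CDN."""
    latest = latest_local_month(filenames)
    if latest is None:
        return None  # nothing fetched yet; the initial fetch is a separate step
    year, month = next_month(*latest)
    return f"yellow_tripdata_{year:04d}-{month:02d}.parquet"
```

In a real run the routine would pass the listing of data/raw/ to `candidate_filename` and then ask the CDN whether that file exists. If it does not, the routine exits without a PR.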

The structure is the safety. The routine’s world is exactly the cloud environment it runs in: it cannot reach a local MCP server or a file on your laptop, because your laptop is asleep, so the task is designed for a self-contained world, with a public CDN in and a repository out. And the cron is not the interesting part; the stop is. Step 5 produces a reviewable diff and hands it to you. The estimate-changing decision (does this month’s data belong in the published result?) is the one decision the routine is forbidden to make.
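The shape of the five steps, with their two stops, can be written down as ordinary control flow. The sketch below is illustrative only: `Steps`, `Outcome`, and `run_ingest` are hypothetical names, and each hook stands in for a real script or API call that this lesson does not specify. The point it shows is that the function has no merge step at all.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Outcome(Enum):
    NO_NEW_MONTH = "no new month; exit quietly"
    CONTRACT_FAILED = "contract failed; issue opened, nothing appended"
    PR_OPENED = "PR opened; waiting for a human"


@dataclass
class Steps:
    """Hypothetical hooks; each would wrap a real script or API call."""
    find_new_month: Callable[[], Optional[str]]
    download_and_verify: Callable[[str], None]
    passes_contract: Callable[[str], bool]
    open_issue: Callable[[str], None]
    append_and_reestimate: Callable[[str], None]
    open_pr: Callable[[str], None]


def run_ingest(steps: Steps) -> Outcome:
    """Walk the monthly cadence and stop at the first gate that applies."""
    month = steps.find_new_month()
    if month is None:
        return Outcome.NO_NEW_MONTH          # step 1: no PR, no noise
    steps.download_and_verify(month)         # step 2
    if not steps.passes_contract(month):     # step 3: the C2 gate
        steps.open_issue(month)
        return Outcome.CONTRACT_FAILED       # never append a failing month
    steps.append_and_reestimate(month)       # step 4
    steps.open_pr(month)                     # step 5
    return Outcome.PR_OPENED                 # no merge step exists here
```

Passing the steps in as callables keeps the gate logic testable with fakes, so you can check the stops before the routine ever touches real data.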

The same Routine machinery runs the E1 reproducibility self-test on a weekly cron, or fires the referee on a GitHub event. The monthly ingest is one instance of a general pattern: standing work, on a schedule, ending at a gate.

Codex Your tool

Cloud delegation — assign the chore like an RA

The other model is delegation to a cloud environment: a managed, sandboxed container where the agent does work you assign, asynchronously, without your machine. You do not write a cron; you hand it a task the way you would hand one to a research assistant — by filing it — and the agent picks it up, works in its isolated environment, and posts its findings back where you filed it.

The refresh investigation is assigned rather than scheduled. You open a GitHub issue and mention the cloud agent in it:

Title: Monthly refresh — is there a new TLC month?

@reviewer check the TLC CDN for a yellow-taxi month newer than the
latest in data/raw/. If there is one: download it, run
scripts/validate_contracts.py over it, and report back here whether it
passes its contract and what the headline elasticities would become if
we appended it. Do NOT append it or open a PR yet — just report.

The agent spins up its environment, does the investigation, and posts a comment on the issue: the new month’s number, the contract verdict, the estimates it would produce. The chore becomes a conversation in the issue tracker — auditable, assignable, and stopping by default at a report rather than an action. If the report looks right, you ask it to open the PR in a follow-up; the irreversible step stays yours.
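The report the agent posts has three parts: the new month, the contract verdict, and the would-be estimates. One way to picture it is a formatter that lays the current and would-be elasticities side by side. This is a sketch under stated assumptions: `render_report` and the spec names passed to it are hypothetical, and both dictionaries are assumed to share the same keys.

```python
from __future__ import annotations


def render_report(
    month: str,
    contract_passed: bool,
    current: dict[str, float],
    would_be: dict[str, float],
) -> str:
    """Build the text of a report-only comment; it performs no writes."""
    lines = [
        f"New month found: {month}",
        f"Contract: {'PASS' if contract_passed else 'FAIL'}",
        "Headline elasticities (current -> if appended):",
    ]
    for spec in sorted(current):
        old, new = current[spec], would_be[spec]
        lines.append(
            f"- {spec}: {old:+.3f} -> {new:+.3f} (change {new - old:+.3f})"
        )
    # State the stop explicitly so the reader knows nothing was changed.
    lines.append("Nothing was appended and no PR was opened.")
    return "\n".join(lines)
```

Ending the comment with an explicit statement of what was not done makes the stop auditable in the issue thread itself.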

For teams that live in a project tracker rather than GitHub, the same delegation flows from the tracker’s sidebar — file the refresh as a tracker task, the agent picks it up in its cloud environment and reports back on the task. The surface changes; the model does not: assign, investigate, report, await your word on anything that writes.

Translation guide
Intent	Claude Code	Codex
scheduled autonomous work	Routines (cloud-hosted, cron / GitHub-event / API triggers)	cloud tasks delegated via the issue tracker + GitHub integration
kick off the monthly refresh	a cron trigger fires the routine unattended	you (or a teammate) file an issue assigning it to the cloud agent
where the unattended work runs	a managed cloud environment — no local files or MCP servers reachable	an isolated cloud container — local-only resources are likewise out of reach
the irreversible step (publish the estimate-changing PR)	routine opens the PR and STOPS; a human merges	agent reports; a human asks for the PR; a human merges

Guardrails for unattended agents

This is the sober section, and it is shared because the discipline is identical regardless of which primitive runs the chore. An unattended agent is a different risk class from an interactive one: there is no human watching the step it is about to take, so every guardrail you lean on interactively — I’ll just glance at what it’s doing — is gone. Four non-negotiables:

A dedicated, narrower profile — never the interactive one. The routine that ingests data needs to write data/raw/ and results/ and nothing else. It does not need your full permission set, and the blast radius of an unattended agent is whatever you granted it while no one was looking. Build the profile for the chore, not for your convenience.
Spend caps. A scheduled agent with a loop and no budget ceiling is a bill with no upper bound. Cap the tokens and the wall-clock per run; a routine that blows its cap should stop and report, not push through.
“Report, don’t act” on anything irreversible. The default for an unattended agent facing a one-way door — publishing, deleting, merging, sending — is to describe what it would do and stop. The monthly ingest does every reversible step and halts at the PR. The irreversible step is a human’s.
Human approval on any estimate-changing PR. This is the specific case the whole lesson protects. A month that quietly shifts the published elasticities is exactly the month a person must look at. The PR is the gate; auto-merging it is how one bad TLC drop becomes your published result.
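The spend-cap rule above (stop and report, do not push through) can be sketched as a small budget object checked after each unit of work. The names `SpendCap` and `CapExceeded` are hypothetical, and this in-task check is an assumption layered on top of whatever ceiling your platform enforces, not a replacement for it.

```python
import time
from dataclasses import dataclass, field


class CapExceeded(RuntimeError):
    """Raised so the caller can stop and report instead of pushing through."""


@dataclass
class SpendCap:
    max_tokens: int
    max_seconds: float
    tokens_used: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self, tokens: int) -> None:
        """Record token usage, then check both ceilings."""
        self.tokens_used += tokens
        self.check()

    def check(self) -> None:
        """Raise CapExceeded if either the token or wall-clock cap is passed."""
        elapsed = time.monotonic() - self.started
        if self.tokens_used > self.max_tokens:
            raise CapExceeded(
                f"token cap hit: {self.tokens_used} > {self.max_tokens}"
            )
        if elapsed > self.max_seconds:
            raise CapExceeded(
                f"wall-clock cap hit: {elapsed:.0f}s > {self.max_seconds:.0f}s"
            )
```

A caller would wrap the run in `try ... except CapExceeded` and turn the exception message into the report, leaving any partial work unmerged.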

These are not paranoia; they are the price of the laptop being closed. The C2 hook protected a write you were present for; these protect a write made while you were asleep, which is strictly the more dangerous one.
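The first non-negotiable, a profile that can write data/raw/ and results/ and nothing else, amounts to a path allowlist. The check below is a minimal sketch of that idea, assuming repo-relative POSIX paths; `is_write_allowed` is a hypothetical helper, and real enforcement belongs in the tool's own permission settings rather than in task code.

```python
from pathlib import PurePosixPath

# The only directories the ingest-only profile may write to.
ALLOWED_WRITE_ROOTS = (PurePosixPath("data/raw"), PurePosixPath("results"))


def is_write_allowed(path: str) -> bool:
    """True only for repo-relative paths inside an allowed root."""
    p = PurePosixPath(path)
    # Reject absolute paths and any attempt to climb out with "..".
    if p.is_absolute() or ".." in p.parts:
        return False
    return any(root == p or root in p.parents for root in ALLOWED_WRITE_ROOTS)
```

Rejecting `..` outright, instead of trying to resolve it, keeps the check conservative: a path that needs to climb directories has no business in an ingest chore.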

Guided Run — The Standing Chore

Field Terminal — session: e2-routine Claude Code

Define a monthly-ingest routine on a cron with an ingest-only profile

Field Assignment

Artifact: make check-e2 passes. The monthly refresh runs unattended, dry-run against the latest real TLC drop, ending at a PR/report a human approves.

Stand up the monthly refresh under each tool and dry-run it against the latest real TLC drop. This is a both-tools exercise: the contrast in how the chore is triggered and where it stops is the deliverable.

[CC] Define the monthly-ingest Routine with a dedicated ingest-only profile, a spend cap, and a cron trigger. Dry-run it against the latest real TLC month: it must check the CDN, download, pass the C2 contracts, re-estimate, and stop at a PR — never merge.
[CX] File the refresh as an issue assigned to the cloud agent. Confirm it investigates in its isolated environment and reports back on the issue — the new month, the contract verdict, the would-be estimates — without writing.
For both: confirm the guardrails actually bind. Try to make each one take the irreversible step (merge / append) unattended and verify it refuses and reports instead.
Log behavior and cost for both runs in journal/: what each did autonomously, where each stopped, the token and wall-clock spend, and which guardrail you were most glad you set. Then make check-e2.

make check-e2 verifies the refresh ran end-to-end against a real month, that it stopped at a human-approval gate rather than publishing, and that the run cost and behavior are logged. This is the cadence E3 packages so a new lab inherits it in one command.

Milestone gate · make check-e2 · advances E2

The monthly refresh runs unattended against the latest real TLC drop — check CDN, download, C2 contracts, re-estimate
Designed for a self-contained world: public CDN in, repository out — no local files or MCP servers reachable.
The run stops at a human-approval gate (a PR or a report) and never publishes estimate-changing numbers itself
Auto-merging an estimate-changing PR is how one bad month becomes the published result.
The unattended agent runs under a dedicated, narrower profile with a spend cap — never the interactive profile
journal/ logs both runs' behavior and cost: what each did autonomously, where each stopped, token + wall-clock spend

Check each item only once it is true of YOUR repo — the gate is self-certified, like the rest of your methodology.

Pitfalls & Gotchas

[both]

Unattended agents are a different risk class — dedicated profile, always. The permission set that is fine when you are watching every step is a standing liability when no one is. A routine that runs at 3 a.m. under your full interactive profile is a key under the mat; the chore needs exactly the access the chore requires and not one scope more.
[both]

Auto-merging an estimate-changing PR is how one bad month becomes the published result. The entire value of the monthly refresh is that it stops at a human before the numbers ship; wire it to merge itself and you have automated the one decision that needed judgment, turning a schema drift or a half-published TLC file into your headline elasticity with no one in the loop.
[CC]

A routine runs in the cloud: local-only MCP servers and files on your laptop are not reachable, because your laptop is closed. Design the routine’s world to be self-contained — public source in, repository out — or it will fail at 6 a.m. reaching for a server that only exists where you are asleep.

Check Your Bearings

E2 · 4 questions · unlimited retries, no timer

This check opens when the guided simulation above is complete — the questions assume you have seen the run.

Field journal

Log the monthly refresh: what the agent did autonomously, exactly where it stopped and waited for you, the run’s token and wall-clock cost, and which guardrail you were most glad you had set before you closed the laptop.

Parity note

as of June 2026

There is no isomorphism here and this page has not pretended one. Claude Code composes a scheduled cloud agent you define declaratively — a trigger, a profile, a task on a cron — while Codex delegates to a cloud environment you assign work to through the issue tracker, with a project tracker as an alternate intake. Neither is the other’s primitive: a cron that fires unattended is not the same shape as a colleague you hand a ticket, even when both produce the monthly refresh and both stop at the same human-approval gate. The guardrails — a dedicated narrow profile, spend caps, report-don’t-act on the irreversible, human approval on the estimate-changing PR — are identical across both, because they are a discipline the laptop being closed demands, not a feature either vendor sells. See the parity matrix for the dated comparison.

Feature-parity matrix

The Lab Roster

Engraved positions, not portraits. A seat fills itself when its lesson is complete.

Your position

Positions

the data manager

Position vacant — engaged at C2

write-time contract hooks (PreToolUse/PostToolUse + the validation suite)

est. human-RA: permanent vigilance — est. 2 weeks/year of load-checking and release-note reading agent: half a day to install and test the 9-line block; ~20 s per run thereafter
the methodologist

Position vacant — engaged at C1

the researcher skill library v1 (/clean-trips, /paper-summary, /demanding-adviser) — codified methodology, not macros

est. human-RA: the judgment lives in one head; transferring it to a new RA costs weeks of shadowing, and leaves when they do agent: an afternoon to author three SKILL.md files in both dialects; zero cost per session until invoked
the data engineer

Position vacant — engaged at C3

MCP connections + the DuckDB warehouse, enrichment joins (weather/events/holidays), and the zone-hour analysis panel

est. human-RA: days of bespoke glue per source — credentials, retries, schema spelunking, timezone forensics — re-debugged every time a source changes agent: register the server once; the agent explores INFORMATION_SCHEMA and builds the panel in a guided session, raw cached for replication
the RA pool

Position vacant — engaged at D1

parallel subagents with report contracts (EDA + scholarship fleets) + the isolated adviser

est. human-RA: a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will agent: ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes
the overnight RA

Position vacant — engaged at D3

/loop supervision + Goal Mode runs over background estimation

est. human-RA: one night shift per estimation batch — and the course runs several batches agent: ~10 min to write the check or the objective; the night itself belongs to the machine
the adviser

Position vacant — engaged at D1

parallel subagents with report contracts (EDA + scholarship fleets) + the isolated adviser

est. human-RA: a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will agent: ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes
the referee

Position vacant — engaged at D4

contracted fleet fan-out (results contract + provenance) and an isolated adversarial referee

est. human-RA: the curve is ~2 days of serialized edit-and-fit; the suspicious read of the robustness table is the rarer, senior hour nobody has time for agent: 13 lanes fanned out under the cap finish in an afternoon; the referee files its evidenced finding in one isolated pass
the lab manager

Position vacant — engaged at E2

scheduled/cloud agents — the monthly-ingest routine, stopping at a human-approved PR

est. human-RA: a recurring monthly chore nobody owns — check the CDN, pull, contract, append, re-estimate — reliably skipped agent: ~30 min to define the routine + guardrails once; each month runs unattended and stops at the approval gate
the reproducibility checker

Position vacant — engaged at E1

headless invocation + the fresh-clone replication self-test + CI gates

est. human-RA: a clean-room rebuild every few weeks — dull, exacting, and the first thing dropped at submission agent: ~20 min to wire scripts/replicate.sh and the gate workflow; the verdict returns in one headless run thereafter
the wall — the unstaffed midnight hours between a raw file and a first plot

Position vacant — engaged at A1

the bare agent loop (prompt → act → observe → fix), zero configuration

est. human-RA: an evening or two per messy file — defensive parsing rewritten from scratch each project, rules forgotten by the time they work agent: ~10 minutes for the quick win, plus the same task re-run in the other language for free
you, working an order of magnitude faster — but only if you direct the work

Position vacant — engaged at A2

the command surface + five prompting patterns + context hygiene

est. human-RA: the slow tax of an undriven session — drifted answers on long investigations, re-runs to find where it went wrong agent: ~30 min to learn; thereafter a first-look on one month (3.5M rows) in minutes, with receipts
the lab manual nobody writes — the institutional knowledge that lives in your head

Position vacant — engaged at B1

instruction files (CLAUDE.md / AGENTS.md) + auto-memory + the A/B demonstration

est. human-RA: ~30 min re-onboarding every new RA, every time — plus the afternoons lost to landmines no one wrote down agent: written once in an hour; reloaded free at the start of every session thereafter
the careful senior who plans before touching data

Position vacant — engaged at B2

repo scaffold + pinned environments + read-only Plan mode reconnaissance

est. human-RA: ~1 week at project start (setup, download babysitting, plan review) + the joins redone when structure rots agent: an afternoon — most of it download wall-clock, not attention
the lab whose members don't overwrite each other

Position vacant — engaged at D2

git worktrees — one isolated checkout per agent/session/thread, combined through a deliberate merge

est. human-RA: the lost afternoon disentangling two agents' colliding edits — and the redo when you reconstruct it wrong the first time agent: two commands to create the worktrees; the parallelism runs free; one reviewed merge at the end
the onboarding the lab never has to repeat

Position vacant — engaged at E3

lab-kit — the whole methodology packaged as a one-command install

est. human-RA: six weeks of per-member onboarding, rediscovered from scratch every time the lab turns over agent: ~half a day to package and smoke-test the kit once; each new member is one install and one prompt
the whole lab, orchestrated — the PI who designs the system instead of doing the work

Position vacant — engaged at F1

the research loop (/loop ↔ Goal Mode / @codex) orchestrating fleet → referee → headless re-run → regenerated report, under report-don't-act guardrails, a hard budget cap, and a human gate on substantive decisions only

est. human-RA: each revision is a serialized chain — re-spec, re-estimate, re-table, rewrite the paragraph, re-read the abstract — correct only as of the last manual pass, on a Sunday; a real reviewer round is days of hand-carried edits agent: the loop runs two iterations to convergence in one supervised sitting; the human stands at exactly one gate (approve dropping the post-treatment control) while the mechanical fixes proceed unattended

Running Totals

Lesson	Role	Est. human-RA	Agent (yours when measured)
A1	the wall — the unstaffed midnight hours between a raw file and a first plot	an evening or two per messy file — defensive parsing rewritten from scratch each project, rules forgotten by the time they work	~10 minutes for the quick win, plus the same task re-run in the other language for free
A2	you, working an order of magnitude faster — but only if you direct the work	the slow tax of an undriven session — drifted answers on long investigations, re-runs to find where it went wrong	~30 min to learn; thereafter a first-look on one month (3.5M rows) in minutes, with receipts
B1	the lab manual nobody writes — the institutional knowledge that lives in your head	~30 min re-onboarding every new RA, every time — plus the afternoons lost to landmines no one wrote down	written once in an hour; reloaded free at the start of every session thereafter
B2	careful senior who plans before touching data	~1 week at project start (setup, download babysitting, plan review) + the joins redone when structure rots	an afternoon — most of it download wall-clock, not attention
B3	the data manager who guards the raw files — the person who says no near the master copies	permanent vigilance you cannot staff — one lapse at machine speed costs a month of re-downloads	two profiles configured once in minutes; the fence then holds every session, tired or not
C1	the methodologist — the one person who knows how the lab actually decides	the judgment lives in one head; transferring it to a new RA costs weeks of shadowing, and leaves when they do	an afternoon to author three SKILL.md files in both dialects; zero cost per session until invoked
C2	data manager / QA who never sleeps	permanent vigilance — est. 2 weeks/year of load-checking and release-note reading	half a day to install and test the 9-line block; ~20 s per run thereafter
C3	the data engineer who wires the lab to its systems	days of bespoke glue per source — credentials, retries, schema spelunking, timezone forensics — re-debugged every time a source changes	register the server once; the agent explores INFORMATION_SCHEMA and builds the panel in a guided session, raw cached for replication
D1	the RA pool — and the adviser who critiques from outside	a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will	~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes
D2	the lab whose members don't overwrite each other	the lost afternoon disentangling two agents' colliding edits — and the redo when you reconstruct it wrong the first time	two commands to create the worktrees; the parallelism runs free; one reviewed merge at the end
D3	overnight RA	one night shift per estimation batch — and the course runs several batches	~10 min to write the check or the objective; the night itself belongs to the machine
D4	an RA bench and the PI who keeps their results comparable	the curve is ~2 days of serialized edit-and-fit; the suspicious read of the robustness table is the rarer, senior hour nobody has time for	13 lanes fanned out under the cap finish in an afternoon; the referee files its evidenced finding in one isolated pass
E1	reproducibility checker	a clean-room rebuild every few weeks — dull, exacting, and the first thing dropped at submission	~20 min to wire scripts/replicate.sh and the gate workflow; the verdict returns in one headless run thereafter
E2	lab manager's standing chores	a recurring monthly chore nobody owns — check the CDN, pull, contract, append, re-estimate — reliably skipped	~30 min to define the routine + guardrails once; each month runs unattended and stops at the approval gate
E3	the onboarding the lab never has to repeat	six weeks of per-member onboarding, rediscovered from scratch every time the lab turns over	~half a day to package and smoke-test the kit once; each new member is one install and one prompt
F1	the whole lab, orchestrated — the PI who designs the system instead of doing the work	each revision is a serialized chain — re-spec, re-estimate, re-table, rewrite the paragraph, re-read the abstract — correct only as of the last manual pass, on a Sunday; a real reviewer round is days of hand-carried edits	the loop runs two iterations to convergence in one supervised sitting; the human stands at exactly one gate (approve dropping the post-treatment control) while the mechanical fixes proceed unattended
Positions absorbed		0 of 16

The honest column: every place a human had to step in lives in the Field Journal’s failure log. Your measured hours there override these estimates here.
