E3 · advanced · ~30 min

Methodology in a Box

Absorbs: the onboarding the lab never has to repeat

Advances E3

The Pain

The methodology works. It took the whole course to build: the lab manual that briefs every session, the cleaning procedure with its filter cascade, the contract hooks that caught the schema drift, the RA subagents with their report contracts, the referee that argues from outside, the warehouse plumbing, the worktree discipline that keeps a 3 a.m. failure deletable. It lives in your dotfiles and your repo’s configuration and, more than you would like to admit, in your head — the order you do things in, the thing you always check, the reason that hook exists.

And it is going to graduate when you do. The new student joining in the fall will spend their first six weeks rediscovering what you already know, badly, while the careful machinery you built sits one directory over, uninstalled, because there is no command that installs it. The lab will re-onboard from scratch — the same dead ends, the same drift caught late, the same evening lost to a 100%-null column — not because the knowledge was lost but because it was never packaged. A methodology that lives in one person’s setup dies with that person’s tenure. The work was never just the analysis; it was the apparatus around the analysis, and an apparatus you cannot hand to the next person is an apparatus you have to rebuild every time the lab turns over.

Why / When

This lesson packages everything the course built into one installable artifact, call it lab-kit, so the methodology ships like software instead of dying like folklore. The methods literature treats a packaged way of working as a first-class research output, and this is the distribution problem behind that claim: a way of working is only reproducible if someone other than its author can adopt it whole.

The pipeline stage this serves is onboarding and distribution — the work of making the lab’s way of working portable. The role it absorbs is the onboarding nobody wants to run twice: instead of walking each new member through six weeks of setup, you hand them a kit and a command. It is worth doing precisely when the conventions have stopped changing — a packaged methodology is a release, and releasing something that is still in flux ships a wrong answer with a version number on it. Package the parts of the course that have settled into law; leave the parts you are still arguing about un-boxed.

Contrary winds

Not for: a convention that is still changing weekly — package a methodology once it has stopped moving, or you will ship a kit that is wrong by the time it installs.

Mechanics

What’s in the box

The kit is not new work — it is the course, inventoried. Everything you built is already an artifact; packaging is collecting the artifacts and declaring how they install. The box holds, by the lesson that produced each piece:

From	Artifact	What it carries
B1	instruction templates	the lab manual that briefs every session
C1	the skill library + the referee	the cleaning procedure and the cold critic, as invocable skills
C2	the hook suite	the contract gates that block silent corruption
C3	the MCP config	the warehouse and external-source plumbing
D1	the agent definitions	the RA subagents with their report contracts
D2 / D4	worktree + contract conventions	the isolation and provenance discipline, as documentation

The first five install as configuration; the last installs as documentation, because a convention is a thing you teach, not a binary you drop in. The skill is the reusable unit at the center of all of it: its anatomy (a trigger, an instruction body, the bundled scripts it calls) is what makes the cleaning procedure portable across labs, and it is the same anatomy whether the kit ships one skill or twenty.

Packaging

A loose pile of files is not a kit; a kit is a manifest plus the files, versioned and installable in one step. Both tools, Claude Code and Codex, bundle skills, hooks, agents, and MCP config into a single distributable unit and install it with one command. The dialects differ in the packaging format and in where the bundle is published.

Claude Code

The unit of distribution is a plugin: a manifest that bundles the skills, the hook suite, the agent definitions, and the MCP config into one installable package, published to a marketplace a new member installs from.

{
  "name": "lab-kit",
  "version": "1.0.0",
  "description": "The weather-mobility lab's methodology, packaged.",
  "skills": ["clean-trips", "paper-summary", "demanding-adviser"],
  "hooks": "hooks/settings.json",
  "agents": ["agents/eda-borough.md"],
  "mcpServers": "mcp/.mcp.json"
}

The manifest names the parts; the plugin carries them. A new member adds the marketplace and installs the plugin, and the cleaning skill, the contract hooks, the RA subagents, and the warehouse MCP server all arrive together — the methodology, in one command. The skills are the same .claude/skills/<name>/SKILL.md units from C1, now bundled rather than copied by hand; the hooks are C2’s settings.json block; the agents are D1’s markdown definitions. Nothing in the box is new. The plugin is the envelope.
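One way to confirm that the plugin really carries everything the manifest names is a small check like the sketch below. It is illustrative only: the manifest filename plugin.json and the skills/<name>/SKILL.md layout under the kit root are assumptions made for this example, and the keys simply mirror the manifest shown above.

```python
import json
from pathlib import Path


def missing_parts(kit_root: Path) -> list[str]:
    """Return the manifest entries that have no matching file in the kit.

    Assumptions (hypothetical, for illustration): the manifest is stored as
    plugin.json at the kit root, and each skill lives at
    skills/<name>/SKILL.md relative to that root.
    """
    manifest = json.loads((kit_root / "plugin.json").read_text(encoding="utf-8"))

    # Skills are listed by name, so map each name to its expected file.
    expected = [f"skills/{name}/SKILL.md" for name in manifest.get("skills", [])]
    # Agents are listed as relative paths already.
    expected += manifest.get("agents", [])
    # Hooks and MCP config are single relative paths.
    for key in ("hooks", "mcpServers"):
        if key in manifest:
            expected.append(manifest[key])

    return [rel for rel in expected if not (kit_root / rel).is_file()]
```

An empty list means every named part is present; anything else is a file the envelope forgot.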

Codex

Distribution has two layers. The first is a plugin bundle — the same skills, hooks, agents, and MCP config collected into one installable package. The second is layered team config: a committed .codex/ directory that carries the conventions a whole team shares, merging over each member’s personal setup so the lab’s defaults arrive with the repo.

name = "lab-kit"
version = "1.0.0"
description = "The weather-mobility lab's methodology, packaged."
skills = ["clean-trips", "paper-summary", "demanding-adviser"]
hooks = "hooks/hooks.json"
agents = ["agents/eda-borough.toml"]
mcp = "mcp/config.toml"

The skills are the same .agents/skills/<name>/SKILL.md units from C1, now bundled; the hooks are C2’s [hooks] table; the agents are D1’s TOML definitions. Individual skills publish to the agentskills.io standard — the open skill format both ecosystems read — so the cleaning skill is installable on its own, not only as part of the kit. A new member adds the team config and installs the bundle, and the whole apparatus arrives with the repo. Nothing in the box is new; the bundle is the envelope and the layered config is how it reaches the team.

The payoff

Here is the scene the whole lesson exists for. A new student clones an empty scratch repository, fetches the course data with one command (python3 get_data.py; see Get the data), installs lab-kit with one more, and types a single prompt:

> Clean the latest TLC month and profile the demand–weather relationship.

And it works. The cleaning skill fires with its filter cascade, the contract hooks guard the writes, the warehouse MCP server answers the queries, the EDA subagent profiles the relationship — none of which the student configured, all of which arrived in the kit. Six weeks of onboarding collapses into one install and one prompt. That is the test of a methodology distribution: not that the box contains the right files, but that an empty repo plus the box plus one sentence reproduces the lab’s way of working. If the prompt above runs clean in a scratch repo, the methodology is portable. If it does not, the kit is a folder, not a release.
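The test only means something if the scratch repo really is empty of configuration before the install. A minimal pre-flight check might look like the sketch below. The marker list is an assumption drawn from the paths this lesson mentions, and it inspects only the repository itself, not any personal configuration in your home directory.

```python
from pathlib import Path

# Project-level config locations mentioned in this lesson. Treat the list as
# an assumption and extend it for your own setup.
CONFIG_MARKERS = (".claude", ".codex", ".agents", ".mcp.json")


def leftover_config(repo: Path) -> list[str]:
    """Return config entries already present in a supposedly empty repo."""
    return sorted(m for m in CONFIG_MARKERS if (repo / m).exists())
```

Run it before installing the kit: an empty list is the starting condition the smoke test assumes.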

Guided Run — Methodology in a Box

Field Terminal — session: e3-labkit Claude Code

Assemble lab-kit as a plugin bundling the C1 skills, C2 hooks, D1 agents, C3 MCP config

Field Assignment

Artifact: make check-e3 passes. lab-kit installs in a scratch repo, and the one-prompt clean-and-profile works out of the box.

Package the course into lab-kit and prove it installs. The deliverable is a smoke test: an empty repo, one install, one prompt, a clean profile.

Claude Code

Assemble lab-kit as a plugin: the manifest bundling C1’s skills, C2’s hook suite, D1’s agent definitions, and C3’s MCP config, with the D2/D4 conventions as bundled documentation. Version it 1.0.0.
Publish it to a marketplace (a local one is fine for the smoke test).
In a fresh scratch repo with none of your config, add the marketplace and install the plugin in one command.
Run the smoke test: the single prompt "Clean the latest TLC month and profile the demand–weather relationship." It must work with zero further setup: skill fires, hooks guard, MCP answers, subagent profiles.
Bundle no credentials and no machine-specific paths — confirm by reading the kit as a stranger would. File the smoke-test transcript in journal/. Then make check-e3.

Codex

Assemble lab-kit as a plugin bundle plus a layered team config: the manifest bundling C1’s skills, C2’s hook suite, D1’s agent definitions, and C3’s MCP config, with the D2/D4 conventions as bundled documentation. Version it 1.0.0; publish the standalone skills to the agentskills.io standard.
Stage the bundle where a new member installs from.
In a fresh scratch repo with none of your config, add the team config and install the bundle in one command.
Run the smoke test: the single prompt "Clean the latest TLC month and profile the demand–weather relationship." It must work with zero further setup: skill fires, hooks guard, MCP answers, subagent profiles.
Bundle no credentials and no machine-specific paths — confirm by reading the kit as a stranger would. File the smoke-test transcript in journal/. Then make check-e3.

make check-e3 verifies two things: the kit’s manifest is complete and versioned, and the one-prompt smoke test runs clean in a scratch repo with nothing of yours installed. A kit that passes this is the lab’s methodology made portable — the thing F1 inherits when it ships the analysis as a replication package the world can run.
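The first of those two checks, a complete and versioned manifest, can be approximated before you run the gate. The sketch below is not the implementation behind make check-e3; the required keys are taken from the Claude Code manifest example in this lesson, and the strict MAJOR.MINOR.PATCH pattern is an assumption about what "versioned" means here.

```python
import json
import re

# Keys taken from the example manifest in this lesson (an assumption about
# what "complete" means; adjust for the Codex dialect).
REQUIRED_KEYS = ("name", "version", "description", "skills", "hooks", "agents", "mcpServers")
SEMVER = re.compile(r"\d+\.\d+\.\d+")  # plain MAJOR.MINOR.PATCH only


def manifest_problems(manifest_text: str) -> list[str]:
    """Return human-readable problems; an empty list means the manifest passes."""
    manifest = json.loads(manifest_text)
    problems = [f"missing or empty key: {k}" for k in REQUIRED_KEYS if not manifest.get(k)]
    version = manifest.get("version", "")
    if version and not SEMVER.fullmatch(str(version)):
        problems.append(f"version is not MAJOR.MINOR.PATCH: {version}")
    return problems
```

The second check, the one-prompt smoke test, still has to be run by hand in a scratch repo; no static check replaces it.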

Milestone gate · make check-e3 · advances E3

lab-kit bundles C1 skills, the C2 hook suite, D1 agent definitions, and C3 MCP config, with D2/D4 conventions as documentation
The kit carries a complete, versioned manifest (1.0.0) — release it like software, not folklore
An unversioned kit just relocates the drift it was meant to stop.
Installed in a fresh scratch repo with none of your config, the one-prompt clean-and-profile works out of the box
The empty-repo one-prompt run is the only honest check that the box is whole.
No credentials and no machine-specific paths in the kit — it ships the methodology, not your laptop

Check each item only once it is true of YOUR repo — the gate is self-certified, like the rest of your methodology.

Pitfalls & Gotchas

[both]

An unversioned lab kit drifts exactly like the un-codified methodology it replaced. A kit with no version number is a moving target: two members install it on two different days and get two different labs, which is the folklore problem with a manifest stapled to it. Release the kit like software: a version, a changelog, a tag. The whole point was to stop the methodology from drifting; an unversioned kit just relocates the drift.
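One way to make that drift visible is to fingerprint the kit's contents, so that two installs claiming the same version can be compared. The sketch below is an illustration under assumptions: the function name is hypothetical, and it hashes every file under the kit root, which you may want to narrow to the files the manifest names.

```python
import hashlib
from pathlib import Path


def kit_digest(kit_root: Path) -> str:
    """SHA-256 over relative paths and file bytes, visited in sorted order."""
    digest = hashlib.sha256()
    # Sort by POSIX-style relative path so the order is stable across machines.
    entries = sorted(
        (p.relative_to(kit_root).as_posix(), p)
        for p in kit_root.rglob("*")
        if p.is_file()
    )
    for rel, path in entries:
        digest.update(rel.encode("utf-8"))
        digest.update(b"\0")  # separator between name and content
        digest.update(path.read_bytes())
        digest.update(b"\0")  # separator between files
    return digest.hexdigest()
```

Recording the digest next to the version in the changelog gives two members something concrete to compare when their labs disagree.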
[both]

Bundling credentials or machine-specific paths into the kit ships your laptop, not your methodology. An API key, a /Users/you/ path, a warehouse URL only your machine resolves — each one turns a portable kit into one that works exactly nowhere but where it was built. Read the kit as a stranger on a clean machine would: anything that only makes sense from your seat does not belong in the box.
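Reading the kit as a stranger can be partly mechanized. The sketch below walks the kit and flags lines that look like home-directory paths or inline secrets. The patterns are rough heuristics chosen for this example, not a complete secret scanner, so a clean result is a prompt to keep reading, not a guarantee.

```python
import re
from pathlib import Path

# Heuristic patterns only; they are assumptions, not a complete secret scanner.
PATTERNS = {
    "home path": re.compile(r"(/Users/|/home/|[A-Za-z]:\\Users\\)[^\s\"']+"),
    "possible secret": re.compile(
        r"(?i)(api[_-]?key|token|secret|password)[\"']?\s*[:=]\s*[\"']?[^\s\"']{8,}"
    ),
}


def scan_kit(kit_root: Path) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, label) for each suspicious line."""
    findings = []
    for path in sorted(kit_root.rglob("*")):
        if not path.is_file():
            continue
        # errors="ignore" lets the scan pass over binary files without failing.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for label, pattern in PATTERNS.items():
                if pattern.search(line):
                    findings.append((path.relative_to(kit_root).as_posix(), lineno, label))
    return findings
```

Each finding is a line to inspect by hand: some will be false positives, such as documentation that mentions the word "token", and those are cheap to dismiss.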
[both]

The smoke test is the spec, not a courtesy. A kit that installs cleanly and looks complete can still fail the one prompt that matters because a skill quietly depends on a file the bundle forgot. The empty-repo one-prompt run is the only honest check that the box is whole — pass it before you call the kit released, every version.

Field journal

File the smoke test: the one install command, the one prompt, what fired without configuration — and one thing the kit forgot the first time you ran it in a truly empty repo.

Parity note (as of June 2026)

Packaging is genuine parity in shape, with a distribution wrinkle. Both tools bundle skills, hooks, agents, and MCP config into one installable unit that arrives with a single command — a plugin published to a marketplace on one side, a plugin bundle plus layered team config on the other. The wrinkle is the open standard: individual skills publish to agentskills.io, the shared skill format both ecosystems read, so a cleaning skill written for one tool is increasingly installable in the other — the convergence point this whole course bets on. The conventions in the box (worktree and contract discipline) ship as documentation in both, because a way of working is taught, not dropped in. See the parity matrix for the dated detail.

Feature-parity matrix

Running Totals

Lesson	Role	Est. human-RA	Agent (yours when measured)
A1	the wall — the unstaffed midnight hours between a raw file and a first plot	an evening or two per messy file — defensive parsing rewritten from scratch each project, rules forgotten by the time they work	~10 minutes for the quick win, plus the same task re-run in the other language for free
A2	you, working an order of magnitude faster — but only if you direct the work	the slow tax of an undriven session — drifted answers on long investigations, re-runs to find where it went wrong	~30 min to learn; thereafter a first-look on one month (3.5M rows) in minutes, with receipts
B1	the lab manual nobody writes — the institutional knowledge that lives in your head	~30 min re-onboarding every new RA, every time — plus the afternoons lost to landmines no one wrote down	written once in an hour; reloaded free at the start of every session thereafter
B2	careful senior who plans before touching data	~1 week at project start (setup, download babysitting, plan review) + the joins redone when structure rots	an afternoon — most of it download wall-clock, not attention
B3	the data manager who guards the raw files — the person who says no near the master copies	permanent vigilance you cannot staff — one lapse at machine speed costs a month of re-downloads	two profiles configured once in minutes; the fence then holds every session, tired or not
C1	the methodologist — the one person who knows how the lab actually decides	the judgment lives in one head; transferring it to a new RA costs weeks of shadowing, and leaves when they do	an afternoon to author three SKILL.md files in both dialects; zero cost per session until invoked
C2	data manager / QA who never sleeps	permanent vigilance — est. 2 weeks/year of load-checking and release-note reading	half a day to install and test the 9-line block; ~20 s per run thereafter
C3	the data engineer who wires the lab to its systems	days of bespoke glue per source — credentials, retries, schema spelunking, timezone forensics — re-debugged every time a source changes	register the server once; the agent explores INFORMATION_SCHEMA and builds the panel in a guided session, raw cached for replication
D1	the RA pool — and the adviser who critiques from outside	a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will	~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes
D2	the lab whose members don't overwrite each other	the lost afternoon disentangling two agents' colliding edits — and the redo when you reconstruct it wrong the first time	two commands to create the worktrees; the parallelism runs free; one reviewed merge at the end
D3	overnight RA	one night shift per estimation batch — and the course runs several batches	~10 min to write the check or the objective; the night itself belongs to the machine
D4	an RA bench and the PI who keeps their results comparable	the curve is ~2 days of serialized edit-and-fit; the suspicious read of the robustness table is the rarer, senior hour nobody has time for	13 lanes fanned out under the cap finish in an afternoon; the referee files its evidenced finding in one isolated pass
E1	reproducibility checker	a clean-room rebuild every few weeks — dull, exacting, and the first thing dropped at submission	~20 min to wire scripts/replicate.sh and the gate workflow; the verdict returns in one headless run thereafter
E2	lab manager's standing chores	a recurring monthly chore nobody owns — check the CDN, pull, contract, append, re-estimate — reliably skipped	~30 min to define the routine + guardrails once; each month runs unattended and stops at the approval gate
E3	the onboarding the lab never has to repeat	six weeks of per-member onboarding, rediscovered from scratch every time the lab turns over	~half a day to package and smoke-test the kit once; each new member is one install and one prompt
F1	the whole lab, orchestrated — the PI who designs the system instead of doing the work	each revision is a serialized chain — re-spec, re-estimate, re-table, rewrite the paragraph, re-read the abstract — correct only as of the last manual pass, on a Sunday; a real reviewer round is days of hand-carried edits	the loop runs two iterations to convergence in one supervised sitting; the human stands at exactly one gate (approve dropping the post-treatment control) while the mechanical fixes proceed unattended
Positions absorbed		0 of 16

The honest column: every place a human had to step in lives in the Field Journal’s failure log. Your measured hours there override these estimates here.
