C1 intermediate ~60 min

Methods, Written Down

Absorbs: the methodologist

Advances C1

The Pain

There is a person in every good lab whom nobody quite knows how to replace. She is the one who decides that a taxi doing 78 miles an hour through Midtown is a GPS error and not a fast cab, and who can tell you why the cutoff is 65 and not 70 — it was 70 once, until the winter the tunnel data came in clean and proved otherwise. She reads a paper and, in four minutes, can say what identifies the effect and what would overturn it. When she critiques your design she asks the same three questions she always asks, and they are always the right three.

None of this is written down. It lives in her judgment, accreted over years, and it transfers exactly one way: by sitting next to her for a semester and absorbing it. When she graduates, the lab does not lose a pair of hands. It loses its method — the accumulated rulings about how this group reads, cleans, and argues. The next student rediscovers the 65-mph cutoff the hard way, by shipping an elasticity estimate inflated by phantom speed demons, and the adviser sighs the sigh of a man who has explained this before to people who are no longer here.

You are, this term, both the methodologist and her only apprentice. Every decision you make about this project is a small piece of method, and right now all of it lives in your head and your shell history, which is to say nowhere your future self can find it.

Why / When

The temptation is to read what follows as a macro system — a way to alias a long command to a short name. It is not, and the distinction is the whole lesson. The Goldsmith-Pinkham insight is that a skill is codified methodology, not a saved command. Writing one means teaching the agent your lab’s decision-making process: the thresholds together with the reasons they hold, the structure your group imposes on a paper, the questions your adviser cannot stop himself from asking. The numbers are the easy part; the rationales are the method, and the rationales are what a macro throws away.

Place the skill in the unit’s taxonomy — two of the three rows you now know, the third completed in C2:

Mechanism	Nature	Runs
instruction file	always-on context	every session, as advice
skill	on-demand procedure	when invoked
hook	enforced rule	always, mechanically (C2)

An instruction file is what the agent always knows; a skill is what it knows how to do when you ask. The cost model is the point of the middle row: a skill is invisible until invoked — its procedure does not crowd the context window every session, only when its moment comes. In the research pipeline, skills sit across the whole workflow — cleaning, reading, critiquing — wherever a decision recurs with a reason behind it. The role they absorb is the methodologist: her judgment, written down once, applied identically whether you invoke it on week three or hand the repo to a student in two years.

The three rows are a decision, not a hierarchy. Before the syntax, walk the choice on a piece of knowledge you actually have — the navigator returns the ruling the situation earns, including the honest “just run it” when codifying would cost more than rerunning:

Decision rubric: Instruction file, skill, or hook?

You have a piece of lab knowledge to give the agent. Which of the three mechanisms should carry it? Answer for the knowledge in front of you — the taxonomy turns on how it runs, not how important it is.

When does this knowledge need to apply?

Not how often you'll use it — how often the agent needs it loaded. That is the cost model the taxonomy is built on.

The navigator walks you to one ruling at a time; these are all the rulings the rubric can reach:

Recommended: Instruction file — Always-on context belongs in the instruction file: it briefs the agent every session at the cost of crowding the context window, so it earns its place only for knowledge that bears on most work. It is advice, not a law — exactly right for conventions and warnings, wrong for anything that must not be skippable under pressure.
Recommended: Hook (enforced rule) — A warning the agent can rationalize away is not enforcement. Promote it to a hook so it fires mechanically — the C2 distinction: prompted behavior can be ignored, enforced behavior cannot. The instruction file may still carry the rationale; the hook removes the option of skipping the rule.
Recommended: Skill — On-demand judgment with rationales is a skill. Its body costs nothing until the description matches and it is invoked (progressive disclosure), so a long procedure belongs here, not in the always-on instruction file. Write the description as a trigger and keep the rationales — strip them and you have written an alias.
Use with care: Just run it — The honest non-use case (C1's notFor): codifying a one-off costs more than rerunning it. A skill is for judgment you will reuse and transfer, not a command you'll forget. If you find yourself running it a third time with a reason behind it, that is the moment it becomes a skill.
Recommended: PreToolUse hook — A rule that must prevent an action belongs at PreToolUse — it sees the proposed call and can refuse it before anything happens. This is the enforcement layer B3's profiles also occupy; the hook is where you encode lab-specific gates the permission rules don't.
Recommended: PostToolUse hook — A rule that inspects what a tool produced belongs at PostToolUse — it runs after the call and can reject the result, feeding its stderr back to the agent (exit code 2 = block-and-report). This is the C2 golden beat: the contract suite runs after every warehouse write, and the QA gate and the engineer talk to each other.
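
The rulings compress into a few branches. A minimal sketch in Python, assuming a hypothetical `choose_mechanism` helper; the `Applies` labels and the order of the questions are illustrative assumptions, not the navigator's actual wording:

```python
from enum import Enum


class Applies(Enum):
    """When the agent needs the knowledge loaded (hypothetical labels)."""
    EVERY_SESSION = "every session"
    ON_DEMAND = "on demand"
    ONCE = "once"


def choose_mechanism(applies: Applies, must_be_enforced: bool = False,
                     inspects_result: bool = False) -> str:
    """Return the ruling the rubric reaches for one piece of lab knowledge."""
    if must_be_enforced:
        # Enforcement cannot depend on the agent choosing to comply.
        # A gate on a proposed action runs before the call; a check on
        # what a tool produced runs after it.
        return "PostToolUse hook" if inspects_result else "PreToolUse hook"
    if applies is Applies.ONCE:
        return "just run it"
    if applies is Applies.ON_DEMAND:
        return "skill"
    return "instruction file"


print(choose_mechanism(Applies.ON_DEMAND))              # skill
print(choose_mechanism(Applies.EVERY_SESSION))          # instruction file
print(choose_mechanism(Applies.ON_DEMAND, True, True))  # PostToolUse hook
```

The enforcement question comes first on purpose: how often knowledge is needed never upgrades advice into a rule.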

Contrary winds

Not for: a one-off transformation you will run once and never explain to anyone — codifying it costs more than rerunning it; a skill is for judgment you will reuse, not a command you will forget.

Mechanics

A skill is a Markdown file with frontmatter. The shared anatomy first, then the two dialects, then the three skills the project actually needs.

Anatomy of a skill

Every skill is one file with two parts. The frontmatter carries a name and — load-bearing — a description. The body is the procedure: prose, steps, code, the rationales. The description is not a label; it is the trigger the agent matches your request against to decide whether this skill applies. Write it as the answer to “when should you reach for this,” not as a title.

---
name: clean-trips
description: >-
  Apply the lab's taxi-trip cleaning SOP — documented filters with
  rationales, emitting a filter-cascade table. Use whenever raw TLC
  trip data is loaded and needs to become an analysis-ready table.
---

## When to use
Raw monthly TLC parquet has been ingested and must be filtered to
analysis-ready trips, with every dropped row counted and justified.

## Procedure
1. … (the steps, with thresholds AND the reason each holds)

The key property is progressive disclosure: the body costs nothing until the description matches and the skill is invoked. You can keep a shelf of twenty skills and pay for none of them until their moment arrives — which is exactly why a skill, not the instruction file, is where a long procedure belongs.
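
To make the cost model concrete, here is a rough sketch of how a runner could index a shelf of skills while keeping only the frontmatter. How either tool implements this internally is not documented in this lesson; `read_frontmatter` and `build_index` are hypothetical names, and the parser is a deliberate simplification, not a YAML implementation:

```python
from pathlib import Path


def read_frontmatter(text: str) -> dict[str, str]:
    """Parse the simple `key: value` frontmatter between the two `---` lines.

    Simplification: handles only top-level scalar keys and folded or
    literal blocks (`>-`, `>`, `|`, `|-`); it is not a full YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    key = None
    for line in lines[1:]:
        if line.strip() == "---":       # closing delimiter: stop before the body
            break
        if line[:1] in (" ", "\t") and key is not None:
            # Indented continuation line: fold it into the current value.
            fields[key] = (fields[key] + " " + line.strip()).strip()
        elif ":" in line:
            key, _, value = line.partition(":")
            key = key.strip()
            value = value.strip()
            fields[key] = "" if value in (">-", ">", "|", "|-") else value
    return fields


def build_index(skills_root: Path) -> dict[str, str]:
    """Map skill name -> description; the body never enters the index."""
    index = {}
    for skill_file in sorted(skills_root.glob("*/SKILL.md")):
        meta = read_frontmatter(skill_file.read_text(encoding="utf-8"))
        index[meta.get("name", skill_file.parent.name)] = meta.get("description", "")
    return index
```

Twenty skills on the shelf then cost twenty short descriptions in the index; a body is read only for the one skill whose description matched.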

Where skills live and how you invoke them

The file format is identical across both tools; the directory and the invocation token differ.

Claude Code

A skill is a SKILL.md under .claude/skills/<name>/, committed with the repo so it arrives with the clone. You invoke it by its name as a slash command — authoring clean-trips gives you /clean-trips:

.claude/skills/
├── clean-trips/SKILL.md       → invoke /clean-trips
├── paper-summary/SKILL.md     → invoke /paper-summary
└── demanding-adviser/SKILL.md → invoke /demanding-adviser

Because the directory is committed, the method travels with the project: a collaborator who clones the repo inherits your cleaning SOP, your reading template, and your adviser’s three questions, with no hand-off conversation. The skill is the hand-off conversation, frozen.

Codex

A skill is a SKILL.md under .agents/skills/<name>/, committed with the repo. You invoke it by name with a leading sigil — authoring clean-trips gives you $clean-trips:

.agents/skills/
├── clean-trips/SKILL.md       → invoke $clean-trips
├── paper-summary/SKILL.md     → invoke $paper-summary
└── demanding-adviser/SKILL.md → invoke $demanding-adviser

The directory follows the agentskills.io open standard — the same on-disk shape other agent runners read, so a skill you author here is not locked to one vendor. If you would rather not hand-write the frontmatter, the bundled $skill-creator interviews you about the procedure and scaffolds a conforming SKILL.md; treat its output as a first draft of method, not the method itself.

Three skills, one per archetype

The project needs exactly three skills to start, and they are deliberately different kinds of method: a procedure, a piece of scholarship, a unit of judgment. Author all three in both dialects from the same policy notes; the bodies are identical, only the directory and invocation differ.
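
Since the bodies are identical, one source can feed both directories. A small sketch, assuming a hypothetical `publish_skill` helper and that you keep the SKILL.md text in one place of your choosing; the two directory names are the ones given above:

```python
import tempfile
from pathlib import Path

# Directory per tool, as described in this lesson.
DIALECT_ROOTS = {
    "claude-code": Path(".claude/skills"),
    "codex": Path(".agents/skills"),
}


def publish_skill(name: str, skill_md: str, repo_root: Path) -> list[Path]:
    """Write one SKILL.md text to both dialect directories; return the paths."""
    written = []
    for root in DIALECT_ROOTS.values():
        target = repo_root / root / name / "SKILL.md"
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(skill_md, encoding="utf-8")
        written.append(target)
    return written


# Demo in a throwaway directory so nothing touches a real repo.
with tempfile.TemporaryDirectory() as tmp:
    for path in publish_skill("clean-trips", "---\nname: clean-trips\n---\n", Path(tmp)):
        print(path.relative_to(tmp))
```

Writing both copies from one source keeps the two dialects from drifting apart; a review then only ever has one text to read.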

`/clean-trips` — a procedure with rationales

The cleaning SOP is where the methodologist’s judgment is densest. It runs on the raw monthly trip parquet — the fixed course slice you fetched with the kit’s python3 get_data.py (Get the data). The SKILL.md does not say “drop fast trips”; it says which speed, why that number, and demands the dropped count be reported — so the filter cascade becomes the report’s data appendix instead of a forgotten command. The filters are statistics, so the body’s reference implementation comes in both of the lab’s languages:

Python

import duckdb

# Each filter is a (predicate, rationale) pair. The rationale is the
# method; the predicate is just how the method is spelled in SQL.
FILTERS = [
    ("fare_amount >= 0",
     "negative fares are voided transactions, not trips — drop and count"),
    ("trip_distance > 0 AND tpep_dropoff_datetime > tpep_pickup_datetime",
     "zero-distance / non-positive-duration rows are meter glitches"),
    ("trip_distance / (epoch(tpep_dropoff_datetime - tpep_pickup_datetime)/3600.0) "
     "< 65 OR borough <> 'Manhattan'",
     "speed > 65 mph in Manhattan is a GPS error, not a fast cab — "
     "the cutoff was 70 until the clean-tunnel winter proved 65"),
]

con = duckdb.connect("warehouse.duckdb")
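# Seed with a full SELECT so each step can wrap the previous query as a subquery.
cascade, surviving = [], "SELECT * FROM trips_raw"
for i, (pred, why) in enumerate(FILTERS, start=1):   # number filters from 1, as in the R version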
    n = con.sql(f"SELECT count(*) FROM ({surviving}) t WHERE {pred}").fetchone()[0]
    cascade.append((i, why, n))               # the audit trail, row by row
    surviving = f"SELECT * FROM ({surviving}) t WHERE {pred}"

con.sql(f"CREATE OR REPLACE TABLE trips_clean AS {surviving}")
for i, why, n in cascade:
    print(f"  filter {i}: {n:>12,} survive  — {why}")

R

library(duckdb)

# Each filter is a (predicate, rationale) pair. The rationale is the
# method; the predicate is just how the method is spelled in SQL.
filters <- list(
  c("fare_amount >= 0",
    "negative fares are voided transactions, not trips — drop and count"),
  c("trip_distance > 0 AND tpep_dropoff_datetime > tpep_pickup_datetime",
    "zero-distance / non-positive-duration rows are meter glitches"),
  c(paste("trip_distance / (epoch(tpep_dropoff_datetime - tpep_pickup_datetime)",
          "/3600.0) < 65 OR borough <> 'Manhattan'"),
    paste("speed > 65 mph in Manhattan is a GPS error, not a fast cab —",
          "the cutoff was 70 until the clean-tunnel winter proved 65"))
)

con <- dbConnect(duckdb(), "warehouse.duckdb")
# Seed with a full SELECT so each step can wrap the previous query as a subquery.
surviving <- "SELECT * FROM trips_raw"
for (i in seq_along(filters)) {
  pred <- filters[[i]][1]; why <- filters[[i]][2]
  n <- dbGetQuery(con, sprintf(
    "SELECT count(*) FROM (%s) t WHERE %s", surviving, pred))[[1]]
  cat(sprintf("  filter %d: %12s survive  — %s\n",
              i, format(n, big.mark = ","), why))    # the audit trail
  surviving <- sprintf("SELECT * FROM (%s) t WHERE %s", surviving, pred)
}
dbExecute(con, sprintf("CREATE OR REPLACE TABLE trips_clean AS %s", surviving))

Read the comments, not the SQL: a new RA who runs this learns why 65, not merely that 65. That is the difference between a skill and an alias. In C2 this exact procedure runs under an enforced contract — the skill encodes the method, the hook removes the option of skipping it.

`/paper-summary` — scholarship with a fixed shape

The lit-review workhorse for D1 and F1. Its method is a structure: the six things your group reads every paper for, in the same order, so two summaries are comparable and a related-work section assembles itself. The skill’s body is mostly the template and the rule that every field cite a page.

## Procedure
Read the paper and emit exactly these six fields, one page total.
Cite a page or section for every empirical claim — no field may be
filled from the abstract alone.

1. **Question** — what causal or descriptive question, stated in one sentence.
2. **Identification** — what variation identifies the effect; the key assumption.
3. **Data** — source, unit, period, sample size, notable construction choices.
4. **Findings** — the headline estimate with its sign, magnitude, and precision.
5. **Limitations** — what the authors concede, plus one they don't.
6. **Relation to ours** — does it support, complicate, or supersede our design.
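
The page-citation rule is checkable by machine. A sketch under stated assumptions: the summary keeps the six bolded labels from the template, and citations look like `p. 12`, `pp. 3-4`, `Section 2`, or `§2`. Both the `check_summary` helper and that citation format are hypothetical; adapt the pattern to however your group cites:

```python
import re

FIELDS = ["Question", "Identification", "Data", "Findings",
          "Limitations", "Relation to ours"]
# Assumed citation shapes: "p. 12", "pp. 3-4", "Section 2", "§2".
CITATION = re.compile(r"(?:\bpp?\.\s*\d+|\bSection\s+\d+|§\s*\d+)")


def check_summary(text: str) -> dict[str, str]:
    """Return {field: problem} for every field that is missing or uncited."""
    problems = {}
    # Locate each bolded field label in the text (-1 when absent).
    positions = {f: text.find(f"**{f}**") for f in FIELDS}
    for i, field in enumerate(FIELDS):
        start = positions[field]
        if start == -1:
            problems[field] = "missing"
            continue
        # The field's text runs up to the next label that is present.
        later = [positions[f] for f in FIELDS[i + 1:] if positions[f] > start]
        end = min(later) if later else len(text)
        if not CITATION.search(text[start:end]):
            problems[field] = "no page or section cited"
    return problems
```

An empty result means every field is present and cites something; it does not prove the citation is right, which still takes a reader.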

`/demanding-adviser` — judgment, written as questions

The hardest archetype to codify and the most valuable. It carries no procedure to run; it carries your adviser’s stance — the skeptical questions a good critique always asks of a research design. Used on your own design in D1, and the seed that becomes the adversarial referee in D4 and F1.

## Procedure
Critique the supplied research design as a demanding but fair adviser.
Be specific to THIS design — never generic. For each issue, name the
threat and what evidence would resolve it. Always ask, at minimum:

- **What is the identifying variation?** Name it. If it is "weather,"
  what makes it as-good-as-random conditional on the controls?
- **What would falsify this?** State the result that, if found, would
  sink the claim — and whether the design can produce it.
- **Where does leakage hide?** Post-treatment controls, full-series
  rolling statistics, random CV where the data are temporal.

Rank the three most serious objections. Do not soften them.

Writing skills that work

Four rules, learned the hard way, that separate a skill from a wish:

The description is an interface. A vague description never triggers, or triggers always; either way the agent’s judgment about when to use the skill is only as good as the sentence you wrote. Describe the situation, not the skill.
Checkable steps over vibes. “Clean the data sensibly” is not a method. “Drop fare_amount < 0, report the count” is. If a step cannot be checked, it cannot be transferred.
Demand evidence in the output. A skill that emits a cascade table, a cited summary, a ranked objection list produces something you can audit. One that emits a paragraph of reassurance produces compliance without judgment.
Version skills like code; review them like protocols. They drift from actual lab practice the same way a lab manual does. Re-read them quarterly, and test a skill the way you test a function — run it on a known input and check the output is the method you meant.
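
Testing a skill like a function can be as small as this. The sketch below is a pure-Python stand-in for the SQL cascade (so it needs no warehouse), run on an invented five-row fixture with one row per failure mode. The `hours` field, the fixture values, and `run_cascade` are illustrative assumptions, not course data:

```python
# Each filter is (name, keep-predicate, rationale), mirroring the SOP above.
FILTERS = [
    ("fare", lambda t: t["fare_amount"] >= 0,
     "negative fares are voided transactions, not trips"),
    ("duration", lambda t: t["trip_distance"] > 0 and t["hours"] > 0,
     "zero-distance / non-positive-duration rows are meter glitches"),
    ("speed", lambda t: t["borough"] != "Manhattan"
                        or t["trip_distance"] / t["hours"] < 65,
     "speed > 65 mph in Manhattan is a GPS error, not a fast cab"),
]


def run_cascade(trips):
    """Apply the filters in order; return (survivors, cascade rows)."""
    cascade = []
    for name, keep, why in FILTERS:
        trips = [t for t in trips if keep(t)]
        cascade.append((name, len(trips), why))
    return trips, cascade


# Known input: one row per failure mode, plus two rows that must survive.
FIXTURE = [
    {"id": "ok",      "fare_amount": 12.0, "trip_distance": 2.0,  "hours": 0.25, "borough": "Manhattan"},
    {"id": "voided",  "fare_amount": -5.0, "trip_distance": 2.0,  "hours": 0.25, "borough": "Manhattan"},
    {"id": "glitch",  "fare_amount": 3.0,  "trip_distance": 0.0,  "hours": 0.10, "borough": "Queens"},
    {"id": "phantom", "fare_amount": 40.0, "trip_distance": 39.0, "hours": 0.50, "borough": "Manhattan"},
    {"id": "highway", "fare_amount": 40.0, "trip_distance": 39.0, "hours": 0.50, "borough": "Queens"},
]

survivors, cascade = run_cascade(FIXTURE)
assert [n for _, n, _ in cascade] == [4, 3, 2]
assert [t["id"] for t in survivors] == ["ok", "highway"]
```

If the lab moves the cutoff, this test fails until someone updates both the number and the fixture, which is the review the rule asks for.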

Field Assignment

Artifact: make check-c1 passes

Build the methodologist into the repo. Author all three skills in both dialects from one set of policy notes, then put two of them to work — /paper-summary on real papers, the adviser on your own brief.

Claude Code

Author clean-trips, paper-summary, and demanding-adviser under .claude/skills/<name>/SKILL.md, each with a description written as a trigger and a body carrying thresholds with rationales.
Run /clean-trips on one month and confirm the filter cascade table prints a count and a reason per filter.
Run /paper-summary on two related papers from the project’s reading list; check every field cites a page, none from the abstract.
Run /demanding-adviser on report/brief.md and log its three hardest questions — verbatim — in journal/c1-adviser.md, marking the one you cannot yet answer.
make check-c1.

Codex

Author clean-trips, paper-summary, and demanding-adviser under .agents/skills/<name>/SKILL.md, each with a description written as a trigger and a body carrying thresholds with rationales.
Run $clean-trips on one month and confirm the filter cascade table prints a count and a reason per filter.
Run $paper-summary on two related papers from the project’s reading list; check every field cites a page, none from the abstract.
Run $demanding-adviser on report/brief.md and log its three hardest questions — verbatim — in journal/c1-adviser.md, marking the one you cannot yet answer.
make check-c1.

make check-c1 verifies the three skills exist with non-empty descriptions, that /clean-trips emits a cascade with one rationale per filter, and that journal/c1-adviser.md records three adviser questions. The cleaning skill feeds straight into C2, where it runs under contract; the adviser feeds D1 and is sharpened into D4’s referee.
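
The course's actual `make check-c1` implementation is not shown here. As a rough idea of what two of its checks could look like, assuming hypothetical helpers `missing_skills` and `logged_questions` and treating any journal line that ends in a question mark as a logged question:

```python
from pathlib import Path

SKILLS = ("clean-trips", "paper-summary", "demanding-adviser")
ROOTS = (".claude/skills", ".agents/skills")


def missing_skills(repo: Path) -> list[str]:
    """List every expected SKILL.md that is absent or declares no description."""
    missing = []
    for root in ROOTS:
        for name in SKILLS:
            path = repo / root / name / "SKILL.md"
            text = path.read_text(encoding="utf-8") if path.is_file() else ""
            # Crude check: the file must at least declare a description field.
            if "description:" not in text:
                missing.append(f"{root}/{name}/SKILL.md")
    return missing


def logged_questions(journal: Path) -> list[str]:
    """Return the journal lines that read as questions (end with '?')."""
    if not journal.is_file():
        return []
    lines = journal.read_text(encoding="utf-8").splitlines()
    return [ln.strip() for ln in lines if ln.strip().endswith("?")]
```

A gate built this way would pass when `missing_skills` returns an empty list and `logged_questions` returns at least three entries.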

Milestone gate · make check-c1 · advances C1

All three skills authored in both dialects from the same policy notes
/clean-trips (procedure), /paper-summary (scholarship), /demanding-adviser (judgment).
/clean-trips carries every threshold WITH its rationale, not bare numbers
speed > 65 mph in Manhattan = GPS error; negative fare = voided transaction — drop and count.
Each skill's frontmatter description reads as a trigger, not a title
The description is the interface the agent matches against; vague descriptions never fire.
/paper-summary run on two related papers produces the structured one-page read each
Question, identification, data, findings, limitations, relation to our project.
/demanding-adviser run on the project brief; its three hardest questions logged in journal/
What is the identifying variation? What would falsify this? Where does leakage hide?

Check each item only once it is true of YOUR repo — the gate is self-certified, like the rest of your methodology.

Pitfalls & Gotchas

[both]

Skills without rationales produce compliance without judgment. An RA who runs a /clean-trips that only says “drop speed > 65” learns a number; one who reads “the cutoff was 70 until the clean-tunnel winter proved 65” learns the method, and knows when to revisit it. The rationale lines are the only part that transfers — strip them and you have written an alias, not a skill.
[both]

Vague descriptions never trigger, or always do — the description is the interface, not a label. “Helps with data” matches everything and nothing; “apply the TLC cleaning SOP when raw trip parquet is loaded” matches exactly its moment. Write the description for the agent’s matching decision, not for a human skimming a list.
[both]

Un-reviewed skills drift from actual lab practice. The cutoff moves, a new schema arrives, the adviser starts asking a fourth question — and the SKILL.md still encodes last year’s method, applied at machine speed and machine confidence. Review them quarterly, like a protocol, and re-run them on a known input to confirm the output is still the method you meant.
[CX]

Legacy ~/.codex/prompts still work, but they are superseded by the .agents/skills/ standard. A prompt is per-user and invisible to the repo; a skill is committed and travels with the clone. Migrate old prompts into skills so the method ships with the project, not with your laptop.
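
A migration can be mostly mechanical. A sketch, assuming each legacy prompt is a plain Markdown file with no frontmatter of its own and that its filename is an acceptable skill name; `migrate_prompts` is a hypothetical helper, and the placeholder description must be rewritten by hand as a real trigger:

```python
from pathlib import Path


def migrate_prompts(prompts_dir: Path, repo_root: Path) -> list[Path]:
    """Wrap each legacy prompt file in a SKILL.md under .agents/skills/."""
    created = []
    for prompt in sorted(prompts_dir.glob("*.md")):
        name = prompt.stem
        target = repo_root / ".agents" / "skills" / name / "SKILL.md"
        if target.exists():
            continue                    # never overwrite an authored skill
        body = prompt.read_text(encoding="utf-8")
        frontmatter = (
            "---\n"
            f"name: {name}\n"
            "description: TODO describe the situation that should trigger this skill\n"
            "---\n\n"
        )
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(frontmatter + body, encoding="utf-8")
        created.append(target)
    return created
```

The script moves the text; it cannot supply the rationales. Treat each migrated file as a draft until the description and the reasons behind each step have been written.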

Check Your Bearings

C1 · 4 questions · unlimited retries, no timer

Question 1 · Choose one
Your /clean-trips skill drops Manhattan trips faster than 65 mph. A new RA clones the repo and runs it. What does a well-written skill transfer that a saved command would not?
The reason the cutoff is 65 — so the RA learns the method, not just the number
A faster execution path than running the SQL by hand
Automatic enforcement that the filter always runs
Question 2 · Match the dialects
Match each mechanism to when it runs. This is the unit's taxonomy — two rows now, the third in C2.
instruction file
skill
hook
Question 3 · Choose one · dialect check: Claude Code
You author .claude/skills/clean-trips/SKILL.md but its description reads Helps with data cleaning. You run /clean-trips and it fires fine — yet later the agent never reaches for it on its own. Why?
The description is the interface the agent matches against — Helps with data matches everything and nothing
Skills under .claude/skills/ must be invoked explicitly and can never auto-trigger
The body is missing, so there is nothing to run
Question 4 · Choose one · dialect check: Codex
You have the cleaning SOP as a ~/.codex/prompts entry from last term. A collaborator clones the repo and cannot find it. What is the fix?
Migrate it into .agents/skills/clean-trips/SKILL.md so the method ships with the clone
Tell the collaborator to copy your ~/.codex/prompts directory
Nothing — prompts and skills are interchangeable for collaborators

Field journal

Log the three hardest questions /demanding-adviser asked of your project brief — and which one you have no answer for yet.

Parity note · as of June 2026

Skills are a clean parity feature: both tools read an identical SKILL.md (frontmatter name + description-as-trigger, Markdown body) and offer progressive disclosure, differing only in directory and invocation token — .claude/skills/ with /name on one side, .agents/skills/ with $name on the other. Codex’s directory follows the agentskills.io open standard and ships a $skill-creator scaffolder; Claude Code has no first-party scaffolder but the format is the same, so a hand-authored skill is portable either direction. The real asymmetry is upstream of the file: Codex still honors legacy ~/.codex/prompts, a per-user surface with no Claude Code analogue and no reason to prefer it over a committed skill.

Feature-parity matrix

The Lab Roster

Engraved positions, not portraits. A seat fills itself when its lesson is complete.

Positions

the data manager

Position vacant — engaged at C2

write-time contract hooks (PreToolUse/PostToolUse + the validation suite)

est. human-RA: permanent vigilance, est. 2 weeks/year of load-checking and release-note reading
agent: half a day to install and test the 9-line block; ~20 s per run thereafter
the methodologist

Position vacant — engaged at C1

the researcher skill library v1 (/clean-trips, /paper-summary, /demanding-adviser) — codified methodology, not macros

est. human-RA: the judgment lives in one head; transferring it to a new RA costs weeks of shadowing, and leaves when they do
agent: an afternoon to author three SKILL.md files in both dialects; zero cost per session until invoked
the data engineer

Position vacant — engaged at C3

MCP connections + the DuckDB warehouse, enrichment joins (weather/events/holidays), and the zone-hour analysis panel

est. human-RA: days of bespoke glue per source (credentials, retries, schema spelunking, timezone forensics), re-debugged every time a source changes
agent: register the server once; the agent explores INFORMATION_SCHEMA and builds the panel in a guided session, raw cached for replication
the RA pool

Position vacant — engaged at D1

parallel subagents with report contracts (EDA + scholarship fleets) + the isolated adviser

est. human-RA: a week of breadth EDA across boroughs and slices, plus a literature pass, and no honest outside critic you can summon at will
agent: ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes
the overnight RA

Position vacant — engaged at D3

/loop supervision + Goal Mode runs over background estimation

est. human-RA: one night shift per estimation batch, and the course runs several batches
agent: ~10 min to write the check or the objective; the night itself belongs to the machine
the adviser

Position vacant — engaged at D1

parallel subagents with report contracts (EDA + scholarship fleets) + the isolated adviser

est. human-RA: a week of breadth EDA across boroughs and slices, plus a literature pass, and no honest outside critic you can summon at will
agent: ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes
the referee

Position vacant — engaged at D4

contracted fleet fan-out (results contract + provenance) and an isolated adversarial referee

est. human-RA: the curve is ~2 days of serialized edit-and-fit; the suspicious read of the robustness table is the rarer, senior hour nobody has time for
agent: 13 lanes fanned out under the cap finish in an afternoon; the referee files its evidenced finding in one isolated pass
the lab manager

Position vacant — engaged at E2

scheduled/cloud agents — the monthly-ingest routine, stopping at a human-approved PR

est. human-RA: a recurring monthly chore nobody owns (check the CDN, pull, contract, append, re-estimate), reliably skipped
agent: ~30 min to define the routine + guardrails once; each month runs unattended and stops at the approval gate
the reproducibility checker

Position vacant — engaged at E1

headless invocation + the fresh-clone replication self-test + CI gates

est. human-RA: a clean-room rebuild every few weeks: dull, exacting, and the first thing dropped at submission
agent: ~20 min to wire scripts/replicate.sh and the gate workflow; the verdict returns in one headless run thereafter
the wall: the unstaffed midnight hours between a raw file and a first plot

Position vacant — engaged at A1

the bare agent loop (prompt → act → observe → fix), zero configuration

est. human-RA: an evening or two per messy file: defensive parsing rewritten from scratch each project, rules forgotten by the time they work
agent: ~10 minutes for the quick win, plus the same task re-run in the other language for free
you, working an order of magnitude faster, but only if you direct the work

Position vacant — engaged at A2

the command surface + five prompting patterns + context hygiene

est. human-RA: the slow tax of an undriven session: drifted answers on long investigations, re-runs to find where it went wrong
agent: ~30 min to learn; thereafter a first-look on one month (3.5M rows) in minutes, with receipts
the lab manual nobody writes: the institutional knowledge that lives in your head

Position vacant — engaged at B1

instruction files (CLAUDE.md / AGENTS.md) + auto-memory + the A/B demonstration

est. human-RA: ~30 min re-onboarding every new RA, every time, plus the afternoons lost to landmines no one wrote down
agent: written once in an hour; reloaded free at the start of every session thereafter
the careful senior who plans before touching data

Position vacant — engaged at B2

repo scaffold + pinned environments + read-only Plan mode reconnaissance

est. human-RA: ~1 week at project start (setup, download babysitting, plan review) + the joins redone when structure rots
agent: an afternoon, most of it download wall-clock, not attention
the lab whose members don't overwrite each other

Position vacant — engaged at D2

git worktrees — one isolated checkout per agent/session/thread, combined through a deliberate merge

est. human-RA: the lost afternoon disentangling two agents' colliding edits, and the redo when you reconstruct it wrong the first time
agent: two commands to create the worktrees; the parallelism runs free; one reviewed merge at the end
the onboarding the lab never has to repeat

Position vacant — engaged at E3

lab-kit — the whole methodology packaged as a one-command install

est. human-RA: six weeks of per-member onboarding, rediscovered from scratch every time the lab turns over
agent: ~half a day to package and smoke-test the kit once; each new member is one install and one prompt
the whole lab, orchestrated: the PI who designs the system instead of doing the work

Position vacant — engaged at F1

the research loop (/loop ↔ Goal Mode / @codex) orchestrating fleet → referee → headless re-run → regenerated report, under report-don't-act guardrails, a hard budget cap, and a human gate on substantive decisions only

est. human-RA: each revision is a serialized chain (re-spec, re-estimate, re-table, rewrite the paragraph, re-read the abstract), correct only as of the last manual pass, on a Sunday; a real reviewer round is days of hand-carried edits
agent: the loop runs two iterations to convergence in one supervised sitting; the human stands at exactly one gate (approve dropping the post-treatment control) while the mechanical fixes proceed unattended

Running Totals

Lesson	Role	Est. human-RA	Agent (yours when measured)
A1	the wall — the unstaffed midnight hours between a raw file and a first plot	an evening or two per messy file — defensive parsing rewritten from scratch each project, rules forgotten by the time they work	~10 minutes for the quick win, plus the same task re-run in the other language for free
A2	you, working an order of magnitude faster — but only if you direct the work	the slow tax of an undriven session — drifted answers on long investigations, re-runs to find where it went wrong	~30 min to learn; thereafter a first-look on one month (3.5M rows) in minutes, with receipts
B1	the lab manual nobody writes — the institutional knowledge that lives in your head	~30 min re-onboarding every new RA, every time — plus the afternoons lost to landmines no one wrote down	written once in an hour; reloaded free at the start of every session thereafter
B2	careful senior who plans before touching data	~1 week at project start (setup, download babysitting, plan review) + the joins redone when structure rots	an afternoon — most of it download wall-clock, not attention
B3	the data manager who guards the raw files — the person who says no near the master copies	permanent vigilance you cannot staff — one lapse at machine speed costs a month of re-downloads	two profiles configured once in minutes; the fence then holds every session, tired or not
C1	the methodologist — the one person who knows how the lab actually decides	the judgment lives in one head; transferring it to a new RA costs weeks of shadowing, and leaves when they do	an afternoon to author three SKILL.md files in both dialects; zero cost per session until invoked
C2	data manager / QA who never sleeps	permanent vigilance — est. 2 weeks/year of load-checking and release-note reading	half a day to install and test the 9-line block; ~20 s per run thereafter
C3	the data engineer who wires the lab to its systems	days of bespoke glue per source — credentials, retries, schema spelunking, timezone forensics — re-debugged every time a source changes	register the server once; the agent explores INFORMATION_SCHEMA and builds the panel in a guided session, raw cached for replication
D1	the RA pool — and the adviser who critiques from outside	a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will	~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes
D2	the lab whose members don't overwrite each other	the lost afternoon disentangling two agents' colliding edits — and the redo when you reconstruct it wrong the first time	two commands to create the worktrees; the parallelism runs free; one reviewed merge at the end
D3	overnight RA	one night shift per estimation batch — and the course runs several batches	~10 min to write the check or the objective; the night itself belongs to the machine
D4	an RA bench and the PI who keeps their results comparable	the curve is ~2 days of serialized edit-and-fit; the suspicious read of the robustness table is the rarer, senior hour nobody has time for	13 lanes fanned out under the cap finish in an afternoon; the referee files its evidenced finding in one isolated pass
E1	reproducibility checker	a clean-room rebuild every few weeks — dull, exacting, and the first thing dropped at submission	~20 min to wire scripts/replicate.sh and the gate workflow; the verdict returns in one headless run thereafter
E2	lab manager's standing chores	a recurring monthly chore nobody owns — check the CDN, pull, contract, append, re-estimate — reliably skipped	~30 min to define the routine + guardrails once; each month runs unattended and stops at the approval gate
E3	the onboarding the lab never has to repeat	six weeks of per-member onboarding, rediscovered from scratch every time the lab turns over	~half a day to package and smoke-test the kit once; each new member is one install and one prompt
F1	the whole lab, orchestrated — the PI who designs the system instead of doing the work	each revision is a serialized chain — re-spec, re-estimate, re-table, rewrite the paragraph, re-read the abstract — correct only as of the last manual pass, on a Sunday; a real reviewer round is days of hand-carried edits	the loop runs two iterations to convergence in one supervised sitting; the human stands at exactly one gate (approve dropping the post-treatment control) while the mechanical fixes proceed unattended
Positions absorbed		0 of 16

The honest column: every place a human had to step in lives in the Field Journal’s failure log. Your measured hours there override these estimates here.
