The Pain
The new RA is bright, fast, and completely uninformed. She does not know
that data/raw/ is never to be edited, that the borough of a pickup
comes from the lookup table and not the shapefile, that the zone ids run
two past the polygons, that everything in the warehouse is stored in
local time and the raw files are in UTC. She does not know that the
February file, the one everyone trusts, quietly carries voided meters
and refunds that have to be filtered before a single number is reported.
None of this is written down. It lives in your head and in the scar
tissue of three afternoons you would rather not relive.
So you explain it. You walk her through the directory, you flag the expensive query that once ran for forty minutes, and you say, twice, because it matters, never load a whole Parquet into memory. By Thursday she is useful. The following Monday a different RA arrives, equally bright, equally uninformed, and you explain it again. And then again. The explaining never compounds, because the person you are onboarding forgets everything the moment they leave: not from carelessness, but by construction. They are new every morning. You have hired, without quite noticing, a colleague with no memory of yesterday, and you are paying for that absence in the most expensive currency you own: your own repeated attention.
Why / When
An instruction file is the onboarding document you write once and the agent reads at the start of every session: where data lives, what columns mean, which operations are expensive, what must never be done. It is the lab manual nobody writes, because in a human lab the manual lives in people’s heads and walks out the door at five o’clock. With an agent, the manual is a file, and the file does not forget.
This is the cheapest, highest-leverage thing in the entire course, and it is also the least glamorous: prose in a text file. It sits at the very front of the project, before the first transform and before the first plot, and it pays out on every session that follows. The role it absorbs is the institutional knowledge that usually exists only as folklore; the agent stops being a permanently new RA and starts every session already briefed.
It is not a place for aspirations. “We value clean, well-tested code”
is a sentence an agent cannot act on. “Run ruff check before every
commit; never load a full Parquet — query through DuckDB” is. An
instruction file that reads like a mission statement is worse than no
file at all, because it consumes context and teaches nothing.
Contrary winds
Not for: a throwaway script in a fresh directory you will delete by lunch — onboarding earns its keep across sessions, and a one-shot has none.
Mechanics
Both tools read a plain-text manual at session start. The concept is shared; the filenames, scoping rules, and budget differ, which is why the next part splits by tool. After the per-tool notes come one decision table both tools share and then the demonstration that makes the whole argument.
What goes in the manual
Independent of tool, a good lab manual is a list of checkable facts and prohibitions, ordered by how much damage getting them wrong does:
- Data dictionary pointers: where the real schema lives (docs/data-dictionary.md), not the schema restated and left to rot.
- Naming and layout conventions: data/raw/ is append-only; data/processed/ is disposable; results land in results/.
- Expensive-operation warnings: never load a full Parquet into memory; query through DuckDB. This is the one rule whose violation you watched cost forty minutes.
- Domain landmines: borough comes from taxi_zone_lookup.csv via PULocationID; the shapefile is for maps; zone ids 264–265 exist in the lookup and not the polygons; warehouse timestamps are America/New_York, while raw files are UTC.
- The cleaning contract: counts come from the cleaned panel, not raw files, because raw carries refunds and voided meters.
Every line is something a wrong answer would corrupt. None of it is encouragement.
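Two of those landmines, the timezone split and the cleaning contract, are easy to state and easy to get wrong, so here they are as a small stdlib sketch. It assumes raw rows arrive as dicts keyed by the TLC column names tpep_pickup_datetime, payment_type, and total_amount, with pickup times already tagged as UTC. The filter rules (payment_type 6 as a voided trip, a negative total_amount as a refund) are assumptions borrowed from the public TLC yellow-taxi dictionary, not the project's cleaning procedure, which lives in docs/.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

NYC = ZoneInfo("America/New_York")  # warehouse timezone, per the manual
VOIDED_TRIP = 6  # assumption: TLC payment_type code for a voided trip


def local_month(pickup_utc: datetime) -> str:
    """Bucket a UTC-aware pickup timestamp into its New York calendar month."""
    return pickup_utc.astimezone(NYC).strftime("%Y-%m")


def keep_row(row: dict, month: str) -> bool:
    """Hypothetical cleaning rule: drop voided meters, refunds, out-of-month rows."""
    if row["payment_type"] == VOIDED_TRIP:
        return False
    if row["total_amount"] < 0:  # assumption: refunds show up as negative totals
        return False
    return local_month(row["tpep_pickup_datetime"]) == month


def utc(*args: int) -> datetime:
    """Shorthand for a UTC-aware datetime."""
    return datetime(*args, tzinfo=timezone.utc)


rows = [
    # 2024-03-01 03:00 UTC is 2024-02-29 22:00 in New York: a February pickup.
    {"tpep_pickup_datetime": utc(2024, 3, 1, 3), "payment_type": 1, "total_amount": 18.5},
    # 2024-02-01 02:00 UTC is 2024-01-31 21:00 in New York: a January pickup.
    {"tpep_pickup_datetime": utc(2024, 2, 1, 2), "payment_type": 2, "total_amount": 12.0},
    # A voided meter and a refund, both inside February.
    {"tpep_pickup_datetime": utc(2024, 2, 10, 15), "payment_type": VOIDED_TRIP, "total_amount": 0.0},
    {"tpep_pickup_datetime": utc(2024, 2, 10, 16), "payment_type": 1, "total_amount": -9.75},
]

february = [r for r in rows if keep_row(r, "2024-02")]
print(len(february))  # → 1
```

The first row is the point: bucketed by its UTC month it would land in March, but in America/New_York it is a February pickup. A manual line that names the warehouse timezone is what keeps that row from changing months.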
Two dialects for the same manual
Claude Code
The manual is CLAUDE.md, read at the start of every session. It has two
scopes that compose:
- Project scope: CLAUDE.md at the repo root, committed, so the manual arrives with the clone. This is the lab’s shared knowledge.
- User scope: ~/.claude/CLAUDE.md, your personal preferences across every project (your shell, your editor, how terse you like summaries). It is not committed and not the place for project facts.
For rules that should bite only inside a subtree, path-scoped rules live
in .claude/rules/. A stricter rule for work under data/ (“under this
path, treat every file as append-only; propose, never write”) sits in a
rule file scoped to that directory, rather than as a sentence in the root
manual that the agent must remember to apply selectively. Scope the
prohibition to where the danger is.
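What makes the append-only rule worth its line is that it can be checked. A minimal sketch of one such check, assuming a hypothetical manifest that maps each file under data/raw/ to its SHA-256 digest: adding a file passes, while editing or deleting one is reported. It illustrates the idea only; it is not the fence the course builds in B3.

```python
import hashlib
from pathlib import Path


def digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks so it never sits in memory whole."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(raw_dir: Path) -> dict[str, str]:
    """Map every file under raw_dir (as a relative POSIX path) to its digest."""
    return {
        p.relative_to(raw_dir).as_posix(): digest(p)
        for p in sorted(raw_dir.rglob("*"))
        if p.is_file()
    }


def append_only_violations(manifest: dict[str, str], current: dict[str, str]) -> list[str]:
    """Files edited or deleted since the manifest was taken; new files are allowed."""
    problems = []
    for name, old in manifest.items():
        if name not in current:
            problems.append(f"deleted: {name}")
        elif current[name] != old:
            problems.append(f"edited: {name}")
    return problems


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp)
        (raw / "yellow_2024-02.parquet").write_bytes(b"original")
        manifest = snapshot(raw)
        (raw / "yellow_2024-03.parquet").write_bytes(b"new month")  # append: fine
        (raw / "yellow_2024-02.parquet").write_bytes(b"tampered")   # edit: flagged
        print(append_only_violations(manifest, snapshot(raw)))
        # → ['edited: yellow_2024-02.parquet']
```

Hashing in chunks keeps the check itself from breaking the manual's other rule: no raw Parquet is ever read into memory whole.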
Quick-adds keep the manual honest. When you catch yourself explaining the
same fact twice in a session, the # prefix appends it straight to
CLAUDE.md mid-conversation, so the manual grows from the moments that
actually required it rather than from a doomed up-front attempt to think
of everything.
# weather-mobility — lab manual
## Data
- `data/raw/` is APPEND-ONLY. Never edit a raw file. (Enforced in B3.)
- Counts come from `panel_zone_hour` (cleaned), NOT raw Parquet — raw carries refunds and voided meters.
- Never load a full Parquet into memory. Query through DuckDB.
- Schema of record: `docs/data-dictionary.md`. Trust it over your memory.
## Joins
- Borough = `taxi_zone_lookup.csv` via `PULocationID`. The shapefile is for maps, not joins. Zone ids 264–265 exist in the lookup only.
## Time
- Warehouse is `America/New_York`. Raw Parquet timestamps are UTC.

Codex
The manual is AGENTS.md, read at session start, and it is hierarchical
by design. Files merge from general to specific, so a fact stated once at
the right level reaches everywhere below it:
- Global: ~/.codex/AGENTS.md, your cross-project preferences.
- Repo root: AGENTS.md, committed, the lab’s shared manual.
- Nested: an AGENTS.md inside data/ carries the stricter rules that apply only there (append-only; propose, never write), so the prohibition lives next to the danger instead of as a clause the agent must remember to apply selectively.
- Override: an AGENTS.override.md takes precedence over the AGENTS.md at the same level, for the rare case where a subtree must contradict its parent rather than extend it.
The forcing function is the 32 KiB budget: the merged manual is capped, and that cap is a feature. It makes verbosity expensive and pushes you toward terse, checkable rules. When the root file grows bloated, the fix is not a bigger budget — it is moving directory-specific detail down into nested files where it is only loaded when the agent is actually working there.
# weather-mobility — lab manual
## Data
- `data/raw/` is APPEND-ONLY. Never edit a raw file. (Enforced in B3.)
- Counts come from `panel_zone_hour` (cleaned), NOT raw Parquet — raw carries refunds and voided meters.
- Never load a full Parquet into memory. Query through DuckDB.
- Schema of record: `docs/data-dictionary.md`. Trust it over your memory.
## Joins
- Borough = `taxi_zone_lookup.csv` via `PULocationID`. The shapefile is for maps, not joins. Zone ids 264–265 exist in the lookup only.
## Time
- Warehouse is `America/New_York`. Raw Parquet timestamps are UTC.

(Directory-specific rules for data/ move into data/AGENTS.md: that
is the budget working as intended.)
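The merge-and-cap behavior can be modeled in a few lines of Python. Treat this as a sketch of the behavior described above, not Codex's loader: the function names are hypothetical, it skips the global ~/.codex file, and it reads "override wins at that level" as "an AGENTS.override.md is read instead of the AGENTS.md beside it". It walks from the repo root down to the working directory, concatenates root-first so deeper files come later, and reports whether the merged text fits in 32 KiB.

```python
from pathlib import Path

BUDGET_BYTES = 32 * 1024  # the 32 KiB cap on the merged manual


def manual_chain(repo_root: Path, cwd: Path) -> list[Path]:
    """Instruction files from the repo root down to cwd, at most one per directory."""
    root = repo_root.resolve()
    rel = cwd.resolve().relative_to(root)  # raises ValueError if cwd is outside the repo
    levels = [root] + [root.joinpath(*rel.parts[: i + 1]) for i in range(len(rel.parts))]
    chain = []
    for level in levels:
        for name in ("AGENTS.override.md", "AGENTS.md"):
            candidate = level / name
            if candidate.is_file():
                chain.append(candidate)
                break  # an override, if present, is read in place of AGENTS.md
    return chain


def merged_manual(repo_root: Path, cwd: Path) -> tuple[str, bool]:
    """Concatenate the chain root-first and report whether it fits the budget."""
    parts = [p.read_text(encoding="utf-8") for p in manual_chain(repo_root, cwd)]
    text = "\n\n".join(parts)
    return text, len(text.encode("utf-8")) <= BUDGET_BYTES


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "AGENTS.md").write_text("# weather-mobility lab manual\n", encoding="utf-8")
        (repo / "data").mkdir()
        (repo / "data" / "AGENTS.md").write_text(
            "- Append-only here. Propose, never write.\n", encoding="utf-8"
        )
        print([p.name for p in manual_chain(repo, repo)])         # → ['AGENTS.md']
        print(len(manual_chain(repo, repo / "data")))             # → 2
        print(merged_manual(repo, repo / "data")[1])              # → True
```

In this model, the nested data/AGENTS.md joins the chain only when the working directory is inside data/; from the repo root it is never read. That is why pushing data/-only rules downward shrinks the root manual's share of the budget.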
What belongs where
The manual is one of three places a fact can live, and putting a fact in the wrong one is its own failure mode. The full decision table is completed in C1, once skills exist; the part that matters now is the line between the manual and a README.
| If the fact is… | It belongs in… | Because |
|---|---|---|
| always-relevant context for the agent | the instruction file | read every session, costs context budget — keep it terse |
| documentation for a human reader | README.md / docs/ | the agent reads it only when pointed at it; length is cheap |
| a repeatable procedure with steps | a skill (C1) | loaded on demand when invoked, not always-on |
The instruction file is expensive real estate — it is read in full on every session and counts against context every time. A paragraph of prose explaining the project’s history belongs in the README; the manual points at the README. The test is brutal and simple: if the agent does not need this fact to avoid a mistake on a typical session, it does not go in the manual.
Auto-memory
A lab manual you write is one thing; a notebook the assistant keeps for itself is another. Here the tools genuinely diverge, so this is a spotlight, not a tab.
Claude Code
Auto-memory — the agent's own notebook
Claude Code maintains a memory that persists across sessions: facts it decides are worth keeping — a path it discovered, a convention it inferred, a correction you made — written to its own store and reloaded next time. It is the closest thing the agent has to learning from yesterday without you writing anything down, and on a long project it quietly accumulates the texture of how you actually work.
That same automaticity is the hazard. The memory grows without your review, and an agent’s inferred fact can be wrong: it decides, from one unlucky session, that “the green-taxi files are unreliable” or that “the 2024-02 zone join needs a manual fix,” and then it carries that conclusion — stated with the same confidence as a fact you taught it — into every session after. Unaudited auto-memory is an RA’s private notes that no one reads: useful right up until a wrong note becomes doctrine.
The discipline is to treat it like an RA’s notebook that you actually
read, on a schedule. Audit the memory the way you would audit a contract
suite: open it, prune the conclusions that were true once but are not
invariants, and promote the ones that are invariants into CLAUDE.md, where
they are committed, visible, and reviewed in pull requests rather than
living in a store only the agent sees. Memory is for discovery; the manual
is for truth.
Nearest equivalent — Codex
Codex does not keep a private cross-session notebook; each session starts
from the committed AGENTS.md and nothing the agent inferred on its own.
That sounds like a loss and is partly a discipline: there is no unaudited
store to accumulate wrong conclusions, because there is no store.
The substitute is the deliberate end-of-session update. When a
session surfaces a fact worth keeping — a convention you settled, a
landmine you hit — you (or the agent, at your instruction) append it to
AGENTS.md before closing out, as an explicit, reviewable edit. What
Claude Code does automatically and you must audit after, Codex makes you
do by hand before it counts. The cost is a habit you must hold; the
benefit is that every remembered fact passed through a human and a diff
on its way into the manual. Write “update the manual” into your
session-close checklist and the discipline holds.
The A/B demonstration
Argument is cheap; demonstration is not. Below is one prompt (“summarize February 2024 pickups by borough”), run twice against the same repo and the same data. In the first run, the agent opens with the lab manual loaded. In the second, the manual is absent: a permanently new RA, started cold. Step through the divergences; the lesson teaches itself.
With the manual loaded:
- claude ❯ Summarize February 2024 pickups by borough.
- Divergence 1: Loaded CLAUDE.md — weather-mobility lab manual: data dictionary in docs/ · never load a full Parquet, query through DuckDB · borough comes from taxi_zone_lookup.csv via PULocationID (the shapefile is for maps, not joins) · counts come from the cleaned panel — raw files carry refunds and voided meters · timezone America/New_York
- Divergence 2: The manual routes counting through panel_zone_hour — the cleaned panel already joins pickups to boroughs. One aggregate query.
- Divergence 2: Bash(duckdb warehouse.duckdb -c "SELECT borough, sum(pickups) AS pickups FROM panel_zone_hour WHERE strftime(ts_local, '%Y-%m') = '2024-02' GROUP BY 1 ORDER BY 2 DESC")
- Divergence 3: Manhattan 2,675,686 · Queens 243,807 · Brooklyn 27,555 · Unknown 9,456 · Bronx 6,884 · N/A 1,316 · EWR 270 · Staten Island 54 (8 rows, 0.4 s)
- Divergence 3: February 2024, cleaned panel: 2,965,028 pickups. Manhattan dominates with 2,675,686 (90.2%); Queens 243,807 — that's JFK and LaGuardia; Brooklyn 27,555; Bronx 6,884; EWR 270; Staten Island 54. 'Unknown' (9,456) and 'N/A' (1,316) are the lookup table's own honesty rows — zones 264 and 265, reported rather than dropped.
- Divergence 4: One query, six lines of SQL. The Parquet never left disk.
Without the manual:
- claude ❯ Summarize February 2024 pickups by borough.
- Divergence 1: No CLAUDE.md found — session opens cold: no data dictionary, no conventions, no warnings.
- Divergence 2: I'll load the February file into pandas and aggregate by borough.
- Divergence 2: Bash(python -c "import pandas as pd; df = pd.read_parquet('data/raw/yellow_2024-02.parquet'); print(len(df), 'rows')")
- Divergence 2: 3007526 rows (full file deserialized into memory; ~28 s)
- Divergence 2: Loaded 3,007,526 rows — 19 columns, and none of them is a borough. I need a zone table; trying the obvious filename.
- Divergence 3: Bash(python -c "... df.merge(pd.read_csv('data/raw/zones.csv'), left_on='PULocationID', right_on='LocationID') ...")
- Divergence 3: FileNotFoundError: 'data/raw/zones.csv'
- Divergence 3: There's a data/raw/taxi_zones.zip — a shapefile. I'll install geopandas and read the attribute table.
- Divergence 3: Bash(pip install geopandas)
- Divergence 3: Successfully installed geopandas pyproj shapely fiona ... (a GIS stack, to do one CSV join)
- Divergence 3: Bash(python -c "... zones = gpd.read_file('taxi_zones.shp'); df.merge(zones, left_on='PULocationID', right_on='LocationID').groupby('Borough').size() ...")
- Divergence 3: KeyError: 'Borough' — column is spelled 'borough' in the shapefile
- Divergence 3: Fixed the casing and the join runs — but two flags before I report numbers. First, these are raw rows: no filter for refunds, voided meters, or out-of-month timestamps. Second, the shapefile has 263 polygons and the data references zone ids past that — every pickup in zones 264 and 265 just vanished from the inner join without an error.
- Divergence 4: Do you have a canonical zone lookup and a documented cleaning procedure I should be using?
- Divergence 4: Four tool calls, one GIS install, the whole Parquet resident in memory — and the run ends on the question an onboarding file answers.
The without-manual run is not incompetent; it is uninformed. It deserializes the entire 3,007,526-row Parquet into memory (breaking a rule the manual states outright), guesses a filename for the zone table, installs a GIS stack to do a CSV join, trips on column casing, and silently drops zones 264 and 265 to an inner join. Then, to its credit, it ends by asking the exact question the manual exists to pre-answer. Every one of those detours is a fact that lived in your head and not in a file. The with-manual run is what your head looks like, written down.
(The curriculum’s draft phrased this as a January summary; the project’s pinned study window is February, March, and June 2024, and the answer key carries only those months. So the demonstration summarizes February 2024, and every number in the with-manual run is the cleaned panel’s 2024-02 column.)
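The silent loss is easy to reproduce with toy data. This stdlib sketch uses a hypothetical five-zone slice of the lookup and invented pickup ids; the only detail taken from the demonstration is that the shapefile's polygons stop at zone 263. An inner join against the polygons drops the pickups in zones 264 and 265 without any error, while joining through the lookup keeps them as the "Unknown" and "N/A" rows.

```python
from collections import Counter

# Hypothetical slice of taxi_zone_lookup.csv: LocationID -> Borough.
LOOKUP = {132: "Queens", 138: "Queens", 161: "Manhattan", 264: "Unknown", 265: "N/A"}
# Stand-in for the shapefile's attribute table: polygons exist only for ids 1..263.
POLYGON_BOROUGH = {zone: borough for zone, borough in LOOKUP.items() if zone <= 263}

pickups = [161, 161, 132, 138, 264, 265, 161]  # invented PULocationID values


def by_borough_inner(ids: list[int]) -> Counter:
    """Inner join against the polygons: ids with no shape vanish without an error."""
    return Counter(POLYGON_BOROUGH[i] for i in ids if i in POLYGON_BOROUGH)


def by_borough_lookup(ids: list[int]) -> Counter:
    """Join through the lookup table; an id missing even there is kept as 'Unmatched'."""
    return Counter(LOOKUP.get(i, "Unmatched") for i in ids)


inner = by_borough_inner(pickups)
full = by_borough_lookup(pickups)
print(sum(inner.values()), len(pickups))  # → 5 7
print(full["Unknown"], full["N/A"])       # → 1 1
```

Comparing the joined total with len(pickups) is a one-line guard that would have flagged the drop before any number was reported.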
Field Assignment
Artifact: CLAUDE.md and AGENTS.md committed; the A/B diff logged
Write the lab manual once, in both dialects, from a single set of source notes — then run the demonstration and watch the difference you just wrote into existence.
- Start from the project’s source notes: the data dictionary, the directory contract from B2, and the three landmines (the zone join, the timezone split, raw-versus-cleaned counts).
- Write the manual in both dialects from those same notes — every line a checkable fact or prohibition, nothing aspirational. Keep it terse; if it would not prevent a mistake on a typical session, cut it.
Claude Code
Author CLAUDE.md at the repo root and commit it. Move the
data/-specific append-only rule into .claude/rules/ scoped to that
path, so the strict rule lives where the danger is. Confirm the manual
loads by opening a fresh session — the briefing line names the manual.
Codex
Author AGENTS.md at the repo root and commit it. Move the
data/-specific append-only rule into a nested data/AGENTS.md, keeping
the root file under the 32 KiB budget. Confirm the merged manual loads by
opening a fresh session in the repo.
- Run the A/B demo: the same borough-summary prompt with the manual
present, then with it renamed away. Diff the two transcripts and log
the divergence in journal/b1-ab.md: the moment the manual paid for itself.
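For the logging step, one possible sketch uses Python's standard difflib. The transcript file names are hypothetical, and the heading and fenced layout written into journal/b1-ab.md are assumptions rather than a format the course prescribes.

```python
import difflib
from pathlib import Path


def log_ab_diff(with_manual: Path, without_manual: Path, journal: Path) -> int:
    """Append a unified diff of two session transcripts to the journal file.

    Returns how many lines were added or removed between the two runs.
    """
    a = with_manual.read_text(encoding="utf-8").splitlines()
    b = without_manual.read_text(encoding="utf-8").splitlines()
    diff = difflib.unified_diff(
        a, b, fromfile=with_manual.name, tofile=without_manual.name, lineterm=""
    )
    # ndiff prefixes each line with a two-character code; count only "+ " and "- ".
    changed = sum(1 for line in difflib.ndiff(a, b) if line.startswith(("+ ", "- ")))
    journal.parent.mkdir(parents=True, exist_ok=True)
    with journal.open("a", encoding="utf-8") as f:
        f.write("## B1 A/B: borough summary, with vs. without the manual\n\n")
        f.write("~~~diff\n" + "\n".join(diff) + "\n~~~\n\n")
    return changed
```

After exporting both sessions to text, a call such as log_ab_diff(Path("with-manual.txt"), Path("without-manual.txt"), Path("journal/b1-ab.md")) appends the diff; the returned count is a quick number to note beside it.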
The committed manual is the substrate every later lesson assumes: B3 fences what it names, C2 enforces it, and every session from here opens already briefed.
make check-b1 advances B1.
If a line would not prevent a mistake on a typical session, cut it. The manual is expensive real estate, read in full every session.
Same facts, two surfaces — the lab manual does not change because the tool did.
The cap is a forcing function. A manual that no longer fits stopped being terse.
The without-manual run loads the full Parquet and fumbles the zone join — the lesson teaches itself.
Check each item only once it is true of YOUR repo — the gate is self-certified, like the rest of your methodology.
Pitfalls & Gotchas
- [both] Aspirations instead of checkable rules. “We value clean code” is a sentence an agent cannot act on; it consumes context and changes no behavior. Every line of the manual should be a fact a wrong answer would corrupt or a prohibition with a clear test. If you cannot write the check, it is not a rule; it is a mood.
- [both] Stale data dictionaries are worse than none. The agent trusts the manual completely, so a dictionary that says airport_fee when the file now says Airport_fee does not merely fail to help; it actively misleads, and the agent will defend the wrong answer with your own document. Point at the schema of record and keep the pointer fresh; do not restate a schema you will forget to update.
- [CX] Blowing the 32 KiB budget with prose. The cap is a forcing function, not an obstacle: when the root manual bloats, the fix is to push directory-specific detail into nested files, not to mourn the limit. A manual that no longer fits is a manual that stopped being terse.
- [CC] Unaudited auto-memory accumulates wrong conclusions. The agent’s private notebook records inferences with the same confidence as facts, and a conclusion that was true for one session becomes doctrine for fifty. Read it like an RA’s notes: prune what was situational, promote what is invariant into the committed manual.
Check Your Bearings
- Question 1 · Choose one
  Two candidate lines for the project's instruction file. Which one earns its place, and why?
- Question 2 · Match the dialects
  A fact can live in three places. Match each fact to where it belongs:
  - data/raw/ is append-only; never edit a raw file
  - The narrative of how the project's sampling frame was chosen
  - The full multi-step trip-cleaning procedure
- Question 3 · What happens next? · Dialect check: Claude Code
  Auditing the agent's auto-memory, you find an entry it wrote for itself three weeks ago. What should you do with it?
  auto-memory (session 2024-06-03): "The 2024-02 zone join needs a manual casing fix every time."
- Question 4 · Read the config · Dialect check: Codex
  The repo-root instruction file has grown past the 32 KiB budget, bloated with rules that only matter inside data/. What is the right fix?
  AGENTS.md (repo root): 41 KiB, over budget
  ## Data
  ## data/raw/ append-only rules (long)
  ## data/processed/ rebuild rules (long)
  ## Joins
  ## Time
Field journal
Parity note
Instruction files are a genuine parity feature: both tools read a
plain-text manual at session start, differing in surface — CLAUDE.md
with user/project scopes and .claude/rules/ path-scoping on one side,
the hierarchical AGENTS.md with AGENTS.override.md and a 32 KiB
budget on the other. The real asymmetry is auto-memory: Claude Code keeps
a private cross-session notebook the agent writes for itself (powerful and
in need of auditing), while Codex has no equivalent and substitutes the
discipline of a deliberate end-of-session manual update. One tool makes
you audit after; the other makes you commit before.