B3 beginner ~30 min

Guardrails

Absorbs: the data manager who guards the raw files

Advances B3

The Pain

There is a person, in a well-run lab, whose entire job is to stand between everyone else and the raw data. They keep the master copies under lock, hand out working copies, and say no a great deal: no, you cannot edit the original; no, that script does not get to run against the production tables; no, you may not paste the database password into a shared notebook. They are not popular on the afternoons they say it. They are indispensable on the mornings they were right.

You do not have that person. You have a new collaborator who works at machine speed, takes instructions literally, and has, until you decide otherwise, exactly the same reach into your filesystem and your shell as you do. Most of the time that reach is a gift. The exception is the one that matters: the afternoon it reasons its way to “the cleanest fix is to rebuild this table from scratch,” and DROP TABLE trips_raw is a single confident command away from a month of re-downloads. Or the moment it decides a raw Parquet file would be tidier with the duplicates removed, and overwrites the one file in the project that cannot be regenerated. Speed is the multiplier on both the help and the harm. The data manager’s no was never about distrust. It was about making the expensive mistakes impossible rather than merely unlikely, and that is a thing you can configure before the stakes arrive, instead of a person you must hire.

Why / When

A safety model is the set of rules that decide, before any action runs, whether the agent may do it outright, must ask you first, or is forbidden. It is the configured form of the data manager’s judgment: named risk levels you switch between deliberately, so the agent’s reach matches the task instead of always matching yours.

The discipline belongs at the front of the project, before Unit C hands the agent the power to modify data and run pipelines. The threat model is specific and real: destructive SQL against the warehouse, overwrites of irreplaceable raw files, an API loop that runs unbounded overnight, credentials that leak into a transcript and live there forever. None of these require malice — only literal-mindedness at speed. You configure the fence now, while the only thing it costs is five minutes, rather than later, while it costs a month.

The honest non-use case is the throwaway: a read-only look at public data on a clone you will delete. There is nothing there to protect, and the friction of a permission prompt buys you nothing. Everywhere the data is real, the fence earns its keep on the first afternoon you would otherwise have lost.

Contrary winds

Not for: read-only exploration of public data on a throwaway clone, where there is nothing destructive to fence and the friction buys you nothing.

Mechanics

Both tools implement the same idea — name your risk levels, switch deliberately — through different surfaces. The shared principle first, the dialects in tabs, then the two profiles you will use all course, and the refusal that proves the fence is real.

The two safety models

Underneath both tools sit two layers that compose. The decision layer asks, for each proposed action, “allowed, ask, or denied?” The sandbox layer is the operating system enforcing a boundary the agent cannot reason its way around, so that even a misconfigured decision rule cannot reach outside the fence. Decision rules are policy; the sandbox is physics. Defense in depth means both.

Claude Code

Claude Code’s decision layer is permission rules, evaluated deny → ask → allow: a deny rule wins outright, an ask rule pauses for your approval, an allow rule proceeds silently. Rules match tools and their arguments, so you can allow Bash(duckdb:*) reads while denying Bash(rm:*) and asking on everything else.

Stacked on top are permission modes — session-wide postures like the default ask-on-write, a read-only plan posture, and an accept-edits posture for trusted bulk work — and beneath everything, the OS sandbox that confines file and network access at the operating-system level regardless of what the rules say.

The rules live in settings.json, and the layering is the point: user, project, and local files merge, so a team can commit a project-layer deny that an individual cannot silently loosen.

{
  "permissions": {
    "deny": [
      "Write(./data/raw/**)",
      "Bash(rm:*)",
      "Bash(duckdb:* DROP *)"
    ],
    "ask": [
      "Write(./data/processed/**)",
      "Write(./results/**)"
    ],
    "allow": [
      "Bash(duckdb:* SELECT *)",
      "Read(./**)"
    ]
  }
}
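
The deny → ask → allow ordering is easy to misread as "the first rule listed wins." What follows is a minimal Python sketch of the precedence under stated assumptions: the RULES table and the decide function are hypothetical, the patterns are simplified to Python's fnmatch globs, and none of this is Claude Code's actual matcher. It only illustrates why an action that matches both an allow rule and a deny rule still ends up denied.

```python
from fnmatch import fnmatchcase

# Hypothetical rule table loosely mirroring the settings above. The patterns
# are plain fnmatch globs (an assumption), not Claude Code's rule syntax.
RULES = {
    "deny": ["Write(./data/raw/*)", "Bash(rm *)", "Bash(duckdb * DROP *)"],
    "ask": ["Write(./data/processed/*)", "Write(./results/*)"],
    "allow": ["Bash(duckdb * SELECT *)", "Read(./*)"],
}


def decide(action: str, rules: dict[str, list[str]] = RULES) -> str:
    """Return the first matching verdict, consulting deny, then ask, then allow."""
    for verdict in ("deny", "ask", "allow"):
        if any(fnmatchcase(action, pattern) for pattern in rules.get(verdict, [])):
            return verdict
    # Nothing matched: this sketch falls back to asking, the cautious default.
    return "ask"


if __name__ == "__main__":
    for action in (
        "Read(./data/raw/trips.parquet)",
        "Write(./results/table1.csv)",
        "Bash(duckdb w.db SELECT 1; DROP TABLE t)",
    ):
        print(action, "->", decide(action))
```

The last action matches the SELECT allow pattern and the DROP deny pattern at once; because deny is consulted first, the verdict is deny.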

Codex

Codex’s decision layer is approval modes, three named postures you switch between: read-only (the agent may read and run read-only commands, nothing else), auto (it works in the workspace and asks before stepping outside it), and full-access (it acts without prompting — reserved for work you are watching). The mode is the coarse dial; you turn it down for exploration and up only deliberately.

Beneath that sits the OS sandbox — Seatbelt on macOS, Landlock on Linux — confining file and network reach at the kernel level, so a read-only session physically cannot write even if a rule were misconfigured.

Both compose into named profiles in config.toml: a profile bundles an approval mode and a sandbox policy under one name you select per session, and the layered .codex/ team config lets a project commit profiles every clone inherits.

[profiles.exploration]
approval_policy = "read-only"
sandbox_mode    = "read-only"      # Seatbelt / Landlock: no writes at all

[profiles.pipeline]
approval_policy = "auto"
sandbox_mode    = "workspace-write"
writable_roots  = ["data/processed", "results"]   # raw/ stays out

Both reduce to one practice: name your risk levels and switch between them deliberately. The names below are the ones the whole course uses.

Two named profiles

A research project needs exactly two postures, and naming them is what makes switching a deliberate act rather than a vague feeling of caution:

exploration — read-only data, no writes, no network egress. This is the default posture for surveying the warehouse, drafting plans, and any session where you are thinking rather than building. It is also what B2’s plan-first reconnaissance runs under, and what the refusal demo below exercises.
pipeline — writes scoped to data/processed/ and results/ only; data/raw/ stays read-only even here, because raw is the one thing that cannot be regenerated. This is the posture for running transforms and estimation — the agent can build, but it cannot touch the source of truth or escape the directories it is allowed to rebuild.

Least privilege means defaulting to exploration and rising to pipeline only for the span of work that genuinely writes, then falling back. The fence you most need is the one between every profile and data/raw/: in both profiles, raw is read-only, because the cheapest month you will ever spend is the one you never have to re-download.
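
The two postures can be modelled in a few lines, which makes the "raw is read-only in both" claim something you can check rather than recall. This is a sketch under assumptions: Profile, EXPLORATION, PIPELINE, and may_write are illustrative names, and real enforcement belongs to the tool's rules and the OS sandbox, not to a helper like this.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Profile:
    """A named posture: the only thing it carries here is where writes may land."""
    name: str
    writable_roots: tuple[str, ...] = ()


# Exploration writes nowhere; pipeline writes only to the rebuildable directories.
EXPLORATION = Profile("exploration")
PIPELINE = Profile("pipeline", ("data/processed", "results"))


def may_write(profile: Profile, target: str) -> bool:
    """True only if target is a relative path under one of the profile's writable roots."""
    path = PurePosixPath(target)
    # Absolute paths and ".." segments are refused so a path cannot climb out of its root.
    if path.is_absolute() or ".." in path.parts:
        return False
    return any(path.is_relative_to(root) for root in profile.writable_roots)


if __name__ == "__main__":
    for profile in (EXPLORATION, PIPELINE):
        for target in ("results/table1.csv", "data/raw/trips.parquet"):
            print(profile.name, target, may_write(profile, target))
```

Neither profile returns True for anything under data/raw/, and exploration returns False for every path.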

One mistake both tools make easy to commit: fencing the file tools but leaving the shell open. The shell is the universal escape hatch (rm, psql, a stray > redirect), so a profile that denies Write(./data/raw/**) but allows arbitrary shell has fenced the front door and left the window open. Scope the shell too.
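
To see why a file-tool fence alone is not enough, it helps to look at what a shell line can carry. The sketch below is a rough heuristic built on Python's shlex, not a security boundary: shell_risks and the RISKY_PROGRAMS set are hypothetical and deliberately incomplete, and they only flag the three escape hatches named above.

```python
import shlex

RISKY_PROGRAMS = {"rm", "psql"}          # illustrative, far from exhaustive
REDIRECTS = {">", ">>"}                  # plain output redirects only
SEPARATORS = {";", "&&", "||", "|", "&"}


def shell_risks(command: str) -> list[str]:
    """List the risky programs and redirects found in one shell command line."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    risks = []
    at_command_start = True  # True when the next word names a program
    for token in lexer:
        if token in SEPARATORS:
            at_command_start = True
            continue
        if token in REDIRECTS:
            risks.append(f"redirect {token}")
        elif at_command_start and token in RISKY_PROGRAMS:
            risks.append(f"program {token}")
        at_command_start = False
    return risks


if __name__ == "__main__":
    print(shell_risks("duckdb w.db 'SELECT 1' > data/raw/trips.parquet"))
    print(shell_risks("ls && rm -rf data/raw"))
```

A check like this can inform which commands you route to ask or deny; the OS sandbox remains the layer that actually stops a write.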

Credentials hygiene

The warehouse password, the API token, the cloud key: none of them belong in a prompt, and none of them belong in config.toml or settings.json. A secret pasted into a prompt does not vanish when the answer comes back — it lives in the transcript, which is logged, synced, and quite possibly committed, forever. The rule is unconditional: secrets come from the environment, never from the conversation.

# Set in the shell that launches the agent: read from the environment,
# never typed into a prompt, never written to a committed config file.
# `security` is the macOS Keychain CLI; on other systems, substitute your own secret store.
export WAREHOUSE_DSN="$(security find-generic-password -s wm-warehouse -w)"
# Scripts read os.environ / Sys.getenv. The agent passes the name, not
# the value, and the value never enters a transcript.

The transcript is the leak surface you forget about. A key typed once, in a moment of haste, is a key you must now rotate — so the discipline is to make typing it impossible, not to remember not to.
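
On the script side, "the agent passes the name, not the value" might look like the sketch below. Both helpers are hypothetical: require_env fails loudly when the variable is missing instead of prompting for it, and redact swaps any secret value for its variable name before a string is printed or logged.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail without asking for it."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set; export it in the shell that launches the agent")
    return value


def redact(text: str, *secret_names: str) -> str:
    """Replace each named secret's value in text with a ${NAME} placeholder."""
    for name in secret_names:
        value = os.environ.get(name)
        if value:
            text = text.replace(value, f"${{{name}}}")
    return text


if __name__ == "__main__":
    # Placeholder value for demonstration only; a real DSN comes from the launching shell.
    os.environ.setdefault("WAREHOUSE_DSN", "postgres://user:example@localhost/wm")
    dsn = require_env("WAREHOUSE_DSN")
    print(redact(f"connecting to {dsn}", "WAREHOUSE_DSN"))
```

Anything the script prints then shows ${WAREHOUSE_DSN} rather than the credential, so an accidental echo does not put the value into the transcript.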

Which verdict does an action earn?

The decision layer reduces to three verdicts — allow, ask, deny — and the skill is knowing which a given action deserves before it runs. Walk a handful of real situations from the project; the navigator returns the verdict the situation earns and the reason behind it, in the surface your tool uses:

Decision rubric: Allow, ask, or deny?

An action is about to run under your safety model. Which verdict should it earn? Walk a few real situations from the weather-mobility project — the rule is least privilege, deliberately applied, not a vague feeling of caution.

What does the action touch?

The fence you most need is the one between every profile and the raw files — raw is the one thing that cannot be regenerated.

This navigator walks you to one ruling at a time; without JavaScript, here is every ruling the rubric can reach.

Allow (recommended): Reads and read-only queries are the default-safe surface; under the exploration posture they proceed silently. Friction here buys you nothing, so reserve the prompts for the actions that can lose a month.
Deny (not this action): Make the expensive mistake impossible, not merely unlikely. A deny rule wins outright and fires before anything runs. This is the demonstrated refusal: ask the exploration posture to drop trips_raw and watch the fence hold. Raw stays read-only in every profile.
Ask (use with care): For the rare legitimate-but-dangerous action, pause for explicit approval rather than denying outright, but treat every prompt as a decision, not a reflex. The pitfall is living in the most permissive mode "to avoid friction": the one time it matters, you wave through the one that mattered, at 2 a.m., on muscle memory.
Ask, under the pipeline posture (use with care): Under pipeline, writes scoped to data/processed/ and results/ are permitted but worth a confirmation on first touch: the agent can build, but cannot reach data/raw/ or escape the directories it is allowed to rebuild. Rise to pipeline for the write, then fall back to exploration.
Deny (not this action): A write you did not ask for while in exploration is exactly what the posture exists to stop. Default to exploration (read-only, no writes); an unrequested write should never silently proceed. If you genuinely need to build, raise to pipeline deliberately; the deliberate switch is the safety, not the permissiveness.

The demonstrated refusal

Safety you have read about is a claim; safety you have seen fire is a fact. Below, the agent runs under the exploration profile and is asked to do exactly the thing exploration forbids: drop trips_raw. Before the refusal plays, you predict the verdict — allow, ask, or deny? Watch the fence hold.

Guided Run — The Demonstrated Refusal

Field Terminal — session: b3-refusal Claude Code

claude --permission-mode plan


Field Assignment

Artifact: both profiles committed; the refusal demonstrated and logged

Configure the two profiles, prove the fence holds by tripping it on purpose, and write down what you saw.

Author both named profiles, exploration (read-only data, no writes, no network egress) and pipeline (writes scoped to data/processed/ and results/, raw still read-only), and commit them to the project layer so every clone inherits the fence.
Move your warehouse credential into the environment; confirm no secret appears in any committed config file or transcript.

Claude Code

Commit the permission rules to the project layer of settings.json: a deny on writes to data/raw/** and on rm/DROP, an ask on writes to data/processed/**, and allow on reads. Open a session under the read-only posture and attempt DROP TABLE trips_raw — confirm the deny rule fires before anything runs.

Codex

Commit both profiles to config.toml (and the layered .codex/ team config). Start a session under the exploration profile and attempt DROP TABLE trips_raw — confirm the read-only approval mode and the sandbox both refuse, then check that pipeline still cannot write to data/raw/.

Run the demonstrated refusal in your tool — predict the verdict, watch it fire — and log the incident in journal/b3-refusal.md: what you asked, what the fence did, and which profile you were in.

The two profiles are the safety substrate the whole midgame runs on: C2 doubles the raw-data fence at the hook layer, and every later lesson assumes exploration-by-default with deliberate elevation to pipeline.

Milestone gate · make check-b3 · advances B3

exploration profile committed — read-only data, no writes, no network egress
This is the default posture: surveying, planning, thinking. It is also what B2's reconnaissance runs under.
pipeline profile committed — writes scoped to data/processed/ and results/ only; data/raw/ stays read-only even here
Raw is the one thing that cannot be regenerated, so no profile may write it.
Shell scoped, not just file tools — the universal escape hatch (rm, redirects, psql) is fenced too
Warehouse credential moved into the environment — no secret in any committed config file or transcript
A key pasted into a prompt lives in the transcript forever. The agent passes the name, never the value.
Demonstrated refusal run and logged in journal/b3-refusal.md — DROP TABLE trips_raw refused under exploration, before execution
Predict the verdict (allow / ask / deny), then watch the fence hold. Safety you have seen fire is safety you trust.

Check each item only once it is true of YOUR repo — the gate is self-certified, like the rest of your methodology.
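
The credential item is the hardest one to certify by eye. One rough aid is to scan committed config text for URLs that carry an inline password. The regular expression below is a heuristic assumption, not a complete secret scanner, and find_inline_credentials is a hypothetical helper rather than part of make check-b3.

```python
import re

# Matches scheme://user:password@ ; a heuristic that will miss other secret shapes.
INLINE_CREDENTIAL = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://[^\s/:@]+:[^\s@]+@")


def find_inline_credentials(text: str) -> list[int]:
    """Return the 1-based line numbers that contain a URL with an embedded password."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if INLINE_CREDENTIAL.search(line)
    ]


if __name__ == "__main__":
    sample = (
        'dsn = "postgres://ana:hunter2@db.example.org/wm"\n'
        'dsn_env = "WAREHOUSE_DSN"\n'
    )
    print(find_inline_credentials(sample))
```

A clean result only means this one pattern was not found; it does not prove that no secret is present.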

Pitfalls & Gotchas

[both]

Living in the most permissive mode “to avoid friction.” Every prompt you wave through is vigilance you have rented back from the tool that was built to absorb it — and the one time it matters, you will wave through the one that mattered, at 2 a.m., on muscle memory. Default to exploration; rise to pipeline for the span of work that writes, then fall back. Friction at the dangerous moments is the product, not a bug.
[both]

Fencing the file tools but not the shell. The shell is the universal escape hatch: a profile that denies writes to data/raw/ but allows arbitrary Bash has locked the door and left the window open — rm, a redirect, a psql one-liner all walk straight through. Scope the shell with the same care as the file tools.
[both]

A key pasted into a prompt lives in the transcript forever. The transcript is logged, synced, and often committed; a secret typed once in haste is a secret you must now rotate. The fix is not vigilance — it is making the paste impossible: secrets come from the environment, the agent passes the name and never the value.
[CX]

Subagents inherit the parent’s sandbox overrides. A child dispatched from a session you elevated to pipeline does not quietly fall back to exploration — it carries the parent’s writable roots with it. This returns in D1, where subagent isolation gets its own treatment; for now, assume reach flows downhill and check before you delegate.

Field journal

Record the refusal you triggered on purpose: the command, the profile you were in, and the verdict the fence returned — the first time you watched a guardrail hold instead of trusting that it would.

Parity note, as of June 2026

Safety models are a genuine parity feature: both tools compose a decision layer over an OS sandbox and both reduce to naming risk levels and switching deliberately — Claude Code through deny/ask/allow permission rules plus permission modes in layered settings.json, Codex through read-only/auto/full-access approval modes plus Seatbelt/Landlock, bundled into named profiles in config.toml. The surfaces differ in granularity (per-tool-and-argument rules on one side, coarser named modes on the other), but the practice — exploration by default, deliberate elevation, raw always read-only — is identical, and the OS sandbox under both is the layer no misconfiguration can talk its way past.


The Lab Roster

Engraved positions, not portraits. A seat fills itself when its lesson is complete.


Positions

the data manager

Position vacant — engaged at C2

write-time contract hooks (PreToolUse/PostToolUse + the validation suite)

est. human-RA: permanent vigilance — est. 2 weeks/year of load-checking and release-note reading agent: half a day to install and test the 9-line block; ~20 s per run thereafter
the methodologist

Position vacant — engaged at C1

the researcher skill library v1 (/clean-trips, /paper-summary, /demanding-adviser) — codified methodology, not macros

est. human-RA: the judgment lives in one head; transferring it to a new RA costs weeks of shadowing, and leaves when they do agent: an afternoon to author three SKILL.md files in both dialects; zero cost per session until invoked
the data engineer

Position vacant — engaged at C3

MCP connections + the DuckDB warehouse, enrichment joins (weather/events/holidays), and the zone-hour analysis panel

est. human-RA: days of bespoke glue per source — credentials, retries, schema spelunking, timezone forensics — re-debugged every time a source changes agent: register the server once; the agent explores INFORMATION_SCHEMA and builds the panel in a guided session, raw cached for replication
the RA pool

Position vacant — engaged at D1

parallel subagents with report contracts (EDA + scholarship fleets) + the isolated adviser

est. human-RA: a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will agent: ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes
the overnight RA

Position vacant — engaged at D3

/loop supervision + Goal Mode runs over background estimation

est. human-RA: one night shift per estimation batch — and the course runs several batches agent: ~10 min to write the check or the objective; the night itself belongs to the machine
the adviser

Position vacant — engaged at D1

parallel subagents with report contracts (EDA + scholarship fleets) + the isolated adviser

est. human-RA: a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will agent: ~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes
the referee

Position vacant — engaged at D4

contracted fleet fan-out (results contract + provenance) and an isolated adversarial referee

est. human-RA: the curve is ~2 days of serialized edit-and-fit; the suspicious read of the robustness table is the rarer, senior hour nobody has time for agent: 13 lanes fanned out under the cap finish in an afternoon; the referee files its evidenced finding in one isolated pass
the lab manager

Position vacant — engaged at E2

scheduled/cloud agents — the monthly-ingest routine, stopping at a human-approved PR

est. human-RA: a recurring monthly chore nobody owns — check the CDN, pull, contract, append, re-estimate — reliably skipped agent: ~30 min to define the routine + guardrails once; each month runs unattended and stops at the approval gate
the reproducibility checker

Position vacant — engaged at E1

headless invocation + the fresh-clone replication self-test + CI gates

est. human-RA: a clean-room rebuild every few weeks — dull, exacting, and the first thing dropped at submission agent: ~20 min to wire scripts/replicate.sh and the gate workflow; the verdict returns in one headless run thereafter
the wall: the unstaffed midnight hours between a raw file and a first plot

Position vacant — engaged at A1

the bare agent loop (prompt → act → observe → fix), zero configuration

est. human-RA: an evening or two per messy file — defensive parsing rewritten from scratch each project, rules forgotten by the time they work agent: ~10 minutes for the quick win, plus the same task re-run in the other language for free
you, working an order of magnitude faster, but only if you direct the work

Position vacant — engaged at A2

the command surface + five prompting patterns + context hygiene

est. human-RA: the slow tax of an undriven session — drifted answers on long investigations, re-runs to find where it went wrong agent: ~30 min to learn; thereafter a first-look on one month (3.5M rows) in minutes, with receipts
the lab manual nobody writes: the institutional knowledge that lives in your head

Position vacant — engaged at B1

instruction files (CLAUDE.md / AGENTS.md) + auto-memory + the A/B demonstration

est. human-RA: ~30 min re-onboarding every new RA, every time — plus the afternoons lost to landmines no one wrote down agent: written once in an hour; reloaded free at the start of every session thereafter
the careful senior who plans before touching data

Position vacant — engaged at B2

repo scaffold + pinned environments + read-only Plan mode reconnaissance

est. human-RA: ~1 week at project start (setup, download babysitting, plan review) + the joins redone when structure rots agent: an afternoon — most of it download wall-clock, not attention
the lab whose members don't overwrite each other

Position vacant — engaged at D2

git worktrees — one isolated checkout per agent/session/thread, combined through a deliberate merge

est. human-RA: the lost afternoon disentangling two agents' colliding edits — and the redo when you reconstruct it wrong the first time agent: two commands to create the worktrees; the parallelism runs free; one reviewed merge at the end
the onboarding the lab never has to repeat

Position vacant — engaged at E3

lab-kit — the whole methodology packaged as a one-command install

est. human-RA: six weeks of per-member onboarding, rediscovered from scratch every time the lab turns over agent: ~half a day to package and smoke-test the kit once; each new member is one install and one prompt
the whole lab, orchestrated: the PI who designs the system instead of doing the work

Position vacant — engaged at F1

the research loop (/loop ↔ Goal Mode / @codex) orchestrating fleet → referee → headless re-run → regenerated report, under report-don't-act guardrails, a hard budget cap, and a human gate on substantive decisions only

est. human-RA: each revision is a serialized chain — re-spec, re-estimate, re-table, rewrite the paragraph, re-read the abstract — correct only as of the last manual pass, on a Sunday; a real reviewer round is days of hand-carried edits agent: the loop runs two iterations to convergence in one supervised sitting; the human stands at exactly one gate (approve dropping the post-treatment control) while the mechanical fixes proceed unattended

Running Totals

Lesson	Role	Est. human-RA	Agent (yours when measured)
A1	the wall — the unstaffed midnight hours between a raw file and a first plot	an evening or two per messy file — defensive parsing rewritten from scratch each project, rules forgotten by the time they work	~10 minutes for the quick win, plus the same task re-run in the other language for free
A2	you, working an order of magnitude faster — but only if you direct the work	the slow tax of an undriven session — drifted answers on long investigations, re-runs to find where it went wrong	~30 min to learn; thereafter a first-look on one month (3.5M rows) in minutes, with receipts
B1	the lab manual nobody writes — the institutional knowledge that lives in your head	~30 min re-onboarding every new RA, every time — plus the afternoons lost to landmines no one wrote down	written once in an hour; reloaded free at the start of every session thereafter
B2	careful senior who plans before touching data	~1 week at project start (setup, download babysitting, plan review) + the joins redone when structure rots	an afternoon — most of it download wall-clock, not attention
B3	the data manager who guards the raw files — the person who says no near the master copies	permanent vigilance you cannot staff — one lapse at machine speed costs a month of re-downloads	two profiles configured once in minutes; the fence then holds every session, tired or not
C1	the methodologist — the one person who knows how the lab actually decides	the judgment lives in one head; transferring it to a new RA costs weeks of shadowing, and leaves when they do	an afternoon to author three SKILL.md files in both dialects; zero cost per session until invoked
C2	data manager / QA who never sleeps	permanent vigilance — est. 2 weeks/year of load-checking and release-note reading	half a day to install and test the 9-line block; ~20 s per run thereafter
C3	the data engineer who wires the lab to its systems	days of bespoke glue per source — credentials, retries, schema spelunking, timezone forensics — re-debugged every time a source changes	register the server once; the agent explores INFORMATION_SCHEMA and builds the panel in a guided session, raw cached for replication
D1	the RA pool — and the adviser who critiques from outside	a week of breadth EDA across boroughs and slices, plus a literature pass — and no honest outside critic you can summon at will	~20 min to write the agent definition + report contract; the fleet runs in parallel; the isolated adviser critiques in minutes
D2	the lab whose members don't overwrite each other	the lost afternoon disentangling two agents' colliding edits — and the redo when you reconstruct it wrong the first time	two commands to create the worktrees; the parallelism runs free; one reviewed merge at the end
D3	overnight RA	one night shift per estimation batch — and the course runs several batches	~10 min to write the check or the objective; the night itself belongs to the machine
D4	an RA bench and the PI who keeps their results comparable	the curve is ~2 days of serialized edit-and-fit; the suspicious read of the robustness table is the rarer, senior hour nobody has time for	13 lanes fanned out under the cap finish in an afternoon; the referee files its evidenced finding in one isolated pass
E1	reproducibility checker	a clean-room rebuild every few weeks — dull, exacting, and the first thing dropped at submission	~20 min to wire scripts/replicate.sh and the gate workflow; the verdict returns in one headless run thereafter
E2	lab manager's standing chores	a recurring monthly chore nobody owns — check the CDN, pull, contract, append, re-estimate — reliably skipped	~30 min to define the routine + guardrails once; each month runs unattended and stops at the approval gate
E3	the onboarding the lab never has to repeat	six weeks of per-member onboarding, rediscovered from scratch every time the lab turns over	~half a day to package and smoke-test the kit once; each new member is one install and one prompt
F1	the whole lab, orchestrated — the PI who designs the system instead of doing the work	each revision is a serialized chain — re-spec, re-estimate, re-table, rewrite the paragraph, re-read the abstract — correct only as of the last manual pass, on a Sunday; a real reviewer round is days of hand-carried edits	the loop runs two iterations to convergence in one supervised sitting; the human stands at exactly one gate (approve dropping the post-treatment control) while the mechanical fixes proceed unattended
Positions absorbed		0 of 16

The honest column: every place a human had to step in lives in the Field Journal’s failure log. Your measured hours there override these estimates here.
