When the System Writes Its Own Rules

The Karpathy CLAUDE.md repo went viral with over 100,000 GitHub stars. Axia's rules converged independently — not from observation, but from production failure. The epistemological difference matters.

There is a difference between knowing something and being forced to know it.

You can read every principle about AI agent behaviour. You can follow the best practices, add the right guidelines to your CLAUDE.md, nudge the model toward careful, deliberate work. That is knowing something.

Or you can build a production system, watch it break in ways that cost you, and let the failures dictate the rules. That is being forced to know it.

Axia taught me the rules. I didn’t bring them in from outside.

[Image: a sealed door with weld marks — the visual metaphor for an irreversible enforcement gate in a production AI agent architecture, a rule that came from failure and cannot be reasoned around.]

How Rules Actually Get Made

A few weeks ago someone shared the forrestchang/andrej-karpathy-skills repo — a CLAUDE.md file encoding Andrej Karpathy’s observations about how LLM coding agents fail. It went viral. Over 100,000 GitHub stars. Developers recognised themselves in the failure descriptions.

I read it. Then I checked my own repo.

Every principle in that file was already in Axia’s CLAUDE.md. Not because I’d seen the repo — I hadn’t. Not because I’d read Karpathy’s thread — I hadn’t read that either. Because Axia had broken in exactly those ways, in production, with real consequences, and each break produced a rule that made the same break structurally impossible the next time.

The interesting part isn’t the convergence. Convergence is almost inevitable when independent builders hit the same failure modes under real conditions. The interesting part is the origin.

The Karpathy repo starts with observation — watching agents fail and naming the patterns. My rules didn’t start with observation. They started with wreckage.

What the Failures Produced

Axia runs a live B2B pipeline. It classifies email signals, writes to CRM, drafts outreach, and notifies humans when it needs a decision. Claude Code builds it. When it gets something wrong, the damage is real: corrupted pipeline data, silent wrong assumptions baked into deployed code, contacts misclassified and lost.

Each failure mode closed a door.

The silent assumption failure — Claude Code picked an interpretation, ran with it, built the wrong thing. The rule it produced:

Whenever you mark a queue item as PENDING_DECISION, immediately run:
python3 scripts/builder_notify.py --build "⏸️ Decision needed — [item name]
I cannot proceed without a human decision on: [specific question]"
Never silently mark PENDING_DECISION without posting to Discord.

The agent cannot continue past ambiguity. It halts. It posts. It waits. Proceeding silently is not a behaviour the agent can reason its way into; it is structurally blocked.
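To make the shape of that gate concrete, here is a minimal sketch of a halt-and-notify step, assuming a hypothetical Discord webhook configured through an environment variable. The names, message format, and queue shape are illustrative, not Axia's actual implementation:

```python
# Sketch of a PENDING_DECISION gate: marking an item as blocked and notifying
# a human are one operation, so the agent cannot do the first silently.
# DISCORD_WEBHOOK_URL and the message wording are assumptions for illustration.
import json
import os
import urllib.request


def mark_pending_decision(item_name: str, question: str) -> None:
    """Record the blocked state, post to Discord, and stop the run."""
    message = (
        f"⏸️ Decision needed - {item_name}\n"
        f"I cannot proceed without a human decision on: {question}"
    )
    payload = json.dumps({"content": message}).encode("utf-8")
    request = urllib.request.Request(
        os.environ["DISCORD_WEBHOOK_URL"],
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(request)  # a failed post raises, which also halts the run
    raise SystemExit(f"PENDING_DECISION: {item_name} is waiting for a human.")
```

The property that matters is that the only code path that sets the blocked state also stops execution, so "mark it quietly and keep going" is not an available branch.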

The side-effect modification failure — Claude Code modified a module it didn’t fully understand, broke a downstream consumer silently, and the failure surfaced in production rather than in testing. The rule it produced:

Before fixing any bug or building any feature:
1. Read docs/module_graph.md for the affected module
2. Identify all coupled modules listed
3. Read each coupled module's entry
4. Your fix must not break any coupled module's expected behaviour

Reading the dependency graph is a precondition. The build cannot start without it.
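A sketch of that precondition, assuming docs/module_graph.md lists one entry per module in the form "module: coupled_a, coupled_b". The file format and parsing here are assumptions, not the real graph:

```python
# Sketch of the module-graph precondition: the coupled-module list must be
# produced before any code is written, or the build refuses to start.
from pathlib import Path


def coupled_modules(module: str, graph_path: str = "docs/module_graph.md") -> list[str]:
    """Return the modules coupled to `module`, or refuse to build."""
    graph_file = Path(graph_path)
    if not graph_file.exists():
        raise SystemExit("Cannot build: docs/module_graph.md is missing.")
    for line in graph_file.read_text().splitlines():
        name, _, deps = line.partition(":")
        if name.strip() == module:
            return [d.strip() for d in deps.split(",") if d.strip()]
    raise SystemExit(f"Cannot build: no module graph entry for {module}.")
```

Whether the check lives in a script or in the CLAUDE.md workflow matters less than what it enforces: the coupled modules are identified before the build, or the build does not happen.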

The write-before-read failure — Claude Code built against an incomplete understanding of the spec and produced code that passed narrow tests but failed real-world cases. The rule it produced:

Before building, check tests/chaos_latest.json
If it does not exist: post to #axia-builder "Waiting for chaos cases — [item]"
Do not build without chaos cases.

Context must exist before work begins. Not as a suggestion — as a gate.
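A sketch of the same gate in code, assuming the chaos cases live in tests/chaos_latest.json as a JSON document and that posting to #axia-builder goes through some notifier. The post_to_channel helper below is a stand-in, not a real function in the codebase:

```python
# Sketch of the chaos-case gate: if the context file does not exist, the only
# permitted action is to announce that the build is waiting, then stop.
import json
from pathlib import Path

CHAOS_PATH = Path("tests/chaos_latest.json")


def post_to_channel(channel: str, message: str) -> None:
    # Stand-in for whatever actually posts to the builder channel.
    print(f"[{channel}] {message}")


def require_chaos_cases(item_name: str) -> list:
    """Load chaos cases, or halt and announce that the build is waiting."""
    if not CHAOS_PATH.exists():
        post_to_channel("#axia-builder", f"Waiting for chaos cases - {item_name}")
        raise SystemExit("Do not build without chaos cases.")
    return json.loads(CHAOS_PATH.read_text())
```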

The bug-without-test failure — fixes were written, deployed, and the same bug resurfaced because nothing had locked in the correct behaviour. The rule it produced:

Write a test case that would have caught this bug BEFORE writing any fix.
Never fix an observer bug without writing the test case first.

The sequence is enforced. No test, no fix. In that order. Always.
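One way to make that sequence mechanical rather than disciplinary is to require the regression test to exist and to fail before any fix is accepted. A minimal sketch, assuming pytest and one test file per bug; both are assumptions, not a description of Axia's tooling:

```python
# Sketch of a "no test, no fix" gate: the regression test must exist and must
# fail against the unfixed code. A test that already passes proves nothing
# about the bug, so the fix is not allowed to proceed.
import subprocess
import sys
from pathlib import Path


def assert_regression_test_fails(test_path: str) -> None:
    if not Path(test_path).exists():
        raise SystemExit(f"No fix allowed: write {test_path} first.")
    result = subprocess.run([sys.executable, "-m", "pytest", test_path])
    if result.returncode == 0:
        raise SystemExit("The test passes on unfixed code, so it would not have caught the bug.")
    print("Regression test fails as expected; the fix may proceed.")


if __name__ == "__main__":
    assert_regression_test_fails(sys.argv[1])
```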

The Epistemological Difference

The Karpathy principles are top-down. Someone observed failures, named the patterns, encoded the solutions. The CLAUDE.md file is then applied to a system — principles imported from outside, layered on top.

What I built is bottom-up. The system failed. The failure named the constraint. The constraint became a gate. The gate made the failure impossible. Then the next failure named the next constraint.

The rules weren’t designed. They were extracted — by the system itself, through failure, over time.

This produces a different kind of rule. A designed rule is a best guess at what the system should do. An extracted rule is a scar — it marks exactly where the system broke and seals that break permanently. You can debate whether a designed rule is necessary. You cannot debate whether an extracted rule is — you already know what happens without it.

The Hardening Commitment

There is one more thing that separates these rules from guidelines.

They do not bend.

Every gate in Axia’s CLAUDE.md is intentionally irreversible. Not because flexibility is bad, but because an agent under pressure will find any flexibility and route through it. A rule that can be softened will be softened, at the worst possible moment, by the worst possible reasoning.

The PENDING_DECISION halt is not a default that can be overridden. The module graph read is not a step that can be skipped when the task seems simple. The chaos case requirement is not optional when the queue item looks straightforward.

Correctness under pressure is what production systems actually need. That only comes from rules the system cannot reason its way around.

The Karpathy CLAUDE.md is a nudge. It improves the agent’s default behaviour in the session that reads it. Useful — and for most use cases, probably enough.

What Axia runs is an enforcement architecture. The principles are the same. The agent’s ability to bypass them is not.

The Convergence Is the Point

When I saw the Karpathy repo, my first reaction wasn’t “I should add this.” It was recognition — the same reaction those developers had when they saw the failure descriptions.

But the recognition came from a different place. They recognised the failures Karpathy described. I recognised the rules I’d already built in response to the same failures, independently, without the reference.

That convergence matters — not because it validates my approach against an external benchmark, but because of what it implies about the constraints themselves. When independent builders under real production pressure arrive at the same rules without knowing about each other, those rules are not best practices. They are structural necessities. The failure modes are inherent to how LLMs behave under context pressure. The constraints that prevent them are not optional.

The Karpathy repo is a useful shortcut for developers who want to apply these principles without having to discover them through failure. That is genuinely valuable.

But there is something the shortcut cannot give you: the certainty that comes from having built the rule yourself, from a specific break, in a specific system, because nothing else was acceptable.

That certainty is what makes the gate stay closed.

Scaffold

Ready to take the next step?

Most AI is sold as a tool. Axia is built as the operating layer your business actually runs on.

See how Scaffold builds these systems