Stop Using CLAUDE.md — Put Your Standards Right in the Code

By Justin Quaintance on Apr 6, 2026
Close-up of a code editor showing JavaScript source with multiple inline comments explaining the logic

I Love Instructions Files, But They Get Skipped

Look, I've been tweaking my CLAUDE.md for months. Adding rules. Removing rules. Refining the wording. Bolding the important parts. Adding "IMPORTANT:" in front of the really important parts. Adding "CRITICAL:" in front of the really really important parts.

And the agent kept skipping half of it.

Not maliciously. It would just blow past a rule that obviously applied to whatever I'd asked it to do, ship the change, and I'd sit there thinking did you even open that file? Yes, I know it loaded the file into context. No, that doesn't mean it actually applied the part that mattered for this specific task.

Then it hit me: the agent always reads the code. The instructions file is optional. The code is mandatory. So why am I putting my standards in the optional place?

Turns Out the Research Backs This Up

I was about to chalk this up to vibes, but a paper landed in February from ETH Zurich and LogicStar.ai that basically nailed it: "Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?" by Gloaguen et al.

They built a benchmark called AGENTbench with 138 real-world Python tasks pulled from niche repos, then ran four different agents (Claude 3.5 Sonnet, Codex GPT-5.2, GPT-5.1 mini, and Qwen Code) under three scenarios: no context file, an LLM-generated AGENTS.md, and a human-written one.

The headline numbers are rough:

  • LLM-generated AGENTS.md files: -3% task success rate, +20% cost
  • Human-written AGENTS.md files: +4% success rate (the authors literally call this "barely worth it")
  • Every context file increased the number of steps the agent took to finish a task

The failure mode the paper describes is so obvious in hindsight. The agent dutifully follows the manifest's standing instructions on every task. So you tell it "always run the full test suite" once in AGENTS.md, and now it runs the full test suite when you ask it to fix a typo in a comment. You tell it "always check the data layer before touching components" and now it greps the data layer before fixing a CSS color. Every interaction pays the tax, whether the task needs it or not.

The authors' actual recommendation: skip the LLM-generated context file entirely. If you write one yourself, keep it tiny and limit it to stuff the agent literally cannot infer from the repo (custom build commands, weird tooling, that one environment variable).

To be fair, there's a contrary paper (Lulla et al., "On the Impact of AGENTS.md Files on the Efficiency of AI Coding Agents") that found the file did help OpenAI Codex on 124 PRs — about a 28% wall-clock reduction. So the jury is technically still out. But the failure mode the ETH paper found is exactly the one I keep hitting in real life, and the approach I'm about to describe sidesteps it completely.

Here's What I Actually Do

I stopped treating CLAUDE.md like a rulebook. I keep a tiny one for build commands and that's about it (basically what the ETH paper recommends).
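For reference, here's roughly the shape mine has now. This is a sketch, not my literal file — the commands and the env var are placeholders for whatever your repo actually uses:

```markdown
# CLAUDE.md

## Build & test
- `npm run dev` — local dev server
- `npm test` — full suite

## Things you can't infer from the repo
- Requires Node 20+ (`nvm use` reads .nvmrc)
- Set `ANALYTICS_KEY=dev` locally or the build fails
```

Everything in it is something the agent literally cannot discover by reading the code. Nothing in it is a rule.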

All my real context — the warnings, the standards, the "don't touch this" notes, the architectural reasons, the things that bit me last week — lives as fat comment blocks right above the function, component, or config they apply to.

And here's the best part: I don't write those comments. The agent does.

When the agent learns something new about my codebase ("oh, this state has to be reset before the modal closes"), I ask it to leave a comment explaining what it learned. When it regresses something I have to fix, I ask it to write the warning in the same turn it ships the fix. The comments accumulate, and the next agent that touches that file has to read them.
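Concretely, that modal example might end up looking something like this. All the names here are made up — the point is the comment block the agent leaves behind, right above the code it learned about:

```javascript
/**
 * LEARNED (agent, after a regression): the modal's form state must be
 * reset BEFORE the modal closes. An earlier edit reset it after the
 * close, so stale values flashed the next time the modal opened.
 * If you reorder these lines, re-test the close-then-reopen flow.
 */
function closeModal(modal) {
  resetModalState(modal); // must run first — see note above
  modal.open = false;
  return modal;
}

function resetModalState(modal) {
  modal.formData = {};
  return modal;
}
```

The next agent that edits `closeModal` can't avoid that warning — it's in the function it's editing.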

Why In-Code Context Beats an Instructions File

  • The agent has to read the code. No skipping. No "I checked the manifest, looked good, moving on."
  • Context lives next to what it describes. It can't drift to the wrong file or apply to the wrong component.
  • The comment only fires when the agent is editing nearby code. No always-on tax like the ETH paper found. The warning about the modal only matters when you're touching the modal.
  • You see the context exactly when you need it. Reviewing the diff? The reasons are right there in the diff.
  • LLMs are unbelievably good at writing long, descriptive prose. Let them cook. A human would never write a six-paragraph comment explaining why a quadrant exists. An agent will, happily, in three seconds.
  • More docs make AI-generated code dramatically easier to review. Reviewing is the bottleneck now, not writing.

A Concrete Example: The "Don't Touch This Layout" Comment

Here's the kind of thing I'm talking about. Imagine a generic card component that the agent keeps accidentally regressing because it looks similar to another card elsewhere in the app.

Before — no context, easy to break:

function SummaryCard(props) {
  // ...renders header, key details, four bottom quadrants
}

After — the agent reads this before editing and stops to ask:

/**
 * NOTE: The visual layout of this card (header, key details, and the four
 * bottom quadrants: Status / Type / Category / Owner) is considered
 * finalized and is not expected to change much further.
 *
 * If you are about to change the card LAYOUT (which quadrant shows what,
 * removing/replacing a quadrant, etc.), STOP and double-check first — a
 * previous edit accidentally swapped the Owner quadrant for a different
 * field while trying to fix an unrelated rendering bug in the detail
 * panel. That field belongs in the detail panel component, NOT on this
 * base card.
 */
function SummaryCard(props) {
  // ...
}

That's it. No CLAUDE.md entry. No "rules for editing components" doc. The next time any agent opens this file, that block is right there at the top, exactly where the regression keeps happening. The first time I shipped a comment like that, the next session opened the file, paused, and asked me to confirm before changing the quadrant. Problem solved.

"But Won't My Files Get Bloated With Comments?"

Yes. That's fine.

Look, the big complaint about LLM-generated code is the volume — agents tend to produce more code than a human would. Fair. But more docs is the lowest-risk bloat there is.

  • Docs don't run.
  • Docs can't crash.
  • Docs can't leak secrets.
  • Docs can't introduce a SQL injection.

Worst case, you scroll past a few extra screens. Best case, the next person (or agent) doesn't repeat last week's bug. That trade is so lopsided it's almost embarrassing.

The Old Reason We Avoided Heavy Comments — And Why It No Longer Applies

For years, the conventional wisdom was: don't over-comment. Comments lie. Comments rot. You'd update the code, forget the comment, and three months later someone trusts a stale doc and ships a bug.

That was a real problem. It was also a people problem.

Agents don't have that problem. When you change the function, the agent updates the comment in the same edit. You can ask it to. You can tell it to do it without asking. It's actually good at it — it'll notice the comment, read it, decide whether the change still matches, and rewrite the comment to fit. The doc-rot tax that made us all minimalists about in-code documentation is gone.
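A sketch of what that looks like in practice (hypothetical constant and reasoning — the numbers are invented for illustration): the agent raises a debounce delay and rewrites the comment in the same change, so the code and its explanation never disagree:

```javascript
/**
 * Debounce: 500ms (was 300ms). Raised because the search endpoint
 * started rate-limiting bursts of requests. The old comment said
 * "300ms to feel responsive" — the agent rewrote it when it changed
 * the value, in the same edit.
 */
const SEARCH_DEBOUNCE_MS = 500;

// Standard trailing-edge debounce: collapses rapid calls into one.
function debounce(fn, delay = SEARCH_DEBOUNCE_MS) {
  let timer = null;
  return (...args) => {
    clearTimeout(timer);
    timer = setTimeout(() => fn(...args), delay);
  };
}
```

The mechanism isn't clever. It's just that updating the comment is part of the edit, not a separate chore someone forgets.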

This is the actual unlock. Doc-heavy code is finally safe.

How to Get Started (Takes Zero Setup)

This isn't a tooling change. There's nothing to install. Try this on your next session:

  • Next change you ship: ask the agent to leave a fat comment block explaining the constraint it just learned. One sentence is fine. Six paragraphs is also fine.
  • Next regression you fix: in the same turn, have the agent write the warning comment that would have prevented it. Past you didn't know. Future you (and future agents) will.
  • Your CLAUDE.md: open it. Look at it honestly. If most of it is rules the agent ignores, delete them. Keep the build commands and exotic tooling notes. That's the part the agent actually can't infer. (Yes, this very blog has a CLAUDE.md. Yes, it's mostly a voice/style guide. Yes, I'm slowly moving the parts that matter into the code.)

Why This Matters

Reviewing AI-generated code is the bottleneck now, not writing it. Anything that makes review faster pays for itself within a single session.

Co-located comments are the cheapest, lowest-risk way I've found to make review faster. The context is in the diff. The warnings are next to the danger. The reasons are next to the decisions. The agent reads them automatically because it has to read the code anyway.

Stop maintaining a rulebook your agent skims. Put the rules where the agent has to look.

Be effective. Not minimalist.

Photo by Ferenc Almasi on Unsplash

Content on this blog was created using human and AI-assisted workflows described here. Original ideas and editorial decisions by Justin Quaintance.