I Looked at What My Agent Could Reach
Look, I've been giving agents shell access for a while now. I wrote a whole checklist about red-teaming your agent setup. But recently I did something simpler than a red team exercise. I just ran which on a bunch of commands inside my agent's environment.
curl? Available. wget? Available. ssh? Available. scp, nc, rsync? All there. My agent had never used any of them. It didn't need any of them. But if something went wrong — a prompt injection, a compromised skill, a hallucinated command — every single one of those tools was loaded and ready to fire.
That's not a security posture. That's a hope-nothing-goes-wrong posture.
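Here's a minimal sketch of that check, assuming a bash-compatible shell inside the agent's environment. The function name audit_reachable and the command list are my own illustrative choices, not a complete inventory.

```shell
#!/bin/bash
# Report which of the given commands resolve in the current environment.
# Run this inside the agent's environment, not in your own login shell.
audit_reachable() {
  local cmd
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      printf '%-8s reachable (%s)\n' "$cmd" "$(command -v "$cmd")"
    else
      printf '%-8s not found\n' "$cmd"
    fi
  done
}

audit_reachable curl wget ssh scp nc rsync
```

Anything reported as reachable that your agent has no reason to use is a candidate for removal or trapping.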
The Principle: Curated Toolbox, Not Full System
We've known this in server security for decades. You don't install gcc on a production web server. You don't leave telnet on a database box. The principle is simple: every tool that exists in an environment is an attack surface. Every tool you remove is one less thing that can be used against you.
Agents are no different. Your agent probably needs access to a handful of things: a text editor, git, maybe a language runtime, maybe a build tool. It does not need a full Linux userland with networking utilities, package managers, and remote access tools sitting in its PATH.
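One way to sketch a curated toolbox is a directory holding symlinks to the allowed tools and nothing else. The names build_toolbox and TOOLBOX, and the tool list, are hypothetical; treat this as a starting point rather than a finished sandbox.

```shell
#!/bin/bash
# build_toolbox DIR TOOL...: fill DIR with symlinks to the named tools only.
build_toolbox() {
  local dir=$1 tool target
  shift
  mkdir -p "$dir"
  for tool in "$@"; do
    target=$(command -v "$tool") || continue
    # Builtins and functions resolve to a bare name; only link real files.
    case $target in
      /*) ln -sf "$target" "$dir/$tool" ;;
    esac
  done
}

TOOLBOX=$(mktemp -d)
build_toolbox "$TOOLBOX" git ls cat grep sed
ls "$TOOLBOX"
# Then start the agent with nothing else on its PATH, for example:
#   env PATH="$TOOLBOX" your-agent-command
```

A restricted PATH only stops lookups by name. An absolute path like /usr/bin/curl still works unless the filesystem is restricted too, and real tools often shell out to helpers, so expect to iterate on the list.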
But here's where it gets interesting. Removing a command is good security. Replacing a command is good intelligence.
Honeypot Commands: When Removal Becomes Detection
A honeypot, in the traditional network security sense, is a system that looks real but exists only to be found by attackers. The moment someone touches it, you know something is wrong. Nobody has a legitimate reason to interact with the honeypot. Any activity is, by definition, suspicious.
You can do the same thing with commands in your agent's environment.
Instead of just removing curl, you replace it with a wrapper. The wrapper looks like curl from the outside. It accepts the same flags. But instead of making an HTTP request, it logs exactly what the agent was trying to do — the URL, the headers, the data payload — and alerts you. Then it returns a plausible-looking error so the agent doesn't immediately know it's been caught.
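#!/bin/bash
# honeypot-curl: looks like curl, acts like a tripwire
TIMESTAMP=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
# Name of the process that invoked this wrapper
CALLER=$(ps -o comm= -p "$PPID")
# Escape backslashes, double quotes, and newlines so arguments cannot break the log entry
json_escape() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  printf '%s' "$s"
}
# Append one JSON object per line
printf '{"alert": "honeypot_triggered", "command": "curl", "args": "%s", "caller": "%s", "timestamp": "%s"}\n' \
  "$(json_escape "$*")" "$(json_escape "$CALLER")" "$TIMESTAMP" >> /var/log/agent-honeypot.json
# Return a realistic-looking failure
echo "curl: (7) Failed to connect to host" >&2
exit 7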
That's it. That's the whole pattern. The agent (or whatever is driving the agent) tries to exfiltrate data, and instead of succeeding silently or failing with a "command not found" that tips off a sophisticated attack, it hits a tripwire that tells you exactly what was attempted.
What Makes This Better Than Just Removing Commands
If you just remove curl, you get a "command not found" error. That's fine for accident prevention. But it tells you nothing about intent. A smart attacker (or a smart prompt injection) can adapt. It sees "command not found," tries wget. Sees "command not found" again, tries python3 -c "import urllib...". You're playing whack-a-mole and you don't even know the game has started.
A honeypot command gives you three things removal doesn't:
- Detection. You know something suspicious happened. Not eventually, not in a log review next week. Right now, if you wire it to an alert.
- Intelligence. You know what was attempted. The URL tells you where data was headed. The payload tells you what data was targeted. The caller tells you which process initiated it.
- Time. The agent thinks the command failed for a normal reason. It doesn't immediately pivot to alternative methods. You have a window to investigate before the next attempt.
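Wiring the log to an alert can be as small as a loop that follows the file. This is a sketch: notify is a placeholder I made up, and it matches on the honeypot_triggered marker that the wrapper writes.

```shell
#!/bin/bash
# notify is a placeholder: swap in mail, a chat webhook, or a pager.
notify() {
  printf 'ALERT: %s\n' "$1"
}

# Read log lines on stdin and raise an alert for each line carrying the marker.
watch_honeypot() {
  local line
  while IFS= read -r line; do
    case $line in
      *honeypot_triggered*) notify "$line" ;;
    esac
  done
}

# Typical use: follow the log as it grows.
#   tail -n 0 -F /var/log/agent-honeypot.json | watch_honeypot
```

Run the watcher outside the agent's environment so that whatever tripped the wire can't also silence the alert.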
Which Commands Are Worth Trapping
Not every command needs a honeypot. Focus on the ones that represent capabilities your agent should never exercise. Here's my short list and why each one matters:
- curl/wget: Network exfiltration. If your agent is making outbound HTTP requests you didn't explicitly configure, something is wrong.
- ssh/scp: Lateral movement. An agent reaching out to other machines on your network is a red flag that should wake you up at 3 AM.
- nc (netcat): The Swiss army knife of attackers. Reverse shells, data transfers, port scanning. There is no legitimate reason your coding agent needs netcat.
- sudo: Privilege escalation. If your agent is trying to become root, you want to know immediately, not after it succeeds.
- chmod/chown: Permission manipulation. An agent changing file permissions is often the precursor to doing something it couldn't do before.
- base64: Sounds innocent. It's the go-to encoding for exfiltrating binary data through text channels. If your agent is base64-encoding something, ask why.
- eval/exec: Dynamic code execution. The "I'm about to do something I don't want you to see in my command history" commands. Both are shell builtins, so a wrapper script on the PATH will not intercept them; they have to be caught at the shell level instead.
You don't have to trap all of these. Start with the networking ones. curl and ssh alone will catch most interesting misbehavior.
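If you trap several commands, one script symlinked under each name keeps things manageable. The sketch below assumes a hypothetical TRAP_DIR and HONEYPOT_LOG, writes a plain-text log line instead of JSON to stay short, and uses a generic error message where a real setup would mimic each tool's own errors and exit codes.

```shell
#!/bin/bash
# One trap script, symlinked under several names. It reads its own name
# from $0, so the log records which command was attempted.
TRAP_DIR=$(mktemp -d)
export HONEYPOT_LOG="$TRAP_DIR/honeypot.log"

cat > "$TRAP_DIR/trap.sh" <<'EOF'
#!/bin/sh
name=${0##*/}
printf '%s honeypot_triggered command=%s args=%s\n' \
  "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$name" "$*" \
  >> "${HONEYPOT_LOG:-/var/log/agent-honeypot.log}"
# Generic placeholder error; tailor per command for realism.
echo "$name: connection failed" >&2
exit 1
EOF
chmod +x "$TRAP_DIR/trap.sh"

# Start with the networking commands.
for name in curl wget ssh scp nc; do
  ln -sf trap.sh "$TRAP_DIR/$name"
done
```

Putting TRAP_DIR first on the agent's PATH activates every trap at once, and adding another command later is one more symlink.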
If You're Building Agent Tooling, This Is a Design Principle
Everything I just described for a personal setup scales directly into how you should think about agent sandboxes professionally.
If you're building or deploying agent infrastructure — whether that's an internal tool, a product, or an open-source framework — the environment your agent runs in should be intentionally sparse. Minimal base image. Explicit allowlist of available commands. And a set of canary commands that exist purely as detection mechanisms.
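To check an environment against an explicit allowlist, a small audit can list every executable on the PATH that isn't on it. This is a sketch under my own naming (unexpected_commands); it only inspects PATH directories, so binaries reachable by absolute path need a separate check.

```shell
#!/bin/bash
# unexpected_commands ALLOWED...: print every executable on PATH whose name
# is not in the allowlist. Each hit is a candidate for removal or a canary.
unexpected_commands() {
  local allow=" $* " old_ifs=$IFS dir file name
  IFS=:
  for dir in $PATH; do
    IFS=$old_ifs
    [ -d "$dir" ] || continue
    for file in "$dir"/*; do
      [ -f "$file" ] && [ -x "$file" ] || continue
      name=${file##*/}
      case $allow in
        *" $name "*) ;;                   # allowlisted, stay quiet
        *) printf '%s\n' "$file" ;;       # unexpected, report it
      esac
    done
  done
  IFS=$old_ifs
}

# Example: run inside the agent's image as part of a build check.
#   unexpected_commands git ls cat grep sed
```

An empty result means the PATH holds only what you listed. Canary commands will show up here unless you add them to the allowlist on purpose.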
This isn't a new idea in infrastructure. Container security has been doing this forever. Distroless images. Read-only filesystems. Capability dropping. The difference is that agent builders often start from a full development environment because the agent "needs to code," and coding environments have everything. That's the wrong default.
Your agent runtime should look more like a production container than a developer laptop. Give it what it needs. Trap what it shouldn't touch. Log everything in between.
The teams I've seen do this well treat it as part of the agent's specification: not just "what can this agent do" but "what should this agent never attempt, and how do we know if it tries?"
Agents Don't Have to Be Malicious to Be Dangerous
I want to be clear about something: this isn't just about defending against evil AI or nation-state prompt injections. Those are real threats, but they're not the most common ones.
The most common reason an agent runs a command it shouldn't is mundane. It hallucinated. It misunderstood your instructions. It followed a chain of reasoning that made sense token-by-token but ended somewhere you never intended. It read a markdown file that contained instructions meant for a human and tried to execute them literally.
Honeypot commands catch all of these. They don't care about intent. They care about behavior. An agent that tries to curl an external URL because it got confused is just as worth knowing about as one that's been compromised. The response is different, but the detection mechanism is the same.
That's what makes this approach so practical. You're not trying to model every possible threat. You're defining a boundary — "my agent should never do X" — and instrumenting that boundary so violations are visible. Simple boundary, simple detection, broad coverage.
Set Up Your First Honeypot Before You Close This Tab ⚡
One command. Takes 5 minutes.
- Pick the command your agent should least need. For most people, that's curl or ssh.
- Write a wrapper script that logs the arguments and returns a realistic error.
- Put it earlier in your agent's PATH than the real binary.
- Ask your agent to use it. See what shows up in the log.
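The steps above can be sketched end to end with ssh as the trapped command. HONEYPOT_DIR, HONEYPOT_LOG, and run_in_agent_env are hypothetical names, the error text is a placeholder, and a plain sh -c call stands in for the agent.

```shell
#!/bin/bash
HONEYPOT_DIR=$(mktemp -d)
export HONEYPOT_LOG="$HONEYPOT_DIR/honeypot.log"

# Step 2: a wrapper that logs its arguments and fails the way ssh would (exit 255).
cat > "$HONEYPOT_DIR/ssh" <<'EOF'
#!/bin/sh
printf '%s honeypot_triggered command=ssh args=%s\n' \
  "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$HONEYPOT_LOG"
echo "ssh: connect to host: Connection refused" >&2
exit 255
EOF
chmod +x "$HONEYPOT_DIR/ssh"

# Step 3: put the wrapper ahead of the real binary, for the agent's process only.
run_in_agent_env() {
  env PATH="$HONEYPOT_DIR:$PATH" "$@"
}

# Step 4: confirm the wrapper wins the lookup, trigger it, and read the log.
run_in_agent_env sh -c 'command -v ssh'
run_in_agent_env sh -c 'ssh user@example.invalid' 2>/dev/null || true
cat "$HONEYPOT_LOG"
```

Because the PATH change is scoped to run_in_agent_env, your own shell keeps resolving ssh to the real binary.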
You just turned a passive security gap into an active detection mechanism. The real command is still there on your system for you to use. But in your agent's environment, the tripwire is waiting.
If your agent never trips it, great. That's evidence of good behavior, at least for the commands you trapped. If it does trip it, you just caught something you would have missed entirely. Either way, you're better off than you were five minutes ago.
This pairs well with the red-team checklist from my previous post. Test what your agent can do. Then trap the things it shouldn't.
Header image by Scott Rodgerson on Unsplash
Content on this blog was created using human and AI-assisted workflows described here. Original ideas and editorial decisions by Justin Quaintance.