Harnesses are everything. Here's how to optimize yours.

Friday, April 17, 2026 AI

Engineers used to argue about IDEs; now we argue about harnesses. I've been using and contributing to open-source harnesses (Roo Code, DeepAgent CLI, HumanLayer), and here's what I wish I had known on day one: there are three things you can do right now to make your harness output orthogonal to slop. Yet all three still require human judgment.

This guide covers the three surfaces that separate harnesses that compound your output from ones that compound your mistakes: how to keep your config files lean enough to reason over, how to structure prompts using the R.P.I. framework so the model approaches problems the way a staff engineer would, and how to use subagents to keep your main context window clean. By the end, you'll have a concrete set of changes you can make to your setup today, and a clearer sense of why the harness, not just the model, is where your engineering judgment makes a difference.

If the model is the source of intelligence, then the harness is what makes that intelligence useful. The harness's primary job is to act as the scaffolding that:

- Manages context for an inherently stateless LLM via sessions and compression
- Makes functions like tool calls, I/O processing, and guardrails work around the model

Think of a harness as a `while (have next message) do {tool}` loop. A smooth harness amplifies the speed and quality of all the code you generate from then on.

Keep your .md files lean and human-written

The core shortcoming of agents today is the "instruction budget". To paraphrase Kyle from HumanLayer, frontier thinking LLMs can only follow a few hundred instructions before entering the "dumb zone", where they start to miss relevant instructions amid the bloat. Giving too many instructions functionally encourages the model to hallucinate.

For a global system prompt (CLAUDE.md or AGENTS.md), human-written outperforms LLM-generated.
ETH research found LLM-generated system prompts degrade performance while costing ~20% more in inference. Describe the minimal requirements: what the project is and who the end users are. Every token should fight for its place, since it will be injected globally into every session.

While the instinct is to front-load everything the model might need and prescribe if-else rules in as much detail as possible, parsing long context directly consumes valuable space in the context window and leaves less room for reasoning. Instead, apply Progressive Disclosure: let the agent pull context only when needed, and let it know what exists by giving individual .md files descriptive names. Here's how that plays out across the three common interfaces.

CLIs

Engineers already use progressive disclosure in CLIs without naming it. You run `--help` to see the available subcommands, then drill into a specific subcommand's `--help` for its flags. The agent can do the same.

This matters most for CLIs the model has never seen: a custom internal tool that wraps your API has zero training data. Without progressive disclosure, you'd need to paste the entire reference into context. With it, the agent runs `mycli --help`, finds the relevant subcommand, then runs `mycli deploy --help` to get the specific flags. The model discovers the tool's commands as needed, the same way you would, and context stays clean. Popular tools like `kubectl` or `gh` don't demonstrate this well because the model already knows their interfaces from training data. The real test is the CLI nobody outside your company has ever used.

This also makes CLIs one of the cleanest uses of your CLAUDE.md or AGENTS.md. Rather than bloating those files with behavioral rules, use a few lines to document how to invoke a CLI the model isn't trained on. For example, `uv` is gaining adoption fast but models still fumble its flags and subcommands.
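
As a rough illustration of the two-step discovery described in this section, here is a minimal Python sketch of how a harness tool might fetch help text one level at a time. `mycli` is the placeholder CLI from the text; `run_help`, `discover`, `fake_run`, and the canned help output are hypothetical names invented for this example, not part of any real harness.

```python
import subprocess
from typing import Callable, Optional, Sequence

# A runner takes an argv list and returns the help text it produced.
Runner = Callable[[Sequence[str]], str]


def run_help(argv: Sequence[str]) -> str:
    """Run a real command and return whatever help text it printed."""
    result = subprocess.run(list(argv), capture_output=True, text=True, timeout=10)
    # Most CLIs print help to stdout; some print it to stderr instead.
    return result.stdout or result.stderr


def discover(cli: str, subcommand: Optional[str] = None, run: Runner = run_help) -> str:
    """Fetch only the help text needed for the current step."""
    argv = [cli] if subcommand is None else [cli, subcommand]
    return run([*argv, "--help"])


# Hypothetical canned output standing in for an internal tool named `mycli`.
FAKE_HELP = {
    ("mycli", "--help"): "usage: mycli <command>\n\ncommands:\n  deploy  Ship a build\n  logs    Tail logs\n",
    ("mycli", "deploy", "--help"): "usage: mycli deploy [--env ENV] [--dry-run]\n",
}


def fake_run(argv: Sequence[str]) -> str:
    """Stand-in runner so the sketch works without a real `mycli` installed."""
    return FAKE_HELP[tuple(argv)]


# Step 1: the top-level help lists subcommands and nothing more.
overview = discover("mycli", run=fake_run)
# Step 2: drill into the one subcommand the task needs.
deploy_flags = discover("mycli", "deploy", run=fake_run)
```

Leaving `run` at its default (`run_help`) would shell out to a real binary instead of the canned text. Either way, only the help for the subcommand in use enters context, rather than the full reference.
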
A short line like "use uv for Python package management, run uv --help to discover subcommands before assuming syntax" gives the agent an entry point without wasting context on a full reference.

Skills

This is where the industry has converged. Claude Code, Codex, and OpenCode all implement progressive disclosure for skills the same way: at startup, only the name and description of each skill are loaded into context. The full SKILL.md instructions are read only when the agent decides a skill is relevant to the current task. Skills can point to reference files or scripts, which load only as needed. Write a clear, specific description and the agent can match on it without ever reading the body. Codex's own docs explicitly call this "progressive disclosure" and credit it as core to keeping context clean.

Concretely, as the engineer, you should keep specific instructions (skills) in separate, clearly named files across your codebase so the agent can retrieve them based on the request.

MCP tools

This is where harnesses diverge significantly. Claude Code ships with built-in MCP tool search: at session start, it loads a lightweight index of tool names, then searches and pulls fu