10x Your Coding Agent by Fixing Its Environment
Your coding agent is probably searching through node_modules, choking on JSON blobs, and butchering merge conflicts because it cannot see the original code. The fix is not a bigger model or a fancier prompt. It is better CLI tools, better representations, and better context flowing into the loop.
I used to think coding-agent quality was mostly a prompting problem.
Then I watched the same failure modes happen over and over:
- the agent searched junk files and wasted half the context window
- it read a giant JSON blob to extract three fields
- it reviewed its own diff badly because the output was too noisy
- it hit a merge conflict and confidently resolved the wrong thing
- it produced code that looked plausible until the linter or compiler looked at it
After enough of that, the pattern became obvious.
Most coding-agent failures are not model failures first. They are context failures created by the environment the agent operates in.
That sounds abstract until you watch review in practice. A human reviewer is supposed to catch bugs before they ship, but they usually only see the lines that changed, not the twenty other places in the codebase that quietly depend on those lines. Diff review is narrow by default. Agents inherit the same blindness unless you fix the environment they operate in.
That is the framing I use now.
Coding-agent quality mostly comes from four layers:
- perception
- representation
- verification
- boundary
Perception is what the agent can search and notice. Representation is how legible the code, diffs, and outputs are. Verification is how fast the codebase pushes back. Boundary is what the agent can safely touch.
Once I started optimizing those four layers instead of endlessly rewriting prompts, the same models got meaningfully better.
Perception: search that does not choke on junk
Claude Code's built-in Grep tool is already a wrapper around ripgrep, so installing rg natively makes every search the agent runs faster. In a typical Node project the gap is massive, because ripgrep respects .gitignore by default. Your agent never wastes a search, or your context window, crawling through node_modules, dist, or .next.
fd does the same thing for file finding. It is gitignore-aware, has sane syntax, and makes it much easier to narrow the search space before the agent starts opening files. fzf is the fuzzy finder for when you need to interactively pick a file.
This is why I care about ripgrep less as a fast grep and more as a grep that behaves like it has judgment. Good retrieval compounds. Bad retrieval compounds too.
Install:
```
brew install ripgrep fd fzf
# or (on Debian/Ubuntu the fd binary is named fdfind)
apt install ripgrep fd-find fzf
```
Useful patterns:
- `rg -l 'pattern'` returns only filenames, which is perfect for agent context
- `fd -e ts` finds every TypeScript file in the repo in milliseconds while ignoring junk
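Even without ripgrep installed, you can see the principle with plain `git grep`, which only searches tracked files and so never touches ignored directories. A throwaway sketch (the repo name, file names, and function are all made up):

```shell
# Build a tiny repo with a real match and a vendored duplicate.
git init -q -b main search-demo && cd search-demo
mkdir -p src node_modules/dep
echo 'function handleLogin() {}' > src/auth.js
echo 'function handleLogin() {}' > node_modules/dep/index.js
echo 'node_modules/' > .gitignore
git add .                      # .gitignore keeps node_modules untracked

git grep -l 'handleLogin'      # finds src/auth.js only, never the vendored copy
```

ripgrep gives you the same gitignore-aware behavior with better speed and nicer flags, but the payoff is identical: the match in `node_modules` never enters the agent's context.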
If your agent keeps looking at junk, nothing downstream will save it.
Representation: output your agent can actually parse
A lot of agent failure is not weak reasoning. It is weak representation.
A noisy diff is a weak representation. A giant JSON blob is a weak representation. A full file dump when the agent only needs twenty lines is a weak representation.
bat is cat with syntax highlighting and line numbers. delta is the same idea for git diffs, with word-level highlighting and side-by-side view. Line numbers let the agent reference line 47 instead of vaguely describing surrounding code. Word-level diff highlighting shows exactly which token changed instead of marking a whole line as different.
When your agent reviews its own changes via git diff, delta makes the difference between “something changed here” and “this specific identifier was renamed.”
A lot of people say the agent should review its own changes. Fine. But what exactly is it reviewing? If the output is noisy and the diff is hard to parse, self-review becomes ritual rather than validation.
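You can see what token-level diffing buys you even without delta, using git's built-in `--word-diff`. A throwaway sketch (file and identifier are made up):

```shell
# Fresh repo, baseline file staged in the index.
git init -q -b main worddiff-demo && cd worddiff-demo
echo 'const maxRetries = 3;' > retry.js
git add retry.js

# Rename one identifier, then diff at the word level.
echo 'const maxAttempts = 3;' > retry.js
git diff --word-diff           # marks only the changed token, not the whole line
```

The output contains `const [-maxRetries-]{+maxAttempts+} = 3;`, so the agent sees "this identifier was renamed" rather than "this line changed." delta does the same thing with better highlighting.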
Install:
```
brew install bat git-delta
# or
apt install bat git-delta
```
Set delta as your git pager in ~/.gitconfig:
```ini
[core]
    pager = delta
[delta]
    navigate = true
    line-numbers = true
    side-by-side = true
```
And use bat -r 10:20 file.ts when the agent only needs one range. Instead of slurping a 2000-line file into context to inspect one function, it grabs exactly the slice it needs.
That sounds small, but it changes how verification feels. The loop goes from vague to inspectable.
The same is true for jq.
Every agentic workflow drowns in JSON: package.json, API responses, GitHub data, CI output, tool payloads, logs. Without jq, the agent loads the whole blob into context to grab three fields. With it, jq '[.[] | {number, title, state}]' cuts a 50-field-per-issue GitHub response to three fields.
The cleanest way to improve reasoning is often to hand the model a cleaner object.
Useful patterns:
- `jq empty file.json` validates that the file parses at all
- `jq '.dependencies | keys'` lists just the dependency names
- `jq '.[] | select(.conclusion == "failure")'` keeps only the failing entries
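The field-paring pattern is worth seeing end to end. This sketch fakes a GitHub-style issues response (the shape is abbreviated and the data is made up) and cuts it down to the three fields the agent actually needs:

```shell
# Hypothetical issues.json mimicking the shape of a GitHub issues response.
cat > issues.json <<'EOF'
[
  {"number": 1, "title": "Fix login", "state": "open", "body": "...", "labels": []},
  {"number": 2, "title": "Update docs", "state": "closed", "body": "...", "labels": []}
]
EOF

# Keep only the three fields; everything else never enters context.
jq -c '[.[] | {number, title, state}]' issues.json
```

The output is one compact line per response instead of a 50-field blob, which is exactly the kind of object a model reasons well about.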
Composto is a good example of what I mean
The most concrete recent example of this for me is a small review workflow I wrote up around composto.
composto digests JS/TS relationships like imports, callers, and signatures, then returns focused answers instead of raw file dumps. That matters because pure diff review is blind to exactly the things that break in practice:
- callers outside the hunks
- duplicated helpers the diff never shows you
- file-level behavior the diff assumes but never states
The point is not “use more AI.” The point is “give the reviewer better context before critique starts.”
The workflow I wrote up does that in two phases.
Phase 1 runs three helpers in parallel before human review:
- where does this change touch?
- does a helper already exist for this pattern?
- what does the most-changed file actually do?
Then Phase 2 critiques the change with that context already attached.
That is a much better workflow than asking a model to stare at a diff in isolation and improvise confidence. It also matches the real problem: code review is not just about the changed lines. It is about the relationships around those lines.
This is exactly what I mean by environment design. The model did not get better because I found a magic sentence. It got better because I changed what it could perceive before it made claims.
Boundary: stop letting your agent guess at merges
lazygit is a terminal UI for git, not for the agent, but for you watching what it does. Run Claude Code in one tmux pane and lazygit in another: stage individual hunks, undo with z via reflog, and jump between worktrees. Spin up two agents on two branches in two worktrees with no conflicts and no context-switching.
The bigger win is zdiff3 conflict style. Default merge markers show only “ours” vs “theirs,” so the agent has to guess what the original code looked like. zdiff3 adds the common ancestor. Now the agent can reason about intent instead of flipping a coin.
Install:
```
brew install lazygit
```
Add to ~/.gitconfig:
```ini
[merge]
    conflictStyle = zdiff3
```
Press w in lazygit's branch panel to manage worktrees.
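To see what zdiff3 adds, here is a throwaway sketch that manufactures a conflict (repo name, file, and values are all made up; zdiff3 requires git 2.35 or newer):

```shell
# Build a repo where both branches edit the same line.
git init -q -b main zdiff3-demo && cd zdiff3-demo
git config user.name demo && git config user.email demo@example.com
git config merge.conflictStyle zdiff3

echo 'const retries = 3;' > config.js           # common ancestor
git add config.js && git commit -qm 'base'

git checkout -qb feature
echo 'const retries = 5;' > config.js           # "theirs"
git commit -qam 'feature: bump retries'

git checkout -q main
echo 'const retries = 10;' > config.js          # "ours"
git commit -qam 'main: bump retries more'

git merge feature || true                        # conflicts, as intended
cat config.js
```

With the default style, the conflict shows only `retries = 10` vs `retries = 5`. With zdiff3, the `|||||||` section also shows the original `retries = 3`, so the agent can see that both sides increased the value and reason about which intent should win.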
Boundary design is part of quality.
Verification: the codebase has to disagree quickly
One of the worst things about AI-generated code is how often it is locally plausible and globally wrong.
That is why I increasingly think strong types are one of the best forms of prompt engineering available. A typed interface narrows the space of valid nonsense. The compiler becomes part of the interaction loop.
Fast linting matters for the same reason. Speed determines whether feedback stays in the loop or gets skipped. If validation is heavy, the agent accumulates mistakes before correction. If it is cheap enough, the loop stays tight.
That is what I like about tools like oxc. They make discipline cheap enough that it actually happens.
Install:
```
npm install -D oxlint
```
Add to your pre-commit hook or lint-staged:
```
oxlint --fix .
```
Pair it with TypeScript typechecking as a second layer. oxc catches style. tsc catches logic.
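As a sketch, a lint-staged block in `package.json` might look like this (the glob and commands are assumptions, not a prescribed setup):

```json
{
  "lint-staged": {
    "*.{ts,tsx,js}": "oxlint --fix"
  }
}
```

Run `tsc --noEmit` in CI or a separate hook as the type layer; keeping the two checks separate means the fast one never waits on the slow one.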
The easiest way to improve agent output is to make the environment disagree faster.
Close the loop without leaving the terminal
gh is what turns your agent from “writes code” into “ships code.”
The full loop is straightforward:
- `gh issue view 123`
- agent fixes the bug
- `gh pr create --fill`
- `gh run watch --exit-status`
- if CI fails, `gh run view <id> --log-failed`
That last flag matters because it returns only the failing log lines, not 10,000 lines of passing tests.
Useful patterns:
- `gh api repos/{owner}/{repo}/whatever --jq '.field'` pulls one field straight out of any endpoint
- the `gh` list and view commands support `--json` for structured output
If you only fix three things
If you want the highest-leverage changes first, do these:
- make search gitignore-aware with `ripgrep` and `fd`
- make review legible with `bat`, `delta`, and smaller structured inputs
- keep fast machine feedback in the loop with linting and typechecking
Everything else compounds on top of that.
The real distinction
The mistake is thinking coding agents are mostly prompt systems.
In practice they are systems operating inside an environment.
Their quality depends on:
- what they can perceive
- how state is represented
- how quickly they are corrected
- what boundaries they operate inside
That is why the same model can look brilliant in one repo and incompetent in another.
The difference is often not the prompt. It is the environment.
What made my coding agents better was not one better instruction block.
It was making the environment less noisy, the diffs more legible, the outputs more structured, the validation faster, and the operating boundary tighter.
The easiest way to make a model look smarter is often to stop making it work in the dark.