Working with LLM-assisted code generation has an odd side effect. As the tools improve, more effort shifts from producing code to judging whether it fits the system it enters. The review is about coherence with the existing design rather than about isolated correctness.
This tension appears quickly once generation becomes iterative. Early passes seem acceptable, but repetition introduces small deviations that gradually weaken conventions and shared assumptions. At some point, incremental fixes stop working, and starting over becomes the only reliable way to restore coherence.
I care about this because I like writing code. I like the act itself: structure, naming, the order of parameters, the slow convergence toward something that feels internally consistent. I don't particularly enjoy chatting so that someone else, or something else, writes it for me. Even when that "someone" is an agent, and even when the output is objectively good.
That preference doesn't disappear just because the context changes. I also work on projects with external commitments and shared responsibility. In those settings, opting out of agentic editing entirely would be artificial. I wouldn't impose that constraint on myself, and I wouldn't remove those tools from the people I work with. The reference point for what "normal" delivery looks like is already moving, whether I like it or not.
In personal work, where the goal is exploration and architectural clarity, I don't mind letting an agent write code either. What I don't believe is that, in 2026, this can be done without close inspection. Every generated file still needs to be checked against patterns, language, structure, and stylistic decisions I would have made myself if I had written the code by hand. That cost doesn't go away just because generation is automated.
Over time, the role shifts. Less time is spent writing code, more time is spent verifying that generated output still fits within an agreed shape. Doing that manually works for a while, but it degrades. Small deviations accumulate. Conventions soften. Eventually the gap between how a system is supposed to work and what the codebase actually looks like becomes wide enough that recovery turns blunt.
This failure mode isn't new. I've run into it before while working on HatMax, which started from a familiar place: repeatedly bootstrapping Go projects with the same structure, concerns, and practices. That effort drifted naturally toward generators, with the goal of producing idiomatic Go rather than introducing a DSL or a framework layered on top of the language. It worked initially. Over time, it became clear that generators tend to lock decisions in earlier than they should. As patterns and preferences evolve, the effort required to keep templates, guarantees, and generated code aligned starts to dominate.
That experience shapes how I think about agentic generation today. The risk I learned from deterministic generators is letting early outputs harden into decisions before the problem is fully understood.
Constraints expressed as documentation, conventions, or shared understanding tend to break down. LLMs sharpen this tension. They can already produce useful scaffolding from relatively small prompts, which reduces the appeal of maintaining large, mechanical generators. What they don't address is enforcement. Once generation becomes iterative, those constraints no longer live in code but in intent. With classic generators, this was mostly a template-maintenance problem. With LLMs, it becomes a behavioral problem.
This is the space Watchman lives in.
Watchman builds on a capability Claude Code already exposes: hooks between intent and execution. Watchman sits in that gap. It evaluates tool requests as they happen and decides whether they should be allowed, denied, or flagged. It does not attempt to steer generation through prompts or heuristics; it applies constraints mechanically, at execution time. It can also issue lightweight, periodic reminders that reintroduce relevant context without affecting how generation itself proceeds.
The shift is straightforward. Constraints are enforced where actions actually occur, rather than through prompts or ongoing human review. Rules apply to intent rather than to individual tools, so any action that violates a rule is blocked regardless of how it was attempted. The model remains free to generate within boundaries that are explicit and deliberate.
In practice, Watchman integrates with Claude Code as a pre-tool-use hook. It sits between the agent and the tools it uses. Every read, write, edit, or shell invocation passes through it first. The decision model is intentionally small: allow, deny, or warn.
The rules themselves are unremarkable, which is part of the point. Staying inside a workspace. Limiting which files can be edited. Keeping changes incremental. Checking invariants. Falling back to custom hooks when a project needs something more specific. These are the constraints that usually live in people's heads until they don't.
Configuration is equally direct. Rules live in YAML. Projects can override them explicitly. For cases that don't fit declarative rules, Watchman can invoke small pieces of code with full context about the attempted action. The constraints live outside the model, and they are inspectable.
This does not make agents autonomous, safe, or correct, and it does not remove the need for judgment. It reduces how often that judgment has to be exercised in real time. Undesired behavior no longer accumulates silently; it has to cross an explicit boundary to proceed.
From there, the rest is mechanics.
What I needed it to do
The rules are the ones I kept wishing for:
- Stay inside the workspace
- Limit which files can be edited
- Enforce versioning habits
- Keep changes incremental
- Check invariants with regex and glob rules
- Fall back to custom hooks for anything else
These rules apply to intent, not to a specific tool. If an action violates a rule, it is blocked no matter which tool was used to attempt it.
Wiring it into Claude Code
You add Watchman as a PreToolUse hook in ~/.claude/settings.json:
```json
{
  "hooks": {
    "PreToolUse": [
      { "matcher": "Bash", "hooks": [{ "type": "command", "command": "/path/to/watchman" }] },
      { "matcher": "Read", "hooks": [{ "type": "command", "command": "/path/to/watchman" }] },
      { "matcher": "Write", "hooks": [{ "type": "command", "command": "/path/to/watchman" }] },
      { "matcher": "Edit", "hooks": [{ "type": "command", "command": "/path/to/watchman" }] },
      { "matcher": "Glob", "hooks": [{ "type": "command", "command": "/path/to/watchman" }] },
      { "matcher": "Grep", "hooks": [{ "type": "command", "command": "/path/to/watchman" }] }
    ]
  }
}
```
Once this is in place, Watchman sees every tool request first.
Configuration in practice
Watchman reads a global config at ~/.config/watchman/config.yml and can be overridden per project via .watchman.yml (local wins, no merging).
Example configuration
```yaml
version: 1

rules:
  workspace: true
  scope: true
  incremental: true
  invariants: true

workspace:
  allow:
    - /tmp
    - ~/.cache

scope:
  allow:
    - "**/*.go"
    - "**/*.md"
    - "**/*.yaml"
    - "**/*.yml"
    - "**/*.json"
    - "**/*.sql"
    - Makefile
    - Dockerfile
  block:
    - vendor/**
    - node_modules/**
    - dist/**
    - "**/*_generated.go"

incremental:
  max_files: 25
  warn_ratio: 0.6

invariants:
  content:
    - name: "no-placeholders"
      paths:
        - "**/*.go"
        - "!**/*_test.go"
      forbid: "TODO|FIXME|XXX"
      message: "Remove placeholder comments before committing"
    - name: "no-debug-prints"
      paths:
        - "internal/**/*.go"
        - "pkg/**/*.go"
      forbid: "fmt\\.Print"
      message: "Use a logger instead of fmt.Print in library code"
  imports:
    - name: "no-unsafe"
      paths:
        - "**/*.go"
      forbid: '"unsafe"'
      message: "unsafe package is not allowed"

commands:
  block:
    - sudo
    - "rm -rf"
    - chmod
    - chown
    - dd
```
The mental model
Claude makes a tool request. Watchman checks it. The request is either allowed, denied, or warned. That is the whole flow.
If you want to try it
```shell
go install github.com/adrianpk/watchman/cmd/watchman@latest
watchman setup
watchman init
```
Reference: github.com/adrianpk/watchman