the checklist pattern for agentic runs
a pattern for writing prompts that survive long agentic runs: numbered steps, explicit verification, a single 'done' criterion. nothing else i've tried works as well.
after three months of running long agentic tasks, i’ve converged on one prompt shape: the numbered checklist. it survives 4-hour runs better than any other format i’ve tried. here’s the shape and why each piece matters.
the shape
# goal: <one sentence, present tense>
## context (optional)
<3-5 lines of background the agent couldn't infer on its own>
## steps
1. <verb> <object> — <success criterion>
2. <verb> <object> — <success criterion>
3. ...
## done when
<a single, verifiable condition>
three rules:
- every step starts with a verb. “implement the rate limiter,” not “the rate limiter.” noun-phrase headings get summarized away under load.
- every step has its own success criterion. “run pnpm test rate-limit, all green” beats “verify it works.”
- there’s exactly one ‘done when’, at the bottom. without it, the agent either ships the first thing that compiles or keeps adding more. with it, the agent has a finish line.
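the three rules are mechanical enough to lint before a run starts. here's a minimal sketch in python, assuming the prompt is plain text in the shape above. the `VERBS` allowlist, the `lint` function, and the example goal and "pnpm test exits 0" line are all made up for illustration; a real verb check would need a longer list or a proper tagger.

```python
import re

SEP = " \u2014 "  # the dash between a step and its success criterion in the template
STEP_RE = re.compile(r"^\d+\.\s+(.*)$")

# tiny illustrative allowlist (an assumption, not a complete list of verbs)
VERBS = {"implement", "run", "add", "write", "reduce", "fix", "remove", "update"}


def lint(prompt: str) -> list[str]:
    """Return a list of rule violations; an empty list means the prompt passes."""
    problems = []
    lines = [ln.strip() for ln in prompt.splitlines()]
    steps = [m.group(1) for ln in lines if (m := STEP_RE.match(ln))]
    if not steps:
        problems.append("no numbered steps found")
    for n, step in enumerate(steps, start=1):
        action, sep, criterion = step.partition(SEP)
        words = action.split()
        # rule 1: the step starts with a verb
        if not words or words[0].lower() not in VERBS:
            problems.append(f"step {n}: does not start with a known verb")
        # rule 2: the step carries its own success criterion
        if not sep or not criterion.strip():
            problems.append(f"step {n}: missing success criterion")
    # rule 3: exactly one non-empty 'done when' section
    done = [i for i, ln in enumerate(lines) if ln.lower() == "## done when"]
    if len(done) != 1:
        problems.append(f"expected exactly one 'done when' section, found {len(done)}")
    elif done[0] + 1 >= len(lines) or not lines[done[0] + 1]:
        problems.append("'done when' section is empty")
    return problems


prompt = "\n".join([
    "# goal: the chat endpoint rate-limits abusive clients",
    "## steps",
    f"1. implement the rate limiter{SEP}pnpm test rate-limit, all green",
    "2. the docs",
    "## done when",
    "pnpm test exits 0",
])
for problem in lint(prompt):
    print(problem)
# prints:
# step 2: does not start with a known verb
# step 2: missing success criterion
```

this only checks the form. it can't tell whether a criterion is any good, only that one is present.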
why it works
agents under load — long runs, deep context — degrade at the same edges that humans do: they forget intermediate goals, they paraphrase instructions until the meaning drifts, they substitute proxies for the real metric. the checklist shape resists all three:
- numbered steps survive summarization better than prose
- success criteria per step give the agent a self-check between steps
- “done when” is a single anchor the agent can return to no matter how far it has drifted
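one way to picture why the shape holds up is to read it as control flow: each step is followed by a check, and the ‘done when’ condition is evaluated once at the end. the sketch below is only a mental model, not how any particular agent works internally. `Step`, `run_checklist`, `max_attempts`, and the toy `state` dict are hypothetical names invented for the example.

```python
from typing import Callable, NamedTuple


class Step(NamedTuple):
    name: str
    act: Callable[[], None]    # hypothetical: does the work for this step
    check: Callable[[], bool]  # hypothetical: the step's success criterion


def run_checklist(
    steps: list[Step],
    done_when: Callable[[], bool],
    max_attempts: int = 3,
) -> bool:
    for step in steps:
        for _ in range(max_attempts):
            step.act()
            if step.check():   # self-check before moving to the next step
                break
        else:
            return False       # criterion never met: stop instead of drifting on
    return done_when()         # the single anchor, evaluated last


# toy usage: a dict stands in for the state of a repo
state = {"limiter": False, "wired": False}
steps = [
    Step("implement the rate limiter",
         lambda: state.update(limiter=True),
         lambda: state["limiter"]),
    Step("wire it into the endpoint",
         lambda: state.update(wired=True),
         lambda: state["wired"]),
]
result = run_checklist(steps, done_when=lambda: all(state.values()))
print(result)
```

the point of the model: a failed check halts at a known step, and nothing counts as finished until the final condition holds.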
the failure mode it doesn’t fix
the checklist won’t save you from a vague step. “improve performance” with success criterion “it’s faster” will still produce nonsense. you have to write a real step — “reduce p95 latency on /api/chat below 200ms, measured with pnpm bench” — for the pattern to do its job.
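to make “verifiable” concrete, here is roughly what the p95 criterion reduces to once you have latency samples. this is a sketch under assumptions: the post doesn't say what pnpm bench outputs, so the samples below are made up, and the nearest-rank definition of p95 is one of several in common use.

```python
def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile: the smallest sample that has
    at least 95% of all samples at or below it."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    n = len(ordered)
    rank = (95 * n + 99) // 100  # ceil(0.95 * n) in integer arithmetic, 1-based
    return ordered[rank - 1]


def criterion_met(samples_ms: list[float], budget_ms: float = 200.0) -> bool:
    """True when p95 latency is strictly below the budget."""
    return p95(samples_ms) < budget_ms


# made-up latencies in milliseconds: 95 fast requests, 5 slow ones
samples = [120.0] * 95 + [450.0] * 5
print(p95(samples), criterion_met(samples))  # prints: 120.0 True
```

“it’s faster” has no equivalent of `criterion_met`: there is no number to compare and no command that produces one.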