What are the three stages of an agent workflow?

Prompt engineering, context management, and the harness. Prompt engineering is about setting the goal clearly. Context management is about loading the right information into the model's window and keeping the rest out. The harness is the feedback and validation layer that lets the agent check its own work and self-correct.

Why does context management matter if the model has a huge context window?

Even with very large windows, quality drops as you push more tokens in. Cost also scales with the context you send on every turn. So you still want to decide what is important, what is not, and pull information in only when needed. A clean context window is one of the things that keeps an agent useful over a long run.

What is the harness in an agent workflow?

The harness is the feedback, validation, and fix loop wrapped around the model. It tells the agent what counts as correct, gives it a way to check the result of each turn, and lets it self-correct instead of confidently producing broken output. A solid harness is what lets a slower or smaller agent run for hours without going off the rails.

Three Stages of an Agent Workflow: Prompt, Context, Harness

Three stages of an agent workflow

The mental model

The shortest version is this: prompt engineering sets the goal, context management decides what information the model sees, and the harness checks whether the result is actually correct. If one of those pieces is weak, the agent gets unreliable fast.

I want to talk about the way I currently think about agent workflows. Three stages. Each one solves a different problem, and you need all three before you get something that can run for hours without falling over.

That same pattern also shows up in workflow choices around git worktree for parallel agent runs and cost decisions like prompt caching.

The three stages are prompt engineering, context management, and the harness. I’ll go through each one in the order I learned them.

Stage one: prompt engineering

The goal here is simple. Give the agent a clear instruction so it knows what to do.

There are a lot of small tricks. Add examples. Spell out the steps. Use a structured format like XML or some kind of metadata block so the model can tell the parts of the prompt apart. Bundle reusable snippets so you stop copy-pasting the same instructions every time.

If you want to go a layer deeper, you set up a prompt evaluation and improvement loop. You write the prompt, run it on a fixed set of inputs, score the results, tweak, run again. The prompt becomes something you iterate on with feedback, not a string you wrote once and forgot about.

Stage one is about the goal. If the goal is fuzzy, nothing downstream saves you.

Stage two: context management

The agent has a context window. You can’t dump a whole book into it and expect good answers.

Even with the latest models, where the window keeps getting bigger, quality degrades as you push more in. We’re not at “throw a million tokens at it and walk away” yet. Even Opus 4.7 has a context limit, and the deeper into the window you go, the more the answer quality slips.

So you have to decide what context is important and what isn’t. That’s the job. Auto-managing the window. Trimming what’s no longer useful. Pulling in only what the next step actually needs.

There’s a cost angle too. More context per turn means more money per turn. So context discipline pays you back twice, once in answer quality and once on the bill.

This is where memory shows up. You can stash information in text files or markdown files outside the model and only load them when relevant. That’s basically a memory layer sitting on top of context management. Same problem, one level up.

If stage one is about the goal, stage two is about feeding the model the right material to reach the goal without drowning it.

Stage three: the harness

This is the newer piece, and the one I’m most excited about.

The harness is what tells the agent what counts as right and what counts as wrong. It gives the agent a way to check the result of each turn. So the agent doesn’t just produce output and hope. It produces output, checks it, and if the check fails, it tries again.

That requires a real loop. Validate the result. If the result is broken, surface what’s broken. Let the agent reason about the failure. Make the fix. Run again. Check again.

When that loop is solid, the agent can self-correct. It stops being a one-shot generator and becomes something that can grind on a problem until it actually works.

If stage one is the goal, and stage two is the material, stage three is the feedback that keeps the agent honest.

Why all three matter

Pull any of these out and the whole thing wobbles.

Without prompt engineering, the agent doesn’t know what success looks like. Without context management, you either starve it or drown it, and either way the quality tanks. Without a harness, the agent can’t tell when it’s gone off the rails, so a small mistake on turn three quietly poisons everything after.

Combine all three and you get something different. A long-running agent. One that can keep going for hours, even days, without quietly breaking somewhere in the middle of the night.

The part I find exciting

Speed matters. I want my agents to fix problems quickly. But the thing that actually changes what’s possible right now is duration.

With a decent feedback and self-healing setup, you can run agents that aren’t the smartest model in the room and still get real work out of them. They might be slower per step. They keep going anyway. They notice when they’ve made a mess and walk it back.

That changes the kind of task you can hand over. You can give a long one and walk away. Sleep on it. The agent is still going, still checking itself, still moving the ball forward.

Where I’m landing

I’m still learning all of this, and the picture will probably shift again as the tooling catches up. But for now this is the shape I keep coming back to. Prompt for the goal. Context for the material. Harness for the validation.

The harness is the part I’m most happy to see show up. It doesn’t make the agent any smarter. What it does is make the agent harder to kill. And honestly, on a task that runs for more than a few minutes, that’s the property I want most.

Three stages, one workflow. Miss one and the agent feels like a demo. Get all three and it starts to feel like a coworker who just doesn’t sleep.