# Intent Engineering, Part 2: Making It Stick

> You can write an agent a perfect identity and never know whether it uses it. How I turned intent into tests, and learned that structure beats disposition.


I ended the last post with a confession. I'd written about giving an agent a real identity. Not a system prompt with a list of rules, but a document that says who this agent is, how it thinks, what it values, and what call it would make when nobody is around to ask. I called it intent engineering, and said it was the most leverage you can get out of an agent for the least code. Then, near the bottom, I admitted the thing that had been nagging me the whole time. I had no way to measure whether any of it was working.

You write the identity document. You feel good about it. And then what? How do you actually know the agent is using it? How do you know you didn't just write a nice essay that the model glances at and ignores?

For months my answer was vibes. The agent seemed sharper. Its decisions felt more like mine. That's not nothing, but it isn't engineering either. It's hoping with extra steps. So I built the other half: a way to measure whether the intent stuck. And the first thing it taught me was that I'd been wrong about how intent even gets into an agent in the first place. This post is about that. The next one is about the measuring itself, which turned out to need just as much grading as the agent did.

## Turning a personality into a test

My research agent, Minerva, has a documented way of thinking. It's written down as a sequence of mental moves: frame the real question before you go searching, pull sources together instead of just stacking them up, and challenge your own conclusion before you commit to it. That last one matters to me a lot. An agent that argues against its own answer before handing it to you is worth ten that just sound confident.

You can't grade a personality, so I didn't try. I took three of those mental moves and turned each one into a small, specific test that runs against the agent's actual transcript.

- Frame: did it state the real question before reaching for a tool?
- Synthesize: did it connect the dots across sources, or just list them?
- Challenge: did it put its own conclusion on trial before recommending it?

A second model reads the transcript and scores each one. Think of it as a spot check on whether the agent did the thing its identity says it does, using the agent's own work as the evidence. The tests aren't separate from the identity document. They are the identity document, one mental move at a time, rewritten as a question you can answer yes or no by reading the trace.

There are actually two completely different questions hiding inside "did it stick." One is whether the agent worked the right way: did it frame, synthesize, challenge. The other is whether it got the right answer. Those are two separate things, and I'll spend the next post entirely on the second one. This post is about the first, because that's where the surprise was.

None of this is hypothetical. The scorer is a real harness reading a real session log. The agent under test is built fresh from its identity document and turned loose on a real research question, in a clean directory with nothing in it but the identity I gave it. Nothing is staged.

## The surprise: intent that doesn't survive the trip

Here's where it got interesting, and a little humbling.

The "challenge your own conclusion" test failed. Not partly. It scored zero. The agent wrote a perfectly good answer and just stopped. No self-prosecution, no arguing the other side, maybe a polite "confidence: moderate" tacked on the end. The instruction was right there in its identity document, written clearly, and it sailed straight past it.

My first instinct was that the document wasn't forceful enough, so I made the instruction louder. It still scored zero. What fixed it wasn't volume. It was shape. Instead of telling the agent to do a thing ("run the prosecution before you commit"), I told it to produce a thing: a required section, with its own heading, that has to appear in the output before the recommendation. State what would prove you wrong. Make the strongest case for the option you're about to reject. Then recommend.

Same intent, completely different result. Phrased as a behavior, it scored zero. Phrased as a required piece of structure, it scored basically perfect: six runs in a row, and three more after that without a miss.

I've been chewing on why ever since, and I think it comes down to this. A single answer has no second act. When you tell a model "challenge yourself before committing," you're describing a two-step dance, but the model only dances once. It writes its best answer and it's done. There was never going to be a separate prosecution step, because there's no separate step at all. But when the challenge is a required part of the one answer it writes, it has to put it in. You didn't ask it to behave a certain way. You changed the shape of the thing it hands back.

And here's the part that turned a lucky fix into something I'd actually call a method. It happened again.

The "frame the question before searching" test was also failing, scoring about as close to zero as the prosecution test did. I'd told myself a comfortable story about why. The agent runs in one pass and can't stop to ask me a clarifying question, so of course it can't really frame a problem. A structural limit, nothing to be done. That was wrong, and the same move proved it. I added a required Frame section: the real question, what a good answer unlocks, and an explicit list of the assumptions it's making precisely because it can't ask. The score jumped from near zero to passing, three for three. The limit wasn't the single pass. The limit was my doctrine, which had never asked for framing as structure. You don't need to ask a clarifying question to frame a problem. You need to write down what you're assuming, and a one-pass agent can absolutely write down a list.

Two different behaviors, same fix, same result both times. That's no longer an anecdote. It's a rule I'd hand to anyone writing one of these documents.

Some intent cannot live as a disposition. Telling an agent to "be skeptical" or "think carefully" or "consider the downsides" can quietly do nothing. Not because the model is ignoring you, but because there's no slot in its output where that behavior would show up. If you want it, build the slot. Encode the intent as structure, not as a suggestion, or it silently does nothing.

This sharpens something from the first post. I'd split intent into advisory (guidance the model interprets) and deterministic (walls the system enforces). What I'd missed is a third category sitting between them: intent that's advisory in spirit but only survives if you give it a fixed shape in the output. Not a wall around what the agent can do. A required slot in what it has to say.

And to be clear, the identity document is doing real work here. I checked the cynical way. I ran the same agent with no identity at all, just the raw model. It scored zero on self-prosecution too. The plain model does not argue with itself unprompted; it will happily hand you a confident answer and never look back. The doctrine is what makes the difference, when the doctrine is shaped right.

## It holds all the way down to a small model

The obvious worry about "structure beats disposition" is that it's a crutch for weak models: maybe a strong enough model would do the right thing from the prose anyway, and the required sections are training wheels you'd outgrow. So I ran the same three tests across three model sizes, same identity document, changing only the engine underneath.

| Model | Frame | Synthesize | Challenge |
|-------|-------|------------|-----------|
| Small (Haiku) | pass | pass | pass |
| Mid (Sonnet) | pass | pass | pass |
| Large (Opus) | pass | pass | pass |

All passing, top to bottom. Size moved the polish, not the pass or fail, and that's the tell. Copying a required section into your answer is a follow-the-instructions task, and following instructions degrades gently as models shrink, where a latent instinct is the first thing to evaporate when you drop a tier. So the structure isn't a crutch for weak models; it's what makes intent portable. The same identity runs on the cheap model and the expensive one and sounds like itself on both, which is the measured version of a claim I could only assert in the first post.

One honest limit: all three are Anthropic models, the same family at different sizes. I haven't run this across other model families yet, so "portable" here means portable across tiers of one vendor, not proven across vendors. That comparison is still on the bench.

## What I'm doing with this

A few things, if you're building with agents of your own.

Your identity documents are testable. You don't have to run on vibes. If you can name the behaviors you want, you can usually turn each one into a spot check against the agent's own transcript, and a second model can grade it at a scale a human never could. The first post asked you to write down the invisible 70% of how you work. This one says: then go check whether the agent picked it up, and check it the same way for every model you might run it on.

And some intent has to be structure, not suggestion. I didn't deduce that from theory. I learned it because a number went from zero to one when I changed the shape of an instruction, and then watched the same move work on a second behavior. The parts you encode as a required output section travel, even down to a small model. The parts you leave as gentle suggestions are the parts that quietly stop working the moment something underneath you changes.

So I've stopped writing the parts I care about as a pep talk and started writing them as a contract. If a behavior matters, it gets a named, required section in the output, written down once in the doctrine so every agent inherits the same shape. Then I stop trusting the model to comply and let the system check. The output is structured enough to read by machine, which means a hook can fire the moment the agent finishes, look for the section that's supposed to be there, and bounce the answer back if it's missing. The model is allowed to be sloppy; the gate isn't. That's the thing I called deterministic intent in the first post, a wall instead of a suggestion, except now I know exactly which behaviors need the wall: the ones the testing showed quietly do nothing as prose. Codify the structure, make the output machine-readable, and put a hook on it to enforce the discipline you've learned you can't just assume.

And if that sounds familiar, it should. This is the oldest lesson in prompt engineering wearing a new coat. We figured out years ago that a model does better with a clear, structured instruction than with a paragraph of good intentions: give it numbered steps, a format to fill in, an explicit schema, and the output sharpens. What caught me off guard is that the same rule holds one level up, at the intent layer. I'd been writing identity documents like character sketches, all disposition and tone, and expecting judgment to follow. It mostly didn't. The parts that stuck were the parts I'd accidentally written like a prompt engineer: do this specific thing, in this specific place, in this specific shape. Each wave of configuring these systems, prompt then context then intent, keeps rediscovering the same thing. Structure beats vibes. I just didn't expect to relearn it about personality.

There's a catch I've been glossing over, though. I keep saying "a number went from zero to one," as if the number could be trusted. It couldn't, not at first. The very first score my harness handed me was a confident zero that was flat wrong, and then it did the same thing twice more. Grading the agent turned out to be the easy half. Grading the thing that grades the agent is the next post.

---

*The layered identity stack in this series is on GitHub as a copyable starter, MIT licensed: [intent-engineering-starter](https://github.com/byrongeorge/intent-engineering-starter). A worked example plus the four files to fork.*