Intent Engineering, Part 3: Grade the Grader

Tue, 09 Jun 2026 09:00:00 +0000

In the last post I showed that an agent’s identity actually sticks when you encode it as structure (a required section in the output) rather than as a behavior you hope the model performs. I measured that by having a second model read the agent’s transcript and score whether it did the things its identity says it does.

You may have already spotted the problem. Can you trust a model to grade a model? I didn’t think so. Then I built it anyway, because it was the only way to get numbers at any useful scale. And it lied to me. Three times, with total confidence, in three different ways. This post is about why I still kept it, and what catching it taught me about measuring anything an agent does.

Intent Engineering, Part 2: Making It Stick

Sun, 07 Jun 2026 09:00:00 +0000

I ended the last post with a confession. I’d written about giving an agent a real identity. Not a system prompt with a list of rules, but a document that says who this agent is, how it thinks, what it values, and what call it would make when nobody is around to ask. I called it intent engineering, and said it was the most leverage you can get out of an agent for the least code. Then, near the bottom, I admitted the thing that had been nagging me the whole time. I had no way to measure whether any of it was working.

Llm on Byron DG — The Upstream
