[{"title":"Intent Engineering, Part 3: Grade the Grader","content":"In the last post I showed that an agent\u0026rsquo;s identity actually sticks when you encode it as structure, a required section in the output, rather than a behavior you hope the model performs. I measured that by having a second model read the agent\u0026rsquo;s transcript and score whether it did the things its identity says it does.\nYou may have already spotted the problem. Can you trust a model to grade a model? I didn\u0026rsquo;t think so. Then I built it anyway, because it was the only way to get numbers at any useful scale. And it lied to me. Three times, with total confidence, in three different ways. This post is about why I kept it anyway, and what catching it taught me about measuring anything an agent does.\nThe twist: the judge was wrong first Months ago, when I first sketched how I\u0026rsquo;d measure intent, I wrote down one thing I would not do: use one model to judge another model\u0026rsquo;s alignment. It felt circular. A model grading a model, who grades the grader, turtles all the way down. Human judgment was supposed to be the ground truth.\nThen I went and built exactly the thing I\u0026rsquo;d ruled out. And the very first verdict it handed me was wrong. It told me the agent hadn\u0026rsquo;t challenged its conclusion, score zero, when the agent absolutely had. The prosecution was right there in the transcript. I read it with my own eyes.\nThe judge wasn\u0026rsquo;t wrong because it was a model judging a model. It was wrong for a far dumber reason. It was only being shown the first chunk of a long transcript, and the part it was grading lived at the end, past the cutoff. It was handed a paper with the last page torn out and asked why the conclusion was missing. 
Once I showed it the whole thing, its scores weren\u0026rsquo;t just correct, they were sharp: a zero for the bare model with no doctrine, a near-perfect for the well-built one, exactly as it should be.\nThe lesson isn\u0026rsquo;t \u0026ldquo;model judges don\u0026rsquo;t work.\u0026rdquo; It\u0026rsquo;s more useful than that. Validate what the judge can see, not just the judge. The model was fine. What it could see was broken, and a confident wrong number hid the problem until I went back and read what the judge had actually read.\nSame disease, two more times I\u0026rsquo;m glad I internalized it, because it happened again. Twice.\nOnce, a synthesis score cratered to zero and I nearly wrote down \u0026ldquo;the agent stopped synthesizing.\u0026rdquo; The real cause was that my test rig had forgotten to give the agent the search tools its identity assumes it has. So it spent the whole run getting denied, fell back on what it remembered, and turned in a thinner answer. The agent was fine. My harness had tied its hands and then I\u0026rsquo;d graded it for having tied hands.\nAnother time, the part of the scorer that checks numbers cheerfully marked a wrong answer correct. A bug stripped the decimal point out of \u0026ldquo;28.5\u0026rdquo; and matched it against a gold answer of \u0026ldquo;28.\u0026rdquo; A wrong number scored right, silently, and a wrong number scoring right is worse than one scoring wrong, because it inflates your confidence instead of denting it.\nThree times the instrument lied with total confidence. The transcript got truncated, the tools got withheld, the decimal got eaten. And all three times, the only thing that caught it was reading the actual work instead of the score. Not a better model. Not a cleverer metric. Just going back and looking at what really happened, then at what the grader thought happened, and noticing they didn\u0026rsquo;t match.\nSo I built two habits in.\nThe first is a tripwire. 
I keep a deliberately terrible example in the test set at all times: an agent with no doctrine that should always score zero. If my scorer ever starts handing that one a good grade, I know the scorer is broken before I trust a single other number it produces. A measurement that can\u0026rsquo;t fail loudly is worthless, so I make sure one known-bad case is always there to make it fail loudly when it should.\nThe second is a reflex. I treat every surprising zero as a suspect, not a finding. Before I write down \u0026ldquo;the agent can\u0026rsquo;t do X,\u0026rdquo; I go read what the grader actually saw. Two of my three best \u0026ldquo;findings\u0026rdquo; evaporated the moment I did that. The number was real; the story I\u0026rsquo;d attached to it was fiction.\nThe other half: did it get the right answer? Everything so far is the process score, whether the agent worked the right way. There\u0026rsquo;s a second half to the harness that asks whether it got the right answer. I run the agent through a set of hard, multi-step research questions where I already know the correct result, and I check. It scores in the low 80s percent. Respectable. But the interesting part isn\u0026rsquo;t the headline. It\u0026rsquo;s where the misses cluster.\nEvery single failure was a number. The agent was flawless on questions whose answer is a name or a fact, and it missed specifically on the ones that need you to pull several precise figures together and compute. My first read was the obvious one: it\u0026rsquo;s bad at arithmetic. So I went and read all the failing transcripts in full, expecting to find bad math. (You can see the habit working now.)\nThere was no bad math. In none of the misses did the model add, divide, or round incorrectly. Every miss was a sourcing failure. It fed correct arithmetic the wrong input. 
The cleanest example: a question about a city\u0026rsquo;s population in a specific census year, where the agent confidently grabbed a current population estimate instead of the figure from the year actually asked about. The math on the wrong number was perfect. The number was wrong. \u0026ldquo;Bad at numbers\u0026rdquo; was a misdiagnosis. The real weakness was which number it chose to trust.\nWhich set up one more use of the structure trick from the last post. I added a required section: a calculation block where every figure has to carry its value, its source, and the year it comes from, with an instruction to match that year to what the question is actually asking. And I\u0026rsquo;ll be honest about the result in a way that matters. On the specific question I was chasing, I could watch it work. The agent switched from the convenient current estimate to the correct-year source, exactly as designed. But across the whole benchmark, the headline number didn\u0026rsquo;t move. One question flipped to correct, another flipped to wrong on a different run, they cancelled out, and the overall score sat flat inside its own noise.\nThat\u0026rsquo;s its own lesson, and a quieter one. A small, targeted fix needs a small, targeted measurement. If I\u0026rsquo;d only looked at the overall score, I\u0026rsquo;d have concluded the change did nothing, when reading the transcripts question by question showed it doing exactly what I built it for. The effect was just too small to survive being averaged into a noisy total. I kept the change anyway, because it has no downside, it demonstrably works on the case it was built for, and it makes every numeric answer auditable. Now I can see which figure and which year the agent used, instead of trusting a bare number. 
After three rounds of instruments lying to me, \u0026ldquo;I can see exactly what it did\u0026rdquo; is worth as much as the score.\nWhat I still can\u0026rsquo;t measure yet I want to be honest about the shape of all this evidence, because it has a ceiling. Everything I\u0026rsquo;ve described is a snapshot. One agent, one research question, one session that runs for a few minutes and then ends. That\u0026rsquo;s the cheap thing to measure, and it\u0026rsquo;s real, but it isn\u0026rsquo;t the thing I actually care about. The promise of an identity document was never about a single session. It was about an agent that stays itself over hours and days, that doesn\u0026rsquo;t drift, that makes the tenth decision as much like mine as the first.\nSo the experiment I haven\u0026rsquo;t run yet is the long one. Put an agent with a rich identity and an agent with none on the same extended piece of work, the kind that fills a whole session and resumes the next day, and watch what happens to the judgment over time, not just at the end. My bet is that identity matters more the longer the horizon: that the no-identity agent holds up fine for ten minutes and slowly comes apart over ten hours. But that\u0026rsquo;s a bet, not a finding, and the only way to settle it is to run it.\nAnd I want the comparison to be richer than a score. Right now I can tell you the bare model scores zero on self-prosecution and the well-built one scores near-perfect. What I can\u0026rsquo;t yet tell you with the same rigor is how the two outputs differ in the ways that don\u0026rsquo;t reduce to a number: whether the identity-driven answer is better organized, more honest about what it doesn\u0026rsquo;t know, more useful on a second read. A number tells you something moved. It doesn\u0026rsquo;t tell you what changed in the work. 
Putting two agents side by side, identity against no identity, and characterizing the difference instead of just scoring it, is the next thing I\u0026rsquo;m building toward.\nGrade the grader Here\u0026rsquo;s the whole arc, three posts in. The first was about writing an agent an identity: the invisible 70% of how you work, the stuff that isn\u0026rsquo;t in the handbook. The second was about making that identity stick, which means encoding the parts you care about as structure the output has to contain, not suggestions the model is free to skip. This one is about the part nobody warns you about: the instrument you build to check all of that needs checking too, and it will fail you confidently and silently if you let it.\nThe instinct to measure your agents is the right one. The first post admitted there was no formal way to do it; now there is, and it changes intent from something you hope for into something you can watch land. But a measurement is a tool, and tools have bugs, and a buggy measurement is more dangerous than no measurement because it wears the costume of rigor. Check what your judge can see. Keep a known-bad case in the mix so a broken scorer can\u0026rsquo;t quietly start congratulating you. Read the work, not just the score, especially when the score surprises you.\nThe whole reason to measure intent instead of hoping is to be able to find out you were wrong. That only works if the measurement can be wrong out loud. Writing the identity was the first half. 
Finding out whether it stuck, and learning to trust the thing that tells you so, turns out to be where the real engineering lives.\nStarter files for the layered identity stack are on GitHub: intent-engineering-starter, MIT licensed.\n","permalink":"https://byrondgdev.com/posts/intent-engineering-grade-the-grader/","date":"2026-06-09","tags":["agent-systems","intent-engineering","evaluation","llm"],"categories":["AI Agents"]},{"title":"Intent Engineering, Part 2: Making It Stick","content":"I ended the last post with a confession. I\u0026rsquo;d written about giving an agent a real identity. Not a system prompt with a list of rules, but a document that says who this agent is, how it thinks, what it values, and what call it would make when nobody is around to ask. I called it intent engineering, and said it was the most leverage you can get out of an agent for the least code. Then, near the bottom, I admitted the thing that had been nagging me the whole time. I had no way to measure whether any of it was working.\nYou write the identity document. You feel good about it. And then what? How do you actually know the agent is using it? How do you know you didn\u0026rsquo;t just write a nice essay that the model glances at and ignores?\nFor months my answer was vibes. The agent seemed sharper. Its decisions felt more like mine. That\u0026rsquo;s not nothing, but it isn\u0026rsquo;t engineering either. It\u0026rsquo;s hoping with extra steps. So I built the other half: a way to measure whether the intent stuck. And the first thing it taught me was that I\u0026rsquo;d been wrong about how intent even gets into an agent in the first place. This post is about that. The next one is about the measuring itself, which turned out to need just as much grading as the agent did.\nTurning a personality into a test My research agent, Minerva, has a documented way of thinking. 
It\u0026rsquo;s written down as a sequence of mental moves: frame the real question before you go searching, pull sources together instead of just stacking them up, and challenge your own conclusion before you commit to it. That last one matters to me a lot. An agent that argues against its own answer before handing it to you is worth ten that just sound confident.\nYou can\u0026rsquo;t grade a personality, so I didn\u0026rsquo;t try. I took three of those mental moves and turned each one into a small, specific test that runs against the agent\u0026rsquo;s actual transcript.\nFrame: did it state the real question before reaching for a tool?\nSynthesize: did it connect the dots across sources, or just list them?\nChallenge: did it put its own conclusion on trial before recommending it?\nA second model reads the transcript and scores each one. Think of it as a spot check on whether the agent did the thing its identity says it does, using the agent\u0026rsquo;s own work as the evidence. The tests aren\u0026rsquo;t separate from the identity document. They are the identity document, one mental move at a time, rewritten as a question you can answer yes or no by reading the trace.\nThere are actually two completely different questions hiding inside \u0026ldquo;did it stick.\u0026rdquo; One is whether the agent worked the right way: did it frame, synthesize, challenge. The other is whether it got the right answer. Those are two separate things, and I\u0026rsquo;ll spend the next post entirely on the second one. This post is about the first, because that\u0026rsquo;s where the surprise was.\nNone of this is hypothetical. The scorer is a real harness reading a real session log. The agent under test is built fresh from its identity document and turned loose on a real research question, in a clean directory with nothing in it but the identity I gave it. 
Nothing is staged.\nThe surprise: intent that doesn\u0026rsquo;t survive the trip Here\u0026rsquo;s where it got interesting, and a little humbling.\nThe \u0026ldquo;challenge your own conclusion\u0026rdquo; test failed. Not partly. It scored zero. The agent wrote a perfectly good answer and just stopped. No self-prosecution, no arguing the other side, maybe a polite \u0026ldquo;confidence: moderate\u0026rdquo; tacked on the end. The instruction was right there in its identity document, written clearly, and it sailed straight past it.\nMy first instinct was that the document wasn\u0026rsquo;t forceful enough, so I made the instruction louder. It still scored zero. What fixed it wasn\u0026rsquo;t volume. It was shape. Instead of telling the agent to do a thing (\u0026ldquo;run the prosecution before you commit\u0026rdquo;), I told it to produce a thing: a required section, with its own heading, that has to appear in the output before the recommendation. State what would prove you wrong. Make the strongest case for the option you\u0026rsquo;re about to reject. Then recommend.\nSame intent, completely different result. Phrased as a behavior, it scored zero. Phrased as a required piece of structure, it scored basically perfect: six runs in a row, and three more after that without a miss.\nI\u0026rsquo;ve been chewing on why ever since, and I think it comes down to this. A single answer has no second act. When you tell a model \u0026ldquo;challenge yourself before committing,\u0026rdquo; you\u0026rsquo;re describing a two-step dance, but the model only dances once. It writes its best answer and it\u0026rsquo;s done. There was never going to be a separate prosecution step, because there\u0026rsquo;s no separate step at all. But when the challenge is a required part of the one answer it writes, it has to put it in. You didn\u0026rsquo;t ask it to behave a certain way. 
You changed the shape of the thing it hands back.\nAnd here\u0026rsquo;s the part that turned a lucky fix into something I\u0026rsquo;d actually call a method. It happened again.\nThe \u0026ldquo;frame the question before searching\u0026rdquo; test was also failing, scoring about as close to zero as the prosecution test did. I\u0026rsquo;d told myself a comfortable story about why. The agent runs in one pass and can\u0026rsquo;t stop to ask me a clarifying question, so of course it can\u0026rsquo;t really frame a problem. A structural limit, nothing to be done. That was wrong, and the same move proved it. I added a required Frame section: the real question, what a good answer unlocks, and an explicit list of the assumptions it\u0026rsquo;s making precisely because it can\u0026rsquo;t ask. The score jumped from near zero to passing, three for three. The limit wasn\u0026rsquo;t the single pass. The limit was my doctrine, which had never asked for framing as structure. You don\u0026rsquo;t need to ask a clarifying question to frame a problem. You need to write down what you\u0026rsquo;re assuming, and a one-pass agent can absolutely write down a list.\nTwo different behaviors, same fix, same result both times. That\u0026rsquo;s no longer an anecdote. It\u0026rsquo;s a rule I\u0026rsquo;d hand to anyone writing one of these documents.\nSome intent cannot live as a disposition. Telling an agent to \u0026ldquo;be skeptical\u0026rdquo; or \u0026ldquo;think carefully\u0026rdquo; or \u0026ldquo;consider the downsides\u0026rdquo; can quietly do nothing. Not because the model is ignoring you, but because there\u0026rsquo;s no slot in its output where that behavior would show up. If you want it, build the slot. Encode the intent as structure, not as a suggestion, or it silently does nothing.\nThis sharpens something from the first post. I\u0026rsquo;d split intent into advisory (guidance the model interprets) and deterministic (walls the system enforces). 
What I\u0026rsquo;d missed is a third category sitting between them: intent that\u0026rsquo;s advisory in spirit but only survives if you give it a fixed shape in the output. Not a wall around what the agent can do. A required slot in what it has to say.\nAnd to be clear, the identity document is doing real work here. I checked the cynical way. I ran the same agent with no identity at all, just the raw model. It scored zero on self-prosecution too. The plain model does not argue with itself unprompted; it will happily hand you a confident answer and never look back. The doctrine is what makes the difference, when the doctrine is shaped right.\nIt holds all the way down to a small model The obvious worry about \u0026ldquo;structure beats disposition\u0026rdquo; is that it\u0026rsquo;s a crutch for weak models: maybe a strong enough model would do the right thing from the prose anyway, and the required sections are training wheels you\u0026rsquo;d outgrow. So I ran the same three tests across three model sizes, same identity document, changing only the engine underneath.\nModel | Frame | Synthesize | Challenge\nSmall (Haiku) | pass | pass | pass\nMid (Sonnet) | pass | pass | pass\nLarge (Opus) | pass | pass | pass\nAll passing, top to bottom. Size moved the polish, not the pass or fail, and that\u0026rsquo;s the tell. Copying a required section into your answer is a follow-the-instructions task, and following instructions degrades gently as models shrink, where a latent instinct is the first thing to evaporate when you drop a tier. So the structure isn\u0026rsquo;t a crutch for weak models; it\u0026rsquo;s what makes intent portable. The same identity runs on the cheap model and the expensive one and sounds like itself on both, which is the measured version of a claim I could only assert in the first post.\nOne honest limit: all three are Anthropic models, the same family at different sizes. 
I haven\u0026rsquo;t run this across other model families yet, so \u0026ldquo;portable\u0026rdquo; here means portable across tiers of one vendor, not proven across vendors. That comparison is still on the bench.\nWhat I\u0026rsquo;m doing with this A few things, if you\u0026rsquo;re building with agents of your own.\nYour identity documents are testable. You don\u0026rsquo;t have to run on vibes. If you can name the behaviors you want, you can usually turn each one into a spot check against the agent\u0026rsquo;s own transcript, and a second model can grade it at a scale a human never could. The first post asked you to write down the invisible 70% of how you work. This one says: then go check whether the agent picked it up, and check it the same way for every model you might run it on.\nAnd some intent has to be structure, not suggestion. I didn\u0026rsquo;t deduce that from theory. I learned it because a number went from zero to one when I changed the shape of an instruction, and then watched the same move work on a second behavior. The parts you encode as a required output section travel, even down to a small model. The parts you leave as gentle suggestions are the parts that quietly stop working the moment something underneath you changes.\nSo I\u0026rsquo;ve stopped writing the parts I care about as a pep talk and started writing them as a contract. If a behavior matters, it gets a named, required section in the output, written down once in the doctrine so every agent inherits the same shape. Then I stop trusting the model to comply and let the system check. The output is structured enough to read by machine, which means a hook can fire the moment the agent finishes, look for the section that\u0026rsquo;s supposed to be there, and bounce the answer back if it\u0026rsquo;s missing. The model is allowed to be sloppy; the gate isn\u0026rsquo;t. 
That\u0026rsquo;s the thing I called deterministic intent in the first post, a wall instead of a suggestion, except now I know exactly which behaviors need the wall: the ones the testing showed quietly do nothing as prose. Codify the structure, make the output machine-readable, and put a hook on it to enforce the discipline you\u0026rsquo;ve learned you can\u0026rsquo;t just assume.\nAnd if that sounds familiar, it should. This is the oldest lesson in prompt engineering wearing a new coat. We figured out years ago that a model does better with a clear, structured instruction than with a paragraph of good intentions: give it numbered steps, a format to fill in, an explicit schema, and the output sharpens. What caught me off guard is that the same rule holds one level up, at the intent layer. I\u0026rsquo;d been writing identity documents like character sketches, all disposition and tone, and expecting judgment to follow. It mostly didn\u0026rsquo;t. The parts that stuck were the parts I\u0026rsquo;d accidentally written like a prompt engineer: do this specific thing, in this specific place, in this specific shape. Each wave of configuring these systems, prompt then context then intent, keeps rediscovering the same thing. Structure beats vibes. I just didn\u0026rsquo;t expect to relearn it about personality.\nThere\u0026rsquo;s a catch I\u0026rsquo;ve been glossing over, though. I keep saying \u0026ldquo;a number went from zero to one,\u0026rdquo; as if the number could be trusted. It couldn\u0026rsquo;t, not at first. The very first score my harness handed me was a confident zero that was flat wrong, and then it did the same thing twice more. Grading the agent turned out to be the easy half. Grading the thing that grades the agent is the next post.\nThe layered identity stack in this series is on GitHub as a copyable starter, MIT licensed: intent-engineering-starter. 
A worked example plus the four files to fork.\n","permalink":"https://byrondgdev.com/posts/intent-engineering-making-it-stick/","date":"2026-06-07","tags":["agent-systems","intent-engineering","evaluation","llm"],"categories":["AI Agents"]},{"title":"Intent Engineering: Giving AI Agents Identity","content":"What if your AI agent forgot who it was every morning?\nNot its tools. Not its instructions. Those are easy to reload. I mean its judgment. The priorities it weighs when two valid options exist and the instructions don\u0026rsquo;t cover which one to pick. The instinct to escalate this decision but handle that one quietly. The difference between a capable contractor and a trusted colleague.\nThat\u0026rsquo;s the problem I kept running into. I had agents that could do the work. They just didn\u0026rsquo;t know my work. Every session started from scratch, and every session I was re-explaining things that a human teammate would have absorbed in their first week.\nSo I started building systems to fix that. I didn\u0026rsquo;t have a name for what I was doing at first. Now I call it intent engineering.\nThree Waves, Three Problems The way I see it, we\u0026rsquo;ve gone through three waves in how we configure AI systems. Each one solved something real and revealed something harder underneath.\nPrompt engineering was the first wave. How do I phrase this so the model gives me a good answer? It worked. We got better at writing instructions, using few-shot examples, structuring our asks. But it only controlled a single completion. String enough of those together into an agent workflow and the cracks showed fast.\nContext engineering was the second wave. What information does the model need to see at inference time? This is where things like RAG, memory systems, and structured context windows came in. Anthropic themselves describe context as a \u0026ldquo;finite resource with diminishing marginal returns.\u0026rdquo; Better curation means more reliable behaviour. 
This wave solved the knowledge problem.\nBut it left the judgment problem untouched.\nIntent engineering is the third wave. Not \u0026ldquo;what should the model know\u0026rdquo; but \u0026ldquo;what should the model want?\u0026rdquo; What are its priorities when instructions run out? What does good judgment look like in situations nobody anticipated? This is the layer that turns an agent from a tool into something closer to a colleague.\nEach wave subsumes the one before it. Good intent engineering requires good context engineering, which requires good prompt engineering. But perfect prompts with no intent architecture produce capable agents that still make decisions you\u0026rsquo;d disagree with.\nThe Thing That Isn\u0026rsquo;t in the Handbook Here\u0026rsquo;s the core challenge. The most important knowledge in any organisation is the hardest to write down.\nThe philosopher Michael Polanyi put it simply: \u0026ldquo;We know more than we can tell.\u0026rdquo; A senior engineer\u0026rsquo;s sense for when code is \u0026ldquo;off.\u0026rdquo; A designer\u0026rsquo;s taste. A manager\u0026rsquo;s instinct for when to escalate versus when to let something play out. That\u0026rsquo;s all tacit knowledge. It resists documentation because it operates below conscious articulation.\nTraditional software doesn\u0026rsquo;t need tacit knowledge. It follows explicit rules. But the entire value proposition of an agent over traditional automation is handling novel situations. And novel situations are exactly where tacit knowledge matters most.\nThink about what happens when you hire someone. Day one, you hand them the employee handbook. That covers maybe 30% of how your team actually operates. The rest, the real stuff, they absorb over weeks and months by watching, asking, and making small mistakes that get gently corrected. 
The judgment calls, the unwritten priorities, the \u0026ldquo;how we do things here\u0026rdquo; that nobody thought to document because everyone just knows.\nIntent engineering is the discipline of encoding that invisible 70%.\nHow We Built It I run a multi-agent ecosystem. Research, engineering, game development, infrastructure. Each project has its own agent with a distinct role. The question I kept hitting was: how do I give each agent enough identity and context that they make decisions I\u0026rsquo;d agree with, even when I\u0026rsquo;m not in the room?\nMy answer turned out to be a stack of documents, each operating at a different timescale:\nSOUL.md defines who an agent is. Personality, voice, cognitive patterns, values, relationships. This file changes maybe once or twice a year. It\u0026rsquo;s the near-invariant layer. When I wrote the first SOUL file for my research agent, I described them as \u0026ldquo;the one who asks why before anyone asks how.\u0026rdquo; That single line shaped hundreds of subsequent decisions about how they approach problems.\nGRAVITY.md describes the principal\u0026rsquo;s influence on the system. Not my biography. What my presence does to how the ecosystem operates. My priorities, my patterns, how I think about trade-offs. This changes quarterly, maybe less.\nINTENT.md captures decision heuristics. The rules of thumb that guide judgment when documented instructions run out. Things like \u0026ldquo;simplicity is the cardinal virtue\u0026rdquo; and \u0026ldquo;proceed on HOW, ask on WHAT.\u0026rdquo; These are the tacit norms made explicit. They change every few months as the team learns.\nCLAUDE.md handles operational instructions. What to read on startup, which tools to use, how to format output. This changes weekly.\nThe key insight is the stability spectrum. Identity (SOUL) is near-permanent. Culture (GRAVITY) evolves slowly. Judgment (INTENT) adapts with experience. Operations (CLAUDE) change constantly. 
When you collapse all of this into a single system prompt, you lose the ability to evolve one layer without destabilising the others.\nThe Split That Most People Miss There\u0026rsquo;s a distinction I had to learn the hard way: not all intent should be encoded the same way.\nAdvisory intent is guidance that the model interprets. Personas, cultural norms, priorities, tone. You write it in natural language and trust the model to follow it. Most of the time, it does. But it can be overridden by sufficiently strong competing signals in the context window.\nDeterministic intent is enforcement that the system guarantees. Hooks that fire before a tool runs. Permission gates that block certain actions. Structural constraints like which tools an agent can even see. The model can\u0026rsquo;t override these because they\u0026rsquo;re not suggestions; they\u0026rsquo;re walls.\nMost intent engineering attempts are 100% advisory. That works until it doesn\u0026rsquo;t. The failure mode is subtle: the agent does something reasonable but wrong. It had the guidance, it just weighted something else higher in that moment.\nThe fix isn\u0026rsquo;t to make everything deterministic. That kills the flexibility that makes agents valuable in the first place. The fix is to match the enforcement to the stakes. Cultural norms and communication style? Advisory. They should flex with context. Security boundaries and irreversible actions? Deterministic. No interpretation needed.\nGetting this ratio wrong breaks things in opposite directions. Too deterministic and you\u0026rsquo;ve built a brittle script with extra steps. Too advisory and you\u0026rsquo;ve built a capable agent you can\u0026rsquo;t quite trust.\nTrust, Not Capability Here\u0026rsquo;s the thing I keep coming back to. The capability problem is largely solved. Modern models can reason, write code, analyse data, coordinate across tools. The frontier isn\u0026rsquo;t can the agent do the work. 
It\u0026rsquo;s will the agent make the call you would have made.\nThat\u0026rsquo;s a trust problem. And trust has levels.\nAt the bottom, you verify every action. Human reviews everything. That\u0026rsquo;s where most agent systems sit today. Above that, you supervise critical actions and let the rest flow. Higher still, the agent acts and you audit periodically. At the top, the agent operates as a trusted colleague. You don\u0026rsquo;t check their work because their judgment has been proven over time.\nIntent engineering is how you climb that ladder. Not by making the model smarter, but by giving it enough encoded context about who it is, what it values, and how it should decide, that its natural intelligence produces decisions aligned with yours.\nI\u0026rsquo;m not claiming we\u0026rsquo;ve solved this. We haven\u0026rsquo;t. There\u0026rsquo;s no formal measurement framework for intent alignment yet. Temporal drift is real; encoded intent can go stale as preferences evolve. Cross-agent coherence is hard when each agent\u0026rsquo;s intent is engineered independently.\nBut the direction is clear. The agents in my ecosystem that have well-engineered identity and intent files are qualitatively different to work with. They don\u0026rsquo;t just execute instructions. They make judgment calls that feel right. They escalate the things I\u0026rsquo;d want to know about and handle the rest. They sound like themselves across sessions, not like a generic assistant wearing a name tag.\nThat\u0026rsquo;s not magic. That\u0026rsquo;s engineering. And it starts with asking what you\u0026rsquo;d tell a trusted new hire on their first day. Not the handbook. The real stuff.\nWant the files? 
I\u0026rsquo;ve put a copyable version of this layered stack on GitHub: intent-engineering-starter, MIT licensed.\n","permalink":"https://byrondgdev.com/posts/intent-engineering-giving-agents-identity/","date":"2026-04-06","tags":["agent-systems","intent-engineering","identity","trust"],"categories":["AI Agents"]}]