I Set Out to Fine-Tune a Model and Ended Up Building a Research Wing

I am a naturally curious person. I love learning how something works and then immediately looking for what I can point it at. For someone wired like me, AI is a bit of a dream come true. I can aim it at the internet, the largest pile of knowledge humans have ever assembled, and use it to learn and apply at a pace I could never manage on my own. With caution, of course. Scale cuts both ways.

This is the story of how that curiosity turned into a dedicated research practice, almost by accident.

Back in November 2025, I set myself a goal: learn how to fine-tune models using MLX, Apple’s machine-learning framework. I could have spent months reading papers and tutorials. Instead I thought, what better way to learn this than to do it with AI? So I picked up Claude Code, a coding agent that lives in your terminal and works directly on your files, and set out. I came to call that effort project Finetune.

One thing worked in my favor from the start: I already knew the technology reasonably well. That mattered more than I expected. When you understand the territory, you can delegate and steer an AI with real authority instead of quietly hoping it knows where it is going. Expertise did not become worthless the moment I had an agent. It became the thing that made the agent useful.

The next lesson surprised me: the harness matters as much as the model. Working with something file-system native, an agent that reads and writes real files in a real project rather than trading messages in a chat window, made a bigger difference than I expected. The work persisted and accumulated, instead of vanishing when a chat window closed.

Here is the rhythm that became the bread and butter of the project. Before we built anything, I would send my agent out on a research mission: go understand this problem, or figure out how to reach this goal, and come back with what you find. Then we would apply it. Research, then build. Little by little, mission by mission, we stacked up capabilities, standards, and the infrastructure to support them.

That is also where the memory system was born. I did not want to start from zero every single time, re-explaining everything the agent should already know. And I knew from experience that stuffing it all into one big instructions file, the CLAUDE.md that every Claude Code project carries, was not going to cut it. A growing practice needs real memory: something that holds onto what you learned and lets you build on it, not one file you keep editing by hand.

Then came the turn this whole series is named after. I started noticing that I was sending research missions out on questions that had nothing to do with fine-tuning. I was digging into agent design, tools, ideas from completely different domains. The research had outgrown the project it was born in.

So I did the obvious thing once I could see it. I split the research off into its own dedicated home. The research wing was born: a space whose only job is to produce useful, durable research knowledge, and to prototype ideas there before committing to them and wrapping real engineering standards around the ones that earn it. Build the knowledge first. Engineer the winners later.

If there is one belief sitting underneath all of this, it is old and a little unfashionable, and I am betting hard on it anyway. Knowledge is power. And if knowledge is power, then it is worth real time and real effort to build the infrastructure that lets your agents actually use that knowledge. Not just hoard it. Use it. That is the work that turns a pile of old answers into something you can reach into on demand, as a wiki you can read and a corpus you can query.

That is what the rest of this series is about: the infrastructure that makes knowledge usable. How the research learned to scale past a single agent. How a growing library starts paying you back instead of just getting heavier. How we keep it honest, and how one good investigation can become many things. None of it was planned up front. It got built one research mission at a time, because I kept being curious, and I kept refusing to start from zero.

The next post starts where the real leverage started: what happens when you stop sending one agent at a question and start sending a crowd.