The discourse around AI and software development right now is loud and mostly unhelpful. Gurus with courses. Headlines about job death. Benchmarks that somehow keep improving while the experience on the ground stays stubbornly inconsistent. I’ve been all-in on AI-first development since the Copilot beta, trying to make it work on greenfield projects and on brownfield systems handling billions of dollars a year in transactions (don’t worry: that work is done very carefully). The tools are genuinely impressive. They’re also, in ways that matter, not yet reliable enough to trust.
That gap between impressive and trustworthy is what ATHEORY.AI is about.
The thing nobody is building
If you look closely at most of what’s shipping right now, a pattern emerges. There’s a layer of polish, some orchestration, vocabulary that makes it sound like something new is happening. Peel it back and it’s often prompts calling prompts, wrapped in just enough structure to hold together. Which works. People are building real things this way. But it also means we’ve invented a kind of soft scripting language where the primitives aren’t functions or types, but instructions and phrasing. Control flow exists, but it’s negotiated rather than enforced. You can route, chain, retry, reason your way through steps, but you’re still relying on the idea that the system will behave roughly the same way it did last time.
And most of the time, it does. Until it doesn’t.
That’s the part that’s easy to gloss over. Because from the outside, it looks structured. But once you’re a few layers in, the basic questions get hard to answer without hand-waving. Why did this work? Why did it fail? What changed? Not in theory, in a way you can actually point to. We’re mostly composing uncertainty into slightly more elaborate shapes and calling that a system.
What’s actually missing
This isn’t a prompting problem. It’s not a model problem either, not exactly. It’s a representation problem. Software development has always had a gap between intent and implementation. We’ve papered over it with documents nobody reads, diagrams that go stale the week they’re drawn, and the accumulated intuition of whoever happens to be in the room. That worked, after a fashion, because humans are good at filling gaps with context. AI is not. It doesn’t carry your intent forward. It doesn’t know what the system is supposed to be, only what it can see right now.
So we compensate. We retry, rephrase, add more context, hope it finds the right file, hope it doesn’t drift, hope the thing that worked yesterday still works today. The slot machine pays out just often enough to keep you pulling the lever.
The gap that actually needs closing isn’t between today’s models and tomorrow’s. It’s between how we currently represent software, loosely, informally, in artifacts that degrade, and what AI needs to actually reason about it: something explicit, structured, and durable.
A theory
My claim is that intent, structure, capability, and verification are separable concerns that deserve first-class representations. That’s not how the industry is building right now. But I think it’s true, and I think it points toward a fundamentally different development experience than what anyone is currently shipping.
Chip designers solved a version of this problem decades ago. VHDL and Verilog exist because the complexity of hardware design demanded a formal intermediate representation, something that captured design intent independently of the implementation, that could be simulated, verified, and synthesized. Software never built that layer. We went straight from intent to code, and we’ve been paying the cost ever since in bugs, drift, and systems that only the original author fully understands.
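To make the analogy concrete in software terms, here is a toy sketch (every name in it is hypothetical, not an existing tool or format): design intent captured as a declarative, checkable artifact, separate from the code that implements it, loosely the way an HDL design can be simulated against its specification before synthesis.

```typescript
// Intent as plain data: input/output examples plus an invariant.
// Nothing here says *how* the behavior is achieved.
interface Spec<In, Out> {
  name: string;
  examples: Array<{ input: In; expected: Out }>;
  invariant: (input: In, output: Out) => boolean;
}

const clampSpec: Spec<number, number> = {
  name: "clamp to [0, 100]",
  examples: [
    { input: -5, expected: 0 },
    { input: 42, expected: 42 },
    { input: 150, expected: 100 },
  ],
  invariant: (_input, output) => output >= 0 && output <= 100,
};

// One possible implementation. The spec does not care which one.
const clamp = (x: number) => Math.min(100, Math.max(0, x));

// "Simulation": mechanically check an implementation against the intent.
function verify<In, Out>(
  spec: Spec<In, Out>,
  impl: (input: In) => Out
): boolean {
  return spec.examples.every(({ input, expected }) => {
    const actual = impl(input);
    return actual === expected && spec.invariant(input, actual);
  });
}

console.log(verify(clampSpec, clamp)); // true
```

The point is not the clamp function; it is that the intent survives as a durable artifact you can re-run against any future implementation, human- or AI-written.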
AI makes this both more urgent and more tractable. More urgent because you can’t negotiate intent with a model the way you can with a colleague. More tractable because we finally have tools capable of working with richer representations than code alone.
That’s the thread ATHEORY.AI is pulling on. Four projects, one underlying question.

Skillex asks: what does a capability actually look like when it’s formally defined, versioned, and testable, rather than an instruction file scattered through a repo hoping to be discovered?

The Context Engine asks: what does it mean to give a system genuine knowledge of a codebase, semantic, structural, queryable, rather than just a window full of text?

The prompt workspace asks: how do you know a prompt works, not just once but consistently, across model changes, over time?

Spec IDE asks: what if design intent were a first-class artifact that fed everything else, your skills, your context, your verification, instead of evaporating the moment someone opens their editor?

Each one is a probe into the same claim. Together they’re sketching an answer.
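As a hedged illustration of the first question (every identifier below is hypothetical, not Skillex’s actual format), a capability might look like a first-class, versioned, testable artifact rather than prose instructions:

```typescript
// A capability as data: identified, versioned, with declared inputs
// and machine-checkable acceptance checks. All names are invented
// for illustration.
interface Capability {
  id: string;
  version: string; // semver, so changes are explicit events
  intent: string; // what this capability is for
  inputs: Record<string, "string" | "number">; // declared, not implied
  checks: Array<{ description: string; passes: (output: string) => boolean }>;
}

const summarizeDiff: Capability = {
  id: "summarize-diff",
  version: "1.2.0",
  intent: "Produce a one-paragraph summary of a code diff.",
  inputs: { diff: "string" },
  checks: [
    { description: "non-empty", passes: (out) => out.trim().length > 0 },
    { description: "single paragraph", passes: (out) => !out.includes("\n\n") },
  ],
};

// Because the capability is data, "does it still work?" becomes a
// loop you can run after every model change, not a vibe.
function runChecks(cap: Capability, output: string): string[] {
  return cap.checks.filter((c) => !c.passes(output)).map((c) => c.description);
}

console.log(runChecks(summarizeDiff, "Renames the billing module.")); // []
```

The shape matters more than the details: once a capability has an identity, a version, and checks, it can be discovered, diffed, and regression-tested like any other artifact.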
What this is
ATHEORY.AI isn’t a startup. There’s no funding round, no enterprise tier, nothing to buy. It’s an independent research lab in the oldest sense: one person, a workbench, and a theory worth testing. Some friends are starting to contribute. Everything is being built in the open.
I don’t have this figured out. I’m not pretending to. But I’ve followed this thread long enough to believe that the problem is real, the current answers are incomplete, and that building toward something better is worth the effort.
That’s the theory. These are the first experiments.