Every session, an agent starts fresh.
It has broad knowledge of languages, frameworks, and patterns accumulated from training. It has zero institutional knowledge of the system it’s working on. It doesn’t remember what went wrong last time. It doesn’t know about the migration strategy your team spent two weeks designing, or the cleanup bug that prompted the custom hook, or the error handling pattern that exists for reasons the code itself doesn’t explain.
It’s the permanent new hire. Capable, fast, and amnesiac.
The entire developer experience stack was built for a different kind of developer. One who accumulates context across sessions, absorbs conventions through code review and osmosis, and carries institutional knowledge forward into every decision. Agents don’t do any of that. So the tools we built don’t serve them.
The question is what to build instead.
The flat file era
The first answer was obvious and correct: write it down. If the agent doesn’t know how your team works, tell it. Put it in a file. Load it at the start of every session.
CLAUDE.md, AGENTS.md, .cursorrules. A text file in the project root containing everything the agent needs to know. The database team’s conventions alongside the frontend team’s conventions alongside the deployment team’s conventions. It works the way a single page of onboarding notes works: better than nothing, worse than what you actually need.
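A minimal sketch of what such a file tends to look like in practice (the team names and rules here are invented for illustration):

```markdown
# CLAUDE.md — everything, for everyone, every session

## Database team
- All migrations must be reversible; never write raw DDL in application code.

## Frontend team
- Components live in src/components/, one component per file.

## Deployment team
- Never touch infra/prod/ without a ticket reference in the commit message.
```

Every rule loads on every session, whether the task touches migrations, components, or infrastructure at all.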
The problem isn’t that this instinct was wrong. It was right. The problem is that it doesn’t scale. A single flat file has no structure. It can’t distinguish between guidance that applies everywhere and guidance that applies only in specific contexts. It loads everything into the context window every time, whether relevant or not. And it lives in your project root, disconnected from the libraries and dependencies your code actually uses.
The industry recognized this quickly and moved on to something more sophisticated.
The skills movement
Skills are the most significant attempt to close the gap right now. The insight behind them is genuine: different tasks need different guidance. A skill about error handling is irrelevant when the agent is writing a migration. A skill about testing patterns is irrelevant when the agent is updating a dependency. Decompose the monolith into discrete units of guidance, load them selectively based on the task at hand, and you get something more precise than a flat file.
Anthropic published an agent skills specification as an open standard. OpenAI adopted the same format. The ecosystem converged fast. Community libraries with thousands of skills appeared. Marketplaces followed. This is real progress. But the delivery infrastructure underneath it hasn’t kept up with the content being created on top of it.
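In the published format, a skill is a directory containing a SKILL.md file whose YAML frontmatter carries the metadata agents scan when deciding what to load. The skill content below is invented, but the shape is roughly:

```markdown
---
name: error-handling
description: Conventions for raising, catching, and logging errors in this
  codebase. Use when writing or reviewing error-handling code.
---

# Error handling

Return errors across module boundaries rather than letting exceptions
propagate; log once, at the boundary, with the request ID attached.
```

The frontmatter is what gets read during selection; the body loads only if the skill is chosen.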
The current model works like this: the agent scans directories, reads skill descriptions, decides what looks relevant, and loads its choices into the context window. For a handful of skills in a small project, this is fine. As the skill library grows it breaks down in predictable ways.
The context window fills with irrelevant content: every skill description costs tokens whether the skill gets used or not. Loading is non-deterministic: the same question can result in different skills being loaded on different runs, producing different behavior. The wrong skills get loaded: a migration guide for v1 surfaces when the developer is on v3, or private guidance intended for package maintainers ends up in the context of someone consuming the package. The cruel irony is that the more effort you invest in writing high-quality skills, the harder it becomes for the agent to select the right ones. Good content drowns in the noise of content that shouldn't have been loaded at all.
Skills are also detached from the code they describe. There’s no mechanism for a library author to ship skills alongside their code so that when someone installs version 2.3.1 they get the guidance that matches 2.3.1. When the library updates, the skill doesn’t. It sits in someone’s project root and rots. Teams see this and try to solve it through organizational process: skills monorepos, shared collections, internal catalogs. But they’re really just building manual dependency management for guidance, versioning and synchronizing shared artifacts across projects by hand. This is the exact problem that package managers solved decades ago. npm and pip exist because manually copying shared code between projects doesn’t scale. For skills, teams are back to doing precisely that.
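If skills did travel through the dependency graph, a library author might declare them in the package manifest so installers could resolve guidance the same way they resolve code. Something like this hypothetical package.json field (no package manager supports this today; the field name is invented):

```json
{
  "name": "acme-orm",
  "version": "2.3.1",
  "skills": {
    "migrations": "./skills/migrations/SKILL.md",
    "query-patterns": "./skills/query-patterns/SKILL.md"
  }
}
```

Install 2.3.1, get the 2.3.1 guidance; upgrade to 3.0, and the stale migration guide disappears with the old code it described.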
The skills movement has the right instinct. The delivery infrastructure is broken.
The actual problem
The diagnosis the industry hasn’t quite landed on: skills aren’t a content problem. They’re a software engineering problem.
The content side is being solved. A growing community is writing skills, sharing them, improving them. That work is real and valuable. But the delivery side (how skills are discovered, scoped, versioned, distributed, and kept current) is being handled with the same informal, manual, convention-based approach that we abandoned for code decades ago.
We treat skills the way we used to treat shared libraries before package managers existed. You copy the file. You hope it’s the right version. You find out it isn’t when something breaks.
Software engineering solved this. Not perfectly, not without its own complexity, but the core insight is sound: if something needs to be defined, versioned, tested, and distributed, you build infrastructure for that. You don’t rely on convention and hope.
Skills need to be defined formally enough that tooling can reason about them. They need to be versioned alongside the code they describe. They need to be testable, so you can verify that a skill actually produces the behavior it claims to. They need to travel through the dependency graph, arriving when a library is installed, updating when it’s upgraded, scoped to the code that consumes it. And they need to be resolved deterministically, so the agent working on an auth flow gets the auth skills every time, not a probabilistic selection from everything that looked vaguely relevant.
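To make "resolved deterministically" concrete, here is a minimal sketch in Python. All names here are invented (this is not Skillex's API, nor any existing tool's): each skill declares the package it describes, a version range, and an audience, and resolution is a pure function of the installed dependency versions, so the same project state always yields the same skills in the same order.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Skill:
    name: str
    package: str                    # the dependency this skill describes
    min_version: tuple[int, int]    # inclusive lower bound, e.g. (2, 0)
    max_version: tuple[int, int]    # exclusive upper bound, e.g. (3, 0)
    audience: str                   # "consumer" or "maintainer"

def resolve(installed: dict[str, tuple[int, int]], skills: list[Skill],
            audience: str = "consumer") -> list[Skill]:
    """Deterministically select skills: a skill applies only if its package
    is installed, the installed version falls in its range, and its audience
    matches. Sorting makes the result order stable across runs."""
    selected = [
        s for s in skills
        if s.audience == audience
        and s.package in installed
        and s.min_version <= installed[s.package] < s.max_version
    ]
    return sorted(selected, key=lambda s: (s.package, s.name))

# A project on v3 never sees the v1 migration guide, and maintainer-only
# guidance never leaks into a consumer's context.
skills = [
    Skill("migrations-v1", "acme-orm", (1, 0), (2, 0), "consumer"),
    Skill("migrations-v3", "acme-orm", (3, 0), (4, 0), "consumer"),
    Skill("release-process", "acme-orm", (0, 0), (99, 0), "maintainer"),
]
print([s.name for s in resolve({"acme-orm": (3, 1)}, skills)])
# ['migrations-v3']
```

The point is not this particular scheme but the property: selection is a lookup over declared facts, not a judgment call the agent makes over a pile of descriptions.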
None of this is a new idea. It’s just an idea that nobody has applied to skills yet.
What comes next
That’s the problem Skillex is built to solve. Not a better way to write skills, but a better way to define, deliver, and verify them. Treating skills as a first-class software artifact rather than a text file that hopes to be discovered.
The details of how that works are worth a post of their own. But the starting point is the diagnosis: the skills movement identified the right problem and the wrong solution. The content is fine. The infrastructure underneath it is the thing that needs to be rebuilt.
If agents are the new developers, skills are the new conventions. And conventions, it turns out, deserve the same engineering rigor we give everything else.