ATHEORY.AI

Introducing Skillex: the sane way to manage agent skills

An open-source CLI and MCP server that treats agent skills as managed infrastructure.

The primary interface between developers and their AI agents is a markdown file dropped in a project root. CLAUDE.md, AGENTS.md, .cursorrules, SKILL.md — the file names change, the vendors jostle for position, but the fundamental mechanism is the same. You write instructions in a text file. The agent reads them. It becomes slightly less useless at your specific project.

This is genuinely powerful. These files transform a general-purpose language model into something that understands your conventions, your architecture, your deployment pipeline. They are the difference between an agent that writes plausible code and one that writes correct code in the style your team expects, using the frameworks you actually use, following the patterns you actually follow.

And yet. If someone had told you five years ago that the cutting edge of AI-assisted development would involve hand-maintaining plain text files and hoping your agent reads them correctly, you would have been disappointed. It feels provisional. It feels like the kind of thing that exists because the infrastructure hasn’t caught up yet.

It hasn’t. That’s what Skillex is for.

A brief history of teaching agents

The timeline is compressed but worth understanding because it explains why we are where we are.

It started with CLAUDE.md. Anthropic introduced it as a way for Claude Code to load project-specific context at the start of every session. One file, loaded into the context window before the first message. Cursor had its own variant with .cursorrules. Windsurf had another. Each vendor created a slightly different mechanism for the same basic idea: let the developer pre-load instructions that shape the agent’s behavior.

In mid-2025, AGENTS.md emerged from a collaboration between Sourcegraph, OpenAI, Google, Cursor, and others. The intent was to standardize the approach — one file format that any agent harness could read. It is now maintained by the Agentic AI Foundation under the Linux Foundation and is supported by Claude Code, Cursor, GitHub Copilot, Gemini CLI, Windsurf, and a growing list of others.

Then came skills. Anthropic published the Agent Skills specification as an open standard in late 2025, and OpenAI adopted the same SKILL.md format for Codex CLI and ChatGPT. Skills moved beyond a single flat file. Each skill is a directory containing a SKILL.md with instructions, optional scripts, reference files, and templates. The agent loads them on demand based on the task at hand.

The ecosystem has grown rapidly. Community libraries with over a thousand skills. Marketplaces appearing. VS Code extensions contributing skills through package.json. The SKILL.md format has become a genuine cross-platform standard.

This is real progress. The industry has converged on a common mechanism for giving agents project-specific behavioral guidance. The content side of the problem — what to tell your agent — is being solved by a growing community.

The delivery side remains broken.

The discovery problem

Here is how skills work in every major agent harness today: the agent scans a set of directories for SKILL.md files, reads their descriptions, decides which ones seem relevant to the current task, and loads them into the context window.

Discovery at runtime. The agent is making judgment calls about what to load every time it operates. For a handful of skills in a small project, this works fine. As the number of skills grows, three things go wrong.

First, the context window fills with irrelevant content. Every skill description that gets loaded costs tokens. Apideck documented a case where MCP tool definitions alone consumed 72% of a team’s context window. Skills have the same problem at a different layer — the agent reads through descriptions and content for skills that have nothing to do with the current task.

Second, the loading is non-deterministic. The agent decides what looks relevant based on its own judgment, which means the same question in the same project can result in different skills being loaded on different runs. Different context means different answers. You cannot reproduce the behavior. You cannot debug it.

Third, the wrong skills get loaded. Private development guidance intended for package maintainers ends up in the context of a developer consuming the package. A migration guide for version 1 surfaces when the developer is on version 3. Repo-wide conventions dilute package-specific guidance. The signal-to-noise ratio degrades with every skill you add.

The cruel irony is that the more effort you invest in writing high-quality skills, the harder it becomes for the agent to use them well. Good content drowns in the noise of content that should not have been loaded at all.

The maintenance problem

Separate from discovery, there is the question of how skills stay current.

A developer writes a skill explaining how to use a library. A colleague copies it into their project. The library releases a breaking change. The copied skill is now wrong, and nobody knows. This pattern is happening across the industry right now, at every scale.

Skills today are detached from the code they describe. They live in project roots, in shared directories, in community repositories. There is no mechanism for a library author to ship skills alongside their code so that when someone installs version 2.3.1, they get the guidance that matches 2.3.1. When the library updates, the skill doesn’t. It sits in someone’s project root and rots.

For package publishers this is particularly frustrating. You know the pitfalls of your API. You know the migration path from v2 to v3. You have written the guide. But you have no way to deliver it. You can put a markdown file in your repository and hope someone finds it. You cannot ship skills with your package.

The result is that every team writes the same guidance from scratch, maintains it independently, and watches it diverge from reality.

The MCP parallel

It is worth considering the Model Context Protocol here, because skills and MCP share a similar trajectory and similar growing pains.

MCP was introduced by Anthropic in late 2024 as a standard for connecting AI agents to external tools and data sources. Adoption was extraordinary — 97 million monthly SDK downloads, over 10,000 registered servers, adopted by OpenAI, Google, and others, donated to the Linux Foundation. It became the connective tissue of the agentic AI ecosystem.

And then the cracks showed. Security researchers documented serious vulnerabilities — prompt injection, tool poisoning, credential theft, supply chain attacks. Enterprise teams discovered that authentication was immature and there was no standard governance model. At ASK 2026, Perplexity’s CTO outlined the operational case against MCP in production, citing context window consumption and reliability concerns.

MCP did not fail. It is not going away. But the experience revealed a pattern that applies directly to skills: rapid ecosystem adoption outpaces the infrastructure needed to make it safe and manageable. The same categories of risk apply — untrusted content getting loaded into agent context, no audit trail for what influenced the agent’s behavior, no mechanism for controlled ingestion of external skills.

Skills are on the same trajectory. The community is growing fast. The infrastructure for managing them is not there yet.

What enterprises are trying

Large organizations see these problems clearly and reach for the same solution: the skills monorepo.

Create an official repository of approved skills, organize them into categories with index files and tables of contents, have teams pull from this central collection. In theory: version control, review processes, a single source of truth.

In practice: you have become a book publisher. Maintaining tables of contents, chapter indices, categorization schemes, cross-references. Every time someone adds a skill, the index files need updating. Every time you reorganize, the entire structure changes. And the agent still has to parse the hierarchy and make judgment calls about which branch to follow. You have made the discovery path more structured. It is still discovery at runtime.

Then the distribution problem surfaces. Teams pull the skills repository into their projects — as a Git submodule, a cloned directory, a periodic manual copy. Now you are coordinating versions across teams. Did someone update a skill last Tuesday and forget to notify downstream consumers? When the platform team changes their deployment skill, does the mobile team get the update?

This is manual dependency management. Versioning, distributing, and synchronizing shared artifacts across projects by hand. This is the exact problem that package managers solved decades ago. npm and pip exist because manually copying shared code between projects doesn’t scale. For skills, teams are back to doing precisely that.

What Skillex does

Skillex is an open-source CLI and MCP server that treats agent skills as managed infrastructure. Two core ideas.

The first: skills should be resolved at build time, not discovered at runtime. When you run skillex refresh, the tool scans your project, resolves your dependencies, discovers which packages export skills, applies scope rules, and builds everything into a SQLite registry. At runtime, the agent doesn’t browse. It queries. skillex query --path src/auth.ts --topic error-handling returns exactly the skills that apply to that file and that topic. One call. The right content. Nothing else in the context window.

The second: skills should travel with the code they describe. A package publisher adds "skillex": true to their package.json and places skill files in skillex/public/ and skillex/private/. When someone installs the package, the skills arrive with it. When they upgrade, the skills update. In a monorepo where different workspaces depend on different versions of the same package, each workspace gets skills matched to its resolved version. The skills are never copied. They are never stale. They are part of the dependency graph.
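A minimal sketch of what a publishing package's manifest might look like. The `"skillex": true` flag and the skillex/public/ and skillex/private/ directories are what the article specifies; the package name, version, and the decision to list the skillex directory in `files` are illustrative assumptions:

```json
{
  "name": "@acme/http-client",
  "version": "2.3.1",
  "skillex": true,
  "files": ["dist", "skillex"]
}
```

With skill files placed under skillex/public/ (for consumers) and skillex/private/ (for maintainers), anyone who installs @acme/http-client@2.3.1 receives the guidance written for exactly that version.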

How it works

You initialize Skillex in your project and it detects your structure, creating a configuration file that defines scope rules — which skills apply to which paths, and where dependency boundaries exist.

Each skill is a Markdown file with YAML frontmatter for topics and tags. Skills are organized as public, for people consuming the package, or private, for people working on the package source. The distinction is enforced by directory convention: skillex/public/ and skillex/private/. The linker decides which to serve based on the relationship between the working path and the package.
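A hypothetical skill file, to make the shape concrete. The `topics` and `tags` frontmatter fields are named in the article; everything else here (the file name, the body text) is invented for illustration:

```markdown
---
topics: [error-handling]
tags: [http, retries]
---

# Handling errors in @acme/http-client

Retry idempotent requests on 5xx responses with exponential backoff.
Never retry on 4xx responses; surface them to the caller instead.
```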

When you run skillex refresh, the tool walks your dependency graph, finds every package that declares Skillex exports, reads their skill files, parses the frontmatter, and indexes everything into a SQLite database tagged with scope assignments, topics, visibility, source package, and version. It is deterministic — the same project state always produces the same index.
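The refresh-and-query pipeline can be modeled in miniature. This is an illustrative sketch, not Skillex's actual implementation: a naive frontmatter parser feeding an in-memory SQLite index, queried by topic. All file names, field names, and contents are hypothetical; the point is that the same inputs always produce the same index, and a query returns only matching content.

```python
import sqlite3

# Two hypothetical skill files, as (path, content) pairs.
SKILLS = [
    ("skillex/public/errors.md",
     "---\ntopics: error-handling\ntags: http\n---\nRetry on 5xx, never on 4xx."),
    ("skillex/public/auth.md",
     "---\ntopics: auth\ntags: oauth\n---\nTokens expire after one hour."),
]

def parse_frontmatter(text):
    """Split '---'-delimited frontmatter from the body (naive, not real YAML)."""
    _, header, body = text.split("---", 2)
    meta = dict(line.split(": ", 1) for line in header.strip().splitlines())
    return meta, body.strip()

def build_index(skills):
    """Deterministic 'refresh': the same project state yields the same index."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE skills (path TEXT, topic TEXT, body TEXT)")
    for path, text in sorted(skills):  # sorted input order for determinism
        meta, body = parse_frontmatter(text)
        db.execute("INSERT INTO skills VALUES (?, ?, ?)",
                   (path, meta["topics"], body))
    return db

def query(db, topic):
    """Runtime 'query': one lookup, only matching skills enter the context."""
    return db.execute(
        "SELECT path, body FROM skills WHERE topic = ? ORDER BY path",
        (topic,)).fetchall()

db = build_index(SKILLS)
print(query(db, "error-handling"))
```

The real tool additionally tags rows with scope, visibility, source package, and version, but the deterministic build-then-query shape is the same.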

The agent interacts with Skillex through one of three interfaces. If the agent harness supports MCP, Skillex runs as an MCP server that exposes skills as resources and the query engine as a typed tool. If it doesn’t, the agent can call the CLI directly — skillex query outputs structured JSON to stdout. As a fallback, Skillex auto-generates a section in AGENTS.md that teaches the agent what is available and how to ask for it.
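For the CLI path, the article specifies structured JSON on stdout but not its schema. A sketch of what a query result might look like, with field names chosen to mirror the attributes the index tracks (source package, version, visibility, topic) — the exact shape is an assumption:

```json
{
  "query": { "path": "src/auth.ts", "topic": "error-handling" },
  "skills": [
    {
      "source": "@acme/http-client@2.3.1",
      "visibility": "public",
      "topic": "error-handling",
      "content": "Retry on 5xx responses; never retry on 4xx."
    }
  ]
}
```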

Every skill can have a co-located test file — structured Markdown with prompts and success criteria that the agent uses to self-evaluate whether the skill produces correct guidance. The CLI validates that test files exist and are well-formed, and both the structural validation and the self-evaluation can run in CI.
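The article doesn't show the test-file format, but given that it pairs prompts with success criteria, one could imagine something like the following. Every detail here is hypothetical:

```markdown
## Prompt
Write a fetch wrapper for @acme/http-client that handles a 503 response.

## Success criteria
- Retries with exponential backoff on 5xx responses
- Does not retry on 4xx responses
```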

External skills come through a controlled pipeline. skillex get fetches a skill from a URL, runs it through a safety review, converts it to the Skillex format, and vendors it into the project with provenance tracking. Nothing enters the system without inspection.

The result

The discovery problem is solved by moving resolution to build time. The agent never scans directories hoping to find what is relevant. It queries an index built from the dependency graph and the project’s scope rules. The answer is the same every time.

The maintenance problem is solved by making skills part of the package. When the code changes, the skills change with it. When a consumer upgrades, they get the updated guidance automatically. No copying. No drift. No stale files sitting in project roots.

The control problem is solved by making the entire pipeline visible and auditable. Every skill in the system has a known source, a known scope, and a known version. External skills go through a review pipeline before they enter. The registry is a deterministic build artifact you can diff, inspect, and verify.

The teams that can deliver the right context to their agents at the right time will build better software faster. That is the opportunity Skillex exists to unlock.

Skillex is open source. Find the repository, documentation, and quickstart guide at [link]. Install with npm install -g skillex and run skillex init in your project to get started.