Implementing a Skills Loading Tool for AI Agents

How to give your AI agent access to dozens of skills without blowing up your context window

If you're building an AI agent with specialized skills, you've probably hit this tradeoff: do you dump all instructions into the system prompt, or make the model ask for what it needs?

Here's a lazy-loading pattern that balances both.

The Problem

Say your assistant has 10+ skills: detailed instruction sets for things like data analysis, report generation, and workflow automation. Each skill is a hefty markdown file. Together, they can easily eat 10,000+ tokens.

Stuffing all of them into the system prompt means the model pays that cost on every request, even when most skills are irrelevant. But if the model doesn't know what's available, it can't ask for what it needs.

The Solution: Index + Load

Expose a lightweight index in the system prompt, and give the model a tool to load the full content on demand. Think of it like a table of contents vs. reading the whole book.

1. Define skills as markdown with frontmatter

Each skill is a markdown file. The frontmatter holds metadata, the body holds the actual instructions:

---
name: Data Analysis
description: Analyze datasets with statistical methods and visualizations
triggers:
  - analyze data
  - statistics
---

# Data Analysis Instructions

When analyzing data, follow these steps...
(detailed instructions, templates, examples, etc.)
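The snippets below assume a `ParsedSkill` shape with `metadata` and `content`. Here's a minimal sketch of what the parsing step could look like. The regex-based frontmatter handling is illustrative only; a real implementation would use a YAML library such as gray-matter.

```typescript
interface SkillMetadata {
  name: string;
  description: string;
  triggers: string[];
}

interface ParsedSkill {
  metadata: SkillMetadata;
  content: string;
}

// Sketch: split the "---" delimited frontmatter header from the body,
// then pull out the few fields the index and tool actually use.
function parseSkill(raw: string): ParsedSkill {
  const match = raw.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/);
  if (!match) throw new Error("Missing frontmatter block");

  const [, header, body] = match;
  const metadata: SkillMetadata = { name: "", description: "", triggers: [] };

  for (const line of header.split("\n")) {
    if (line.startsWith("name:")) metadata.name = line.slice(5).trim();
    else if (line.startsWith("description:")) metadata.description = line.slice(12).trim();
    else if (line.trim().startsWith("- ")) metadata.triggers.push(line.trim().slice(2));
  }

  return { metadata, content: body.trim() };
}
```

Only the metadata fields need to be structured; the body stays an opaque string until the model asks for it.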

2. Build a lightweight index for the system prompt

Instead of injecting full skill content, build a summary table — just names and descriptions:

function buildSkillIndex(skills: ParsedSkill[]): string {
  const rows = skills.map(
    (s) => `| ${s.metadata.name} | ${s.metadata.description} |`
  );

  return [
    "## Available Skills",
    "Use the `loadSkill` tool to load a skill when needed.",
    "",
    "| Skill | Description |",
    "|-------|-------------|",
    ...rows,
  ].join("\n");
}

This adds roughly 15–20 tokens per skill, so even a dozen skills cost only a few hundred tokens of prompt, versus thousands for the full content.

3. Create the loadSkill tool

This is the key piece. The model calls this tool to fetch full instructions on demand. The .max(3) cap prevents it from loading too many at once:

// assumes: import { z } from "zod";
function createLoadSkillTool(skills: ParsedSkill[]) {
  return {
    description: "Load specialized skills when you need detailed guidance.",
    parameters: z.object({
      names: z.array(z.string()).min(1).max(3),
    }),
    execute: async ({ names }: { names: string[] }) => {
      const loaded: { name: string; content: string }[] = [];

      for (const name of names) {
        const skill = skills.find(
          (s) => s.metadata.name.toLowerCase() === name.toLowerCase()
        );
        if (skill) loaded.push({ name: skill.metadata.name, content: skill.content });
      }

      return { success: loaded.length > 0, skills: loaded };
    },
  };
}

4. Wire it up

Register the tool conditionally, only if skills exist. The index goes in the system prompt, the content stays behind the tool:

const { skills, context } = categorizeFiles(files);

const systemPrompt = `You are a helpful assistant.\n${buildSkillIndex(skills)}`;

const tools = {
  ...baseTools,
  ...(skills.length > 0 ? { loadSkill: createLoadSkillTool(skills) } : {}),
};
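The article leaves `categorizeFiles` undefined; here's one hypothetical sketch of it, assuming markdown files with a frontmatter block count as skills and everything else passes through as plain context. Your own criteria (directory layout, file naming) will differ.

```typescript
interface InputFile {
  path: string;
  content: string;
}

interface ParsedSkill {
  metadata: { name: string; description: string };
  content: string;
}

// Hypothetical helper: ".md" files with a frontmatter block become skills;
// anything else is forwarded untouched as context.
function categorizeFiles(files: InputFile[]): {
  skills: ParsedSkill[];
  context: InputFile[];
} {
  const skills: ParsedSkill[] = [];
  const context: InputFile[] = [];

  for (const file of files) {
    const match = file.content.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/);
    if (file.path.endsWith(".md") && match) {
      const [, header, body] = match;
      // Naive field lookup; a real parser would use a YAML library.
      const get = (key: string) =>
        header
          .split("\n")
          .find((l) => l.startsWith(key + ":"))
          ?.slice(key.length + 1)
          .trim() ?? "";
      skills.push({
        metadata: { name: get("name"), description: get("description") },
        content: body.trim(),
      });
    } else {
      context.push(file);
    }
  }

  return { skills, context };
}
```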

The Tradeoff

| Factor | All in Context | Index + Load Tool |
|--------|----------------|-------------------|
| Latency | Immediate, one turn | Extra round-trip per skill load |
| Token cost | Pay for everything, every time | Pay only for what's used |
| Reliability | Model always has instructions | Model might forget to load a skill |
| Scalability | Degrades as skills grow | Stays flat |
Use direct context when you have a handful of short skills, or when every request needs them.

Use the load tool when you have many large skills and most requests only touch one or two. In our case, the average request uses 1 skill out of 10+ — so the savings are significant.
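A back-of-the-envelope calculation makes the savings concrete. The numbers here are assumptions for illustration: 10 skills of roughly 1,000 tokens each, an index row of about 20 tokens per skill, and 1 skill loaded per request.

```typescript
// Assumed figures, not measurements.
const numSkills = 10;
const tokensPerSkill = 1_000;
const indexTokensPerSkill = 20;
const skillsLoadedPerRequest = 1;

// Everything in the system prompt: pay the full cost on every request.
const allInContext = numSkills * tokensPerSkill; // 10,000 tokens

// Index + load: small index always present, full content only when loaded.
const indexPlusLoad =
  numSkills * indexTokensPerSkill + skillsLoadedPerRequest * tokensPerSkill; // 1,200 tokens

console.log({ allInContext, indexPlusLoad });
```

Under these assumptions the lazy-loading pattern spends roughly an eighth of the tokens per request, and the gap widens as skills are added, since only the index term grows.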

One thing to watch out for

The model can sometimes skip loading a skill even when it's relevant. A nudge in the system prompt helps ("Load a skill when the user's request matches its description"), but it's not bulletproof. If a skill is critical to every interaction, just put it in the context directly.