TL;DR Skills are the cheap scalable primitive — 30–50 tokens until invoked versus 50k+ for a five-server MCP setup [4]. The
descriptionfield inSKILL.mdis the routing layer, not documentation [5]. For headless CI useclaude -p --bare; the Agent SDK brings the same loop into Python/TypeScript [8] [9]. Budget ~$150–250/dev/month at enterprise scale and set workspace spend limits before rollout [2].
Extension taxonomy
Claude Code ⭐ 130k [17] has four distinct extension primitives with different loading costs and scopes [1] [2]:
| Primitive | Context cost at startup | How it loads | Best for |
|---|---|---|---|
| Skills | ~100 tokens (name + description only) | Body loads on demand when relevant | Procedural knowledge, reusable workflows |
| MCP | Up to 18k tokens/server (deferred in Jan 2026) | Full schema unless Tool Search active | External system connectivity |
| Agents | Proportional to spawn prompt | Spawned on request; own context window | Isolated subtasks, parallel work |
| Hooks | Near zero (runs outside context window) | Shell commands at lifecycle events | Pre/post-tool validation, audit logging |
The operative framing: MCP is connectivity, Skills are procedural knowledge [4]. When you find yourself re-explaining the same workflow across sessions, that’s a Skill. When Claude needs to reach a system it can’t access independently, that’s MCP.
Context budget reality
MCP token overhead
MCP injects full tool schema — name, description, all parameter definitions — into every message [3]. Measured startup costs per server:
| MCP Server | Tools | Startup token cost |
|---|---|---|
| SQLite (minimal) | 6 | ~385 |
| Gmail | 7 | ~2,640 |
| Playwright | 22 | ~3,442 |
| GitHub MCP | — | 8k–12,000 |
| Full SQLite suite | — | ~13,400 |
| mcp-omnisearch | — | ~14,100 |
| Jira | — | ~17,000 |
The most expensive single tool in the Gmail server (gmail_create_draft) costs 820 tokens alone; the cheapest (browser_close) costs 59 [3]. A five-server developer setup reaches 50k–66k tokens before the first user prompt [15] [3].
Tool Search deferral (Jan 2026)
MCP Tool Search auto-enables when tool definitions exceed 10% of the context window. Only tool names enter context; full schemas load on demand via search [2]. Users report up to 95% reduction in startup token cost [15]. Set ENABLE_TOOL_SEARCH=auto:5 to lower the activation threshold to 5% for more aggressive deferral.
In one measured session: 5,900 tokens recovered from MCP tool deferral + 7,300 from system tool deferral = 13,200 tokens saved [3].
Skills stay cheap at scale
Skills load only ~100 tokens at startup (name + description). Body content enters context only when Claude deems the skill relevant. Teams routinely run 20–50 Skills simultaneously with negligible overhead [4]. Keep SKILL.md under 500 lines; overflow content lives in linked reference files [5].
Skill authoring craft
The description is the routing layer
Claude reads only name and description when deciding whether to load a skill. This makes the description the most operationally significant token in your skillset [5]:
# Good: specific triggers; what AND when
description: >
Extracts text and tables from PDF files, fills forms, merges documents.
Use when working with PDF files or when the user mentions PDFs, forms,
or document extraction.
# Bad: too vague to route correctly
description: Helps with documents
Rules enforced by the spec [5]:
- Always third person (“Processes Excel files…”, not “I can help…”)
- Include both what (capability) and when (concrete trigger phrases, file types)
- Max 64-char name (lowercase, hyphens only); max 1,024-char description; no XML tags
If a skill isn’t triggering, the root cause is almost always a vague description. Verify registration with /plugin [4].
Invocation mode
Add an invocation key to frontmatter to control who fires the skill [1]:
auto(default): Claude decides when the skill is relevantuser: skill only runs when explicitly invoked via/skill-name
Long reference skills or rarely-needed playbooks should use user to prevent spurious loads.
Progressive disclosure structure
my-skill/
├── SKILL.md ← loaded on trigger (≤500 lines)
├── reference.md ← read only when Claude follows the link
├── examples.md ← read only when needed
└── scripts/
└── validate.py ← executed, not loaded — only output consumes tokens
SKILL.md acts as a table of contents. Scripts execute without loading source; only their output costs tokens. Keep all reference links one level deep from SKILL.md — nested chains (A → B → C) cause partial reads [5].
Degrees of freedom
Match instruction specificity to task fragility [5]:
| Specificity | When to use | Example |
|---|---|---|
| High freedom | Multiple valid approaches; context drives choice | Code review, analysis |
| Medium freedom | Preferred pattern exists; some variation OK | Report generation |
| Low freedom | Fragile sequence; consistency critical | Database migrations, exact scripts |
Team deployment patterns
| Scenario | Skill encodes | MCP provides |
|---|---|---|
| Code review | House style, guardrails | GitHub (diffs, comments) |
| Database analysis | Safety constraints, schema context | Postgres |
| Incident response | Regression identification, PR draft flow | Sentry + GitHub |
| Support triage | Classification logic | Slack + Linear |
MCP beyond tools/call
Most builders only use the tool primitive. The spec defines six primitives across server- and client-side [6]:
Server-side (server exposes to client)
Tools — actions with explicit user approval per call. The one everyone knows.
Resources — read-only data identified by URIs and MIME types (travel://activities/{city}/{category}). Pulled by the client; never trigger operations. Treat them like structured files the server makes queryable.
Prompts — version-controlled instruction templates with typed variables. Servers define them; clients fill variables and build workflows. Enables centralized prompt governance: update once, propagate everywhere.
Client-side (client exposes to server)
These three unlock human-in-the-loop patterns that tool primitives alone can’t produce [6] [7]:
Sampling — server requests an LLM completion from the client’s model without needing its own API key. Users must explicitly approve the prompt before execution. Primary use: intent routing before tool dispatch — a 25-tool customer support server using Haiku to classify user intent adds ~200–800ms but removes 500 tokens of routing logic from the main agent prompt [7].
Server → sampling/create → Client (user approves) → LLM → Client → Server
Skip sampling for voice pipelines requiring <300ms response.
Roots — client declares filesystem URI boundaries the server may access during handshake. Prevents path guessing; enforces security perimeters when agents process customer-uploaded files or local knowledge bases [7]. Listen for RootsListChangedNotification when workspaces switch.
Elicitation — server pauses execution and requests schema-validated structured input from the user via the client’s UI (added in spec 2025-06-18) [6]. Replaces fragile multi-turn loops: define an enum schema, receive typed JSON back.
{ "action": { "enum": ["accept", "decline", "cancel"] } }
⚠ Check clientCapabilities?.elicitation before calling and fall back gracefully when the connected client doesn’t support it [7].
Human-in-the-loop pattern matrix
| Pattern | Use case | Latency cost |
|---|---|---|
| Elicitation | User confirmations, structured input | <100ms |
| Sampling | Intent routing, classification | 200–800ms |
| Roots | File scoping, security boundary | None |
| Hook system | Ops supervisor approval workflows | Async |
Headless operation & Agent SDK
claude -p --bare
--bare skips auto-discovery of hooks, skills, plugins, MCP servers, and CLAUDE.md [8]. This is the recommended CI mode: identical results on any machine regardless of local developer configuration. It will become the default -p behavior in a future release.
# Pipe data in, text out
cat build-error.txt | claude --bare -p "explain the root cause" > output.txt
# Schema-enforced JSON output
claude --bare -p "Extract function names from auth.ts" \
--output-format json \
--json-schema '{"type":"object","properties":{"functions":{"type":"array","items":{"type":"string"}}},"required":["functions"]}'
# Session continuity across invocations
session_id=$(claude -p "Start review" --output-format json | jq -r '.session_id')
claude -p "Continue with database layer" --resume "$session_id"
stdin is capped at 10MB; for larger inputs pass a file path in the prompt instead of piping [8].
Agent SDK
The Agent SDK brings the same loop — tools, context, hooks, subagents, skills, MCP — into Python and TypeScript [9]. Repos: claude-agent-sdk-python ⭐ 7.2k [10] · claude-agent-sdk-typescript ⭐ 1.5k [11]
from claude_agent_sdk import query, ClaudeAgentOptions
async for message in query(
prompt="Find and fix the bug in auth.py",
options=ClaudeAgentOptions(allowed_tools=["Read", "Edit", "Bash"]),
):
print(message)
import { query } from "@anthropic-ai/claude-agent-sdk";
for await (const message of query({
prompt: "Find and fix the bug in auth.ts",
options: { allowedTools: ["Read", "Edit", "Bash"] }
})) {
if ("result" in message) console.log(message.result);
}
SDK vs Managed Agents vs CLI [9]:
| Dimension | CLI (claude -p) |
Agent SDK | Managed Agents |
|---|---|---|---|
| Interface | Shell | Python / TypeScript library | REST API |
| Runs in | Your machine | Your process | Anthropic-managed sandbox |
| Session state | JSONL on filesystem | JSONL on filesystem | Anthropic-hosted event log |
| Custom tools | Via MCP config | In-process callbacks | You execute, return results |
| Best for | One-off scripts, local dev | Production automation, CI/CD | Long-running async sessions |
June 15 2026 billing change: Agent SDK and claude -p usage migrates from subscription quota to a separate monthly Agent SDK credit [18]. Plan for this now if you bill agent usage against team quotas.
Hooks in the SDK
SDK hooks use callback functions rather than shell commands. PostToolUse for audit logging, PreToolUse for validation or output filtering [9]:
const logFileChange: HookCallback = async (input) => {
const path = (input as any).tool_input?.file_path ?? "unknown";
await appendFile("audit.log", `${new Date().toISOString()}: ${path}\n`);
return {};
};
options: {
permissionMode: "acceptEdits",
hooks: { PostToolUse: [{ matcher: "Edit|Write", hooks: [logFileChange] }] }
}
A PreToolUse hook can filter a 10,000-line test log down to failure lines before it ever enters context — turning tens of thousands of tokens into hundreds [2].
Operating at scale
Cost baselines
Enterprise deployments average ~$13/dev/active day, ~$150–250/dev/month; 90% of users stay below $30/active day [2]. Model choice and context size drive most variance — Sonnet for teammates and subagents, Opus for architectural decisions.
Agent teams (parallel Claude instances) use ~7× more tokens than standard sessions. Keep spawn prompts focused; clean up teams when work completes [2].
Rate limit allocation
Concurrent users are a fraction of total, so per-user TPM decreases as org size grows [2]:
| Team size | TPM / user | RPM / user |
|---|---|---|
| 1–5 | 200k–300k | 5–7 |
| 5–20 | 100k–150k | 2.5–3.5 |
| 20–50 | 50k–75k | 1.25–1.75 |
| 50–100 | 25k–35k | 0.62–0.87 |
| 100–500 | 15k–20k | 0.37–0.47 |
| 500+ | 10k–15k | 0.25–0.35 |
⚠ Live training events (large groups online simultaneously) need temporary higher allocations — the table assumes normal daily concurrency patterns.
Spend controls
- API plans: workspace spend limits in the Claude Console; set a workspace rate limit to cap Claude Code’s share and protect other production workloads
- Pro/Max:
/usage-creditsmonthly cap via CLI - Bedrock/Vertex/Foundry: LiteLLM for per-key cost tracking (unaffiliated, not audited by Anthropic) [2]
/usagecommand shows 24h/7d token attribution per MCP server, skill, and subagent
Token reduction without removing capability
- Move CLAUDE.md procedures into Skills — CLAUDE.md loads unconditionally; Skills load on demand [2]. Keep CLAUDE.md under 200 lines.
- Prefer CLI tools (
gh,aws,gcloud) over MCP where available — no per-tool schema overhead [2]. - Filter verbose tool outputs in PreToolUse hooks before they reach context.
- Delegate log processing and test runs to subagents — verbose output stays isolated; only summary returns to main conversation.
- Always use
--barein CI to prevent local developer hooks and MCPs from bleeding into pipeline runs [8].
MCP production practices
The MCP servers registry ⭐ 87k [14] lists community implementations. For production deployments [13] [16]:
- Single domain per server — one clear purpose per MCP; no monolithic servers
- Strict schemas — typed inputs/outputs with documented side effects; schema drift is the primary cause of production tool failures
- Circuit breakers — graceful degradation when downstream APIs are unavailable
- Enterprise auth — OAuth 2.1 with re-authentication per tool call (not session-level token); route through an MCP Gateway for centralized policy enforcement and per-team audit trails [16]
- Targets [13]: >1,000 req/s per instance · P95 <100ms · error rate <0.1% · >99.9% uptime
The 2026 MCP roadmap priorities: stateless horizontal scaling via HTTP transport evolution and .well-known capability discovery; Tasks primitive for agent-to-agent communication; governance via Working Groups [12].
Debugging reference
| Symptom | First check | Fix |
|---|---|---|
| Skill never triggers automatically | description vague; missing “Use when…” phrase |
Rewrite with specific trigger phrases; verify /plugin |
| Skill not in slash-command list | Not in watched skill directories | Check ~/.claude/skills/, .claude/skills/ |
| MCP tool “not found” error | Short tool name instead of Server:tool_name |
Prefix with MCP server name |
| 50k+ startup token burn | Many MCPs without Tool Search deferral | Enable Tool Search; disable unused servers via /mcp |
| CI builds inconsistent vs local | Local hooks/MCPs bleeding into claude -p |
Add --bare flag |
| Skill content partially loaded | Deep reference nesting (A → B → C) |
Flatten all references to one level from SKILL.md |
| Agent SDK auth failure in CI | OAuth blocked in bare mode; API key missing | Set ANTHROPIC_API_KEY; pass via --settings |
| Elicitation request silently ignored | Client doesn’t support elicitation | Check clientCapabilities?.elicitation first |