Advanced Claude Code: Sub-Agents, MCP, and Cost Optimization
Author: CodeGateway team · Tested in May 2026
TL;DR: Getting Claude Code installed is easy. Getting Claude Code to operate at team-level productivity is the dividing line. This guide skips installation and configuration and goes straight to the advanced engineering practices: multi-agent orchestration, MCP server integration, advanced Hooks, prompt cache and tier markup strategy, and observability. Read it through and Claude Code stops being "an AI assistant" and starts being "an AI engineering team."
Table of Contents
- Multi-agent orchestration: from one agent to a team topology
- MCP server integration: extending tools and data sources
- Advanced Hooks: intercept, mutate, compose
- Workflow automation: scripts, CI, scheduled jobs
- Prompt cache and cost optimization
- Tier markup and model routing strategy
- Observability: usage, spend, errors
- Team practices: standardization and key governance
- Real scenarios: four high-leverage engineering patterns
- FAQ
- Related reading
Multi-agent orchestration: from one agent to a team topology
Why spawn sub-agents at all?
A single agent on a long task hits two intrinsic ceilings: context windows are finite (even at a million tokens, they aren't free), and one session struggles to wear multiple hats — architect, test author, doc writer — at once.
Sub-agents (/agents) bring three structural wins:
- Context isolation. Each sub-agent only sees what it needs.
- Parallelism. Independent work runs concurrently, cutting wall time to roughly 1/N.
- Role specialization. Each sub-agent can have its own Skills, model, and Hooks.
Four topologies that actually show up
1) Single-layer fan-out
Main
├── sub-A: grep + enumerate change targets
├── sub-B: apply backend edits
├── sub-C: apply frontend edits
└── sub-D: run all tests, collect failures
2) Two-stage pipeline
Main → sub-Planner (design) → sub-Builder (apply) → sub-Verifier (verify)
3) GAN-style adversarial loop
Main
├── sub-Generator: produce implementation
└── sub-Critic: critique, suggest improvements
loop N rounds until Critic score crosses threshold
4) Fan-out with fan-in
Main
├── sub-FrontEnd
├── sub-Backend
├── sub-DataPipeline
└── Main writes the release notes once everyone reports back

Four ways to invoke
# Inside an interactive session
/agents
# Tell the main agent in plain English
"Spawn one sub-agent to fix ESLint errors and another to compute test
coverage, run them in parallel"
# Inside a Skill template that bakes in scheduling
/skill multi-frontend
# Expose sub-agents as MCP tools so other systems can call them
(see next section)

Choosing models for sub-agents
Sub-agents get spawned often. Default them to Haiku. Keep Sonnet on the main coordinator. Reserve Opus for one-shot deep reasoning (architectural reviews, tricky migration plans), then stop.
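One way to pin those defaults is a project-level sub-agent file. A sketch following the `.claude/agents/` frontmatter format described in Anthropic's sub-agent docs — the agent name, tool list, and body here are illustrative:

```markdown
# .claude/agents/lint-fixer.md
---
name: lint-fixer
description: Fixes ESLint errors in the files it is pointed at. Use after large edits.
tools: Read, Edit, Bash
model: haiku
---
Fix lint errors only. Do not refactor logic or rename symbols.
```

With `model: haiku` baked in, every spawn of this role stays cheap without the coordinator having to remember the routing rule.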
MCP server integration: extending tools and data sources
What MCP is
Model Context Protocol (MCP) is Anthropic's open extension protocol for tool integrations. An MCP server exposes tools, resources, and prompts; Claude Code calls them inline during a session. With MCP, Claude Code stops being limited to "read/write files plus run shell" and gains:
- Database connections (PostgreSQL, ClickHouse, internal BI)
- Third-party SaaS (GitHub, Linear, Jira, Slack)
- Internal knowledge bases (company wiki, product docs, customer lists)
- External LLMs (specialist models, domain models, context augmentation)
Installing a public MCP server
GitHub MCP example:
npm install -g @modelcontextprotocol/server-github
# .claude/mcp.json
{
"mcpServers": {
"github": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-github"],
"env": {
"GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_xxx"
}
}
}
}

After Claude Code restarts, you can ask: "read the review on PR #123, list high-priority issues by milestone."
Writing a custom MCP server
A minimum-viable MCP server in Node + TypeScript SDK is roughly 30 lines:
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  ListToolsRequestSchema,
  CallToolRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";

const server = new Server(
  { name: "internal-billing", version: "0.1.0" },
  { capabilities: { tools: {} } }
);

// Advertise the tool (setRequestHandler takes the request schema, not a string)
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: "get_customer_usage",
      description: "Query a customer's last-30-day token usage",
      inputSchema: {
        type: "object",
        properties: { customerId: { type: "string" } },
        required: ["customerId"],
      },
    },
  ],
}));

// Dispatch tool calls
server.setRequestHandler(CallToolRequestSchema, async (req) => {
  const { name, arguments: args } = req.params;
  if (name === "get_customer_usage") {
    const usage = await queryUsage(args.customerId); // your internal BI query (not shown)
    return { content: [{ type: "text", text: JSON.stringify(usage) }] };
  }
  throw new Error(`Unknown tool: ${name}`);
});

await server.connect(new StdioServerTransport());

Register it in .claude/mcp.json and your customer-support session can read your internal BI directly.
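Registration mirrors the GitHub entry earlier; the build path here is a placeholder for wherever your compiled server lives:

```json
{
  "mcpServers": {
    "internal-billing": {
      "command": "node",
      "args": ["./mcp-servers/internal-billing/dist/index.js"]
    }
  }
}
```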
Where MCP fits
| Scenario | Pattern |
|---|---|
| Team wiki search | Wrap as MCP, main agent retrieves on demand |
| Customer attribution | MCP + DB connection, query as needed |
| Cross-product release sync | MCP orchestrates GitHub / Slack / Linear |
| Internal LLM routing | MCP wraps specialist models, route by task |
Advanced Hooks: intercept, mutate, compose
PreToolUse: mutate arguments
Hooks aren't only for blocking. A PreToolUse hook reads the tool input as JSON on stdin; it can veto the call by exiting with code 2, or emit JSON on stdout to pass it along:
# Reject Edit calls that try to write outside the project directory
node -e "
let d='';process.stdin.on('data',c=>d+=c);
process.stdin.on('end',()=>{
const i=JSON.parse(d);
if(i.tool_input?.file_path && !i.tool_input.file_path.startsWith(process.cwd())){
console.error('[Hook] cross-directory write rejected');
process.exit(2);
}
console.log(d);
});
"

PostToolUse: chain quality gates
{
"hooks": {
"PostToolUse": [
{ "matcher": "Write|Edit", "command": "ruff check --fix \"$FILE_PATH\"" },
{ "matcher": "Write|Edit", "command": "ruff format \"$FILE_PATH\"" },
{ "matcher": "Write|Edit", "command": "mypy --quiet \"$FILE_PATH\" || true" }
]
}
}

Order: lint-fix → format → type-check, matching the array above. A failure feeds back to Claude Code, which can attempt a fix.
Stop hook: end-of-session safety net
{
"hooks": {
"Stop": [
{ "command": "pnpm test --silent" },
{ "command": "pnpm build" },
{ "command": "git status --short" }
]
}
}

A standing checkpoint before you walk away.
Skill + Hook composition
Skill defines the rule, Hook enforces. Example: "all Python files must have type hints":
- The Skill states the rule in plain language.
- A PostToolUse Hook runs mypy and surfaces any failure to Claude Code, which then retries.
Two layers turn a suggestion into an invariant.
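The enforcement half can reuse the PostToolUse pattern shown above — a minimal sketch:

```json
{
  "hooks": {
    "PostToolUse": [
      { "matcher": "Write|Edit", "command": "mypy --strict --quiet \"$FILE_PATH\"" }
    ]
  }
}
```

No `|| true` here: the non-zero exit is exactly the signal that makes Claude Code retry.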
Workflow automation: scripts, CI, scheduled jobs
Non-interactive: --print
claude --print "Read docs/PRD.md, write task breakdown to docs/tasks.md"

Drops you out of the interactive shell and exits with the result. Perfect for scripts, Makefiles, CI.
CI integration (GitHub Actions example)
name: AI Code Review
on:
pull_request:
branches: [main]
jobs:
ai-review:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
with: { fetch-depth: 0 }
- uses: actions/setup-node@v4
with: { node-version: 20 }
- run: npm install -g @anthropic-ai/claude-code
- name: AI review
env:
ANTHROPIC_API_KEY: ${{ secrets.CODEGATEWAY_KEY }}
ANTHROPIC_BASE_URL: https://api.codegateway.dev
run: |
claude --print "Review the current git diff with CRITICAL/HIGH/MEDIUM/LOW labels. Output markdown to review.md"
- name: Comment on PR
run: gh pr comment ${{ github.event.pull_request.number }} -F review.md

(The gh step also needs GH_TOKEN: ${{ github.token }} in its env and pull-requests: write permission.)

For CI specifically:
- Issue a dedicated key (ci-<repo>).
- Set RPM and monthly spend caps.
- Default to Haiku — most CI tasks don't need Sonnet; save it for local.
Scheduled jobs
cron, GitHub Actions schedule, Linear automations all work — just call claude --print like any CLI. Common jobs:
- Monday morning, attribute last week's failed tests, file tickets.
- Nightly, scan dependabot proposals, evaluate upgrade risk.
- Monthly, draft a changelog from merged PRs.
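As a crontab sketch — schedules, paths, and prompts are all illustrative, and the API key is assumed to be exported in the cron environment:

```shell
# Monday 09:00 — attribute last week's failed tests, draft tickets
0 9 * * 1  cd /srv/main-repo && claude --print "Attribute last week's failed CI tests and draft tickets in docs/triage.md" >> /var/log/claude-weekly.log 2>&1

# Nightly 02:00 — evaluate dependabot upgrade risk
0 2 * * *  cd /srv/main-repo && claude --print "Scan open dependabot PRs and rate upgrade risk" >> /var/log/claude-nightly.log 2>&1
```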
Prompt cache and cost optimization
How prompt cache works
The Claude API supports explicit cache markers on cacheable prompt blocks (system prompts, long context documents). The first request writes the cache at 1.25x normal input cost; subsequent requests with the same prefix pay about 10% of normal input cost for the cached span. See the official Prompt caching docs for the full ruleset.
How Claude Code uses it
The Claude Code system prompt + long Skills + large file context are exactly the kinds of payloads worth caching. With prompt caching enabled at the gateway layer (CodeGateway or Anthropic direct), long sessions running with the same system + Skills get input cost down to 10–30% of full price. Real savings depend on session length.
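The arithmetic behind that range is easy to sketch. Assuming Anthropic's published multipliers (cache write ≈ 1.25x the base input rate, cache read ≈ 0.1x) and an illustrative session shape:

```typescript
// Estimate blended input cost for a multi-turn session with prompt caching.
// Multipliers: cache write = 1.25x base input rate (first turn only),
// cache read = 0.1x (every later turn). Token counts are illustrative.
function sessionInputCost(
  baseRatePerMTok: number, // e.g. 3 ($/MTok, Sonnet input)
  cachedTokens: number,    // stable prefix: system prompt + Skills + file context
  freshTokens: number,     // variable input added each turn
  turns: number
): number {
  const M = 1_000_000;
  const write = (cachedTokens / M) * baseRatePerMTok * 1.25;              // first turn
  const reads = (turns - 1) * (cachedTokens / M) * baseRatePerMTok * 0.1; // later turns
  const fresh = turns * (freshTokens / M) * baseRatePerMTok;              // never cached
  return write + reads + fresh;
}

// 20-turn session, 50K-token cached prefix, 2K fresh tokens per turn:
const withCache = sessionInputCost(3, 50_000, 2_000, 20); // ≈ $0.59
const without = 20 * ((50_000 + 2_000) / 1_000_000) * 3;  // = $3.12
```

Here the cached session lands around 19% of full price — squarely in the 10–30% band above.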
Practical advice
- Long Skills documents live in .claude/skills/ — high cache hit rate.
- Don't bounce between working directories during one session — it invalidates the cache.
- Running claude --print in batch jobs? Put shared prompt material first, variables last, to maximize prefix hits.
Model × workload cost cheatsheet
| Workload | Model | Why |
|---|---|---|
| Architecture / DB migration | Opus | Single-shot deep reasoning |
| Daily refactor / generation | Sonnet | Balance + strong coding |
| Sub-agent / batch lint | Haiku | Cheap + sufficient |
| Tool-shaped tasks / OCR | Haiku | Cheap + fast |
| Long chain reasoning + cache | Sonnet + cache | Amortizes to near-zero |

Tier markup and model routing strategy
CodeGateway tier markup recap
| 90-day cumulative spend | Markup |
|---|---|
| $0 – $10 | 1.5x |
| $10 – $50 | 1.4x |
| $50 – $200 | 1.3x |
| $200 – $500 | 1.2x |
| $500+ | 1.2x (floor, stable long-term) |
New users start at 1.5x with a $2 starter credit ≈ 440K Sonnet 4.6 input tokens. Full breakdown in the tier markup explainer.
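That ≈ 440K figure is just list price times markup. A quick check, assuming Sonnet input at $3/MTok:

```typescript
// Tokens purchasable from a credit at a given markup over the base $/MTok rate.
function tokensForCredit(creditUsd: number, baseRatePerMTok: number, markup: number): number {
  return (creditUsd / (baseRatePerMTok * markup)) * 1_000_000;
}

// $2 starter credit at the 1.5x entry tier:
const tokens = Math.round(tokensForCredit(2, 3, 1.5)); // 444,444 ≈ the ~440K above
```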
Driving the markup down
Markup is per-account, 90-day rolling. Sharing one account across many people doesn't move the needle (and creates other problems — see Team Practices). Two real paths:
- Solo developer using consistently. $50–$100 monthly spend stabilizes at 1.2x within 3–6 months.
- Team account with multiple keys. All keys roll up to the same 90-day window — faster path to floor.
Model routing as a Skill
Bake the decision into a Skill so Claude Code follows the rule automatically:
# .claude/skills/cost-aware-llm/SKILL.md
---
name: cost-aware-llm-pipeline
description: Default main agent to Sonnet, sub-agents to Haiku. Switch to Opus only for multi-service architecture, DB migrations, or hard debugging.
---

Once attached, Claude Code routes models per the rules instead of defaulting to "always Opus" and bleeding cost.
Observability: usage, spend, errors
Dashboard signals
CodeGateway → Overview / Logs. Worth watching:
- Total Tokens card: today / 7d / 30d totals.
- By model: is Opus over 10%? That deserves a closer look.
- By key: any single key spiking?
- Error rate: 4xx surges often correlate with config changes.
- First-byte latency: spikes hint at link issues.
Log filtering
Logs supports time range (Today / 7d / 30d / 90d / All) plus filters by key, model, status code. Tracking down "yesterday's CI run that failed" takes about 30 seconds.
Self-hosted metrics
Want CodeGateway numbers in your own Grafana / Datadog?
- Metrics API isn't public yet (on the roadmap).
- Workaround: at the end of a claude --print run in CI, push the final usage to your metrics platform.
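A sketch of that push, assuming `claude --print --output-format json` (supported by recent CLI versions). The field names below are illustrative — check what your installed version actually emits:

```typescript
// Convert the JSON summary from a `claude --print --output-format json` run
// into Prometheus-style exposition lines for a pushgateway / StatsD bridge.
// Field names (total_cost_usd, usage.*) are illustrative, not guaranteed.
interface PrintResult {
  total_cost_usd?: number;
  usage?: { input_tokens?: number; output_tokens?: number };
}

function toMetricLines(raw: string, job: string): string[] {
  const r: PrintResult = JSON.parse(raw);
  const out: string[] = [];
  if (r.total_cost_usd !== undefined)
    out.push(`claude_job_cost_usd{job="${job}"} ${r.total_cost_usd}`);
  if (r.usage?.input_tokens !== undefined)
    out.push(`claude_job_input_tokens{job="${job}"} ${r.usage.input_tokens}`);
  if (r.usage?.output_tokens !== undefined)
    out.push(`claude_job_output_tokens{job="${job}"} ${r.usage.output_tokens}`);
  return out; // POST these to your pushgateway at the end of the CI job
}
```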
Team practices: standardization and key governance
Commit your config
.claude/settings.json, .claude/skills/, .claude/mcp.json, .claude/hooks/ all belong in the repo.
Never commit: API keys, env files with tokens.
.gitignore:
.claude/secrets.json
.claude/.env
.claude/cache/

Key governance
| Use case | Naming | Limits | Owner |
|---|---|---|---|
| Personal dev machine | | None | Owner |
| Main repo CI | ci-<repo> | RPM 60 | Platform |
| Docs / demos | | RPM 30, $20/mo cap | Platform |
| Customer demo (temporary) | | Set expiry | Sales |
Annotate each key. Rotate immediately on a leak or when an owner leaves.
Pre-PR checklist as a Skill
- All tests pass
- Zero lint warnings
- Build succeeds
- Diff has no secrets / debug statements
- Reviewed for backwards-incompatible API changes

Bind it to a pre-pr-check Skill. Every team member's pre-PR pass runs the same checks.
Real scenarios: four high-leverage engineering patterns
Pattern 1: long-document summarization and extraction
A 30+ page legal contract, PRD, or research report. Main agent reads, then spawns sub-agents:
- sub-Extractor: pull schema-shaped fields (amounts, dates, breach clauses).
- sub-Risker: rate by risk dimension.
- sub-Drafter: produce internal review doc.
Main agent finishes with a unified summary. Prompt cache hits aggressively because the long document sits in a cache block — multi-turn questions all settle at ~10% cost.
Pattern 2: cross-service refactor
userId → accountId across 6 repos and 30+ files.
- sub-Scout: grep + enumerate.
- sub-Editor-A/B/C: apply edits per repo.
- sub-Tester: run tests, collect failures.
- Main: coordinate failures, write PR description.
Wall time: ~30 minutes parallel vs 2–3 hours serial.
Pattern 3: incident root-cause analysis
Production alert. Push logs, screenshots, and relevant code into the main agent on Opus.
- Main: form initial hypotheses.
- sub-Validator: grep, read code, write a repro script.
- sub-Fixer: produce a PR based on validated cause.
- Main: write the incident review document.
Pattern 4: cross-language port
Go → Rust rewrite.
- sub-Translator: file-by-file port.
- sub-Tester: run generated tests, verify behavior parity.
- sub-Reviewer: rewrite unidiomatic Rust into idiomatic Rust.
- Main: handle cadence and commit boundaries.
FAQ
Q: Can sub-agents talk to each other directly?
A: No. The architecture is tree-shaped — sub-agents only talk to their parent. To "share state," go through the filesystem or an MCP-exposed intermediate store.
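A filesystem-mediated scratchpad is the simplest version of that workaround — a sketch, with the file layout illustrative:

```typescript
import { existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

// Each sub-agent appends its findings under its own key; the parent (or a
// later sub-agent) reads the merged file to "share state" across the tree.
function appendFinding(file: string, agent: string, finding: string): void {
  mkdirSync(dirname(file), { recursive: true });
  const all: Record<string, string[]> = existsSync(file)
    ? JSON.parse(readFileSync(file, "utf8"))
    : {};
  (all[agent] ??= []).push(finding);
  writeFileSync(file, JSON.stringify(all, null, 2));
}
```

For example, sub-Scout writes its grep results, sub-Tester appends failures, and Main reads the whole file when fan-in starts.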
Q: Do MCP servers slow the session down?
A: Depends on the implementation. Lightweight ones (GitHub API, local file search) are tens of milliseconds. Heavy ones (run an ML model, query a huge DB) extend tool-call wait time. Add timeouts to slow tools and surface that via a PreToolUse Hook.
Q: Is prompt caching enabled on CodeGateway?
A: Yes. Any client going through the gateway gets transparent cache header propagation and the matching billing discount. Discounts follow Anthropic's official rules.
Q: Is sharing a CI key safe across the team?
A: Per-repo with strict RPM and monthly spend caps, marginally. Across many teams and repos, it's an antipattern. One key per repo with failure / anomaly alerts on the key dimension.
Q: Is Opus really 5x more expensive than Sonnet?
A: Roughly, per Anthropic's pricing page. But for tasks where Sonnet needs three iterations plus rework (multi-service architecture, complex DB migrations), one Opus pass can be cheaper overall. Decision rule: is the task ≥ 70% reasoning? If yes, Opus.
Q: What happens when a Hook fails?
A: PreToolUse exit code 2 blocks the call; Claude Code sees the error and adjusts. PostToolUse failures don't block by default but the error feeds into context. Stop Hook failures don't block exit.
Q: Multiple MCP servers at once?
A: Yes. List them all in .claude/mcp.json. Claude Code spawns them in parallel and namespaces tools to avoid name collisions.
Q: What if a long task gets interrupted halfway?
A: Claude Code preserves session state by default. Re-launching claude resumes from the last checkpoint. Pair with a git commit per sub-task and resumption is essentially lossless. If you hit the 10-minute total-response ceiling at the gateway layer, see the connection timeout guide.
Related reading
- The complete Claude Code configuration guide — beginner to intermediate
- Claude Code connection timeout troubleshooting
- Claude Code 5-minute setup
- Top-up and billing guide
- Tier markup explainer
- Anthropic — Prompt caching
- Anthropic — Sub-agents
- Anthropic — Model Context Protocol
- Cloudflare — Workers runtime APIs
Going advanced isn't piling on features — it's matching workloads to patterns. Get the six clusters above (multi-agent / MCP / advanced Hooks / workflow automation / cache + tier / observability) into muscle memory and Claude Code crosses the line from "AI assistant" to "AI engineering team."
