Tags: Claude Code · AI Programming · CodeGateway

Advanced Claude Code: Sub-Agents, MCP, and Cost Optimization

May 7, 2026

Author: CodeGateway team · Tested on May 2026

TL;DR: Getting Claude Code installed is easy. Getting Claude Code to operate at team-level productivity is the dividing line. This guide skips installation and configuration and goes straight to the advanced engineering practices: multi-agent orchestration, MCP server integration, advanced Hooks, prompt cache and tier markup strategy, and observability. Read it through and Claude Code stops being "an AI assistant" and starts being "an AI engineering team."

Table of Contents

  1. Multi-agent orchestration: from one agent to a team topology
  2. MCP server integration: extending tools and data sources
  3. Advanced Hooks: intercept, mutate, compose
  4. Workflow automation: scripts, CI, scheduled jobs
  5. Prompt cache and cost optimization
  6. Tier markup and model routing strategy
  7. Observability: usage, spend, errors
  8. Team practices: standardization and key governance
  9. Real scenarios: four high-leverage engineering patterns
  10. FAQ
  11. Related reading

Multi-agent orchestration: from one agent to a team topology

Why spawn sub-agents at all?

A single agent on a long task hits two intrinsic ceilings: context windows are finite (even at a million tokens, they aren't free), and one session struggles to wear multiple hats — architect, test author, doc writer — at once.

Sub-agents (/agents) bring three structural wins:

  • Context isolation. Each sub-agent only sees what it needs.
  • Parallelism. Independent work runs concurrently, cutting wall time toward 1/N of serial.
  • Role specialization. Each sub-agent can have its own Skills, model, and Hooks.

Four topologies that actually show up

1) Single-layer fan-out
Main
├── sub-A: grep + enumerate change targets
├── sub-B: apply backend edits
├── sub-C: apply frontend edits
└── sub-D: run all tests, collect failures

2) Two-stage pipeline
Main → sub-Planner (design) → sub-Builder (apply) → sub-Verifier (verify)

3) GAN-style adversarial loop
Main
├── sub-Generator: produce implementation
└── sub-Critic: critique, suggest improvements
loop N rounds until Critic score crosses threshold

4) Fan-out with fan-in
Main
├── sub-FrontEnd
├── sub-Backend
├── sub-DataPipeline
└── Main writes the release notes once everyone reports back
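Outside an interactive session, the same fan-out/fan-in shape can be approximated with parallel non-interactive jobs. A minimal sketch — `agent` here is a stand-in for `claude --print "<prompt>"` so the snippet is self-contained:

```shell
# Fan-out: independent sub-tasks run as parallel background jobs.
# `agent` stands in for `claude --print "<prompt>"`; swap it in for real runs.
agent() { echo "[$1] done"; }

agent lint    > out-lint.txt &
agent backend > out-backend.txt &
wait                              # fan-in: block until every job reports back
cat out-lint.txt out-backend.txt  # the coordinator aggregates the results
```

Each job writes to its own file, so the fan-in step is a plain concatenation; with the real CLI you would collect markdown reports instead.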

Four ways to invoke

text
# Inside an interactive session
/agents

# Tell the main agent in plain English
"Spawn one sub-agent to fix ESLint errors and another to compute test
coverage, run them in parallel"

# Inside a Skill template that bakes in scheduling
/skill multi-frontend

# Expose sub-agents as MCP tools so other systems can call them
(see next section)

Choosing models for sub-agents

Sub-agents get spawned often. Default them to Haiku. Keep Sonnet on the main coordinator. Reserve Opus for one-shot deep reasoning (architectural reviews, tricky migration plans), then stop.
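One way to make the default stick is in the sub-agent definition itself. Sub-agents defined under `.claude/agents/` take a `model` field in their frontmatter — the agent below is a hypothetical example; verify the field set against the current docs:

```markdown
---
name: lint-fixer
description: Fix lint errors in files handed over by the main agent
model: haiku
---

Fix ESLint/ruff errors only. Do not refactor unrelated code.
```

Saved as `.claude/agents/lint-fixer.md`, every spawn of this role runs on Haiku without the coordinator having to say so.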

MCP server integration: extending tools and data sources

What MCP is

Model Context Protocol (MCP) is Anthropic's open extension protocol for tool integrations. An MCP server exposes tools, resources, and prompts; Claude Code calls them inline during a session. With MCP, Claude Code stops being limited to "read/write files plus run shell" and gains:

  • Database connections (PostgreSQL, ClickHouse, internal BI)
  • Third-party SaaS (GitHub, Linear, Jira, Slack)
  • Internal knowledge bases (company wiki, product docs, customer lists)
  • External LLMs (specialist models, domain models, context augmentation)

Installing a public MCP server

GitHub MCP example:

bash
npm install -g @modelcontextprotocol/server-github

Then, in .claude/mcp.json:

json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_xxx"
      }
    }
  }
}

After Claude Code restarts you can ask "read the review on PR #123, list high-priority issues by milestone."

Writing a custom MCP server

A minimum-viable MCP server in Node + TypeScript SDK is roughly 30 lines:

ts
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  ListToolsRequestSchema,
  CallToolRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";

const server = new Server(
  { name: "internal-billing", version: "0.1.0" },
  { capabilities: { tools: {} } }
);

// Advertise the tool
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: "get_customer_usage",
      description: "Query a customer's last-30-day token usage",
      inputSchema: {
        type: "object",
        properties: { customerId: { type: "string" } },
        required: ["customerId"],
      },
    },
  ],
}));

// Dispatch calls; queryUsage is your own BI-layer lookup
server.setRequestHandler(CallToolRequestSchema, async (req) => {
  const { name, arguments: args } = req.params;
  if (name === "get_customer_usage") {
    const usage = await queryUsage(args.customerId);
    return { content: [{ type: "text", text: JSON.stringify(usage) }] };
  }
  throw new Error(`Unknown tool: ${name}`);
});

await server.connect(new StdioServerTransport());

Register it in .claude/mcp.json and your customer-support session can read your internal BI directly.
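Registration looks the same as for any public server — assuming the compiled server lives at `./mcp/internal-billing.js` (path illustrative):

```json
{
  "mcpServers": {
    "internal-billing": {
      "command": "node",
      "args": ["./mcp/internal-billing.js"]
    }
  }
}
```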

Where MCP fits

Scenario                     Pattern
================================================================
Team wiki search             Wrap as MCP; main agent retrieves on demand
Customer attribution         MCP + DB connection; query as needed
Cross-product release sync   MCP orchestrates GitHub / Slack / Linear
Internal LLM routing         MCP wraps specialist models; route by task

Advanced Hooks: intercept, mutate, compose

PreToolUse: inspect, mutate, or reject arguments

Hooks aren't only for blocking. A PreToolUse hook reads the tool input from stdin and can echo it back (mutated or not) to stdout, or exit non-zero to reject the call:

bash
# Reject Edit calls that try to write outside the project directory
node -e "
  let d = '';
  process.stdin.on('data', c => d += c);
  process.stdin.on('end', () => {
    const i = JSON.parse(d);
    if (i.tool_input?.file_path && !i.tool_input.file_path.startsWith(process.cwd())) {
      console.error('[Hook] cross-directory write rejected');
      process.exit(2);   // exit code 2 blocks the tool call
    }
    console.log(d);      // pass the (possibly mutated) input through
  });
"

PostToolUse: chain quality gates

json
{
  "hooks": {
    "PostToolUse": [
      { "matcher": "Write|Edit", "command": "ruff check --fix \"$FILE_PATH\"" },
      { "matcher": "Write|Edit", "command": "ruff format \"$FILE_PATH\"" },
      { "matcher": "Write|Edit", "command": "mypy --quiet \"$FILE_PATH\" || true" }
    ]
  }
}

Order: lint autofix → format → type-check. A failure feeds back to Claude Code, which can attempt a fix.

Stop hook: end-of-session safety net

json
{
  "hooks": {
    "Stop": [
      { "command": "pnpm test --silent" },
      { "command": "pnpm build" },
      { "command": "git status --short" }
    ]
  }
}

A standing checkpoint before you walk away.

Skill + Hook composition

Skill defines the rule, Hook enforces. Example: "all Python files must have type hints":

  • Skill says so in plain language.
  • PostToolUse Hook runs mypy and surfaces the failure to Claude Code, which then retries.

Two layers turn a suggestion into an invariant.

Workflow automation: scripts, CI, scheduled jobs

Non-interactive: --print

bash
claude --print "Read docs/PRD.md, write task breakdown to docs/tasks.md"

Runs non-interactively: Claude Code executes the prompt, prints the result, and exits. Perfect for scripts, Makefiles, CI.

CI integration (GitHub Actions example)

yaml
name: AI Code Review
on:
  pull_request:
    branches: [main]

jobs:
  ai-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm install -g @anthropic-ai/claude-code
      - name: AI review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.CODEGATEWAY_KEY }}
          ANTHROPIC_BASE_URL: https://api.codegateway.dev
        run: |
          claude --print "Review the current git diff with CRITICAL/HIGH/MEDIUM/LOW labels. Output markdown to review.md"
      - name: Comment on PR
        env:
          GH_TOKEN: ${{ github.token }}
        run: gh pr comment ${{ github.event.pull_request.number }} -F review.md

For CI specifically:

  • Issue a dedicated key (ci-<repo>).
  • Set RPM and monthly spend caps.
  • Default to Haiku — most CI tasks don't need Sonnet, save it for local.

Scheduled jobs

cron, GitHub Actions schedule, Linear automations all work — just call claude --print like any CLI. Common jobs:

  • Monday morning, attribute last week's failed tests, file tickets.
  • Nightly, scan dependabot proposals, evaluate upgrade risk.
  • Monthly, draft a changelog from merged PRs.
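As crontab entries, the first two jobs above might look like this (paths and prompts illustrative):

```text
# crontab -e
# Monday 09:00 — attribute last week's failed tests, file tickets
0 9 * * 1  cd /srv/app && claude --print "Attribute last week's failed tests and file tickets" >> /var/log/claude-weekly.log 2>&1
# Nightly 02:00 — evaluate dependabot upgrade risk
0 2 * * *  cd /srv/app && claude --print "Scan open dependabot PRs and rate upgrade risk" >> /var/log/claude-nightly.log 2>&1
```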

Prompt cache and cost optimization

How prompt cache works

Claude API supports explicit cache markers on cacheable prompt blocks (system prompts, long context documents). The first request writes to cache (slightly above normal input cost), and subsequent requests with the same prefix get input token cost dropped to ~10%. See the official Prompt caching docs for the full ruleset.

How Claude Code uses it

The Claude Code system prompt + long Skills + large file context are exactly the kinds of payloads worth caching. With prompt caching enabled at the gateway layer (CodeGateway or Anthropic direct), long sessions running with the same system + Skills get input cost down to 10–30% of full price. Real savings depend on session length.

Practical advice

  • Long Skills documents live in .claude/skills/ — high cache hit rate.
  • Don't bounce between working directories during one session — it invalidates cache.
  • Running claude --print in batch jobs? Put shared prompt material first, variables last, to maximize prefix hits.
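For the batch case, the ordering rule is mechanical: build every prompt as shared prefix first, per-item variable last. A self-contained sketch (the shared text and file name are illustrative; pipe the result into `claude --print` in real use):

```shell
# Shared, cacheable material goes first in every prompt; only the tail varies.
SHARED="You are reviewing Go services. Follow the conventions in docs/style.md."
make_prompt() { printf '%s\n\nNow review: %s\n' "$SHARED" "$1"; }

make_prompt "services/auth/main.go" | head -1   # every prompt opens identically
```

Because every prompt opens with the same bytes, each request after the first hits the cached prefix.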

Model × workload cost cheatsheet

Workload                        Model            Why
=====================================================================
Architecture / DB migration     Opus             Single-shot deep reasoning
Daily refactor / generation     Sonnet           Balance + strong coding
Sub-agent / batch lint          Haiku            Cheap + sufficient
Tool-shaped tasks / OCR         Haiku            Cheap + fast
Long-chain reasoning + cache    Sonnet + cache   Amortizes to near-zero

Tier markup and model routing strategy

CodeGateway tier markup recap

90-day cumulative spend     Markup
==================================================
$0 – $10                    1.5x
$10 – $50                   1.4x
$50 – $200                  1.3x
$200 – $500                 1.2x
$500+                       1.2x (floor, stable long-term)

New users start at 1.5x with a $2 starter credit ≈ 440K Sonnet 4.6 input tokens. Full breakdown in the tier markup explainer.
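The 440K figure is plain back-of-envelope arithmetic, assuming Sonnet input pricing of about $3 per million tokens (check the current pricing page):

```shell
# $2 credit ÷ 1.5x markup ÷ $3 per MTok ≈ 444K input tokens
awk 'BEGIN { credit = 2; markup = 1.5; usd_per_mtok = 3
             printf "%.0fK tokens\n", credit / markup / usd_per_mtok * 1000 }'
```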

Driving the markup down

Markup is per-account, 90-day rolling. Sharing one account across many people doesn't move the needle (and creates other problems — see Team Practices). Two real paths:

  1. Solo developer using consistently. $50–$100 monthly spend stabilizes at 1.2x within 3–6 months.
  2. Team account with multiple keys. All keys roll up to the same 90-day window — faster path to floor.

Model routing as a Skill

Bake the decision into a Skill so Claude Code follows the rule automatically:

yaml
# .claude/skills/cost-aware-llm/SKILL.md
---
name: cost-aware-llm-pipeline
description: Default main agent to Sonnet, sub-agents to Haiku. Switch to Opus only for multi-service architecture, DB migrations, or hard debugging.
---

Once attached, Claude Code routes models per the rules instead of defaulting to "always Opus" and bleeding cost.

Observability: usage, spend, errors

Dashboard signals

CodeGateway → Overview / Logs. Worth watching:

  • Total Tokens card: today / 7d / 30d totals.
  • By model: is Opus over 10%? That deserves a closer look.
  • By key: any single key spiking?
  • Error rate: 4xx surges often correlate with config changes.
  • First-byte latency: spikes hint at link issues.

Log filtering

The Logs view supports time-range filters (Today / 7d / 30d / 90d / All) plus filters by key, model, and status code. Tracking down "yesterday's CI run that failed" takes about 30 seconds.

Self-hosted metrics

Want CodeGateway numbers in your own Grafana / Datadog?

  • Metrics API isn't public yet (on the roadmap).
  • Workaround: at the end of claude --print in CI, push the final usage to your metrics platform.
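Until a metrics API ships, the push is on you. A sketch of the CI-side step — the metric name and payload shape here are hypothetical, aimed at a generic JSON ingest endpoint:

```shell
# Build a JSON datapoint from a run's token count, ready to POST with curl.
build_point() {   # $1 = metric name, $2 = value
  printf '{"metric":"%s","value":%s}' "$1" "$2"
}

build_point ci.claude.input_tokens 128000
# e.g.: curl -s -X POST "$METRICS_URL/ingest" -d "$(build_point ci.claude.input_tokens 128000)"
```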

Team practices: standardization and key governance

Commit your config

.claude/settings.json, .claude/skills/, .claude/mcp.json, .claude/hooks/ all belong in the repo.

Never commit: API keys, env files with tokens.

.gitignore:

.claude/secrets.json
.claude/.env
.claude/cache/

Key governance

Use case                     Naming                  Limits               Owner
================================================================================
Personal dev machine         dev-<name>-<host>       None                 Owner
Main repo CI                 ci-<repo>               RPM 60               Platform
Docs / demos                 demo-<scenario>         RPM 30, $20/mo cap   Platform
Customer demo (temporary)    tmp-<customer>-<date>   Set expiry           Sales

Annotate each key, and rotation takes seconds on a leak or a departure.

Pre-PR checklist as a Skill

- All tests pass
- Zero lint warnings
- Build succeeds
- Diff has no secrets / debug statements
- Reviewed for backwards-incompatible API changes

Bind it to a pre-pr-check Skill. Every team member's pre-PR pass runs the same checks.
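In the same SKILL.md format used for the routing skill earlier, a minimal sketch:

```yaml
# .claude/skills/pre-pr-check/SKILL.md
---
name: pre-pr-check
description: Before any PR, run the full test suite, lint with zero warnings, and build; scan the diff for secrets and debug statements; flag backwards-incompatible API changes.
---
```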

Real scenarios: four high-leverage engineering patterns

Pattern 1: long-document summarization and extraction

A 30+ page legal contract, PRD, or research report. Main agent reads, then spawns sub-agents:

  • sub-Extractor: pull schema-shaped fields (amounts, dates, breach clauses).
  • sub-Risker: rate by risk dimension.
  • sub-Drafter: produce internal review doc.

Main agent finishes with a unified summary. Prompt cache hits aggressively because the long document sits in a cache block — multi-turn questions all settle at ~10% cost.

Pattern 2: cross-service refactor

Rename userId → accountId across 6 repos and 30+ files.

  • sub-Scout: grep + enumerate.
  • sub-Editor-A/B/C: apply edits per repo.
  • sub-Tester: run tests, collect failures.
  • Main: coordinate failures, write PR description.

Wall time: ~30 minutes parallel vs 2–3 hours serial.

Pattern 3: incident root-cause analysis

Production alert. Push logs, screenshots, and relevant code into the main agent on Opus.

  • Main: form initial hypotheses.
  • sub-Validator: grep, read code, write a repro script.
  • sub-Fixer: produce a PR based on validated cause.
  • Main: write the incident review document.

Pattern 4: cross-language port

Go → Rust rewrite.

  • sub-Translator: file-by-file port.
  • sub-Tester: run generated tests, verify behavior parity.
  • sub-Reviewer: rewrite unidiomatic Rust into idiomatic Rust.
  • Main: handle cadence and commit boundaries.

FAQ

Q: Can sub-agents talk to each other directly?

A: No. The architecture is tree-shaped — sub-agents only talk to their parent. To "share state," go through the filesystem or an MCP-exposed intermediate store.

Q: Do MCP servers slow the session down?

A: Depends on the implementation. Lightweight ones (GitHub API, local file search) are tens of milliseconds. Heavy ones (run an ML model, query a huge DB) extend tool-call wait time. Add timeouts to slow tools and surface that via a PreToolUse Hook.

Q: Is prompt caching enabled on CodeGateway?

A: Yes. Any client going through the gateway gets transparent cache header propagation and the matching billing discount. Discounts follow Anthropic's official rules.

Q: Is sharing a CI key safe across the team?

A: Per-repo with strict RPM and monthly spend caps, marginally. Across many teams and repos, it's an antipattern. One key per repo with failure / anomaly alerts on the key dimension.

Q: Is Opus really 5x more expensive than Sonnet?

A: Roughly, per Anthropic's pricing page. But for tasks where Sonnet needs three iterations + redoes (multi-service architecture, complex DB migrations), one Opus pass can be cheaper. Decision rule: is the task ≥ 70% reasoning? If yes, Opus.

Q: What happens when a Hook fails?

A: PreToolUse exit code 2 blocks the call; Claude Code sees the error and adjusts. PostToolUse failures don't block by default but the error feeds into context. Stop Hook failures don't block exit.

Q: Multiple MCP servers at once?

A: Yes. List them all in .claude/mcp.json. Claude Code spawns them in parallel and namespaces tools to avoid name collisions.

Q: What if a long task gets interrupted halfway?

A: Claude Code preserves session state by default. Re-launching claude resumes from the last checkpoint. Pair with a git commit per sub-task and resumption is essentially lossless. If you hit the 10-minute total-response ceiling at the gateway layer, see the connection timeout guide.
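The two flags worth knowing, for reference:

```text
claude --continue   # resume the most recent session in this directory
claude --resume     # pick from a list of past sessions
```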

Going advanced isn't piling on features — it's matching workloads to patterns. Get the six clusters above (multi-agent / MCP / advanced Hooks / workflow automation / cache + tier / observability) into muscle memory and Claude Code crosses the line from "AI assistant" to "AI engineering team."