Tags: Claude Code · AI Programming · CodeGateway

Advanced Claude Code: Sub-Agents, MCP, and Cost Optimization

May 7, 2026

Author: CodeGateway team · Tested on May 2026

TL;DR: Getting Claude Code installed is easy. Getting Claude Code to operate at team-level productivity is the dividing line. This guide skips installation and configuration and goes straight to the advanced engineering practices: multi-agent orchestration, MCP server integration, advanced Hooks, prompt cache and tier markup strategy, and observability. Read it through and Claude Code stops being "an AI assistant" and starts being "an AI engineering team."

Table of Contents

  1. Multi-agent orchestration: from one agent to a team topology
  2. MCP server integration: extending tools and data sources
  3. Advanced Hooks: intercept, mutate, compose
  4. Workflow automation: scripts, CI, scheduled jobs
  5. Prompt cache and cost optimization
  6. Tier markup and model routing strategy
  7. Observability: usage, spend, errors
  8. Team practices: standardization and key governance
  9. Real scenarios: four high-leverage engineering patterns
  10. FAQ
  11. Related reading

Multi-agent orchestration: from one agent to a team topology

Why spawn sub-agents at all?

A single agent on a long task hits two intrinsic ceilings: context windows are finite (even at a million tokens, they aren't free), and one session struggles to wear multiple hats — architect, test author, doc writer — at once.

Sub-agents (/agents) bring three structural wins:

  • Context isolation. Each sub-agent only sees what it needs.
  • Parallelism. Independent work runs concurrently, cutting wall time toward 1/N of serial.
  • Role specialization. Each sub-agent can have its own Skills, model, and Hooks.

Four topologies that actually show up

1) Single-layer fan-out
Main
├── sub-A: grep + enumerate change targets
├── sub-B: apply backend edits
├── sub-C: apply frontend edits
└── sub-D: run all tests, collect failures

2) Two-stage pipeline
Main → sub-Planner (design) → sub-Builder (apply) → sub-Verifier (verify)

3) GAN-style adversarial loop
Main
├── sub-Generator: produce implementation
└── sub-Critic: critique, suggest improvements
loop N rounds until Critic score crosses threshold

4) Fan-out with fan-in
Main
├── sub-FrontEnd
├── sub-Backend
├── sub-DataPipeline
└── Main writes the release notes once everyone reports back
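Outside an interactive session, the same fan-out/fan-in shape can be approximated with parallel non-interactive jobs. A minimal sketch — `agent` here is a stand-in for `claude --print "<prompt>"` so the snippet is self-contained:

```shell
# Fan-out: independent sub-tasks run as parallel background jobs.
# `agent` stands in for `claude --print "<prompt>"`; swap it in for real runs.
agent() { echo "[$1] done"; }

agent lint    > out-lint.txt &
agent backend > out-backend.txt &
wait                              # fan-in: block until every job reports back
cat out-lint.txt out-backend.txt  # the coordinator aggregates the results
```

Each job writes to its own file, so the fan-in step is a plain concatenation; with the real CLI you would collect markdown reports instead.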

Four ways to invoke

text
# Inside an interactive session
/agents

# Tell the main agent in plain English
"Spawn one sub-agent to fix ESLint errors and another to compute test
coverage, run them in parallel"

# Inside a Skill template that bakes in scheduling
/skill multi-frontend

# Expose sub-agents as MCP tools so other systems can call them
(see next section)

Choosing models for sub-agents

Sub-agents get spawned often. Default them to Haiku. Keep Sonnet on the main coordinator. Reserve Opus for one-shot deep reasoning (architectural reviews, tricky migration plans), then stop.
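One way to make the default stick is in the sub-agent definition itself. Sub-agents defined under `.claude/agents/` take a `model` field in their frontmatter — the agent below is a hypothetical example; verify the field set against the current docs:

```markdown
---
name: lint-fixer
description: Fix lint errors in files handed over by the main agent
model: haiku
---

Fix ESLint/ruff errors only. Do not refactor unrelated code.
```

Saved as `.claude/agents/lint-fixer.md`, every spawn of this role runs on Haiku without the coordinator having to say so.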

MCP server integration: extending tools and data sources

What MCP is

Model Context Protocol (MCP) is Anthropic's open extension protocol for tool integrations. An MCP server exposes tools, resources, and prompts; Claude Code calls them inline during a session. With MCP, Claude Code stops being limited to "read/write files plus run shell" and gains:

  • Database connections (PostgreSQL, ClickHouse, internal BI)
  • Third-party SaaS (GitHub, Linear, Jira, Slack)
  • Internal knowledge bases (company wiki, product docs, customer lists)
  • External LLMs (specialist models, domain models, context augmentation)

Installing a public MCP server

GitHub MCP example:

bash
npm install -g @modelcontextprotocol/server-github

Then, in .claude/mcp.json:

json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "ghp_xxx"
      }
    }
  }
}

After Claude Code restarts you can ask "read the review on PR #123, list high-priority issues by milestone."

Writing a custom MCP server

A minimum-viable MCP server in Node + TypeScript SDK is roughly 30 lines:

ts
import { Server } from "@modelcontextprotocol/sdk/server/index.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import {
  ListToolsRequestSchema,
  CallToolRequestSchema,
} from "@modelcontextprotocol/sdk/types.js";

const server = new Server(
  { name: "internal-billing", version: "0.1.0" },
  { capabilities: { tools: {} } }
);

// Advertise the tool
server.setRequestHandler(ListToolsRequestSchema, async () => ({
  tools: [
    {
      name: "get_customer_usage",
      description: "Query a customer's last-30-day token usage",
      inputSchema: {
        type: "object",
        properties: { customerId: { type: "string" } },
        required: ["customerId"],
      },
    },
  ],
}));

// Dispatch calls; queryUsage is your own BI-layer lookup
server.setRequestHandler(CallToolRequestSchema, async (req) => {
  const { name, arguments: args } = req.params;
  if (name === "get_customer_usage") {
    const usage = await queryUsage(args.customerId);
    return { content: [{ type: "text", text: JSON.stringify(usage) }] };
  }
  throw new Error(`Unknown tool: ${name}`);
});

await server.connect(new StdioServerTransport());

Register it in .claude/mcp.json and your customer-support session can read your internal BI directly.
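Registration looks the same as for any public server — assuming the compiled server lives at `./mcp/internal-billing.js` (path illustrative):

```json
{
  "mcpServers": {
    "internal-billing": {
      "command": "node",
      "args": ["./mcp/internal-billing.js"]
    }
  }
}
```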

Where MCP fits

Scenario                     Pattern
================================================================
Team wiki search             Wrap as MCP; main agent retrieves on demand
Customer attribution         MCP + DB connection; query as needed
Cross-product release sync   MCP orchestrates GitHub / Slack / Linear
Internal LLM routing         MCP wraps specialist models; route by task

Advanced Hooks: intercept, mutate, compose

PreToolUse: inspect, mutate, or reject arguments

Hooks aren't only for blocking. A PreToolUse hook reads the tool input from stdin and can echo it back (mutated or not) to stdout, or exit non-zero to reject the call:

bash
# Reject Edit calls that try to write outside the project directory
node -e "
  let d = '';
  process.stdin.on('data', c => d += c);
  process.stdin.on('end', () => {
    const i = JSON.parse(d);
    if (i.tool_input?.file_path && !i.tool_input.file_path.startsWith(process.cwd())) {
      console.error('[Hook] cross-directory write rejected');
      process.exit(2);   // exit code 2 blocks the tool call
    }
    console.log(d);      // pass the (possibly mutated) input through
  });
"

PostToolUse: chain quality gates

json
{
  "hooks": {
    "PostToolUse": [
      { "matcher": "Write|Edit", "command": "ruff check --fix \"$FILE_PATH\"" },
      { "matcher": "Write|Edit", "command": "ruff format \"$FILE_PATH\"" },
      { "matcher": "Write|Edit", "command": "mypy --quiet \"$FILE_PATH\" || true" }
    ]
  }
}

Order: lint autofix → format → type-check. A failure feeds back to Claude Code, which can attempt a fix.

Stop hook: end-of-session safety net

json
{
  "hooks": {
    "Stop": [
      { "command": "pnpm test --silent" },
      { "command": "pnpm build" },
      { "command": "git status --short" }
    ]
  }
}

A standing checkpoint before you walk away.

Skill + Hook composition

Skill defines the rule, Hook enforces. Example: "all Python files must have type hints":

  • Skill says so in plain language.
  • PostToolUse Hook runs mypy and surfaces the failure to Claude Code, which then retries.

Two layers turn a suggestion into an invariant.

Workflow automation: scripts, CI, scheduled jobs

Non-interactive: --print

bash
claude --print "Read docs/PRD.md, write task breakdown to docs/tasks.md"

Runs non-interactively: Claude Code executes the prompt, prints the result, and exits. Perfect for scripts, Makefiles, CI.

CI integration (GitHub Actions example)

yaml
name: AI Code Review
on:
  pull_request:
    branches: [main]

jobs:
  ai-review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      - run: npm install -g @anthropic-ai/claude-code
      - name: AI review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.CODEGATEWAY_KEY }}
          ANTHROPIC_BASE_URL: https://api.codegateway.dev
        run: |
          claude --print "Review the current git diff with CRITICAL/HIGH/MEDIUM/LOW labels. Output markdown to review.md"
      - name: Comment on PR
        env:
          GH_TOKEN: ${{ github.token }}
        run: gh pr comment ${{ github.event.pull_request.number }} -F review.md

For CI specifically:

  • Issue a dedicated key (ci-<repo>).
  • Set RPM and monthly spend caps.
  • Default to Haiku — most CI tasks don't need Sonnet, save it for local.

Scheduled jobs

cron, GitHub Actions schedule, Linear automations all work — just call claude --print like any CLI. Common jobs:

  • Monday morning, attribute last week's failed tests, file tickets.
  • Nightly, scan dependabot proposals, evaluate upgrade risk.
  • Monthly, draft a changelog from merged PRs.
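As crontab entries, the first two jobs above might look like this (paths and prompts illustrative):

```text
# crontab -e
# Monday 09:00 — attribute last week's failed tests, file tickets
0 9 * * 1  cd /srv/app && claude --print "Attribute last week's failed tests and file tickets" >> /var/log/claude-weekly.log 2>&1
# Nightly 02:00 — evaluate dependabot upgrade risk
0 2 * * *  cd /srv/app && claude --print "Scan open dependabot PRs and rate upgrade risk" >> /var/log/claude-nightly.log 2>&1
```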

Prompt cache and cost optimization

How prompt cache works

Claude API supports explicit cache markers on cacheable prompt blocks (system prompts, long context documents). The first request writes to cache (slightly above normal input cost), and subsequent requests with the same prefix get input token cost dropped to ~10%. See the official Prompt caching docs for the full ruleset.

How Claude Code uses it

The Claude Code system prompt + long Skills + large file context are exactly the kinds of payloads worth caching. With prompt caching enabled at the gateway layer (CodeGateway or Anthropic direct), long sessions running with the same system + Skills get input cost down to 10–30% of full price. Real savings depend on session length.

Practical advice

  • Long Skills documents live in .claude/skills/ — high cache hit rate.
  • Don't bounce between working directories during one session — it invalidates cache.
  • Running claude --print in batch jobs? Put shared prompt material first, variables last, to maximize prefix hits.
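For the batch case, the ordering rule is mechanical: build every prompt as shared prefix first, per-item variable last. A self-contained sketch (the shared text and file name are illustrative; pipe the result into `claude --print` in real use):

```shell
# Shared, cacheable material goes first in every prompt; only the tail varies.
SHARED="You are reviewing Go services. Follow the conventions in docs/style.md."
make_prompt() { printf '%s\n\nNow review: %s\n' "$SHARED" "$1"; }

make_prompt "services/auth/main.go" | head -1   # every prompt opens identically
```

Because every prompt opens with the same bytes, each request after the first hits the cached prefix.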

Model × workload cost cheatsheet

Workload                        Model            Why
=====================================================================
Architecture / DB migration     Opus             Single-shot deep reasoning
Daily refactor / generation     Sonnet           Balance + strong coding
Sub-agent / batch lint          Haiku            Cheap + sufficient
Tool-shaped tasks / OCR         Haiku            Cheap + fast
Long-chain reasoning + cache    Sonnet + cache   Amortizes to near-zero

Tier markup and model routing strategy

CodeGateway tier markup recap

90-day cumulative spend     Markup
==================================================
$0 – $10                    1.5x
$10 – $50                   1.4x
$50 – $200                  1.3x
$200 – $500                 1.2x
$500+                       1.2x (floor, stable long-term)

New users start at 1.5x with a $2 starter credit ≈ 440K Sonnet 4.6 input tokens. Full breakdown in the tier markup explainer.
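The 440K figure is plain back-of-envelope arithmetic, assuming Sonnet input pricing of about $3 per million tokens (check the current pricing page):

```shell
# $2 credit ÷ 1.5x markup ÷ $3 per MTok ≈ 444K input tokens
awk 'BEGIN { credit = 2; markup = 1.5; usd_per_mtok = 3
             printf "%.0fK tokens\n", credit / markup / usd_per_mtok * 1000 }'
```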

Driving the markup down

Markup is per-account, 90-day rolling. Sharing one account across many people doesn't move the needle (and creates other problems — see Team Practices). Two real paths:

  1. Solo developer using consistently. $50–$100 monthly spend stabilizes at 1.2x within 3–6 months.
  2. Team account with multiple keys. All keys roll up to the same 90-day window — faster path to floor.

Model routing as a Skill

Bake the decision into a Skill so Claude Code follows the rule automatically:

yaml
# .claude/skills/cost-aware-llm/SKILL.md
---
name: cost-aware-llm-pipeline
description: Default main agent to Sonnet, sub-agents to Haiku. Switch to Opus only for multi-service architecture, DB migrations, or hard debugging.
---

Once attached, Claude Code routes models per the rules instead of defaulting to "always Opus" and bleeding cost.

Observability: usage, spend, errors

Dashboard signals

CodeGateway → Overview / Logs. Worth watching:

  • Total Tokens card: today / 7d / 30d totals.
  • By model: is Opus over 10%? That deserves a closer look.
  • By key: any single key spiking?
  • Error rate: 4xx surges often correlate with config changes.
  • First-byte latency: spikes hint at link issues.

Log filtering

The Logs view supports time-range filters (Today / 7d / 30d / 90d / All) plus filters by key, model, and status code. Tracking down "yesterday's CI run that failed" takes about 30 seconds.

Self-hosted metrics

Want CodeGateway numbers in your own Grafana / Datadog?

  • Metrics API isn't public yet (on the roadmap).
  • Workaround: at the end of claude --print in CI, push the final usage to your metrics platform.
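Until a metrics API ships, the push is on you. A sketch of the CI-side step — the metric name and payload shape here are hypothetical, aimed at a generic JSON ingest endpoint:

```shell
# Build a JSON datapoint from a run's token count, ready to POST with curl.
build_point() {   # $1 = metric name, $2 = value
  printf '{"metric":"%s","value":%s}' "$1" "$2"
}

build_point ci.claude.input_tokens 128000
# e.g.: curl -s -X POST "$METRICS_URL/ingest" -d "$(build_point ci.claude.input_tokens 128000)"
```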

Team practices: standardization and key governance

Commit your config

.claude/settings.json, .claude/skills/, .claude/mcp.json, .claude/hooks/ all belong in the repo.

Never commit: API keys, env files with tokens.

.gitignore:

.claude/secrets.json
.claude/.env
.claude/cache/

Key governance

Use case                     Naming                  Limits               Owner
================================================================================
Personal dev machine         dev-<name>-<host>       None                 Owner
Main repo CI                 ci-<repo>               RPM 60               Platform
Docs / demos                 demo-<scenario>         RPM 30, $20/mo cap   Platform
Customer demo (temporary)    tmp-<customer>-<date>   Set expiry           Sales

Annotate each key, and rotation takes seconds on a leak or a departure.

Pre-PR checklist as a Skill

- All tests pass
- Zero lint warnings
- Build succeeds
- Diff has no secrets / debug statements
- Reviewed for backwards-incompatible API changes

Bind it to a pre-pr-check Skill. Every team member's pre-PR pass runs the same checks.
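In the same SKILL.md format used for the routing skill earlier, a minimal sketch:

```yaml
# .claude/skills/pre-pr-check/SKILL.md
---
name: pre-pr-check
description: Before any PR, run the full test suite, lint with zero warnings, and build; scan the diff for secrets and debug statements; flag backwards-incompatible API changes.
---
```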

Real scenarios: four high-leverage engineering patterns

Pattern 1: long-document summarization and extraction

A 30+ page legal contract, PRD, or research report. Main agent reads, then spawns sub-agents:

  • sub-Extractor: pull schema-shaped fields (amounts, dates, breach clauses).
  • sub-Risker: rate by risk dimension.
  • sub-Drafter: produce internal review doc.

Main agent finishes with a unified summary. Prompt cache hits aggressively because the long document sits in a cache block — multi-turn questions all settle at ~10% cost.

Pattern 2: cross-service refactor

Rename userId → accountId across 6 repos and 30+ files.

  • sub-Scout: grep + enumerate.
  • sub-Editor-A/B/C: apply edits per repo.
  • sub-Tester: run tests, collect failures.
  • Main: coordinate failures, write PR description.

Wall time: ~30 minutes parallel vs 2–3 hours serial.

Pattern 3: incident root-cause analysis

Production alert. Push logs, screenshots, and relevant code into the main agent on Opus.

  • Main: form initial hypotheses.
  • sub-Validator: grep, read code, write a repro script.
  • sub-Fixer: produce a PR based on validated cause.
  • Main: write the incident review document.

Pattern 4: cross-language port

Go → Rust rewrite.

  • sub-Translator: file-by-file port.
  • sub-Tester: run generated tests, verify behavior parity.
  • sub-Reviewer: rewrite unidiomatic Rust into idiomatic Rust.
  • Main: handle cadence and commit boundaries.

FAQ

Q: Can sub-agents talk to each other directly?

A: No. The architecture is tree-shaped — sub-agents only talk to their parent. To "share state," go through the filesystem or an MCP-exposed intermediate store.

Q: Do MCP servers slow the session down?

A: Depends on the implementation. Lightweight ones (GitHub API, local file search) are tens of milliseconds. Heavy ones (run an ML model, query a huge DB) extend tool-call wait time. Add timeouts to slow tools and surface that via a PreToolUse Hook.

Q: Is prompt caching enabled on CodeGateway?

A: Yes. Any client going through the gateway gets transparent cache header propagation and the matching billing discount. Discounts follow Anthropic's official rules.

Q: Is sharing a CI key safe across the team?

A: Per-repo with strict RPM and monthly spend caps, marginally. Across many teams and repos, it's an antipattern. One key per repo with failure / anomaly alerts on the key dimension.

Q: Is Opus really 5x more expensive than Sonnet?

A: Roughly, per Anthropic's pricing page. But for tasks where Sonnet needs three iterations + redoes (multi-service architecture, complex DB migrations), one Opus pass can be cheaper. Decision rule: is the task ≥ 70% reasoning? If yes, Opus.

Q: What happens when a Hook fails?

A: PreToolUse exit code 2 blocks the call; Claude Code sees the error and adjusts. PostToolUse failures don't block by default but the error feeds into context. Stop Hook failures don't block exit.

Q: Multiple MCP servers at once?

A: Yes. List them all in .claude/mcp.json. Claude Code spawns them in parallel and namespaces tools to avoid name collisions.

Q: What if a long task gets interrupted halfway?

A: Claude Code preserves session state by default. Re-launching claude resumes from the last checkpoint. Pair with a git commit per sub-task and resumption is essentially lossless. If you hit the 10-minute total-response ceiling at the gateway layer, see the connection timeout guide.
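The two flags worth knowing, for reference:

```text
claude --continue   # resume the most recent session in this directory
claude --resume     # pick from a list of past sessions
```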

Going advanced isn't piling on features — it's matching workloads to patterns. Get the six clusters above (multi-agent / MCP / advanced Hooks / workflow automation / cache + tier / observability) into muscle memory and Claude Code crosses the line from "AI assistant" to "AI engineering team."