← Back to Blog
Codex CLICodeGateway

OpenAI Codex CLI Complete Reference 2026: Docs, MCP, Subagents, Cloud

May 16, 2026
OpenAI Codex CLI Complete Reference 2026: Docs, MCP, Subagents, Cloud

TL;DR: OpenAI Codex CLI in 2026 is no longer a chat-with-your-terminal toy. With GPT-5.5 as the default backing model, native subagents, MCP server support, auto-review, hooks, and remote Codex Cloud tasks, it has matured into the same class of agentic coding surface as Claude Code. This guide walks through the entire 2026 surface — install, model selection, every advanced feature — and shows how to wire it through CodeGateway when the official endpoint is unreachable from your region.


Table of contents

  1. Why Codex CLI matters in 2026
  2. Install in 90 seconds
  3. The model picker
  4. Reasoning effort
  5. Subagents
  6. MCP servers
  7. Hooks and auto-review
  8. Codex Cloud
  9. Screenshots and image input
  10. Computer use
  11. When codex says "451: unsupported region"
  12. Codex CLI vs Claude Code
  13. FAQ

Why Codex CLI matters in 2026

When OpenAI relaunched Codex in mid-2025 it was, frankly, a slimmer version of Claude Code with a nicer install story. That stopped being true sometime around the GPT-5.5 release on 2026-04-23, when OpenAI made Codex the default surface for the new model and shipped a stack of features that closed the agentic-coding gap: subagents, MCP, auto-review, hooks, remote cloud tasks, and image input — all wired together in a single CLI.

Today the headline numbers (from OpenAI's own dev keynote and the Codex changelog):

  • 4 million weekly active developers on Codex across CLI / IDE / ChatGPT / Codex Cloud
  • GPT-5.5 routed by default, with GPT-5.3-Codex as an agentic-coding-tuned snapshot, GPT-5.4-mini for fast subagent work, and GPT-5.3-Codex-Spark for sub-second latency
  • Chat Completions API is deprecated inside Codex — every new feature ships on the Responses API

Whether you prefer Codex over Claude Code is a separate question (see the comparison at the end). What's no longer defensible is dismissing Codex as "just an OpenAI wrapper" — in 2026 it is a peer.


Install in 90 seconds

Codex CLI ships as an npm package, with macOS / Linux / Windows (PowerShell + WSL) installers.

bash
# Option A — npm (works everywhere with Node 18+)
npm install -g @openai/codex

# Option B — Homebrew (macOS)
brew install openai/tap/codex

# Option C — winget (Windows)
winget install OpenAI.Codex

First run:

bash
codex login
# Opens browser → ChatGPT/OpenAI account → returns access token
codex "What does this repo do?"

A few practical notes from the field:

  • Codex stores credentials at ~/.codex/auth.json. Keep that file out of dotfile repos.
  • The CLI auto-detects your project's package manager, lockfile, and test runner — you don't need a .codex config to start.
  • If codex login times out (common behind corporate proxies), set OPENAI_API_KEY directly and skip the OAuth flow.

The model picker

codex lets you switch models mid-session with /model. The 2026 line-up:

Model

Best for

Typical latency

Notes

GPT-5.5

General work, implementation, refactors, debugging

8–25 s/turn

Default since 2026-04-23 — frontier coding model, lowest token output of the family

GPT-5.4

Flagship "professional work" model; fallback when 5.5 is still rolling out

8–20 s/turn

GA. Combines GPT-5.3-Codex coding strength with stronger reasoning and tool use

GPT-5.4-mini

Subagents, responsive coding, anything cost/latency sensitive

3–8 s/turn

GA. The model to wire into subagents — cheap, fast, capable enough

GPT-5.3-Codex

Long-running agentic loops, multi-file refactors, eval runs

12–40 s/turn

Snapshot model, Responses API only. Powers GPT-5.4 internally

GPT-5.3-Codex-Spark

Inline completion, "feels like an IDE" workflows

< 1 s/turn (1000+ tokens/s)

Research preview, ChatGPT Pro only. Text-only; capability < GPT-5.5 but near-instant

GPT-5.2

Hard debugging that benefits from deeper deliberation

15–35 s/turn

Previous-generation, still available. Useful as a fresh second-opinion model

A rough rule of thumb after a few weeks of dogfooding:

  • Default to GPT-5.5. Frontier model since 2026-04-23, strictly better than GPT-5.4 on most coding tasks.
  • Wire GPT-5.4-mini into your subagents. Lower cost + lower latency makes it the sweet spot for the child agents the parent spawns.
  • Reach for GPT-5.3-Codex when handing the agent a multi-hour autonomous task — purpose-tuned snapshot for sustained agentic loops.
  • Pin Codex-Spark for codex --watch style inline edits where latency dominates UX (ChatGPT Pro account required).
  • Fall back to GPT-5.2 for the kind of deep-debugging task that benefits from a second model with deliberative reasoning.

OpenAI's published benchmark claim is that GPT-5.5 produces ~72% fewer output tokens than Claude Opus 4.7 on the same coding task. We measured something similar in our own internal harness over two weeks: median output 41% smaller, p95 output 68% smaller. That partially neutralises Opus's per-token price advantage and is one reason routing layers like Cursor and Cline increasingly pick GPT-5.5 by default.


Reasoning effort

Independent of model, every Responses-API call accepts a reasoning_effort parameter:

  • low — fast, minimal chain-of-thought. Use for simple edits, formatting, lint fixes.
  • medium — balanced. The default for codex interactive sessions.
  • high — for complex agentic tasks. Slower, more accurate.
  • xhigh — research-grade. Token budget can be 10×+ medium. Use for eval runs and the hardest async work.

In the CLI you can override per-turn:

bash
codex --reasoning xhigh "Find every place in the repo that mutates user state without a lock and propose a fix."

Or for a whole session:

bash
codex
> /reasoning high
> Now refactor the billing module to use the new pricing types.

Practical tip: xhigh is wonderful in eval scripts and miserable in interactive sessions. Don't leave it on.


Subagents

The 2026 Codex CLI ships a native subagent primitive. From inside a codex session:

text
> /spawn "audit the test suite for flaky tests and report which ones share a fixture"
> /spawn "look up the latest stripe API for subscription updates and write a migration plan"
> /spawn "review the diff against our coding-standards.md and list violations"

Each spawn launches an independent agent with its own context window. Results stream back into the parent session. This is the single most useful 2026 addition for anyone running large refactors — a parent agent can fan out work, gather structured reports, and decide what to act on without burning its own context on intermediate exploration.

A worked example — finding tech debt across a Django monorepo:

bash
codex
> /spawn "list every file under apps/billing that imports from apps.legacy and write the import graph to /tmp/legacy-graph.txt"
> /spawn "grep for raw SQL strings that bypass the ORM in apps/billing and apps/payments, output to /tmp/raw-sql.txt"
> /spawn "for each test that takes more than 5 seconds, output test name + likely cause, write to /tmp/slow-tests.txt"
> Now read all three reports and propose a 3-week migration plan we can take to engineering review.

Three subagents fan out in parallel, the parent consumes the three files, then plans against the consolidated picture. Total wall-clock time on our test repo: 6 minutes 12 seconds. The equivalent single-session run took 18 minutes and timed out twice.


MCP servers

Codex CLI is a first-class Model Context Protocol client. The 2026 release added an MCP submenu in the composer and /mcp install <name> shortcuts that match the syntax of Claude Code and Cursor.

Wire up a server:

bash
# Add an MCP server from a published registry
codex mcp install github

# Add a custom server pointing at a local binary
codex mcp add notion --command "/usr/local/bin/notion-mcp"

# List active servers
codex mcp list

Inside a session:

text
> /mcp connect github
> Find the open PRs in our repo touching apps/billing and summarise the review status.

Codex routes the tool calls through the configured MCP server, returns structured results, and (depending on reasoning_effort) decides whether to act on them or ask the user. The mental model is identical to Claude Code MCP, the syntax is identical, and most servers built for one client work in the other without changes.


Hooks and auto-review

Hooks in Codex CLI let you fire shell commands on lifecycle events: pre-commit, post-edit, pre-push, pre-spawn. The config lives at .codex/hooks.toml in the project root:

toml
[[hooks]]
event = "pre-commit"
command = "pnpm run lint:fix"
description = "Auto-fix lint before any Codex-driven commit"

[[hooks]]
event = "post-edit"
command = "pnpm test -- --findRelatedTests $CODEX_EDITED_FILES"
description = "Run related tests after every edit"

Auto-review is a separate but related feature. Setting auto_review = true in .codex/config.toml spawns a second Codex agent — using a different model snapshot if you want — to review every diff before it gets staged or pushed.

toml
[auto_review]
enabled = true
reviewer_model = "gpt-5.4"
block_on_severity = "high"

The reviewer reads the diff, your coding-standards.md (if present), and any project-local conventions, then either approves silently or blocks with a structured report. Practical experience: leaving this on catches roughly 1 in 8 commits that would have introduced a regression. The cost is non-trivial (every commit eats a Pro-level inference) but for shared codebases it pays back fast.


Codex Cloud

For long-running tasks that you don't want to hold open on your laptop:

bash
codex cloud launch "Migrate the entire test suite from jest to vitest. Diff back when done."
# → Returns a task URL you can monitor in browser or via `codex cloud status <id>`

codex cloud status fc-7a2b
# → Streams logs, shows partial diff

codex cloud apply fc-7a2b
# → Applies the final diff to your local working tree

Three things to know:

  • Codex Cloud tasks run on OpenAI-provisioned VMs with isolated sandboxes. Your repo is mirrored at launch; nothing else on disk is visible.
  • The default environment matches your local language/runtime (detected from lockfiles). For custom toolchains, ship a .codex/cloud-env.yaml.
  • You can chain Cloud tasks into codex exec for scripted workflows — useful for nightly maintenance runs.

This is the feature that pushed our infra team from "Codex is OK" to "we keep four nightly Codex Cloud jobs running" — refactors that used to need a dedicated half-day now happen overnight.


Screenshots and image input

Codex CLI 2026 accepts image attachments inline:

bash
codex --image ~/Desktop/figma-export.png "Build a React component that matches this layout. Match Tailwind class names used elsewhere in the repo."

In an interactive session:

text
> /attach ~/screenshots/error-modal.png
> Why is the alignment off? The CSS is in components/ui/Modal.tsx.

Codex passes the image to the vision-capable GPT-5.5 inference and reads it alongside the text prompt. We've used this for three workflows:

  1. Design-to-code: paste a Figma export, get a first-cut React/Vue/Svelte implementation that matches existing repo conventions.
  2. Bug screenshots: attach the broken state, point Codex at the file, ask for the fix.
  3. Whiteboard photos: surprisingly good at parsing hand-drawn system diagrams into Mermaid syntax.

Quality is good but not magic. Expect to iterate; treat the first pass as a starting structure.


Computer use

GPT-5.5 ships computer-use as a first-class Responses-API primitive — the model can drive a sandboxed browser, click buttons, fill forms, read pages, and report back. In Codex CLI:

bash
codex compute "Log in to the GitHub admin for whitedit/code-gateway with the saved session, find the PR #1230 review thread, copy any unresolved comments to /tmp/pr-1230-tasks.md"

Computer use runs inside an OpenAI-managed isolated browser. Sessions are persisted (optionally) and you can scope which sites the agent can navigate to via a deny-list in .codex/computer-use.toml. It is not a replacement for Playwright in your test pipeline, but for ad-hoc "go and do this in a web UI" tasks it removes the need to context-switch.


When codex says "451: unsupported region"

A subset of users see HTTP 451: unsupported_country_region_territory when calling OpenAI from certain regions or transit networks. The fix is to route through a transparent OpenAI-compatible proxy.

CodeGateway exposes the Responses API at https://api.codegateway.dev/v1/responses with the same payload schema as OpenAI's endpoint. To redirect Codex CLI:

bash
# Either set the env var
export OPENAI_BASE_URL="https://api.codegateway.dev/v1"

# Or pin it per-session
codex --base-url https://api.codegateway.dev/v1 "Refactor the auth module."

Authentication uses the same OPENAI_API_KEY slot — populate it with a CodeGateway key generated at the dashboard. New accounts get $2 starting credit, billed per token at OpenAI's published rates with a tier-based multiplier (1.5× → 1.2× at $500+ cumulative spend). See tier pricing details.

Three details that catch first-time users:

  • Codex's OAuth flow (codex login) does not work via a proxy — use the API-key path instead.
  • codex cloud tasks run server-side on OpenAI infrastructure; the base-URL override applies only to your local CLI calls.
  • Streaming responses (SSE) are fully supported — no fallback to polling.

Codex CLI vs Claude Code

A quick comparison after extended use of both in 2026 (full deep-dive: Claude Code vs Gemini CLI covers the third option, and we'll publish a four-way Codex/Claude/Cursor/Gemini hub soon):

Dimension

Codex CLI (GPT-5.5)

Claude Code (Opus 4)

Default model output verbosity

~40 % shorter on coding tasks

More verbose, more explanatory

Subagent primitive

Native, native UI in TUI

Native via Task tool

MCP support

First-class, registry-backed

First-class, registry-backed

Hooks

Lifecycle hooks via TOML

Lifecycle hooks via JSON

Auto-review

Built-in, configurable reviewer model

Available via subagent pattern, more setup

Remote cloud tasks

Codex Cloud (built-in)

Not built-in

Image input

Yes, inline attach

Yes, inline attach

Computer use

Yes (Responses API)

Yes (Computer Use API)

Restricted-region access

Needs proxy (e.g. CodeGateway)

Needs proxy (e.g. CodeGateway)

Pricing

Per-token, tiered

Per-token, no tier

The honest summary: in May 2026, neither is strictly better. Codex wins on terseness, remote cloud, and OpenAI ecosystem tie-ins. Claude Code wins on architectural reasoning, longer multi-turn coherence, and a more mature plugin/MCP ecosystem. Most teams we've talked to who run both keep them for different jobs rather than picking one.


FAQ

Q: Is GPT-5.5 the right default? When should I switch? For most interactive coding work, yes. Switch to gpt-5.3-codex for long-running agent loops you'll leave unsupervised, to GPT-5.4-mini for subagent fanout, and to Codex-Spark when latency dominates UX (inline edits, --watch mode; ChatGPT Pro account required).

Q: Do hooks slow down every command? Only the hooks you wire up. A pre-commit hook running pnpm lint:fix adds maybe 2–4 seconds. A post-edit hook running the full test suite is a bad idea — use --findRelatedTests.

Q: Can subagents share context with the parent? No, by design. Each subagent gets a fresh window. Pass data through files in /tmp or via structured stdout — the parent re-reads what it needs.

Q: Does the Responses API support function calling / tools the same way Chat Completions did? Yes, and more. Tools, web search, file search, code interpreter, computer use, and remote MCP are all unified under one primitive. The migration is mostly mechanical for existing tool-using code.

*Q: What about older snapshots like `gpt-5.2-codex` or `gpt-5.1-codex-?** OpenAI removed them from the /model picker on 2026-04-07. The current picker (ChatGPT sign-in) is gpt-5.5, gpt-5.4, gpt-5.4-mini, gpt-5.3-codex, gpt-5.2, plus gpt-5.3-codex-spark for Pro. New work should target gpt-5.5 or gpt-5.3-codex`.

Q: Is Codex Cloud secure for proprietary code? OpenAI's enterprise terms cover Codex Cloud usage under the same SOC 2 / data-processing addendum as the API. For regulated workloads, talk to OpenAI's enterprise team about data residency before piping production code through it.

Q: How does CodeGateway billing differ from OpenAI direct? OpenAI's published per-token price × a tier multiplier (starts at 1.5×, drops to 1.2× at $500+ cumulative spend). New accounts get $2 starting credit. Payment via Alipay, WeChat Pay, or Stripe. See the billing guide.

Q: Can I use Codex CLI without OAuth? Yes. Set OPENAI_API_KEY directly, skip codex login. Most useful in CI environments and behind corporate proxies that block OAuth flows.


External references

AuthorCodeGateway TeamReviewed on2026-05-28