Imagen 4 vs Gemini 2.5 Flash Image vs GPT Image: Pick by Use Case
Author: CodeGateway team · Tested in May 2026
TL;DR: The biggest trap when choosing an image-gen API is that the official demos all look great. In real-world scenarios — photoreal, cartoon, infographics with text labels, UI mockups — the gap between APIs in their respective strength zones is wildly larger than the marketing pages suggest.
This is a real-run comparison. Same key calling 3 upstream image APIs (Google Imagen 4, Google Gemini 2.5 Flash Image, OpenAI GPT Image), same prompts and scenarios across 16 generated images, scored across 5 axes. The output material comes from a real Sprint 4b blog-image dogfooding run. The conclusion lands in a single recommendation table — no detours.
Table of Contents
- The 5 axes: things you actually care about
- Same prompt, three models side-by-side
- Per-axis scoring
- Recommendation cheat sheet
- One key, three providers: setup
- Cost retrospective: 16 images in production
- FAQ
- Further reading
The 5 axes: things you actually care about
This isn't a "general image quality" leaderboard — that's just numbers and vibes. Five axes that developers actually care about:
- Text rendering: when the image has text labels (infographics, step diagrams, comparison cards). Especially CJK / non-Latin scripts. Wrong characters, fuzzy edges, weird shapes are common landmines.
- Photoreal / concept illustration: blog heroes, product landing graphics, editorial illustrations. Want clean editorial feel, not cartoon.
- Cartoon / UI style: mockups, moodboards, demo screenshots. Need "production UI" quality, not hand-drawn cartoon.
- Speed: end-to-end API latency, request to b64 return. Bottleneck for batch jobs.
- Cost: flat-per-image vs per-token. At 10–100 image scale, which structure wins.
Same prompt, three models side-by-side
Five prompts, each fed to all three models for direct comparison.
Prompt 1: infographic with text labels
A clean three-layer architecture diagram, horizontally stacked panels:
top panel labeled "Network Layer" (purple #8B5CF6 stripe),
middle panel labeled "TLS Layer" (lighter violet stripe),
bottom panel labeled "Inference Layer" (deep violet stripe).
Each panel has a small icon. Modern minimal infographic.


Model | Text rendering | Notes |
|---|---|---|
Imagen 4 (std) | ⚠️ Letters often warped or missing strokes | Photoreal-strong, text-weak |
Gemini 2.5 Flash Image | ✅ Clean and readable | Usable for text scenarios |
GPT Image 2 | ✅ Sharpest text fidelity | The clear winner here |
Verdict: any image with text labels → GPT Image 2 first, Gemini as backup, Imagen 4 not a fit.
Prompt 2: blog hero photoreal concept
A minimalist flat illustration showing a frustrated developer at a laptop,
the laptop screen displaying a terminal window with red error text,
soft purple gradient background, clean modern tech aesthetic, no text,
professional editorial composition.


Model | Visual quality | Notes |
|---|---|---|
Imagen 4 std | ✅ Editorial feel at the top | Concept-illustration ceiling |
Gemini 2.5 Flash Image | ⚠️ Tends toward icon-y, lacks editorial polish | Off strength zone |
GPT Image 2 medium | ✅ Clean style + native 16:9 support | Friendly to hero containers |
Verdict: photoreal concept blog heroes → Imagen 4 std by default; if you need 16:9 horizontal → GPT Image 2 medium. Gemini is weak here.
Prompt 3: UI card mockup
A clean mockup of a developer dashboard card showing API usage stats:
"Total Tokens" header, a number "1,234,567", a small bar chart trend line,
rounded corners, soft shadow, dark mode with purple accent.


Model | UI feel | Notes |
|---|---|---|
Imagen 4 | ⚠️ Tends illustrative | Off strength zone |
Gemini 2.5 Flash Image | ✅ Number rendering accurate + clean | Strong on data cards |
GPT Image 2 medium | ✅ Most "real product UI" | The clear winner for UI mockups |
Verdict: product UI mockups / cards / fake screenshots → GPT Image 2 medium first; for number-heavy cards, Gemini works too.
Prompt 4: abstract / texture / decorative
A minimal abstract illustration with soft purple gradient,
overlapping geometric shapes, no text, subtle grain texture,
modern editorial style.


Model | Aesthetics | Notes |
|---|---|---|
Imagen 4 fast | ✅ Best price/perf | $0.02/img, decoration first pick |
Gemini 2.5 Flash Image | ⚠️ Functional, lacks artistic feel | Off zone |
GPT Image 2 | ✅ Aesthetics OK | Slow and expensive |
Verdict: pure decoration / abstract / background → Imagen 4 fast. $0.02/img, clean output, batch-friendly.
Prompt 5: stepwise flowchart (numbered + short text)
A 3-step horizontal flowchart on white background,
three circles connected by arrows in purple color scheme,
each circle labeled "1 Sign Up", "2 Configure", "3 Ship",
modern minimal flat design.


Model | Numbers | Short text | Notes |
|---|---|---|---|
Imagen 4 | ✅ OK | ⚠️ Scrambled | Off strength zone |
Gemini 2.5 Flash Image | ✅ Accurate | ✅ Accurate | Backup |
GPT Image 2 | ✅ Accurate | ✅ Sharpest | Recommended for step diagrams |
Verdict: step diagrams / numbered infographics with text → GPT Image 2 first, Gemini as backup.
Per-axis scoring
Rolling the 5 prompts above into the 5 evaluation axes (1–5 scale):
Axis | Imagen 4 fast | Imagen 4 std | Gemini 2.5 Flash Image | GPT Image 2 medium |
|---|---|---|---|---|
Text rendering | 1 | 2 | 4 | 5 |
Photoreal / concept | 4 | 5 | 2 | 4 |
Cartoon / UI | 2 | 3 | 3 | 5 |
Speed (E2E) | 5 (~7-9s) | 4 (~10-12s) | 3 (~8-17s) | 1 (~56-71s) |
Cost (per image) | 5 ($0.02) | 4 ($0.04) | 3 (~$0.06) | 4 ($0.041) |
Use-case fit total | 17 | 18 | 15 | 19 |
Totals are close — but per-axis gaps are huge. That's exactly why "pick by scenario" beats "pick by overall score."
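To make "pick by scenario" concrete, here is a minimal Python sketch that keeps the per-axis scores from the table above and asks which model leads each axis, plus a weighted variant for scenarios that care about several axes at once. The SCORES layout and the best_for / weighted_pick helpers are illustrative names, not CodeGateway tooling.

```python
# Per-axis scores (1-5) transcribed from the scoring table above.
SCORES = {
    "Imagen 4 fast": {"text": 1, "photoreal": 4, "ui": 2, "speed": 5, "cost": 5},
    "Imagen 4 std": {"text": 2, "photoreal": 5, "ui": 3, "speed": 4, "cost": 4},
    "Gemini 2.5 Flash Image": {"text": 4, "photoreal": 2, "ui": 3, "speed": 3, "cost": 3},
    "GPT Image 2 medium": {"text": 5, "photoreal": 4, "ui": 5, "speed": 1, "cost": 4},
}


def best_for(axis: str) -> str:
    """Return the model with the highest score on a single axis.

    On a tie, max() keeps the first model in SCORES order.
    """
    return max(SCORES, key=lambda model: SCORES[model][axis])


def weighted_pick(weights: dict[str, float]) -> str:
    """Return the model with the highest weighted score for a scenario."""
    return max(
        SCORES,
        key=lambda model: sum(w * SCORES[model][axis] for axis, w in weights.items()),
    )


if __name__ == "__main__":
    for axis in ("text", "photoreal", "ui", "speed", "cost"):
        print(f"{axis:>9}: {best_for(axis)}")
```

Weighting {"text": 3, "cost": 1} for a labeled infographic selects GPT Image 2 medium, while {"photoreal": 1, "speed": 1, "cost": 1} for batch decoration selects Imagen 4 fast, matching the verdicts above.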
Pricing notes
CodeGateway transparently passes through 4 model billings:
- Imagen 4 fast: $0.02 / image (flat per image, not affected by prompt / resolution)
- Imagen 4 std: $0.04 / image
- Imagen 4 ultra: $0.06 / image (premium one-offs)
- Gemini 2.5 Flash Image: per-token (input $0.30/MTok + text output $2.50/MTok + image output $30/MTok); typical single image lands ~$0.04–0.08
- GPT Image 2: quality × aspect matrix (low $0.005–0.006, medium $0.041–0.053, high $0.165–0.211)
Plus CodeGateway's 1.2x–1.5x tier markup — mixing across models hits lower tier brackets faster than single-vendor spend (see Tier markup explainer).
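The three billing structures above come down to a little arithmetic. Here is a minimal Python sketch, assuming the list prices quoted in this section and a single multiplier standing in for the tier markup (the real tier rules live in the markup explainer). The ~1,290 output tokens per Gemini image is an assumption based on Google's documentation for images up to 1024×1024; verify it against the current model card.

```python
# Hypothetical cost helpers using the list prices quoted in this post (USD).
IMAGEN_FLAT = {"fast": 0.02, "std": 0.04, "ultra": 0.06}  # per image

# Gemini 2.5 Flash Image rates, USD per million tokens.
GEMINI_RATES = {"input": 0.30, "text_out": 2.50, "image_out": 30.00}

# ASSUMPTION: ~1,290 output tokens per image up to 1024x1024; check upstream docs.
GEMINI_TOKENS_PER_IMAGE = 1290


def imagen_cost(tier: str, n_images: int = 1) -> float:
    """Flat price per image: independent of prompt length and resolution."""
    return IMAGEN_FLAT[tier] * n_images


def gemini_cost(input_tokens: int, text_out_tokens: int = 0,
                image_out_tokens: int = GEMINI_TOKENS_PER_IMAGE) -> float:
    """Per-token price for one Gemini image request."""
    return (input_tokens * GEMINI_RATES["input"]
            + text_out_tokens * GEMINI_RATES["text_out"]
            + image_out_tokens * GEMINI_RATES["image_out"]) / 1_000_000


def with_markup(upstream_cost: float, multiplier: float) -> float:
    """Apply a gateway markup multiplier (this post quotes 1.2x-1.5x by tier)."""
    return upstream_cost * multiplier


if __name__ == "__main__":
    base = gemini_cost(input_tokens=100)
    print(f"Gemini, 100-token prompt: ${base:.4f} upstream")
    print(f"  with 1.2x markup: ${with_markup(base, 1.2):.4f}")
    print(f"  with 1.5x markup: ${with_markup(base, 1.5):.4f}")
```

With a 100-token prompt, gemini_cost returns about $0.039 upstream, or roughly $0.046–0.058 after a 1.2x–1.5x multiplier. The image-output term dominates, so prompt length barely moves the per-token bill.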
Recommendation cheat sheet
Copy this into your spec decision comments:
Scenario | First pick | Backup | Per-image cost |
|---|---|---|---|
Blog hero (1:1) | Imagen 4 std | Imagen 4 fast | $0.04 / $0.02 |
Blog hero (16:9 horizontal) | GPT Image 2 medium | — | $0.041 |
Body illustration (photoreal) | Imagen 4 fast | Imagen 4 std | $0.02 / $0.04 |
Body infographic (with labels) | GPT Image 2 medium | Gemini 2.5 Flash Image | $0.041 |
Step / flow diagram | GPT Image 2 medium | Gemini 2.5 Flash Image | $0.041 |
Product UI mockup / fake card | GPT Image 2 medium | Gemini | $0.041 |
Pure decoration / abstract / bg | Imagen 4 fast | — | $0.02 |
OG / social card (1.91:1, near 16:9) | GPT Image 2 medium | Imagen 4 std + crop | $0.041 |
Logo / brand mark (precise reproduction) | Don't use AI gen | — | — |
The last row matters: don't AI-generate logos, brand marks, or trademarks. Copyright, legal risk, and fidelity all work against you; use real design files.
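If the cheat sheet should live in code rather than in comments, a lookup table with an explicit fallback is enough. A hypothetical Python sketch: the scenario keys and the route helper are illustrative names, and the display names would still need mapping to the model IDs your runner sends.

```python
from typing import Optional

# Scenario -> (first pick, backup), transcribed from the cheat sheet above.
# None means the cheat sheet lists no model for that slot.
ROUTES: dict[str, tuple[Optional[str], Optional[str]]] = {
    "blog-hero-1:1": ("Imagen 4 std", "Imagen 4 fast"),
    "blog-hero-16:9": ("GPT Image 2 medium", None),
    "body-photoreal": ("Imagen 4 fast", "Imagen 4 std"),
    "body-infographic": ("GPT Image 2 medium", "Gemini 2.5 Flash Image"),
    "step-diagram": ("GPT Image 2 medium", "Gemini 2.5 Flash Image"),
    "ui-mockup": ("GPT Image 2 medium", "Gemini 2.5 Flash Image"),
    "decoration": ("Imagen 4 fast", None),
    "og-card": ("GPT Image 2 medium", "Imagen 4 std + crop"),
    "logo": (None, None),  # don't AI-generate logos / brand marks
}


def route(scenario: str, first_pick_failed: bool = False) -> str:
    """Return the model for a scenario, or its backup if the first pick failed."""
    first, backup = ROUTES[scenario]
    if first is None:
        raise ValueError(f"{scenario}: use real design files, not image generation")
    if first_pick_failed:
        if backup is None:
            raise LookupError(f"{scenario}: the cheat sheet lists no backup")
        return backup
    return first
```

Raising on the logo row, instead of silently returning a model, keeps the "don't use AI gen" rule enforceable in a batch runner.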
One key, three providers: setup
CodeGateway's sk-cg- key calls all three upstreams — no separate Google / OpenAI accounts, no international credit cards, no service-account configuration.
Shared endpoint
POST https://api.codegateway.dev/v1/images/generations
Different models route via the model field in the request body:

```bash
# Imagen 4 fast
curl -X POST https://api.codegateway.dev/v1/images/generations \
  -H "Authorization: Bearer $CODEGATEWAY_PROD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"imagen-4.0-fast-generate-001","prompt":"...","n":1,"response_format":"b64_json"}'

# Gemini 2.5 Flash Image
curl -X POST https://api.codegateway.dev/v1/images/generations \
  -H "Authorization: Bearer $CODEGATEWAY_PROD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gemini-2.5-flash-image","prompt":"...","aspect_ratio":"1:1","response_format":"b64_json"}'

# GPT Image 2 medium 1536x1024
curl -X POST https://api.codegateway.dev/v1/images/generations \
  -H "Authorization: Bearer $CODEGATEWAY_PROD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-image-2","prompt":"...","size":"1536x1024","quality":"medium","response_format":"b64_json"}'
```

Mixing all three in one spec
In production specs, just pick the model per scenario; the runner handles the rest:
```yaml
- name: blog-hero
  model: gpt-image-2
  quality: medium
  size: "1536x1024"
  prompt: A wide cinematic editorial illustration...
  out: /tmp/blog-hero.png
- name: architecture-diagram
  model: gemini-2.5-flash-image
  aspect: "1:1"
  prompt: |
    A clean three-layer architecture diagram with labels "Network Layer" / "TLS Layer" / "Inference Layer"...
  out: /tmp/architecture.png
- name: hero-decoration
  model: imagen-4.0-fast-generate-001
  aspect: "1:1"
  prompt: A minimal abstract purple gradient...
  out: /tmp/decoration.png
```

The full spec runner is open source at Whitedit/code-gateway-cookbook · image-gen/: a single generate.py auto-routes by the model field to the right body shape (Imagen uses aspect_ratio; GPT Image uses size + quality).
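For readers who want to see the routing idea in isolation, here is a hypothetical stand-in, not the cookbook's generate.py. It turns one spec entry into the request bodies shown in the curl examples (size + quality for GPT Image, aspect_ratio for Imagen and Gemini), warns when an Imagen entry asks for a non-square aspect (the FAQ notes the backend ignores it), and posts with the Python standard library. The default size and quality values and the data[0].b64_json response shape are assumptions based on the OpenAI Images API.

```python
import base64
import json
import os
import urllib.request
import warnings

# Endpoint from the curl examples above.
ENDPOINT = "https://api.codegateway.dev/v1/images/generations"


def build_body(spec: dict) -> dict:
    """Turn one spec entry into the request body shape its model expects."""
    model = spec["model"]
    body = {"model": model, "prompt": spec["prompt"], "response_format": "b64_json"}
    if model.startswith("gpt-image"):
        # OpenAI route: explicit pixel size plus a quality tier.
        # These defaults are assumptions, not documented gateway defaults.
        body["size"] = spec.get("size", "1024x1024")
        body["quality"] = spec.get("quality", "medium")
    else:
        # Imagen and Gemini routes take an aspect ratio instead of a size.
        aspect = spec.get("aspect", "1:1")
        if model.startswith("imagen") and aspect != "1:1":
            warnings.warn(
                f"{spec.get('name', model)}: Imagen returns 1024x1024; "
                f"aspect {aspect} will be ignored"
            )
        body["aspect_ratio"] = aspect
    return body


def generate(spec: dict, api_key: str) -> bytes:
    """POST one spec entry to the gateway and return the decoded image bytes."""
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(build_body(spec)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # GPT Image 2 can take ~60 s per image, so use a generous timeout.
    with urllib.request.urlopen(request, timeout=180) as response:
        payload = json.load(response)
    # Assumes the OpenAI Images API response shape: {"data": [{"b64_json": "..."}]}.
    return base64.b64decode(payload["data"][0]["b64_json"])


if __name__ == "__main__":
    key = os.environ.get("CODEGATEWAY_PROD_API_KEY")
    entry = {
        "name": "hero-decoration",
        "model": "imagen-4.0-fast-generate-001",
        "aspect": "1:1",
        "prompt": "A minimal abstract purple gradient",
        "out": "/tmp/decoration.png",
    }
    if key:  # only hit the network when a key is configured
        with open(entry["out"], "wb") as f:
            f.write(generate(entry, key))
```

Keeping build_body pure (no I/O) makes the per-model body shapes easy to unit-test without spending money on real generations.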
Cost retrospective: 16 images in production
Sprint 4b blog-image dogfooding ran 4 blog posts / 16 images / 4 mixed models:
Model | Count | Use | Cost |
|---|---|---|---|
Imagen 4 std | 4 | Heroes (1024×1024) | $0.16 |
Imagen 4 fast | 3 | Body photoreal | $0.06 |
Gemini 2.5 Flash Image | 9 | Infographics / step diagrams (with text) | $0.54 |
GPT Image 2 medium | 4 | Hero 16:9 regen | $0.164 |
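Total: $0.92 / 16 images / 4 models / one key. (The counts sum to 20 because the 4 GPT Image 2 calls regenerated the heroes at 16:9, Imagen 4 being fixed at 1024×1024; 16 images shipped.)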
If we'd run all 16 through any single model:
- All Imagen 4 fast: $0.32 (cheapest, but text labels would die)
- All Gemini 2.5 Flash Image: ~$0.96 (great text, but heroes weak)
- All GPT Image 2 medium: ~$0.66 (slow + UI-strong, but heroes overpriced)
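The figures in this list can be checked in a few lines. A minimal Python sketch using the per-image prices from the cost table above; the names are illustrative.

```python
# (generation count, per-image USD) per model, from the Sprint 4b cost table.
RUN = {
    "Imagen 4 std": (4, 0.04),
    "Imagen 4 fast": (3, 0.02),
    "Gemini 2.5 Flash Image": (9, 0.06),
    "GPT Image 2 medium": (4, 0.041),  # 16:9 hero regenerations
}

PRICE = {model: price for model, (_, price) in RUN.items()}
SHIPPED_IMAGES = 16  # 20 generation calls, 4 of them hero regens


def mixed_total(run: dict) -> float:
    """Actual spend: every generation call at its model's price."""
    return sum(count * price for count, price in run.values())


def single_model_total(model: str, n_images: int = SHIPPED_IMAGES) -> float:
    """Hypothetical spend if every shipped image used one model."""
    return n_images * PRICE[model]


if __name__ == "__main__":
    calls = sum(count for count, _ in RUN.values())
    print(f"mixed: ${mixed_total(RUN):.3f} over {calls} generation calls")
    for model in PRICE:
        print(f"all {model}: ${single_model_total(model):.3f}")
```

It reports about $0.92 for the mixed run against $0.32 (all Imagen 4 fast), about $0.66 (all GPT Image 2 medium) and $0.96 (all Gemini), so the case for mixing rests on per-scenario quality rather than on the bill.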
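Mixing was not the cheapest bill: all-Imagen-4-fast ($0.32) and all-GPT-Image-2 ($0.66) both come in under $0.92, and only all-Gemini ($0.96) costs more. What mixing bought was quality: every scenario landed on its strongest model, for well under a dollar in total. That's why "pick by scenario" pays off.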
FAQ
Q: Is Imagen 4 really hardcoded to 1024×1024 in the backend? No landscape output?
A: Yes. backend/src/proxy-vertex-image.ts comments say aspect_ratio is "accepted but ignored." For 16:9 / 9:16 you must switch to GPT Image 2 (OpenAI route natively supports size). That's why Sprint 4b regenerated 4 heroes.
Q: Gemini 2.5 Flash Image is per-token while Imagen is flat — which is more economical?
A: Per image, Imagen std is a flat $0.04 while Gemini typically lands at ~$0.04–0.08, so Imagen std is at least as cheap in the typical case and its cost is predictable. Quality decides the rest: Gemini's Chinese-text rendering is stable, but GPT Image 2 edges it in our tests. Short prompt with no text → Imagen is the more reliable pick. Long prompt + text labels → GPT Image 2 first, Gemini as backup.
Q: GPT Image 2 is so slow (~60s/image) — worth it?
A: Depends. Definitely not for batch decoration — slow with no advantage. Worth it in two scenarios: (1) you need native 16:9 / 9:16 (other models don't); (2) product UI mockups (GPT Image 2 is clearly stronger here).
Q: Can I dispatch the same prompt to all three and pick the best result?
A: Yes, but cost spikes. The 16-image dogfood run at best-of-3 would be roughly triple: ~$2.76 instead of $0.92. Whether it's worth it depends on the stakes: blog cover heroes justify best-of-3, body illustrations don't.
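Mechanically, best-of-3 is a fan-out: send one prompt to each model at once and review the candidates side by side. Running the calls concurrently keeps wall-clock time close to the slowest model (GPT Image 2, ~60 s) instead of the sum of all three; it does not change the bill. A minimal Python sketch, where generate_fn is a hypothetical callable you supply that performs one model call and returns image bytes.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Tuple, Union

# Model IDs as used in the curl examples in this post.
MODELS: Tuple[str, ...] = (
    "imagen-4.0-fast-generate-001",
    "gemini-2.5-flash-image",
    "gpt-image-2",
)


def fan_out(
    prompt: str,
    generate_fn: Callable[[str, str], bytes],
    models: Tuple[str, ...] = MODELS,
) -> Dict[str, Union[bytes, Exception]]:
    """Call generate_fn(model, prompt) for every model concurrently.

    Returns model -> image bytes, or the exception raised for that model,
    so one failing upstream doesn't sink the whole comparison.
    """
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        futures = {model: pool.submit(generate_fn, model, prompt) for model in models}
        results: Dict[str, Union[bytes, Exception]] = {}
        for model, future in futures.items():
            try:
                results[model] = future.result()
            except Exception as exc:  # keep the other candidates
                results[model] = exc
    return results
```

Failures come back as exception objects instead of aborting the batch, so a single upstream error still leaves two candidates to choose from.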
Q: Imagen 4 ultra is 50% pricier than std ($0.06 vs $0.04). Worth it?
A: Mostly no, unless it's a top-of-page or marketing primary image that will be seen 10K+ times. A daily blog hero is fine on std; spend the difference on best-of-3 instead for better ROI.
Q: Can CodeGateway's key be used in Cursor / Figma / etc.?
A: The image API endpoint speaks OpenAI Images API protocol (/v1/images/generations) plus Vertex passthrough — so any tool compatible with the OpenAI Images API can plug in directly. In Cursor, Aider, etc.: point OPENAI_BASE_URL at https://api.codegateway.dev/v1 and OPENAI_API_KEY at your sk-cg-xxx.
Q: Will models suddenly disappear / repriced?
A: Upstream Google / OpenAI handle their own announcement cadence. The CodeGateway gateway tracks upstream changes — when upstream reprices, our CMS price table updates and new prices show on /pricing immediately. In-flight requests settle at the price at submission time.
Q: Who owns the image copyright?
A: Depends on the upstream model's ToS:
- Imagen / Gemini: Google's Generative AI Terms; commercial use mostly allowed, some content (real people, etc.) restricted.
- GPT Image: OpenAI's Usage Policies; user owns the generated content.
CodeGateway as a gateway makes no copyright claim on generated images — what you generate is yours. But copyright ≠ compliance: don't use AI-gen for public figures / trademark infringement / platform-ToS-violating content.
Further reading
- An honest receipt: 16 blog hero images for $0.92 in an hour — source material for this post; full dogfood retrospective
- Codex CLI vs Claude Code: pick by task — same "pick by scenario" framework applied to coding tools
- Top-up and billing guide
- Tier markup explainer
- Google Cloud — Imagen 4 model card
- Google Cloud — Gemini 2.5 Flash Image
- OpenAI — Image Generation guide
- Production scripts and spec templates: Whitedit/code-gateway-cookbook · image-gen/
Picking an image API is the same playbook as picking a coding tool: don't compare overall, compare per axis. Text labels → GPT Image 2 (Gemini as backup); photoreal → Imagen; UI mockup → GPT Image; pure decoration → Imagen 4 fast. Paste the cheat-sheet table into your spec decision comments and save yourself a few rework rounds the next time you write a prompt.
