Imagen 4 vs Gemini 2.5 Flash Image vs GPT Image: Pick by Use Case

Author: CodeGateway team · Tested in May 2026

TL;DR: The biggest trap when choosing an image-gen API is that the official demos all look great. In real-world scenarios — photoreal, cartoon, infographics with text labels, UI mockups — the gap between APIs in their respective strength zones is wildly larger than the marketing pages suggest.

This is a real-run comparison: one key calling 3 upstream image APIs (Google Imagen 4, Google Gemini 2.5 Flash Image, OpenAI GPT Image), with the same prompts and scenarios across 16 generated images, scored on 5 axes. The output material comes from a real Sprint 4b blog-image dogfooding run. The conclusion lands in a single recommendation table, no detours.

The 5 axes: things you actually care about
Same prompt, three models side-by-side
Per-axis scoring
Recommendation cheat sheet
One key, three providers: setup
Cost retrospective: 16 images in production
FAQ
Further reading

The 5 axes: things you actually care about

This isn't a "general image quality" leaderboard — that's just numbers and vibes. Five axes that developers actually care about:

Text rendering: when the image has text labels (infographics, step diagrams, comparison cards). Especially CJK / non-Latin scripts. Wrong characters, fuzzy edges, weird shapes are common landmines.
Photoreal / concept illustration: blog heroes, product landing graphics, editorial illustrations. Want clean editorial feel, not cartoon.
Cartoon / UI style: mockups, moodboards, demo screenshots. Need "production UI" quality, not hand-drawn cartoon.
Speed: end-to-end API latency, request to b64 return. Bottleneck for batch jobs.
Cost: flat-per-image vs per-token. At 10–100 image scale, which structure wins.
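
The speed axis is wall-clock time from sending the request to holding the base64 payload. A minimal sketch of such a timer follows; `fake_generate` is a hypothetical stand-in that only sleeps, so swap in your own client call to measure a real request.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

def timed(call: Callable[[], T]) -> Tuple[T, float]:
    """Run `call` once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = call()
    return result, time.perf_counter() - start

def fake_generate() -> str:
    # Hypothetical stand-in for one image request; replace with a real client call.
    time.sleep(0.05)
    return "b64-payload"

payload, seconds = timed(fake_generate)
print(f"got {len(payload)} chars in {seconds:.2f}s")
```

For batch jobs, running the timer over several requests per model and looking at the range (not just the mean) is closer to how the latency figures in the scoring table are reported.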

Same prompt, three models side-by-side

Five prompts, each fed to all three models for direct comparison.

Prompt 1: infographic with text labels

plaintext

A clean three-layer architecture diagram, horizontally stacked panels:
top panel labeled "Network Layer" (purple #8B5CF6 stripe),
middle panel labeled "TLS Layer" (lighter violet stripe),
bottom panel labeled "Inference Layer" (deep violet stripe).
Each panel has a small icon. Modern minimal infographic.

Figure: Prompt 1 (infographic with Chinese text), Imagen 4 test output

Figure: Prompt 1 (infographic with Chinese text), Gemini 2.5 Flash Image test output

Figure: Prompt 1 (infographic with Chinese text), GPT Image 2 test output

Model	Text rendering	Notes
Imagen 4 (std)	⚠️ Letters often warped or missing strokes	Photoreal-strong, text-weak
Gemini 2.5 Flash Image	✅ Clean and readable	Usable for text scenarios
GPT Image 2	✅ Sharpest text fidelity	The clear winner here

Verdict: any image with text labels → GPT Image 2 first, Gemini as backup, Imagen 4 not a fit.

Prompt 2: blog hero photoreal concept

plaintext

A minimalist flat illustration showing a frustrated developer at a laptop,
the laptop screen displaying a terminal window with red error text,
soft purple gradient background, clean modern tech aesthetic, no text,
professional editorial composition.

Figure: Prompt 2 (photoreal concept illustration), Imagen 4 test output

Figure: Prompt 2 (photoreal concept illustration), Gemini 2.5 Flash Image test output

Figure: Prompt 2 (photoreal concept illustration), GPT Image 2 test output

Model	Visual quality	Notes
Imagen 4 std	✅ Editorial feel at the top	Concept-illustration ceiling
Gemini 2.5 Flash Image	⚠️ Tends toward icon-y, lacks editorial polish	Off strength zone
GPT Image 2 medium	✅ Clean style + native 16:9 support	Friendly to hero containers

Verdict: photoreal concept blog heroes → Imagen 4 std by default; if you need 16:9 horizontal → GPT Image 2 medium. Gemini is weak here.

Prompt 3: UI card mockup

plaintext

A clean mockup of a developer dashboard card showing API usage stats:
"Total Tokens" header, a number "1,234,567", a small bar chart trend line,
rounded corners, soft shadow, dark mode with purple accent.

Figure: Prompt 3 (UI card mockup), Imagen 4 test output

Figure: Prompt 3 (UI card mockup), Gemini 2.5 Flash Image test output

Figure: Prompt 3 (UI card mockup), GPT Image 2 test output

Model	UI feel	Notes
Imagen 4	⚠️ Tends illustrative	Off strength zone
Gemini 2.5 Flash Image	✅ Number rendering accurate + clean	Strong on data cards
GPT Image 2 medium	✅ Most "real product UI"	The clear winner for UI mockups

Verdict: product UI mockups / cards / fake screenshots → GPT Image 2 medium first; if numbers in cards → Gemini works too.

Prompt 4: abstract / texture / decorative

plaintext

A minimal abstract illustration with soft purple gradient,
overlapping geometric shapes, no text, subtle grain texture,
modern editorial style.

Figure: Prompt 4 (abstract decoration), Imagen 4 fast test output

Figure: Prompt 4 (abstract decoration), Gemini 2.5 Flash Image test output

Figure: Prompt 4 (abstract decoration), GPT Image 2 test output

Model	Aesthetics	Notes
Imagen 4 fast	✅ Best price/perf	$0.02/img, decoration first pick
Gemini 2.5 Flash Image	⚠️ Functional, lacks artistic feel	Off zone
GPT Image 2	✅ Aesthetics OK	Slow and expensive

Verdict: pure decoration / abstract / background → Imagen 4 fast. $0.02/img, clean output, batch-friendly.

Prompt 5: stepwise flowchart (numbered + short text)

plaintext

A 3-step horizontal flowchart on white background,
three circles connected by arrows in purple color scheme,
each circle labeled "1 Sign Up", "2 Configure", "3 Ship",
modern minimal flat design.

Figure: Prompt 5 (step flowchart with Chinese text), Imagen 4 test output

Figure: Prompt 5 (step flowchart with Chinese text), Gemini 2.5 Flash Image test output

Figure: Prompt 5 (step flowchart with Chinese text), GPT Image 2 test output

Model	Numbers	Short text	Notes
Imagen 4	✅ OK	⚠️ Scrambled	Off strength zone
Gemini 2.5 Flash Image	✅ Accurate	✅ Accurate	Backup
GPT Image 2	✅ Accurate	✅ Sharpest text	Recommended for step diagrams

Verdict: step diagrams / numbered infographics with text → GPT Image 2 first, Gemini as backup.

Per-axis scoring

Rolling the 5 prompts above into the 5 evaluation axes (1–5 scale):

Axis	Imagen 4 fast	Imagen 4 std	Gemini 2.5 Flash Image	GPT Image 2 medium
Text rendering	1	2	4	5
Photoreal / concept	4	5	2	4
Cartoon / UI	2	3	3	5
Speed (E2E)	5 (~7-9s)	4 (~10-12s)	3 (~8-17s)	1 (~56-71s)
Cost (per image)	5 ($0.02)	4 ($0.04)	3 (~$0.06)	4 ($0.041)
Use-case fit total	17	18	15	19

Totals land within a few points of each other (15 to 19), but per-axis gaps are huge. That's exactly why "pick by scenario" beats "pick by overall score."
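
To see the per-axis gaps directly, here is a small sketch that takes the scores from the table and reports the best model on each axis and the spread between best and worst. The function name `axis_winners` is just illustrative.

```python
# Scores copied from the per-axis table (1-5 scale, higher is better).
MODELS = ["Imagen 4 fast", "Imagen 4 std", "Gemini 2.5 Flash Image", "GPT Image 2 medium"]
SCORES = {
    "Text rendering":      [1, 2, 4, 5],
    "Photoreal / concept": [4, 5, 2, 4],
    "Cartoon / UI":        [2, 3, 3, 5],
    "Speed (E2E)":         [5, 4, 3, 1],
    "Cost (per image)":    [5, 4, 3, 4],
}

def axis_winners(scores):
    """Map each axis to (best model, best score, gap between best and worst)."""
    out = {}
    for axis, row in scores.items():
        best = max(row)
        out[axis] = (MODELS[row.index(best)], best, best - min(row))
    return out

winners = axis_winners(SCORES)
for axis, (model, score, gap) in winners.items():
    print(f"{axis:<20} {model:<24} score={score} gap={gap}")
```

No single model wins more than two axes, and the text-rendering and speed axes each span 4 of the 5 possible points.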

Pricing notes

CodeGateway passes upstream billing through transparently for each model:

Imagen 4 fast: $0.02 / image (flat per image, not affected by prompt / resolution)
Imagen 4 std: $0.04 / image
Imagen 4 ultra: $0.06 / image (premium one-offs)
Gemini 2.5 Flash Image: per-token (input $0.30/MTok + text output $2.50/MTok + image output $30/MTok); typical single image lands ~$0.04–0.08
GPT Image 2: quality × aspect matrix (low $0.005–0.006, medium $0.041–0.053, high $0.165–0.211)

Plus CodeGateway's 1.2x–1.5x tier markup — mixing across models hits lower tier brackets faster than single-vendor spend (see Tier markup explainer).
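
Per-token billing is easier to reason about with the arithmetic written out. The sketch below uses the Gemini 2.5 Flash Image rates quoted above; the token counts (200 input, 50 text output, 1290 image output) are assumptions for illustration, and the `markup` parameter is a hypothetical way to apply the tier multiplier.

```python
# Per-million-token rates quoted above for Gemini 2.5 Flash Image (USD).
INPUT_PER_MTOK = 0.30
TEXT_OUT_PER_MTOK = 2.50
IMAGE_OUT_PER_MTOK = 30.00

def gemini_image_cost(input_tokens, text_out_tokens, image_out_tokens, markup=1.0):
    """Estimate one request's cost in USD; `markup` is the gateway tier multiplier."""
    upstream = (
        input_tokens * INPUT_PER_MTOK
        + text_out_tokens * TEXT_OUT_PER_MTOK
        + image_out_tokens * IMAGE_OUT_PER_MTOK
    ) / 1_000_000
    return upstream * markup

# Assumed token counts for illustration only; read the real ones from your usage data.
base = gemini_image_cost(input_tokens=200, text_out_tokens=50, image_out_tokens=1290)
with_markup = gemini_image_cost(200, 50, 1290, markup=1.2)
print(f"upstream ${base:.4f}, with 1.2x markup ${with_markup:.4f}")
```

Under these assumed counts the image-output term accounts for nearly all of the cost, so prompt length matters far less than the number of images returned.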

Recommendation cheat sheet

Copy this into your spec decision comments:

Scenario	First pick	Backup	Per-image cost
Blog hero (1:1)	Imagen 4 std	Imagen 4 fast	$0.04 / $0.02
Blog hero (16:9 horizontal)	GPT Image 2 medium	—	$0.041
Body illustration (photoreal)	Imagen 4 fast	Imagen 4 std	$0.02 / $0.04
Body infographic (with labels)	GPT Image 2 medium	Gemini 2.5 Flash Image	$0.041
Step / flow diagram	GPT Image 2 medium	Gemini 2.5 Flash Image	$0.041
Product UI mockup / fake card	GPT Image 2 medium	Gemini	$0.041
Pure decoration / abstract / bg	Imagen 4 fast	—	$0.02
OG / social card (1.91:1, near 16:9)	GPT Image 2 medium	Imagen 4 std + crop	$0.041
Logo / brand mark (precise reproduction)	Don't use AI gen	—	—

The last row matters: don't AI-generate logos, brand marks, or trademarks. Copyright, risk, and fidelity all fail. Use real design files.
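
If you would rather encode the cheat sheet than paste it as a comment, a lookup table is enough. This is a sketch only: the scenario keys and the `pick_model` helper are hypothetical names, and the picks are transcribed from the table above.

```python
# (first pick, backup) per scenario, transcribed from the cheat sheet above.
# The scenario keys are hypothetical identifiers chosen for this sketch.
CHEAT_SHEET = {
    "blog-hero-square": ("Imagen 4 std", "Imagen 4 fast"),
    "blog-hero-wide": ("GPT Image 2 medium", None),
    "body-photoreal": ("Imagen 4 fast", "Imagen 4 std"),
    "body-infographic": ("GPT Image 2 medium", "Gemini 2.5 Flash Image"),
    "step-diagram": ("GPT Image 2 medium", "Gemini 2.5 Flash Image"),
    "ui-mockup": ("GPT Image 2 medium", "Gemini 2.5 Flash Image"),
    "decoration": ("Imagen 4 fast", None),
    "og-card": ("GPT Image 2 medium", "Imagen 4 std + crop"),
    "logo": (None, None),  # the table says: don't use AI generation
}

def pick_model(scenario, use_backup=False):
    """Return the first pick (or the backup) for a scenario, or raise if none applies."""
    first, backup = CHEAT_SHEET[scenario]
    if first is None:
        raise ValueError(f"{scenario}: use real design files, not AI generation")
    if use_backup:
        if backup is None:
            raise ValueError(f"{scenario}: no backup listed")
        return backup
    return first
```

Raising on the logo row, instead of returning a default, keeps the "don't use AI gen" rule from being silently bypassed in a batch run.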

One key, three providers: setup

CodeGateway's sk-cg- key calls all three upstreams — no separate Google / OpenAI accounts, no international credit cards, no service-account configuration.

Shared endpoint

bash

POST https://api.codegateway.dev/v1/images/generations

Different models route via the model field in the request body:

bash

# Imagen 4 fast
curl -X POST https://api.codegateway.dev/v1/images/generations \
  -H "Authorization: Bearer $CODEGATEWAY_PROD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"imagen-4.0-fast-generate-001","prompt":"...","n":1,"response_format":"b64_json"}'

# Gemini 2.5 Flash Image
curl -X POST https://api.codegateway.dev/v1/images/generations \
  -H "Authorization: Bearer $CODEGATEWAY_PROD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gemini-2.5-flash-image","prompt":"...","aspect_ratio":"1:1","response_format":"b64_json"}'

# GPT Image 2 medium 1536x1024
curl -X POST https://api.codegateway.dev/v1/images/generations \
  -H "Authorization: Bearer $CODEGATEWAY_PROD_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-image-2","prompt":"...","size":"1536x1024","quality":"medium","response_format":"b64_json"}'
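
The same calls can be made from Python with only the standard library. This is a minimal sketch, not the cookbook's code: the helper names are hypothetical, and the response shape (`{"data": [{"b64_json": ...}]}`) is assumed from the OpenAI Images API protocol that the endpoint is described as speaking.

```python
import base64
import json
import urllib.request

ENDPOINT = "https://api.codegateway.dev/v1/images/generations"

def build_request(api_key, model, prompt, **params):
    """Build the POST request; extra params (size, quality, aspect_ratio, n) go in the body."""
    body = {"model": model, "prompt": prompt, "response_format": "b64_json", **params}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )

def generate_png(api_key, model, prompt, out_path, timeout=120, **params):
    """Send the request and write the first returned image to out_path.

    Assumes the OpenAI Images response shape: {"data": [{"b64_json": "..."}]}.
    """
    req = build_request(api_key, model, prompt, **params)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    with open(out_path, "wb") as fh:
        fh.write(base64.b64decode(payload["data"][0]["b64_json"]))
    return out_path
```

A call would look like `generate_png(os.environ["CODEGATEWAY_PROD_API_KEY"], "gpt-image-2", "...", "/tmp/hero.png", size="1536x1024", quality="medium")`. The 120-second default timeout is a guess sized to the ~56-71s GPT Image 2 latency measured above.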

Mixing all three in one spec

In production specs, just pick the model per scenario; the runner handles the rest:

yaml

- name: blog-hero
  model: gpt-image-2
  quality: medium
  size: "1536x1024"
  prompt: A wide cinematic editorial illustration...
  out: /tmp/blog-hero.png

- name: architecture-diagram
  model: gemini-2.5-flash-image
  aspect: "1:1"
  prompt: |
    A clean three-layer architecture diagram with labels "Network Layer" / "TLS Layer" / "Inference Layer"...
  out: /tmp/architecture.png

- name: hero-decoration
  model: imagen-4.0-fast-generate-001
  aspect: "1:1"
  prompt: A minimal abstract purple gradient...
  out: /tmp/decoration.png

The full spec runner is open source at Whitedit/code-gateway-cookbook · image-gen/: a single generate.py auto-routes by the model field to the right body shape (Imagen uses aspect_ratio; GPT Image uses size + quality).
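
The routing idea can be sketched in a few lines. This is a hypothetical illustration of mapping one spec entry to a request body, not the actual generate.py; the `build_body` name and the default values are assumptions.

```python
def build_body(entry):
    """Translate one spec entry (a dict parsed from the YAML above) into a request body.

    Hypothetical sketch; the real generate.py may differ.
    """
    model = entry["model"]
    body = {"model": model, "prompt": entry["prompt"].strip(), "response_format": "b64_json"}
    if model.startswith("gpt-image"):
        # OpenAI route: pixel size plus quality tier (defaults here are assumptions).
        body["size"] = entry.get("size", "1024x1024")
        body["quality"] = entry.get("quality", "medium")
    else:
        # Imagen and Gemini entries carry `aspect`, sent as aspect_ratio.
        body["aspect_ratio"] = entry.get("aspect", "1:1")
    return body

hero = build_body({"model": "gpt-image-2", "quality": "medium",
                   "size": "1536x1024", "prompt": "A wide cinematic editorial illustration..."})
decoration = build_body({"model": "imagen-4.0-fast-generate-001", "aspect": "1:1",
                         "prompt": "A minimal abstract purple gradient..."})
```

Keeping the branch on the model prefix in one function means a spec entry never has to know which upstream it is ultimately talking to.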

Cost retrospective: 16 images in production

Sprint 4b blog-image dogfooding ran 4 blog posts / 16 images / 4 mixed models:

Model	Count	Use	Cost
Imagen 4 std	4	Heroes (1024×1024)	$0.16
Imagen 4 fast	3	Body photoreal	$0.06
Gemini 2.5 Flash Image	9	Infographics / step diagrams (with text)	$0.54
GPT Image 2 medium	4	Hero 16:9 regen	$0.164

Total: $0.92 / 16 published images (20 generations, counting the 4 hero regens) / 4 models / one key.

If we'd run all 16 through any single model:

All Imagen 4 fast: $0.32 (cheapest, but text labels would die)
All Gemini 2.5 Flash Image: ~$0.96 (great text, but heroes weak)
All GPT Image 2 medium: ~$0.66 (slow + UI-strong, but heroes overpriced)

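Mixing was not the cheapest line item: all-Imagen-4-fast ($0.32) and all-GPT-Image-2-medium (~$0.66) both cost less than the $0.92 mixed run, which also includes $0.16 of heroes that were later regenerated. What mixing buys is quality at each scenario's optimum for under a dollar, without the failure mode each single-model run carries. That's why "pick by scenario" pays off.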
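
The figures in this section are count × price sums and can be re-derived from the table. The sketch below assumes Gemini's per-token billing averages out to the ~$0.06 per image used in the scoring table; the variable names are illustrative.

```python
# Per-image prices used in the retrospective (USD); Gemini uses the ~$0.06 average.
PRICE = {
    "Imagen 4 std": 0.04,
    "Imagen 4 fast": 0.02,
    "Gemini 2.5 Flash Image": 0.06,
    "GPT Image 2 medium": 0.041,
}
# Generation counts from the table above. They sum to 20 because the
# 4 heroes were generated twice (Imagen 4 std first, then GPT Image 2).
COUNTS = {
    "Imagen 4 std": 4,
    "Imagen 4 fast": 3,
    "Gemini 2.5 Flash Image": 9,
    "GPT Image 2 medium": 4,
}
PUBLISHED_IMAGES = 16

mixed_total = round(sum(PRICE[m] * n for m, n in COUNTS.items()), 3)
single_model = {m: round(PRICE[m] * PUBLISHED_IMAGES, 3) for m in PRICE}

print(f"mixed run: ${mixed_total}")
for model, cost in single_model.items():
    print(f"all {model}: ${cost}")
```

Rerunning this with your own counts and current prices is a quick way to check whether a mixed plan or a single-model plan is cheaper for a given batch.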

FAQ

Q: Is Imagen 4 really hardcoded to 1024×1024 in the backend? No horizontal option?

A: Yes. backend/src/proxy-vertex-image.ts comments say aspect_ratio is "accepted but ignored." For 16:9 / 9:16 you must switch to GPT Image 2 (OpenAI route natively supports size). That's why Sprint 4b regenerated 4 heroes.

Q: Gemini 2.5 Flash Image is per-token while Imagen is flat — which is more economical?

A: On price alone, Imagen std wins or ties: a flat $0.04 per image versus ~$0.04–0.08 for Gemini, and the flat price is predictable. The real decision is about text: Gemini's Chinese-text rendering is stable, though GPT Image 2 edges it in our tests. If your prompt is short and has no text → Imagen is more reliable. Long prompt + text labels → GPT Image 2 first, Gemini as backup.

Q: GPT Image 2 is so slow (~60s/image) — worth it?

A: Depends. Definitely not for batch decoration — slow with no advantage. Worth it in two scenarios: (1) you need native 16:9 / 9:16 (other models don't); (2) product UI mockups (GPT Image 2 is clearly stronger here).

Q: Can I dispatch the same prompt to all three and pick the best result (top-of-3)?

A: Yes, but cost spikes. The 16-image dogfood at top-of-3 would be ~$2.76 (roughly 3× the mixed run) instead of $0.92. Whether it's worth it depends on the stakes: blog cover heroes are worth top-of-3, body illustrations aren't.

Q: Imagen 4 ultra is 50% pricier than std ($0.06 vs $0.04). Worth it?

A: Mostly no, unless it's a top-of-page / marketing primary image that gets seen 10K+ times. A daily blog hero with std is fine; spend the difference on top-of-3 instead for better ROI.

Q: Can CodeGateway's key be used in Cursor / Figma / etc.?

A: The image API endpoint speaks OpenAI Images API protocol (/v1/images/generations) plus Vertex passthrough — so any tool compatible with the OpenAI Images API can plug in directly. In Cursor, Aider, etc.: point OPENAI_BASE_URL at https://api.codegateway.dev/v1 and OPENAI_API_KEY at your sk-cg-xxx.

Q: Will models suddenly disappear or get repriced?

A: Upstream Google / OpenAI handle their own announcement cadence. The CodeGateway gateway tracks upstream changes — when upstream reprices, our CMS price table updates and new prices show on /pricing immediately. In-flight requests settle at the price at submission time.

Q: Who owns the image copyright?

A: Depends on the upstream model's ToS:

Imagen / Gemini: Google's Generative AI Terms; commercial use mostly allowed, some content (real people, etc.) restricted.
GPT Image: OpenAI's Usage Policies; user owns the generated content.

CodeGateway as a gateway makes no copyright claim on generated images — what you generate is yours. But copyright ≠ compliance: don't use AI-gen for public figures / trademark infringement / platform-ToS-violating content.
