
An Honest Receipt: 16 Blog Hero Images for $0.92 in an Hour

May 8, 2026

Author: CodeGateway team · Tested in May 2026

TL;DR: After finishing a batch of blog drafts, the next task is normally images. Hand-curating from stock libraries is slow, licensing is uncertain, and styles rarely match across posts. AI image APIs feel like the answer — but the question that stops most teams cold is simply: what does this actually cost? And is the workflow painful? This post is the receipt. One real dogfooding run: 4 long-form posts, 16 images, 5 different upstream models, $0.92 total, under an hour end-to-end. All numbers come from real production API calls with no cherry-picking. The spec.yaml and generation script live at the end of the post — copy and adapt.

Table of Contents

  1. The setup: 4 posts, 16 missing images
  2. Model selection: which model for which image
  3. Three non-obvious prompt rules
  4. Spec design: one YAML for 16 images
  5. The receipt: model mix, per-image cost, time
  6. The mistake: the cost of regenerating heroes from 1:1 to 16:9
  7. Reproducible spec and script
  8. FAQ
  9. Further reading

The setup: 4 posts, 16 missing images

The story starts with a concrete situation: four long-form blog drafts (8,000–12,000 words each) ready to publish, with the pre-publish checklist stuck on "images." Each post needed at minimum:

  • 1 hero image (top-of-page banner, also used for OG / social card)
  • 3 in-body illustrations (architecture, infographics, step diagrams)

Sixteen images total. The hand-curated paths:

  • Free stock libraries (Unsplash, Pexels): hard to search by concept, inconsistent styles, popular terms are saturated.
  • Paid stock libraries (Shutterstock, iStock): money solves it but at $10–30 per image.
  • Design contractor: matches the brief but adds 2–3 days of lead time and hundreds of dollars.

Compounding the problem: these are technical posts, and stock libraries don't have "frustrated developer staring at a connection-timeout terminal" as a tag. So the decision: AI image APIs. The question becomes: which models, what mix, and how low can the total go?

Model selection: which model for which image

CodeGateway exposes 6 image generation models across two upstream routes:

| Model | Route | Pricing | Strengths |
|---|---|---|---|
| imagen-4.0-fast-generate-001 | Vertex (Imagen) | $0.02/image | Speed, cost, photorealistic & concept illustration |
| imagen-4.0-generate-001 | Vertex (Imagen) | $0.04/image | Standard quality |
| imagen-4.0-ultra-generate-001 | Vertex (Imagen) | $0.06/image | Top quality |
| gemini-2.5-flash-image | Vertex (Gemini) | per-token (~$0.04–0.08/image) | Strong text rendering, infographic labels |
| gpt-image-2 | OpenAI | $0.005–$0.211/image (by quality × aspect) | UI / cartoon, native 16:9 support |
| gpt-image-1.5 | OpenAI | $0.009–$0.200/image | Same idea, slightly weaker |

The decision matrix that emerged from this dogfooding:

  • Image contains rendered text labels? → gemini-2.5-flash-image
  • Photorealistic concept illustration? → imagen-4.0-fast-generate-001 (default)
  • Hero needs native 16:9 horizontal? → gpt-image-2 medium 1536×1024
  • Premium one-off? → imagen-4.0-generate-001
  • None of the above? → ultra, or rethink the prompt
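The matrix above is simple enough to encode directly. A minimal sketch, assuming you want it as a function (the model names and prices come from the table; the `choose_model` helper itself is hypothetical, not part of the published tooling):

```python
# Hypothetical encoding of the decision matrix, applied in priority order.
def choose_model(has_text_labels: bool, is_hero_16x9: bool, premium: bool = False) -> str:
    if has_text_labels:
        return "gemini-2.5-flash-image"        # strongest at rendered labels
    if is_hero_16x9:
        return "gpt-image-2"                   # only route with native 16:9
    if premium:
        return "imagen-4.0-generate-001"       # standard-quality one-off
    return "imagen-4.0-fast-generate-001"      # cheap photoreal default

print(choose_model(has_text_labels=False, is_hero_16x9=True))  # gpt-image-2
```

Text labels win first because a garbled infographic is unusable at any price, while aspect ratio and quality are trade-offs you can negotiate.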

The 16-image allocation:

  • 4 heroes (initial pass with imagen-4.0-generate-001 standard, $0.04 × 4 = $0.16)
  • 3 photoreal illustrations (imagen-4.0-fast-generate-001, $0.02 × 3 = $0.06)
  • 9 labeled infographics (gemini-2.5-flash-image, ~$0.06 × 9 = $0.54)

Estimated total: $0.76. The initial pass came in at exactly that, and a $0.164 hero regeneration (we'll get to why) brought the final figure to $0.92.

Three non-obvious prompt rules

After 16 images, the three loudest signals about how to write prompts:

1. Use color names, not hex codes

Writing #8B5CF6 in a prompt has a non-trivial chance of getting the literal characters rendered into the image. You'll end up with a piece of art that has #8B5CF6 floating across it.

What works: deep violet purple, lavender gradient, emerald green accent. Color words map to the model's training distribution; hex codes get treated as text strings.

2. Skip emoji decorations

Writing "with rocket emoji 🚀" or "festive ✨ vibe" in a prompt makes the model attempt to render emoji glyphs — usually as muddy or garbled text in the image.

What works: describe the visual element. a small upward arrow, a subtle sparkle effect, warm celebratory tone. Hand the model the semantics; let it render.

3. Repeat the aspect intent in the prompt

Imagen 4's backend hardcodes 1024×1024 — passing aspect_ratio: "16:9" is "accepted but ignored" (their internal docs say so explicitly). Even when going through gpt-image-2 with size: 1536x1024, it still helps to write wide cinematic horizontal composition in the prompt itself. Without that, the model still composes the subject as if it were square, and you get awkward negative space when the renderer pads.

Spec design: one YAML for 16 images

The minimum useful tool is a YAML spec → batched API calls → saved PNGs + cost report. The spec entries look like:

```yaml
- name: 297-hero
  model: imagen-4.0-generate-001
  prompt: |
    A minimalist flat illustration showing a frustrated developer at a laptop,
    the laptop screen displaying a terminal window with red Connection Timeout
    error text, soft purple gradient background...
  aspect: "16:9"
  size: "1792x1024"
  out: /tmp/sprint4b-images/297-hero.png

- name: 297-arch
  model: gemini-2.5-flash-image
  prompt: |
    A clean three-layer architecture diagram, horizontally stacked panels:
    top panel labeled "Network Layer" (purple stripe),
    middle panel "TLS Layer" (lighter violet),
    bottom panel "Inference Layer" (deep violet)...
  aspect: "1:1"
  out: /tmp/sprint4b-images/297-arch.png

# ... 16 entries total
```

Why YAML over hardcoded Python:

  • Readable. Reviewing prompts and tweaking copy is a text edit, not a code edit.
  • Version-controllable. The spec lives in git alongside the post. Diffs and rollbacks are real.
  • Re-runnable per item. Edit one prompt, rerun that entry, leave the rest.
  • Cost-previewable. A --dry-run mode totals the cost from the matrix without firing API calls.
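The dry-run tally is the simplest of those four properties to sketch. Assuming spec entries are already parsed into dicts, it reduces to a lookup against a flat price table (the prices below are this post's numbers; the `dry_run_total` helper is our sketch, not the published generate.py):

```python
# Per-image prices from this post's pricing matrix. gemini is per-token
# upstream; ~$0.06 is the observed average used for estimation.
PRICE_PER_IMAGE = {
    "imagen-4.0-fast-generate-001": 0.02,
    "imagen-4.0-generate-001": 0.04,
    "imagen-4.0-ultra-generate-001": 0.06,
    "gemini-2.5-flash-image": 0.06,
    "gpt-image-2": 0.041,   # medium 1536x1024 tier
}

def dry_run_total(entries):
    """Sum expected cost from the price table without firing any API calls."""
    total = 0.0
    for e in entries:
        total += PRICE_PER_IMAGE[e["model"]] * int(e.get("n", 1))
    return round(total, 3)

spec = [
    {"name": "297-hero", "model": "imagen-4.0-generate-001"},
    {"name": "297-arch", "model": "gemini-2.5-flash-image"},
]
print(f"estimated total: ${dry_run_total(spec):.2f}")  # estimated total: $0.10
```

Because the estimate never touches the network, you can iterate on the spec until the budget looks right, then run for real.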

Running the whole batch is one command:

```bash
python3 generate.py --spec image-spec.yaml --api-key "$CODEGATEWAY_PROD_API_KEY"
```

Generation phase: under 5 minutes serial (per-image latency 7s fastest, 18s slowest).

The receipt: model mix, per-image cost, time

Pass 1: 16 images

| Model | Count | Unit | Subtotal | Avg latency |
|---|---|---|---|---|
| imagen-4.0-generate-001 | 4 | $0.040 | $0.160 | 10–12 s |
| imagen-4.0-fast-generate-001 | 3 | $0.020 | $0.060 | 7–9 s |
| gemini-2.5-flash-image | 9 | ~$0.060 | $0.540 | 8–17 s |
| **Pass 1 subtotal** | 16 | — | $0.760 | avg ~10 s |

API calls totaled about 2 minutes 30 seconds serial. Going concurrent could compress that to 30 seconds, but for a one-off blog batch, serial is fine.

Pass 2: 4 hero regenerations at 16:9

Pass 1 heroes used imagen-4.0-generate-001, fixed at 1024×1024. Our blog template renders heroes at 16:9 — the 1:1 source got cropped top and bottom, losing the subject. So we regenerated 4 heroes at 16:9, switching to gpt-image-2 medium:

| Model | Count | Unit | Subtotal | Avg latency |
|---|---|---|---|---|
| gpt-image-2 medium 1536×1024 | 4 | $0.041 | $0.164 | 56–71 s |
| **Pass 2 subtotal** | 4 | — | $0.164 | avg ~62 s |

gpt-image-2 is roughly 6× slower than Imagen. But OpenAI's route is the only one with native 16:9 support, so the trade-off is forced.

Total

Pass 1 (16 images):    $0.760
Pass 2 (4 hero regen): $0.164
─────────────────────────────
Total:                 $0.924 (call it $0.92)

Per-image average: ~$0.046

Plus roughly 30–45 minutes of human time for spec authoring, prompt tuning, JPEG conversion, CMS upload, and cover assignment. Round to one hour end-to-end.

The mistake: the cost of regenerating heroes from 1:1 to 16:9

The biggest lesson from this dogfooding was that hero aspect ratio wasn't planned in pass 1. The full timeline:

  1. The spec specified aspect: "16:9", but Imagen 4's backend marks that field as "accepted but ignored" — both the docs and the implementation make this clear (see backend/src/proxy-vertex-image.ts comments). Result: 4 heroes at 1024×1024.
  2. Uploaded to CMS. The blog template renders heroes inside a 16:9 container; the 1:1 source got top-and-bottom cropped, with subject loss.
  3. Regenerated, this time on gpt-image-2 with size: "1536x1024". But gpt-image-2 takes ~60 seconds per image. Four images, four minutes — significantly slower than Imagen.
  4. $0.16 and four minutes paid as a tax for not getting it right the first time.

Takeaways:

  • Before the first batch, confirm each rendering container's aspect ratio. Hero containers tend to be 16:9; in-body images are usually 1:1; OG/social cards are 1.91:1 (close to 16:9).
  • If a model doesn't support a target aspect ratio, switch models in pass 1 rather than discovering it during render review.
  • Different models speak different parameter languages: Imagen ignores aspect_ratio; gpt-image-2 uses size. The dispatch logic in your spec runner should handle this — five extra minutes writing it saves hours of rework later.

Reproducible spec and script

Minimal generation runner (single file, no external deps)

```python
#!/usr/bin/env python3
"""Minimal image gen runner. Reads YAML spec, calls /v1/images/generations,
saves PNG to disk, prints cost totals. Public domain."""

import argparse, base64, json, os, sys, time
import urllib.request
from pathlib import Path

def post(url, body, headers):
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=180) as resp:
        return json.loads(resp.read().decode("utf-8"))

def build_body(entry):
    body = {
        "model": entry["model"],
        "prompt": entry["prompt"],
        "n": int(entry.get("n", 1)),
        "response_format": "b64_json",
    }
    # OpenAI route: size; Vertex route: aspect_ratio
    if entry["model"].startswith("gpt-image"):
        if "size" in entry:
            body["size"] = entry["size"]
        if "quality" in entry:
            body["quality"] = entry["quality"]
    else:
        if "aspect" in entry:
            body["aspect_ratio"] = entry["aspect"]
    return body

def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("--spec", required=True)
    ap.add_argument("--api-key", default=os.environ.get("CODEGATEWAY_PROD_API_KEY"))
    args = ap.parse_args()
    # ... full source ~280 lines, see workspace
```

The complete version (cost estimation, dry-run, error handling, model dispatch, ~280 lines) is open-sourced at Whitedit/code-gateway-cookbook — one generate.py plus a spec-example.yaml, MIT licensed, copy and adapt.

Spec template (edit the prompts, run)

```yaml
# Hero, native 16:9 horizontal (OpenAI route only)
- name: my-hero
  model: gpt-image-2
  quality: medium
  size: "1536x1024"
  prompt: |
    A wide cinematic flat editorial illustration of <YOUR SCENE>,
    soft purple gradient background, modern minimal tech aesthetic,
    no text, no logos, professional editorial composition.
  out: /tmp/images/my-hero.png

# Photorealistic / concept illustration (cheapest)
- name: my-concept
  model: imagen-4.0-fast-generate-001
  prompt: |
    A minimal abstract <SUBJECT>, soft purple gradient,
    clean editorial style, no text.
  aspect: "1:1"
  out: /tmp/images/my-concept.png

# Infographic with text labels (Gemini's strength)
- name: my-infographic
  model: gemini-2.5-flash-image
  prompt: |
    A clean infographic on white background:
    <Step 1 title>, <Step 2 title>, <Step 3 title>,
    purple connecting lines, modern minimal flat design.
  aspect: "1:1"
  out: /tmp/images/my-infographic.png
```

Wiring up the API

```bash
export CODEGATEWAY_PROD_API_KEY="sk-cg-xxx"   # from https://www.codegateway.dev signup
python3 generate.py --spec image-spec.yaml --api-key "$CODEGATEWAY_PROD_API_KEY"
```

New accounts get a $2 starter credit — at this post's per-image average of ~$0.046, that funds two complete 16-image dogfooding rounds. Enough to wring out your prompt style and decide whether to fund production usage.

FAQ

Q: Can I just use one model and skip the matrix?

A: You can, but you'll trade off either style or coverage. Imagen 4 fast is great for photorealistic concepts but weak at rendering text. Gemini 2.5 Flash Image labels infographics well but doesn't match Imagen on aesthetic photoreal. gpt-image-2 handles UI mockups but is slower and more expensive. Mixing models gives the best quality-per-dollar coverage for blog imagery.

Q: Can I push the cost lower?

A: Yes — drop all heroes to imagen-4.0-fast ($0.02 vs std $0.04). Sixteen images all on fast: $0.32. But heroes are the first thing visitors see, and the standard tier's quality bump usually pays off in social CTR. The $0.16 premium on heroes is a reasonable expense.

Q: Are these models token-billed?

A: Two regimes. Imagen models are per-image flat ($0.02/$0.04/$0.06) regardless of prompt length or rendered resolution. Gemini 2.5 Flash Image is per-token (input + text output + image output as separate modalities), typically landing $0.04–$0.08 per image. GPT Image is per-image flat, indexed by quality × aspect.

Q: What about 100+ image batches?

A: Same spec format, more time. To compress wall time, switch generate.py from serial to async (a few asyncio lines turn 100 images into 1–2 minutes). Watch your RPM budget — the limit varies by your CodeGateway tier.
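Here is what that serial-to-async switch might look like. A sketch assuming a `generate_one(entry)` coroutine that wraps the HTTP call (the demo below substitutes a stand-in coroutine); the semaphore caps in-flight requests so a large batch stays under your RPM limit:

```python
import asyncio

async def run_batch(entries, generate_one, max_concurrent=8):
    """Run all spec entries concurrently, at most max_concurrent in flight."""
    sem = asyncio.Semaphore(max_concurrent)

    async def bounded(entry):
        async with sem:
            return await generate_one(entry)

    # gather preserves spec order, so results line up with entries
    return await asyncio.gather(*(bounded(e) for e in entries))

# Demo with a stand-in coroutine; real code would POST to the API instead.
async def fake_generate(entry):
    await asyncio.sleep(0.01)          # stands in for ~10 s of API latency
    return entry["name"]

names = asyncio.run(run_batch([{"name": f"img-{i}"} for i in range(100)], fake_generate))
print(len(names))  # 100
```

Tune `max_concurrent` to your tier's RPM budget rather than launching all 100 requests at once.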

Q: How do I keep style consistent?

A: Two paths:

  • Pin one model + one prompt template per post (color, composition, and visual style descriptors stay constant; only the subject changes).
  • Use a single reference image as a style anchor (gemini-2.5-flash-image accepts up to 5 reference images for style-locked editing).

Q: What about copyright?

A: Read each upstream's commercial terms. Google's and OpenAI's image APIs — the two upstreams used here — broadly allow commercial use (read the latest terms each time). CodeGateway as a gateway makes no claim on the images you generate — what you generate is yours.

Q: Will failed generations bill me twice?

A: No. Failed requests (4xx / 5xx) are free; only successful responses with b64_json or url deduct from balance. Our 16 images all passed on first attempt.

Q: Can I dry-run the budget before generating?

A: Yes. generate.py --dry-run reads the spec, totals expected cost from the pricing matrix, and prints a budget — no API calls fired. Adjust the spec until the number looks right.

Further reading

Anyone who's written a technical blog knows the breakdown: 80% content, 10% images, 10% links and SEO metadata. The image 10% used to mean either pestering a designer or hand-painting illustrations for hours. One hour, $1, and a single spec file later — it doesn't anymore. Get the pipeline running once, and the time you save goes straight back into the content. That's what good tools are supposed to do.