An Honest Receipt: 16 Blog Hero Images for $0.92 in an Hour
Author: CodeGateway team · Tested in May 2026
TL;DR: The next thing you do after finishing a batch of blog drafts is normally images. Hand-curating from stock libraries is slow, licensing is uncertain, and styles rarely match across posts. AI image APIs feel like the answer — but the question that stops most teams cold is simply: what does this actually cost? Is the workflow painful? This post is the receipt. One real dogfooding run: 4 long-form posts, 16 images, 5 different upstream models, $0.92 total, under an hour end-to-end. All numbers came from real production API calls with no cherry-picking. The spec.yaml and generation script live at the end of the post — copy and adapt.
Table of Contents
- The setup: 4 posts, 16 missing images
- Model selection: which model for which image
- Three non-obvious prompt rules
- Spec design: one YAML for 16 images
- The receipt: model mix, per-image cost, time
- The mistake: the cost of regenerating heroes from 1:1 to 16:9
- Reproducible spec and script
- FAQ
- Further reading
The setup: 4 posts, 16 missing images
The story starts with a concrete situation: four long-form blog drafts (8,000–12,000 words each) ready to publish, with the pre-publish checklist stuck on "images." Each post needed at minimum:
- 1 hero image (top-of-page banner, also used for OG / social card)
- 3 in-body illustrations (architecture, infographics, step diagrams)
Sixteen images total. The hand-curated paths:
- Free stock libraries (Unsplash, Pexels): hard to search by concept, inconsistent styles, popular terms are saturated.
- Paid stock libraries (Shutterstock, iStock): money solves it but at $10–30 per image.
- Design contractor: matches the brief but adds 2–3 days of lead time and hundreds of dollars.
Compounding the problem: these are technical posts, and stock libraries don't have "frustrated developer staring at a connection-timeout terminal" as a tag. So the decision: AI image APIs. The question becomes: which models, what mix, and how low can the total go?
Model selection: which model for which image
CodeGateway exposes 6 image generation models across two upstream routes:
| Model | Route | Pricing | Strengths |
|---|---|---|---|
| imagen-4.0-fast-generate-001 | Vertex (Imagen) | $0.02/image | Speed, cost, photorealistic & concept illustration |
| imagen-4.0-generate-001 | Vertex (Imagen) | $0.04/image | Standard quality |
| imagen-4.0-ultra-generate-001 | Vertex (Imagen) | $0.06/image | Top quality |
| gemini-2.5-flash-image | Vertex (Gemini) | per-token (~$0.04–0.08/image) | Strong text rendering, infographic labels |
| gpt-image-2 | OpenAI | $0.005–$0.211/image (by quality × aspect) | UI / cartoon, native 16:9 support |
| gpt-image-1 | OpenAI | $0.009–$0.200/image | Same idea, slightly weaker |
The decision matrix that emerged from this dogfooding:
- Image contains rendered text labels? → gemini-2.5-flash-image
- Photorealistic concept illustration? → imagen-4.0-fast-generate-001 (the default)
- Hero needs native 16:9 horizontal? → gpt-image-2 medium 1536×1024
- Premium one-off? → imagen-4.0-generate-001
- None of the above? → ultra, or rethink the prompt

The 16-image allocation:
- 4 heroes (initial pass with imagen-4.0-generate-001 standard, $0.04 × 4 = $0.16)
- 3 photoreal illustrations (imagen-4.0-fast-generate-001, $0.02 × 3 = $0.06)
- 9 labeled infographics (gemini-2.5-flash-image, ~$0.06 × 9 = $0.54)
Estimated total: $0.76. The initial pass came in at $0.76 exactly, plus $0.16 to regenerate the heroes (we'll get to why), arriving at $0.92.
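The decision matrix above is easy to encode in a spec runner. A minimal sketch — the boolean flags are illustrative and not part of the spec format, which simply names a model per entry:

```python
def pick_model(has_text_labels=False, needs_wide_hero=False,
               premium=False, photoreal=True):
    """Map image requirements to a model, following the decision matrix.
    Flag names are hypothetical; check order encodes rule priority."""
    if has_text_labels:
        return "gemini-2.5-flash-image"        # best at rendered labels
    if needs_wide_hero:
        return "gpt-image-2"                   # only route with native 16:9
    if premium:
        return "imagen-4.0-generate-001"       # standard-quality one-off
    if photoreal:
        return "imagen-4.0-fast-generate-001"  # cheapest default
    return None  # rethink the prompt (or try the ultra tier)

print(pick_model(has_text_labels=True))   # infographic entry
print(pick_model(needs_wide_hero=True))   # hero entry
```

Check order matters: a labeled infographic should go to Gemini even if it is also photorealistic, so the text-label rule comes first.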
Three non-obvious prompt rules
After 16 images, the three loudest signals about how to write prompts:
1. Use color names, not hex codes
Writing `#8B5CF6` in a prompt has a non-trivial chance of getting the literal characters rendered into the image. You'll end up with a piece of art that has `#8B5CF6` floating across it.
What works: deep violet purple, lavender gradient, emerald green accent. Color words map to the model's training distribution; hex codes get treated as text strings.
2. Skip emoji decorations
Putting "with rocket emoji 🚀" or "festive ✨ vibe" in a prompt makes the model attempt to render the emoji glyphs — usually as muddy or garbled text in the image.
What works: describe the visual element. a small upward arrow, a subtle sparkle effect, warm celebratory tone. Hand the model the semantics; let it render.
3. Repeat the aspect intent in the prompt
Imagen 4's backend hardcodes 1024×1024 — passing aspect_ratio: "16:9" is "accepted but ignored" (their internal docs say so explicitly). Even when going through gpt-image-2 with size: 1536x1024, it still helps to write wide cinematic horizontal composition in the prompt itself. Without that, the model still composes the subject as if it were square, and you get awkward negative space when the renderer pads.
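Rule 3 is worth automating so no entry forgets it. A sketch of a helper that appends a composition hint matching the target aspect — the helper name and hint table are illustrative, not part of the runner shown later:

```python
# Hypothetical helper: repeat the aspect intent inside the prompt text,
# since some backends accept but ignore the aspect_ratio parameter.
ASPECT_HINTS = {
    "16:9": "wide cinematic horizontal composition",
    "9:16": "tall vertical portrait composition",
    "1:1": "centered square composition",
}

def with_aspect_hint(prompt: str, aspect: str) -> str:
    hint = ASPECT_HINTS.get(aspect)
    if not hint:
        return prompt  # unknown aspect: leave the prompt untouched
    return f"{prompt.rstrip().rstrip(',')}, {hint}"

print(with_aspect_hint("A minimalist flat illustration of a developer", "16:9"))
```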
Spec design: one YAML for 16 images
The minimum useful tool is a YAML spec → batched API calls → saved PNGs + cost report. The spec entries look like:
```yaml
- name: 297-hero
  model: imagen-4.0-generate-001
  prompt: |
    A minimalist flat illustration showing a frustrated developer at a laptop,
    the laptop screen displaying a terminal window with red Connection Timeout
    error text, soft purple gradient background...
  aspect: "16:9"
  size: "1792x1024"
  out: /tmp/sprint4b-images/297-hero.png

- name: 297-arch
  model: gemini-2.5-flash-image
  prompt: |
    A clean three-layer architecture diagram, horizontally stacked panels:
    top panel labeled "Network Layer" (purple stripe),
    middle panel "TLS Layer" (lighter violet),
    bottom panel "Inference Layer" (deep violet)...
  aspect: "1:1"
  out: /tmp/sprint4b-images/297-arch.png

# ... 16 entries total
```

Why YAML over hardcoded Python:
- Readable. Reviewing prompts and tweaking copy is a text edit, not a code edit.
- Version-controllable. The spec lives in git alongside the post. Diffs and rollbacks are real.
- Re-runnable per item. Edit one prompt, rerun that entry, leave the rest.
- Cost-previewable. A `--dry-run` mode totals the cost from the pricing matrix without firing API calls.
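The dry-run math is simple enough to sketch here. Per-image prices are taken from the pricing table above; Gemini is per-token upstream, so its ~$0.06 observed average stands in as an estimate:

```python
# Flat per-image prices; gemini-2.5-flash-image is token-billed upstream,
# so we approximate with the ~$0.06 average seen in this batch.
PRICES = {
    "imagen-4.0-fast-generate-001": 0.02,
    "imagen-4.0-generate-001": 0.04,
    "gemini-2.5-flash-image": 0.06,   # estimate
    "gpt-image-2": 0.041,             # medium quality, 1536x1024
}

def dry_run_total(spec_entries):
    """Sum estimated cost for a parsed spec without firing API calls."""
    total = 0.0
    for e in spec_entries:
        total += PRICES.get(e["model"], 0.0) * int(e.get("n", 1))
    return round(total, 3)

# Pass 1 of this post: 4 standard heroes + 3 fast illustrations + 9 infographics
spec = (
    [{"model": "imagen-4.0-generate-001"}] * 4
    + [{"model": "imagen-4.0-fast-generate-001"}] * 3
    + [{"model": "gemini-2.5-flash-image"}] * 9
)
print(dry_run_total(spec))  # 0.76
```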
Running the whole batch is one command:
```shell
python3 generate.py --spec image-spec.yaml --api-key "$CODEGATEWAY_PROD_API_KEY"
```

Generation phase: under 5 minutes serial (per-image latency 7 s fastest, 18 s slowest).
The receipt: model mix, per-image cost, time
Pass 1: 16 images
| Model | Count | Unit | Subtotal | Avg latency |
|---|---|---|---|---|
| imagen-4.0-generate-001 | 4 | $0.040 | $0.160 | 10–12 s |
| imagen-4.0-fast-generate-001 | 3 | $0.020 | $0.060 | 7–9 s |
| gemini-2.5-flash-image | 9 | ~$0.060 | $0.540 | 8–17 s |
| **Pass 1 subtotal** | 16 | — | $0.760 | avg ~10 s |
API calls totaled about 2 minutes 30 seconds serial. Going concurrent could compress that to 30 seconds, but for a one-off blog batch, serial is fine.
Pass 2: 4 hero regenerations at 16:9
Pass 1 heroes used imagen-4.0-generate-001, fixed at 1024×1024. Our blog template renders heroes at 16:9 — the 1:1 source got cropped top and bottom, losing the subject. So we regenerated 4 heroes at 16:9, switching to gpt-image-2 medium:
| Model | Count | Unit | Subtotal | Avg latency |
|---|---|---|---|---|
| gpt-image-2 medium 1536×1024 | 4 | $0.041 | $0.164 | 56–71 s |
| **Pass 2 subtotal** | 4 | — | $0.164 | avg ~62 s |
gpt-image-2 is roughly 6× slower than Imagen. But OpenAI's route is the only one with native 16:9 support, so the trade-off is forced.
Total
```text
Pass 1 (16 images):     $0.760
Pass 2 (4 hero regens): $0.164
────────────────────────────────
Total:                  $0.924  (call it $0.92)
Per-image average:      ~$0.046
```

Plus roughly 30–45 minutes of human time for spec authoring, prompt tuning, JPEG conversion, CMS upload, and cover assignment. Round to one hour end-to-end.
The mistake: the cost of regenerating heroes from 1:1 to 16:9
The biggest lesson from this dogfooding was that hero aspect ratio wasn't planned in pass 1. The full timeline:
- The spec specified `aspect: "16:9"`, but Imagen 4's backend marks that field as "accepted but ignored" — both the docs and the implementation make this clear (see the `backend/src/proxy-vertex-image.ts` comments). Result: 4 heroes at 1024×1024.
- Uploaded to CMS. The blog template renders heroes inside a 16:9 container; the 1:1 source got top-and-bottom cropped, with subject loss.
- Regenerated, this time on gpt-image-2 with `size: "1536x1024"`. But gpt-image-2 takes ~60 seconds per image. Four images, four minutes — significantly slower than Imagen.
- $0.16 + 4 minutes paid as a tax for not getting it right the first time.
Takeaways:
- Before the first batch, confirm the rendering container's aspect ratio. Hero containers tend to be 16:9; in-body images are usually 1:1; OG/social cards are 1.91:1 (close to 16:9).
- If a model doesn't support a target aspect ratio, switch models in pass 1 rather than discovering it during render review.
- Different models speak different parameter languages: Imagen ignores `aspect_ratio`; gpt-image-2 uses `size`. The dispatch logic in your spec runner should handle this — five extra minutes writing it saves hours of rework later.
Reproducible spec and script
Minimal generation runner (single file, no external deps)
```python
#!/usr/bin/env python3
"""Minimal image gen runner. Reads YAML spec, calls /v1/images/generations,
saves PNG to disk, prints cost totals. Public domain."""
import argparse, base64, json, os, sys, time
import urllib.request
from pathlib import Path

def post(url, body, headers):
    data = json.dumps(body, ensure_ascii=False).encode("utf-8")
    req = urllib.request.Request(url, data=data, headers=headers, method="POST")
    with urllib.request.urlopen(req, timeout=180) as resp:
        return json.loads(resp.read().decode("utf-8"))

def build_body(entry):
    body = {
        "model": entry["model"],
        "prompt": entry["prompt"],
        "n": int(entry.get("n", 1)),
        "response_format": "b64_json",
    }
    # OpenAI route: size; Vertex route: aspect_ratio
    if entry["model"].startswith("gpt-image"):
        if "size" in entry: body["size"] = entry["size"]
        if "quality" in entry: body["quality"] = entry["quality"]
    else:
        if "aspect" in entry: body["aspect_ratio"] = entry["aspect"]
    return body

def main():
    ap = argparse.ArgumentParser()
    ap.add_argument("--spec", required=True)
    ap.add_argument("--api-key", default=os.environ.get("CODEGATEWAY_PROD_API_KEY"))
    args = ap.parse_args()
    # ... full source ~280 lines, see workspace
```

The complete version (cost estimation, dry-run, error handling, model dispatch, ~280 lines) is open-sourced at Whitedit/code-gateway-cookbook — one generate.py plus a spec-example.yaml, MIT licensed, copy and adapt.
Spec template (edit the prompts, run)
```yaml
# Hero, native 16:9 horizontal (OpenAI route only)
- name: my-hero
  model: gpt-image-2
  quality: medium
  size: "1536x1024"
  prompt: |
    A wide cinematic flat editorial illustration of <YOUR SCENE>,
    soft purple gradient background, modern minimal tech aesthetic,
    no text, no logos, professional editorial composition.
  out: /tmp/images/my-hero.png

# Photorealistic / concept illustration (cheapest)
- name: my-concept
  model: imagen-4.0-fast-generate-001
  prompt: |
    A minimal abstract <SUBJECT>, soft purple gradient,
    clean editorial style, no text.
  aspect: "1:1"
  out: /tmp/images/my-concept.png

# Infographic with text labels (Gemini's strength)
- name: my-infographic
  model: gemini-2.5-flash-image
  prompt: |
    A clean infographic on white background:
    <Step 1 title>, <Step 2 title>, <Step 3 title>,
    purple connecting lines, modern minimal flat design.
  aspect: "1:1"
  out: /tmp/images/my-infographic.png
```

Wiring up the API
```shell
export CODEGATEWAY_PROD_API_KEY="sk-cg-xxx"  # from https://www.codegateway.dev signup
python3 generate.py --spec image-spec.yaml --api-key "$CODEGATEWAY_PROD_API_KEY"
```

New accounts get a $2 starter credit — at this post's per-image average of ~$0.046, that funds two complete 16-image dogfooding rounds. Enough to wring out your prompt style and decide whether to fund production usage.
FAQ
Q: Can I just use one model and skip the matrix?
A: You can, but you'll trade off either style or coverage. Imagen 4 fast is great for photorealistic concepts but weak at rendering text. Gemini 2.5 Flash Image labels infographics well but doesn't match Imagen on aesthetic photorealism. gpt-image-2 handles UI mockups but is slower and more expensive. Mixing models gives the best coverage for blog imagery.
Q: Can I push the cost lower?
A: Yes — drop all heroes to imagen-4.0-fast ($0.02 vs $0.04 standard). Sixteen images all on fast: $0.32. But heroes are the first thing visitors see, and the standard tier's quality bump usually pays off in social CTR. The $0.16 premium on heroes is a reasonable expense.
Q: Are these models token-billed?
A: Two regimes. Imagen models are per-image flat ($0.02/$0.04/$0.06) regardless of prompt length or rendered resolution. Gemini 2.5 Flash Image is per-token (input + text output + image output as separate modalities), typically landing $0.04–$0.08 per image. GPT Image is per-image flat, indexed by quality × aspect.
Q: What about 100+ image batches?
A: Same spec format, more time. To compress wall time, switch generate.py from serial to async (a few asyncio lines turn 100 images into 1–2 minutes). Watch your RPM budget — the limit varies by your CodeGateway tier.
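Since the stdlib-only runner above uses synchronous urllib rather than an async HTTP client, the same wall-time win comes more cheaply from a thread pool than from asyncio. A minimal sketch, where `generate_one` stands in for the real per-entry API call:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def generate_one(entry):
    """Stand-in for the runner's per-entry API call.
    This sketch sleeps instead of hitting the network."""
    time.sleep(0.01)  # simulates per-image latency
    return entry["name"], "ok"

entries = [{"name": f"img-{i}"} for i in range(16)]

# 8 requests in flight at once; keep max_workers under your tier's RPM budget.
results = {}
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(generate_one, e) for e in entries]
    for f in as_completed(futures):
        name, status = f.result()
        results[name] = status

print(len(results))  # 16
```

Swapping `generate_one` for the runner's `post(...)` call is the only change needed; results arrive out of order, which is why they're keyed by entry name.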
Q: How do I keep style consistent?
A: Two paths:
- Pin one model + one prompt template per post (color, composition, and visual style descriptors stay constant; only the subject changes).
- Use a single reference image as a style anchor (gemini-2.5-flash-image accepts up to 5 reference images for style-locked editing).
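The first path can be as simple as one template string per post in which only the subject varies. A sketch, borrowing the style descriptors from the spec template above (the subjects are placeholders):

```python
# One prompt template per post: style descriptors stay constant,
# only the {subject} slot changes between images.
STYLE_TEMPLATE = (
    "A minimalist flat editorial illustration of {subject}, "
    "soft purple gradient background, modern minimal tech aesthetic, "
    "no text, no logos, professional editorial composition."
)

prompts = [
    STYLE_TEMPLATE.format(subject=s)
    for s in ("a developer debugging a timeout",
              "an API gateway routing requests",
              "a cost dashboard trending downward")
]
print(prompts[0])
```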
Q: What about copyright?
A: Read each upstream's commercial terms. Google's and OpenAI's image APIs broadly allow commercial use (recheck the latest terms each time). CodeGateway as a gateway makes no claim on the images you generate — what you generate is yours.
Q: Will failed generations bill me twice?
A: No. Failed requests (4xx / 5xx) are free; only successful responses with b64_json or url deduct from balance. Our 16 images all passed on first attempt.
Q: Can I dry-run the budget before generating?
A: Yes. generate.py --dry-run reads the spec, totals expected cost from the pricing matrix, and prints a budget — no API calls fired. Adjust the spec until the number looks right.
Further reading
- Top-up and billing guide — tier markup, $2 starter credit, Stripe top-up
- Tier markup explainer — 90-day rolling window, floor 1.2x
- Image Generation API guide (coming soon) — full reference
- Anthropic — Claude API getting started
- Google — Imagen 4 model card
- OpenAI — Image Generation guide
- Full `generate.py` source: Whitedit/code-gateway-cookbook · image-gen/
Anyone who's written a technical blog knows the breakdown: 80% content, 10% images, 10% links and SEO metadata. The image 10% used to mean either pestering a designer or hand-painting illustrations for hours. One hour, $1, and a single spec file later — it doesn't anymore. Get the pipeline running once, and the time you save goes straight back into the content. That's what good tools are supposed to do.
