Prompt runs, scored.
Renderbench turns image-generation prompts into a real version history. Iterate toward the prompt that wins — with an AI judge grading every version on a rubric.


how it works
One thread, many versions, one winner.


01
Create an experiment
Pick a model, write the baseline prompt. v1 runs immediately.


02
Iterate
Tweak the prompt and run v2, v3, v4. Every version keeps its own image + score.


03
Let the judge score
A vision model grades each version on fidelity, composition, color, and artifacts.


04
Copy the winner
Paste the best prompt into your own platform. No vendor lock-in, no wrapper SDK.
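
For readers who think in code, the four steps above can be sketched as a small data model. This is only an illustration, not Renderbench's actual implementation: the names `Experiment`, `Version`, `add_version`, and `winner`, the 0-10 score scale, and the unweighted average are all assumptions. Only the four rubric axes come from the steps above.

```python
from dataclasses import dataclass, field

# The four rubric axes named in step 03. The 0-10 scale is an assumption.
AXES = ("fidelity", "composition", "color", "artifacts")


@dataclass
class Version:
    number: int                # 1 for v1, 2 for v2, ...
    prompt: str
    scores: dict[str, float]   # one judge score per axis

    @property
    def total(self) -> float:
        # Unweighted mean across axes (assumed; a real judge may weight them).
        return sum(self.scores[axis] for axis in AXES) / len(AXES)


@dataclass
class Experiment:
    model: str
    versions: list[Version] = field(default_factory=list)

    def add_version(self, prompt: str, scores: dict[str, float]) -> Version:
        # Reject score sets that skip an axis, so totals stay comparable.
        missing = [axis for axis in AXES if axis not in scores]
        if missing:
            raise ValueError(f"missing scores for: {missing}")
        version = Version(len(self.versions) + 1, prompt, dict(scores))
        self.versions.append(version)
        return version

    def winner(self) -> Version:
        # Highest mean score wins; on a tie the earliest version is kept.
        if not self.versions:
            raise ValueError("experiment has no versions yet")
        return max(self.versions, key=lambda v: v.total)


# Hypothetical model name, prompts, and scores, for illustration only.
exp = Experiment(model="example-image-model")
exp.add_version(
    "a red fox in snow",
    {"fidelity": 6, "composition": 5, "color": 7, "artifacts": 6},
)
exp.add_version(
    "a red fox in fresh snow, golden hour, shallow depth of field",
    {"fidelity": 8, "composition": 7, "color": 8, "artifacts": 7},
)
best = exp.winner()
print(f"v{best.number}: {best.prompt}")
```

Keeping the per-axis scores on each version, rather than only a total, is what lets you compare two versions axis by axis and see where the gain came from.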


what it’s not
Not a playground. Not a wrapper.
A prompt — iterated.
Most prompt playgrounds forget the tweak you made two runs ago. Renderbench keeps every run as v1, v2, v3 in the same thread, so the diff is right there.
A judge — structured.
Images look fine until you put them side by side. The AI judge scores each version on four axes (fidelity, composition, color, and artifacts), so you see why one is better, not just which.


start today
