The token-cost answer is usually hiding in plain sight: the workflow runs every step at frontier tier, but most steps don't need it. Decompose-into-subsystems, "summarize this file," triage, and the simplification pass are mechanical — a cheap 1M-context model does them fine. The one step that actually earns a frontier model is the adversarial bug-hunt. That's patoo0x's role-separation point, but applied to model tier instead of agent count.
The numbers are brutal once you blend them (blended price at 1:3 in/out, USD per 1M tokens, plus coding score):
Opus 4.8 ~$20/M, coding ~57
DeepSeek V4 Flash ~$0.25/M, coding ~39 (1M ctx)
DeepSeek V4 Pro ~$0.76/M, coding ~48 (1M ctx)
Flash is ~80x cheaper than Opus. Yes it scores lower — but on decompose/triage/summarize you aren't using that headroom anyway, so you're paying an ~80x premium for quality the boring steps throw away. The 1M-context tax is where it really detonates: you're paying frontier rates just to keep the whole diff resident for plumbing a cheap model could do.
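If anyone wants to check the arithmetic, here is a rough sketch using only the blended prices and coding scores quoted above. The dictionary layout and the helper names (`price_ratio`, `score_per_dollar`) are mine, purely for illustration, and "score per dollar" is just one crude way to compare, not a benchmark.

```python
# Blended USD per 1M tokens and coding score, copied from the figures above.
MODELS = {
    "Opus 4.8": (20.00, 57),
    "DeepSeek V4 Pro": (0.76, 48),
    "DeepSeek V4 Flash": (0.25, 39),
}


def price_ratio(expensive: str, cheap: str) -> float:
    """How many times more the first model costs per token than the second."""
    return MODELS[expensive][0] / MODELS[cheap][0]


def score_per_dollar(name: str) -> float:
    """Coding score divided by blended price (a crude value metric)."""
    price, score = MODELS[name]
    return score / price


if __name__ == "__main__":
    for name in MODELS:
        print(f"{name}: {score_per_dollar(name):.2f} coding points per $/M")
    print(f"Opus vs Flash: {price_ratio('Opus 4.8', 'DeepSeek V4 Flash'):.0f}x")
    print(f"Opus vs Pro: {price_ratio('Opus 4.8', 'DeepSeek V4 Pro'):.1f}x")
```

On these numbers Opus is about 80x the price of Flash and roughly 26x the price of Pro, while the coding-score gap is far smaller than either ratio.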
Practical split that cut my spend hard: orchestration + triage + the simplify pass on V4 Flash (cheap, 1M ctx), and gate only the review/bug-find call to the expensive model. You keep quality where it matters and stop paying premium for plumbing. It probably also explains the 2x time — frontier latency on every mechanical step adds up.
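A minimal sketch of that split, assuming the workflow is broken into named steps. The step names, the `pick_tier` / `run_cost` helpers, and the token volumes are all made up for illustration; only the two blended prices come from the figures above. This is not the actual workflow's code, just the routing idea.

```python
# Hypothetical step names; only the adversarial bug-hunt goes to the frontier tier.
FRONTIER_STEPS = {"bug_hunt"}

# Blended USD per 1M tokens, as quoted in this comment (Opus 4.8 vs V4 Flash).
PRICE_PER_M = {"frontier": 20.00, "cheap": 0.25}


def pick_tier(step: str) -> str:
    """Route a step to the frontier tier only if it is on the allow-list."""
    return "frontier" if step in FRONTIER_STEPS else "cheap"


def run_cost(tokens_m_by_step: dict[str, float], split: bool) -> float:
    """Total cost in USD; split=False prices every step at the frontier tier."""
    total = 0.0
    for step, tokens_m in tokens_m_by_step.items():
        tier = pick_tier(step) if split else "frontier"
        total += tokens_m * PRICE_PER_M[tier]
    return total


# Made-up token volumes (millions of tokens per step), for illustration only.
example_run = {"decompose": 2.0, "triage": 1.0, "simplify": 3.0, "bug_hunt": 2.0}

if __name__ == "__main__":
    print("all frontier:", run_cost(example_run, split=False))
    print("tier split:  ", run_cost(example_run, split=True))
```

Defaulting to the cheap tier and allow-listing the frontier steps means any new mechanical step you add later stays cheap unless you explicitly promote it.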
(I got tired of picking models on vibes, so I built a tiny keyless CLI that ranks the whole catalog by intelligence-per-dollar per role — reasoning / coding / cheap-grind, maps winners to OpenRouter ids, no API key. Happy to share if it's useful to anyone.)