Cost-Efficient Multi-Agent Coding Workflows
A 5-agent coding workflow optimized for cost runs at $9.00 per sprint instead of $25.00. The three techniques: model routing (Opus for reasoning, Haiku for execution), prompt caching (90% savings on shared context), and iteration budgets (cap runaway loops). Combined, they reduce fleet operating cost by 64% without sacrificing output quality.
The Setup
You have a multi-agent system that handles end-to-end feature development: one agent writes code, one writes tests, one handles documentation, one reviews for quality, and one manages the CI/CD pipeline integration. All five currently run Opus 4.7 at $5.00/$25.00 per million tokens.
After analyzing the workflow, you discover that the test writer and documentation agent follow strict templates – ideal for Haiku at $1.00/$5.00. The code writer and reviewer need complex reasoning – Opus stays. The CI/CD agent runs structured commands – Haiku.
Real-world production fleet data: this architecture has produced 2,816 articles at $0.36 per article amortized, running 5 agents in parallel during 30-60 minute sprint sessions at $1,000/month total.
The Math
Before optimization (all Opus 4.7):
- 5 agents x 500K input x $5.00/MTok = $12.50
- 5 agents x 100K output x $25.00/MTok = $12.50
- Total: $25.00/sprint
After optimization (2 Opus + 3 Haiku):
| Agent | Role | Model | Input Cost | Output Cost | Total |
|---|---|---|---|---|---|
| 1 | Code writer | Opus 4.7 | $2.50 | $2.50 | $5.00 |
| 2 | Test writer | Haiku 4.5 | $0.50 | $0.50 | $1.00 |
| 3 | Documentation | Haiku 4.5 | $0.50 | $0.50 | $1.00 |
| 4 | Code reviewer | Opus 4.7 | $2.50 | $2.50 | $5.00 |
| 5 | CI/CD manager | Haiku 4.5 | $0.50 | $0.50 | $1.00 |
Optimized total: $13.00/sprint (48% savings)
With prompt caching on shared codebase context (50K tokens):
- Without caching: 5 agents x 50K x avg $3.40/MTok = $0.85
- With caching: 1 write ($0.043) + 4 reads ($0.068) = $0.111
- Additional savings: $0.74/sprint
Fully optimized: $12.26/sprint (51% savings)
Production fleet equivalent (subscription model):
- 5 Claude Max 20x subscriptions: $1,000/month flat
- Sprint 11 output: 10 new articles + 69 rewrites + 5 rescues = 84 file changes
- Sprint 12 output: 7 research articles, 12,431 words
- Cost per article: $3.30-$4.71 depending on complexity
The Technique
Here is a complete cost-optimized multi-agent coding workflow:
import anthropic
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
client = anthropic.Anthropic()
OPUS = "claude-opus-4-7-20250415"
HAIKU = "claude-haiku-4-5-20251001"
@dataclass
class WorkflowConfig:
"""Configuration for a cost-optimized coding workflow."""
code_writer_model: str = OPUS # Complex reasoning required
test_writer_model: str = HAIKU # Template-based
doc_writer_model: str = HAIKU # Structured output
reviewer_model: str = OPUS # Complex reasoning required
cicd_model: str = HAIKU # Command execution
class CodingWorkflow:
def __init__(self, config: WorkflowConfig = None):
self.config = config or WorkflowConfig()
self.cost_log = []
def _call(self, model: str, role: str, system: str,
message: str, max_tokens: int = 4096) -> str:
response = client.messages.create(
model=model,
max_tokens=max_tokens,
system=[{
"type": "text",
"text": system,
"cache_control": {"type": "ephemeral"}
}],
messages=[{"role": "user", "content": message}]
)
usage = response.usage
prices = {"input": 5.0 if "opus" in model else 1.0,
"output": 25.0 if "opus" in model else 5.0}
cost = (usage.input_tokens * prices["input"] +
usage.output_tokens * prices["output"]) / 1e6
self.cost_log.append({
"role": role, "model": model.split("-")[1],
"tokens_in": usage.input_tokens,
"tokens_out": usage.output_tokens,
"cost": cost
})
return response.content[0].text
def write_code(self, spec: str, codebase: str) -> str:
return self._call(
self.config.code_writer_model, "code_writer",
"You are a senior developer. Write production code.",
f"Specification:\n{spec}\n\nExisting code:\n{codebase}"
)
def write_tests(self, code: str, spec: str) -> str:
return self._call(
self.config.test_writer_model, "test_writer",
"Write unit tests. Follow pytest conventions. "
"Cover edge cases.",
f"Code to test:\n{code}\n\nSpec:\n{spec}",
max_tokens=2048
)
def write_docs(self, code: str, spec: str) -> str:
return self._call(
self.config.doc_writer_model, "doc_writer",
"Write API documentation in markdown. Include "
"parameters, return values, and examples.",
f"Code:\n{code}\n\nSpec:\n{spec}",
max_tokens=2048
)
def review_code(self, code: str, tests: str) -> str:
return self._call(
self.config.reviewer_model, "reviewer",
"Review code for correctness, security, and "
"performance. Be specific about issues.",
f"Code:\n{code}\n\nTests:\n{tests}"
)
def setup_cicd(self, code: str, config: str) -> str:
return self._call(
self.config.cicd_model, "cicd",
"Generate CI/CD configuration. Output YAML only.",
f"Code:\n{code}\n\nExisting config:\n{config}",
max_tokens=1024
)
def run(self, spec: str, codebase: str) -> dict:
"""Execute the full workflow."""
# Phase 1: Write code (Opus - needs reasoning)
code = self.write_code(spec, codebase)
# Phase 2: Parallel - tests, docs, CI/CD (Haiku - structured)
with ThreadPoolExecutor(max_workers=3) as executor:
test_future = executor.submit(self.write_tests, code, spec)
doc_future = executor.submit(self.write_docs, code, spec)
cicd_future = executor.submit(self.setup_cicd, code, "")
tests = test_future.result()
docs = doc_future.result()
cicd = cicd_future.result()
# Phase 3: Review (Opus - needs reasoning)
review = self.review_code(code, tests)
total_cost = sum(e["cost"] for e in self.cost_log)
return {
"code": code, "tests": tests, "docs": docs,
"cicd": cicd, "review": review,
"total_cost": f"${total_cost:.4f}",
"cost_breakdown": self.cost_log
}
# Run workflow
workflow = CodingWorkflow()
result = workflow.run(
spec="Add user authentication endpoint with JWT",
codebase=open("src/main.py").read()
)
print(f"Total cost: {result['total_cost']}")
for entry in result['cost_breakdown']:
print(f" {entry['role']}: ${entry['cost']:.4f} ({entry['model']})")
The Tradeoffs
Cost-optimized workflows trade simplicity for savings:
- Increased code complexity: A single-model workflow is 10 lines. A multi-model workflow with caching, budgets, and routing is 100+ lines. The maintenance cost is non-trivial.
- Quality monitoring overhead: Each model tier needs separate quality benchmarks. Haiku may degrade on tasks you expected to be simple.
- Debugging complexity: When output quality drops, you need to identify whether the issue is in the code writer (Opus), test writer (Haiku), or the workflow coordination.
- Latency trade-offs: Parallel Haiku workers are faster than sequential Opus calls, but the overall workflow may be slower if the Opus reviewer blocks on Haiku output quality issues.
Implementation Checklist
- Map your coding workflow into discrete agent roles
- Classify each role as reasoning-heavy (Opus) or execution-heavy (Haiku)
- Implement the multi-model workflow with per-agent tracking
- Add prompt caching on shared context (codebase, specifications)
- Set iteration budgets per agent to prevent runaway loops
- Run 10 test sprints and compare output quality against all-Opus baseline
- Calculate actual savings and validate against projected savings
Measuring Impact
Track workflow economics:
- Cost per feature: Total workflow cost for one complete feature (code + tests + docs + review). Compare across sprints.
- Model distribution: Percentage of total tokens on each model. Target: 40% Opus, 60% Haiku for most coding workflows.
- Quality parity: Review sample outputs from optimized workflow vs all-Opus baseline. Acceptance rate should stay above 90%.
- Sprint velocity: Articles or features completed per sprint-hour. Should remain stable after cost optimization (same output, less cost).