Claude Orchestrator-Worker Cost Optimization
An all-Opus agent fleet costs $25.00 per sprint in API tokens. Replace 4 of 5 agents with Haiku workers and the cost drops to $9.00 – a 64% reduction. The orchestrator (Opus 4.7 at $5.00/$25.00 per MTok) handles complex reasoning and task decomposition. The workers (Haiku 4.5 at $1.00/$5.00 per MTok) execute straightforward tasks at 5x lower cost per token.
The Setup
You are building an API-driven multi-agent system for automated code review. The system needs one agent to analyze PR complexity, decompose the review into subtasks, and synthesize findings. The remaining agents execute individual review subtasks: check style compliance, verify test coverage, scan for security issues, and validate documentation.
Only the orchestration agent needs Opus-level reasoning. The worker agents follow explicit instructions and produce structured output – tasks where Haiku 4.5 performs well at one-fifth the cost.
Current all-Opus fleet: $25.00 per sprint. After optimization: $9.00 per sprint. Monthly savings at 30 sprints: $480.
The Math
All-Opus fleet (5 agents), per sprint:
- 5 agents x 500K input tokens x $5.00/MTok = $12.50
- 5 agents x 100K output tokens x $25.00/MTok = $12.50
- Total: $25.00/sprint
Opus orchestrator + 4 Haiku workers, per sprint:
- Orchestrator: 500K x $5.00/MTok + 100K x $25.00/MTok = $5.00
- 4 workers: 4 x 500K x $1.00/MTok + 4 x 100K x $5.00/MTok = $2.00 + $2.00 = $4.00
- Total: $9.00/sprint
Savings: $16.00/sprint (64%) Monthly (30 sprints): $480 saved
At Sonnet 4.6 workers instead of Haiku:
- 4 workers: 4 x 500K x $3.00/MTok + 4 x 100K x $15.00/MTok = $6.00 + $6.00 = $12.00
- Total: $17.00/sprint (32% savings vs all-Opus)
The Haiku worker configuration saves 2x more than Sonnet workers.
The Technique
The orchestrator-worker pattern requires clear separation between reasoning tasks and execution tasks:
import anthropic
from dataclasses import dataclass
client = anthropic.Anthropic()
ORCHESTRATOR_MODEL = "claude-opus-4-7-20250415" # $5/$25 per MTok
WORKER_MODEL = "claude-haiku-4-5-20251001" # $1/$5 per MTok
@dataclass
class ReviewTask:
task_type: str
instructions: str
context: str
def orchestrate_review(pr_diff: str) -> list[ReviewTask]:
"""Opus decomposes the PR into subtasks for workers."""
response = client.messages.create(
model=ORCHESTRATOR_MODEL,
max_tokens=4096,
system=(
"You are a code review orchestrator. Analyze the PR diff and "
"create specific review subtasks. Output valid JSON array."
),
messages=[
{"role": "user", "content": f"PR Diff:\n{pr_diff}"}
]
)
import json
tasks_raw = json.loads(response.content[0].text)
return [
ReviewTask(
task_type=t["type"],
instructions=t["instructions"],
context=t.get("context", "")
)
for t in tasks_raw
]
def execute_task(task: ReviewTask, pr_diff: str) -> str:
"""Haiku worker executes a single review subtask."""
response = client.messages.create(
model=WORKER_MODEL,
max_tokens=2048,
system=(
f"You are a {task.task_type} reviewer. "
f"Follow these instructions exactly:\n{task.instructions}"
),
messages=[
{"role": "user", "content": f"Code:\n{pr_diff}\n\n{task.context}"}
]
)
return response.content[0].text
def synthesize_review(
pr_diff: str,
task_results: list[dict]
) -> str:
"""Opus synthesizes worker results into final review."""
results_text = "\n\n".join(
f"## {r['type']}\n{r['result']}" for r in task_results
)
response = client.messages.create(
model=ORCHESTRATOR_MODEL,
max_tokens=4096,
system="Synthesize these review results into a cohesive PR review.",
messages=[
{"role": "user", "content": f"PR:\n{pr_diff}\n\nResults:\n{results_text}"}
]
)
return response.content[0].text
def review_pr(pr_diff: str) -> str:
"""Full orchestrator-worker review pipeline."""
# Step 1: Opus decomposes (high-cost reasoning)
tasks = orchestrate_review(pr_diff)
print(f"Orchestrator created {len(tasks)} subtasks")
# Step 2: Haiku workers execute (low-cost execution)
results = []
for task in tasks:
result = execute_task(task, pr_diff)
results.append({"type": task.task_type, "result": result})
print(f" Worker completed: {task.task_type}")
# Step 3: Opus synthesizes (high-cost reasoning)
final = synthesize_review(pr_diff, results)
print("Orchestrator synthesized final review")
return final
# Run
diff = open("pr_diff.txt").read()
review = review_pr(diff)
print(review)
Cost tracking per pipeline run:
def track_costs(pipeline_name: str, calls: list[dict]) -> dict:
"""Track actual costs for an orchestrator-worker pipeline."""
prices = {
ORCHESTRATOR_MODEL: {"in": 5.00, "out": 25.00},
WORKER_MODEL: {"in": 1.00, "out": 5.00}
}
total_cost = 0
breakdown = {"orchestrator": 0, "workers": 0}
for call in calls:
model = call["model"]
p = prices[model]
cost = (call["input_tokens"] * p["in"] +
call["output_tokens"] * p["out"]) / 1e6
total_cost += cost
if model == ORCHESTRATOR_MODEL:
breakdown["orchestrator"] += cost
else:
breakdown["workers"] += cost
return {
"pipeline": pipeline_name,
"total_cost": f"${total_cost:.4f}",
"orchestrator_cost": f"${breakdown['orchestrator']:.4f}",
"worker_cost": f"${breakdown['workers']:.4f}",
"worker_pct": f"{breakdown['workers']/total_cost*100:.0f}%"
}
The Tradeoffs
The orchestrator-worker pattern trades cost for complexity:
- Quality variance: Haiku workers may produce lower quality output on nuanced tasks. Test each subtask type on Haiku before deploying. Security-sensitive reviews may require Sonnet or Opus workers.
- Orchestration overhead: The orchestrator makes two calls (decompose + synthesize) that would not exist in a single-agent architecture. These Opus calls add ~$0.004 per pipeline run.
- Latency increase: Sequential orchestrate-execute-synthesize adds latency compared to a single Opus call. Parallelizing worker execution partially mitigates this.
- Error propagation: A worker failure cascades to the synthesis step. Implement retry logic per worker and fallback to Sonnet for failed Haiku tasks.
Implementation Checklist
- Audit your agent tasks: which require complex reasoning vs structured execution?
- Assign Opus ($5/$25) to reasoning-heavy tasks (planning, analysis, synthesis)
- Assign Haiku ($1/$5) to execution tasks (classification, formatting, extraction)
- Implement the three-phase pipeline: decompose, execute, synthesize
- Add per-model cost tracking to every API call
- Test worker quality on 100 representative tasks before production deployment
- Compare total pipeline cost against the single-model baseline
Measuring Impact
Track cost distribution across the fleet:
- Orchestrator-to-worker cost ratio: Orchestrator should account for 40-60% of total cost despite being only 1 of 5 agents. This confirms workers are running on cheaper models.
- Cost per pipeline run: Track actual vs estimated. Target: under $0.01 for simple pipelines, under $0.05 for complex ones.
- Quality parity check: Sample 50 final outputs from the orchestrator-worker pipeline and compare against 50 from the all-Opus baseline. Acceptance rate should be within 5%.
- Monthly total should be 60%+ lower than the equivalent all-Opus fleet.