Claude Agent Loop Cost: Tokens Per Iteration

Written by Michael Lip · Solo founder of Zovo · $400K+ on Upwork · 100% JSS

Every agent loop iteration carries 245 to 735 tokens of schema overhead for each declared tool, plus a 346-token tool-use system prompt, before a single token of useful work is done. Over 100 iterations on a 5-agent fleet running Opus 4.7, that overhead reaches 409,250 tokens, costing $2.05 at $5 per million input tokens just for the structural tokens that describe available tools. Understanding and minimizing this overhead is essential for cost-efficient agent systems.

The Setup

You have built an autonomous coding agent that loops through read-edit-test cycles. Each iteration sends the tool-use system prompt (346 tokens) and the schema of every declared tool (245-735 tokens each), invokes a tool, receives the result, and decides the next action.

The useful content (the actual code being reviewed, the files being read, the edits being made) sits on top of this fixed overhead. At 100 iterations, the agent has consumed 34,600 tokens (346 × 100) just for system prompt repetition and 47,250 tokens for tool schema definitions, an average of 472.5 schema tokens per iteration, before counting any actual context.

On Opus 4.7, that is $0.41 in overhead per agent, per sprint. Across 5 agents, $2.05 in pure overhead.
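
The figures above can be reproduced with a few lines of arithmetic. The sketch below assumes the $5 per million input token rate used in this article's code and takes the 47,250 schema tokens as a given total; the variable names are illustrative.

```python
# Reproduce the per-sprint overhead figures quoted above.
# Assumption: $5.00 per million input tokens, the rate used elsewhere in this article.
PRICE_PER_MILLION_INPUT = 5.0

iterations = 100
agents = 5

system_prompt_tokens = 346 * iterations  # 34,600 tokens of repeated system prompt
tool_schema_tokens = 47_250              # schema total quoted above

per_agent_tokens = system_prompt_tokens + tool_schema_tokens
fleet_tokens = per_agent_tokens * agents

per_agent_cost = per_agent_tokens * PRICE_PER_MILLION_INPUT / 1e6
fleet_cost = fleet_tokens * PRICE_PER_MILLION_INPUT / 1e6

print(f"Per agent: {per_agent_tokens:,} tokens, ${per_agent_cost:.2f}")
print(f"Fleet:     {fleet_tokens:,} tokens, ${fleet_cost:.2f}")
```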

The Math

Tool overhead per interaction (verified from Anthropic docs):

Component                         Tokens
System prompt (auto/none mode)    346
System prompt (any/tool mode)     313
Bash tool schema                  245
Text editor tool schema           700
Computer use tool schema          735

Per-iteration overhead scenarios:

Agent using bash only: 346 + 245 = 591 tokens/iteration
Agent using text editor: 346 + 700 = 1,046 tokens/iteration
Agent using bash + editor: 346 + 245 + 700 = 1,291 tokens/iteration
Agent using all tools: 346 + 245 + 700 + 735 = 2,026 tokens/iteration

100 iterations, 5 agents (bash + editor):

Cost breakdown per sprint:
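
As a sketch of that breakdown, the snippet below multiplies each component from the table by 100 iterations and 5 agents. It assumes the 346-token system prompt and both schemas are sent on every iteration, and that input is billed at $5 per million tokens (the rate in this article's code). Because it counts both schemas in full, its total is higher than the averaged figure in the introduction.

```python
# Assumptions: token counts from the table above, $5.00 per million input tokens.
PRICE_PER_MILLION_INPUT = 5.0
ITERATIONS = 100
AGENTS = 5

components = {
    "System prompt (auto/none)": 346,
    "Bash tool schema": 245,
    "Text editor tool schema": 700,
}


def sprint_breakdown(components, iterations=ITERATIONS, agents=AGENTS):
    """Return {component: (tokens, cost_usd)} for one sprint across the fleet."""
    rows = {}
    for name, tokens_per_iter in components.items():
        tokens = tokens_per_iter * iterations * agents
        rows[name] = (tokens, tokens * PRICE_PER_MILLION_INPUT / 1e6)
    return rows


rows = sprint_breakdown(components)
total_tokens = sum(tokens for tokens, _ in rows.values())
total_cost = sum(cost for _, cost in rows.values())

for name, (tokens, cost) in rows.items():
    print(f"{name:<28} {tokens:>9,} tokens  ${cost:.4f}")
print(f"{'Total':<28} {total_tokens:>9,} tokens  ${total_cost:.4f}")
```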

The Technique

Reduce tool overhead by declaring only the tools each agent needs and minimizing iteration count:

import anthropic

client = anthropic.Anthropic()


def create_minimal_agent(
    task_type: str,
    model: str = "claude-opus-4-7-20250415"
) -> dict:
    """Configure agent with minimal tool set for its task type."""

    # Only declare tools the agent actually needs
    tool_configs = {
        "code_reader": {
            "tools": ["bash"],  # 245 tokens overhead
            "overhead_per_iter": 591,
        },
        "code_editor": {
            "tools": ["bash", "text_editor"],  # 945 tokens
            "overhead_per_iter": 1291,
        },
        "full_agent": {
            "tools": ["bash", "text_editor", "computer_use"],  # 1680 tokens
            "overhead_per_iter": 2026,
        },
        "chat_only": {
            "tools": [],  # 0 tool tokens
            "overhead_per_iter": 346,
        },
    }

    config = tool_configs.get(task_type, tool_configs["chat_only"])

    return {
        "model": model,
        "tools": config["tools"],
        "overhead_per_iter": config["overhead_per_iter"],
        "estimated_100_iter_overhead": config["overhead_per_iter"] * 100,
        # Cost at $5.00 per million input tokens (the Opus rate used throughout)
        "estimated_100_iter_cost": config["overhead_per_iter"] * 100 * 5.0 / 1e6,
    }


# Compare overhead by configuration
for task_type in ["chat_only", "code_reader", "code_editor", "full_agent"]:
    config = create_minimal_agent(task_type)
    print(f"{task_type:>15}: {config['overhead_per_iter']:>5} tokens/iter, "
          f"${config['estimated_100_iter_cost']:.3f}/100 iters")

Strategies to reduce iteration count (and therefore overhead):

def optimize_agent_loop(
    max_iterations: int = 100,
    early_stop_threshold: int = 3
) -> dict:
    """Agent loop with early stopping and batch operations."""

    iterations = 0
    no_change_count = 0
    total_overhead_tokens = 0
    overhead_per_iter = 1291  # bash + editor

    for i in range(max_iterations):
        iterations += 1
        total_overhead_tokens += overhead_per_iter

        # Simulate agent work
        # In production: call Claude, execute tools, check results
        made_changes = simulate_iteration(i)

        if not made_changes:
            no_change_count += 1
        else:
            no_change_count = 0

        # Early stop: if no changes for N iterations, task is done
        if no_change_count >= early_stop_threshold:
            break

    overhead_cost = total_overhead_tokens * 5.0 / 1e6  # Opus pricing

    return {
        "iterations": iterations,
        "overhead_tokens": total_overhead_tokens,
        "overhead_cost": f"${overhead_cost:.4f}",
        "saved_iterations": max_iterations - iterations,
        "saved_tokens": (max_iterations - iterations) * overhead_per_iter
    }


def simulate_iteration(i: int) -> bool:
    """Simulate whether an iteration makes meaningful changes."""
    # Typically, agents make changes in early iterations
    # and converge (no changes) in later iterations
    return i < 15  # Simulates 15 productive iterations

Batch tool calls to reduce iterations:

# Instead of: read file, analyze, read another file, analyze
# Do: read ALL files in one iteration, analyze all at once
python3 -c "
# Single-iteration overhead: 1,291 tokens (bash + editor)
# 5 separate reads = 5 iterations = 6,455 overhead tokens
# 1 batch read = 1 iteration = 1,291 overhead tokens

separate = 5 * 1291
batched = 1 * 1291
saved = separate - batched
print(f'Separate reads: {separate:,} overhead tokens')
print(f'Batched read: {batched:,} overhead tokens')
print(f'Saved: {saved:,} tokens (\${saved * 5.0 / 1e6:.4f} on Opus)')
"

The Tradeoffs

Minimizing tool overhead has limits:

Implementation Checklist

  1. Audit each agent’s actual tool usage over 10 sprints
  2. Remove tools that an agent never invokes (saves 245-735 tokens/iteration)
  3. Implement early stopping based on consecutive no-change iterations
  4. Batch related tool calls into single iterations where possible
  5. Track total overhead tokens vs useful tokens per agent per sprint
  6. Set iteration limits per agent based on observed convergence patterns
  7. Monitor overhead-to-useful ratio (target: below 15% of total input tokens)
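
Steps 1 and 2 of the checklist can be sketched as a small audit over logged tool calls. The log format here (a flat list of tool names per agent) and the function name `find_unused_tools` are hypothetical; the schema sizes come from the table earlier in the article.

```python
from collections import Counter

# Schema sizes from the table earlier in the article.
SCHEMA_TOKENS = {"bash": 245, "text_editor": 700, "computer_use": 735}


def find_unused_tools(declared, invocation_log):
    """Return declared tools that never appear in the log, plus tokens saved per iteration."""
    used = Counter(invocation_log)
    unused = [tool for tool in declared if used[tool] == 0]
    saved_per_iter = sum(SCHEMA_TOKENS.get(tool, 0) for tool in unused)
    return unused, saved_per_iter


# Hypothetical log: tool names one agent invoked over the audit window.
log = ["bash", "text_editor", "bash", "bash", "text_editor"]
unused, saved = find_unused_tools(["bash", "text_editor", "computer_use"], log)
print(f"Unused: {unused}, saves {saved} tokens/iteration")
```

Dropping a tool is only safe if the audit window is long enough to cover the agent's rarer tasks, which is why the checklist suggests observing several sprints first.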

Measuring Impact

Track overhead efficiency:
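
One way to track this, as a minimal sketch: record overhead tokens and total input tokens per agent per sprint, then compare the ratio against the 15% target from the checklist. The function name and the sample numbers are illustrative, not measurements.

```python
def overhead_report(overhead_tokens, total_input_tokens, target=0.15):
    """Return the overhead share of input tokens and whether it is below the target."""
    if total_input_tokens <= 0:
        raise ValueError("total_input_tokens must be positive")
    ratio = overhead_tokens / total_input_tokens
    return {
        "overhead_ratio": ratio,
        "useful_tokens": total_input_tokens - overhead_tokens,
        "within_target": ratio < target,
    }


# Illustrative sprint: 100 iterations at 1,291 overhead tokens each,
# against a hypothetical 1,000,000 total input tokens.
report = overhead_report(overhead_tokens=129_100, total_input_tokens=1_000_000)
print(f"Overhead: {report['overhead_ratio']:.1%} of input, "
      f"within target: {report['within_target']}")
```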