Web Search Costs $10 per 1,000 Searches

Written by Michael Lip · Solo founder of Zovo · $400K+ on Upwork · 100% JSS Join 50+ builders · More at zovo.one

Claude’s web search tool costs $10.00 per 1,000 searches – that’s $0.01 per individual search. But the search fee is only half the story. Each search result gets injected into the conversation as input tokens, and those tokens get billed at the model’s standard input rate. A typical web page contributes about 2,500 tokens. At Sonnet 4.6 rates ($3.00/MTok), each search actually costs $0.01 + $0.0075 = $0.0175 when you factor in the token ingestion. Scale that to 1,000 searches per day and you’re looking at $525/month in combined fees.

The Setup

Claude’s server-side tools include web_search and web_fetch. Web search has an explicit per-search fee. Web fetch has no additional surcharge – you only pay for the tokens the fetched content consumes as input. The distinction matters for cost optimization: if you already know the URL you need, web_fetch is cheaper than web_search because you skip the $0.01 search fee entirely. The API response includes a server_tool_use object that reports exactly how many search and fetch requests occurred, so you can track these costs precisely.

The Math

An RAG application that searches the web before answering every question:

Before (web search on every request):

After optimization (cache search results, skip known-URL searches):

Monthly savings: $270 (51% reduction)

The Technique

Build a search result cache and URL-first routing layer to minimize paid searches.

import anthropic
import hashlib
import json
import time

client = anthropic.Anthropic()

class SearchCostOptimizer:
    def __init__(self, cache_ttl_seconds: int = 3600):
        self.cache = {}  # query_hash -> (result, timestamp)
        self.ttl = cache_ttl_seconds
        self.known_urls = {}  # topic -> url mapping

    def _hash_query(self, query: str) -> str:
        return hashlib.sha256(query.lower().strip().encode()).hexdigest()[:16]

    def _is_fresh(self, timestamp: float) -> bool:
        return (time.time() - timestamp) < self.ttl

    def get_cached(self, query: str) -> dict | None:
        """Return cached search result if fresh."""
        key = self._hash_query(query)
        if key in self.cache:
            result, ts = self.cache[key]
            if self._is_fresh(ts):
                return result
            del self.cache[key]
        return None

    def cache_result(self, query: str, result: dict) -> None:
        key = self._hash_query(query)
        self.cache[key] = (result, time.time())

    def search_with_cache(self, query: str) -> dict:
        """Search with caching to avoid duplicate search fees."""
        cached = self.get_cached(query)
        if cached:
            return cached  # saved $0.01 + token costs

        # Check if we have a known URL for this topic
        for topic, url in self.known_urls.items():
            if topic in query.lower():
                # Use web_fetch instead of web_search: no $0.01 fee
                return self._fetch_url(url)

        # Fall through to paid search
        result = self._do_search(query)
        self.cache_result(query, result)
        return result

    def _do_search(self, query: str) -> dict:
        """Perform actual web search ($0.01 per search)."""
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=[{"type": "web_search_20250305"}],
            messages=[{"role": "user", "content": f"Search for: {query}"}]
        )
        return {
            "content": response.content,
            "usage": {
                "input_tokens": response.usage.input_tokens,
                "output_tokens": response.usage.output_tokens,
                "searches": response.usage.server_tool_use.get(
                    "web_search_requests", 0) if hasattr(
                    response.usage, "server_tool_use") else 0
            }
        }

    def _fetch_url(self, url: str) -> dict:
        """Fetch a known URL directly (no search fee)."""
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=[{"type": "web_search_20250305"}],
            messages=[{
                "role": "user",
                "content": f"Fetch and summarize: {url}"
            }]
        )
        return {"content": response.content}

    def get_stats(self) -> dict:
        return {
            "cached_entries": len(self.cache),
            "known_urls": len(self.known_urls),
            "estimated_savings_per_cache_hit": "$0.0175"
        }

# Usage
optimizer = SearchCostOptimizer(cache_ttl_seconds=1800)
optimizer.known_urls = {
    "python docs": "https://docs.python.org/3/",
    "anthropic api": "https://docs.anthropic.com/",
}

result = optimizer.search_with_cache("python list comprehension docs")

The Tradeoffs

Caching search results means you might serve stale information. For time-sensitive queries (stock prices, news, weather), cache TTLs need to be short (minutes, not hours), which reduces cache hit rates and savings. URL-based fetching skips the search engine’s ranking algorithm, so you might miss more relevant pages that a search would surface. Monitor answer quality metrics alongside cost savings to find the right cache TTL for your use case.

Implementation Checklist

Measuring Impact

Track three metrics daily: total paid searches (from server_tool_use), cache hit rate (searches avoided), and web_fetch substitutions (searches replaced). Your target is reducing paid searches by 40-60%. At $0.01 per search saved, plus $0.0075 in avoided token ingestion, each avoided search saves $0.0175. Report weekly: (baseline_searches - current_searches) x $0.0175 x 7 = weekly dollar savings.

Be aware that search token ingestion costs scale with the model you use. A search result adding 2,500 input tokens costs $0.0125 on Opus 4.7 ($5.00/MTok) but only $0.0025 on Haiku 4.5 ($1.00/MTok). If your search-augmented pipeline runs on Opus, the token ingestion cost often exceeds the $0.01 search fee itself, making token-level optimization more impactful than reducing search count. Consider routing search-heavy workloads to Sonnet 4.6 ($3.00/MTok) where the combined per-search cost drops to $0.01 + $0.0075 = $0.0175, compared to $0.01 + $0.0125 = $0.0225 on Opus.