An RAG application that searches the web before answering every question: Before (web search on every request): 1,000 requests/day, each triggers 1 web search Search fees: 1,000 x $0.01 = $10.00/day Token ingestion: 1,000 x 2,500 tokens avg x $3.00/MTok = $7.50/day Total: $17.50/day = $...

Web Search Costs $10 per 1,000 Searches (2026)

Last updated: April 19, 2026

Claude’s web search tool costs $10.00 per 1,000 searches – that’s $0.01 per individual search. But the search fee is only half the story. Each search result gets injected into the conversation as input tokens, and those tokens get billed at the model’s standard input rate. A typical web page contributes about 2,500 tokens. At Sonnet 4.6 rates ($3.00/MTok), each search actually costs $0.01 + $0.0075 = $0.0175 when you factor in the token ingestion. Scale that to 1,000 searches per day and you’re looking at $525/month in combined fees.

The Setup

Claude’s server-side tools include web_search and web_fetch. Web search has an explicit per-search fee. Web fetch has no additional surcharge – you only pay for the tokens the fetched content consumes as input. The distinction matters for cost optimization: if you already know the URL you need, web_fetch is cheaper than web_search because you skip the $0.01 search fee entirely. The API response includes a server_tool_use object that reports exactly how many search and fetch requests occurred, so you can track these costs precisely.

The Math

An RAG application that searches the web before answering every question:

Before (web search on every request):

1,000 requests/day, each triggers 1 web search
Search fees: 1,000 x $0.01 = $10.00/day
Token ingestion: 1,000 x 2,500 tokens avg x $3.00/MTok = $7.50/day
Total: $17.50/day = $525/month

After optimization (cache search results, skip known-URL searches):

400 unique searches (60% are repeat queries, served from cache)
200 direct web_fetch calls (URL already known, no search fee)
400 cached results (no API call at all)
Search fees: 400 x $0.01 = $4.00/day
Fetch token ingestion: 200 x 2,500 x $3.00/MTok = $1.50/day
Search token ingestion: 400 x 2,500 x $3.00/MTok = $3.00/day
Total: $8.50/day = $255/month

Monthly savings: $270 (51% reduction)

The Technique

Build a search result cache and URL-first routing layer to minimize paid searches.

import anthropic
import hashlib
import json
import time
client = anthropic.Anthropic()
class SearchCostOptimizer:
    def __init__(self, cache_ttl_seconds: int = 3600):
        self.cache = {}  # query_hash -> (result, timestamp)
        self.ttl = cache_ttl_seconds
        self.known_urls = {}  # topic -> url mapping

    def _hash_query(self, query: str) -> str:
        return hashlib.sha256(query.lower().strip().encode()).hexdigest()[:16]
    def _is_fresh(self, timestamp: float) -> bool:
        return (time.time() - timestamp) < self.ttl
    def get_cached(self, query: str) -> dict | None:
        """Return cached search result if fresh."""
        key = self._hash_query(query)
        if key in self.cache:
            result, ts = self.cache[key]
            if self._is_fresh(ts):
                return result
            del self.cache[key]
        return None
    def cache_result(self, query: str, result: dict) -> None:
        key = self._hash_query(query)
        self.cache[key] = (result, time.time())
    def search_with_cache(self, query: str) -> dict:
        """Search with caching to avoid duplicate search fees."""
        cached = self.get_cached(query)
        if cached:
            return cached  # saved $0.01 + token costs

        # Check if we have a known URL for this topic
        for topic, url in self.known_urls.items():
            if topic in query.lower():
                # Use web_fetch instead of web_search: no $0.01 fee
                return self._fetch_url(url)
        # Fall through to paid search
        result = self._do_search(query)
        self.cache_result(query, result)
        return result
    def _do_search(self, query: str) -> dict:
        """Perform actual web search ($0.01 per search)."""
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=[{"type": "web_search_20250305"}],
            messages=[{"role": "user", "content": f"Search for: {query}"}]
        )
        return {
            "content": response.content,
            "usage": {
                "input_tokens": response.usage.input_tokens,
                "output_tokens": response.usage.output_tokens,
                "searches": response.usage.server_tool_use.get(
                    "web_search_requests", 0) if hasattr(
                    response.usage, "server_tool_use") else 0
            }
        }
    def _fetch_url(self, url: str) -> dict:
        """Fetch a known URL directly (no search fee)."""
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1024,
            tools=[{"type": "web_search_20250305"}],
            messages=[{
                "role": "user",
                "content": f"Fetch and summarize: {url}"
            }]
        )
        return {"content": response.content}
    def get_stats(self) -> dict:
        return {
            "cached_entries": len(self.cache),
            "known_urls": len(self.known_urls),
            "estimated_savings_per_cache_hit": "$0.0175"
        }
# Usage
optimizer = SearchCostOptimizer(cache_ttl_seconds=1800)
optimizer.known_urls = {
    "python docs": "https://docs.python.org/3/",
    "anthropic api": "https://docs.anthropic.com/",
}
result = optimizer.search_with_cache("python list comprehension docs")

The Tradeoffs

Caching search results means you might serve stale information. For time-sensitive queries (stock prices, news, weather), cache TTLs need to be short (minutes, not hours), which reduces cache hit rates and savings. URL-based fetching skips the search engine’s ranking algorithm, so you might miss more relevant pages that a search would surface. Monitor answer quality metrics alongside cost savings to find the right cache TTL for your use case.

Implementation Checklist

Log server_tool_use.web_search_requests from every API response
Calculate your current daily/monthly search spend ($0.01 x total searches)
Identify repeat queries in your logs (candidates for caching)
Build a URL knowledge base for your most common search topics
Implement a search cache with appropriate TTL for your freshness requirements
Route known-URL queries to web_fetch instead of web_search
Monitor cache hit rate and answer quality weekly

Measuring Impact

Track three metrics daily: total paid searches (from server_tool_use), cache hit rate (searches avoided), and web_fetch substitutions (searches replaced). Your target is reducing paid searches by 40-60%. At $0.01 per search saved, plus $0.0075 in avoided token ingestion, each avoided search saves $0.0175. Report weekly: (baseline_searches - current_searches) x $0.0175 x 7 = weekly dollar savings.

Be aware that search token ingestion costs scale with the model you use. A search result adding 2,500 input tokens costs $0.0125 on Opus 4.7 ($5.00/MTok) but only $0.0025 on Haiku 4.5 ($1.00/MTok). If your search-augmented pipeline runs on Opus, the token ingestion cost often exceeds the $0.01 search fee itself, making token-level optimization more impactful than reducing search count. Consider routing search-heavy workloads to Sonnet 4.6 ($3.00/MTok) where the combined per-search cost drops to $0.01 + $0.0075 = $0.0175, compared to $0.01 + $0.0125 = $0.0225 on Opus.

Which model? → Take the 5-question quiz in our Model Selector.

Try it: Estimate your monthly spend with our Cost Calculator.

Claude API Tool Use Function Calling Deep Dive
Claude Skills Token Optimization Reduce API Costs
Advanced Claude Skills with Tool Use
Claude Code cost guide — Complete guide to Claude Code costs, pricing, and optimization