Web Search Costs $10 per 1,000 Searches
Claude’s web search tool costs $10.00 per 1,000 searches – that’s $0.01 per individual search. But the search fee is only half the story. Each search result gets injected into the conversation as input tokens, and those tokens get billed at the model’s standard input rate. A typical web page contributes about 2,500 tokens. At Sonnet 4.6 rates ($3.00/MTok), each search actually costs $0.01 + $0.0075 = $0.0175 when you factor in the token ingestion. Scale that to 1,000 searches per day and you’re looking at $525/month in combined fees.
The Setup
Claude’s server-side tools include web_search and web_fetch. Web search has an explicit per-search fee. Web fetch has no additional surcharge – you only pay for the tokens the fetched content consumes as input. The distinction matters for cost optimization: if you already know the URL you need, web_fetch is cheaper than web_search because you skip the $0.01 search fee entirely. The API response includes a server_tool_use object that reports exactly how many search and fetch requests occurred, so you can track these costs precisely.
The Math
An RAG application that searches the web before answering every question:
Before (web search on every request):
- 1,000 requests/day, each triggers 1 web search
- Search fees: 1,000 x $0.01 = $10.00/day
- Token ingestion: 1,000 x 2,500 tokens avg x $3.00/MTok = $7.50/day
- Total: $17.50/day = $525/month
After optimization (cache search results, skip known-URL searches):
- 400 unique searches (60% are repeat queries, served from cache)
- 200 direct web_fetch calls (URL already known, no search fee)
- 400 cached results (no API call at all)
- Search fees: 400 x $0.01 = $4.00/day
- Fetch token ingestion: 200 x 2,500 x $3.00/MTok = $1.50/day
- Search token ingestion: 400 x 2,500 x $3.00/MTok = $3.00/day
- Total: $8.50/day = $255/month
Monthly savings: $270 (51% reduction)
The Technique
Build a search result cache and URL-first routing layer to minimize paid searches.
import anthropic
import hashlib
import json
import time
client = anthropic.Anthropic()
class SearchCostOptimizer:
def __init__(self, cache_ttl_seconds: int = 3600):
self.cache = {} # query_hash -> (result, timestamp)
self.ttl = cache_ttl_seconds
self.known_urls = {} # topic -> url mapping
def _hash_query(self, query: str) -> str:
return hashlib.sha256(query.lower().strip().encode()).hexdigest()[:16]
def _is_fresh(self, timestamp: float) -> bool:
return (time.time() - timestamp) < self.ttl
def get_cached(self, query: str) -> dict | None:
"""Return cached search result if fresh."""
key = self._hash_query(query)
if key in self.cache:
result, ts = self.cache[key]
if self._is_fresh(ts):
return result
del self.cache[key]
return None
def cache_result(self, query: str, result: dict) -> None:
key = self._hash_query(query)
self.cache[key] = (result, time.time())
def search_with_cache(self, query: str) -> dict:
"""Search with caching to avoid duplicate search fees."""
cached = self.get_cached(query)
if cached:
return cached # saved $0.01 + token costs
# Check if we have a known URL for this topic
for topic, url in self.known_urls.items():
if topic in query.lower():
# Use web_fetch instead of web_search: no $0.01 fee
return self._fetch_url(url)
# Fall through to paid search
result = self._do_search(query)
self.cache_result(query, result)
return result
def _do_search(self, query: str) -> dict:
"""Perform actual web search ($0.01 per search)."""
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
tools=[{"type": "web_search_20250305"}],
messages=[{"role": "user", "content": f"Search for: {query}"}]
)
return {
"content": response.content,
"usage": {
"input_tokens": response.usage.input_tokens,
"output_tokens": response.usage.output_tokens,
"searches": response.usage.server_tool_use.get(
"web_search_requests", 0) if hasattr(
response.usage, "server_tool_use") else 0
}
}
def _fetch_url(self, url: str) -> dict:
"""Fetch a known URL directly (no search fee)."""
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
tools=[{"type": "web_search_20250305"}],
messages=[{
"role": "user",
"content": f"Fetch and summarize: {url}"
}]
)
return {"content": response.content}
def get_stats(self) -> dict:
return {
"cached_entries": len(self.cache),
"known_urls": len(self.known_urls),
"estimated_savings_per_cache_hit": "$0.0175"
}
# Usage
optimizer = SearchCostOptimizer(cache_ttl_seconds=1800)
optimizer.known_urls = {
"python docs": "https://docs.python.org/3/",
"anthropic api": "https://docs.anthropic.com/",
}
result = optimizer.search_with_cache("python list comprehension docs")
The Tradeoffs
Caching search results means you might serve stale information. For time-sensitive queries (stock prices, news, weather), cache TTLs need to be short (minutes, not hours), which reduces cache hit rates and savings. URL-based fetching skips the search engine’s ranking algorithm, so you might miss more relevant pages that a search would surface. Monitor answer quality metrics alongside cost savings to find the right cache TTL for your use case.
Implementation Checklist
- Log
server_tool_use.web_search_requestsfrom every API response - Calculate your current daily/monthly search spend ($0.01 x total searches)
- Identify repeat queries in your logs (candidates for caching)
- Build a URL knowledge base for your most common search topics
- Implement a search cache with appropriate TTL for your freshness requirements
- Route known-URL queries to web_fetch instead of web_search
- Monitor cache hit rate and answer quality weekly
Measuring Impact
Track three metrics daily: total paid searches (from server_tool_use), cache hit rate (searches avoided), and web_fetch substitutions (searches replaced). Your target is reducing paid searches by 40-60%. At $0.01 per search saved, plus $0.0075 in avoided token ingestion, each avoided search saves $0.0175. Report weekly: (baseline_searches - current_searches) x $0.0175 x 7 = weekly dollar savings.
Be aware that search token ingestion costs scale with the model you use. A search result adding 2,500 input tokens costs $0.0125 on Opus 4.7 ($5.00/MTok) but only $0.0025 on Haiku 4.5 ($1.00/MTok). If your search-augmented pipeline runs on Opus, the token ingestion cost often exceeds the $0.01 search fee itself, making token-level optimization more impactful than reducing search count. Consider routing search-heavy workloads to Sonnet 4.6 ($3.00/MTok) where the combined per-search cost drops to $0.01 + $0.0075 = $0.0175, compared to $0.01 + $0.0125 = $0.0225 on Opus.