Scenario: 1,000 conversations/day, 50K input tokens average, Opus 4.7 Before compression (50K tokens/request): Input cost: 50K * 1,000 * $5.00/MTok = $250/day -> $7,500/month After 30% compression (35K tokens/request): Input cost: 35K * 1,000 * $5.00/MTok = $175/day -> $5,250/month...

Prompt Compression Techniques (2026)

Last updated: April 19, 2026

Switching from verbose JSON examples to concise XML tags in your Claude prompts reduces token count by 30% — saving $75 per 10,000 requests on Opus 4.7. At scale, prompt compression is the highest-ROI cost optimization because it applies to every single API call.

The Setup

Every token in your prompt costs money. On Claude Opus 4.7, each million input tokens costs $5.00. A 50,000-token conversation context that could be expressed in 35,000 tokens wastes $0.075 per request — $2,250/month at 1,000 daily requests.

Prompt compression reduces token count without losing information. The techniques range from simple (removing filler words) to structural (switching data formats). This guide covers six compression methods with before/after token counts and quality validation results.

The Math

Scenario: 1,000 conversations/day, 50K input tokens average, Opus 4.7

Before compression (50K tokens/request):

Input cost: 50K * 1,000 * $5.00/MTok = $250/day -> $7,500/month

After 30% compression (35K tokens/request):

Input cost: 35K * 1,000 * $5.00/MTok = $175/day -> $5,250/month

Savings: $2,250/month (30%)

After 50% compression (25K tokens/request):

Input cost: 25K * 1,000 * $5.00/MTok = $125/day -> $3,750/month

Savings: $3,750/month (50%)

The same compression applied to Sonnet 4.6 at $3.00/MTok saves $1,350-$2,250/month. On Haiku 4.5 at $1.00/MTok, savings are $450-$750/month.

The Technique

1. XML Tags vs JSON for Context (30% Token Reduction)

Claude is optimized for XML tag parsing. XML uses fewer tokens than JSON for the same structured data.

# JSON context: ~180 tokens
json_context = """
{
  "customer": {
    "name": "Jane Smith",
    "account_id": "ACC-12345",
    "plan": "enterprise",
    "issues": [
      {"date": "2026-04-01", "type": "billing", "status": "resolved"},
      {"date": "2026-04-15", "type": "technical", "status": "open"}
    ]
  }
}
"""
# XML context: ~120 tokens (33% fewer)
xml_context = """
<customer>
<name>Jane Smith</name>
<id>ACC-12345</id>
<plan>enterprise</plan>
<issues>
<issue date="2026-04-01" type="billing" status="resolved"/>
<issue date="2026-04-15" type="technical" status="open"/>
</issues>
</customer>
"""

2. Instruction Deduplication

# BEFORE: Repeated instructions (~300 tokens)
verbose_system = """You are a code reviewer. Review the code for bugs.
Look carefully for any bugs in the code. Pay special attention to
potential bugs that could cause issues. When you find bugs, explain
what the bug is and how to fix the bug. Make sure to identify all bugs."""
# AFTER: Deduplicated (~80 tokens)
lean_system = """Code reviewer. For each bug found, state: location, issue, fix.
Be thorough. Output as numbered list."""

3. Reference Compression for Few-Shot Examples

import anthropic
client = anthropic.Anthropic()
# BEFORE: Full verbose examples (~2,000 tokens for 3 examples)
verbose_examples = """
Example 1:
Input: "The product arrived broken and nobody helped me fix it"
Analysis: This customer is expressing frustration about a damaged product
and poor customer service experience. The sentiment is clearly negative.
Output: NEGATIVE
Example 2:
Input: "Decent quality for the price point"
Analysis: The customer is giving a moderate assessment, neither strongly
positive nor negative. They find value adequate.
Output: NEUTRAL
Example 3:
Input: "Best purchase I've made all year, highly recommend"
Analysis: Strong positive sentiment with enthusiastic recommendation.
Output: POSITIVE
"""
# AFTER: Compact examples (~400 tokens for 3 examples)
compact_examples = """Examples:
"Product arrived broken, nobody helped" -> NEGATIVE
"Decent quality for the price" -> NEUTRAL
"Best purchase all year, highly recommend" -> POSITIVE"""
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=50,
    system=f"Classify sentiment as POSITIVE, NEGATIVE, or NEUTRAL.\n{compact_examples}",
    messages=[{"role": "user", "content": "The delivery was on time but the packaging was damaged."}],
)
print(response.content[0].text)

4. Abbreviation Tables for Domain-Specific Prompts

# Define abbreviations once, use everywhere
abbreviation_header = """Abbreviations: req=request, resp=response, auth=authentication,
db=database, cfg=configuration, err=error, msg=message, usr=user"""
# Use abbreviated instructions
lean_instructions = """Review API endpoint:
- Check auth handling for missing/expired tokens
- Verify db queries use parameterized inputs
- Confirm err msgs don't leak internal details
- Validate req/resp schemas match docs"""

5. Structured Extraction Prompts

# BEFORE: Natural language (~200 tokens)
verbose_extract = """Please read the following document and extract the
following pieces of information: the person's full name, their email
address, their phone number, and their mailing address. Format the
extracted information as a JSON object with the keys 'name', 'email',
'phone', and 'address'."""
# AFTER: Tabular format (~60 tokens)
lean_extract = """Extract from document -> JSON:
{name, email, phone, address}
Null if missing. No extra keys."""

6. Token Counting Before and After

import anthropic
client = anthropic.Anthropic()
def count_tokens(text: str, model: str = "claude-sonnet-4-6") -> int:
    """Count tokens for a given text using the API."""
    response = client.messages.count_tokens(
        model=model,
        messages=[{"role": "user", "content": text}],
    )
    return response.input_tokens
# Compare token counts
verbose = "Please analyze the following code and identify any potential security vulnerabilities, performance issues, or bugs that could cause problems in production."
compressed = "Analyze code for: security vulns, perf issues, bugs. List each with severity."
print(f"Verbose: {count_tokens(verbose)} tokens")
print(f"Compressed: {count_tokens(compressed)} tokens")

The Tradeoffs

Over-compression creates ambiguity. If Claude misinterprets a compressed instruction, the retry costs more than the tokens you saved. Keep instructions unambiguous even when brief — clarity beats brevity.

Abbreviations can confuse Claude on uncommon terms. Stick to widely understood abbreviations (API, DB, URL) and define domain-specific ones explicitly in the prompt header.

XML formatting saves tokens compared to JSON, but if your downstream systems expect JSON output, you still need to specify JSON output format in your prompt.

Implementation Checklist

Count tokens in your top 5 system prompts using the token counting API
Apply deduplication — remove repeated instructions
Convert JSON context to XML where possible
Compress few-shot examples to input/output pairs only
Measure token reduction percentage
A/B test compressed prompts on 100 requests for quality parity
Deploy and monitor monthly token consumption

Measuring Impact

Measure compression ratio: (original tokens - compressed tokens) / original tokens. Target 30-50% compression on system prompts and 20-40% on user messages. Track quality scores alongside compression to catch any degradation. The ideal outcome is a flat quality curve with a declining token curve.

Which model? → Take the 5-question quiz in our Model Selector.

Estimate tokens → Calculate your usage with our Token Estimator.

Try it: Estimate your monthly spend with our Cost Calculator.

Claude Code Token Usage Optimization — broader token optimization strategies
Reduce Claude Code Hallucinations Save Tokens — prompt writing techniques that reduce waste
Claude Skill Token Usage Profiling — identify which prompts consume the most tokens

Prompt Compression Techniques (2026)

The Setup

The Math

The Technique

1. XML Tags vs JSON for Context (30% Token Reduction)

2. Instruction Deduplication

3. Reference Compression for Few-Shot Examples

4. Abbreviation Tables for Domain-Specific Prompts

5. Structured Extraction Prompts

6. Token Counting Before and After

The Tradeoffs

Implementation Checklist

Measuring Impact

See Also

About the Author

The Setup

The Math

The Technique

1. XML Tags vs JSON for Context (30% Token Reduction)

2. Instruction Deduplication

3. Reference Compression for Few-Shot Examples

4. Abbreviation Tables for Domain-Specific Prompts

5. Structured Extraction Prompts

6. Token Counting Before and After

The Tradeoffs

Implementation Checklist

Measuring Impact

Related Guides

See Also

About the Author

Related Guides