
Token-Efficient Few-Shot Examples (2026)

Last updated: April 19, 2026

Few-shot examples in Claude prompts typically consume 1,500-3,000 tokens. Compressing them to input-output pairs cuts that to 300-500 tokens, a reduction of roughly 80%. For a 2,000-token example set, that saves $80 per 10,000 Opus 4.7 requests. At 30,000 requests/month, that is $240/month in savings from reformatting your examples alone.

The Setup

Few-shot learning works by showing Claude examples of desired input-output behavior. The standard approach includes full input text, detailed analysis, and formatted output for each example. This is effective for accuracy but wasteful for tokens.

Claude usually does not need verbose explanations in examples. For classification and extraction tasks, it can often extract the pattern from minimal input-output pairs just as well. This guide shows five patterns: four ways to compress few-shot examples, and one check that the compression did not cost you the accuracy gains the examples provide.

The Math

Standard verbose few-shot (5 examples):

Average 400 tokens per example (input + analysis + output)
Total: 2,000 tokens
Cost per 10K requests on Opus 4.7: 2,000 * 10,000 * $5.00/MTok = $100.00

Compressed few-shot (5 examples):

Average 80 tokens per example (input -> output only)
Total: 400 tokens
Cost per 10K requests: 400 * 10,000 * $5.00/MTok = $20.00

Savings: $80 per 10K requests (80%)

At 300,000 requests/month on Opus: $2,400/month saved. At 300,000 requests/month on Sonnet 4.6 (the figure implies $3.00/MTok): $1,440/month saved.
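The arithmetic above can be reproduced with a short helper. This is a sketch, not a billing tool: the function names `few_shot_cost` and `savings` are illustrative, the $5.00/MTok Opus rate is the one quoted in this section, and the $3.00/MTok Sonnet rate is an assumption back-calculated from the $1,440 figure. Check current pricing before relying on either.

```python
def few_shot_cost(tokens_per_request: int, requests: int, usd_per_mtok: float) -> float:
    """Input-token cost in USD attributable to the few-shot examples alone."""
    return tokens_per_request * requests * usd_per_mtok / 1_000_000


def savings(verbose_tokens: int, compact_tokens: int, requests: int, usd_per_mtok: float) -> float:
    """USD saved by replacing verbose examples with compact ones."""
    return few_shot_cost(verbose_tokens - compact_tokens, requests, usd_per_mtok)


# Figures from this section: 2,000 tokens compressed to 400 tokens.
per_10k_opus = savings(2_000, 400, 10_000, 5.00)     # rate quoted in the article
monthly_opus = savings(2_000, 400, 300_000, 5.00)
monthly_sonnet = savings(2_000, 400, 300_000, 3.00)  # assumed rate, see lead-in

print(f"Per 10K requests (Opus): ${per_10k_opus:,.2f}")
print(f"Monthly at 300K (Opus): ${monthly_opus:,.2f}")
print(f"Monthly at 300K (Sonnet): ${monthly_sonnet:,.2f}")
```

Substitute your own measured token counts for the 2,000 and 400 placeholders; the savings scale linearly with both request volume and the per-token rate.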

The Technique

Pattern 1: Strip Analysis, Keep Input-Output Pairs

import anthropic
client = anthropic.Anthropic()
# VERBOSE few-shot (~2,000 tokens for 5 examples)
verbose_system = """Classify customer messages into categories.
Example 1:
Input: "I was charged twice on my credit card for the same order placed last Tuesday"
Analysis: The customer is reporting a duplicate charge on their credit card. This is clearly a billing-related issue as it involves incorrect charges. The customer may need a refund.
Category: BILLING
Confidence: 0.95
Example 2:
Input: "The app crashes every time I try to upload a photo larger than 5MB"
Analysis: The customer is experiencing a technical problem with the application. The crash occurs during a specific action (photo upload) with a specific condition (file size > 5MB). This is a reproducible bug.
Category: TECHNICAL
Confidence: 0.92
Example 3:
Input: "Can you add the ability to export reports as PDF files?"
Analysis: The customer is requesting a new feature. They want PDF export functionality for reports. This is not a bug or complaint but a product enhancement suggestion.
Category: FEATURE_REQUEST
Confidence: 0.98
Example 4:
Input: "What are your business hours on weekends?"
Analysis: This is a general inquiry about the company's operating schedule. Not a complaint, bug, or feature request.
Category: GENERAL
Confidence: 0.99
Example 5:
Input: "I can't access my account after you changed the login page yesterday"
Analysis: The customer lost access to their account following a recent update. This is a technical issue related to the login system change.
Category: TECHNICAL
Confidence: 0.90"""
# COMPRESSED few-shot (~400 tokens for 5 examples)
compact_system = """Classify messages: BILLING, TECHNICAL, FEATURE_REQUEST, GENERAL.
Output: {"category": "...", "confidence": 0.0-1.0}
"Charged twice on credit card" -> {"category": "BILLING", "confidence": 0.95}
"App crashes on photo upload >5MB" -> {"category": "TECHNICAL", "confidence": 0.92}
"Add PDF export for reports" -> {"category": "FEATURE_REQUEST", "confidence": 0.98}
"Business hours on weekends?" -> {"category": "GENERAL", "confidence": 0.99}
"Can't access account after login page change" -> {"category": "TECHNICAL", "confidence": 0.90}"""
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=100,
    system=compact_system,
    messages=[{"role": "user", "content": "My invoice shows the wrong tax amount"}],
)
print(response.content[0].text)
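If your examples already live in structured form, the analysis can be stripped mechanically instead of by hand. The sketch below assumes each example is a dict with `input`, `analysis`, `category`, and `confidence` keys; that layout and the helper names `compress_example` and `build_compact_prompt` are hypothetical, not part of any library. It only drops the analysis. Shortening the input text itself, as the compact prompt above does, still needs manual editing.

```python
import json


def compress_example(example: dict) -> str:
    """Render one example as `"input" -> {json}` with the analysis omitted."""
    output = {"category": example["category"], "confidence": example["confidence"]}
    return f'{json.dumps(example["input"])} -> {json.dumps(output)}'


def build_compact_prompt(instruction: str, examples: list) -> str:
    """Join the instruction and the compressed examples, one per line."""
    return "\n".join([instruction] + [compress_example(e) for e in examples])


# Hypothetical structured examples; only the analysis field is discarded.
verbose_examples = [
    {
        "input": "Charged twice on credit card",
        "analysis": "Duplicate charge, so this is a billing issue.",
        "category": "BILLING",
        "confidence": 0.95,
    },
    {
        "input": "App crashes on photo upload >5MB",
        "analysis": "Reproducible crash tied to file size, so a bug.",
        "category": "TECHNICAL",
        "confidence": 0.92,
    },
]

compact_prompt = build_compact_prompt(
    "Classify messages: BILLING, TECHNICAL, FEATURE_REQUEST, GENERAL.",
    verbose_examples,
)
print(compact_prompt)
```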

Pattern 2: Use Table Format for Multi-Feature Examples

# Table format is more token-efficient than repeated key-value pairs
table_examples = """Extract entities from text.
| Input | Person | Org | Location | Date |
|-------|--------|-----|----------|------|
| "Tim Cook announced Apple's Q4 results in Cupertino on Oct 30" | Tim Cook | Apple | Cupertino | Oct 30 |
| "Satya Nadella spoke at Microsoft Build in Seattle" | Satya Nadella | Microsoft | Seattle | null |
| "The mayor of Austin signed the agreement last Friday" | null | null | Austin | last Friday |"""
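A table like the one above can be generated from structured examples rather than typed by hand. This is a minimal sketch: the helper name `examples_to_table` and the dict layout (an `input` key plus one key per column) are assumptions for illustration. It writes missing values as `null`, and it does not escape `|` characters inside cell text, so inputs containing pipes would need handling first.

```python
def examples_to_table(examples: list, columns: list) -> str:
    """Render examples as a pipe-delimited table with one row per example."""
    header = "| Input | " + " | ".join(columns) + " |"
    divider = "|" + "---|" * (len(columns) + 1)
    rows = []
    for example in examples:
        cells = []
        for column in columns:
            value = example.get(column)
            cells.append("null" if value is None else str(value))
        rows.append(f'| "{example["input"]}" | ' + " | ".join(cells) + " |")
    return "\n".join([header, divider] + rows)


columns = ["Person", "Org", "Location", "Date"]
examples = [
    {
        "input": "Satya Nadella spoke at Microsoft Build in Seattle",
        "Person": "Satya Nadella",
        "Org": "Microsoft",
        "Location": "Seattle",
        "Date": None,
    },
]

table = examples_to_table(examples, columns)
print(table)
```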

Pattern 3: Reduce Example Count with Strategic Selection

def select_minimal_examples(all_examples: list, target_count: int = 3) -> list:
    """Select up to target_count examples, covering as many categories as possible."""
    categories_covered = set()
    selected = []
    # First pass: take the first example seen for each new category
    for example in all_examples:
        if len(selected) >= target_count:
            break
        if example["category"] not in categories_covered:
            selected.append(example)
            categories_covered.add(example["category"])
    # Second pass: if slots remain, fill them with unused examples in list order
    for example in all_examples:
        if len(selected) >= target_count:
            break
        if example not in selected:
            selected.append(example)
    return selected
# Often 3 well-chosen examples work as well as 5-7 redundant ones
# Savings: 40-60% fewer example tokens
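One thing to watch with `select_minimal_examples`: when `target_count` is smaller than the number of categories, some categories end up with no example at all. With the default of 3 and the four categories used in Pattern 1, one is always dropped. A small coverage check makes that visible before you deploy. The helper name `missing_categories` and the sample data below are hypothetical.

```python
def missing_categories(selected: list, all_categories: list) -> list:
    """Return the categories that have no example in `selected`, in input order."""
    covered = {example["category"] for example in selected}
    return [category for category in all_categories if category not in covered]


# Hypothetical data: four categories, but only three examples were kept.
categories = ["BILLING", "TECHNICAL", "FEATURE_REQUEST", "GENERAL"]
selected = [
    {"input": "Charged twice on credit card", "category": "BILLING"},
    {"input": "App crashes on photo upload >5MB", "category": "TECHNICAL"},
    {"input": "Add PDF export for reports", "category": "FEATURE_REQUEST"},
]

gaps = missing_categories(selected, categories)
if gaps:
    print(f"No example for: {', '.join(gaps)}")
```

If the check reports gaps, either raise `target_count` or confirm with a test run that the uncovered categories are still classified correctly from the instruction line alone.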

Pattern 4: One-Line Examples with Separator

# Ultra-compact format for simple classification
compact = """Sentiment: POS/NEG/NEU
Happy with purchase -> POS | Broken on arrival -> NEG | Package arrived -> NEU | Love this product -> POS | Worst experience ever -> NEG"""
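The one-line format is easy to break by accident if an example contains the separator or the arrow. A sketch of a builder that guards against that follows; the name `one_line_examples` is illustrative, and rejecting such examples outright (rather than escaping them) is a design assumption you may want to change.

```python
def one_line_examples(pairs: list, separator: str = " | ", arrow: str = " -> ") -> str:
    """Join (text, label) pairs into a single line of `text -> label` items."""
    items = []
    for text, label in pairs:
        # An example containing the delimiters would be ambiguous to parse.
        if separator.strip() in text or arrow.strip() in text:
            raise ValueError(f"Example contains a delimiter: {text!r}")
        items.append(f"{text}{arrow}{label}")
    return separator.join(items)


pairs = [
    ("Happy with purchase", "POS"),
    ("Broken on arrival", "NEG"),
    ("Package arrived", "NEU"),
    ("Love this product", "POS"),
    ("Worst experience ever", "NEG"),
]

compact_line = one_line_examples(pairs)
print("Sentiment: POS/NEG/NEU\n" + compact_line)
```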

Pattern 5: Validate Compression Did Not Reduce Accuracy

def validate_few_shot_accuracy(
    verbose_system: str,
    compact_system: str,
    test_cases: list,
    model: str = "claude-sonnet-4-6",
) -> dict:
    """Compare accuracy of verbose vs compact few-shot prompts."""
    verbose_correct = 0
    compact_correct = 0
    for case in test_cases:
        v_resp = client.messages.create(
            model=model, max_tokens=100, system=verbose_system,
            messages=[{"role": "user", "content": case["input"]}],
        )
        c_resp = client.messages.create(
            model=model, max_tokens=100, system=compact_system,
            messages=[{"role": "user", "content": case["input"]}],
        )
        if case["expected"] in v_resp.content[0].text:
            verbose_correct += 1
        if case["expected"] in c_resp.content[0].text:
            compact_correct += 1
    return {
        "verbose_accuracy": f"{verbose_correct/len(test_cases)*100:.1f}%",
        "compact_accuracy": f"{compact_correct/len(test_cases)*100:.1f}%",
        "test_cases": len(test_cases),
    }
test_data = [
    {"input": "Refund my last payment", "expected": "BILLING"},
    {"input": "Page loads very slowly", "expected": "TECHNICAL"},
    {"input": "Add two-factor auth option", "expected": "FEATURE_REQUEST"},
]
results = validate_few_shot_accuracy(verbose_system, compact_system, test_data)
print(f"Verbose: {results['verbose_accuracy']} | Compact: {results['compact_accuracy']}")
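The substring check in `validate_few_shot_accuracy` counts a response as correct whenever the expected label appears anywhere in the text, including inside an explanation such as "this is not a BILLING issue". A stricter scorer parses the predicted label out of the response first. The sketch below assumes the two output shapes used in Pattern 1 (a JSON object for the compact prompt, a `Category: X` line for the verbose one); the function name `extract_category` is hypothetical.

```python
import json
import re

CATEGORIES = {"BILLING", "TECHNICAL", "FEATURE_REQUEST", "GENERAL"}


def extract_category(text: str):
    """Return the predicted category, or None if it cannot be parsed."""
    # Compact prompt: the reply should contain a JSON object.
    match = re.search(r"\{.*\}", text, flags=re.DOTALL)
    if match:
        try:
            value = json.loads(match.group(0)).get("category")
        except json.JSONDecodeError:
            value = None
        if isinstance(value, str) and value in CATEGORIES:
            return value
    # Verbose prompt: the reply should contain a "Category: X" line.
    match = re.search(r"Category:\s*([A-Z_]+)", text)
    if match and match.group(1) in CATEGORIES:
        return match.group(1)
    return None


print(extract_category('{"category": "BILLING", "confidence": 0.9}'))
print(extract_category("Not a BILLING issue.\nCategory: TECHNICAL\nConfidence: 0.9"))
```

To use it, compare `extract_category(resp.content[0].text) == case["expected"]` in place of the `in` check. Responses that return `None` are worth logging separately, since they indicate a formatting failure rather than a wrong label.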

The Tradeoffs

Compact examples work best for classification and extraction tasks with clear categories. For generation tasks (writing, coding), more detailed examples showing style and reasoning often produce better output. Test before committing.

Reducing from 5 examples to 3 saves 40% of example tokens but may reduce accuracy on edge cases. If your task has more than 5 categories, keep at least one example per category.

Table format breaks down for examples with long text fields. Use it for short, structured data only.

Implementation Checklist

Count tokens in your current few-shot examples
Rewrite examples as input -> output pairs (strip analysis)
Reduce example count to the minimum that covers all categories
Run accuracy comparison on 100 test cases
Deploy compressed examples if accuracy holds within 2%
Monitor classification accuracy weekly

Measuring Impact

Compare total example tokens before and after compression. Multiply savings by request volume and model rate. Track classification accuracy on a weekly test set of 50 labeled examples. If accuracy drops more than 3%, add back one carefully selected example at a time until accuracy recovers. The sweet spot is typically 3-5 compressed examples.
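The weekly check can be reduced to a small comparison. This sketch reads the "3%" threshold as percentage points rather than a relative change, which is an assumption; the function names are illustrative. Note that on a 50-case test set each case is worth 2 points, so the threshold trips once the compact prompt misses two more cases than the baseline.

```python
def accuracy_pct(correct: int, total: int) -> float:
    """Accuracy as a percentage of the test set."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def needs_more_examples(baseline_pct: float, current_pct: float, max_drop_points: float = 3.0) -> bool:
    """True when accuracy has fallen by more than max_drop_points percentage points."""
    return (baseline_pct - current_pct) > max_drop_points


# Hypothetical weekly results on a 50-case labeled set.
baseline = accuracy_pct(48, 50)   # 96.0
this_week = accuracy_pct(46, 50)  # 92.0

if needs_more_examples(baseline, this_week):
    print("Accuracy dropped past the threshold: add back one example and re-test.")
```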

