Lean Prompting: Fewer Tokens, Same Quality
A lean prompt uses 50% fewer input tokens than a verbose one while producing equivalent output. On Claude Opus 4.7 at $5.00 per million input tokens, that cuts $3,750/month off a pipeline processing 1,000 requests per day with 50K-token prompts.
The Setup
Lean prompting is a discipline: say what you need in the fewest tokens possible. It is not about dumbing down your prompts — it is about removing words that do not change Claude’s behavior.
Most prompts contain three categories of waste: polite filler (“Please kindly”), redundant emphasis (“very important, critical, essential”), and implicit instructions (things Claude already does by default). Removing all three typically cuts token count by 40-60% with zero quality degradation.
This guide provides a lean prompting framework with five rules, before/after examples, and a quality validation approach.
The Math
Pipeline: 1,000 requests/day, Claude Opus 4.7
Verbose prompts (50K input tokens average):
- Daily input cost: 50M * $5.00/MTok = $250
- Monthly: $7,500
Lean prompts (25K input tokens average):
- Daily input cost: 25M * $5.00/MTok = $125
- Monthly: $3,750
Savings: $3,750/month (50%)
On Sonnet 4.6 at $3.00/MTok: $2,250/month saved. On Haiku 4.5 at $1.00/MTok: $750/month saved.
Over a year on Opus: $45,000 saved by writing leaner prompts.
The Technique
Rule 1: Delete Filler Words
# Verbose: 45 tokens
verbose = """
Could you please take a look at the following piece of code and
let me know if there are any potential issues or problems that
you can identify? I would really appreciate your help with this.
"""
# Lean: 12 tokens
lean = "Review this code. List all issues."
# Same output quality. 73% fewer tokens.
Rule 2: One Instruction Per Concept
# Verbose: 60 tokens (says "be concise" four different ways)
verbose = """Keep your response brief. Don't include unnecessary
details. Be concise in your answer. Avoid long explanations
that go beyond what's needed."""
# Lean: 8 tokens
lean = "Max 3 sentences."
Rule 3: Skip Default Behaviors
Claude already writes grammatically, stays on topic, and follows instructions. You do not need to tell it to.
# Verbose: 40 tokens of things Claude does by default
verbose = """Make sure your response is grammatically correct
and well-structured. Stay focused on the topic. Follow the
instructions I'm about to give you carefully."""
# Lean: 0 tokens (delete entirely)
lean = "" # Claude does all of this by default
Rule 4: Use Structured Output Specs
import anthropic
client = anthropic.Anthropic()
# Verbose: 120 tokens
verbose_system = """When you analyze a piece of text, I want you to
provide the following information in your response: first, identify
the main topic of the text. Second, determine the sentiment (whether
it's positive, negative, or neutral). Third, extract any key entities
mentioned in the text such as people, places, or organizations. Finally,
provide a one-sentence summary. Please format your response as a JSON
object with the keys 'topic', 'sentiment', 'entities', and 'summary'."""
# Lean: 35 tokens
lean_system = """Analyze text. Output JSON:
{"topic": "str", "sentiment": "positive|negative|neutral",
"entities": ["str"], "summary": "one sentence"}"""
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=500,
system=lean_system,
messages=[{"role": "user", "content": "Apple announced record iPhone sales in Q4, beating Wall Street expectations by 12%. CEO Tim Cook credited strong demand in China and India."}],
)
print(response.content[0].text)
Rule 5: Compress Examples
# Verbose few-shot: ~500 tokens for 3 examples
verbose_examples = """
Here is an example of how to classify support tickets:
Example 1:
Customer message: "I can't log into my account. I've tried resetting my password but the reset email never arrives."
Classification: TECHNICAL
Reason: The customer is experiencing a technical issue with the login and password reset functionality.
Example 2:
Customer message: "I was charged twice for my subscription this month."
Classification: BILLING
Reason: The customer has a billing-related complaint about being charged more than once.
Example 3:
Customer message: "Can you add dark mode to the mobile app?"
Classification: FEATURE_REQUEST
Reason: The customer is requesting a new feature be added to the product.
"""
# Lean few-shot: ~80 tokens for 3 examples
lean_examples = """Classify tickets: TECHNICAL, BILLING, FEATURE_REQUEST, GENERAL.
"Can't log in, reset email never arrives" -> TECHNICAL
"Charged twice this month" -> BILLING
"Add dark mode to mobile app" -> FEATURE_REQUEST"""
Complete Lean Prompt Template
def lean_request(task: str, data: str, output_format: str,
constraints: str = "", model: str = "claude-sonnet-4-6") -> str:
"""Make a token-efficient API request using the lean prompt pattern."""
system_parts = [task]
if output_format:
system_parts.append(f"Output: {output_format}")
if constraints:
system_parts.append(constraints)
response = client.messages.create(
model=model,
max_tokens=1024,
system="\n".join(system_parts),
messages=[{"role": "user", "content": data}],
)
return response.content[0].text
# Usage
result = lean_request(
task="Extract contact info.",
data="Email [email protected] or call 555-0123. Office at 100 Main St.",
output_format='{"email": "str|null", "phone": "str|null", "address": "str|null"}',
)
print(result)
The Tradeoffs
Lean prompts work best for well-defined tasks. For exploratory, open-ended requests where you want Claude to brainstorm or provide comprehensive analysis, some verbosity in the prompt helps guide the output quality.
Over-compressed prompts can cause Claude to misinterpret your intent. If you see quality drop on ambiguous tasks, add back the specific instruction that disambiguates — but only that one instruction.
Teams with multiple prompt authors need a shared lean prompting style guide. Without one, different developers will write at different verbosity levels, making optimization inconsistent.
Implementation Checklist
- Audit your top 5 prompts against the 5 lean rules
- Rewrite each prompt applying all applicable rules
- Count tokens before and after using the counting API
- Run quality comparison on 50 test cases per prompt
- Deploy lean versions where quality holds
- Train your team on lean prompting principles
- Review prompts quarterly for accumulated bloat
Measuring Impact
Track average prompt token count per request type over time. Plot a weekly trend line — it should decrease as you apply lean principles. Pair this with a quality score (1-5 rating on a random sample of 20 outputs per week). The goal is a decreasing token trend with a flat quality trend. If quality dips on any request type, inspect those specific prompts for over-compression.
Related Guides
- Claude Code Token Usage Optimization — token reduction across your entire workflow
- Reduce Claude Code Hallucinations Save Tokens — precise prompts that avoid wasted retries
- Claude Skill Token Usage Profiling — find which skills consume the most tokens