Claude Extended Thinking Not Working Fix
Extended thinking gives Claude deeper reasoning capabilities, but misconfigured parameters produce 400 errors or empty thinking blocks. This guide covers the common failure modes and the fix for each.
The Error
{
"type": "error",
"error": {
"type": "invalid_request_error",
"message": "thinking.budget_tokens: must be >= 1024 and < max_tokens"
}
}
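If you log raw error bodies, a small helper can flag this case. The sketch below assumes the error message keeps the "field: reason" shape shown above, which is an observation from this one sample rather than a documented contract; the function name thinking_error_field is hypothetical.

```python
import json
from typing import Optional


def thinking_error_field(body: str) -> Optional[str]:
    """Return the offending field if `body` looks like a thinking-related 400, else None.

    Assumes the message is shaped like "thinking.budget_tokens: must be ...",
    as in the sample error above. This shape is an assumption, not a guarantee.
    """
    payload = json.loads(body)
    error = payload.get("error", {})
    if payload.get("type") != "error" or error.get("type") != "invalid_request_error":
        return None
    # The field name is whatever precedes the first colon in the message.
    field, sep, _ = error.get("message", "").partition(":")
    if sep and field.startswith("thinking"):
        return field
    return None
```

Matching on message text is brittle, so treat this as a logging aid rather than control flow.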
Quick Fix
- Set budget_tokens to at least 1024 and strictly less than max_tokens.
- When using tools with thinking, set tool_choice to auto or none only.
- Pass thinking blocks back unmodified in multi-turn conversations.
What Causes This
Extended thinking fails when:
- budget_tokens is less than 1024 or greater than or equal to max_tokens.
- tool_choice is set to any or a specific tool name (only auto and none work with thinking).
- Thinking is toggled on or off mid-assistant-turn.
- Thinking blocks are modified or stripped when passing them back in multi-turn conversations.
- Thinking parameters change between turns, invalidating cached messages.
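The first two causes can be caught locally before a request is sent. The following is a minimal sketch of such a pre-flight check; the function name check_thinking_request and its return format are hypothetical, and it only encodes the rules listed above rather than everything the API validates.

```python
from typing import List, Optional

MIN_BUDGET_TOKENS = 1024
# tool_choice types that the list above says work with thinking.
ALLOWED_TOOL_CHOICE_TYPES = {"auto", "none"}


def check_thinking_request(
    max_tokens: int,
    budget_tokens: int,
    tool_choice: Optional[dict] = None,
) -> List[str]:
    """Return human-readable problems; an empty list means no known issue was found."""
    problems = []
    if budget_tokens < MIN_BUDGET_TOKENS:
        problems.append(f"budget_tokens={budget_tokens} is below {MIN_BUDGET_TOKENS}")
    if budget_tokens >= max_tokens:
        problems.append(
            f"budget_tokens={budget_tokens} must be < max_tokens={max_tokens}"
        )
    if tool_choice is not None:
        choice_type = tool_choice.get("type")
        if choice_type not in ALLOWED_TOOL_CHOICE_TYPES:
            problems.append(f"tool_choice type {choice_type!r} is not auto or none")
    return problems
```

Calling check_thinking_request(8000, 8000) reports the max_tokens violation without a round trip to the API. The remaining causes depend on conversation history and are not covered by this check.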
Full Solution
Basic Extended Thinking Setup
import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "Solve this step by step: What is 127 * 389?"}],
)

# Thinking blocks and text blocks arrive as separate content blocks.
for block in response.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking[:200]}...")
    elif block.type == "text":
        print(f"Answer: {block.text}")
Fix budget_tokens Validation
The budget_tokens value must satisfy 1024 <= budget_tokens < max_tokens:
# WRONG: budget_tokens >= max_tokens
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8000,
    thinking={"type": "enabled", "budget_tokens": 8000},  # Error: not < max_tokens
    messages=[...]
)

# WRONG: budget_tokens < 1024
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8000,
    thinking={"type": "enabled", "budget_tokens": 500},  # Error: < 1024
    messages=[...]
)

# CORRECT
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},  # 1024 <= 10000 < 16000
    messages=[...]
)
Fix Tool Choice Conflicts
Extended thinking only supports tool_choice: auto or tool_choice: none:
# WRONG: tool_choice "any" with thinking
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    tool_choice={"type": "any"},  # Error!
    tools=[{"name": "calc", "description": "Calculate", "input_schema": {"type": "object", "properties": {}}}],
    messages=[...]
)

# CORRECT: tool_choice "auto" with thinking
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    tool_choice={"type": "auto"},  # OK
    tools=[{"name": "calc", "description": "Calculate", "input_schema": {"type": "object", "properties": {}}}],
    messages=[...]
)
Control Thinking Display
By default, Claude 4 models return summarized thinking. You can control this with the display parameter:
# Summarized thinking (default) -- charged for full tokens, returns summary
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000, "display": "summarized"},
    messages=[...]
)

# Omit thinking content -- returns empty thinking blocks with encrypted signature
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000, "display": "omitted"},
    messages=[...]
)
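Because omitted display returns thinking blocks with no text, code that prints block.thinking can look as if thinking failed. The sketch below shows one way to tell the two cases apart. It assumes each block exposes type, thinking, and text attributes as in the setup example; describe_blocks is a hypothetical name, and the SimpleNamespace objects are stand-ins for SDK blocks so the example runs without an API call.

```python
from types import SimpleNamespace


def describe_blocks(content):
    """Summarize response content, marking thinking blocks that carry no text."""
    lines = []
    for block in content:
        if block.type == "thinking":
            text = getattr(block, "thinking", "")
            if text:
                lines.append(f"thinking: {text[:200]}")
            else:
                # Expected when display is "omitted": the block is present but empty.
                lines.append("thinking: (omitted, signature only)")
        elif block.type == "text":
            lines.append(f"answer: {block.text}")
    return lines


# Stand-in blocks that mimic an omitted-display response (illustrative values only).
example_content = [
    SimpleNamespace(type="thinking", thinking="", signature="placeholder-signature"),
    SimpleNamespace(type="text", text="49403"),
]
for line in describe_blocks(example_content):
    print(line)
```

An empty thinking block under omitted display is therefore expected behavior, not one of the failure modes above.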
Multi-Turn Thinking Continuity
Pass thinking blocks back unmodified to maintain reasoning continuity:
# First turn
response1 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "What is 127 * 389?"}]
)

# Second turn -- pass ALL content blocks back unmodified
messages = [
    {"role": "user", "content": "What is 127 * 389?"},
    {"role": "assistant", "content": response1.content},  # Includes thinking blocks
    {"role": "user", "content": "Now multiply that result by 2"}
]
response2 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=messages
)
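For longer conversations, the same pattern can be wrapped in a helper so that no code path filters or rewrites the assistant content. This is a minimal sketch with a hypothetical function name, next_turn_messages; the plain dicts in the usage lines stand in for the SDK's content blocks.

```python
def next_turn_messages(history, assistant_content, user_text):
    """Return a new message list with the assistant turn appended exactly as received.

    `assistant_content` is passed through by reference: no filtering,
    reordering, or copying of individual blocks, so thinking blocks and
    their signatures stay intact.
    """
    return history + [
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": user_text},
    ]


# Usage with stand-in content (a real call would pass response1.content).
history = [{"role": "user", "content": "What is 127 * 389?"}]
stand_in_content = [
    {"type": "thinking", "thinking": "placeholder reasoning", "signature": "placeholder"},
    {"type": "text", "text": "49403"},
]
messages = next_turn_messages(history, stand_in_content, "Now multiply that result by 2")
```

Building a new list rather than mutating history keeps earlier turns untouched if a request needs to be retried.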
TypeScript Example
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 16000,
  thinking: { type: "enabled", budget_tokens: 10000 },
  messages: [{ role: "user", content: "Solve step by step: What is 127 * 389?" }]
});

for (const block of response.content) {
  if (block.type === "thinking") {
    console.log("Thinking:", block.thinking.slice(0, 200));
  } else if (block.type === "text") {
    console.log("Answer:", block.text);
  }
}
Prevention
- Always set max_tokens > budget_tokens + expected output: A good rule is max_tokens = budget_tokens + 4096.
- Default to tool_choice auto: When combining tools with thinking, always use auto.
- Never modify thinking blocks: In multi-turn conversations, return them exactly as received.
- Keep thinking params stable: Changing thinking parameters between turns invalidates cached messages but not cached system prompts or tools.
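The max_tokens rule of thumb above can be encoded once and reused, which also helps keep thinking parameters identical from turn to turn. The helper below is a sketch under that assumption; thinking_request_params and output_headroom are hypothetical names, and the 4096 default simply mirrors the rule stated in the list.

```python
def thinking_request_params(budget_tokens: int, output_headroom: int = 4096) -> dict:
    """Build max_tokens and thinking settings that satisfy 1024 <= budget < max_tokens."""
    if budget_tokens < 1024:
        raise ValueError("budget_tokens must be at least 1024")
    if output_headroom < 1:
        raise ValueError("output_headroom must be positive so budget_tokens < max_tokens")
    return {
        "max_tokens": budget_tokens + output_headroom,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


# Build the parameters once and reuse the same dict for every turn.
params = thinking_request_params(10000)
```

The result can be unpacked into a request, for example client.messages.create(model=..., messages=..., **params), so every turn sends the same thinking configuration.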
Related Guides
- Claude Extended Thinking API Guide – full tutorial on using extended thinking effectively.
- Claude Tool Use Not Working – debug tool_choice and tool definition issues.
- Claude API Error 400 invalid_request_error Fix – the error type returned for thinking parameter violations.
- Claude Prompt Caching Not Working – understand how thinking changes affect cache invalidation.
- Claude Streaming API Guide – streaming works with extended thinking for real-time output.