Claude Extended Thinking Not Working Fix

Written by Michael Lip · Solo founder of Zovo · $400K+ on Upwork · 100% JSS · More at zovo.one

Extended thinking gives Claude deeper reasoning capabilities, but misconfigured parameters produce 400 errors or empty thinking blocks. This guide covers the common failure modes and the fix for each.

The Error

{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "thinking.budget_tokens: must be >= 1024 and < max_tokens"
  }
}

Quick Fix

  1. Set budget_tokens to at least 1024 and strictly less than max_tokens.
  2. When using tools with thinking, set tool_choice to auto or none only.
  3. Pass thinking blocks back unmodified in multi-turn conversations.

What Causes This

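Extended thinking fails when:

  1. budget_tokens is below 1024, or is equal to or greater than max_tokens.
  2. tool_choice forces tool use (anything other than auto or none) while thinking is enabled.
  3. Thinking blocks are modified before being passed back in a multi-turn conversation.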

Full Solution

Basic Extended Thinking Setup

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "Solve this step by step: What is 127 * 389?"}]
)

for block in response.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking[:200]}...")
    elif block.type == "text":
        print(f"Answer: {block.text}")

Fix budget_tokens Validation

The budget_tokens value must satisfy 1024 <= budget_tokens < max_tokens:

# WRONG: budget_tokens >= max_tokens
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8000,
    thinking={"type": "enabled", "budget_tokens": 8000},  # Error: not < max_tokens
    messages=[...]
)

# WRONG: budget_tokens < 1024
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=8000,
    thinking={"type": "enabled", "budget_tokens": 500},  # Error: < 1024
    messages=[...]
)

# CORRECT
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},  # 1024 <= 10000 < 16000
    messages=[...]
)
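
The constraint above can also be checked locally before a request is sent, so a bad pair fails with a readable message instead of a 400. The following is a minimal sketch; check_thinking_budget and MIN_THINKING_BUDGET are hypothetical names introduced here, not part of the anthropic SDK, and the bounds are taken from the error message shown earlier.

```python
# Hypothetical helper, not part of the anthropic SDK.
MIN_THINKING_BUDGET = 1024  # lower bound quoted in the API error message


def check_thinking_budget(budget_tokens: int, max_tokens: int) -> None:
    """Raise ValueError unless 1024 <= budget_tokens < max_tokens."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(
            f"budget_tokens={budget_tokens} is below the minimum of {MIN_THINKING_BUDGET}"
        )
    if budget_tokens >= max_tokens:
        raise ValueError(
            f"budget_tokens={budget_tokens} must be strictly less than max_tokens={max_tokens}"
        )


# Valid pair from the CORRECT example: returns without raising.
check_thinking_budget(budget_tokens=10000, max_tokens=16000)
```

Calling it with (8000, 8000) or (500, 8000), the two WRONG cases above, raises ValueError before any network call is made.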

Fix Tool Choice Conflicts

Extended thinking supports only tool_choice auto or none. Forcing tool use with any, or with a specific named tool, is rejected:

# WRONG: tool_choice "any" with thinking
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    tool_choice={"type": "any"},  # Error!
    tools=[{"name": "calc", "description": "Calculate", "input_schema": {"type": "object", "properties": {}}}],
    messages=[...]
)

# CORRECT: tool_choice "auto" with thinking
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    tool_choice={"type": "auto"},  # OK
    tools=[{"name": "calc", "description": "Calculate", "input_schema": {"type": "object", "properties": {}}}],
    messages=[...]
)
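
If tool_choice is assembled dynamically, a small guard can catch the conflict before the request goes out. This is a sketch under assumptions: tool_choice_ok_with_thinking is a hypothetical helper, and treating a missing tool_choice as acceptable assumes the API falls back to auto when the parameter is omitted.

```python
from typing import Optional

# tool_choice types that can be combined with extended thinking
ALLOWED_WITH_THINKING = {"auto", "none"}


def tool_choice_ok_with_thinking(tool_choice: Optional[dict]) -> bool:
    """Return True if this tool_choice can be sent alongside thinking."""
    if tool_choice is None:
        # Assumption: an omitted tool_choice behaves like auto.
        return True
    return tool_choice.get("type") in ALLOWED_WITH_THINKING


print(tool_choice_ok_with_thinking({"type": "auto"}))
print(tool_choice_ok_with_thinking({"type": "any"}))
```

The first call reports the auto case as compatible; the second flags any as a conflict, matching the WRONG and CORRECT examples above.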

Control Thinking Display

By default, Claude 4 models return summarized thinking. You can control this with the display parameter:

# Summarized thinking (default) -- charged for full tokens, returns summary
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000, "display": "summarized"},
    messages=[...]
)

# Omit thinking content -- returns empty thinking blocks with encrypted signature
response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000, "display": "omitted"},
    messages=[...]
)
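
Code that reads the response should tolerate thinking blocks whose text is empty, as in the omitted case. Below is a rough sketch of one way to do that; split_blocks is a hypothetical helper, and the SimpleNamespace objects are stand-ins for SDK content blocks, assumed to expose the same type, thinking, and text attributes used in the basic setup example.

```python
from types import SimpleNamespace


def split_blocks(content):
    """Separate thinking text from answer text, skipping empty thinking blocks."""
    thinking_parts, answer_parts = [], []
    for block in content:
        if block.type == "thinking":
            text = getattr(block, "thinking", "")
            if text:  # empty when the thinking content was not returned
                thinking_parts.append(text)
        elif block.type == "text":
            answer_parts.append(block.text)
    return "\n".join(thinking_parts), "".join(answer_parts)


# Stand-in for response.content with an empty thinking block.
fake_content = [
    SimpleNamespace(type="thinking", thinking=""),
    SimpleNamespace(type="text", text="127 * 389 = 49403"),
]
thinking, answer = split_blocks(fake_content)
print(repr(thinking), answer)
```

With a real response, pass response.content instead of fake_content. An empty first element then means no readable thinking was returned, not that the request failed.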

Multi-Turn Thinking Continuity

Pass thinking blocks back unmodified to maintain reasoning continuity:

# First turn
response1 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=[{"role": "user", "content": "What is 127 * 389?"}]
)

# Second turn -- pass ALL content blocks back unmodified
messages = [
    {"role": "user", "content": "What is 127 * 389?"},
    {"role": "assistant", "content": response1.content},  # Includes thinking blocks
    {"role": "user", "content": "Now multiply that result by 2"}
]

response2 = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=16000,
    thinking={"type": "enabled", "budget_tokens": 10000},
    messages=messages
)
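
The pattern above can be wrapped in a small helper so that assistant content is always forwarded as received. This is only a sketch: append_turn is a hypothetical name, and the dict-shaped blocks below are placeholder stand-ins for response1.content, not real API output.

```python
def append_turn(messages, assistant_content, user_text):
    """Return a new message list with the assistant reply and the next user turn.

    assistant_content is passed through as-is: no filtering, copying,
    or editing of thinking blocks.
    """
    return [
        *messages,
        {"role": "assistant", "content": assistant_content},
        {"role": "user", "content": user_text},
    ]


# Placeholder stand-in for response1.content (values are illustrative only).
first_reply = [
    {"type": "thinking", "thinking": "placeholder reasoning", "signature": "placeholder"},
    {"type": "text", "text": "49403"},
]

history = [{"role": "user", "content": "What is 127 * 389?"}]
history = append_turn(history, first_reply, "Now multiply that result by 2")
print(len(history))
```

In real code, first_reply would be response1.content, and history would be passed as messages to the second client.messages.create call.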

TypeScript Example

import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

const response = await client.messages.create({
  model: "claude-sonnet-4-6",
  max_tokens: 16000,
  thinking: { type: "enabled", budget_tokens: 10000 },
  messages: [{ role: "user", content: "Solve step by step: What is 127 * 389?" }]
});

for (const block of response.content) {
  if (block.type === "thinking") {
    console.log("Thinking:", block.thinking.slice(0, 200));
  } else if (block.type === "text") {
    console.log("Answer:", block.text);
  }
}

Prevention

  1. Always set max_tokens > budget_tokens + expected output: A good rule is max_tokens = budget_tokens + 4096.
  2. Default to tool_choice auto: When combining tools with thinking, use auto, or none when Claude should not call tools. Never use any or a forced tool.
  3. Never modify thinking blocks: In multi-turn conversations, return them exactly as received.
  4. Keep thinking params stable: Changing thinking parameters between turns invalidates cached messages but not cached system prompts or tools.
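
The first rule can be applied mechanically by deriving max_tokens from the thinking budget. The sketch below assumes the 4096-token headroom suggested above; thinking_kwargs and OUTPUT_HEADROOM are hypothetical names, and the result still has to fit within the chosen model's output limit, which this helper does not check.

```python
OUTPUT_HEADROOM = 4096  # rule-of-thumb allowance for the visible answer


def thinking_kwargs(budget_tokens: int, headroom: int = OUTPUT_HEADROOM) -> dict:
    """Build max_tokens and thinking arguments from a single budget value."""
    if headroom <= 0:
        raise ValueError("headroom must be positive so that budget_tokens < max_tokens")
    return {
        "max_tokens": budget_tokens + headroom,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


params = thinking_kwargs(10000)
print(params["max_tokens"])  # 14096
```

The returned dict can be unpacked into the request, for example client.messages.create(model=..., messages=..., **params). Building both values in one place also makes it easier to keep them unchanged between turns.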