Claude Code Context Window Management Guide
The Problem
Your Claude Code sessions become slow, expensive, or produce lower-quality responses as the conversation grows. You see compaction happening frequently, or /cost shows unexpectedly high token usage for simple tasks.
Quick Fix
Use /clear between unrelated tasks and delegate exploratory work to subagents:
/clear
Check your current context usage at any time:
/context
What’s Happening
Claude Code’s context window holds everything Claude knows about your session: your CLAUDE.md instructions, auto memory, MCP tool names, skill descriptions, every file read, every response, and metadata that never appears in your terminal. When this window fills up, Claude Code automatically runs compaction, which summarizes the conversation history to make room.
The problem is that context size directly drives token costs. Every message you send includes the full conversation context, so a bloated context window means you pay for irrelevant information on every single API call. Compaction helps, but it is lossy and may discard details you need later.
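A rough back-of-the-envelope sketch makes the growth concrete. It assumes each turn resends the whole context and adds a fixed number of tokens, and it ignores compaction. The function name and all token figures are illustrative assumptions, not measurements from Claude Code.

```python
def cumulative_input_tokens(base_tokens: int, tokens_per_turn: int, turns: int) -> int:
    """Total input tokens sent over a session if each turn resends the full context.

    base_tokens: context loaded before the first message (hypothetical figure)
    tokens_per_turn: tokens each turn adds to the context (hypothetical figure)
    """
    total = 0
    context = base_tokens
    for _ in range(turns):
        context += tokens_per_turn  # this turn's new content joins the context
        total += context            # the whole context is sent on this turn
    return total


# One 40-turn session versus two 20-turn sessions separated by /clear.
one_long = cumulative_input_tokens(5_000, 2_000, 40)
two_short = 2 * cumulative_input_tokens(5_000, 2_000, 20)
print(one_long, two_short)
```

Under these assumptions the single long session sends noticeably more input tokens than the two shorter ones, because the total grows roughly with the square of the turn count.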
Step-by-Step Fix
Step 1: Understand what loads at startup
Before you type anything, Claude Code loads several items into context:
- Project-root CLAUDE.md and any user-level CLAUDE.md
- Auto memory (first 200 lines or 25KB)
- MCP tool names (deferred by default, so only names load until a tool is used)
- Skill descriptions from configured skills
Run /memory to see which CLAUDE.md and auto memory files loaded. Run /context for a live breakdown by category.
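The auto memory limit can be pictured as a simple truncation. The sketch below is an assumption about how such a cap could work, not Claude Code's actual implementation: it assumes whichever limit is hit first wins, that cuts fall on line boundaries, and that 25KB means 25 * 1024 bytes. The function name is hypothetical.

```python
def truncate_memory(text: str, max_lines: int = 200, max_bytes: int = 25 * 1024) -> str:
    """Keep whole lines from the start until either limit would be exceeded."""
    kept = []
    used_bytes = 0
    # Slicing enforces the line limit; the loop enforces the byte limit.
    for line in text.splitlines(keepends=True)[:max_lines]:
        size = len(line.encode("utf-8"))
        if used_bytes + size > max_bytes:
            break
        kept.append(line)
        used_bytes += size
    return "".join(kept)
```

Anything past the cut never reaches the context, so the most important notes belong near the top of the memory file.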
Step 2: Keep CLAUDE.md files concise
Target under 200 lines per CLAUDE.md file. Every line consumes tokens on every message. Move detailed procedures into skills instead of cramming everything into CLAUDE.md:
# CLAUDE.md - keep it lean
- Use 2-space indentation for TypeScript
- Run `pnpm test` before committing
- API handlers live in src/api/handlers/
Detailed workflows belong in .claude/skills/ where Claude loads them on demand rather than on every session.
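If you want to enforce the 200-line target automatically, a small check like the following could run in a pre-commit hook or CI job. This is a minimal sketch: the function name and the choice to scan recursively from a root directory are assumptions, not part of Claude Code.

```python
from pathlib import Path

MAX_LINES = 200  # the target suggested above


def oversized_claude_files(root: str, max_lines: int = MAX_LINES) -> list[tuple[str, int]]:
    """Return (path, line_count) for every CLAUDE.md under root at or over max_lines."""
    results = []
    for path in sorted(Path(root).rglob("CLAUDE.md")):
        line_count = len(path.read_text(encoding="utf-8").splitlines())
        if line_count >= max_lines:
            results.append((str(path), line_count))
    return results
```

Calling oversized_claude_files(".") from the project root returns the files that need trimming, along with their line counts.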
Step 3: Clear between tasks
When you finish a task and start something unrelated, clear the context:
/rename feature-auth-implementation
/clear
Renaming first lets you resume the session later with /resume. Stale context from a previous task wastes tokens on every subsequent message.
Step 4: Use custom compaction instructions
When you trigger compaction manually with /compact, tell Claude what to preserve:
/compact Focus on code samples, test results, and API changes
You can also configure this in your CLAUDE.md:
# Compact instructions
When you are using compact, please focus on test output and code changes
Step 5: Delegate exploration to subagents
When Claude needs to search through many files, the search results fill your main context. Use subagents to keep exploration out of your main window.
Claude has a built-in Explore subagent that runs on Haiku (fast, cheap) with read-only access. When you ask Claude to understand a codebase, it delegates to Explore, which works in its own context window and returns only a summary.
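The benefit of a separate context window can be illustrated with a toy model. In this sketch a plain Python function stands in for the subagent: it reads every file but hands back only a one-line summary. The function names, the sample files, and the 4-characters-per-token estimate are all assumptions made for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Crude size estimate: roughly 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def explore(files: dict[str, str], needle: str) -> tuple[str, int]:
    """Stand-in for a subagent: reads every file, returns a summary plus tokens it read."""
    tokens_read = 0
    hits = []
    for path, body in files.items():
        tokens_read += estimate_tokens(body)  # consumed inside the subagent's own window
        if needle in body:
            hits.append(path)
    summary = f"{len(hits)} file(s) mention {needle!r}: {', '.join(sorted(hits))}"
    return summary, tokens_read


# Hypothetical file contents, used only to exercise the sketch.
files = {
    "src/auth.py": "def login(): ...\n" * 200,
    "src/db.py": "def connect(): ...\n" * 200,
    "README.md": "login flow docs\n" * 50,
}
summary, tokens_read = explore(files, "login")
print(summary)
print(f"subagent read ~{tokens_read} tokens; main context gains ~{estimate_tokens(summary)}")
```

Only the summary string would enter the main conversation; the file contents stay behind in the delegate's window.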
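For custom delegation, create a subagent definition file (project-level subagents live in .claude/agents/). The name below is a placeholder; choose your own:
---
name: codebase-researcher
description: Research agent for codebase exploration
tools: Read, Grep, Glob
model: haiku
---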
Search the codebase and return a concise summary of findings.
Step 6: Understand what survives compaction
After compaction, some instructions reload automatically and some are lost:
| Content | After compaction |
|---|---|
| System prompt, output style | Unchanged |
| Project-root CLAUDE.md | Re-injected from disk |
| Auto memory | Re-injected from disk |
| Rules with paths: frontmatter | Lost until a matching file is read again |
| Nested subdirectory CLAUDE.md | Lost until a file in that directory is read again |
| Invoked skill bodies | Re-injected, capped at 5,000 tokens per skill |
If a rule must persist across compaction, remove its paths: frontmatter or move the rule into the project-root CLAUDE.md.
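For quick reference, the table can be restated as a lookup. This is only a sketch: the dictionary keys are informal labels for the table rows rather than Claude Code identifiers, and the helper names are hypothetical.

```python
from enum import Enum


class AfterCompaction(Enum):
    UNCHANGED = "unchanged"
    REINJECTED = "re-injected"
    LOST_UNTIL_REREAD = "lost until a relevant file is read again"


# Keys are informal labels for the rows of the table above, not Claude Code identifiers.
SURVIVAL = {
    "system_prompt": AfterCompaction.UNCHANGED,
    "output_style": AfterCompaction.UNCHANGED,
    "root_claude_md": AfterCompaction.REINJECTED,
    "auto_memory": AfterCompaction.REINJECTED,
    "path_scoped_rule": AfterCompaction.LOST_UNTIL_REREAD,
    "nested_claude_md": AfterCompaction.LOST_UNTIL_REREAD,
    "invoked_skill_body": AfterCompaction.REINJECTED,
}

SKILL_REINJECT_CAP = 5_000  # tokens per skill, per the table


def at_risk(sources: list[str]) -> list[str]:
    """Return the sources that vanish after compaction until a file read restores them."""
    return [s for s in sources if SURVIVAL[s] is AfterCompaction.LOST_UNTIL_REREAD]


def reinjected_skill_tokens(skill_tokens: int) -> int:
    """Tokens of a skill body that come back after compaction, given the per-skill cap."""
    return min(skill_tokens, SKILL_REINJECT_CAP)
```

For example, at_risk(["root_claude_md", "path_scoped_rule"]) returns only the path-scoped rule, which is the one worth relocating if it must always apply.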
Step 7: Reduce MCP overhead
MCP tool definitions are deferred by default, so only tool names enter context. But if you have many servers configured, even the names add up. Disable unused servers:
/mcp
Review active servers and disable any you are not using. Prefer CLI tools like gh, aws, and gcloud over MCP servers when both options exist, as CLI tools add zero per-tool context overhead.
Step 8: Choose the right model
Sonnet handles most coding tasks well and costs less than Opus. Reserve Opus for complex architectural decisions. Switch mid-session:
/model sonnet
For subagent tasks, configure them to use Haiku for maximum cost efficiency.
Prevention
Build context hygiene into your workflow:
- Start each distinct task with /clear
- Keep CLAUDE.md under 200 lines; use skills for detailed procedures
- Run /context periodically to spot bloat
- Use subagents for any task involving many file reads
- Configure your status line to show context usage continuously with /statusline