Fix: Claude MD Error Handling Patterns (2026)
Error handling in Claude skills requires deliberate design. When you build automated workflows with Claude Code, unexpected failures can derail entire pipelines. This guide shows developers and power users how to embed solid error handling directly into Claude MD files, ensuring your skills recover gracefully from failures.
Why Error Handling Matters in Claude MD Files
Claude MD files serve as persistent instructions that shape how Claude Code behaves across sessions. Without proper error handling patterns, a single failure can cause your skill to produce incomplete output, hang indefinitely, or generate misleading results. The cost of poor error handling compounds when skills run in automated pipelines or CI/CD environments.
Consider a skill that processes user uploads using the pdf skill. If the file is corrupted, the skill should detect this and provide a clear error rather than crashing mid-operation. Similarly, a skill using tdd to generate tests should handle cases where test frameworks are missing or permission errors occur.
What makes error handling in Claude MD files different from error handling in traditional code is that you cannot write try/catch blocks or if err != nil guards in the usual sense. Instead, you write declarative instructions that tell Claude Code what to look for, what constitutes a failure state, and what actions to take when failure occurs. The model interprets these instructions and applies them contextually during execution. Done well, this produces remarkably resilient behavior. Done poorly, it produces skills that silently swallow errors or fail in confusing ways.
Core Error Handling Patterns
Try-Catch Equivalent Using Conditional Branches
Claude Code does not have native try-catch syntax, but you can simulate error handling through conditional instructions. Structure your Claude MD to check preconditions before executing risky operations.
Before attempting file operations
{% if file_exists == false %}
Stop and report: Required configuration file not found at {{ config_path }}
{% endif %}
Before API calls
{% if api_key_missing == true %}
Abort with: API key not configured. Set ANTHROPIC_API_KEY environment variable.
{% endif %}
This pattern works by embedding conditional logic that Claude evaluates before proceeding. You trigger these checks by including state variables that your tools set during execution.
The equivalent in a traditional language would look like:
Python equivalent of what the Claude MD pattern achieves
if not os.path.exists(config_path):
raise FileNotFoundError(f"Required config not found at {config_path}")
if not os.environ.get("ANTHROPIC_API_KEY"):
raise EnvironmentError("API key not configured")
The Claude MD version is declarative rather than imperative, but it achieves the same protective effect. it stops execution with a useful message rather than proceeding into an operation that will definitely fail.
Validation Gates
Place validation gates at critical decision points in your skill workflow. The frontend-design skill demonstrates this by validating design tokens exist before applying them to components.
Validation Gate: Project Structure
Before modifying any files, verify:
1. Package.json exists in project root
2. Required directories are present (src/, tests/)
3. Git repository is initialized
If any validation fails, report the specific missing item and stop execution.
The key to making validation gates effective is specificity. A gate that says “check that the environment is set up correctly” gives the model too much interpretive freedom. A gate that names exactly what to check, exactly how to check it, and exactly what to report on failure leaves no ambiguity.
Compare these two formulations:
Weak gate. too vague
Verify the project is ready before running.
Strong gate. specific and actionable
Before running, verify these four things in order:
1. Run `ls package.json` and confirm exit code is 0
2. Run `ls -d src/ tests/` and confirm both directories exist
3. Run `git status` and confirm output does not contain "not a git repository"
4. Run `node --version` and confirm it returns v18 or higher
If any check fails, stop and report exactly which check failed and what the actual output was.
The second version gives Claude Code specific commands to run and specific success criteria to evaluate. This eliminates guesswork and makes failures easy to diagnose.
Graceful Degradation Strategies
When ideal resources are unavailable, your skill should fall back to sensible defaults. The supermemory skill handles this by loading minimal context when memory services are unavailable.
Fallback Behavior
If external service fails:
1. Log the error with timestamp
2. Use cached data if available (check .cache/ directory)
3. If no cache, provide limited functionality with warning
4. Never expose raw error messages to end users
Graceful degradation is particularly valuable in skills that integrate with external APIs or network services. Those services will be unavailable sometimes, and your skill’s behavior in those moments defines its reliability from the user’s perspective.
A more complete degradation hierarchy for a skill that depends on external data might look like:
Data Source Fallback Chain
Attempt to load data in this order, stopping at the first success:
1. Primary API endpoint (https://api.example.com/data)
2. Backup API endpoint (https://backup.example.com/data)
3. Local cache at .cache/data.json (use if file is less than 24 hours old)
4. Embedded defaults in this skill file
If using fallback levels 3 or 4, include a warning in your response explaining
that data is outdated and when it was last refreshed.
Never silently use stale data without notifying the user.
This kind of explicit hierarchy ensures the skill stays functional under a range of failure conditions while being transparent about the quality of its results.
Implementing Retry Logic
Retry patterns require careful design in Claude MD since you cannot write loops directly. Instead, structure your skill to recognize failure states and re-execute with modified parameters.
Explicit Retry Instructions
Retry Strategy for Network Requests
If API call fails with 5xx error:
1. Wait 2 seconds
2. Retry with exponential backoff (max 3 attempts)
3. On final failure, cache the request for later manual review
4. Report failure count to user
State Tracking for Retries
Include retry counters in your skill instructions to prevent infinite loops:
Retry Counter Management
Maintain a retry_count variable:
- Initialize at 0 before any network operation
- Increment after each failed attempt
- If retry_count exceeds 3, stop and report all errors collected
- Reset retry_count to 0 after successful operation
The reason to be explicit about the maximum retry count is that without a hard limit, Claude Code can get into a loop where each retry fails in the same way and the skill appears to hang. Three attempts is a sensible default for most operations; some skills dealing with flaky network conditions may benefit from up to five, but beyond that, retrying usually does not help and wastes time.
For operations where the failure is transient versus permanent, you can add error classification to your retry logic:
Error Classification for Retry Decisions
Before retrying, classify the error:
Transient errors (worth retrying):
- HTTP 429 (rate limit). wait longer before retry
- HTTP 502, 503, 504 (server/gateway error). retry with backoff
- Connection timeout. retry immediately once, then with backoff
Permanent errors (do not retry):
- HTTP 400, 401, 403 (client/auth error). stop and report
- HTTP 404 (not found). stop and report
- File not found. stop and report
- Invalid input format. stop and report
Only apply retry logic to transient errors. For permanent errors, fail immediately
with a clear explanation of what the user needs to fix.
This classification prevents the skill from wasting retries on errors that will never resolve on their own.
Error Context and Debugging
Rich error context dramatically improves troubleshooting. When failures occur, your skill should capture enough information for effective debugging.
Structured Error Reporting
Error Reporting Format
When any operation fails, report:
- Operation name and timestamp
- Input parameters (excluding secrets)
- Actual error message received
- Suggested remediation steps
- Link to relevant documentation
Example output format:
[ERROR] Operation: parse_pdf Time: 2026-03-14T10:30:00Z Input: document.pdf (2.3MB) Error: Unable to read PDF structure Suggestion: Verify file is valid PDF using pdfinfo tool
The structure matters here. Error reports that dump raw output without context force the user to interpret a wall of text. Structured reports with labeled fields allow someone to immediately identify the relevant parts. This is especially important when a skill is running in an automated pipeline where errors is aggregated into a log file that someone reviews hours later.
Note the explicit instruction to exclude secrets from error reports. Skills that log API keys, tokens, or passwords in error output create security vulnerabilities. Making this exclusion explicit in the skill definition prevents it from happening accidentally.
Debug Mode Toggle
Include a debug flag in your skill that enables verbose logging:
Debug Mode
When DEBUG=true is set in environment:
1. Log all tool invocations with full parameters
2. Show token usage after each major operation
3. Include stack traces for errors
4. Output intermediate values in transformations
Default behavior (DEBUG unset):
- Only report failures and final results
- Sanitize sensitive values in logs
This two-tier logging approach is a standard pattern in well-engineered software and works well in Claude MD skills too. Normal users see clean output focused on results. Developers investigating problems can turn on verbose logging to see the full picture.
You can extend this pattern to support graduated debug levels:
Log Levels
LOG_LEVEL environment variable controls output verbosity:
- LOG_LEVEL=error (default): Only report failures
- LOG_LEVEL=warn: Report failures and unexpected conditions
- LOG_LEVEL=info: Report major milestones and all failures
- LOG_LEVEL=debug: Report all tool calls, intermediate values, and timing
Handling Specific Failure Categories
Different types of operations fail in different ways, and your error handling should be tailored to each category.
File System Errors
File Operation Error Handling
For read operations:
- File not found: Report exact path and suggest checking the path with `ls`
- Permission denied: Report file path and suggest `ls -la` to check permissions
- File too large: Report file size and suggest chunked processing
For write operations:
- Disk full: Report available space and suggest cleanup
- Permission denied: Report target path and suggest writing to /tmp as fallback
- Path not found: Report missing parent directory and offer to create it
External Service Errors
API Error Handling
For authentication errors (401, 403):
- Never retry
- Report which credential is invalid
- Provide instructions for refreshing credentials
For rate limit errors (429):
- Extract retry-after header if present
- Wait the specified duration before retrying
- If no header, wait 60 seconds before first retry
For server errors (5xx):
- Retry up to 3 times with exponential backoff (2s, 4s, 8s)
- Log each attempt with timestamp and error code
- On final failure, report all collected errors and suggest checking service status
Input Validation Errors
Input Validation
Validate inputs at the start of skill execution before any file operations or API calls.
Required fields: Report immediately if missing, do not proceed.
Optional fields: Use documented defaults, note which defaults were applied.
Format errors: Report the field name, expected format, and actual value received.
Example validation report:
"Input validation failed: 'output_format' must be one of [json, csv, xml].
Received: 'yaml'. No files were modified."
Integration with Claude Skills Ecosystem
Several existing skills demonstrate error handling patterns worth studying:
The pdf skill validates file signatures before attempting parsing, preventing crashes on invalid files. The tdd skill wraps test execution in validation gates, ensuring test frameworks are available before generating tests. The supermemory skill implements graceful degradation when external storage services are unreachable.
Study these patterns and adapt them to your own skills. Error handling becomes especially critical when skills chain together. each skill’s error output becomes input to the next stage.
When chaining skills, design each one to produce a structured result object that includes both the output data and an error status field:
Chained Skill Output Format
Always produce a result in this format so downstream skills can check it:
Success case:
- status: "success"
- data: [the actual output]
- warnings: [list any non-fatal issues encountered]
Failure case:
- status: "error"
- error_code: [machine-readable code like FILE_NOT_FOUND]
- error_message: [human-readable explanation]
- partial_data: [any data produced before the failure, if useful]
Downstream skills should check status before processing data.
If status is "error", propagate the error rather than attempting to process partial_data.
This contract between chained skills prevents silent partial failures where one skill errors but the next stage processes garbage data and produces a plausible-looking but wrong result.
Best Practices Summary
Keep error handling declarative and specific. Instead of generic “handle errors” instructions, specify exactly what constitutes an error, how to detect it, and what action to take. Test your error paths by intentionally triggering failures during development. Document expected error states in your skill’s usage section so users understand failure modes. Finally, ensure error messages remain helpful rather than technical. users should understand what went wrong and how to fix it.
A quick reference for the patterns covered in this guide:
| Pattern | Use When | Key Rule |
|---|---|---|
| Validation gate | Before any irreversible operation | List specific checks with exact commands |
| Graceful degradation | External dependencies is unavailable | Define an explicit fallback chain |
| Retry with backoff | Transient network/API failures | Set a hard maximum retry count |
| Error classification | Mixed transient/permanent failure types | Never retry permanent errors |
| Structured error reporting | All failure cases | Always include remediation steps |
| Debug mode toggle | Skill development and production | Default to minimal output; enable verbosity explicitly |
The investment in thoughtful error handling pays dividends every time a skill runs in a real environment with real, messy data. Skills that fail clearly and recover gracefully earn trust. Skills that silently produce wrong output or crash without explanation erode it.
Related Reading
- Error Handling Reference. Complete error diagnosis and resolution guide
- Claude Code Skips Error Handling in Generated Code
- Claude API Timeout Errors: Handling and Retry Guide
- Claude Code Docker Permission Denied Bind Mount Error
Built by theluckystrike. More at zovo.one
Find the right skill → Browse 155+ skills in our Skill Finder.
See Also
Configure permissions → Build your settings with our Permission Configurator.
Try it: Paste your error into our Error Diagnostic for an instant fix.