Building Production AI Agents (2026)

Claude Code skills are .md files that extend Claude’s behavior for specific tasks When building production AI agents that run autonomously, these skills provide specialized context. for document processing, TDD workflows, memory management, and more. This guide covers practical patterns for composing them into agents that handle real workloads.

Core Skills for Agent Development

TDD Skill for Quality Assurance

The tdd skill guides Claude through red-green-refactor cycles. In an agent pipeline, invoke it on your agent’s own test suite before deployment:

Run TDD skill analysis on agent modules
claude -p "/tdd analyze the agent modules in ./src and generate missing test cases"

This generates test cases based on your agent’s actual implementation. state management, tool invocation sequences, and response validation.

PDF Skill for Document Processing

Production agents frequently process PDF documents. Invoke the pdf skill when the agent receives a document:

claude -p "/pdf Extract all tables and section headings from /tmp/incoming-contract.pdf"

The skill handles complex layouts, form fields, and multi-column documents.

Supermemory Skill for Context Management

Long-running agents need persistent context between sessions. The supermemory skill provides retrieval of previously stored knowledge:

/supermemory store: Order #12345 processed for customer acme-corp, result: approved
/supermemory What are the previous orders for acme-corp?

This lets agents maintain decision history and reference previous outputs without re-loading full context on every run.

Building a Production Agent Workflow

A practical order-processing agent sequences skills via claude -p calls from a shell script:

#!/bin/bash
agent-pipeline.sh
PDF_FILE="$1"
CUSTOMER_ID="$2"
Step 1: Extract order details from PDF
EXTRACTED=$(claude -p "/pdf Extract order details (items, quantities, totals) from $PDF_FILE")
Step 2: Retrieve customer history
HISTORY=$(claude -p "/supermemory What are the previous orders for $CUSTOMER_ID?")
Step 3: Process with full context
RESULT=$(claude -p "
Order details: $EXTRACTED
Customer history: $HISTORY
Determine whether to approve this order based on past account standing.
Reply with APPROVED or HOLD and a one-sentence reason.
")
echo "$RESULT"
Step 4: Store the result for future reference
claude -p "/supermemory store: Order from $PDF_FILE for $CUSTOMER_ID. $RESULT"

Each step is discrete and testable. The tdd skill can generate tests for each step’s expected input/output pairs before you deploy.

Skill Composition Strategies

Sequential processing: Chain claude -p calls where each output feeds the next. Use when transformations build on each other.

Parallel execution: Invoke independent skills simultaneously using background shell processes:

Run pdf extraction and memory retrieval in parallel
claude -p "/pdf Extract order details from $PDF_FILE" > /tmp/pdf-result.txt &
claude -p "/supermemory What is the order history for $CUSTOMER_ID?" > /tmp/mem-result.txt &
wait
Then process both results together
RESULT=$(claude -p "PDF result: $(cat /tmp/pdf-result.txt)
Memory: $(cat /tmp/mem-result.txt)
Approve or hold this order?")

Conditional routing: Select skills based on input analysis using shell conditionals:

if [[ "$REQUEST_TYPE" == "document" ]]; then
 claude -p "/pdf Process $FILE"
elif [[ "$REQUEST_TYPE" == "query" ]]; then
 claude -p "/supermemory $QUERY"
else
 claude -p "Handle this general request: $REQUEST"
fi

Error Handling and Resilience

Implement retry logic for skill invocations that may fail transiently:

run_skill_with_retry() {
 local PROMPT="$1"
 local OUTPUT="$2"
 local MAX_RETRIES=3
 local ATTEMPT=0
 while [ $ATTEMPT -lt $MAX_RETRIES ]; do
 RESULT=$(claude -p "$PROMPT" 2>/dev/null)
 if [[ $? -eq 0 && -n "$RESULT" ]]; then
 echo "$RESULT" > "$OUTPUT"
 return 0
 fi
 ATTEMPT=$((ATTEMPT + 1))
 echo "Attempt $ATTEMPT failed, retrying..." >&2
 sleep $((ATTEMPT * 2))
 done
 echo "All retries failed" >&2
 return 1
}

When a skill fails repeatedly, fail fast rather than consuming resources on retries.

Monitoring Skill Execution

Track which skills your agent invokes and their outcomes using structured logging:

invoke_skill() {
 local SKILL="$1"
 local PROMPT="$2"
 local START=$(date +%s%N)
 RESULT=$(claude -p "/$SKILL $PROMPT" 2>/dev/null)
 local EXIT_CODE=$?
 local END=$(date +%s%N)
 local DURATION_MS=$(( (END - START) / 1000000 ))
 if [[ $EXIT_CODE -eq 0 ]]; then
 echo "{\"skill\": \"$SKILL\", \"status\": \"success\", \"duration_ms\": $DURATION_MS}" >> /var/log/agent-metrics.jsonl
 else
 echo "{\"skill\": \"$SKILL\", \"status\": \"failure\", \"duration_ms\": $DURATION_MS}" >> /var/log/agent-metrics.jsonl
 fi
 echo "$RESULT"
}

These logs help you identify which skills cause latency and which fail under load.

Deployment Considerations

Version management: Pin the version of Claude Code you’re using in production. Skills are read from your local ~/.claude/skills/ directory. store them in version control and deploy them alongside your agent.

npm install -g @anthropic-ai/[email protected]

Resource allocation: Profile which skills are invoked most frequently. If your agent processes thousands of PDFs daily, ensure you have sufficient API quota for those calls.

Testing before deployment: Use the tdd skill to validate that your skill composition produces expected outputs for representative inputs before shipping to production:

claude -p "/tdd Write integration tests for this agent pipeline script: $(cat agent-pipeline.sh)"

Environment Isolation for Agent Skills

When running Claude Code agents in production, skill files should never live only on a developer’s laptop. Store your ~/.claude/skills/ directory contents in a dedicated repository and deploy them as part of your CI/CD pipeline:

deploy-skills.sh
SKILLS_REPO="[email protected]:your-org/claude-skills.git"
SKILLS_DIR="$HOME/.claude/skills"
mkdir -p "$SKILLS_DIR"
git clone "$SKILLS_REPO" /tmp/claude-skills
cp /tmp/claude-skills/*.md "$SKILLS_DIR/"
rm -rf /tmp/claude-skills
echo "Skills deployed: $(ls $SKILLS_DIR | wc -l) files"

Run this script at container startup. Any change to a skill. updated instructions, new examples, corrected behavior. ships through a pull request rather than a manual file copy, giving you an audit trail of every skill modification.

For containerized agents running in Docker, add the skills copy step to your Dockerfile so the image is self-contained:

FROM node:20-slim
RUN npm install -g @anthropic-ai/[email protected]
COPY skills/ /root/.claude/skills/
COPY agent-pipeline.sh /app/agent-pipeline.sh
RUN chmod +x /app/agent-pipeline.sh
CMD ["/app/agent-pipeline.sh"]

Validating Skill Output Quality in Production

Retry logic handles transient failures, but it does not catch outputs that succeed technically yet return wrong answers. Add an output validation step between each skill call and the next stage of your pipeline:

validate_output() {
 local OUTPUT="$1"
 local MIN_LENGTH="${2:-20}"
 if [[ -z "$OUTPUT" ]]; then
 echo "INVALID: empty output" >&2
 return 1
 fi
 if [[ ${#OUTPUT} -lt $MIN_LENGTH ]]; then
 echo "INVALID: output too short (${#OUTPUT} chars)" >&2
 return 1
 fi
 if echo "$OUTPUT" | grep -qi "i cannot\|unable to process\|error:"; then
 echo "INVALID: refusal or error phrase detected" >&2
 return 1
 fi
 return 0
}
EXTRACTED=$(claude -p "/pdf Extract order details from $PDF_FILE")
if ! validate_output "$EXTRACTED" 50; then
 echo "PDF extraction failed quality check, aborting pipeline" >&2
 exit 1
fi

This pattern catches the cases where the pdf skill runs without error but returns a refusal message because the document was unreadable or password-protected. Catching these early prevents garbage data from propagating to downstream steps and corrupting your supermemory store.

Scaling Agent Pipelines Horizontally

Single-threaded shell pipelines hit throughput limits when order volume grows. Use a simple queue-based architecture to run multiple agent instances in parallel without duplicating processing:

worker.sh. run N copies of this on separate machines or containers
QUEUE_DIR="/shared/queue/pending"
DONE_DIR="/shared/queue/done"
while true; do
 JOB=$(ls "$QUEUE_DIR" | head -1)
 if [[ -z "$JOB" ]]; then
 sleep 2
 continue
 fi
 # Atomic claim: move file before processing
 mv "$QUEUE_DIR/$JOB" "/shared/queue/processing/$JOB" 2>/dev/null || continue
 PDF_FILE=$(jq -r '.pdf' "/shared/queue/processing/$JOB")
 CUSTOMER_ID=$(jq -r '.customer_id' "/shared/queue/processing/$JOB")
 bash /app/agent-pipeline.sh "$PDF_FILE" "$CUSTOMER_ID" \
 > "/shared/queue/results/$JOB.result" 2>&1
 mv "/shared/queue/processing/$JOB" "$DONE_DIR/$JOB"
done

Each worker claims jobs atomically by moving the file before processing. Multiple workers can run on separate containers pointing at the same shared volume without stepping on each other. The supermemory skill remains a single source of truth for customer history because all workers invoke it through the same API endpoint rather than maintaining local state.

Find the right skill → Browse 155+ skills in our Skill Finder.

Try it: Paste your error into our Error Diagnostic for an instant fix.


This site was built by 5 autonomous agents running in tmux while I was in Bali. 2,500 articles. Zero manual work. 100% quality gate pass rate. The orchestration configs, sprint templates, and quality gates that made that possible are in the Zovo Lifetime bundle. Along with 16 CLAUDE.md templates and 80 tested prompts. **[See how the pipeline works →](https://zovo.one/lifetime?utm_source=ccg&utm_medium=cta-skills&utm_campaign=building-production-ai-agents-with-claude-skills-2026)** $99 once. I'm a solo dev in Da Nang. This is how I scale.

Related Reading


*Built by theluckystrike. More at zovo.one *