Claude Code API Pagination (2026)
When building applications that interact with the Claude Code API, handling large datasets efficiently becomes crucial. Pagination isn’t just about splitting data into chunks; it’s about creating a smooth, performant experience for your users while respecting API rate limits and response times.
This guide covers practical pagination strategies you can implement today, with code examples that work with real-world scenarios. Whether you’re pulling conversation history, traversing a library of documents for the pdf skill, or building a dashboard that aggregates data from multiple threads, solid pagination fundamentals will save you hours of debugging and unexpected failures in production.
Understanding Cursor-Based Pagination
The Claude Code API uses cursor-based pagination rather than offset-based approaches. This means each response includes a cursor token that points to the next set of results. Unlike traditional offset pagination (skip 10, take 10), cursor-based pagination is more stable when data changes between requests.
The core problem with offset pagination is straightforward: if a record is inserted or deleted between your first and second page fetch, your offsets shift and you either miss records or see duplicates. Cursors solve this by anchoring to a specific position in the ordered dataset rather than a numeric offset.
Here’s how to implement basic cursor pagination:
```python
import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

def fetch_all_messages(thread_id, max_results=100):
    """Fetch all messages from a thread with pagination."""
    messages = []
    cursor = None
    while len(messages) < max_results:
        response = client.messages.list(
            thread_id=thread_id,
            cursor=cursor,
            limit=50
        )
        messages.extend(response.data)
        if not response.has_more:
            break
        cursor = response.cursor
    return messages[:max_results]
```
The key insight is that you always check has_more before attempting to fetch the next page. This prevents unnecessary API calls and helps you handle edge cases where the dataset is smaller than expected.
A subtle trap here: never assume the last page is full. If you expect 50 results per page and receive 23, that does not mean there are more. Always rely on has_more as the authoritative signal to continue or stop.
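To make this concrete, here is a minimal sketch contrasting the fragile check with the correct one (same hypothetical page shape as the example above):

```python
# Fragile: assumes a short page means the dataset is exhausted.
# A server may legitimately return fewer items than `limit`.
if len(page.data) < 50:
    done = True  # may stop too early, silently dropping records

# Correct: has_more is the authoritative signal.
if not page.has_more:
    done = True
```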
Offset vs. Cursor Pagination: A Direct Comparison
Understanding why cursors are preferred over offsets helps you make better design decisions when building on top of the API.
| Aspect | Offset Pagination | Cursor Pagination |
|---|---|---|
| Consistency | Data may shift between pages | Stable regardless of inserts/deletes |
| Performance | Slower as offset grows (DB must count rows) | Consistent speed regardless of position |
| Random access | Can jump to page 5 directly | Must traverse forward sequentially |
| Resumability | Fragile; offset becomes stale | Solid; cursor remains valid |
| Implementation | Simple to reason about | Slightly more complex initially |
For the Claude Code API, cursors are the right tool because conversation history and document libraries change frequently. Offset pagination would give you unreliable results in any live system.
Setting Appropriate Page Sizes
The limit parameter controls how many items are returned per request. The Claude Code API typically allows limits between 1 and 100, but choosing the right value depends on your use case.
For interactive applications where users scroll through results, a limit of 20-30 provides a good balance:
```javascript
async function fetchConversations(limit = 25) {
  const response = await fetch('/api/conversations', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ limit })
  });
  const data = await response.json();
  return {
    conversations: data.conversations,
    nextCursor: data.cursor
  };
}
```
For background jobs or data exports where throughput matters, you might push toward the maximum limit. However, larger page sizes increase memory usage and response latency, so profile your application to find the sweet spot.
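If you would rather measure than guess, a rough timing harness is enough. This is a minimal sketch assuming the same hypothetical client.threads.list endpoint used in the export example below:

```python
import time

def profile_page_sizes(sizes=(10, 25, 50, 100)):
    """Time one request per candidate page size to compare latency."""
    for limit in sizes:
        start = time.perf_counter()
        page = client.threads.list(limit=limit)  # hypothetical endpoint
        elapsed = time.perf_counter() - start
        print(f"limit={limit}: {elapsed:.3f}s for {len(page.data)} items")
```

Run it a few times at different hours; a single sample per page size can be misleading.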
A practical rule: match your page size to your rendering unit. If you display 20 items on screen, fetch 20. If you’re doing batch processing with no UI, fetch the maximum allowed. Here is a more complete example for a data export scenario:
```python
import json

def export_all_threads(output_file, page_size=100):
    """Export all threads to a file using maximum page size."""
    cursor = None
    total_written = 0
    with open(output_file, "w") as f:
        while True:
            page = client.threads.list(cursor=cursor, limit=page_size)
            for thread in page.data:
                f.write(json.dumps(thread) + "\n")
                total_written += 1
            if not page.has_more:
                break
            cursor = page.cursor
    print(f"Exported {total_written} threads to {output_file}")
    return total_written
```
In this export scenario, 100 per page reduces total round trips while keeping each individual response fast enough to avoid timeouts.
Handling Rate Limits Gracefully
When paginating through large datasets, you’ll inevitably encounter rate limits. The Claude Code API returns a 429 status code when you’ve exceeded your quota. Implement exponential backoff to handle this gracefully:
```python
import time
import requests

def fetch_with_retry(url, max_retries=3):
    """Fetch with exponential backoff on rate limits."""
    for attempt in range(max_retries):
        response = requests.get(url)
        if response.status_code == 200:
            return response.json()
        if response.status_code == 429:
            wait_time = 2 ** attempt  # 1s, 2s, 4s
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
            continue
        response.raise_for_status()
    raise Exception("Max retries exceeded")
```
This pattern works especially well when combining pagination with other API operations. If you’re building a tool that uses the pdf skill to process documents while also fetching conversation history, rate limit handling ensures your entire workflow doesn’t fail on a temporary throttling event.
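Putting the two together, you can route every page request through the retry helper. The sketch below assumes a REST-style endpoint that accepts cursor and limit as query parameters and returns data, has_more, and cursor fields in its JSON body; adapt the names to your actual API:

```python
def fetch_all_with_retry(base_url, limit=50):
    """Paginate through an endpoint, retrying each page on rate limits."""
    items = []
    cursor = None
    while True:
        url = f"{base_url}?limit={limit}"
        if cursor:
            url += f"&cursor={cursor}"
        page = fetch_with_retry(url)  # reuses the backoff helper above
        items.extend(page["data"])
        if not page.get("has_more"):
            break
        cursor = page["cursor"]
    return items
```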
For production systems, check the response headers for rate limit metadata. Many APIs include X-RateLimit-Remaining and X-RateLimit-Reset headers that let you be smarter about backoff timing rather than guessing:
```python
def fetch_page_with_header_backoff(url, headers=None):
    """Respect rate limit headers when backing off."""
    response = requests.get(url, headers=headers or {})
    if response.status_code == 429:
        reset_at = int(response.headers.get("X-RateLimit-Reset", 0))
        wait_seconds = max(reset_at - time.time(), 1)
        print(f"Rate limited. Sleeping {wait_seconds:.1f}s until reset.")
        time.sleep(wait_seconds)
        return fetch_page_with_header_backoff(url, headers)
    response.raise_for_status()
    return response.json()
```
Using the reset timestamp from the header avoids both over-sleeping (wasting time) and under-sleeping (retrying before you’re allowed).
Parallel Page Fetching for Independent Data
Sometimes you need to fetch multiple paginated resources simultaneously. Rather than sequentially waiting for each page, you can use concurrent requests:
```typescript
async function fetchMultipleThreads(threadIds: string[]) {
  const fetchThread = async (id: string) => {
    const response = await fetch(`/api/threads/${id}/messages`);
    return response.json();
  };
  // Fetch all threads in parallel
  const results = await Promise.all(threadIds.map(fetchThread));
  return results;
}
```
This approach works well when you know the thread IDs upfront. However, be mindful of total concurrent connections: too many simultaneous requests can trigger rate limits regardless of individual request patterns.
A safer pattern uses a concurrency limiter rather than firing all requests at once. Here is a Python implementation using asyncio with a semaphore to cap parallel requests:
```python
import asyncio
import aiohttp

async def fetch_page(session, url, semaphore):
    async with semaphore:
        async with session.get(url) as response:
            return await response.json()

async def fetch_all_parallel(urls, max_concurrent=5):
    """Fetch multiple paginated endpoints with bounded concurrency."""
    semaphore = asyncio.Semaphore(max_concurrent)
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_page(session, url, semaphore) for url in urls]
        results = await asyncio.gather(*tasks, return_exceptions=True)
    # Filter out exceptions and log them
    successful = [r for r in results if not isinstance(r, Exception)]
    failed = [r for r in results if isinstance(r, Exception)]
    if failed:
        print(f"Warning: {len(failed)} requests failed")
    return successful
```
This limits execution to 5 concurrent requests at any given time. Increase or decrease the cap based on your rate limit tier and observed error rates.
Combining Claude Skills with Pagination
The real power of pagination emerges when you combine it with Claude’s specialized skills. For instance, when using the frontend-design skill to generate UI components, you might paginate through a library of design tokens:
```python
def process_design_tokens(token_library_id, token_handler):
    """Process design tokens across multiple pages."""
    cursor = None
    while True:
        page = client.design_tokens.list(
            library_id=token_library_id,
            cursor=cursor,
            limit=50
        )
        for token in page.data:
            token_handler(token)
        if not page.has_more:
            break
        cursor = page.cursor
```
Similarly, when using the tdd skill to generate tests across multiple files, pagination helps you manage large codebases without overwhelming memory:
```javascript
async function generateTestsForFiles(fileIds, testGenerator) {
  let cursor = null;
  do {
    const page = await fetchFilePage(fileIds, cursor);
    for (const file of page.files) {
      await testGenerator(file.path, file.content);
    }
    cursor = page.has_more ? page.cursor : null;
  } while (cursor);
}
```
The supermemory skill can also benefit from pagination when retrieving historical context: fetching memories in chunks prevents single-request timeouts while still building a complete context window.
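As a sketch of what chunked retrieval might look like, the following uses a hypothetical client.memories.list endpoint; substitute whatever retrieval call your setup exposes:

```python
def build_context_from_memories(session_id, chunk_size=25, max_memories=500):
    """Assemble historical context in bounded chunks instead of one request."""
    context = []
    cursor = None
    while len(context) < max_memories:
        page = client.memories.list(  # hypothetical endpoint
            session_id=session_id,
            cursor=cursor,
            limit=chunk_size,
        )
        context.extend(page.data)
        if not page.has_more:
            break
        cursor = page.cursor
    return context[:max_memories]
```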
A common pattern when combining skills with pagination is the “accumulate then process” model versus the “process as you go” model. Choose based on your memory constraints:
Accumulate then process (works for smaller datasets):

```python
def batch_process_with_pdf_skill(document_ids):
    all_docs = list(paginate_all(document_ids))
    return process_with_pdf_skill(all_docs)
```
Process as you go (required for large datasets):

```python
def stream_process_with_pdf_skill(document_ids):
    cursor = None
    results = []
    while True:
        page = fetch_document_page(document_ids, cursor)
        for doc in page.data:
            result = process_single_doc_with_pdf(doc)
            results.append(result)
        if not page.has_more:
            break
        cursor = page.cursor
    return results
```
The streaming model is almost always safer for production workflows involving skills that process large files. You avoid loading thousands of documents into memory before any processing begins.
Tracking Pagination State
For long-running operations or user-resumable flows, persist pagination state:
```python
import json
import time

def save_progress(cursor, page_number, filename="pagination_state.json"):
    """Save pagination progress for resumability."""
    state = {
        "cursor": cursor,
        "page": page_number,
        "timestamp": time.time()
    }
    with open(filename, "w") as f:
        json.dump(state, f)

def load_progress(filename="pagination_state.json"):
    """Load saved pagination state."""
    try:
        with open(filename, "r") as f:
            return json.load(f)
    except FileNotFoundError:
        return None
```
This becomes valuable when building tools that run as background jobs or need to survive application restarts. Your users will appreciate not losing progress when processing thousands of items.
Extend this pattern to store additional context that helps you validate state is still valid when you resume:
```python
def save_full_progress(cursor, processed_count, job_id, filename):
    state = {
        "cursor": cursor,
        "processed_count": processed_count,
        "job_id": job_id,
        "saved_at": time.time(),
        "api_version": "v1"  # Track API version in case schema changes
    }
    with open(filename, "w") as f:
        json.dump(state, f, indent=2)
    print(f"Progress saved: {processed_count} records, cursor={cursor[:20]}...")

def resume_job(filename):
    state = load_progress(filename)
    if not state:
        return None, 0
    age_minutes = (time.time() - state["saved_at"]) / 60
    if age_minutes > 60:
        print(f"Warning: state is {age_minutes:.0f} minutes old; the cursor may be stale")
    return state["cursor"], state["processed_count"]
```
Storing the timestamp allows you to warn when a saved cursor might have expired. Some APIs invalidate cursors after a certain period of inactivity.
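Wiring these helpers into the export loop from earlier gives you a resumable job. A sketch, again using the hypothetical client.threads.list endpoint:

```python
def resumable_export(output_file, job_id, state_file="export_state.json"):
    """Resume a paginated export from saved state, checkpointing each page."""
    cursor, processed = resume_job(state_file)
    with open(output_file, "a") as f:  # append so a resumed run doesn't clobber
        while True:
            page = client.threads.list(cursor=cursor, limit=100)
            for thread in page.data:
                f.write(json.dumps(thread) + "\n")
                processed += 1
            if not page.has_more:
                break
            cursor = page.cursor
            save_full_progress(cursor, processed, job_id, state_file)
    return processed
```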
Testing Your Pagination Logic
Pagination code often breaks at boundaries: exactly at the page size limit, on the last partial page, or when a dataset has exactly zero items. Test these edge cases explicitly:
```python
def test_pagination_edge_cases():
    # Empty dataset
    result = fetch_all_messages("empty_thread")
    assert result == [], "Should handle empty dataset"

    # Exactly one page worth of results (dataset size equals page limit)
    single_page = fetch_all_messages("thread_with_50_messages")
    assert len(single_page) == 50, "Should return all items on single page"

    # Dataset size is 1 less than limit
    partial = fetch_all_messages("thread_with_49_messages")
    assert len(partial) == 49, "Should handle partial last page"

    # Dataset exceeds max_results cap
    capped = fetch_all_messages("large_thread", max_results=10)
    assert len(capped) == 10, "Should respect max_results cap"
```
These tests are easy to write and catch the most common pagination bugs before they reach production. Mock the API client responses to cover cases your live test data might not include.
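For instance, with Python’s unittest.mock you can simulate a two-page traversal without hitting the live API. The module path in the patch call is a placeholder for wherever your client lives:

```python
from unittest.mock import MagicMock, patch

def test_two_page_traversal():
    page1 = MagicMock(data=["m1", "m2"], has_more=True, cursor="abc")
    page2 = MagicMock(data=["m3"], has_more=False, cursor=None)
    fake_client = MagicMock()
    fake_client.messages.list.side_effect = [page1, page2]

    # "myapp.pagination" is a placeholder module path for illustration
    with patch("myapp.pagination.client", fake_client):
        result = fetch_all_messages("thread_123")

    assert result == ["m1", "m2", "m3"]
    assert fake_client.messages.list.call_count == 2
```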
Key Takeaways
Cursor-based pagination with the Claude Code API requires a different mindset than traditional offset pagination, but it offers significant advantages for data consistency and performance. Set appropriate page sizes based on your use case, implement proper rate limit handling, and consider parallel fetching when you need to gather data from multiple independent sources.
Always test boundary conditions: empty datasets, exactly full pages, and partial last pages. Persist pagination state for any job that takes longer than a few seconds to complete. Use concurrency limits when parallelizing to avoid triggering rate limits through volume alone.
Skills like pdf, tdd, frontend-design, and supermemory all work better when you build pagination into your workflows from the start rather than treating it as an afterthought.