Claude Code API Pagination (2026)
When building applications that interact with the Claude Code API, handling large datasets efficiently becomes crucial. Pagination isn’t just about splitting data into chunks; it’s about creating a smooth, performant experience for your users while respecting API rate limits and response times.
This guide covers practical pagination strategies you can implement today, with code examples that work with real-world scenarios. Whether you’re pulling conversation history, traversing a library of documents for the pdf skill, or building a dashboard that aggregates data from multiple threads, solid pagination fundamentals will save you hours of debugging and unexpected failures in production.
Understanding Cursor-Based Pagination
The Claude Code API uses cursor-based pagination rather than offset-based approaches. This means each response includes a cursor token that points to the next set of results. Unlike traditional offset pagination (skip 10, take 10), cursor-based pagination is more stable when data changes between requests.
The core problem with offset pagination is straightforward: if a record is inserted or deleted between your first and second page fetch, your offsets shift and you either miss records or see duplicates. Cursors solve this by anchoring to a specific position in the ordered dataset rather than a numeric offset.
Here’s how to implement basic cursor pagination:
```python
import anthropic

client = anthropic.Anthropic(api_key="your-api-key")

def fetch_all_messages(thread_id, max_results=100):
    """Fetch all messages from a thread with pagination."""
    messages = []
    cursor = None
    while len(messages) < max_results:
        response = client.messages.list(
            thread_id=thread_id,
            cursor=cursor,
            limit=50
        )
        messages.extend(response.data)
        if not response.has_more:
            break
        cursor = response.cursor
    return messages[:max_results]
```
The key insight is that you always check has_more before attempting to fetch the next page. This prevents unnecessary API calls and helps you handle edge cases where the dataset is smaller than expected.
A subtle trap here: never assume the last page is full. If you expect 50 results per page and receive 23, that does not mean there are more. Always rely on has_more as the authoritative signal to continue or stop.
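To make this concrete, here is a minimal sketch contrasting the fragile check with the correct one (same hypothetical page shape as the example above):

```python
# Fragile: assumes a short page means the dataset is exhausted.
# A server may legitimately return fewer items than `limit`.
if len(page.data) < 50:
    done = True  # may stop too early, silently dropping records

# Correct: has_more is the authoritative signal.
if not page.has_more:
    done = True
```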
Offset vs. Cursor Pagination: A Direct Comparison
Understanding why cursors are preferred over offsets helps you make better design decisions when building on top of the API.
| Aspect | Offset Pagination | Cursor Pagination |
|---|---|---|
| Consistency | Data may shift between pages | Stable regardless of inserts/deletes |
| Performance | Slower as offset grows (DB must count rows) | Consistent speed regardless of position |
| Random access | Can jump to page 5 directly | Must traverse forward sequentially |
| Resumability | Fragile; offset becomes stale | Solid; cursor remains valid |
| Implementation | Simple to reason about | Slightly more complex initially |
For the Claude Code API, cursors are the right tool because conversation history and document libraries change frequently. Offset pagination would give you unreliable results in any live system.
Setting Appropriate Page Sizes
The limit parameter controls how many items are returned per request. The Claude Code API typically allows limits between 1 and 100, but choosing the right value depends on your use case.
For interactive applications where users scroll through results, a limit of 20-30 provides a good balance:
```javascript
async function fetchConversations(limit = 25) {
  const response = await fetch('/api/conversations', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ limit })
  });
  const data = await response.json();
  return {
    conversations: data.conversations,
    nextCursor: data.cursor
  };
}
```
For background jobs or data exports where throughput matters, you might push toward the maximum limit. However, larger page sizes increase memory usage and response latency, so profile your application to find the sweet spot.
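If you would rather measure than guess, a rough timing harness is enough. This is a minimal sketch assuming the same hypothetical client.threads.list endpoint used in the export example below:

```python
import time

def profile_page_sizes(sizes=(10, 25, 50, 100)):
    """Time one request per candidate page size to compare latency."""
    for limit in sizes:
        start = time.perf_counter()
        page = client.threads.list(limit=limit)  # hypothetical endpoint
        elapsed = time.perf_counter() - start
        print(f"limit={limit}: {elapsed:.3f}s for {len(page.data)} items")
```

Run it a few times at different hours; a single sample per page size can be misleading.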
A practical rule: match your page size to your rendering unit. If you display 20 items on screen, fetch 20. If you’re doing batch processing with no UI, fetch the maximum allowed. Here is a more complete example for a data export scenario:
```python
import json

def export_all_threads(output_file, page_size=100):
    """Export all threads to a file using maximum page size."""
    cursor = None
    total_written = 0
    with open(output_file, "w") as f:
        while True:
            page = client.threads.list(cursor=cursor, limit=page_size)
            for thread in page.data:
                f.write(json.dumps(thread) + "\n")
                total_written += 1
            if not page.has_more:
                break
            cursor = page.cursor
    print(f"Exported {total_written} threads to {output_file}")
    return total_written
```
In this export scenario, 100 per page reduces total round trips while keeping each individual response fast enough to avoid timeouts.
Handling Rate Limits Gracefully
When paginating through large datasets, you’ll inevitably encounter rate limits. The Claude Code API returns a 429 status code when you’ve exceeded your quota. Implement exponential backoff to handle this gracefully:
```python
import time
import requests

def fetch_with_retry(url, max_retries=3):
    """Fetch with exponential backoff on rate limits."""
    for attempt in range(max_retries):
        response = requests.get(url)
        if response.status_code == 200:
            return response.json()
        if response.status_code == 429:
            wait_time = 2 ** attempt  # 1s, 2s, 4s
            print(f"Rate limited. Waiting {wait_time}s...")
            time.sleep(wait_time)
            continue
        response.raise_for_status()
    raise Exception("Max retries exceeded")
```
This pattern works especially well when combining pagination with other API operations. If you’re building a tool that uses the pdf skill to process documents while also fetching conversation history, rate limit handling ensures your entire workflow doesn’t fail on a temporary throttling event.
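Putting the two together, you can route every page request through the retry helper. The sketch below assumes a REST-style endpoint that accepts cursor and limit as query parameters and returns data, has_more, and cursor fields in its JSON body; adapt the names to your actual API:

```python
def fetch_all_with_retry(base_url, limit=50):
    """Paginate through an endpoint, retrying each page on rate limits."""
    items = []
    cursor = None
    while True:
        url = f"{base_url}?limit={limit}"
        if cursor:
            url += f"&cursor={cursor}"
        page = fetch_with_retry(url)  # reuses the backoff helper above
        items.extend(page["data"])
        if not page.get("has_more"):
            break
        cursor = page["cursor"]
    return items
```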
For production systems, check the response headers for rate limit metadata. Many APIs include X-RateLimit-Remaining and X-RateLimit-Reset headers that let you be smarter about backoff timing rather than guessing:
```python
def fetch_page_with_header_backoff(url, headers=None):
    """Respect rate limit headers when backing off."""
    response = requests.get(url, headers=headers or {})
    if response.status_code == 429:
        reset_at = int(response.headers.get("X-RateLimit-Reset", 0))
        wait_seconds = max(reset_at - time.time(), 1)
        print(f"Rate limited. Sleeping {wait_seconds:.1f}s until reset.")
        time.sleep(wait_seconds)
        return fetch_page_with_header_backoff(url, headers)
    response.raise_for_status()
    return response.json()
```
Using the reset timestamp from the header avoids both over-sleeping (wasting time) and under-sleeping (retrying before you’re allowed).
Parallel Page Fetching for Independent Data
Sometimes you need to fetch multiple paginated resources simultaneously. Rather than sequentially waiting for each page, you can use concurrent requests:
```typescript
async function fetchMultipleThreads(threadIds: string[]) {
  const fetchThread = async (id: string) => {
    const response = await fetch(`/api/threads/${id}/messages`);
    return response.json();
  };
  // Fetch all threads in parallel
  const results = await Promise.all(threadIds.map(fetchThread));
  return results;
}
```
This approach works well when you know the thread IDs upfront. However, be mindful of total concurrent connections: too many simultaneous requests can trigger rate limits regardless of individual request patterns.
A safer pattern uses a concurrency limiter rather than firing all requests at once. Here is a Python implementation using asyncio with a semaphore to cap parallel requests:
```python
import asyncio
import aiohttp

async def fetch_page(session, url, semaphore):
    async with semaphore:
        async with session.get(url) as response:
            return await response.json()

async def fetch_all_parallel(urls, max_concurrent=5):
    """Fetch multiple paginated endpoints with bounded concurrency."""
    semaphore = asyncio.Semaphore(max_concurrent)
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_page(session, url, semaphore) for url in urls]
        results = await asyncio.gather(*tasks, return_exceptions=True)
    # Filter out exceptions and log them
    successful = [r for r in results if not isinstance(r, Exception)]
    failed = [r for r in results if isinstance(r, Exception)]
    if failed:
        print(f"Warning: {len(failed)} requests failed")
    return successful
```
This limits execution to 5 concurrent requests at any given time. Increase or decrease the cap based on your rate limit tier and observed error rates.
Combining Claude Skills with Pagination
The real power of pagination emerges when you combine it with Claude’s specialized skills. For instance, when using the frontend-design skill to generate UI components, you might paginate through a library of design tokens:
```python
def process_design_tokens(token_library_id, token_handler):
    """Process design tokens across multiple pages."""
    cursor = None
    while True:
        page = client.design_tokens.list(
            library_id=token_library_id,
            cursor=cursor,
            limit=50
        )
        for token in page.data:
            token_handler(token)
        if not page.has_more:
            break
        cursor = page.cursor
```
Similarly, when using the tdd skill to generate tests across multiple files, pagination helps you manage large codebases without overwhelming memory:
```javascript
async function generateTestsForFiles(fileIds, testGenerator) {
  let cursor = null;
  do {
    const page = await fetchFilePage(fileIds, cursor);
    for (const file of page.files) {
      await testGenerator(file.path, file.content);
    }
    cursor = page.has_more ? page.cursor : null;
  } while (cursor);
}
```
The supermemory skill can also benefit from pagination when retrieving historical context: fetching memories in chunks prevents single-request timeouts while still building a complete context window.
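As a sketch of what chunked retrieval might look like, the following uses a hypothetical client.memories.list endpoint; substitute whatever retrieval call your setup exposes:

```python
def build_context_from_memories(session_id, chunk_size=25, max_memories=500):
    """Assemble historical context in bounded chunks instead of one request."""
    context = []
    cursor = None
    while len(context) < max_memories:
        page = client.memories.list(  # hypothetical endpoint
            session_id=session_id,
            cursor=cursor,
            limit=chunk_size,
        )
        context.extend(page.data)
        if not page.has_more:
            break
        cursor = page.cursor
    return context[:max_memories]
```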
A common pattern when combining skills with pagination is the “accumulate then process” model versus the “process as you go” model. Choose based on your memory constraints:
Accumulate then process (works for smaller datasets):

```python
def batch_process_with_pdf_skill(document_ids):
    all_docs = list(paginate_all(document_ids))
    return process_with_pdf_skill(all_docs)
```
Process as you go (required for large datasets):

```python
def stream_process_with_pdf_skill(document_ids):
    cursor = None
    results = []
    while True:
        page = fetch_document_page(document_ids, cursor)
        for doc in page.data:
            result = process_single_doc_with_pdf(doc)
            results.append(result)
        if not page.has_more:
            break
        cursor = page.cursor
    return results
```
The streaming model is almost always safer for production workflows involving skills that process large files. You avoid loading thousands of documents into memory before any processing begins.
Tracking Pagination State
For long-running operations or user-resumable flows, persist pagination state:
```python
import json
import time

def save_progress(cursor, page_number, filename="pagination_state.json"):
    """Save pagination progress for resumability."""
    state = {
        "cursor": cursor,
        "page": page_number,
        "timestamp": time.time()
    }
    with open(filename, "w") as f:
        json.dump(state, f)

def load_progress(filename="pagination_state.json"):
    """Load saved pagination state."""
    try:
        with open(filename, "r") as f:
            return json.load(f)
    except FileNotFoundError:
        return None
```
This becomes valuable when building tools that run as background jobs or need to survive application restarts. Your users will appreciate not losing progress when processing thousands of items.
Extend this pattern to store additional context that helps you validate state is still valid when you resume:
```python
def save_full_progress(cursor, processed_count, job_id, filename):
    state = {
        "cursor": cursor,
        "processed_count": processed_count,
        "job_id": job_id,
        "saved_at": time.time(),
        "api_version": "v1"  # Track API version in case schema changes
    }
    with open(filename, "w") as f:
        json.dump(state, f, indent=2)
    print(f"Progress saved: {processed_count} records, cursor={cursor[:20]}...")

def resume_job(filename):
    state = load_progress(filename)
    if not state:
        return None, 0
    age_minutes = (time.time() - state["saved_at"]) / 60
    if age_minutes > 60:
        print(f"Warning: state is {age_minutes:.0f} minutes old; the cursor may be stale")
    return state["cursor"], state["processed_count"]
```
Storing the timestamp allows you to warn when a saved cursor might have expired. Some APIs invalidate cursors after a certain period of inactivity.
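Wiring these helpers into the export loop from earlier gives you a resumable job. A sketch, again using the hypothetical client.threads.list endpoint:

```python
def resumable_export(output_file, job_id, state_file="export_state.json"):
    """Resume a paginated export from saved state, checkpointing each page."""
    cursor, processed = resume_job(state_file)
    with open(output_file, "a") as f:  # append so a resumed run doesn't clobber
        while True:
            page = client.threads.list(cursor=cursor, limit=100)
            for thread in page.data:
                f.write(json.dumps(thread) + "\n")
                processed += 1
            if not page.has_more:
                break
            cursor = page.cursor
            save_full_progress(cursor, processed, job_id, state_file)
    return processed
```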
Testing Your Pagination Logic
Pagination code often breaks at boundaries: exactly at the page size limit, on the last partial page, or when a dataset has exactly zero items. Test these edge cases explicitly:
```python
def test_pagination_edge_cases():
    # Empty dataset
    result = fetch_all_messages("empty_thread")
    assert result == [], "Should handle empty dataset"

    # Exactly one page worth of results (dataset size equals page limit)
    single_page = fetch_all_messages("thread_with_50_messages")
    assert len(single_page) == 50, "Should return all items on single page"

    # Dataset size is 1 less than limit
    partial = fetch_all_messages("thread_with_49_messages")
    assert len(partial) == 49, "Should handle partial last page"

    # Dataset exceeds max_results cap
    capped = fetch_all_messages("large_thread", max_results=10)
    assert len(capped) == 10, "Should respect max_results cap"
```
These tests are easy to write and catch the most common pagination bugs before they reach production. Mock the API client responses to cover cases your live test data might not include.
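For instance, with Python’s unittest.mock you can simulate a two-page traversal without hitting the live API. The module path in the patch call is a placeholder for wherever your client lives:

```python
from unittest.mock import MagicMock, patch

def test_two_page_traversal():
    page1 = MagicMock(data=["m1", "m2"], has_more=True, cursor="abc")
    page2 = MagicMock(data=["m3"], has_more=False, cursor=None)
    fake_client = MagicMock()
    fake_client.messages.list.side_effect = [page1, page2]

    # "myapp.pagination" is a placeholder module path for illustration
    with patch("myapp.pagination.client", fake_client):
        result = fetch_all_messages("thread_123")

    assert result == ["m1", "m2", "m3"]
    assert fake_client.messages.list.call_count == 2
```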
Key Takeaways
Cursor-based pagination with the Claude Code API requires a different mindset than traditional offset pagination, but it offers significant advantages for data consistency and performance. Set appropriate page sizes based on your use case, implement proper rate limit handling, and consider parallel fetching when you need to gather data from multiple independent sources.
Always test boundary conditions: empty datasets, exactly full pages, and partial last pages. Persist pagination state for any job that takes longer than a few seconds to complete. Use concurrency limits when parallelizing to avoid triggering rate limits through volume alone.
Skills like pdf, tdd, frontend-design, and supermemory all work better when you build pagination into your workflows from the start rather than treating it as an afterthought.