Claude Code BullMQ Delayed Retry Job Workflow Guide
Building reliable asynchronous job processing systems requires careful consideration of failure handling, delayed execution, and retry strategies. BullMQ, a Node.js message queue library built on Redis, provides powerful primitives for implementing these patterns. This guide explores how to use Claude Code’s capabilities to design, implement, and maintain BullMQ delayed retry job workflows effectively.
Understanding BullMQ Delayed Jobs and Retry Mechanisms
BullMQ offers two primary mechanisms for handling delayed execution and retries: delayed jobs and retry strategies. Understanding when to use each approach is fundamental to building solid systems.
Delayed jobs allow you to schedule a job to be processed after a specified delay. This is useful for scenarios like sending reminder emails, processing time-sensitive data, or implementing rate limiting. When you add a job with a delay option, BullMQ stores it in a Redis sorted set keyed by its execution timestamp; once that timestamp arrives, the job is promoted to the wait list for processing (modern BullMQ versions handle this promotion inside the workers themselves, while v1 required a separate QueueScheduler process).
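For example, scheduling a reminder an hour ahead. A minimal sketch, assuming a queue name and payload of our choosing (the connection is configured the same way as shown later in this guide):
import { Queue } from 'bullmq';
const reminderQueue = new Queue('reminders', { connection });
// The job sits in the delayed sorted set and becomes processable after one hour
await reminderQueue.add('send-reminder', { userId: 42 }, { delay: 60 * 60 * 1000 });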
Retry strategies automatically reattempt failed jobs with configurable backoff patterns. This ensures transient failures don’t permanently block processing while preventing thundering herd problems through exponential backoff. BullMQ tracks the attemptsMade counter on each job, incrementing it after every failure and applying your configured delay before re-queuing.
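Retry behavior is configured through job options, either per add call or as queue-wide defaults. A minimal per-job sketch, assuming an existing Queue instance named queue (job name and payload are illustrative):
await queue.add('sync-user', { userId: 42 }, {
  attempts: 4, // 1 initial attempt + up to 3 retries
  backoff: { type: 'exponential', delay: 500 } // Retries after 500ms, 1s, 2s
});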
Understanding the difference between these two mechanisms matters for architecture decisions:
| Mechanism | When to Use | Redis Storage | Failure Handling |
|---|---|---|---|
| Delayed job | Schedule future work | Sorted set by timestamp | Manual re-add |
| Retry with backoff | Handle transient failures | Active queue with delay | Automatic |
| Dead letter queue | Exhausted retries | Separate queue | Manual review |
| Repeatable jobs | Periodic scheduled tasks | Sorted set by next run time | Automatic re-schedule |
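The repeatable variant deserves a quick sketch, since the table mentions it but the rest of this guide focuses on one-off jobs. In recent BullMQ versions the cron expression lives in repeat.pattern (the schedule here is illustrative):
// Re-added automatically; BullMQ computes the next run from the pattern
await queue.add('nightly-report', {}, {
  repeat: { pattern: '0 3 * * *' } // Every day at 03:00
});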
Claude Code can assist you in designing these patterns by analyzing your requirements and generating appropriate configurations. Its understanding of BullMQ internals allows it to suggest optimal settings based on your use case.
Setting Up a Basic BullMQ Worker with Claude Code
Let’s create a solid BullMQ worker that demonstrates delayed jobs and retry handling. Claude Code can help you scaffold this structure efficiently:
import { Worker, Queue } from 'bullmq';
import Redis from 'ioredis';
const connection = new Redis(process.env.REDIS_URL || 'redis://localhost:6379', {
  maxRetriesPerRequest: null, // Required by BullMQ
  enableReadyCheck: false,
});
// Define a queue for payment processing. Retry settings are job options,
// so they live on the Queue (as defaults for every added job), not the Worker.
const paymentQueue = new Queue('payment-processing', {
  connection,
  defaultJobOptions: {
    attempts: 3,
    backoff: {
      type: 'exponential',
      delay: 1000 // Exponential backoff: 1s, 2s, 4s
    },
    removeOnComplete: { count: 1000 },
    removeOnFail: { count: 5000 }
  }
});
// Create the worker
const paymentWorker = new Worker(
  'payment-processing',
  async job => {
    // Process payment logic here
    const result = await processPayment(job.data);
    return result;
  },
  {
    connection,
    concurrency: 10, // Up to 10 jobs in flight per worker
    limiter: {
      max: 100,      // At most 100 jobs...
      duration: 1000 // ...per second
    }
  }
);
paymentWorker.on('completed', job => {
console.log(`Job ${job.id} completed successfully`);
});
paymentWorker.on('failed', (job, err) => {
console.error(`Job ${job?.id} failed:`, err.message);
});
paymentWorker.on('stalled', jobId => {
console.warn(`Job ${jobId} stalled; the worker may have crashed`);
});
Notice the maxRetriesPerRequest: null setting on the Redis connection. This is required by BullMQ and a common source of confusing startup errors. Also note that attempts, backoff, and the removeOn* options are job options: they belong in the Queue’s defaultJobOptions (or on individual add calls), not in the Worker options, where BullMQ would silently ignore them. Claude Code will flag these issues if you paste an incomplete configuration.
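With the queue configured this way, every enqueued job inherits the retry options. A quick sketch (job name and payload fields are illustrative):
// Inherits attempts: 3 and exponential backoff from defaultJobOptions
await paymentQueue.add('charge', {
  orderId: 'order-123',
  amount: 49.99
});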
Implementing Delayed Jobs with Custom Backoff
For more complex scenarios, you might need custom delayed retry logic that adapts based on job attributes or failure types. Here’s how to implement a sophisticated delayed retry workflow:
import { JobsOptions, Queue, Worker } from 'bullmq';
// Create the queue once and reuse it; instantiating a Queue on every call
// opens new Redis connections needlessly
const queue = new Queue('payment-processing', { connection });
// Custom delayed retry function
async function addPaymentWithRetry(
  paymentData: PaymentData,
  attemptNumber: number = 0
): Promise<void> {
  const jobOptions: JobsOptions = {
    delay: calculateDelay(attemptNumber), // Delayed execution in milliseconds
    attempts: 5,
    backoff: {
      type: 'custom' // Resolved by the worker's backoffStrategy below
    },
    // Keep a bounded history for debugging
    removeOnComplete: {
      count: 100,
      age: 3600
    },
    removeOnFail: {
      count: 500
    }
  };
  await queue.add('process-payment', {
    ...paymentData,
    attemptNumber
  }, jobOptions);
}
// Custom backoff calculation: exponential backoff with jitter
function calculateDelay(attempt: number): number {
  const baseDelay = Math.pow(2, attempt) * 1000;
  const jitter = Math.random() * 1000;
  return Math.min(baseDelay + jitter, 30000); // Cap at 30 seconds
}
// A 'custom' backoff type only takes effect if the worker defines the strategy
const worker = new Worker('payment-processing', processPayment, {
  connection,
  settings: {
    backoffStrategy: (attemptsMade: number) => calculateDelay(attemptsMade)
  }
});
The UnrecoverableError export is worth highlighting. When you throw an UnrecoverableError inside a worker processor, BullMQ immediately marks the job as failed without retrying, even if attempts is greater than 1. This is perfect for cases like invalid input data where retrying would never help:
import { UnrecoverableError } from 'bullmq';
const worker = new Worker('payment-processing', async job => {
const { cardNumber, amount } = job.data;
// Validation failures should not be retried
if (!cardNumber || cardNumber.length !== 16) {
throw new UnrecoverableError('Invalid card number; skipping retries');
}
// Network errors should be retried
const response = await chargeCard(cardNumber, amount);
return response;
}, { connection });
Claude Code can help you extend this pattern to handle specific error types differently, implement circuit breaker patterns, or add alerting for jobs that exceed retry limits.
Differentiating Transient vs. Permanent Failures
One of the most valuable things Claude Code helps you think through is the classification of error types. Not all failures should be retried the same way.
class PaymentGatewayError extends Error {
  constructor(
    message: string,
    public readonly statusCode: number,
    public readonly retryable: boolean,
    public readonly retryAfterMs?: number // Parsed from the gateway's Retry-After header
  ) {
    super(message);
    this.name = 'PaymentGatewayError';
  }
}
// Jobs for this queue should be enqueued with attempts: 5 and
// backoff: { type: 'custom' } so the strategy below is consulted
const worker = new Worker('payment-processing', async job => {
  try {
    return await processPayment(job.data);
  } catch (err) {
    if (err instanceof PaymentGatewayError && (!err.retryable || err.statusCode === 400)) {
      // Bad request: no point retrying
      throw new UnrecoverableError(err.message);
    }
    // Anything else is retried according to the backoff strategy below
    throw err;
  }
}, {
  connection,
  settings: {
    backoffStrategy: (attemptsMade, _type, err) => {
      if (err instanceof PaymentGatewayError && err.statusCode === 429) {
        // Rate limited: wait as long as the gateway asked us to
        return err.retryAfterMs ?? 60_000;
      }
      // Default: exponential backoff with a 2-second base
      return Math.pow(2, attemptsMade) * 2000;
    }
  }
});
This pattern keeps retry intelligence in two well-defined places, the processor’s error boundary and the worker’s backoff strategy, rather than spreading conditional logic across your codebase. Claude Code can audit your existing error handling and suggest where to add similar classification logic.
Using Claude Code to Analyze and Optimize Your Workflow
One of Claude Code’s strengths is its ability to analyze your existing BullMQ setup and suggest improvements. When working with delayed retry workflows, consider asking Claude Code to:
- Review your retry configuration. Analyze whether your maxRetries and backoff settings align with your processing requirements
- Identify potential issues. Detect configurations that might cause job abandonment or excessive resource usage
- Suggest monitoring improvements. Help you set up appropriate logging and alerting
- Generate migration scripts. Assist in updating legacy queue configurations
- Audit for stall detection. BullMQ marks a job as stalled when a worker takes its lock but fails to renew it before the periodic stall check (every stalledInterval). Claude Code can identify workers missing this configuration; a minimal sketch of the relevant options follows this list.
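For reference, the stall-related worker options look like this (the values shown are the library defaults or close to them; treat them as a starting point, not a recommendation):
const worker = new Worker('my-queue', processJob, {
  connection,
  lockDuration: 30_000,    // Lock lifetime; the worker renews it while processing
  stalledInterval: 30_000, // How often the stall checker runs
  maxStalledCount: 1       // Fail the job permanently after this many stalls
});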
Here’s an example prompt you can use with Claude Code:
Review my BullMQ worker configuration for a high-volume notification system.
Currently I'm processing about 10,000 jobs per hour with a simple retry strategy.
What improvements would you suggest for handling temporary API failures while
ensuring no jobs are lost?
Claude Code can then analyze your code and provide specific recommendations tailored to your use case. It will typically surface things like missing maxStalledCount settings, overly aggressive retry counts that can saturate Redis, and missing job progress reporting that makes dashboards useless.
Advanced Pattern: Circuit Breaker with BullMQ
For production systems calling external APIs, a circuit breaker prevents cascading failures when a downstream service is down. Here’s how to implement one alongside BullMQ:
import { Worker, Queue } from 'bullmq';
type CircuitState = 'closed' | 'open' | 'half-open';
class CircuitBreaker {
private state: CircuitState = 'closed';
private failureCount = 0;
private lastFailureTime = 0;
constructor(
private readonly threshold: number = 5,
private readonly resetTimeout: number = 30_000
) {}
async call<T>(fn: () => Promise<T>): Promise<T> {
if (this.state === 'open') {
if (Date.now() - this.lastFailureTime > this.resetTimeout) {
this.state = 'half-open';
} else {
throw new Error('Circuit open: request blocked');
}
}
try {
const result = await fn();
this.onSuccess();
return result;
} catch (err) {
this.onFailure();
throw err;
}
}
private onSuccess() {
this.failureCount = 0;
this.state = 'closed';
}
private onFailure() {
this.failureCount++;
this.lastFailureTime = Date.now();
if (this.failureCount >= this.threshold) {
this.state = 'open';
}
}
}
const breaker = new CircuitBreaker(5, 30_000);
// attempts and backoff are job options; set them on the Queue or per add call
// (e.g. attempts: 3, backoff: { type: 'exponential', delay: 5000 }) so failed
// calls are re-queued with backoff while the circuit recovers
const worker = new Worker('notification-queue', async job => {
  return breaker.call(() => sendPushNotification(job.data));
}, { connection });
When the circuit is open, jobs fail immediately and BullMQ applies its backoff before retrying. By the time BullMQ retries, the circuit may have moved to half-open, allowing a probe request through.
Best Practices for Production Environments
When deploying BullMQ delayed retry workflows in production, keep these best practices in mind:
Always use named jobs. Instead of anonymous job functions, use named jobs to make debugging easier:
await queue.add('send-notification', data, {
  // Derive the jobId from stable identifiers (data.notificationId is an
  // illustrative field), not Date.now(), or deduplication will never trigger
  jobId: `notification-${data.userId}-${data.notificationId}`
});
Using a deterministic jobId also gives you idempotency for free: adding a job with the same jobId twice will not create a duplicate, as long as the original job still exists in Redis.
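A quick illustration of the deduplication behavior (the jobId value is arbitrary):
await queue.add('send-notification', data, { jobId: 'notification-42' });
// No-op: a job with this jobId already exists in the queue
await queue.add('send-notification', data, { jobId: 'notification-42' });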
Implement dead letter queues. After max retries are exhausted, jobs should move to a dead letter queue for manual investigation:
// Retry options live on the Queue so every added job inherits them
const myQueue = new Queue('my-queue', {
  connection,
  defaultJobOptions: {
    attempts: 3,
    backoff: {
      type: 'exponential',
      delay: 2000
    }
  }
});
// Create the dead letter queue once, outside the event handler
const dlq = new Queue('my-queue-dlq', { connection });
const worker = new Worker('my-queue', processJob, { connection });
worker.on('failed', async (job, error) => {
  // Only dead-letter jobs that have exhausted every configured attempt
  if (job && job.attemptsMade >= (job.opts.attempts ?? 3)) {
    await dlq.add('failed-job', {
      originalQueue: 'my-queue',
      jobData: job.data,
      jobId: job.id,
      error: error.message,
      stack: error.stack,
      failedAt: new Date().toISOString()
    });
  }
});
Set sensible removeOnComplete and removeOnFail limits. By default BullMQ retains all completed and failed jobs in Redis. On high-throughput queues this can exhaust memory. Use count- and age-based limits:
const defaultJobOptions = {
removeOnComplete: { count: 500, age: 86_400 }, // Keep last 500 or 24h
removeOnFail: { count: 2000 } // Keep last 2000 failures
};
Monitor queue health. Set up dashboards to track the following (a minimal probe sketch follows this list):
- Jobs pending vs completed vs failed
- Average processing time per job type
- Retry frequency and patterns
- Queue depth over time
- Stalled job counts (a rising stall count indicates worker crashes)
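As a starting point for the first and last of these metrics, here is a minimal point-in-time probe using BullMQ’s getJobCounts; wire the result into whatever dashboard or metrics pipeline you use:
import { Queue } from 'bullmq';
// Returns counts per state, e.g. { wait: 12, active: 3, delayed: 40, failed: 2, completed: 9911 }
async function queueHealth(queue: Queue) {
  return queue.getJobCounts('wait', 'active', 'delayed', 'failed', 'completed');
}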
Use BullMQ’s built-in flow producer for dependent jobs. If job B depends on job A completing, use FlowProducer instead of manually chaining events:
import { FlowProducer } from 'bullmq';
const flow = new FlowProducer({ connection });
await flow.add({
name: 'charge-card',
queueName: 'payment-processing',
data: { amount: 99.99, cardToken: 'tok_xxx' },
children: [
{
name: 'send-receipt',
queueName: 'email-queue',
data: { template: 'receipt' }
},
{
name: 'update-ledger',
queueName: 'accounting-queue',
data: { entry: 'debit' }
}
]
});
The parent job (charge-card) only becomes active after all children complete successfully. By default, if a child fails and exhausts its retries, the parent simply stays in the waiting-children state; set failParentOnFailure: true in a child’s opts if you want the failure to propagate. Claude Code can help you map your existing business logic onto the FlowProducer API.
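A sketch of the propagation option (data values are illustrative):
await flow.add({
  name: 'charge-card',
  queueName: 'payment-processing',
  data: { amount: 99.99, cardToken: 'tok_xxx' },
  children: [{
    name: 'send-receipt',
    queueName: 'email-queue',
    data: { template: 'receipt' },
    opts: { failParentOnFailure: true } // Fail the parent if this child exhausts its retries
  }]
});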
Comparing Backoff Strategies
Choosing the wrong backoff strategy is a common source of reliability problems. Here’s a practical comparison:
| Strategy | Formula | Best For | Drawback |
|---|---|---|---|
| Fixed | delay = N ms | Predictable SLAs | Thundering herd on restart |
| Linear | delay = attempt * N | Gradual pressure reduction | Still causes spikes |
| Exponential | delay = 2^attempt * N | General transient failures | Delay grows very fast |
| Exponential + jitter | delay = (2^attempt * N) + random(N) | High-concurrency systems | Harder to predict max delay |
| Custom | any function | Rate-limit-aware retries | Requires maintenance |
For most production systems, exponential backoff with jitter is the right default. The jitter desynchronizes retries from multiple workers that all failed at the same moment, spreading load on the recovering downstream service.
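The additive jitter shown earlier keeps the exponential shape; the well-known “full jitter” variant instead randomizes over the entire window, which desynchronizes retries even more aggressively. A minimal sketch (base and cap values are illustrative):
// Full jitter: pick uniformly from [0, min(cap, base * 2^attempt)]
function fullJitterDelay(attempt: number, base = 1000, cap = 30_000): number {
  return Math.random() * Math.min(cap, base * 2 ** attempt);
}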
Claude Code can assist you in designing appropriate monitoring solutions, reviewing your chosen strategy against your throughput numbers, and setting up alerts for anomalous patterns like sudden spikes in failed job counts.
Conclusion
Building reliable BullMQ delayed retry job workflows requires thoughtful configuration and ongoing maintenance. By using BullMQ’s built-in retry mechanisms, implementing custom backoff strategies, using UnrecoverableError to skip pointless retries, and applying circuit breaker patterns for downstream dependencies, you can create solid systems that handle failures gracefully while maintaining processing reliability.
Remember to always implement dead letter queues for jobs that exceed retry limits, set memory-safe removeOnComplete and removeOnFail limits, monitor your queue health proactively, and design your retry strategy based on your specific failure modes and business requirements. Claude Code is particularly effective for auditing existing configurations, surfacing subtle misconfigurations, and generating the boilerplate for patterns like flow producers and circuit breakers.
Related Reading
- Claude Code Skills Redis Caching Layer Implementation
- Claude Code Upstash Redis Rate Limiting Workflow
- Claude Code Trigger.dev Background Job Workflow Guide