Claude Code BullMQ Delayed Retry Job Workflow Guide
Building reliable asynchronous job processing systems requires careful consideration of failure handling, delayed execution, and retry strategies. BullMQ, a Node.js message queue library built on Redis, provides powerful primitives for implementing these patterns. This guide explores how to use Claude Code’s capabilities to design, implement, and maintain BullMQ delayed retry job workflows effectively.
Understanding BullMQ Delayed Jobs and Retry Mechanisms
BullMQ offers two primary mechanisms for handling delayed execution and retries: delayed jobs and retry strategies. Understanding when to use each approach is fundamental to building solid systems.
Delayed jobs allow you to schedule a job to be processed after a specified delay. This is useful for scenarios like sending reminder emails, processing time-sensitive data, or implementing rate limiting. When you add a job with a delay option, BullMQ stores it in a Redis sorted set keyed by its execution timestamp; once that timestamp arrives, the job is promoted to the wait list for processing (modern BullMQ versions handle this promotion inside the workers themselves, while v1 required a separate QueueScheduler process).
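For example, scheduling a reminder an hour ahead. A minimal sketch, assuming a queue name and payload of our choosing (the connection is configured the same way as shown later in this guide):
import { Queue } from 'bullmq';
const reminderQueue = new Queue('reminders', { connection });
// The job sits in the delayed sorted set and becomes processable after one hour
await reminderQueue.add('send-reminder', { userId: 42 }, { delay: 60 * 60 * 1000 });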
Retry strategies automatically reattempt failed jobs with configurable backoff patterns. This ensures transient failures don’t permanently block processing while preventing thundering herd problems through exponential backoff. BullMQ tracks the attemptsMade counter on each job, incrementing it after every failure and applying your configured delay before re-queuing.
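Retry behavior is configured through job options, either per add call or as queue-wide defaults. A minimal per-job sketch, assuming an existing Queue instance named queue (job name and payload are illustrative):
await queue.add('sync-user', { userId: 42 }, {
  attempts: 4, // 1 initial attempt + up to 3 retries
  backoff: { type: 'exponential', delay: 500 } // Retries after 500ms, 1s, 2s
});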
Understanding the difference between these two mechanisms matters for architecture decisions:
| Mechanism | When to Use | Redis Storage | Failure Handling |
|---|---|---|---|
| Delayed job | Schedule future work | Sorted set by timestamp | Manual re-add |
| Retry with backoff | Handle transient failures | Active queue with delay | Automatic |
| Dead letter queue | Exhausted retries | Separate queue | Manual review |
| Repeatable jobs | Periodic scheduled tasks | Sorted set by next run time | Automatic re-schedule |
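The repeatable variant deserves a quick sketch, since the table mentions it but the rest of this guide focuses on one-off jobs. In recent BullMQ versions the cron expression lives in repeat.pattern (the schedule here is illustrative):
// Re-added automatically; BullMQ computes the next run from the pattern
await queue.add('nightly-report', {}, {
  repeat: { pattern: '0 3 * * *' } // Every day at 03:00
});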
Claude Code can assist you in designing these patterns by analyzing your requirements and generating appropriate configurations. Its understanding of BullMQ internals allows it to suggest optimal settings based on your use case.
Setting Up a Basic BullMQ Worker with Claude Code
Let’s create a solid BullMQ worker that demonstrates delayed jobs and retry handling. Claude Code can help you scaffold this structure efficiently:
import { Worker, Queue } from 'bullmq';
import Redis from 'ioredis';
const connection = new Redis(process.env.REDIS_URL || 'redis://localhost:6379', {
  maxRetriesPerRequest: null, // Required by BullMQ
  enableReadyCheck: false,
});
// Define a queue for payment processing. Retry settings are job options,
// so they live on the Queue (as defaults for every added job), not the Worker.
const paymentQueue = new Queue('payment-processing', {
  connection,
  defaultJobOptions: {
    attempts: 3,
    backoff: {
      type: 'exponential',
      delay: 1000 // Exponential backoff: 1s, 2s, 4s
    },
    removeOnComplete: { count: 1000 },
    removeOnFail: { count: 5000 }
  }
});
// Create the worker
const paymentWorker = new Worker(
  'payment-processing',
  async job => {
    // Process payment logic here
    const result = await processPayment(job.data);
    return result;
  },
  {
    connection,
    concurrency: 10, // Up to 10 jobs in flight per worker
    limiter: {
      max: 100,      // At most 100 jobs...
      duration: 1000 // ...per second
    }
  }
);
paymentWorker.on('completed', job => {
console.log(`Job ${job.id} completed successfully`);
});
paymentWorker.on('failed', (job, err) => {
console.error(`Job ${job?.id} failed:`, err.message);
});
paymentWorker.on('stalled', jobId => {
console.warn(`Job ${jobId} stalled; the worker may have crashed`);
});
Notice the maxRetriesPerRequest: null setting on the Redis connection. This is required by BullMQ and a common source of confusing startup errors. Also note that attempts, backoff, and the removeOn* options are job options: they belong in the Queue’s defaultJobOptions (or on individual add calls), not in the Worker options, where BullMQ would silently ignore them. Claude Code will flag these issues if you paste an incomplete configuration.
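With the queue configured this way, every enqueued job inherits the retry options. A quick sketch (job name and payload fields are illustrative):
// Inherits attempts: 3 and exponential backoff from defaultJobOptions
await paymentQueue.add('charge', {
  orderId: 'order-123',
  amount: 49.99
});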
Implementing Delayed Jobs with Custom Backoff
For more complex scenarios, you might need custom delayed retry logic that adapts based on job attributes or failure types. Here’s how to implement a sophisticated delayed retry workflow:
import { JobsOptions, Queue, Worker } from 'bullmq';
// Create the queue once and reuse it; instantiating a Queue on every call
// opens new Redis connections needlessly
const queue = new Queue('payment-processing', { connection });
// Custom delayed retry function
async function addPaymentWithRetry(
  paymentData: PaymentData,
  attemptNumber: number = 0
): Promise<void> {
  const jobOptions: JobsOptions = {
    delay: calculateDelay(attemptNumber), // Delayed execution in milliseconds
    attempts: 5,
    backoff: {
      type: 'custom' // Resolved by the worker's backoffStrategy below
    },
    // Keep a bounded history for debugging
    removeOnComplete: {
      count: 100,
      age: 3600
    },
    removeOnFail: {
      count: 500
    }
  };
  await queue.add('process-payment', {
    ...paymentData,
    attemptNumber
  }, jobOptions);
}
// Custom backoff calculation: exponential backoff with jitter
function calculateDelay(attempt: number): number {
  const baseDelay = Math.pow(2, attempt) * 1000;
  const jitter = Math.random() * 1000;
  return Math.min(baseDelay + jitter, 30000); // Cap at 30 seconds
}
// A 'custom' backoff type only takes effect if the worker defines the strategy
const worker = new Worker('payment-processing', processPayment, {
  connection,
  settings: {
    backoffStrategy: (attemptsMade: number) => calculateDelay(attemptsMade)
  }
});
The UnrecoverableError export is worth highlighting. When you throw an UnrecoverableError inside a worker processor, BullMQ immediately marks the job as failed without retrying, even if attempts is greater than 1. This is perfect for cases like invalid input data where retrying would never help:
import { UnrecoverableError } from 'bullmq';
const worker = new Worker('payment-processing', async job => {
const { cardNumber, amount } = job.data;
// Validation failures should not be retried
if (!cardNumber || cardNumber.length !== 16) {
throw new UnrecoverableError('Invalid card number; skipping retries');
}
// Network errors should be retried
const response = await chargeCard(cardNumber, amount);
return response;
}, { connection });
Claude Code can help you extend this pattern to handle specific error types differently, implement circuit breaker patterns, or add alerting for jobs that exceed retry limits.
Differentiating Transient vs. Permanent Failures
One of the most valuable things Claude Code helps you think through is the classification of error types. Not all failures should be retried the same way.
class PaymentGatewayError extends Error {
  constructor(
    message: string,
    public readonly statusCode: number,
    public readonly retryable: boolean,
    public readonly retryAfterMs?: number // Parsed from the gateway's Retry-After header
  ) {
    super(message);
    this.name = 'PaymentGatewayError';
  }
}
// Jobs for this queue should be enqueued with attempts: 5 and
// backoff: { type: 'custom' } so the strategy below is consulted
const worker = new Worker('payment-processing', async job => {
  try {
    return await processPayment(job.data);
  } catch (err) {
    if (err instanceof PaymentGatewayError && (!err.retryable || err.statusCode === 400)) {
      // Bad request: no point retrying
      throw new UnrecoverableError(err.message);
    }
    // Anything else is retried according to the backoff strategy below
    throw err;
  }
}, {
  connection,
  settings: {
    backoffStrategy: (attemptsMade, _type, err) => {
      if (err instanceof PaymentGatewayError && err.statusCode === 429) {
        // Rate limited: wait as long as the gateway asked us to
        return err.retryAfterMs ?? 60_000;
      }
      // Default: exponential backoff with a 2-second base
      return Math.pow(2, attemptsMade) * 2000;
    }
  }
});
This pattern keeps retry intelligence in two well-defined places, the processor’s error boundary and the worker’s backoff strategy, rather than spreading conditional logic across your codebase. Claude Code can audit your existing error handling and suggest where to add similar classification logic.
Using Claude Code to Analyze and Optimize Your Workflow
One of Claude Code’s strengths is its ability to analyze your existing BullMQ setup and suggest improvements. When working with delayed retry workflows, consider asking Claude Code to:
- Review your retry configuration. Analyze whether your maxRetries and backoff settings align with your processing requirements
- Identify potential issues. Detect configurations that might cause job abandonment or excessive resource usage
- Suggest monitoring improvements. Help you set up appropriate logging and alerting
- Generate migration scripts. Assist in updating legacy queue configurations
- Audit for stall detection. BullMQ marks a job as stalled when a worker takes its lock but fails to renew it before the periodic stall check (every stalledInterval). Claude Code can identify workers missing this configuration; a minimal sketch of the relevant options follows this list.
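For reference, the stall-related worker options look like this (the values shown are the library defaults or close to them; treat them as a starting point, not a recommendation):
const worker = new Worker('my-queue', processJob, {
  connection,
  lockDuration: 30_000,    // Lock lifetime; the worker renews it while processing
  stalledInterval: 30_000, // How often the stall checker runs
  maxStalledCount: 1       // Fail the job permanently after this many stalls
});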
Here’s an example prompt you can use with Claude Code:
Review my BullMQ worker configuration for a high-volume notification system.
Currently I'm processing about 10,000 jobs per hour with a simple retry strategy.
What improvements would you suggest for handling temporary API failures while
ensuring no jobs are lost?
Claude Code can then analyze your code and provide specific recommendations tailored to your use case. It will typically surface things like missing maxStalledCount settings, overly aggressive retry counts that can saturate Redis, and missing job progress reporting that makes dashboards useless.
Advanced Pattern: Circuit Breaker with BullMQ
For production systems calling external APIs, a circuit breaker prevents cascading failures when a downstream service is down. Here’s how to implement one alongside BullMQ:
import { Worker, Queue } from 'bullmq';
type CircuitState = 'closed' | 'open' | 'half-open';
class CircuitBreaker {
private state: CircuitState = 'closed';
private failureCount = 0;
private lastFailureTime = 0;
constructor(
private readonly threshold: number = 5,
private readonly resetTimeout: number = 30_000
) {}
async call<T>(fn: () => Promise<T>): Promise<T> {
if (this.state === 'open') {
if (Date.now() - this.lastFailureTime > this.resetTimeout) {
this.state = 'half-open';
} else {
throw new Error('Circuit open: request blocked');
}
}
try {
const result = await fn();
this.onSuccess();
return result;
} catch (err) {
this.onFailure();
throw err;
}
}
private onSuccess() {
this.failureCount = 0;
this.state = 'closed';
}
private onFailure() {
this.failureCount++;
this.lastFailureTime = Date.now();
if (this.failureCount >= this.threshold) {
this.state = 'open';
}
}
}
const breaker = new CircuitBreaker(5, 30_000);
// attempts and backoff are job options; set them on the Queue or per add call
// (e.g. attempts: 3, backoff: { type: 'exponential', delay: 5000 }) so failed
// calls are re-queued with backoff while the circuit recovers
const worker = new Worker('notification-queue', async job => {
  return breaker.call(() => sendPushNotification(job.data));
}, { connection });
When the circuit is open, jobs fail immediately and BullMQ applies its backoff before retrying. By the time BullMQ retries, the circuit may have moved to half-open, allowing a probe request through.
Best Practices for Production Environments
When deploying BullMQ delayed retry workflows in production, keep these best practices in mind:
Always use named jobs. Instead of anonymous job functions, use named jobs to make debugging easier:
await queue.add('send-notification', data, {
  // Derive the jobId from stable identifiers (data.notificationId is an
  // illustrative field), not Date.now(), or deduplication will never trigger
  jobId: `notification-${data.userId}-${data.notificationId}`
});
Using a deterministic jobId also gives you idempotency for free: adding a job with the same jobId twice will not create a duplicate, as long as the original job still exists in Redis.
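A quick illustration of the deduplication behavior (the jobId value is arbitrary):
await queue.add('send-notification', data, { jobId: 'notification-42' });
// No-op: a job with this jobId already exists in the queue
await queue.add('send-notification', data, { jobId: 'notification-42' });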
Implement dead letter queues. After max retries are exhausted, jobs should move to a dead letter queue for manual investigation:
// Retry options live on the Queue so every added job inherits them
const myQueue = new Queue('my-queue', {
  connection,
  defaultJobOptions: {
    attempts: 3,
    backoff: {
      type: 'exponential',
      delay: 2000
    }
  }
});
// Create the dead letter queue once, outside the event handler
const dlq = new Queue('my-queue-dlq', { connection });
const worker = new Worker('my-queue', processJob, { connection });
worker.on('failed', async (job, error) => {
  // Only dead-letter jobs that have exhausted every configured attempt
  if (job && job.attemptsMade >= (job.opts.attempts ?? 3)) {
    await dlq.add('failed-job', {
      originalQueue: 'my-queue',
      jobData: job.data,
      jobId: job.id,
      error: error.message,
      stack: error.stack,
      failedAt: new Date().toISOString()
    });
  }
});
Set sensible removeOnComplete and removeOnFail limits. By default BullMQ retains all completed and failed jobs in Redis. On high-throughput queues this can exhaust memory. Use count- and age-based limits:
const defaultJobOptions = {
removeOnComplete: { count: 500, age: 86_400 }, // Keep last 500 or 24h
removeOnFail: { count: 2000 } // Keep last 2000 failures
};
Monitor queue health. Set up dashboards to track the following (a minimal probe sketch follows this list):
- Jobs pending vs completed vs failed
- Average processing time per job type
- Retry frequency and patterns
- Queue depth over time
- Stalled job counts (a rising stall count indicates worker crashes)
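As a starting point for the first and last of these metrics, here is a minimal point-in-time probe using BullMQ’s getJobCounts; wire the result into whatever dashboard or metrics pipeline you use:
import { Queue } from 'bullmq';
// Returns counts per state, e.g. { wait: 12, active: 3, delayed: 40, failed: 2, completed: 9911 }
async function queueHealth(queue: Queue) {
  return queue.getJobCounts('wait', 'active', 'delayed', 'failed', 'completed');
}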
Use BullMQ’s built-in flow producer for dependent jobs. If job B depends on job A completing, use FlowProducer instead of manually chaining events:
import { FlowProducer } from 'bullmq';
const flow = new FlowProducer({ connection });
await flow.add({
name: 'charge-card',
queueName: 'payment-processing',
data: { amount: 99.99, cardToken: 'tok_xxx' },
children: [
{
name: 'send-receipt',
queueName: 'email-queue',
data: { template: 'receipt' }
},
{
name: 'update-ledger',
queueName: 'accounting-queue',
data: { entry: 'debit' }
}
]
});
The parent job (charge-card) only becomes active after all children complete successfully. By default, if a child fails and exhausts its retries, the parent simply stays in the waiting-children state; set failParentOnFailure: true in a child’s opts if you want the failure to propagate. Claude Code can help you map your existing business logic onto the FlowProducer API.
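A sketch of the propagation option (data values are illustrative):
await flow.add({
  name: 'charge-card',
  queueName: 'payment-processing',
  data: { amount: 99.99, cardToken: 'tok_xxx' },
  children: [{
    name: 'send-receipt',
    queueName: 'email-queue',
    data: { template: 'receipt' },
    opts: { failParentOnFailure: true } // Fail the parent if this child exhausts its retries
  }]
});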
Comparing Backoff Strategies
Choosing the wrong backoff strategy is a common source of reliability problems. Here’s a practical comparison:
| Strategy | Formula | Best For | Drawback |
|---|---|---|---|
| Fixed | delay = N ms | Predictable SLAs | Thundering herd on restart |
| Linear | delay = attempt * N | Gradual pressure reduction | Still causes spikes |
| Exponential | delay = 2^attempt * N | General transient failures | Delay grows very fast |
| Exponential + jitter | delay = (2^attempt * N) + random(N) | High-concurrency systems | Harder to predict max delay |
| Custom | any function | Rate-limit-aware retries | Requires maintenance |
For most production systems, exponential backoff with jitter is the right default. The jitter desynchronizes retries from multiple workers that all failed at the same moment, spreading load on the recovering downstream service.
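The additive jitter shown earlier keeps the exponential shape; the well-known “full jitter” variant instead randomizes over the entire window, which desynchronizes retries even more aggressively. A minimal sketch (base and cap values are illustrative):
// Full jitter: pick uniformly from [0, min(cap, base * 2^attempt)]
function fullJitterDelay(attempt: number, base = 1000, cap = 30_000): number {
  return Math.random() * Math.min(cap, base * 2 ** attempt);
}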
Claude Code can assist you in designing appropriate monitoring solutions, reviewing your chosen strategy against your throughput numbers, and setting up alerts for anomalous patterns like sudden spikes in failed job counts.
Conclusion
Building reliable BullMQ delayed retry job workflows requires thoughtful configuration and ongoing maintenance. By using BullMQ’s built-in retry mechanisms, implementing custom backoff strategies, using UnrecoverableError to skip pointless retries, and applying circuit breaker patterns for downstream dependencies, you can create solid systems that handle failures gracefully while maintaining processing reliability.
Remember to always implement dead letter queues for jobs that exceed retry limits, set memory-safe removeOnComplete and removeOnFail limits, monitor your queue health proactively, and design your retry strategy based on your specific failure modes and business requirements. Claude Code is particularly effective for auditing existing configurations, surfacing subtle misconfigurations, and generating the boilerplate for patterns like flow producers and circuit breakers.
Related Reading
- Claude Code Skills Redis Caching Layer Implementation
- Claude Code Upstash Redis Rate Limiting Workflow
- Claude Code Trigger.dev Background Job Workflow Guide