PostHog A/B Testing with Claude Code (2026)

Multivariate testing (MVT) allows you to test multiple variations of your application simultaneously, helping you understand which combinations of elements perform best. PostHog provides solid experimentation features that integrate smoothly with modern web applications. This guide shows you how to implement multivariate tests using Claude Code, making experiment setup and analysis more efficient.

Understanding Multivariate Tests in PostHog

PostHog’s experimentation platform supports both simple A/B tests and complex multivariate tests with multiple variants and parameters. Unlike traditional A/B testing that compares two versions, multivariate tests let you test multiple combinations, such as different headlines, images, and call-to-action buttons, all at once.

The key components you need to understand are:

  • Feature flags: The foundation for routing users to different variants
  • Experiment: The wrapper that tracks outcomes and calculates statistical significance
  • Variants: The different versions users might see
  • Metrics: The success criteria you’re measuring
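A rough sketch of how these pieces map to posthog-js calls (the flag key and event name here are illustrative, not ones PostHog creates for you):

import posthog from 'posthog-js';

// The experiment's feature flag returns the variant key assigned to the current user
const variant = posthog.getFeatureFlag('homepage-multivariate-test'); // 'control' | 'variant_a' | ...

// Metrics are ordinary events you capture; the experiment ties them back to each variant
posthog.capture('signup_completed');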

A/B Test vs. Multivariate Test: When to Use Each

Before reaching for a multivariate test, it is worth understanding when each approach is the right choice.

| Factor | A/B Test | Multivariate Test |
|---|---|---|
| Number of variants | 2 | 3 or more |
| Traffic required | Lower | Higher (splits further) |
| Test duration | Shorter | Longer |
| Best for | Testing one change | Testing combinations of changes |
| Statistical complexity | Simple | More complex interactions |
| Risk of inconclusive results | Low | Higher with small traffic |
| Iteration speed | Fast | Slower per test |

A standard A/B test is the right choice when you want to validate a single hypothesis fast. Multivariate tests shine when you need to understand interaction effects: for example, whether a particular headline performs better only when combined with a specific CTA color.

For most teams, the practical guidance is: start with A/B tests until you have enough traffic (typically 10,000+ weekly active users) to reach significance on multivariate experiments within a reasonable timeframe (two to four weeks). Running a four-variant multivariate test on a site with 500 daily visitors will take months to reach significance, and the results may not reflect current conditions by the time they arrive.

Setting Up PostHog for Multivariate Testing

Before implementing tests with Claude Code, ensure you have the PostHog SDK installed in your project:

npm install posthog-js
or
yarn add posthog-js

Initialize PostHog in your application with your project API key:

import posthog from 'posthog-js';

posthog.init('YOUR_POSTHOG_API_KEY', {
  api_host: 'https://app.posthog.com',
  // Flags load asynchronously after init; see the bootstrap options below
  // for how to make them available immediately
  bootstrap: {
    featureFlags: {},
  },
});

Bootstrap vs. Async Flag Loading

The initialization code above uses bootstrap with an empty object, which means flags load asynchronously from the PostHog API. For pages where the variant content appears above the fold, this creates a flash of content: the user briefly sees the control variant before the flag loads and the correct variant renders.

There are two solutions:

Option 1. Server-side bootstrap: Fetch the flags on your server during SSR and inject them into the page. This eliminates the async loading entirely.

// Next.js example: pages/index.js
export async function getServerSideProps(context) {
  // generateAnonymousId() is a placeholder for however you mint anonymous IDs
  const userId = context.req.cookies.userId || generateAnonymousId();

  // Fetch flag assignments from the PostHog decide endpoint server-side
  const response = await fetch('https://app.posthog.com/decide?v=3', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      api_key: 'YOUR_POSTHOG_API_KEY',
      distinct_id: userId,
    }),
  });
  const { featureFlags } = await response.json();

  return {
    props: { bootstrapFlags: featureFlags, userId },
  };
}

// Then, in the page component, initialize posthog with the server-fetched flags
posthog.init('YOUR_POSTHOG_API_KEY', {
  bootstrap: { featureFlags: props.bootstrapFlags },
});

Option 2. Hide until loaded: Render nothing (or a skeleton) until flags are confirmed loaded.

// Inside your page component:
const [flagsLoaded, setFlagsLoaded] = useState(false);

useEffect(() => {
  posthog.onFeatureFlags(() => setFlagsLoaded(true));
}, []);

if (!flagsLoaded) return <PageSkeleton />;

Option 1 is preferable for conversion-critical pages because it adds zero client-side delay. Option 2 is simpler to implement and acceptable for non-critical experiments.

Creating a Multivariate Test with Claude Code

Claude Code can help you set up multivariate tests efficiently. Here’s a practical example of implementing an experiment with multiple variants:

// experiments/homepage-test.js
import posthog from 'posthog-js';

export function getHomepageVariant() {
  // Define the content for each variant
  const variants = {
    control: {
      headline: 'Welcome to Our Platform',
      ctaText: 'Get Started',
      heroImage: '/images/hero-default.jpg',
    },
    variant_a: {
      headline: 'Transform Your Workflow Today',
      ctaText: 'Start Free Trial',
      heroImage: '/images/hero-productivity.jpg',
    },
    variant_b: {
      headline: 'Join Thousands of Happy Users',
      ctaText: 'Try It Now',
      heroImage: '/images/hero-social-proof.jpg',
    },
    variant_c: {
      headline: 'The All-in-One Solution',
      ctaText: 'Learn More',
      heroImage: '/images/hero-features.jpg',
    },
  };

  // Get the variant from PostHog; bucketing is keyed to the SDK's distinct_id,
  // so the same user sees the same variant on every visit
  const variant = posthog.getFeatureFlag('homepage-multivariate-test');

  return variants[variant] || variants.control;
}

Separating Variant Data from Component Logic

As your experiment library grows, mixing variant definitions directly into component files makes the codebase hard to maintain. A cleaner architecture separates experiment configuration into dedicated files and auto-generates TypeScript types so components get compile-time safety:

// experiments/registry.ts
import { useEffect, useState } from 'react';
import { usePostHog } from 'posthog-js/react';

export type ExperimentKey =
  | 'homepage-multivariate-test'
  | 'pricing-page-test'
  | 'onboarding-flow-test';

export interface VariantConfig {
  [key: string]: Record<string, string | number | boolean>;
}

export const EXPERIMENTS: Record<ExperimentKey, VariantConfig> = {
  'homepage-multivariate-test': {
    control: { headline: 'Welcome to Our Platform', ctaText: 'Get Started', ctaColor: 'blue' },
    variant_a: { headline: 'Transform Your Workflow Today', ctaText: 'Start Free Trial', ctaColor: 'green' },
    variant_b: { headline: 'Join Thousands of Happy Users', ctaText: 'Try It Now', ctaColor: 'blue' },
    variant_c: { headline: 'The All-in-One Solution', ctaText: 'Learn More', ctaColor: 'orange' },
  },
  'pricing-page-test': {
    control: { showAnnualDefault: false, highlightPlan: 'pro' },
    variant_a: { showAnnualDefault: true, highlightPlan: 'pro' },
    variant_b: { showAnnualDefault: true, highlightPlan: 'enterprise' },
  },
  'onboarding-flow-test': {
    control: { steps: 5, showVideo: false },
    variant_a: { steps: 3, showVideo: false },
    variant_b: { steps: 5, showVideo: true },
    variant_c: { steps: 3, showVideo: true },
  },
};

// Generic hook that works for any experiment in the registry
export function useExperiment<K extends ExperimentKey>(
  experimentKey: K
): typeof EXPERIMENTS[K][string] {
  const posthog = usePostHog();
  const [variant, setVariant] = useState<string>('control');

  useEffect(() => {
    if (posthog) {
      posthog.onFeatureFlags(() => {
        const assigned = posthog.getFeatureFlag(experimentKey) as string;
        setVariant(assigned || 'control');
        // Capture the exposure so dashboards can segment on experiment and variant
        posthog.capture('$experiment_started', {
          experiment_name: experimentKey,
          variant: assigned || 'control',
        });
      });
    }
  }, [posthog, experimentKey]);

  return EXPERIMENTS[experimentKey][variant] ?? EXPERIMENTS[experimentKey]['control'];
}

With this registry, adding a new experiment means adding one entry to EXPERIMENTS. The TypeScript type system prevents referencing undefined experiments or accessing properties that don’t exist on a variant.
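For example, a hypothetical checkout-banner experiment would be added like this, and a misspelled key fails at compile time rather than at runtime:

// experiments/registry.ts (hypothetical addition)
// 1. Extend the key union:
//    export type ExperimentKey = ... | 'checkout-banner-test';
// 2. Add its variants to EXPERIMENTS:
//    'checkout-banner-test': {
//      control:   { bannerText: 'Free shipping over $50' },
//      variant_a: { bannerText: 'Order today, ships tomorrow' },
//    },

// In a component, only registered keys type-check:
import { useExperiment } from '../experiments/registry';

const banner = useExperiment('checkout-banner-test');   // OK once registered
// useExperiment('checkout-baner-test');                // compile error: not an ExperimentKey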

Implementing the Test Component

Now create a React component that displays different variants based on the feature flag:

// components/Homepage.jsx
import { usePostHog } from 'posthog-js/react';
import { useEffect, useState } from 'react';
import { getHomepageVariant } from '../experiments/homepage-test';

export default function Homepage() {
  const posthog = usePostHog();
  const [variant, setVariant] = useState(null);

  useEffect(() => {
    if (posthog) {
      // Wait for flags to load before reading the assignment
      posthog.onFeatureFlags(() => {
        const assignedVariant = posthog.getFeatureFlag('homepage-multivariate-test');
        setVariant(assignedVariant || 'control');

        // Track the exposure event for analytics
        posthog.capture('experiment_exposed', {
          experiment_name: 'homepage-multivariate-test',
          variant: assignedVariant || 'control',
        });
      });
    }
  }, [posthog]);

  if (!variant) {
    return <div>Loading...</div>;
  }

  const content = getHomepageVariant();

  return (
    <div className="homepage">
      <h1>{content.headline}</h1>
      <img src={content.heroImage} alt="Hero" />
      <button>{content.ctaText}</button>
    </div>
  );
}

Using the registry-based approach from the previous section, the component becomes simpler and fully type-safe:

// components/Homepage.tsx (registry-based approach)
import { useExperiment } from '../experiments/registry';

// trackCTAClick: your app's click-tracking helper (e.g. a thin wrapper around posthog.capture)
export default function Homepage() {
  const content = useExperiment('homepage-multivariate-test');

  return (
    <div className="homepage">
      <h1>{content.headline}</h1>
      <button
        style={{ backgroundColor: content.ctaColor }}
        onClick={() => trackCTAClick(content.ctaText)}
      >
        {content.ctaText}
      </button>
    </div>
  );
}

The component has no direct PostHog dependency. All experiment logic lives in the hook, making it easy to swap out the experiment platform later if needed.
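If platform portability matters, one way to formalize that boundary is a small adapter interface that the hook depends on instead of calling posthog-js directly. This is a sketch, not part of the code above:

// experiments/client.ts (sketch of a platform-agnostic boundary)
import posthog from 'posthog-js';

export interface ExperimentClient {
  // Resolves to the assigned variant key, or null if the user is not enrolled
  getVariant(experimentKey: string): Promise<string | null>;
  // Records that the user was exposed to a variant
  trackExposure(experimentKey: string, variant: string): void;
}

// PostHog-backed implementation; swapping platforms means writing a new adapter,
// not touching components or the useExperiment hook
export const posthogExperimentClient: ExperimentClient = {
  getVariant: (experimentKey) =>
    new Promise((resolve) => {
      posthog.onFeatureFlags(() => {
        resolve((posthog.getFeatureFlag(experimentKey) as string) ?? null);
      });
    }),
  trackExposure: (experimentKey, variant) => {
    posthog.capture('$experiment_started', { experiment_name: experimentKey, variant });
  },
};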

Tracking Custom Metrics for Your Experiment

PostHog allows you to track custom metrics to measure experiment success. Here’s how to set up goal tracking:

// utils/experiment-tracking.js
import posthog from 'posthog-js';

// posthog-js already tracks the current user's distinct_id, so callers only
// need to pass the event name and any extra properties
export function trackConversion(eventName, properties = {}) {
  if (posthog.__loaded) {
    posthog.capture(eventName, properties);
  }
}

// Usage examples
export function trackSignupConversion(variant) {
  trackConversion('signup_completed', {
    experiment: 'homepage-multivariate-test',
    variant: variant,
    source: 'homepage_cta',
  });
}

export function trackClickThrough(variant, element) {
  trackConversion('cta_clicked', {
    experiment: 'homepage-multivariate-test',
    variant: variant,
    element: element,
  });
}

Designing a Metric Hierarchy

Every experiment should track metrics at three levels:

Primary metric. The single metric that determines the winner. This should be a meaningful business outcome, not a proxy metric. “Signup completed” is a good primary metric. “CTA clicked” is a proxy: it measures intent but not the conversion you actually care about.

Secondary metrics. Supporting metrics that help you understand the tradeoff. If variant_a improves signups but also increases support ticket volume, you need to see both numbers to make a good decision.

Guardrail metrics. Metrics that should not worsen significantly. If an experiment improves signups but degrades page load time beyond your SLA, the guardrail metric stops you from shipping a net-negative change.

// utils/experiment-metrics.js
import posthog from 'posthog-js';

const EXPERIMENT_METRICS = {
  'homepage-multivariate-test': {
    primary: 'signup_completed',
    secondary: ['trial_started', 'docs_viewed', 'demo_requested'],
    guardrails: ['page_load_time_exceeded', 'error_occurred', 'rage_click'],
  },
};

export function captureExperimentEvent(experimentKey, eventName, properties = {}) {
  const metrics = EXPERIMENT_METRICS[experimentKey];
  const metricType =
    metrics.primary === eventName ? 'primary' :
    metrics.secondary.includes(eventName) ? 'secondary' :
    metrics.guardrails.includes(eventName) ? 'guardrail' :
    'untracked';

  posthog.capture(eventName, {
    ...properties,
    experiment_key: experimentKey,
    metric_type: metricType,
  });
}

Tagging events with metric_type makes PostHog dashboards significantly easier to build. You can filter to primary metrics for executive reporting and drill into guardrails when a variant looks suspiciously good.
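For example, with the captureExperimentEvent helper above, call sites stay identical regardless of metric tier (the extra properties here are made up for illustration):

// Tier is looked up automatically from EXPERIMENT_METRICS
captureExperimentEvent('homepage-multivariate-test', 'signup_completed', { plan: 'free' });
// -> captured with metric_type: 'primary'

captureExperimentEvent('homepage-multivariate-test', 'rage_click', { selector: '#cta' });
// -> captured with metric_type: 'guardrail'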

Analyzing Experiment Results in PostHog

Once your experiment is running, you can analyze results in the PostHog dashboard. Here are the key metrics to watch:

  1. Sample size: Ensure you have enough users in each variant
  2. Conversion rate: Compare the percentage of users completing the desired action
  3. Statistical significance: PostHog calculates this automatically; look for at least a 95% probability of being best (see below)
  4. Primary and secondary metrics: Track multiple outcomes to understand full impact

In your PostHog dashboard, navigate to Experiments to see:

  • Variant breakdown with conversion rates
  • Confidence intervals
  • Recommended action (winner, loser, or inconclusive)

Reading PostHog’s Bayesian Stats Output

PostHog uses Bayesian statistics by default, which reports a “probability of being best” rather than a p-value. This is more intuitive for product teams:

| PostHog Output | What It Means | Action |
|---|---|---|
| Probability of being best > 95% | Strong evidence the variant wins | Ship it |
| Probability of being best 80–95% | Moderate evidence | Run longer or accept the risk |
| Probability of being best 50–80% | Weak signal | Run longer |
| Inconclusive after target sample | No meaningful difference detected | Discard or iterate |
| Any guardrail metric degrading | Variant is harmful | Pause and investigate |

Do not confuse “probability of being best” with “statistical significance” in the frequentist sense. A 95% probability of being best means there is a 5% chance that another variant (including the control) is actually better; that is not the same as a frequentist p-value of 0.05, which is a statement about the probability of observing your data under the null hypothesis.

Sample Size Planning

Before launching, calculate the required sample size to avoid underpowered experiments:

# sample_size_calculator.py
import math

def required_sample_size(
    baseline_rate: float,              # e.g. 0.05 for a 5% baseline conversion rate
    minimum_detectable_effect: float,  # e.g. 0.20 for a 20% relative lift
    num_variants: int,                 # including control
    power: float = 0.80,               # statistical power (1 - beta)
    alpha: float = 0.05,               # significance level
) -> dict:
    """
    Estimates the required sample size per variant using the standard
    two-proportion formula, plus the total across all variants.
    """
    target_rate = baseline_rate * (1 + minimum_detectable_effect)

    # z-scores for the common alpha/power values used by the defaults
    z_alpha = 1.96  # alpha = 0.05, two-tailed
    z_beta = 0.84   # power = 0.80

    p1 = baseline_rate
    p2 = target_rate
    pooled = (p1 + p2) / 2

    # Standard two-proportion sample size formula
    n = (
        (z_alpha * math.sqrt(2 * pooled * (1 - pooled)) +
         z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    ) / (p2 - p1) ** 2

    n_per_variant = math.ceil(n)
    n_total = n_per_variant * num_variants
    return {
        'per_variant': n_per_variant,
        'total': n_total,
        'num_variants': num_variants,
    }

# Example: 5% baseline, detect a 20% relative lift, 4 variants
result = required_sample_size(
    baseline_rate=0.05,
    minimum_detectable_effect=0.20,
    num_variants=4,
)
print(f"Need {result['per_variant']} users per variant ({result['total']} total)")
# Need 8149 users per variant (32596 total)

For a site with 500 daily visitors, a four-variant test requiring roughly 32,600 users takes about 65 days to collect data, before accounting for weekday/weekend traffic variation. At 5,000 daily visitors, the same experiment completes in about a week. This math is what separates “we should A/B test everything” as an aspiration from a practical, prioritized testing roadmap.
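A quick way to sanity-check timelines before committing to an experiment, using the totals printed above (this hypothetical helper assumes every daily visitor is eligible for the experiment):

// Rough duration estimate: total required sample / eligible daily traffic
const estimatedDays = (totalSample, dailyTraffic) => totalSample / dailyTraffic;

console.log(estimatedDays(32596, 500));   // ~65 days
console.log(estimatedDays(32596, 5000));  // ~6.5 days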

Best Practices for Multivariate Testing

When implementing multivariate tests with Claude Code, keep these tips in mind:

Limit variants: While multivariate tests are powerful, too many variants dilute traffic and extend test duration. Start with 3-4 variants maximum.

Ensure proper sample size: Use PostHog’s sample size calculator to determine how long to run your test. Insufficient sample size leads to inconclusive results.

Test one change per experiment: Isolate variables by testing one hypothesis at a time. If you want to test multiple changes, run separate A/B tests.

Document your hypothesis: Before launching, write down what you expect to happen and why. This helps with learning even when experiments fail.

Run tests to completion: Don’t stop experiments early just because results look promising. Wait for statistical significance to avoid false positives.

Experiment Hygiene Checklist

Use this checklist when launching each experiment:

Pre-Launch
- [ ] Hypothesis documented (if X, then Y, because Z)
- [ ] Primary metric defined and tracked
- [ ] Secondary metrics defined and tracked
- [ ] Guardrail metrics defined and tracked
- [ ] Sample size calculated and timeline estimated
- [ ] Rollout percentage set (start at 10-20% if risk is uncertain)
- [ ] Feature flag verified in staging environment
- [ ] Exposure event fires correctly for all variants including control
- [ ] QA: manually overrode each variant and confirmed correct rendering
During Test
- [ ] Check for Sample Ratio Mismatch (SRM) in first 24 hours
- [ ] Monitor guardrail metrics daily
- [ ] Do not peek at primary metric results (peeking inflates false positive rate)
Post-Test
- [ ] Wait for target sample size before reading results
- [ ] Document results in experiment log regardless of outcome
- [ ] If shipping winner: remove experiment code and clean up feature flag
- [ ] If inconclusive: document learnings, do not run the same test again

Sample Ratio Mismatch (SRM) occurs when variants receive significantly different traffic than expected. A 25%/25%/25%/25% split experiment where variant_c receives only 18% of traffic indicates a bug in your bucketing logic: cookies not persisting, bots skewing one variant, or a routing bug. Always verify your traffic distribution is close to expected before trusting the results.
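A minimal way to check for SRM is a chi-square goodness-of-fit test on the exposure counts per variant. The sketch below is illustrative: the counts are made up, and in practice you would pull the real exposure counts from your PostHog exposure events.

// Minimal SRM check: chi-square goodness-of-fit on observed exposure counts
function srmChiSquare(observed, expectedRatios) {
  const total = observed.reduce((a, b) => a + b, 0);
  return observed.reduce((chi2, obs, i) => {
    const expected = total * expectedRatios[i];
    return chi2 + ((obs - expected) ** 2) / expected;
  }, 0);
}

// Hypothetical counts for control, variant_a, variant_b, variant_c
const observed = [2550, 2530, 2540, 1680]; // variant_c sits at ~18% instead of 25%
const chi2 = srmChiSquare(observed, [0.25, 0.25, 0.25, 0.25]);

// Degrees of freedom = variants - 1 = 3. SRM checks usually use a strict
// threshold (alpha = 0.001), where the critical value for df = 3 is ~16.27.
console.log(chi2.toFixed(1), chi2 > 16.27 ? 'SRM detected - investigate' : 'split looks OK');
// chi2 here is ~239, far above 16.27, so this split is almost certainly broken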

Debugging Experiment Issues

When your experiment isn’t working as expected, check these common issues:

// Debug: Verify flags are loading correctly
console.log('Feature flags:', posthog.getFeatureFlags());

// Debug: Check which variant the current user is assigned to
const assignedVariant = posthog.getFeatureFlag('homepage-multivariate-test');
console.log('User variant:', assignedVariant);

// Debug: Verify events are firing
posthog.debug(); // Enables debug mode

Systematic Debugging by Symptom

| Symptom | Likely Cause | Debug Step |
|---|---|---|
| All users get control | Feature flag not enabled or targeting rules exclude all users | Check flag status in PostHog UI; verify rollout % > 0 |
| Some users switch variants between sessions | User identity not stable (anonymous + logged-in mismatch) | Call posthog.identify(userId) on login |
| Variant shows briefly then switches to control | Async flag loading; bootstrap not set | Use server-side bootstrap or hide-until-loaded pattern |
| Experiment shows 0 exposures in dashboard | Exposure event not firing or event name mismatch | Add console.log to the exposure capture call; check PostHog Live Events |
| Sample Ratio Mismatch | Bucketing bug; bots; cached responses skewing one variant | Audit CDN cache headers; verify bot filtering; check flag evaluation logic |
| Significant result that reverses after shipping | Novelty effect; peeking; SRM | Rerun with fresh users; audit pre-ship analytics |

// Comprehensive debug utility
function debugExperiment(experimentKey) {
  const allFlags = posthog.getFeatureFlags();
  const myVariant = posthog.getFeatureFlag(experimentKey);
  const distinctId = posthog.get_distinct_id();
  const sessionId = posthog.get_session_id();

  console.group(`Experiment Debug: ${experimentKey}`);
  console.log('distinct_id:', distinctId);
  console.log('session_id:', sessionId);
  console.log('assigned variant:', myVariant);
  console.log('all flags:', allFlags);
  console.log('PostHog initialized:', posthog.__loaded);
  console.groupEnd();

  // Force override for local testing (never run in production)
  if (process.env.NODE_ENV === 'development') {
    const forceVariant = new URLSearchParams(window.location.search).get('force_variant');
    if (forceVariant) {
      posthog.featureFlags.override({ [experimentKey]: forceVariant });
      console.warn(`Forced variant override: ${forceVariant}`);
    }
  }
}

The force_variant query parameter pattern is particularly useful during QA. Testers can append ?force_variant=variant_b to any URL to verify the visual rendering of a specific variant without needing to manipulate cookies or create test accounts that happen to fall into the right bucket.

Server-Side Experimentation

For server-rendered applications, or when you need to personalize API responses based on experiment variant, use the PostHog Node.js SDK:

// server/experiment-middleware.js
import { PostHog } from 'posthog-node';

const posthogClient = new PostHog('YOUR_API_KEY', {
  host: 'https://app.posthog.com',
  // Disable event batching for serverless environments
  flushAt: 1,
  flushInterval: 0,
});

export async function getServerVariant(userId, experimentKey) {
  try {
    const variant = await posthogClient.getFeatureFlag(experimentKey, userId);

    // Capture exposure server-side
    posthogClient.capture({
      distinctId: userId,
      event: '$experiment_started',
      properties: {
        experiment_name: experimentKey,
        variant: variant || 'control',
        $set: { last_experiment_date: new Date().toISOString() },
      },
    });
    await posthogClient.flush(); // Required for serverless

    return variant || 'control';
  } catch (error) {
    console.error('PostHog flag fetch failed, defaulting to control:', error);
    return 'control';
  }
}

Server-side evaluation has two advantages over client-side: zero flash of content, and the ability to branch server rendering logic (different HTML, different API responses) rather than just visual changes.

The tradeoff is added latency for the initial page request. Mitigate this by caching flag assignments in a fast store (Redis, edge KV) with a TTL of a few minutes. Users get a slightly stale assignment rather than a fresh API call on every request, which is usually acceptable.
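A minimal caching sketch under those assumptions, wrapping the getServerVariant function above with an ioredis-style client (the key format and five-minute TTL are illustrative):

// server/cached-variant.js (illustrative)
import Redis from 'ioredis';
import { getServerVariant } from './experiment-middleware';

const redis = new Redis(process.env.REDIS_URL);
const FLAG_TTL_SECONDS = 300; // accept up to 5 minutes of staleness

export async function getCachedVariant(userId, experimentKey) {
  const cacheKey = `exp:${experimentKey}:${userId}`;

  // Serve the cached assignment if we have one
  const cached = await redis.get(cacheKey);
  if (cached) return cached;

  // Otherwise evaluate the flag and cache the result with a short TTL
  const variant = await getServerVariant(userId, experimentKey);
  await redis.set(cacheKey, variant, 'EX', FLAG_TTL_SECONDS);
  return variant;
}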

Conclusion

Implementing multivariate tests with PostHog and Claude Code is straightforward once you understand the core concepts. Feature flags handle user bucketing, the PostHog SDK provides exposure tracking, and the dashboard gives you statistical analysis out of the box. Claude Code can help you write clean, maintainable experiment code that scales with your application.

Start with a simple A/B test before moving to multivariate experiments, and always ensure you have enough traffic to reach statistical significance. Use the experiment registry pattern to keep variant configurations centralized and type-safe. Plan sample sizes before launching, monitor for Sample Ratio Mismatch early, and track guardrail metrics alongside your primary conversion goal. With proper implementation, you will be able to make data-driven decisions that improve user experience and conversion rates without the debugging burden that poorly instrumented experiments typically produce.

