A/B Test Sample Size Calculator
Calculate the sample size your split test needs to reach statistical significance — and convert it into a test duration in days using your actual traffic. Defaults to alpha 0.05 two-tailed and 80% power, the conventional CRO standard.
Your Test Parameters
Inputs update results instantly.
Current rate of the step you are testing — not end-to-end funnel
Smallest relative lift you want to detect. p₂ = p₁ × (1 + MDE). Default 10%
False positive rate — drives Zα/2
One- or two-tailed
Probability of detecting a real effect — drives Zβ
Visitors per day reaching the page being tested — not part of the n formula
- n
- sample size per variant
- p₁
- baseline conversion rate = 14.00%
- p₂
- target rate = p₁ × (1 + MDE) = 15.40%
- q₁, q₂
- 1 − p₁ and 1 − p₂ (non-conversion rates)
- p̄
- pooled rate = (p₁ + p₂) / 2 = 14.70%
- q̄
- 1 − p̄
- Zα/2
- z-score from significance level = 1.9600
- Zβ
- z-score from statistical power = 0.8416
- (p₁ − p₂)
- absolute difference between rates (also written as δ) = -1.40 pp
How MDE Changes Sample Size and Duration
Halving the MDE roughly quadruples the sample size required — the relationship is squared, not linear. This is the single biggest reason "let us see if anything moves" tests never finish.
| Relative MDE | Target Rate (p₂) | Sample / Variant | Recommended Duration |
|---|---|---|---|
| 5.0% | 14.70% | 39,375 | 49 days |
| 10.0%your input | 15.40% | 10,042 | 14 days |
| 20.0% | 16.80% | 2,608 | 14 days |
Duration assumes a 50/50 traffic split between control and variant, with a 14-day floor and rounding up to full weeks.
Baseline, MDE, Alpha and Power Explained
The current conversion rate of the step you are testing. The rule: baseline is the rate the change can actually move. Testing a pricing page change against an end-to-end LP-to-purchase rate inflates required sample size by 5 to 20 times for no reason.
The smallest relative lift you want the test to detect. Stated as a percent of baseline, not absolute percentage points. 10% relative on a 15.9% baseline means detecting 17.5%. Smaller MDE means much larger sample size — halving it roughly quadruples it.
Your tolerance for false positives — declaring a winner when there is no real difference. 0.05 two-tailed is the standard "statistical significance" threshold. One-tailed cuts sample size by ~20% but assumes you do not care if the variant is worse, which is rarely true in CRO.
Your tolerance for false negatives — missing a real winner. 80% means that if the true lift equals your MDE, the test will detect it 80% of the time. Below 70% you are running tests that mostly cannot succeed even when the variant is genuinely better.
For a worked scenario using the formula above against a real funnel, see the full guide: A/B Test Sample Size Calculator: Baseline, MDE, Alpha and Power for CRO.
A/B Test Sample Size FAQ
I'll review your funnel, identify the highest-leverage step to test, and help you design a test plan that can actually reach statistical significance with the traffic you have.
