A/B Test Significance Calculator

Free

Determine whether your A/B test result is statistically significant by entering visitors and conversions for the control and variant groups. The calculator runs a two-proportion z-test entirely in your browser and reports each conversion rate, the relative lift, the two-tailed p-value, and a clear verdict at the 95 percent confidence level. When a result is not yet significant, it also estimates the minimum sample size per variant needed to detect the observed lift at 80 percent statistical power, so you know whether to keep the test running or call it.

Analytics, A/B testing, statistical significance, calculator, CRO, p-value

What it does

This calculator tests whether the gap between two variants in an A/B test can be distinguished from noise. Enter the visitors and conversions for your control and variant and it returns each conversion rate, the relative lift, and a statistical significance result (a p-value and a verdict at the 95 percent confidence level) telling you whether a difference this large would be unusual if the two variants truly performed the same. It exists because eyeballing a few percentage points of lift routinely fools marketers into shipping changes that were random fluctuations. Use it before declaring a winner on an ad creative, landing page, or email test, and to judge whether a test has collected enough data to trust at all.

Where it fits

It sits at the analytics stage, separating real winners from noise before you act on test results.

Core features

Conversion rate for control and variant from raw counts
Relative lift between the two variants
Statistical significance with p-value and confidence level
Clear winner / no-significant-difference verdict
Quick read on whether sample size is sufficient

Best for

Marketers validating ad, landing-page, or email tests
CRO teams confirming a lift is real before rolling out
Anyone avoiding decisions based on random fluctuation

Beginner notes

Set your sample size and test duration in advance; stopping early when a result looks good inflates false positives.
Significance answers whether an effect exists, not whether it is large enough to matter commercially.
Run tests over full weeks so weekday and weekend behaviour are both represented.

What is the A/B Test Significance Calculator?

Running an A/B test without statistical analysis is not an experiment — it is guesswork dressed up in data. The A/B Test Significance Calculator determines whether the difference in conversion rates between your control (variant A) and challenger (variant B) is statistically significant, or whether it could plausibly be explained by random chance.

Statistical significance testing answers the question: "If variant A and variant B actually had the same true conversion rate, how likely is it that I would observe a difference this large just by chance?" When that probability drops below 5% (the conventional threshold), we say the result is statistically significant at 95% confidence: data this extreme would occur less than 5% of the time if there were no real difference. It does not mean there is a 95% chance that variant B is truly better.

This matters enormously in advertising and conversion tracking contexts. Without significance testing, even experienced marketers systematically misread noise as signal. A variant that appears to win by 10% after 200 conversions can easily be nothing more than random luck. Launching that "winner" at scale means scaling variance, not a real performance improvement. Over hundreds of such decisions, the compounding error erodes program performance significantly.

The calculator uses a two-proportion z-test, the standard statistical method for comparing conversion rates between two independent groups. It outputs a p-value, a z-score, and a plain-language verdict on whether your result has cleared the 95% confidence threshold.

For the decisions that follow a significant test result — budget reallocation, creative rotation, bid strategy changes — the Campaign Metrics Calculator provides the full-funnel context to size the impact.

Formula & How It Works

Two-proportion z-test:

Given two variants with:

Variant A: n_A visitors, c_A conversions → conversion rate p_A = c_A / n_A
Variant B: n_B visitors, c_B conversions → conversion rate p_B = c_B / n_B

Pooled proportion: p_pool = (c_A + c_B) / (n_A + n_B)

Standard error: SE = √[ p_pool × (1 − p_pool) × (1/n_A + 1/n_B) ]

Z-score: z = (p_B − p_A) / SE

P-value: the probability of observing a z-score this extreme if the null hypothesis (no true difference) were true. For a two-tailed test, p-value = 2 × (1 − Φ(|z|)), where Φ is the standard normal CDF.

Decision rule:

p-value < 0.05 → statistically significant at 95% confidence
p-value < 0.01 → statistically significant at 99% confidence
p-value ≥ 0.05 → not significant; do not declare a winner
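
The formulas and decision rule above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the calculator's actual source: the function names `normal_cdf` and `two_proportion_z_test`, the `alpha` parameter, and the returned dictionary keys are invented for this example, and Φ is computed from the standard library's `math.erf`.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, Φ(x), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def two_proportion_z_test(n_a: int, c_a: int, n_b: int, c_b: int,
                          alpha: float = 0.05) -> dict:
    """Two-tailed two-proportion z-test with a pooled standard error.

    n_a, n_b are visitors; c_a, c_b are conversions (names are illustrative).
    """
    p_a = c_a / n_a
    p_b = c_b / n_b
    p_pool = (c_a + c_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1.0 - p_pool) * (1.0 / n_a + 1.0 / n_b))

    if se == 0.0:
        # Both groups converted at exactly 0% or 100%: there is nothing to test.
        z, p_value = 0.0, 1.0
    else:
        z = (p_b - p_a) / se
        p_value = 2.0 * (1.0 - normal_cdf(abs(z)))

    # Relative lift of B over A; undefined when the control has no conversions.
    lift = (p_b - p_a) / p_a if p_a > 0 else None

    return {
        "p_a": p_a,
        "p_b": p_b,
        "lift": lift,
        "z": z,
        "p_value": p_value,
        "significant": p_value < alpha,
    }


# Counts taken from the worked example in this section.
result = two_proportion_z_test(n_a=5000, c_a=120, n_b=5000, c_b=145)
print(result)
```

Running the function on raw counts is a quick way to check hand arithmetic against the formulas, since every intermediate value (pooled proportion, standard error, z-score) follows directly from the four inputs.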

Worked example:

A landing page test sends 5,000 visitors to variant A (control) and 5,000 to variant B (new headline). Variant A converts at 2.4% (120 conversions); variant B converts at 2.9% (145 conversions). Is the 20.8% relative lift real?

p_A = 0.024, p_B = 0.029
p_pool = (120 + 145) / 10,000 = 0.0265
SE = √[0.0265 × 0.9735 × (1/5000 + 1/5000)] = √[0.00001032] ≈ 0.00321
z = (0.029 − 0.024) / 0.00321 ≈ 1.56
p-value ≈ 0.12

Result: Not significant. Despite a 20.8% relative lift appearing in the data, a gap at least this large would show up about 12% of the time even if the two variants truly converted at the same rate, which is above the 5% threshold. You need more data before declaring variant B the winner.

This example illustrates the most common A/B testing mistake: calling a test too early because the lift "looks big." The underlying conversion rates (2.4% vs. 2.9%) are close enough and the sample small enough that chance alone remains a plausible explanation.

Industry Benchmarks

Minimum detectable effect and required sample size:

The sample size required for a significant test depends on your baseline conversion rate, the minimum lift you care about detecting, and your desired confidence level. Rough estimates at 95% confidence and 80% statistical power:

Baseline CVR	Minimum Lift to Detect	Required Visitors Per Variant
1%	20% (to 1.2%)	~35,000
1%	50% (to 1.5%)	~7,500
2%	20% (to 2.4%)	~17,500
5%	10% (to 5.5%)	~23,000
5%	20% (to 6.0%)	~6,000
10%	10% (to 11%)	~11,000

Key observation: at low conversion rates (under 2%), detecting small lifts requires very large samples. An e-commerce checkout page converting at 1% needs tens of thousands of visitors per variant to detect a 20% improvement — at 1,000 daily visitors per variant, that is a 35-day test minimum.
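
A common textbook approximation for the per-variant sample size is n ≈ (z_α + z_β)² × [p₁(1 − p₁) + p₂(1 − p₂)] / (p₂ − p₁)², where p₁ is the baseline rate, p₂ the rate after the lift, and z_α and z_β are the normal quantiles for the significance threshold and the power. The sketch below implements it with the standard library's `statistics.NormalDist`. The function name `visitors_per_variant` and its parameters are illustrative, and this is not necessarily the formula behind the table above.

```python
import math
from statistics import NormalDist


def visitors_per_variant(baseline_cvr: float, relative_lift: float,
                         alpha: float = 0.05, power: float = 0.80,
                         two_sided: bool = True) -> int:
    """Approximate visitors needed per variant to detect a relative lift.

    Uses the unpooled normal approximation for two proportions.
    """
    p1 = baseline_cvr
    p2 = baseline_cvr * (1.0 + relative_lift)

    # Split alpha across both tails for a two-sided test.
    tail = alpha / 2.0 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1.0 - tail)
    z_beta = NormalDist().inv_cdf(power)

    variance_sum = p1 * (1.0 - p1) + p2 * (1.0 - p2)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (p2 - p1) ** 2
    return math.ceil(n)


# 1% baseline, 20% relative lift (to 1.2%), as in the first table row.
n_two_sided = visitors_per_variant(0.01, 0.20)
n_one_sided = visitors_per_variant(0.01, 0.20, two_sided=False)
print(n_two_sided, n_one_sided)
```

With a two-sided 5% threshold this formula gives roughly 42,700 visitors per variant for a 1% baseline and a 20% lift, somewhat above the table's rough figure; a one-sided threshold gives roughly 33,600, which is closer. Treat either number as a planning estimate rather than an exact requirement.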

Test duration guidelines:

Minimum runtime: 2 full weeks (to account for day-of-week variation)
Maximum runtime: 6–8 weeks (beyond this, external factors contaminate the read)
Never end a test solely because significance was reached early — peeking inflates false positive rates severely
Recommended confidence threshold: 95% (p < 0.05) for most decisions; 99% for irreversible changes

How to Use This Calculator

Enter variant A sample size — the number of visitors or impressions exposed to the control.
Enter variant A conversions — the number of goal completions (purchases, sign-ups, clicks) for the control.
Enter variant B sample size — visitors exposed to the challenger variant.
Enter variant B conversions — goal completions for the challenger.
Read the p-value and z-score — the primary statistical outputs.
Read the verdict — the calculator tells you plainly whether the result is significant at 95% confidence.
Check the relative lift — the percentage improvement (or decline) of B versus A, alongside its significance rating.
Do not call it early: if the result is not yet significant, let the test run to its planned sample size rather than pulling the plug or launching the "leading" variant.

After confirming significance, use conversion tracking data to validate the result in your analytics platform, and attribution data to ensure the winning variant's lift holds across different traffic sources.

FAQ

What does 95% confidence actually mean?

It means that if you ran this exact experiment 100 times under identical conditions with no true difference between variants, you would expect to see a result this extreme or more extreme about 5 times by chance. When a test clears 95% confidence, it does not mean there is a 95% chance that variant B is truly better — it means there is less than a 5% chance that your observed data would occur if there were no real difference. Statisticians call this rejecting the null hypothesis at the 5% significance level.

Why should I not stop the test when I first see p < 0.05?

Peeking at results and stopping as soon as significance is reached is called "optional stopping," and it dramatically inflates your false positive rate. If you check results continuously and stop at first significance, your actual false positive rate can reach 30–40% even when you set a 5% threshold. This is because with each additional peek, you are giving random variance more chances to create a spurious significant result. Commit to a minimum sample size or runtime before you launch the test, and honor it.
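
The effect of optional stopping can be seen in a small simulation of A/A tests, where both arms share the same true conversion rate so every "significant" result is a false positive. This is a rough sketch under assumed settings: the 10% conversion rate, 100 visitors per batch, 20 peeks, 500 runs, and the fixed seed are all arbitrary choices for illustration, and the exact rates it prints depend on them.

```python
import math
import random


def p_value(n_a: int, c_a: int, n_b: int, c_b: int) -> float:
    """Two-tailed p-value from the pooled two-proportion z-test."""
    p_pool = (c_a + c_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (c_b / n_b - c_a / n_a) / se
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))


def simulate(true_cvr=0.10, batch=100, peeks=20, runs=500, alpha=0.05, seed=1):
    """Return (false positive rate with peeking, rate with one final look)."""
    rng = random.Random(seed)
    peeking_hits = 0  # runs where at least one peek showed p < alpha
    final_hits = 0    # runs where the single planned final look showed p < alpha

    for _run in range(runs):
        n = c_a = c_b = 0
        crossed = False
        for _peek in range(peeks):
            # Both arms draw from the SAME conversion rate: no true difference.
            c_a += sum(rng.random() < true_cvr for _v in range(batch))
            c_b += sum(rng.random() < true_cvr for _v in range(batch))
            n += batch
            if p_value(n, c_a, n, c_b) < alpha:
                crossed = True  # a peeker would have stopped and declared a winner
        peeking_hits += crossed
        final_hits += p_value(n, c_a, n, c_b) < alpha

    return peeking_hits / runs, final_hits / runs


peeking_rate, fixed_rate = simulate()
print(f"stop at first p < 0.05: {peeking_rate:.1%} false positives")
print(f"single planned look:    {fixed_rate:.1%} false positives")
```

The single planned look should land near the nominal 5%, while the peeking rate comes out several times higher, because every extra look is another chance for noise to cross the threshold.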

My test shows a significant result but the lift is tiny. Should I implement variant B?

Statistical significance and practical significance are different things. A 0.1% lift in conversion rate may be statistically significant with 1 million visitors per variant, but the business impact may not justify the implementation cost. Always pair the significance verdict with an estimate of the revenue impact: Incremental Conversions = (p_B − p_A) × Monthly Traffic × 0.5, then multiply by average order value to get monthly revenue and by 12 to annualize it. If the annualized revenue impact exceeds the implementation cost, implement. If not, consider the test informative but not action-worthy.
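
The revenue check in this answer is simple arithmetic, sketched below. The function name, parameter names, and all input figures are hypothetical; the 0.5 multiplier is taken directly from the formula in the answer and exposed as a parameter, since its rationale is not stated there.

```python
def annualized_revenue_impact(p_a: float, p_b: float, monthly_traffic: int,
                              average_order_value: float,
                              factor: float = 0.5, months: int = 12) -> float:
    """Annualized revenue estimate following the formula in the text.

    factor=0.5 mirrors the multiplier in the text's formula.
    """
    incremental_conversions = (p_b - p_a) * monthly_traffic * factor
    monthly_revenue = incremental_conversions * average_order_value
    return monthly_revenue * months


# Hypothetical inputs for illustration only.
impact = annualized_revenue_impact(
    p_a=0.050, p_b=0.051, monthly_traffic=200_000, average_order_value=40.0
)
implementation_cost = 20_000.0  # hypothetical one-off cost
worth_implementing = impact > implementation_cost
print(round(impact), worth_implementing)
```

Keeping the multiplier and the number of months as explicit parameters makes it easy to see how sensitive the decision is to each assumption.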

Can I use this calculator for metrics other than conversion rate?

The two-proportion z-test applies specifically to binary outcomes (converted / did not convert). For continuous metrics — average order value, revenue per visitor, session duration — you need a different statistical test (typically a two-sample t-test or a non-parametric equivalent). For click-through rate tests on ad creative, where clicks versus impressions are the binary outcome, this calculator applies directly. For testing against attribution revenue metrics, consult a statistician or use a platform-specific testing tool that handles revenue distributions properly.

Concepts behind this tool

Conversion Tracking

Conversion tracking records the actions used to measure and optimize campaigns.

Ad Creative Testing

Ad creative testing is the process of running controlled experiments to determine which creative elements produce better campaign results.
