
How is the sampling distribution of the difference between two sample proportions described?

Topic 5.6 Sampling Distributions for Differences in Sample Proportions: describe the mean, standard deviation, and shape of the sampling distribution of the difference between two independent sample proportions, and check the conditions for the normal model.

A focused answer to AP Statistics Topic 5.6, on the mean, standard deviation, and approximately normal shape of the difference between two independent sample proportions, the conditions, and finding probabilities, with full worked calculations.

Generated by Claude Opus 4.8 (10 min answer)

Reviewed by: AI editorial process; not yet individually human-reviewed



What this topic is asking

The College Board (Topic 5.6) wants you to describe the mean, standard deviation, and shape of the sampling distribution of the difference between two independent sample proportions, $\hat{p}_1 - \hat{p}_2$, and to check the conditions for the normal model.

Center and spread of the difference

The mean is the difference of the individual means, $\mu_{\hat{p}_1 - \hat{p}_2} = \mu_{\hat{p}_1} - \mu_{\hat{p}_2} = p_1 - p_2$, with no surprises. The standard deviation is the one to internalize: add the two separate variances, $\dfrac{p_1(1-p_1)}{n_1}$ and $\dfrac{p_2(1-p_2)}{n_2}$, then take the square root, giving $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}$. This is a direct use of Topic 4.9's rule that variances add for independent random variables, even when one is subtracted from the other.
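The two formulas translate directly into a few lines of Python. This is a minimal sketch that assumes the population proportions are known, as they are in this topic; the function name `diff_sampling_dist` is hypothetical, chosen only for illustration.

```python
import math


def diff_sampling_dist(p1: float, n1: int, p2: float, n2: int) -> tuple[float, float]:
    """Hypothetical helper: mean and standard deviation of p̂1 − p̂2.

    Assumes two independent samples and known population proportions p1, p2.
    """
    # Center: the difference of the population proportions.
    mean = p1 - p2
    # Spread: add the two variances (never subtract), then take the square root.
    variance = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    return mean, math.sqrt(variance)


mean, sd = diff_sampling_dist(0.30, 200, 0.25, 150)
print(f"mean = {mean:.2f}, sd = {sd:.4f}")
```

For $p_1 = 0.30$, $n_1 = 200$, $p_2 = 0.25$, $n_2 = 150$ this prints `mean = 0.05, sd = 0.0480`, since the variances are $0.00105 + 0.00125 = 0.0023$.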

Why variances add for a difference

The recurring trap is to subtract the variances (because the statistic is a difference) or to add or subtract the standard deviations. Neither is correct. Variability accumulates when independent quantities are combined, whatever the sign: for independent $X$ and $Y$, $\mathrm{Var}(X - Y) = \mathrm{Var}(X) + (-1)^2\,\mathrm{Var}(Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$. So the variance of the difference is the sum of the variances, and the standard deviation is the square root of that sum. This is the single most important computational point of the topic.
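A quick simulation makes the rule concrete. The sketch below draws many pairs of independent samples, modeling each observation as an independent Bernoulli trial (an assumption equivalent to an effectively infinite population), and compares the simulated standard deviation of $\hat{p}_1 - \hat{p}_2$ with the candidate rules. The parameters $p_1 = 0.50$, $n_1 = 100$, $p_2 = 0.40$, $n_2 = 80$ and the helper name `simulate_differences` are illustrative choices.

```python
import math
import random
import statistics


def simulate_differences(p1, n1, p2, n2, reps=10_000, seed=1):
    """Hypothetical helper: simulate p̂1 − p̂2 for `reps` pairs of independent samples."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    diffs = []
    for _ in range(reps):
        # Each sample proportion is the fraction of successes in n Bernoulli(p) trials.
        phat1 = sum(rng.random() < p1 for _ in range(n1)) / n1
        phat2 = sum(rng.random() < p2 for _ in range(n2)) / n2
        diffs.append(phat1 - phat2)
    return diffs


p1, n1, p2, n2 = 0.50, 100, 0.40, 80
var1 = p1 * (1 - p1) / n1  # about 0.0025
var2 = p2 * (1 - p2) / n2  # about 0.003

diffs = simulate_differences(p1, n1, p2, n2)
print("simulated SD:          ", round(statistics.stdev(diffs), 4))
print("sqrt(var1 + var2):     ", round(math.sqrt(var1 + var2), 4))  # the correct rule
print("sd1 + sd2:             ", round(math.sqrt(var1) + math.sqrt(var2), 4))  # too large
print("var1 - var2:           ", round(var1 - var2, 4))  # negative, so not a variance
```

The simulated standard deviation should land close to $0.074$, the add-the-variances value, and well away from the sum of the standard deviations (about $0.105$). Subtracting the variances here gives a negative number, which cannot be a variance at all.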

The conditions, doubled

The conditions are the same as for a single proportion, but they must hold for both samples, plus an independence condition between the samples. A complete answer verifies all of them before invoking the normal model.

  • Large counts. Each sample needs at least about $10$ expected successes and $10$ expected failures: $n_1 p_1 \ge 10$, $n_1(1-p_1) \ge 10$, $n_2 p_2 \ge 10$, and $n_2(1-p_2) \ge 10$. This makes each $\hat{p}$ approximately normal, so their difference is too.
  • 10% condition. When sampling without replacement, each sample must be at most $10\%$ of its own population, so that the standard deviation formula is valid within each sample.
  • Independence between samples. The two samples must be independent of each other (separate random samples, or two randomly assigned treatment groups), because the add-the-variances formula depends on independence.

This doubling of conditions is the main way the two-sample topic differs from the one-sample Topic 5.5. The underlying logic (large counts for shape, $10\%$ for the standard deviation) is identical, just applied twice and supplemented by between-sample independence.
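The numeric conditions can be checked mechanically. This is a sketch with a hypothetical helper, `check_conditions`; the population sizes `pop1` and `pop2` are optional arguments assumed for illustration. Independence between the samples depends on how the data were collected, so it cannot be checked from these numbers and is deliberately left out of the result.

```python
def check_conditions(p1, n1, p2, n2, pop1=None, pop2=None):
    """Hypothetical checker for the numeric normal-model conditions on p̂1 − p̂2.

    pop1, pop2 are optional population sizes; pass None when a size is unknown.
    Independence *between* the samples is a design question and is not checked.
    """
    results = {}
    for label, p, n, pop in (("sample 1", p1, n1, pop1), ("sample 2", p2, n2, pop2)):
        # Large counts: at least 10 expected successes and 10 expected failures.
        results[f"{label} large counts"] = n * p >= 10 and n * (1 - p) >= 10
        if pop is not None:
            # 10% condition: the sample is at most 10% of its population.
            results[f"{label} 10% condition"] = 10 * n <= pop
    return results


for name, ok in check_conditions(0.50, 100, 0.40, 80).items():
    print(f"{name}: {'met' if ok else 'NOT met'}")
```

With $p_1 = 0.50$, $n_1 = 100$, $p_2 = 0.40$, $n_2 = 80$ both large-counts checks are met (the smallest expected count is $32$). A sample with $p = 0.05$ and $n = 80$ would fail, since it expects only $4$ successes.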

Why this matters for inference

Topic 5.6 is the sampling-distribution foundation for comparing two proportions, one of the most common inference tasks (Unit 6). A confidence interval for $p_1 - p_2$ is centered at $\hat{p}_1 - \hat{p}_2$, with a width built from this same added-variances standard deviation (estimated from the sample proportions as a standard error), and a two-proportion significance test computes a z-score from a standard error of the same form (using a pooled proportion under the null hypothesis). The ability to answer "how likely is a difference this large by chance?" comes directly from knowing that $\hat{p}_1 - \hat{p}_2$ is approximately normal with the mean and standard deviation above. A particularly instructive question type asks for the probability that the difference is negative even when $p_1 > p_2$. A nonzero answer shows that sampling variability can make the second sample proportion exceed the first in a given pair of samples: a single observed difference is one draw from a distribution, not the true difference. Working through the center, the added-variances spread, the conditions, and a probability cements the template for two-proportion inference.
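The "negative difference even though $p_1 > p_2$" question can be explored numerically. The sketch below uses Python's standard-library `statistics.NormalDist` for the normal model and assumes the conditions hold; the helper name `prob_difference_negative` and the equal sample sizes are illustrative choices, not part of the topic's specification.

```python
import math
from statistics import NormalDist


def prob_difference_negative(p1, n1, p2, n2):
    """Hypothetical helper: P(p̂1 − p̂2 < 0) under the normal model."""
    mean = p1 - p2
    sd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return NormalDist(mu=mean, sigma=sd).cdf(0)


# Same true proportions, growing (equal) sample sizes.
for n in (50, 200, 800):
    print(n, round(prob_difference_negative(0.50, n, 0.40, n), 4))
```

With $p_1 = 0.50$ and $p_2 = 0.40$, the chance of a "reversed" difference is substantial for small samples and shrinks as the samples grow, because the standard deviation shrinks while the mean stays at $0.10$.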

Try this

Q1. Write the standard deviation formula for $\hat{p}_1 - \hat{p}_2$ and state why the variances are added. [2 points]

  • Cue. $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}$; the variances add because the samples are independent (and they add even for a difference).

Q2. List the conditions needed for $\hat{p}_1 - \hat{p}_2$ to be approximately normal. [1 point]

  • Cue. Large counts ($\ge 10$ expected successes and failures) in both samples, the two samples independent, and each sample at most $10\%$ of its population.

Exam-style practice questions

Practice questions written in the style of College Board exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AP 2018 (style) · 1 mark · Section I (multiple choice). For two independent sample proportions, the standard deviation of $\hat{p}_1 - \hat{p}_2$ is found by
(A) subtracting the two standard deviations
(B) adding the two standard deviations
(C) adding the two variances, then taking the square root
(D) averaging the two standard deviations

The correct answer is (C).

For independent random variables, variances add (even for a difference), so $\sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\sigma_{\hat{p}_1}^2 + \sigma_{\hat{p}_2}^2}$: add the two variances, then take the square root.

(A) and (B) wrongly operate on standard deviations directly. (D) is not a valid rule. Adding variances and rooting gives (C).

AP 2022 (style) · 4 marks · Section II (free response). In population 1, $p_1 = 0.50$; in population 2, $p_2 = 0.40$. Independent random samples of $n_1 = 100$ and $n_2 = 80$ are taken. Let $D = \hat{p}_1 - \hat{p}_2$.
(a) Find the mean and standard deviation of $D$.
(b) Check the conditions for normality.
(c) Find the probability that $D$ is less than $0$, and interpret it in context.

A 4-point question on the difference of two proportions.

(a) (2 points) $\mu_D = p_1 - p_2 = 0.50 - 0.40 = 0.10$ (1 point); $\sigma_D = \sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}} = \sqrt{\dfrac{0.25}{100} + \dfrac{0.24}{80}} = \sqrt{0.0025 + 0.003} = \sqrt{0.0055} \approx 0.0742$ (1 point).
(b) (1 point) Large counts in each sample: $n_1 p_1 = 50$, $n_1(1-p_1) = 50$, $n_2 p_2 = 32$, $n_2(1-p_2) = 48$, all $\ge 10$. The samples are independent random samples, and, assuming each is less than $10\%$ of its population (the population sizes are not given), $D$ is approximately normal.
(c) (1 point) $z = \dfrac{0 - 0.10}{0.0742} \approx -1.35$, so $P(D < 0) = P(Z < -1.35) \approx 0.0885$: about an $8.9\%$ chance that the first sample proportion is below the second, even though $p_1 > p_2$.

Markers reward the mean and the add-the-variances standard deviation, checking large counts in both samples, and the probability with interpretation.
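The arithmetic in parts (a) and (c) can be checked with a short script. This is a sketch using only the standard library (`math` and `statistics.NormalDist`). It also shows a small rounding effect: with $z$ rounded to $-1.35$, as when reading a table, the probability is $0.0885$, while the unrounded $z \approx -1.348$ gives about $0.0888$. Both round to the same $8.9\%$ interpretation.

```python
import math
from statistics import NormalDist

p1, n1 = 0.50, 100
p2, n2 = 0.40, 80

# (a) Center and spread of D = p̂1 − p̂2.
mean_d = p1 - p2
sd_d = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# (c) Standardize the cutoff 0 and use the standard normal CDF.
z = (0 - mean_d) / sd_d
p_exact = NormalDist().cdf(z)            # unrounded z
p_table = NormalDist().cdf(round(z, 2))  # z rounded to two decimals, as with a table

print(f"mean = {mean_d:.2f}, sd = {sd_d:.4f}, z = {z:.2f}")
print(f"P(D < 0) = {p_exact:.4f} (unrounded z), {p_table:.4f} (z = {round(z, 2)})")
```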

