
When a categorical variable has more than two categories, how do we judge whether observed counts depart from what was expected?

Topic 8.1 Introducing Statistics: Are My Results Unexpected?: explain why comparing observed counts across several categories to expected counts motivates the chi-square family of tests.

A focused answer to AP Statistics Topic 8.1, on why comparing observed counts across several categories to expected counts motivates chi-square tests, extending proportion inference to variables with more than two categories.

Generated by Claude Opus 4.89 min answerUpdated 2026-06-04

Reviewed by: AI editorial process; not yet individually human-reviewed

What this topic is asking

The College Board (Topic 8.1) opens Unit 8 with the idea behind chi-square tests: when a categorical variable has several categories, we judge whether the data are "unexpected" by comparing the observed counts to the expected counts under a claim. This motivates the whole chi-square family.

Why proportions are not enough

Unit 6 handled one proportion (two categories: success/failure) or a comparison of two proportions. But many questions involve a variable with several categories (the color of an item, the day of the week, a preference among four brands) or the relationship between two categorical variables in a two-way table. Testing each category with a separate proportion test is clumsy, and every extra test adds another chance of a false positive, so the overall Type I error rate climbs. A chi-square test answers the whole question, "does the entire distribution match what was claimed?", in one procedure.

Observed versus expected: the core comparison

The intuition behind "are my results unexpected?" is exactly the gap between observed and expected counts. A fair die predicts $10$ of each face in $60$ rolls. If you observe $9, 11, 10, 8, 12, 10$, the small mismatches are unsurprising. If you observe $2, 3, 4, 20, 16, 15$, the large mismatches are surprising and suggest the die is not fair. The chi-square statistic converts these per-category gaps into a single number whose size measures overall surprise.
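The conversion of per-category gaps into one number can be sketched in Python. This is a minimal illustration, assuming the standard chi-square statistic (each squared gap divided by its expected count, then summed); the function name `chi_square_statistic` and the variable names are made up here for the example.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


expected = [10] * 6                    # a fair die in 60 rolls
close_rolls = [9, 11, 10, 8, 12, 10]   # small mismatches
far_rolls = [2, 3, 4, 20, 16, 15]      # large mismatches

stat_close = chi_square_statistic(close_rolls, expected)
stat_far = chi_square_statistic(far_rolls, expected)

print(round(stat_close, 2))  # 1.0
print(round(stat_far, 2))    # 31.0
```

The first set of rolls gives a statistic of about 1, the second about 31, which matches the intuition that the second set is far more surprising for a fair die.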

The three chi-square procedures, previewed

Unit 8 builds three tests on this idea, all comparing observed to expected counts.

Goodness of fit (Topics 8.2 to 8.3): does one categorical variable's distribution match a claimed distribution (equal, or a stated ratio)?
Homogeneity (Topics 8.4 to 8.6): do several populations or groups share the same distribution of one categorical variable?
Independence (Topics 8.4 to 8.6): are two categorical variables measured on one sample associated, or independent?

All three reduce to the same machinery: compute expected counts under the null, measure the total observed-expected discrepancy with the chi-square statistic, and find a P-value from the chi-square distribution. Topic 8.1 plants the unifying idea; the later topics supply the details.
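As a sketch of that shared machinery, the standard chi-square statistic can be written as below. The symbols $O_i$, $E_i$, $k$, $r$, and $c$ are notation introduced here for illustration and do not come from the text above.

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$

Here $O_i$ is the observed count and $E_i$ the expected count under the null hypothesis in cell $i$. For goodness of fit with $k$ categories, the reference chi-square distribution has $k - 1$ degrees of freedom; for homogeneity or independence with an $r \times c$ two-way table, it has $(r - 1)(c - 1)$.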

The mindset for the unit

As with every inference unit, the key shift is to see the observed counts as one random outcome among the many that sampling could have produced. Even if the null hypothesis is true, observed counts will rarely equal the expected counts exactly, because of sampling variability. The chi-square test asks whether the observed discrepancy is larger than chance alone would typically produce. Holding that question in view keeps the procedures meaningful.
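One way to make "larger than chance alone would typically produce" concrete is a small simulation. The sketch below is an illustration only, not the method the course uses for P-values: it assumes the standard chi-square statistic, rolls a truly fair die $60$ times over and over, and records how often the simulated discrepancy is at least as large as each of the two die results from earlier. All function names, the seed, and the number of simulations are arbitrary choices made for this example.

```python
import random


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate_fair_die_statistics(n_rolls=60, n_sims=5000, seed=1):
    """Chi-square statistics from repeated samples of a truly fair die."""
    rng = random.Random(seed)
    expected = [n_rolls / 6] * 6
    stats = []
    for _ in range(n_sims):
        counts = [0] * 6
        for face in rng.choices(range(6), k=n_rolls):
            counts[face] += 1
        stats.append(chi_square_statistic(counts, expected))
    return stats


def share_at_least(stats, threshold):
    """Fraction of simulated statistics at least as large as the threshold."""
    # Tiny tolerance so floating-point rounding does not break ties.
    return sum(s >= threshold - 1e-9 for s in stats) / len(stats)


fair_expected = [10] * 6
small_gap_stat = chi_square_statistic([9, 11, 10, 8, 12, 10], fair_expected)
large_gap_stat = chi_square_statistic([2, 3, 4, 20, 16, 15], fair_expected)

null_stats = simulate_fair_die_statistics()
share_small = share_at_least(null_stats, small_gap_stat)
share_large = share_at_least(null_stats, large_gap_stat)
print(share_small, share_large)
```

Under the fair-die null, nearly all simulated samples show a discrepancy at least as large as the first result, while essentially none reach the second. That contrast is the sense in which one result is expected and the other is not.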

Why we compare observed to expected

A website claims its visitors are split equally across four landing pages. In a random sample of $200$ visits, the counts are page 1: $40$, page 2: $55$, page 3: $45$, page 4: $60$. Explain how to judge whether these results are unexpected under the claim.

Step 1: State the claim and the parameter of interest

The claim is that the true distribution of visits is equal across the four pages, that is, each page gets $\tfrac{1}{4}$ of visits. The parameters of interest are the true proportions of all visits that go to each page, and the question is whether the observed split departs from the claimed proportions.

Step 2: Find the expected counts

Under "equal," each page expects $\tfrac{1}{4} \times 200 = 50$ visits. So the expected counts are $50, 50, 50, 50$.

Step 3: Compare observed to expected

Observed: $40, 55, 45, 60$. The gaps from $50$ are $-10, +5, -5, +10$. Some pages are below and some above the expected $50$; the question is whether these gaps, taken together, are larger than sampling variability would usually produce.

Step 4: Interpret the idea

A chi-square goodness-of-fit test will combine these four gaps into one statistic. A small statistic means the split is consistent with "equal" (results not unexpected); a large statistic means the split departs from "equal" (results unexpected), giving evidence the pages are not visited equally. This single comparison across all four categories is what motivates the chi-square approach.
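The worked example stops at the idea, but the numbers can be carried one step further. The following is a hedged sketch, assuming the standard chi-square statistic and a chi-square distribution with $4 - 1 = 3$ degrees of freedom. The helper `chi_square_tail_df3` is a hypothetical name, and the closed-form tail formula it uses is valid only for $3$ degrees of freedom.

```python
import math

observed = [40, 55, 45, 60]                      # visits to pages 1 to 4
n = sum(observed)                                # 200 visits in total
expected = [n / len(observed)] * len(observed)   # 50 per page under "equal"

gaps = [o - e for o, e in zip(observed, expected)]
statistic = sum(g ** 2 / e for g, e in zip(gaps, expected))


def chi_square_tail_df3(x):
    """P(chi-square with 3 df >= x). Closed form that holds only for 3 df."""
    return (math.erfc(math.sqrt(x / 2))
            + math.sqrt(2 * x / math.pi) * math.exp(-x / 2))


p_value = chi_square_tail_df3(statistic)

print(gaps)                # [-10.0, 5.0, -5.0, 10.0]
print(statistic)           # 5.0
print(round(p_value, 3))   # 0.172
```

Under these assumptions the statistic is $5.0$ and the P-value is about $0.17$, so gaps of this size would turn up fairly often even if the four pages really were visited equally. At the usual $0.05$ level that would not count as convincing evidence against the claim.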

Try this

Q1. Why can a chi-square test handle a six-category claim that proportion tests cannot easily address? [1 point]

Cue. It compares the whole distribution of observed counts to expected counts in one procedure, rather than testing each category separately.

Q2. In $80$ trials with an "equal across $5$ categories" claim, what is each expected count? [1 point]

Cue. $\tfrac{1}{5} \times 80 = 16$ per category.

Exam-style practice questions

Practice questions written in the style of College Board exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AP 2019 (style), 1 mark, Section I (multiple choice). A die is rolled $60$ times and the counts of each face are recorded. To test whether the die is fair, you need a procedure that compares
(A) one sample mean to a claimed mean
(B) observed counts across six categories to expected counts
(C) two proportions
(D) a slope to zero

The correct answer is (B).

A fair die predicts equal expected counts ($10$ per face). Testing whether the six observed counts depart from these expected counts is exactly what a chi-square goodness-of-fit test does.

(A) is a mean procedure. (C) compares only two proportions, not six categories at once. (D) is regression-slope inference.

AP 2021 (style), 3 marks, Section II (free response). A bag is claimed to contain colors in the ratio $2:3:5$. A random sample of $100$ items is drawn and the color counts recorded.
(a) Explain why a one-sample proportion test is not sufficient to assess the whole claim at once.
(b) Describe what quantities you would compare to judge whether the results are unexpected.
(c) State, in general terms, what a large discrepancy would suggest.

A 3-point conceptual question.

(a) (1 point) The claim is about the distribution across three categories simultaneously; a single proportion test addresses only one category at a time and cannot test the full distribution in one procedure.
(b) (1 point) Compare the observed counts in each color to the expected counts under the claimed ratio ($20$, $30$, $50$ for $100$ items), measuring how far observed falls from expected across all categories.
(c) (1 point) A large overall discrepancy between observed and expected counts would suggest the true distribution differs from the claimed $2:3:5$ ratio (the results are unexpected under the claim).

Markers reward recognizing the multi-category nature of the claim, the observed-versus-expected comparison, and the meaning of a large discrepancy.
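The expected counts quoted in part (b) come from scaling the claimed ratio to the sample size. A minimal sketch of that arithmetic follows; the function name `expected_counts_from_ratio` is hypothetical and chosen only for this illustration.

```python
def expected_counts_from_ratio(ratio, sample_size):
    """Scale a claimed ratio (e.g. 2:3:5) to expected counts for a sample."""
    total_parts = sum(ratio)
    return [sample_size * part / total_parts for part in ratio]


bag_expected = expected_counts_from_ratio([2, 3, 5], 100)
equal_expected = expected_counts_from_ratio([1] * 5, 80)

print(bag_expected)    # [20.0, 30.0, 50.0]
print(equal_expected)  # [16.0, 16.0, 16.0, 16.0, 16.0]
```

The second call treats an "equal across $5$ categories" claim as the ratio $1:1:1:1:1$, which reproduces the $16$ per category from the Try this section.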

Sources & how we know this

AP Statistics Course and Exam Description — College Board (2020)