
When a categorical variable has more than two categories, how do we judge whether observed counts depart from what was expected?

Topic 8.1 Introducing Statistics: Are My Results Unexpected?: explain why comparing observed counts across several categories to expected counts motivates the chi-square family of tests.

A focused answer to AP Statistics Topic 8.1, on why comparing observed counts across several categories to expected counts motivates chi-square tests, extending proportion inference to variables with more than two categories.

Generated by Claude Opus 4.89 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed

What this topic is asking

The College Board (Topic 8.1) opens Unit 8 with the idea behind chi-square tests: when a categorical variable has several categories, we judge whether the data are "unexpected" by comparing the observed counts to the expected counts under a claim. This motivates the whole chi-square family.

Why proportions are not enough

Unit 6 handled one proportion (two categories: success/failure) or a comparison of two proportions. But many questions involve a variable with several categories (the color of an item, a day of the week, a preference among four brands) or the relationship between two categorical variables in a two-way table. Testing each category with a separate proportion test is clumsy and inflates the overall chance of a Type I error, because every extra test is another opportunity to reject a true claim. Chi-square answers the whole question, "does the entire distribution match what was claimed?", in one procedure.

Observed versus expected: the core comparison

The intuition behind "are my results unexpected?" is the gap between observed and expected counts. A fair die should show each face about 10 times in 60 rolls. If you observe 9, 11, 10, 8, 12, 10, the small mismatches are unsurprising. If you observe 2, 3, 4, 20, 16, 15, the large mismatches are surprising and suggest the die is not fair. Chi-square converts these per-category gaps into a single number whose size measures overall surprise.
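To make the "single number" concrete, here is a minimal sketch that applies the standard chi-square statistic, the sum over categories of (observed − expected)² / expected, to the two sets of die counts above. The function name `chi_square_statistic` is illustrative only, and the formula itself is developed properly in the later topics.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# 60 rolls of a fair die: 10 expected per face.
expected = [10] * 6
small_gaps = [9, 11, 10, 8, 12, 10]   # counts close to expected
large_gaps = [2, 3, 4, 20, 16, 15]    # counts far from expected

stat_small = chi_square_statistic(small_gaps, expected)
stat_large = chi_square_statistic(large_gaps, expected)

print(f"small mismatches: {stat_small:.1f}")
print(f"large mismatches: {stat_large:.1f}")
```

The first set of counts gives a statistic of about 1, the second about 31, so the larger per-category gaps do produce a much larger overall number.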

The three chi-square procedures, previewed

Unit 8 builds three tests on this idea, all comparing observed to expected counts.

  1. Goodness of fit (Topics 8.2 to 8.3): does one categorical variable's distribution match a claimed distribution (equal, or a stated ratio)?
  2. Homogeneity (Topics 8.4 to 8.6): do several populations or groups share the same distribution of one categorical variable?
  3. Independence (Topics 8.4 to 8.6): are two categorical variables measured on one sample associated, or independent?

All three reduce to the same machinery: compute expected counts under the null, measure the total observed-expected discrepancy with the chi-square statistic, and find a P-value from the chi-square distribution. Topic 8.1 plants the unifying idea; the later topics supply the details.
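For the two-way-table procedures (homogeneity and independence), "expected counts under the null" are usually computed as row total × column total ÷ grand total for each cell. The sketch below shows that step and the resulting statistic. The table of counts is made up purely for illustration, and the function names are hypothetical.

```python
def expected_counts(table):
    """Expected count per cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Made-up data: rows are two groups, columns are three categories.
observed = [
    [30, 20, 10],
    [20, 20, 20],
]

expected = expected_counts(observed)
statistic = chi_square_statistic(observed, expected)

print(expected)
print(f"chi-square statistic: {statistic:.2f}")
```

The last step, turning the statistic into a P-value from the chi-square distribution, is left to the later topics.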

The mindset for the unit

As with every inference unit, the key shift is to see the counts as one realization from a distribution. Even if the null is true, observed counts will not exactly equal expected counts, because of sampling variability. The chi-square test asks whether the observed discrepancy is larger than chance alone would typically produce. Holding that question in view keeps the procedures meaningful.
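One way to see "what chance alone would typically produce" is to simulate it. The sketch below rolls a genuinely fair die 60 times, repeats that many times, and records the chi-square statistic each time. It then checks how often chance alone produces a statistic at least as large as 1 and at least as large as 31, the values the two sets of die counts in the earlier example give under the usual sum of (observed − expected)² / expected. This is a simulation-based illustration under those assumptions, not the chi-square distribution method the unit goes on to teach, and the function names, seed, and repetition count are arbitrary choices.

```python
import random


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate_fair_die_statistics(n_rolls, n_reps, seed=0):
    """Chi-square statistics from repeated samples of a truly fair die."""
    rng = random.Random(seed)
    expected = [n_rolls / 6] * 6
    stats = []
    for _ in range(n_reps):
        counts = [0] * 6
        for _ in range(n_rolls):
            counts[rng.randrange(6)] += 1
        stats.append(chi_square_statistic(counts, expected))
    return stats


def fraction_at_least(stats, value, tol=1e-9):
    """Share of simulated statistics >= value (tol absorbs float rounding)."""
    return sum(s >= value - tol for s in stats) / len(stats)


null_stats = simulate_fair_die_statistics(n_rolls=60, n_reps=5000)

p_small = fraction_at_least(null_stats, 1.0)    # small mismatches
p_large = fraction_at_least(null_stats, 31.0)   # large mismatches

print(f"share of fair-die samples with statistic >= 1:  {p_small:.3f}")
print(f"share of fair-die samples with statistic >= 31: {p_large:.4f}")
```

A statistic near 1 turns up in most fair-die samples, while a statistic of 31 almost never does, which is why only the second set of counts casts doubt on the die.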

Try this

Q1. Why can a chi-square test handle a six-category claim that proportion tests cannot easily address? [1 point]

  • Cue. It compares the whole distribution of observed counts to expected counts in one procedure, rather than testing each category separately.

Q2. In 80 trials with an "equal across 5 categories" claim, what is each expected count? [1 point]

  • Cue. 1/5 × 80 = 16 per category.

Exam-style practice questions

Practice questions written in the style of College Board exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AP 2019 (style), 1 mark, Section I (multiple choice). A die is rolled 60 times and the counts of each face are recorded. To test whether the die is fair, you need a procedure that compares
(A) one sample mean to a claimed mean
(B) observed counts across six categories to expected counts
(C) two proportions
(D) a slope to zero

The correct answer is (B).

A fair die predicts equal expected counts (10 per face). Testing whether the six observed counts depart from these expected counts is exactly what a chi-square goodness-of-fit test does.

(A) is a mean procedure. (C) compares only two proportions, not six categories at once. (D) is regression-slope inference.

AP 2021 (style), 3 marks, Section II (free response). A bag is claimed to contain colors in the ratio 2:3:5. A random sample of 100 items is drawn and the color counts recorded.
(a) Explain why a one-sample proportion test is not sufficient to assess the whole claim at once.
(b) Describe what quantities you would compare to judge whether the results are unexpected.
(c) State, in general terms, what a large discrepancy would suggest.

A 3-point conceptual question.

(a) (1 point) The claim is about the distribution across three categories simultaneously; a single proportion test addresses only one category at a time and cannot test the full distribution in one procedure.
(b) (1 point) Compare the observed counts in each color to the expected counts under the claimed ratio (20, 30, 50 for 100 items), measuring how far observed falls from expected across all categories.
(c) (1 point) A large overall discrepancy between observed and expected counts would suggest the true distribution differs from the claimed 2:3:5 ratio (the results are unexpected under the claim).

Markers reward recognizing the multi-category nature of the claim, the observed-versus-expected comparison, and the meaning of a large discrepancy.
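The expected counts in part (b) come from splitting the sample size in proportion to the claimed ratio. A small sketch of that arithmetic follows; the helper name `expected_from_ratio` is hypothetical.

```python
def expected_from_ratio(ratio, sample_size):
    """Split sample_size across categories in proportion to ratio."""
    total_parts = sum(ratio)
    return [sample_size * part / total_parts for part in ratio]


# Claimed ratio 2:3:5 with 100 items, as in the question above.
expected_bag = expected_from_ratio([2, 3, 5], 100)
print(expected_bag)  # [20.0, 30.0, 50.0]

# An "equal across 5 categories" claim with 80 trials is the ratio 1:1:1:1:1.
expected_equal = expected_from_ratio([1] * 5, 80)
print(expected_equal)  # [16.0, 16.0, 16.0, 16.0, 16.0]
```

An "equal" claim is just the special case where every part of the ratio is 1.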
