How do we ask whether two variables are related, and what does an association really mean?

Topic 2.1 Introducing Statistics - Are Variables Related?: identify questions about the association between two variables, distinguish association from causation, and recognize what two-variable data can answer.

A focused answer to AP Statistics Topic 2.1, on framing questions about the association between two variables, the difference between explanatory and response variables, why association is not causation, and what two-variable data can answer, with worked examples.

Generated by Claude Opus 4.8 · 9 min answer · Updated 2026-06-04

Reviewed by: AI editorial process; not yet individually human-reviewed

Jump to a section

What this topic is asking
Explanatory and response variables
What "associated" means
Association is not causation
What two-variable data can and cannot answer
Try this

What this topic is asking

The College Board (Topic 2.1) wants you to frame statistical questions about the relationship between two variables, to identify the explanatory and response variables, and above all to distinguish association from causation, recognizing that observational two-variable data can show a relationship but not prove that one variable causes the other.

Explanatory and response variables

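The explanatory variable is the one used to explain or predict; the response variable is the outcome being explained or predicted. Choosing which is which is a modeling decision driven by the question, not by the data alone. "Does study time explain exam score?" makes study time explanatory and score the response. The labels matter because they fix the axes of a scatterplot (explanatory on the horizontal axis, response on the vertical) and the direction of any prediction you later make.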
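To see why the direction of prediction depends on the labels, here is a minimal Python sketch. The study-hours and score values are made up for illustration, and the helper `slope_intercept` is a hypothetical name, not part of any course material. It fits a least-squares line twice, once with score as the response and once with the roles swapped.

```python
# Hypothetical data: hours studied and exam score for eight students.
hours = [1, 2, 3, 4, 5, 6, 7, 8]
score = [52, 55, 61, 60, 68, 70, 75, 79]


def mean(values):
    return sum(values) / len(values)


def slope_intercept(x, y):
    """Least-squares line for predicting y from x: y_hat = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Score is the response: predict score from hours.
a_score, b_score = slope_intercept(hours, score)
# Roles swapped: predict hours from score. This is a different line.
a_hours, b_hours = slope_intercept(score, hours)

print(f"score_hat = {a_score:.2f} + {b_score:.2f} * hours")
print(f"hours_hat = {a_hours:.2f} + {b_hours:.2f} * score")
# The swapped slope is not simply 1 / b_score unless every point
# lies exactly on a line.
print(f"1 / b_score = {1 / b_score:.4f}, b_hours = {b_hours:.4f}")
```

The two fitted lines differ, so deciding which variable is the response is not a cosmetic choice: it determines which line you fit and which quantity you are entitled to predict.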

What "associated" means

Two variables are associated when knowing the value of one changes what you would expect for the other. The form the analysis takes depends on the variable types, which is why Topic 1.2's classification returns here: two categorical variables call for two-way tables (Topics 2.2 to 2.3), while two quantitative variables call for scatterplots, correlation, and regression (Topics 2.4 to 2.9).
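For the categorical case, a two-way table is just a count of how often each pair of categories occurs. The sketch below builds one in Python with `collections.Counter`; the survey records are invented for illustration and the variable names are assumptions.

```python
from collections import Counter

# Hypothetical survey: each record is (grade level, preferred study method).
records = [
    ("junior", "flashcards"), ("junior", "practice tests"),
    ("senior", "practice tests"), ("senior", "practice tests"),
    ("junior", "flashcards"), ("senior", "flashcards"),
    ("junior", "practice tests"), ("senior", "practice tests"),
]

# Count each (row category, column category) pair.
counts = Counter(records)
rows = sorted({grade for grade, _ in records})
cols = sorted({method for _, method in records})

# Print the two-way table of counts, one row per grade level.
print(f"{'':10}" + "".join(f"{c:>16}" for c in cols))
for r in rows:
    print(f"{r:10}" + "".join(f"{counts[(r, c)]:>16}" for c in cols))
```

If the row-by-row proportions differ noticeably (here, seniors lean more toward practice tests than juniors do), that is what an association between two categorical variables looks like in a table.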

Association is not causation

The defining lesson of this topic, and one of the most important in the whole course, is that an association does not by itself mean one variable causes the other. There are several reasons an association can appear without a causal link. A lurking (confounding) variable may influence both: ice-cream sales and drowning deaths rise together, but hot weather drives both, with no causal link between ice cream and drowning. Reverse causation is possible: the response may actually affect the explanatory variable. And in a small sample the association could simply be coincidence. Because observational data cannot rule these out, the design that supports a causal claim is a randomized experiment, in which random assignment to treatment groups tends to balance lurking variables across the groups.

So when a question shows an observational study, the exam expects you to describe the association ("students who slept more tended to score higher") and then explicitly decline to claim cause, naming a plausible lurking variable or noting the lack of random assignment. Writing "this proves that X causes Y" from observational data is the single error most reliably punished in Unit 2 and beyond.
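The ice-cream example can be imitated with a small simulation. This is a sketch under invented assumptions: the coefficients, noise levels, and sample size are made up, and the helpers `correlation` and `residuals` are written here for illustration. Temperature drives both variables and neither affects the other, yet they come out strongly correlated until temperature is accounted for.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible


def mean(values):
    return sum(values) / len(values)


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def residuals(x, y):
    """What is left of y after subtracting its least-squares line on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (c - y_bar) for a, c in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    b = sxy / sxx
    return [c - (y_bar + b * (a - x_bar)) for a, c in zip(x, y)]


# Made-up model: temperature drives both; neither variable affects the other.
temperature = [random.uniform(10, 35) for _ in range(5000)]
ice_cream = [2.0 * t + random.gauss(0, 5) for t in temperature]
drownings = [0.5 * t + random.gauss(0, 2) for t in temperature]

r_raw = correlation(ice_cream, drownings)
r_adjusted = correlation(
    residuals(temperature, ice_cream), residuals(temperature, drownings)
)
print(f"correlation ignoring temperature:       {r_raw:.2f}")
print(f"correlation after removing temperature: {r_adjusted:.2f}")
```

The adjustment works here only because the lurking variable was recorded. In a real observational study the lurking variable is often unmeasured, which is why the raw association alone cannot support a causal claim.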

What two-variable data can and cannot answer

Two-variable data extend what you can ask beyond Unit 1's single distributions: you can now ask whether and how two variables move together, predict one from the other (with regression), and quantify the strength of a linear relationship (with correlation). What they still cannot do, in an observational setting, is establish cause, and they cannot generalize beyond the individuals studied unless those individuals were a random sample of a defined population. Keeping these limits in mind frames the rest of the unit honestly: correlation and regression are tools for describing and predicting an association, not for proving that manipulating one variable would change the other. The most sophisticated exam answers therefore pair a confident description of the relationship with an equally confident statement of its limits, which is exactly the balance the College Board is looking to assess from the very first topic of the unit.
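The gap between observational data and an experiment can be illustrated with a short simulation. It is a sketch with invented numbers: the lurking variable is a hypothetical "conscientiousness" score, and the self-selection rule is an assumption chosen only to make the point visible. It compares how unbalanced the lurking variable is when students choose their own group versus when groups are assigned at random.

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible


def mean(values):
    return sum(values) / len(values)


# Hypothetical lurking variable: a conscientiousness score for 1000 students.
students = [random.gauss(50, 10) for _ in range(1000)]

# Observational setting (assumed rule): more conscientious students are
# more likely to end up in the "more sleep" group by their own choice.
self_more, self_less = [], []
for s in students:
    if s + random.gauss(0, 5) > 50:
        self_more.append(s)
    else:
        self_less.append(s)

# Experimental setting: shuffle, then split, so group membership is
# unrelated to conscientiousness.
shuffled = students[:]
random.shuffle(shuffled)
assigned_more, assigned_less = shuffled[:500], shuffled[500:]

gap_self = mean(self_more) - mean(self_less)
gap_random = mean(assigned_more) - mean(assigned_less)
print(f"lurking-variable gap, self-selected groups:    {gap_self:.1f}")
print(f"lurking-variable gap, randomly assigned groups: {gap_random:.1f}")
```

With self-selection the groups differ on the lurking variable before any treatment is applied, so a later difference in scores cannot be pinned on sleep. Random assignment makes the groups similar on average, including on variables nobody measured.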

Diagnosing an association

A city's records show that months with higher average temperature also have higher electricity use for air conditioning. Frame the relationship correctly.

Step 1. Identify the variables and their roles

Average temperature is the explanatory variable (used to explain demand); electricity use is the response variable (the outcome). Both are quantitative, so a scatterplot is the natural display.

Step 2. State the association

There is a positive association: as average temperature increases, electricity use for air conditioning tends to increase as well.

Step 3. Consider causation

Here a causal link is physically plausible (heat drives air-conditioner use), but the data themselves are observational and only establish the association; a careful answer notes that the data show association and that causation, while plausible, is not proven by the correlation alone.

Step 4. Interpret

The records reveal a positive association between temperature and electricity use; we can use it to predict and describe demand, but on the strength of observational data we report it as an association, reserving causal language for what an experiment or established physical mechanism supports.

Try this

Q1. In "does fertilizer amount explain crop yield?", identify the explanatory and response variables. [1 point]

Cue. Fertilizer amount is explanatory; crop yield is the response.

Q2. A study finds taller children have larger vocabularies. Explain why this does not mean height causes vocabulary. [2 points]

Cue. Age is a lurking variable: older children tend to be both taller and to have larger vocabularies, so age can account for the association without any direct causal link between height and vocabulary.

Exam-style practice questions

Practice questions written in the style of College Board exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AP 2018 (style) · 1 mark · Section I (multiple choice)

A study finds that towns with more fire trucks at a fire tend to have more fire damage. Which conclusion is best supported?
(A) Fire trucks cause damage
(B) There is an association between number of fire trucks and damage, likely explained by a third variable, fire size
(C) Reducing fire trucks would reduce damage
(D) The two variables are unrelated

The correct answer is (B).

The data show an association, but a lurking variable (the size of the fire) plausibly drives both: bigger fires bring more trucks and cause more damage. Observational data cannot establish causation.

(A) and (C) wrongly read causation into an association. (D) ignores the clear association. This is the classic confounding example, and the correct stance is to name the association and the likely lurking variable without claiming cause.

AP 2021 (style) · 3 marks · Section II (free response)

An observational study records, for many students, hours of sleep and exam score, and finds students who sleep more tend to score higher.
(a) Identify the explanatory and response variables.
(b) Explain why this study cannot conclude that more sleep causes higher scores.
(c) Suggest one lurking variable that could explain the association.

A 3-point question on association versus causation.

(a) (1 point) The explanatory variable is hours of sleep; the response variable is exam score (sleep is used to explain or predict score).
(b) (1 point) The study is observational, not an experiment: students were not randomly assigned to amounts of sleep, so a lurking variable could be responsible and we cannot rule out reverse causation; therefore causation cannot be concluded.
(c) (1 point) A plausible lurking variable: overall conscientiousness or good time management, which could increase both sleep and study quality; or a less stressful course load. Any reasonable third variable affecting both earns the point.

Markers reward correct identification of explanatory and response roles, a reason grounded in the observational design, and a sensible lurking variable.

Sources & how we know this

AP Statistics Course and Exam Description — College Board (2020)