What does the correlation coefficient r measure, and what are its limits?
Topic 2.5 Correlation: calculate and interpret the correlation coefficient r, understand its properties (range, unit-free, resistance), and recognize what it can and cannot tell you.
A focused answer to AP Statistics Topic 2.5, defining the correlation coefficient r, its range and properties (unit-free, symmetric, non-resistant), what it measures and misses, and the correlation-causation caution, with a worked interpretation.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this topic is asking
The College Board (Topic 2.5) wants you to calculate and interpret the correlation coefficient r, to know its properties (its range, that it is unit-free and symmetric, and that it is not resistant), and to understand exactly what r measures and what it misses, including the correlation-causation caution.
What r measures
You will almost always get r from technology rather than from its defining formula, r = (1/(n - 1)) Σ z_x z_y, where z_x = (x - x̄)/s_x and z_y = (y - ȳ)/s_y are the standardized scores of each point. The formula still reveals two things: r is built from how the two variables' standardized deviations move together, and because standardizing strips out units and scale, r is unit-free, unaffected by changing units, and symmetric in x and y.
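The standardized-score construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the way a calculator or statistics package necessarily computes r; the function name `correlation` and the distance and fuel figures are hypothetical, chosen only to show that converting units or swapping the variables leaves r unchanged.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    z_products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)

# Hypothetical data: distance driven (km) and fuel used (litres).
distance_km = [10, 20, 30, 40, 50]
fuel_l = [1.1, 1.9, 3.2, 3.8, 5.0]

r = correlation(distance_km, fuel_l)

# Unit-free: converting km to miles rescales x but leaves r unchanged.
distance_miles = [d * 0.621371 for d in distance_km]
r_miles = correlation(distance_miles, fuel_l)

# Symmetric: swapping the roles of x and y leaves r unchanged.
r_swapped = correlation(fuel_l, distance_km)

print(round(r, 4), round(r_miles, 4), round(r_swapped, 4))
```

The three printed values agree up to rounding, which is the unit-free and symmetric behaviour the formula predicts.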
The properties of r
A few consequences are worth stating plainly. Because r is symmetric, "the correlation of height with weight" equals "the correlation of weight with height," unlike a regression line, whose slope depends on which variable is the response. Because r is not resistant, you should always look at the scatterplot before trusting r, since one stray point can inflate or deflate it. And because r captures only linearity, it is silent about curvature.
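To see the lack of resistance concretely, the sketch below computes r for a small hypothetical data set, then again after one stray point is added. The data values and the helper `correlation` are illustrative assumptions, not figures from the course materials.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical points with a clear upward linear trend.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 5, 4, 6, 7]
r_before = correlation(xs, ys)

# Add a single stray point far from the rest of the cloud.
r_after = correlation(xs + [20], ys + [1])

print(round(r_before, 3), round(r_after, 3))
```

In this made-up example a single extra point is enough to turn a strong positive correlation into a negative one, which is why the scatterplot has to be checked before r is trusted.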
What r cannot tell you
Two limitations are examined relentlessly. First, correlation is not causation: a large |r| shows that two variables move together linearly, but a lurking variable, reverse causation, or coincidence can produce that pattern, so you may never conclude cause from r alone. The classic examples (churches and crime, ice cream and drowning) all feature a lurking variable that drives both. Second, r measures only linear association, so a value near zero does not mean the variables are unrelated; a strongly curved relationship (a U-shape, say) can have r close to 0 while clearly being a tight relationship, just not a straight-line one. The reverse trap also exists: a high |r| confirms a strong linear fit only if the scatterplot is genuinely linear; computing r for a curved cloud and reporting "strong relationship" is wrong. The discipline that protects you from both traps is the same one from Topic 2.4: always describe the scatterplot's form first, and only interpret r once you have confirmed the pattern is roughly linear. The College Board returns to these two cautions, no causation and linear-only, in almost every regression question, so internalising them now pays off across the whole unit.
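The linear-only limitation is easy to check numerically. In the sketch below, y is completely determined by x through a U-shaped rule (y = x squared, an illustrative choice), yet the correlation comes out at essentially zero. The helper `correlation` and the data are assumptions made for the example.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

# A perfect U-shape: every y is exactly the square of its x.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_u_shape = correlation(xs, ys)
print(abs(r_u_shape) < 1e-9)  # True: no linear trend, despite a perfect curved relationship
```

The positive and negative halves of the U cancel in the sum of z-score products, so r reports no linear trend even though knowing x tells you y exactly.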
Reading the size of r
It helps to attach rough verbal labels to magnitudes, while remembering they are conventions, not hard rules. An |r| close to 1 is usually called strong, a mid-range |r| moderate, and an |r| close to 0 weak, with the exact cut-offs unimportant compared with reading the scatterplot. What matters on the exam is interpreting r in context and pairing the number with the picture, for example: "The correlation indicates a strong positive linear association between distance and fuel used, consistent with the tight upward-sloping scatterplot." A common follow-up is the relationship between r and r² (the coefficient of determination of Topic 2.8): r² is the fraction of the variation in y explained by the linear model, so squaring a correlation gives the proportion of the variation in y that is accounted for by the linear relationship with x. Holding r and r² as related-but-different ideas, one a measure of linear strength and the other a proportion of explained variation, prepares you for the regression topics that follow.
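The link between r and r² can be verified directly. The sketch below, using hypothetical distance and fuel values, squares r and compares it with the fraction of variation in y explained by the least-squares line (slope taken as r times s_y over s_x, the summary-statistics form covered in Topic 2.8). All variable names and data are assumptions for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical data: distance driven (x) and fuel used (y).
xs = [10, 20, 30, 40, 50]
ys = [1.1, 1.9, 3.2, 3.8, 5.0]

r = correlation(xs, ys)
r_squared = r ** 2

# Least-squares line from summary statistics.
slope = r * stdev(ys) / stdev(xs)
intercept = mean(ys) - slope * mean(xs)

# Fraction of the variation in y explained by the line: 1 - SSE / SST.
y_bar = mean(ys)
total_variation = sum((y - y_bar) ** 2 for y in ys)
unexplained = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
explained_fraction = 1 - unexplained / total_variation

print(round(r, 4), round(r_squared, 4), round(explained_fraction, 4))
```

The last two printed values match, while r itself is a different number: r describes direction and linear strength, and r² describes the share of variation explained.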
Try this
Q1. State what the sign and the magnitude of r each tell you. [2 points]
- Cue. The sign gives the direction of the linear association (positive or negative); the magnitude (closeness of |r| to 1) gives its strength.
Q2. A scatterplot is strongly U-shaped, and r is close to 0. Does this mean no relationship? Explain. [1 point]
- Cue. No; r measures only linear association, so an r near 0 here reflects the lack of a linear trend, not the absence of the strong non-linear (curved) relationship.
Exam-style practice questions
Practice questions written in the style of College Board exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AP 2019 (style), 1 mark, Section I (multiple choice). A correlation close to -1 between two variables indicates which of the following?
(A) A strong positive linear relationship
(B) A strong negative linear relationship
(C) A weak negative relationship
(D) That one variable causes the other to decrease
The correct answer is (B).
The sign of r gives direction (negative here) and the magnitude gives strength; |r| is close to 1, so the linear relationship is strong. Thus this value of r means a strong negative linear relationship.
(A) has the wrong sign. (C) understates the strength. (D) wrongly infers causation; correlation never establishes cause. Sign for direction, magnitude for strength, and no causal claim.
AP 2022 (style), 4 marks, Section II (free response). For a sample of towns, the correlation r between number of churches and number of crimes is large and positive. (a) Interpret this correlation. (b) Explain why it would be wrong to conclude that building more churches causes more crime. (c) A student computes r for a scatterplot that is strongly curved and finds a value near zero; explain what this small value does and does not tell you.
A 4-point question on interpreting and critiquing correlation.
(a) (1 point) Interpretation: there is a strong positive linear association between the number of churches and the number of crimes across these towns (as one is larger, the other tends to be larger).
(b) (2 points) Correlation is not causation (1 point); a lurking variable, population size, plausibly drives both, since larger towns have more churches and more crimes (1 point). So the association reflects town size, not a causal link.
(c) (1 point) A small r means a weak linear association, but because the pattern is strongly curved, r near zero does not mean "no relationship"; there can be a strong non-linear relationship that r fails to detect.
Markers reward a correct interpretation of strength and direction, the causation caution with a lurking variable, and the insight that r measures linear association only.
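The lurking-variable argument in part (b) can be illustrated with a small constructed example. In the sketch below, the per-thousand church and crime rates are deliberately built to be unrelated, yet the raw counts correlate strongly because both scale with town population. Every number here is hypothetical and chosen for the illustration; none comes from real towns.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical towns: population in thousands (the lurking variable),
# with per-thousand rates constructed to have no association with each other.
population = [1, 2, 3, 4, 5, 6, 7, 8]
church_rate = [2, 3, 2, 3, 2, 3, 2, 3]
crime_rate = [10, 10, 12, 12, 10, 10, 12, 12]

# Raw counts both grow with population.
churches = [p * c for p, c in zip(population, church_rate)]
crimes = [p * c for p, c in zip(population, crime_rate)]

r_counts = correlation(churches, crimes)       # driven by town size
r_rates = correlation(church_rate, crime_rate)  # town size removed

print(round(r_counts, 3), abs(r_rates) < 1e-9)
```

Once town size is taken out by working with rates, the association disappears in this example, which is the sense in which population, not churches, accounts for the correlation in the counts.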
Related dot points
- Topic 2.4 Representing the Relationship Between Two Quantitative Variables: construct and describe scatterplots by direction, form, strength, and unusual features, in context.
A focused answer to AP Statistics Topic 2.4, on building scatterplots and describing them by direction, form, strength, and unusual features (the DUFS framework), in context, with a worked description.
- Topic 2.6 Linear Regression Models: write, interpret, and use a least-squares regression equation to predict a response, interpreting the slope and intercept in context, and recognizing the danger of extrapolation.
A focused answer to AP Statistics Topic 2.6, on the form of a regression equation, interpreting slope and intercept in context, making predictions, and the danger of extrapolation, with a worked prediction and interpretation.
- Topic 2.8 Least Squares Regression: determine the least-squares regression line from summary statistics, and interpret the coefficient of determination r-squared and the standard deviation of the residuals.
A focused answer to AP Statistics Topic 2.8, on why the least-squares line minimizes squared residuals, computing it from means, standard deviations, and r, and interpreting r-squared and s, with full worked calculations.
- Topic 2.7 Residuals: calculate and interpret residuals, construct and read residual plots, and use them to assess whether a linear model is appropriate.
A focused answer to AP Statistics Topic 2.7, defining the residual as observed minus predicted, interpreting positive and negative residuals, and using residual plots to judge whether a linear model is appropriate, with worked calculations.
- Topic 2.1 Introducing Statistics - Are Variables Related?: identify questions about the association between two variables, distinguish association from causation, and recognize what two-variable data can answer.
A focused answer to AP Statistics Topic 2.1, on framing questions about the association between two variables, the difference between explanatory and response variables, why association is not causation, and what two-variable data can answer, with worked examples.
Sources & how we know this
- AP Statistics Course and Exam Description — College Board (2020)