What does the correlation coefficient r measure, and what are its limits?

Topic 2.5 Correlation: calculate and interpret the correlation coefficient r, understand its properties (range, unit-free, not resistant), and recognize what it can and cannot tell you.

A focused answer to AP Statistics Topic 2.5, defining the correlation coefficient r, its range and properties (unit-free, symmetric, non-resistant), what it measures and misses, and the correlation-causation caution, with a worked interpretation.

Generated by Claude Opus 4.8. 9 min answer. Updated 2026-06-04.

Reviewed by: AI editorial process; not yet individually human-reviewed

What this topic is asking

The College Board (Topic 2.5) wants you to calculate and interpret the correlation coefficient $r$ , to know its properties (its range, that it is unit-free and symmetric, and that it is not resistant), and to understand exactly what $r$ measures and what it misses, including the correlation-causation caution.

What r measures

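The correlation coefficient $r$ measures the direction and strength of the linear association between two quantitative variables, and it always lies between $-1$ and $1$. For $n$ paired observations it is computed as

$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$

where $\bar{x}$, $\bar{y}$ are the sample means and $s_x$, $s_y$ the sample standard deviations.

You will almost always get $r$ from technology rather than this formula, but the formula reveals two things: $r$ is built from how the two variables' standardized deviations move together, and because standardizing strips out units and scale, $r$ is unit-free, unaffected by changing units, and symmetric in $x$ and $y$.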
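The construction described above, averaging the products of standardized deviations with divisor $n - 1$, can be sketched in a few lines of Python. This is only an illustration of the arithmetic that technology performs for you; the function name `correlation` and the small data set are invented for the example and are not from the course materials.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r: sum of z-score products divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    z_products = [
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (n - 1)

# Hypothetical data, invented purely for illustration.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]

r = correlation(hours, score)
print(round(r, 4))  # about 0.7746
```

Each term in the sum is positive when a point is above (or below) average in both variables and negative when it is above average in one and below in the other, which is why the sign of $r$ tracks the direction of the trend.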

The properties of r

A few consequences are worth stating plainly. Because $r$ is symmetric, "the correlation of height with weight" equals "the correlation of weight with height," unlike a regression line, whose slope depends on which variable is the response. Because $r$ is not resistant, you should always look at the scatterplot before trusting $r$ , since one stray point can inflate or deflate it. And because $r$ captures only linearity, it is silent about curvature.
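Two of these properties, symmetry and freedom from units, are easy to check numerically. The sketch below uses a hypothetical set of heights and weights (invented for illustration) and a hand-written helper `correlation`; it swaps the roles of the variables and then converts the units, and in both cases $r$ should be unchanged up to rounding.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r from products of z-scores (divisor n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)

# Hypothetical heights (cm) and weights (kg), invented for illustration.
height_cm = [150, 160, 165, 170, 180, 185]
weight_kg = [52, 58, 63, 66, 77, 80]

r_hw = correlation(height_cm, weight_kg)
r_wh = correlation(weight_kg, height_cm)      # swap the roles of x and y

height_in = [h / 2.54 for h in height_cm]     # centimeters to inches
weight_lb = [w * 2.20462 for w in weight_kg]  # kilograms to pounds
r_units = correlation(height_in, weight_lb)

print(abs(r_hw - r_wh) < 1e-12)    # symmetric in x and y
print(abs(r_hw - r_units) < 1e-9)  # unchanged by a change of units
```

A regression slope, by contrast, would change under both operations, which is the contrast the paragraph above draws.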

What r cannot tell you

Two limitations are examined relentlessly.

First, correlation is not causation: a large $|r|$ shows that two variables move together linearly, but a lurking variable, reverse causation, or coincidence can produce that pattern, so you can never conclude causation from $r$ alone. The classic examples (churches and crime, ice cream and drowning) all feature a lurking variable that drives both.

Second, $r$ measures only linear association, so a value near zero does not mean the variables are unrelated; a strongly curved relationship (a U-shape, say) can have $r \approx 0$ while still being a tight relationship, just not a straight-line one. The reverse trap also exists: a high $r$ confirms a strong linear fit only if the scatterplot is genuinely linear; computing $r$ for a curved cloud and reporting "strong relationship" is wrong.

The discipline that protects you from both traps is the same one from Topic 2.4: always describe the scatterplot's form first, and only interpret $r$ once you have confirmed the pattern is roughly linear. The College Board returns to these two cautions, no causation and linear-only, in almost every regression question, so internalizing them now pays off across the whole unit.
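The linear-only limitation can be made concrete with a small sketch. Below, $y$ is determined exactly by $x$ through $y = x^2$ on a symmetric set of $x$ values (a hypothetical data set chosen for illustration), yet the correlation comes out as zero up to floating-point rounding. The helper `correlation` is written out here for the example.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r from products of z-scores (divisor n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)

# A perfect U-shape: every y is exactly x squared.
xs = list(range(-5, 6))   # -5, -4, ..., 5
ys = [x * x for x in xs]

r = correlation(xs, ys)
print(abs(r) < 1e-12)     # r is zero up to rounding, despite a perfect pattern
```

The positive products on the right-hand arm of the U cancel the negative products on the left-hand arm, so $r$ reports no linear trend even though knowing $x$ tells you $y$ exactly.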

Reading the size of r

It helps to attach rough verbal labels to magnitudes, while remembering they are conventions, not hard rules. An $|r|$ around $0.8$ or above is usually called strong, around $0.5$ to $0.8$ moderate, and below about $0.3$ weak, with the exact cut-offs unimportant compared with reading the scatterplot. What matters on the exam is interpreting $r$ in context and pairing the number with the picture: "$r = 0.85$ indicates a strong positive linear association between distance and fuel used, consistent with the tight upward-sloping scatterplot."

A common follow-up is the relationship between $r$ and $r^2$ (the coefficient of determination of Topic 2.8): $r^2$ is the fraction of the variation in $y$ explained by the linear model, so a correlation of $0.85$ corresponds to $r^2 = 0.7225$, meaning about $72\%$ of the variation in $y$ is accounted for by the linear relationship with $x$. Holding $r$ and $r^2$ as related-but-different ideas, one a measure of linear strength and the other a proportion of explained variation, prepares you for the regression topics that follow.
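The link between $r$ and $r^2$ can be checked directly. The sketch below fits a least-squares line to a hypothetical distance and fuel data set (numbers invented for illustration, not the data behind the quoted $r = 0.85$) and compares $r^2$ with the proportion of variation explained, computed as $1 - \text{SSE}/\text{SST}$. The function name `fit_and_r` is an assumption made for this example.

```python
from statistics import mean

def fit_and_r(xs, ys):
    """Return (r, proportion of variation in y explained by the LS line)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)  # total variation in y (SST)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    r = sxy / (sxx * syy) ** 0.5
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Unexplained variation: sum of squared residuals about the line (SSE).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return r, 1 - sse / syy

# Hypothetical distances (km) and fuel used (liters), for illustration only.
distance = [10, 20, 30, 40, 50, 60]
fuel = [1.1, 1.9, 3.2, 3.8, 5.3, 5.9]

r, explained = fit_and_r(distance, fuel)
print(abs(r ** 2 - explained) < 1e-9)  # the two quantities agree
print(round(0.85 ** 2, 4))             # 0.7225, the value quoted in the text
```

The agreement holds for a least-squares line with one explanatory variable, which is the setting of this unit.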

Interpreting and critiquing a correlation

For $30$ cars, the correlation between engine size (liters) and fuel consumption (liters per $100$ km) is $r = 0.78$ . (a) Interpret $r$ . (b) The data analyst then notes one sports car with a huge engine and unusually high consumption. Explain how removing it might change $r$ , and what that reveals about $r$ .

Step 1. Interpret the value (part a)

The sign is positive and $|r| = 0.78$ is moderately strong, so there is a moderately strong, positive linear association between engine size and fuel consumption: cars with larger engines tend to use more fuel.

Step 2. Consider the outlier's leverage (part b)

The sports car sits far from the rest in both variables and follows the same upward trend, so it likely inflates $r$; removing it could lower $r$ noticeably (or, if it bucked the trend, raise it). Either way, one point moves $r$.

Step 3. Draw the lesson about resistance

This shows $r$ is not resistant: a single outlier can change it substantially, so $r$ should always be read alongside the scatterplot rather than trusted on its own.

Step 4. Interpret

The correlation indicates a moderately strong positive linear relationship, but because $r$ is sensitive to outliers, the reported value depends on that one extreme car, underlining why you inspect the plot before interpreting $r$ , and why $r$ alone never proves that bigger engines cause higher consumption.
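The effect of a single extreme point can be seen in a small sketch. The nine cars below are hypothetical numbers invented for illustration (they are not the $30$ cars in the question and will not reproduce $r = 0.78$); the last one plays the role of the sports car. The helper `correlation` is written out for the example.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r from products of z-scores (divisor n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)

# Hypothetical cars: engine size (liters), consumption (liters per 100 km).
engine = [1.0, 1.2, 1.4, 1.6, 1.8, 2.0, 2.2, 2.4]
consumption = [5.8, 5.2, 6.9, 6.0, 7.4, 6.3, 7.9, 7.0]

# One hypothetical sports car, far from the rest but on the same trend.
engine_all = engine + [6.5]
consumption_all = consumption + [19.0]

r_without = correlation(engine, consumption)
r_with = correlation(engine_all, consumption_all)

print(round(r_without, 2), round(r_with, 2))
print(r_with > r_without)  # the extreme point on the trend inflates r
```

In this made-up data set the one added car raises $r$ substantially, which is the behavior described in step 2; a point that bucked the trend would pull $r$ the other way.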

Try this

Q1. State what the sign and the magnitude of $r$ each tell you. [2 points]

Cue. The sign gives the direction of the linear association (positive or negative); the magnitude (closeness of $|r|$ to $1$ ) gives its strength.

Q2. A scatterplot is strongly U-shaped, and $r = 0.02$ . Does this mean no relationship? Explain. [1 point]

Cue. No; $r$ measures only linear association, so $r \approx 0$ here reflects the lack of a linear trend, not the absence of the strong non-linear (curved) relationship.

Exam-style practice questions

Practice questions written in the style of College Board exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AP 2019 (style), 1 mark, Section I (multiple choice). A correlation of $r = -0.92$ between two variables indicates which of the following?

(A) A strong positive linear relationship
(B) A strong negative linear relationship
(C) A weak negative relationship
(D) That one variable causes the other to decrease

The correct answer is (B).

The sign of $r$ gives direction (negative here) and the magnitude gives strength; $|r| = 0.92$ is close to $1$ , so the linear relationship is strong. Thus $r = -0.92$ means a strong negative linear relationship.

(A) has the wrong sign. (C) understates the strength. (D) wrongly infers causation; correlation never establishes cause. Sign for direction, magnitude for strength, and no causal claim.

AP 2022 (style), 4 marks, Section II (free response). For $40$ towns, the correlation between number of churches and number of crimes is $r = 0.85$. (a) Interpret this correlation. (b) Explain why it would be wrong to conclude that building more churches causes more crime. (c) A student computes $r$ for a scatterplot that is strongly curved and finds $r = 0.1$; explain what this small value does and does not tell you.

A 4-point question on interpreting and critiquing correlation.

(a) (1 point) Interpretation: there is a strong positive linear association between the number of churches and the number of crimes across these $40$ towns (towns with more churches tend to have more crimes).
(b) (2 points) Correlation is not causation (1 point); a lurking variable, population size, plausibly drives both, since larger towns have more churches and more crimes (1 point). So the association reflects town size, not a causal link.
(c) (1 point) A small $r$ ( $0.1$ ) means a weak linear association, but because the pattern is strongly curved, $r$ near zero does not mean "no relationship"; there can be a strong non-linear relationship that $r$ fails to detect.

Markers reward a correct interpretation of strength and direction, the causation caution with a lurking variable, and the insight that $r$ measures linear association only.
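The lurking-variable explanation in part (b) can be illustrated with a simulation. In the hypothetical sketch below, churches and crimes are each generated from town population plus independent random noise, so neither causes the other by construction, yet their correlation is high. All numbers (population range, rates, noise levels) are assumptions chosen for illustration, the simulated values are not rounded to whole counts, and the result is not meant to reproduce the $r = 0.85$ in the question.

```python
import random
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation r from products of z-scores (divisor n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    return sum(
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)
    ) / (n - 1)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Population is the lurking variable: it drives both of the other variables.
populations = [rng.uniform(20_000, 200_000) for _ in range(40)]

# Each variable depends on population and its own independent noise only.
churches = [p / 2_000 + rng.gauss(0, 5) for p in populations]
crimes = [p / 100 + rng.gauss(0, 100) for p in populations]

r = correlation(churches, crimes)
print(r > 0.8)  # strong positive correlation with no causal link between them
```

Nothing in the code lets churches influence crimes or the reverse; the association comes entirely from the shared dependence on population.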

Sources & how we know this

AP Statistics Course and Exam Description — College Board (2020)