GeorgiaMathsSyllabus dot point

What does the correlation coefficient tell you, why is correlation not causation, and how do residuals show whether a line fits?

Interpret the correlation coefficient, distinguish correlation from causation, and use residuals and a residual plot to judge how well a linear model fits (A.DSR, Data and Statistical Reasoning).

A Georgia Milestones Algebra: Concepts & Connections answer on the correlation coefficient r, why correlation does not imply causation, computing a residual as actual minus predicted, and reading a residual plot to judge whether a linear model is appropriate.

Generated by Claude Opus 4.8 (10 min answer)

Reviewed by: AI editorial process; not yet individually human-reviewed


Jump to a section
  1. What this topic is asking
  2. The correlation coefficient
  3. Correlation is not causation
  4. Residuals and residual plots
  5. How the Milestones examines this topic
  6. Why a residual plot reveals the wrong model
  7. Why causation needs more than correlation
  8. Try this

What this topic is asking

This Data and Statistical Reasoning (A.DSR) standard rounds out the bivariate-data work with three interpretive ideas: the correlation coefficient $r$, the principle that correlation does not imply causation, and residuals as a diagnostic for whether a line fits. The Georgia Milestones EOC tests reading $r$ (sign for direction, magnitude for strength), recognizing causation fallacies, and computing a residual (actual minus predicted) or reading a residual plot. These are the reasoning checks that keep a fitted line honest, and they are frequent constructed-response and multiple-choice points.

The correlation coefficient

The correlation coefficient $r$ measures the strength and direction of a linear relationship, on a scale from $-1$ to $1$.

  • The sign matches the direction: $r > 0$ for a positive (upward) trend, $r < 0$ for a negative (downward) trend.
  • The magnitude $|r|$ measures strength: close to 1 means a strong linear relationship (points near a line), close to 0 means weak or none.

So $r = -0.92$ is a strong negative linear relationship; $r = 0.15$ is a weak positive one; $r = 0$ means no linear relationship.
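To make the sign-and-magnitude reading concrete, here is a minimal Python sketch. It computes the Pearson correlation coefficient $r$ from paired data and turns a value of $r$ into words. The helper names `correlation` and `describe_r` are hypothetical, and the cutoffs 0.8 ("strong") and 0.3 ("weak") are assumptions chosen for illustration; the Milestones does not publish fixed cutoffs, so treat "close to 1" and "close to 0" as the real rule.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

def describe_r(r, strong=0.8, weak=0.3):
    """Hypothetical wording helper: sign gives direction, |r| gives strength.
    The strong/weak cutoffs are illustrative assumptions, not exam rules."""
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size >= strong:
        strength = "strong"
    elif size <= weak:
        strength = "weak"
    else:
        strength = "moderate"
    return f"{strength} {direction} linear relationship"

# Points exactly on a falling line give r = -1.
print(correlation([1, 2, 3, 4, 5], [10, 8, 6, 4, 2]))  # -1.0
print(describe_r(-0.92))  # strong negative linear relationship
print(describe_r(0.15))   # weak positive linear relationship
print(describe_r(0))      # no linear relationship
```

On the exam a calculator produces $r$; the point of the sketch is only that the sign and the size of $r$ are read separately.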

Correlation is not causation

A strong correlation shows that two variables move together, not that one causes the other. A third, hidden variable, called a lurking variable, can drive both.

For example, ice-cream sales and drowning incidents are positively correlated, but ice cream does not cause drowning; hot weather (a lurking variable) increases both. The EOC tests this directly: from a correlation, you may say the variables are associated, but you may not conclude causation without a controlled experiment.
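The lurking-variable story can be sketched with made-up numbers. In this minimal Python example, all data are hypothetical: daily high temperature is the lurking variable, and ice-cream sales and drownings are each built from temperature alone (plus small, separate noise). Neither formula uses the other outcome, yet the two outcomes still come out strongly correlated.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data: daily high temperature is the lurking variable.
temps = [20, 22, 25, 27, 30, 32, 35]
wobble_a = [1, -1, 0, 1, -1, 0, 1]   # small day-to-day noise in ice-cream sales
wobble_b = [0, 1, -1, 0, 1, -1, 0]   # separate noise in drownings

# Each outcome depends on temperature only; neither formula mentions the other outcome.
ice_cream_sales = [10 * t + 5 * a for t, a in zip(temps, wobble_a)]
drownings = [t / 5 + b / 2 for t, b in zip(temps, wobble_b)]

r = correlation(ice_cream_sales, drownings)
print(round(r, 2))  # strong positive, although neither variable causes the other
```

Removing ice cream from the world would not change a single drowning value in this model, which is exactly the situation a correlation alone cannot rule out.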

Residuals and residual plots

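A residual measures how far a data point is from the line of best fit, measured vertically and with a sign: residual $= y - \hat{y}$ (actual minus predicted). A positive residual means the point lies above the line; a negative residual means it lies below.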

For an actual value $y = 47$ where the line predicts $\hat{y} = 50$, the residual is $47 - 50 = -3$ (the point is 3 below the line).
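The residual rule is a single subtraction, which a short Python sketch makes explicit. The function names `residual` and `describe_residual` are hypothetical; the sketch simply applies "actual minus predicted" and reads the sign as above or below the line.

```python
def residual(actual, predicted):
    """Residual = actual minus predicted (y - y_hat)."""
    return actual - predicted

def describe_residual(res):
    """Read the sign: positive is above the line, negative is below."""
    if res > 0:
        return f"{res} above the line"
    if res < 0:
        return f"{-res} below the line"
    return "on the line"

print(residual(47, 50))                     # -3
print(describe_residual(residual(47, 50)))  # 3 below the line
print(describe_residual(residual(34, 30)))  # 4 above the line
```

Keeping the order fixed (actual first) is the whole design: reversing it flips every sign and is a common source of lost points.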

A residual plot graphs the residuals against $x$. Its pattern tells you whether a linear model is appropriate:

  • No pattern (random scatter around zero): a linear model fits well.
  • A clear pattern (such as a curve or fan shape): a line is not the right model.
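The sketch below builds a rough text version of a residual plot in Python. It assumes the line of best fit is the least-squares regression line (what a calculator's linear regression produces), and the data, function names, and the sideways layout (one row per $x$, with `|` as the zero line and `*` as the residual) are all hypothetical choices for illustration. Judging "no pattern" is still done by eye.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y_hat = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def residuals(xs, ys):
    """Residual for each point: actual minus predicted."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]

def text_residual_plot(xs, res, half_width=10, scale=10.0):
    """Sideways residual plot: one row per x, '|' marks zero, '*' marks the residual."""
    rows = []
    for x, r in zip(xs, res):
        cells = [" "] * (2 * half_width + 1)
        cells[half_width] = "|"
        # Clamp so very large residuals stay inside the row.
        offset = max(-half_width, min(half_width, round(r * scale)))
        cells[half_width + offset] = "*"
        rows.append(f"x={x:>2} " + "".join(cells))
    return "\n".join(rows)

# Hypothetical data that roughly follows a straight line with some scatter.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0, 14.8, 17.2]
res = residuals(xs, ys)
print(text_residual_plot(xs, res))
```

For these points the stars land on both sides of the zero line with no curve or fan shape, which is the "random scatter" case that supports a linear model. A least-squares line with an intercept also makes the residuals sum to zero, so they always balance around the zero line.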

How the Milestones examines this topic

  • Multiple choice. Interpret a value of rr, or identify a correlation-causation fallacy.
  • Numeric entry. Compute a residual (actual minus predicted).
  • Constructed response. Interpret rr, explain why correlation is not causation, or read a residual plot to judge fit.

Why a residual plot reveals the wrong model

A residual plot is a more sensitive fit check than the scatterplot itself, and the reason is worth seeing. Subtracting the line's prediction from each actual value removes the linear trend, leaving only what the line failed to capture. If a line is the right model, what remains is random noise, so the residuals scatter with no shape around zero. If the true relationship is curved, the line overshoots in some regions and undershoots in others, and those systematic misses show up as a curve or other pattern in the residuals, even when the original scatterplot looked roughly linear. A patterned residual plot is therefore a red flag that a line is hiding a curve and a different model is needed. This is why the EOC pairs residuals with the line of best fit: residuals are how you check that the linear assumption was reasonable.
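Here is a minimal Python sketch of a line hiding a curve. It fits a least-squares line (an assumption about how the line of best fit is found) to hypothetical points that lie exactly on $y = x^2$, then lists the residuals and their signs. The names `fit_line_with_r`, `res`, and `signs` are illustrative only.

```python
import math

def fit_line_with_r(xs, ys):
    """Least-squares slope, intercept, and correlation coefficient r."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / math.sqrt(sxx * syy)

# Hypothetical data that is truly curved: y = x^2.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x * x for x in xs]

slope, intercept, r = fit_line_with_r(xs, ys)
res = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
signs = "".join("+" if e > 1e-9 else "-" if e < -1e-9 else "0" for e in res)

print(slope, intercept)  # 6.0 -5.0
print(round(r, 2))       # 0.96
print(res)               # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
print(signs)             # +0---0+
```

Even with $r \approx 0.96$, the residuals run positive, then negative, then positive again. That U shape is the systematic overshoot and undershoot described above, and it signals that a curved model fits better than a line.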

Why causation needs more than correlation

The correlation-is-not-causation rule is not pedantry; it reflects how data can mislead. Two variables can be strongly correlated for three main reasons: $x$ really does cause $y$, $y$ causes $x$, or a lurking variable causes both. A correlation coefficient cannot distinguish these, because it only measures how tightly the points follow a line, not why. To establish causation you need a controlled experiment that holds other factors fixed and varies only $x$, which observational data (just measuring $x$ and $y$ as they occur) cannot do. On the EOC, the safe conclusion from any correlation is that the variables are associated or related. Asserting that one causes the other is the trap the test sets, as in spurious correlations like ice-cream sales and drownings.
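One concrete way to see that $r$ is silent about direction is that it treats the two variables symmetrically: swapping which one you call $x$ gives the same $r$. This minimal Python sketch uses hypothetical paired data and a hypothetical `correlation` helper to check that.

```python
import math

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements; nothing in the numbers says which is the cause.
x = [2, 4, 5, 7, 9]
y = [30, 41, 44, 58, 65]

print(math.isclose(correlation(x, y), correlation(y, x)))  # True: r is symmetric
```

Because "$x$ causes $y$" and "$y$ causes $x$" produce the same $r$, the coefficient by itself cannot tell you which story, if either, is true.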

Try this

Q1. Interpret $r = 0.05$. [1 point]

  • Cue. Near 0, so a weak (essentially no) linear relationship.

Q2. A line predicts $\hat{y} = 30$ where the actual value is 34. Find the residual. [1 point]

  • Cue. $34 - 30 = +4$ (the point is 4 above the line).

Exam-style practice questions

Practice questions written in the style of GaDOE exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

Milestones (style), 1 mark. Multiple choice. A correlation coefficient is $r = -0.92$. What does it indicate? (A) a strong positive linear relationship (B) a strong negative linear relationship (C) a weak negative relationship (D) no relationship

The correct answer is (B).

The correlation coefficient $r$ ranges from $-1$ to $1$. The sign gives direction (negative here, so $y$ decreases as $x$ increases), and the magnitude gives strength: $|r| = 0.92$ is close to 1, so the relationship is strong. Thus $r = -0.92$ is a strong negative linear relationship. Values near 0 indicate a weak or no linear relationship.

Milestones (style), 2 marks. Constructed response. A line of best fit predicts $\hat{y} = 50$ for a data point whose actual value is $y = 47$. Find the residual, and explain what a residual plot with no pattern indicates.

The residual is $-3$.

A residual is actual minus predicted: $47 - 50 = -3$, meaning the data point is 3 below the line. A residual plot with no pattern (points scattered randomly above and below zero) indicates that a linear model is appropriate for the data. A residual plot with a clear pattern (such as a curve) would indicate that a line is not the right model. Full credit needs the residual value and the interpretation of a patternless plot.


Sources & how we know this