What does the correlation coefficient tell you, why is correlation not causation, and how do residuals show whether a line fits?
Interpret the correlation coefficient, distinguish correlation from causation, and use residuals and a residual plot to judge how well a linear model fits (A.DSR, Data and Statistical Reasoning).
A Georgia Milestones Algebra: Concepts & Connections answer on the correlation coefficient r, why correlation does not imply causation, computing a residual as actual minus predicted, and reading a residual plot to judge whether a linear model is appropriate.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this topic is asking
This Data and Statistical Reasoning (A.DSR) standard rounds out the bivariate-data work with three interpretive ideas: the correlation coefficient r, the principle that correlation does not imply causation, and residuals as a diagnostic for whether a line fits. The Georgia Milestones EOC tests reading r (sign for direction, magnitude for strength), recognizing causation fallacies, and computing a residual (actual minus predicted) or reading a residual plot. These are the reasoning checks that keep a fitted line honest, and they are frequent constructed-response and multiple-choice points.
The correlation coefficient
The correlation coefficient r measures the strength and direction of a linear relationship, on a scale from -1 to +1.
- The sign matches the direction: r > 0 for a positive (upward) trend, r < 0 for a negative (downward) trend.
- The magnitude measures strength: close to 1 means a strong linear relationship (points near a line), close to 0 means weak or none.
So a value of r close to -1 is a strong negative linear relationship; a small positive value of r is a weak positive one; r = 0 means no linear relationship.
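As a rough illustration of reading sign and magnitude, the sketch below computes r for a small made-up data set and then describes it. The data, the function names `correlation` and `describe`, and the 0.7 and 0.3 strength cutoffs are all assumptions for illustration; they are not official EOC thresholds.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def describe(r):
    """Read direction from the sign of r and strength from its magnitude."""
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    # Illustrative cutoffs only (assumed, not an official rule).
    size = abs(r)
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    return f"{strength} {direction}"

# Hypothetical data: hours of practice vs. number of errors.
hours = [1, 2, 3, 4, 5]
errors = [9, 8, 6, 5, 2]

r = correlation(hours, errors)
print(round(r, 2), describe(r))  # -0.98 strong negative
```

The negative sign says errors fall as hours rise, and the magnitude near 1 says the points lie close to a line.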
Correlation is not causation
A strong correlation shows that two variables move together, not that one causes the other. A third, hidden lurking variable can drive both.
For example, ice-cream sales and drowning incidents are positively correlated, but ice cream does not cause drowning; hot weather (a lurking variable) increases both. The EOC tests this directly: from a correlation, you may say the variables are associated, but you may not conclude causation without a controlled experiment.
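The lurking-variable idea can be sketched with invented numbers. In the code below, ice-cream sales and drownings are each built from temperature plus a small fixed wobble, and neither one appears in the other's formula. Every value here is hypothetical and chosen only to show how a shared cause can produce a strong correlation.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical daily high temperatures (the lurking variable).
temps = [60, 65, 70, 75, 80, 85, 90, 95]

# Fixed, made-up day-to-day wobbles so the example is repeatable.
ice_wobble = [3, -2, 4, -1, -3, 2, -4, 1]
drown_wobble = [0, 1, -1, 1, 0, -1, 1, 0]

# Each quantity depends on temperature only, never on the other quantity.
ice_cream = [2 * t + w for t, w in zip(temps, ice_wobble)]
drownings = [(t - 50) // 5 + w for t, w in zip(temps, drown_wobble)]

r = correlation(ice_cream, drownings)
print(round(r, 2))  # strongly positive, even though neither causes the other
```

The two lists move together only because both rise with temperature, which is why the safe statement is "associated," not "causes."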
Residuals and residual plots
A residual measures how far a data point is from the line of best fit: residual = actual value minus predicted value.
For example, if the actual value is 3 less than the value the line predicts, the residual is -3 (the point is 3 below the line).
A residual plot graphs the residuals against x. Its pattern judges the model:
- No pattern (random scatter around zero): a linear model fits well.
- A clear pattern (such as a curve or fan shape): a line is not the right model.
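Here is a minimal sketch of computing residuals as actual minus predicted. The line y = 2x + 1 and the data points are assumed for illustration, and `predict` is a hypothetical helper name.

```python
def predict(x):
    """Hypothetical line of best fit: predicted y = 2x + 1 (assumed)."""
    return 2 * x + 1

# Made-up data points (x, actual y).
xs = [1, 2, 3, 4, 5]
actual = [4, 3, 8, 9, 11]

# residual = actual - predicted
residuals = [y - predict(x) for x, y in zip(xs, actual)]

for x, res in zip(xs, residuals):
    if res > 0:
        position = "above the line"
    elif res < 0:
        position = "below the line"
    else:
        position = "on the line"
    print(f"x = {x}: residual = {res} ({position})")
```

Plotting these residuals against x gives the residual plot. Here they are 1, -2, 1, 0, 0, which bounce around zero with no obvious curve or fan shape.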
How the Milestones examines this topic
- Multiple choice. Interpret a value of r, or identify a correlation-causation fallacy.
- Numeric entry. Compute a residual (actual minus predicted).
- Constructed response. Interpret r, explain why correlation is not causation, or read a residual plot to judge fit.
Why a residual plot reveals the wrong model
A residual plot is a more sensitive fit check than the scatterplot itself, and seeing why explains its power. When you subtract the line's prediction from each actual value, you remove the linear trend and are left with only what the line failed to capture. If the line is the right model, what remains is random noise, so the residuals scatter formlessly around zero. But if the true relationship is curved, the line will overshoot in some regions and undershoot in others, and those systematic misses show up as a curve or pattern in the residuals, even when the original scatterplot looked roughly linear. So a patterned residual plot is a red flag that says "a line is hiding a curve," telling you to use a different model. This is why the EOC pairs residuals with the line of best fit: residuals are how you check that the linear assumption was reasonable.
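To see a line "hiding a curve," the sketch below fits a least-squares line to data that actually follows y = x squared. The data set and the helper name `fit_line` are assumptions for illustration.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares slope and intercept for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Curved data: y = x^2, so a line is the wrong model.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

# Correlation coefficient for the same data, for comparison.
mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)
r = sxy / sqrt(sxx * syy)

print(slope, intercept)  # 6.0 -5.0
print(residuals)         # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
print(round(r, 2))       # 0.96
```

Even with r about 0.96, the residuals are positive at both ends and negative in the middle. That U shape is the systematic miss described above, and it signals that a linear model is not appropriate.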
Why causation needs more than correlation
The correlation-is-not-causation rule is not pedantry; it reflects how data can mislead. Two variables x and y can be strongly correlated for three different reasons: x really does cause y, y causes x, or a lurking variable causes both. A correlation coefficient cannot distinguish these, because it only measures how tightly the points follow a line, not why. To establish causation you need a controlled experiment that holds other factors fixed and varies only x, which observational data (just measuring x and y as they occur) cannot do. On the EOC, the safe conclusion from any correlation is that the variables are associated or related; asserting that one causes the other is the trap the test sets, exemplified by spurious correlations like ice-cream sales and drownings.
Try this
Q1. Interpret a correlation coefficient r that is very close to 0. [1 point]
- Cue. Near 0, so a weak (essentially no) linear relationship.
Q2. A line predicts 30 where the actual value is 34. Find the residual. [1 point]
- Cue. 34 - 30 = 4 (the point is 4 above the line).
Exam-style practice questions
Practice questions written in the style of GaDOE exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
Milestones (style), 1 mark. Multiple choice. A correlation coefficient r is negative, with a magnitude close to 1. What does it indicate? (A) a strong positive linear relationship (B) a strong negative linear relationship (C) a weak negative relationship (D) no relationship
The correct answer is (B).
The correlation coefficient r ranges from -1 to +1. The sign gives direction (negative here, so y decreases as x increases), and the magnitude gives strength: the magnitude of r is close to 1, so the relationship is strong. Thus this value of r indicates a strong negative linear relationship. Values near 0 indicate a weak or no linear relationship.
Milestones (style), 2 marks. Constructed response. A line of best fit predicts a value that is 3 more than a data point's actual value. Find the residual, and explain what a residual plot with no pattern indicates.
The residual is -3.
A residual is actual minus predicted. The actual value is 3 less than the predicted value, so the residual is -3, meaning the data point is 3 below the line. A residual plot with no pattern (points scattered randomly above and below zero) indicates that a linear model is appropriate for the data. A residual plot with a clear pattern (such as a curve) would indicate that a line is not the right model. Full credit needs the residual value and the interpretation of a patternless plot.
Related dot points
- Fit a line of best fit (linear regression) to two-variable data, interpret the slope and y-intercept in context, and use the line to make predictions (A.DSR, Data and Statistical Reasoning).
A Georgia Milestones Algebra: Concepts & Connections answer on lines of best fit and linear regression, interpreting the slope as a rate and the y-intercept as a starting value in context, using the line to predict, and the difference between interpolation and extrapolation.
- Represent two-variable quantitative data with scatterplots and describe the association by its form, direction, strength, and any outliers (A.DSR, Data and Statistical Reasoning).
A Georgia Milestones Algebra: Concepts & Connections answer on scatterplots and two-variable quantitative data, describing the association by its form (linear or nonlinear), direction (positive or negative), strength, and outliers or clusters.
- Represent one-variable quantitative data with dot plots, histograms, and box plots, and describe the shape of a distribution (A.DSR, Data and Statistical Reasoning).
A Georgia Milestones Algebra: Concepts & Connections answer on displaying one-variable quantitative data with dot plots, histograms, and box plots, reading the five-number summary from a box plot, and describing the shape of a distribution as symmetric, skewed, or having outliers.
- Compute and interpret measures of center (mean, median) and spread (range, IQR, standard deviation), and compare two distributions using center, spread, and shape (A.DSR, Data and Statistical Reasoning).
A Georgia Milestones Algebra: Concepts & Connections answer on measures of center (mean and median) and spread (range, IQR, standard deviation), choosing mean or median based on skew and outliers, and comparing two distributions by their center, spread, and shape.
Sources & how we know this
- Georgia's K-12 Mathematics Standards (Algebra: Concepts & Connections), Georgia Department of Education (2023)
- Georgia Milestones Assessment System, Georgia Department of Education (2024)