What does the correlation coefficient r tell you about a linear relationship, and why does correlation not prove causation?
Interpret the correlation coefficient of a linear fit and distinguish correlation from causation, recognizing lurking variables (Ohio S-ID.8, S-ID.9).
An Ohio Algebra I answer on correlation (S-ID.8, S-ID.9): what the correlation coefficient r measures, reading its sign and strength, and why a strong correlation does not prove one variable causes the other.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this topic is asking
Ohio standards S-ID.8 and S-ID.9 ask you to interpret the correlation coefficient of a linear fit and to distinguish correlation from causation. The coefficient measures how well a line fits the data; causation is a separate, stronger claim that data alone rarely proves. This is a frequent, high-yield Statistics item, and a classic reasoning trap.
Reading the correlation coefficient
A single number, $r$, captures both the direction and the tightness of a linear trend.
So a value of $r$ close to $1$ is a strong positive fit, a value just below $0$ is a weak negative one, and $r = 0$ means no linear relationship. Algebra I computes $r$ with technology and focuses on interpreting it.
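To see roughly what the technology is doing, here is a minimal Python sketch of the usual Pearson formula for $r$. The function name `pearson_r` and the hours-and-scores data are made up for illustration; on the exam you would read $r$ from a calculator or from given output rather than compute it by hand.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied vs. quiz score.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

r = pearson_r(hours, scores)
print(round(r, 2))  # 0.98
```

The points rise together and sit close to a line, so $r$ comes out positive and near $1$.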
Strength versus direction
Keep the two pieces of information separate.
Correlation is not causation
This is the headline idea, and the most tested.
The classic example: ice-cream sales and drowning rise together, but hot weather (the lurking variable) drives both; neither causes the other.
How Ohio examines this topic
- Multiple choice and multiple-select. Interpret the sign and strength of $r$, or compare two $r$ values.
- Reasoning items. Choose the best conclusion about a correlation, recognizing lurking variables and the correlation-causation distinction.
- Numeric response. Match an $r$ value to a described scatter plot.
Why magnitude is strength and sign is direction
The correlation coefficient packs two separate facts into one number, and keeping them apart prevents the most common errors. The sign answers "which way does the trend go?": a positive $r$ goes with a line of positive slope (up-right), a negative $r$ with a line of negative slope (down-right). The magnitude answers "how closely do the points follow that line?": values of $|r|$ near $1$ mean the points cluster tightly around the line (strong), and values near $0$ mean they scatter widely (weak). Because these are independent, a strong relationship can be either positive or negative: a negative $r$ with the larger $|r|$ is stronger than a positive $r$ with a smaller $|r|$. Comparing strength means comparing $|r|$, while the sign only tells direction, which is exactly the distinction reasoning items probe.
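The two readings can be sketched as two separate Python helpers, one that looks only at the sign and one that looks only at the absolute value. The names `direction` and `stronger` and the sample values are illustrative assumptions, not taken from any exam item.

```python
def direction(r):
    """Direction of the trend, read from the sign of r alone."""
    if r > 0:
        return "positive"
    if r < 0:
        return "negative"
    return "none"

def stronger(r1, r2):
    """Return whichever r shows the stronger linear relationship, judged by |r|."""
    return r1 if abs(r1) >= abs(r2) else r2

# Illustrative values only.
print(direction(-0.85))       # negative
print(stronger(-0.85, 0.60))  # -0.85
```

Notice that `stronger` never checks the sign and `direction` never checks the size, which mirrors how the two questions are answered independently.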
Why correlation cannot prove causation
The gap between correlation and causation is the deepest idea in the standard, and it rests on what observational data can and cannot rule out. A correlation tells you two variables move together, but moving together is consistent with several stories: the first variable might cause the second, the second might cause the first, or a hidden third variable might cause both while neither affects the other. Observational data, where you simply record what happens, cannot distinguish these, because it does not control the other factors that might be responsible. Only a controlled experiment, which deliberately changes one variable while holding others fixed, can isolate cause and effect. This is why "correlation does not imply causation" is a rule rather than a slogan: a strong $r$ is genuine evidence of a relationship, but the explanation for that relationship requires more than the correlation itself.
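A small Python sketch can make the lurking-variable story concrete. All numbers below are invented for illustration: temperature drives both ice-cream sales and drownings, and the leftover day-to-day wiggles in the two are unrelated. Looking only at days with the same temperature is a rough stand-in for "holding the other factor fixed"; it is not a real controlled experiment.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: 8 days, 4 cool (20 degrees) and 4 hot (30 degrees).
temperature = [20, 20, 20, 20, 30, 30, 30, 30]
ice_cream   = [39, 41, 39, 41, 59, 61, 59, 61]
drownings   = [9, 9, 11, 11, 14, 14, 16, 16]

# Across all days the two variables rise together.
r_all = pearson_r(ice_cream, drownings)

# Hold the lurking variable fixed: look only at the cool days.
cool = [i for i, t in enumerate(temperature) if t == 20]
r_cool = pearson_r([ice_cream[i] for i in cool],
                   [drownings[i] for i in cool])

print(round(r_all, 2))   # 0.92
print(round(r_cool, 2))  # 0.0
```

In this toy data the strong overall correlation comes entirely from temperature: once temperature is the same, ice-cream sales tell you nothing about drownings.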
Try this
Q1. Given two values of $r$, which one shows the stronger linear relationship? [1 point]
- Cue. Compare $|r|$: the value with the larger $|r|$ is stronger, regardless of its sign.
Q2. Sleep and test scores are positively correlated. Does more sleep cause higher scores? [1 point]
- Cue. Not necessarily; correlation alone does not prove causation (a lurking variable or reverse direction is possible).
Exam-style practice questions
Practice questions written in the style of ODEW exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
Ohio Algebra I EOC (style), 2 marks. Multiple choice. A correlation coefficient $r$ is negative and very close to $-1$. What does this indicate? (A) a strong negative linear relationship (B) a weak negative relationship (C) a strong positive relationship (D) no relationship
The correct answer is (A).
The sign of $r$ gives the direction: negative means that as $x$ increases, $y$ tends to decrease. The magnitude (how close $|r|$ is to $1$) gives the strength: the given value is very close to $-1$, so the linear relationship is strong. Together, this value of $r$ indicates a strong negative linear relationship. A value near $0$ would mean little or no linear relationship, and the sign would still tell the direction.
Ohio Algebra I EOC (style), 2 marks. Multiple choice. Ice-cream sales and drowning deaths are strongly correlated. What is the best conclusion? (A) a third variable (hot weather) likely affects both (B) ice cream causes drowning (C) drowning causes ice-cream sales (D) the correlation must be a calculation error
The correct answer is (A).
A strong correlation does not prove causation. Here a lurking variable, hot weather, plausibly drives both ice-cream sales and swimming (hence drowning), creating a correlation without either causing the other. Concluding that ice cream causes drowning (B) or the reverse (C) confuses correlation with causation. The correlation is real, not an error (D); it simply has a common-cause explanation.
Sources & how we know this
- Ohio's Learning Standards for Mathematics: Algebra 1 — Ohio Department of Education and Workforce (2024)
- Algebra I course resources (blueprint, reference sheet, released items) — Ohio Department of Education and Workforce (2024)