What does the correlation coefficient tell you about a linear relationship, and why does correlation not imply causation?
Interpret the correlation coefficient of a linear fit and distinguish correlation from causation (TN A1.S.ID.C.8, A1.S.ID.C.9).
A TNReady Algebra I answer on the correlation coefficient (TN A1.S.ID.C.8-9), reading r between -1 and 1 for direction and strength, and why a correlation does not prove one variable causes another.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this topic is asking
Standards A1.S.ID.C.8 and A1.S.ID.C.9 finish the data strand. C.8: interpret the correlation coefficient of a linear fit, a number describing the direction and strength of a linear relationship. C.9: distinguish correlation from causation, knowing that a relationship between two variables does not prove one causes the other.
Reading the correlation coefficient
The correlation coefficient r packs two pieces of information into one number between -1 and 1:
- Sign = direction. r > 0 is a positive association (line rises); r < 0 is negative (line falls).
- Magnitude = strength. |r| near 1 means a strong linear pattern (tight cluster); |r| near 0 means weak or no linear pattern.
So a value of r near 1 is strong positive, a value near -1 is strong negative, and a value near 0 is weak. A value of exactly 1 or -1 means the points lie perfectly on a line.
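To make the sign-and-magnitude reading concrete, here is a minimal Python sketch that computes r from paired data and turns it into a verbal description. The data values, the function names `pearson_r` and `describe`, and the cutoffs 0.7 and 0.3 (including the "moderate" label between them) are illustrative assumptions, not part of the standard or of TNReady.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def describe(r, strong=0.7, weak=0.3):
    """Verbal reading of r. The cutoffs are assumed, not official."""
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size >= strong:
        strength = "strong"
    elif size <= weak:
        strength = "weak"
    else:
        strength = "moderate"
    return f"{strength} {direction}"


# Hypothetical data: hours studied and test score for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 71, 80, 88]

r = pearson_r(hours, scores)
print(round(r, 3), describe(r))
```

The sign of r sets the direction word and the absolute value sets the strength word, which mirrors the two bullets above.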
Correlation is not causation
A strong r tells you two variables move together, not why. Three explanations are possible, and only the first is causation:
- x really does cause y.
- A lurking variable causes both (the ice cream and drowning example: hot weather drives both).
- Coincidence in the sample.
Because the data alone cannot tell these apart, you must not leap from a correlation to "x causes y." Establishing causation requires a controlled experiment, not just observed association.
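The lurking-variable explanation can be illustrated with a small simulation. In this sketch, temperature drives both ice cream sales and drownings, and ice cream never appears in the drowning formula, so by construction there is no causal link between the two. All numbers, the seed, and the helper names (`pearson_r`, `simulate`) are invented for illustration and are not real data.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def simulate(temps):
    """Each outcome depends only on temperature plus its own random noise."""
    ice_cream = [10 * t + random.gauss(0, 10) for t in temps]
    drownings = [0.2 * t + random.gauss(0, 0.5) for t in temps]
    return ice_cream, drownings


random.seed(0)  # fixed seed so the run is repeatable

# Case 1: temperature varies from day to day (the lurking variable is active).
varying_temps = [random.uniform(15, 35) for _ in range(200)]
ice, drown = simulate(varying_temps)
r_observed = pearson_r(ice, drown)

# Case 2: temperature is the same every day (the lurking variable is held fixed).
fixed_temps = [25.0] * 200
ice_fixed, drown_fixed = simulate(fixed_temps)
r_fixed = pearson_r(ice_fixed, drown_fixed)

print("temperature varies:", round(r_observed, 2))
print("temperature fixed: ", round(r_fixed, 2))
```

When temperature varies, the two outcomes are strongly correlated even though neither affects the other. When temperature is held fixed, the correlation falls to roughly zero, which is what you would expect if the third factor was doing all the work.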
How TNReady examines this topic
- Multiple choice. Interpret an r value (direction and strength), or choose the valid conclusion about causation.
- Inline choice. Complete a statement about whether a relationship is causal.
- Multiple select. Choose all correct interpretations of a correlation.
A clarifying idea is that r describes the same line you fit in scatter plots and linear models: a high |r| means the points hug that line of best fit, so the model predicts well, but it still says nothing about cause.
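As a rough illustration of "hugging the line", the sketch below builds two hypothetical data sets around the same underlying line y = 2x, one with small offsets and one with large offsets, then computes the least-squares slope and r for each. The data, the offsets, and the function name `slope_and_r` are assumptions made for this example.

```python
from math import sqrt


def slope_and_r(xs, ys):
    """Least-squares slope and Pearson r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5, 6]
offsets = [1, -1, -1, 1, 1, -1]  # fixed up-and-down pattern around the line

# Same underlying line y = 2x, different amounts of scatter.
ys_tight = [2 * x + 0.2 * e for x, e in zip(xs, offsets)]
ys_loose = [2 * x + 4.0 * e for x, e in zip(xs, offsets)]

slope_tight, r_tight = slope_and_r(xs, ys_tight)
slope_loose, r_loose = slope_and_r(xs, ys_loose)

print("tight scatter: slope", round(slope_tight, 2), "r", round(r_tight, 3))
print("loose scatter: slope", round(slope_loose, 2), "r", round(r_loose, 3))
```

Both fitted lines rise, but the tightly clustered set has r much closer to 1 than the scattered set. Neither value says anything about whether x causes y.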
Why the causation distinction matters
This is one of the most important ideas in the whole statistics strand because it guards against a tempting but wrong inference. People naturally read a strong correlation as "doing more of x will change y," and that mistake leads to bad decisions: cutting firefighters to reduce fire damage, or banning ice cream to prevent drownings. The fix is to ask, every time, could a third factor explain both? Hot weather, town size, age, or income are common lurking variables that make two unrelated things rise together. The standard wants you to recognize that observational data, no matter how strong the correlation, supports only an association, and that a controlled experiment (randomly assigning the treatment) is what can establish cause. On the EOC, the safe answer to "what does this strong correlation prove" is almost always that the variables are associated, not that one causes the other, and the best distractor to avoid is the one that asserts causation.
Try this
Q1. A data set has a correlation coefficient r that is positive and close to 0. Describe the linear relationship. [1 point]
- Cue. Weak positive (small magnitude, positive sign).
Q2. Shoe size and reading ability are positively correlated in children. What likely explains this? [1 point]
- Cue. A lurking variable, age: older children have bigger feet and read better.
Exam-style practice questions
Practice questions written in the style of TDOE exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
TNReady (style), 1 mark. Multiple choice. A data set has a correlation coefficient r close to -1. What does this indicate? (A) a strong negative linear relationship (B) a weak negative relationship (C) a strong positive relationship (D) no relationship
The correct answer is (A).
The correlation coefficient r ranges from -1 to 1. The sign gives direction (negative here) and the magnitude gives strength (|r| closer to 1 is stronger). Since r is close to -1, it indicates a strong negative linear relationship: as x increases, y tends to decrease, with points tightly clustered around a falling line. A value near 0 would mean little linear relationship.
TNReady (style), 2 marks. Multiple choice. Ice cream sales and drowning deaths are strongly positively correlated. Which conclusion is valid? (A) eating ice cream causes drowning (B) a third factor (hot weather) likely drives both (C) drowning causes ice cream sales (D) the correlation must be a calculation error
The correct answer is (B).
A strong correlation does not prove causation. Here a lurking variable, hot weather, plausibly increases both ice cream sales and swimming (hence drownings), so the two rise together without one causing the other. Choices (A) and (C) wrongly infer causation from correlation, and (D) is unwarranted. Recognizing a confounding third factor is exactly standard A1.S.ID.C.9.
Sources & how we know this
- Tennessee Academic Standards for Mathematics — Tennessee Department of Education (2024)
- TCAP Assessment Blueprint: Algebra I — Tennessee Department of Education (2024)