
What does the correlation coefficient tell you about a linear relationship, and why does correlation not imply causation?

Interpret the correlation coefficient of a linear fit and distinguish correlation from causation (TN A1.S.ID.C.8, A1.S.ID.C.9).

A TNReady Algebra I answer on the correlation coefficient (TN A1.S.ID.C.8-9), reading r between -1 and 1 for direction and strength, and why a correlation does not prove one variable causes another.

Generated by Claude Opus 4.89 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed


Jump to a section
  1. What this topic is asking
  2. Reading the correlation coefficient
  3. Correlation is not causation
  4. How TNReady examines this topic
  5. Why the causation distinction matters
  6. Try this

What this topic is asking

Standards A1.S.ID.C.8 and A1.S.ID.C.9 finish the data strand. C.8: interpret the correlation coefficient $r$ of a linear fit, a number describing the direction and strength of a linear relationship. C.9: distinguish correlation from causation, knowing that a relationship between two variables does not prove one causes the other.

Reading the correlation coefficient

$r$ packs two pieces of information into one number between $-1$ and $1$:

  • Sign = direction. $r > 0$ is a positive association (line rises); $r < 0$ is negative (line falls).
  • Magnitude = strength. $|r|$ near $1$ means a strong linear pattern (tight cluster); $|r|$ near $0$ means weak or no linear pattern.

So $r = 0.9$ is strong positive, $r = -0.85$ is strong negative, and $r = 0.1$ is weak. A value of exactly $\pm 1$ means the points lie perfectly on a line.
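To make the sign-and-magnitude reading concrete, here is a minimal Python sketch that computes $r$ from paired data and labels it. The data set (`hours`, `scores`) is made up for illustration, and the cutoffs `strong=0.8` and `weak=0.3` are assumptions chosen only to match the examples above; the standard itself says "near $1$" and "near $0$" and does not fix exact thresholds.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def describe(r, strong=0.8, weak=0.3):
    """Label r by direction (sign) and strength (magnitude).

    The cutoffs are illustrative assumptions, not official values.
    """
    if r == 0:
        return "no linear relationship"
    direction = "positive" if r > 0 else "negative"
    size = abs(r)
    if size >= strong:
        strength = "strong"
    elif size <= weak:
        strength = "weak"
    else:
        strength = "moderate"
    return f"{strength} {direction}"

# Hypothetical data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 78]

r = pearson_r(hours, scores)
print(round(r, 3), describe(r))

# Points exactly on a rising line give r = 1.
print(pearson_r([1, 2, 3], [2, 4, 6]))
```

With these cutoffs, `describe(0.9)`, `describe(-0.85)`, and `describe(0.1)` return "strong positive", "strong negative", and "weak positive", matching the three examples in the text.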

Correlation is not causation

A strong $r$ tells you two variables move together, not why. Three explanations are possible, and only the first is causation:

  1. $x$ really does cause $y$.
  2. A lurking variable causes both (the ice cream and drowning example: hot weather drives both).
  3. Coincidence in the sample.

Because the data alone cannot tell these apart, you must not leap from a correlation to "$x$ causes $y$." Establishing causation requires a controlled experiment, not just observed association.
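The lurking-variable case can be illustrated with a small simulation. The sketch below is a toy model with made-up numbers, not real data: a hypothetical daily temperature drives both `ice_cream` and `swim_incidents`, and neither of those two depends on the other. The correlation across all days comes out strong, but it largely disappears once temperature is held roughly fixed.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(0)  # fixed seed so the run is repeatable

# Assumed toy model: temperature is the only driver of both quantities.
temps = [random.uniform(10, 35) for _ in range(2000)]
ice_cream = [20 * t + random.gauss(0, 40) for t in temps]
swim_incidents = [5 * t + random.gauss(0, 15) for t in temps]

# Correlation across all days: strong, although neither causes the other.
r_all = pearson_r(ice_cream, swim_incidents)

# Hold the lurking variable roughly fixed: keep only days near 25 degrees.
band = [(i, s) for t, i, s in zip(temps, ice_cream, swim_incidents)
        if 24 <= t <= 26]
r_band = pearson_r([i for i, _ in band], [s for _, s in band])

print("all days:      r =", round(r_all, 2))
print("24-26 degrees: r =", round(r_band, 2))
```

Restricting to a narrow temperature band is a crude stand-in for controlling the third factor. A real study would use a designed experiment, as the text says.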

How TNReady examines this topic

  • Multiple choice. Interpret an $r$ value (direction and strength), or choose the valid conclusion about causation.
  • Inline choice. Complete a statement about whether a relationship is causal.
  • Multiple select. Choose all correct interpretations of a correlation.

A useful link: $r$ measures how closely the points follow the line of best fit you draw in scatter plots and linear models. A high $|r|$ means the points hug that line, so the model predicts well, but it still says nothing about cause.
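For reference, the connection between $r$ and the fitted line can be written out. This goes beyond what the source states and is offered only as a sketch using the standard textbook definitions: $\bar{x}$ and $\bar{y}$ are the sample means, $s_x$ and $s_y$ the sample standard deviations, and $b$ the slope of the least-squares line.

$$
r = \frac{\sum_{i}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i}(x_i - \bar{x})^2 \, \sum_{i}(y_i - \bar{y})^2}},
\qquad
b = r \, \frac{s_y}{s_x}
$$

Because $s_x$ and $s_y$ are both positive, the slope $b$ always has the same sign as $r$: a rising line of best fit goes with $r > 0$ and a falling one with $r < 0$.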

Why the causation distinction matters

This is one of the most important ideas in the whole statistics strand because it guards against a tempting but wrong inference. People naturally read a strong correlation as "doing more of $x$ will change $y$," and that mistake leads to bad decisions: cutting firefighters to reduce fire damage, or banning ice cream to prevent drownings. The fix is to ask, every time, could a third factor explain both? Hot weather, town size, age, or income are common lurking variables that make two unrelated things rise together. The standard wants you to recognize that observational data, no matter how strong the correlation, supports only an association, and that a controlled experiment (randomly assigning the treatment) is what can establish cause. On the EOC, the safe answer to "what does this strong correlation prove" is almost always that the variables are associated, not that one causes the other, and the best distractor to avoid is the one that asserts causation.

Try this

Q1. A data set has $r = 0.2$. Describe the linear relationship. [1 point]

  • Cue. Weak positive (small magnitude, positive sign).

Q2. Shoe size and reading ability are positively correlated in children. What likely explains this? [1 point]

  • Cue. A lurking variable, age: older children have bigger feet and read better.

Exam-style practice questions

Practice questions written in the style of TDOE exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

TNReady (style), 1 mark. Multiple choice. A data set has correlation coefficient $r = -0.95$. What does this indicate? (A) a strong negative linear relationship (B) a weak negative relationship (C) a strong positive relationship (D) no relationship

The correct answer is (A).

The correlation coefficient $r$ ranges from $-1$ to $1$. The sign gives direction (negative here) and the magnitude gives strength (closer to $\pm 1$ is stronger). Since $-0.95$ is close to $-1$, it indicates a strong negative linear relationship: as $x$ increases, $y$ tends to decrease, with points tightly clustered around a falling line. A value near $0$ would mean little linear relationship.

TNReady (style), 2 marks. Multiple choice. Ice cream sales and drowning deaths are strongly positively correlated. Which conclusion is valid? (A) eating ice cream causes drowning (B) a third factor (hot weather) likely drives both (C) drowning causes ice cream sales (D) the correlation must be a calculation error

The correct answer is (B).

A strong correlation does not prove causation. Here a lurking variable, hot weather, plausibly increases both ice cream sales and swimming (hence drownings), so the two rise together without one causing the other. Choices (A) and (C) wrongly infer causation from correlation, and (D) is unwarranted. Recognizing a confounding third factor is exactly standard A1.S.ID.C.9.
