What does the correlation coefficient measure, and why does correlation not imply causation?

Use the correlation coefficient to describe the strength and direction of a linear relationship and distinguish correlation from causation (NC.M1.S-ID.8, S-ID.6c).

An NC Math 1 EOC answer on correlation (NC.M1.S-ID.8, S-ID.6c): what the correlation coefficient r measures, reading its sign and size, why correlation does not imply causation, and assessing fit with residuals.

Generated by Claude Opus 4.8. 10 min answer.

Reviewed by: AI editorial process; not yet individually human-reviewed

Jump to a section
  1. What this topic is asking
  2. What the correlation coefficient measures
  3. Reading r in context
  4. Why correlation is not causation
  5. Residuals and fit (S-ID.6c)
  6. How the NC Math 1 EOC examines this topic
  7. Why the causation caution matters
  8. Try this

What this topic is asking

NC.M1.S-ID.8 asks you to use the correlation coefficient r to describe the strength and direction of a linear relationship and to recognize that correlation does not imply causation. NC.M1.S-ID.6c adds assessing fit qualitatively with residuals. This is about quantifying and interpreting a linear relationship responsibly.

What the correlation coefficient measures

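The value r packs direction and strength into one number. It ranges from -1 to 1: the sign gives the direction of the relationship, and the closeness of its size to 1 gives the strength.

So r = 0.9 is strong positive, r = -0.85 is strong negative, and r = 0.1 is essentially no linear relationship.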
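
As a rough illustration of how r comes out of a data set, here is a minimal Python sketch that computes the correlation coefficient from its standard definition. The data (hours studied and test scores) and the names `pearson_r`, `hours`, `scores_up`, and `scores_down` are invented for this example and are not part of the standard; on the EOC you would normally read r from technology rather than compute it by hand.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: how x and y vary together around their means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator pieces: how much each variable varies on its own.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, invented for illustration only.
hours = [1, 2, 3, 4, 5]
scores_up = [52, 60, 61, 70, 77]    # tends to rise as hours rise
scores_down = [77, 70, 61, 60, 52]  # tends to fall as hours rise

r_up = pearson_r(hours, scores_up)
r_down = pearson_r(hours, scores_down)
print(round(r_up, 2), round(r_down, 2))
```

With these made-up numbers the two values come out near 0.98 and -0.98: the same strength, with the sign flipping because the direction of the trend flips.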

Reading r in context

Why correlation is not causation

This is the most-tested idea in the strand. A strong correlation means two variables move together, but the cause could be:

  • A lurking variable affecting both (hot weather raising both ice cream sales and drowning rates).
  • Reverse causation (the assumed effect actually drives the cause).
  • Coincidence in a particular data set.

Only a controlled experiment, not a correlation, can establish causation.
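
The lurking-variable case can be sketched with made-up numbers. In the Python sketch below, both ice cream sales and drownings are built from temperature alone (plus small fixed offsets), so neither one influences the other, yet they still correlate strongly. Every value and name here (`temps`, `ice_noise`, `drown_noise`, `pearson_r`) is hypothetical and chosen only to illustrate the idea; it is not real data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical daily temperatures: the lurking variable.
temps = [15, 18, 21, 24, 27, 30, 33]

# Small fixed offsets standing in for everything else that varies.
ice_noise = [2, -3, 1, -2, 3, -1, 0]
drown_noise = [0, 1, -1, 1, -1, 1, -1]

# Each variable depends on temperature only, never on the other variable.
ice_cream = [3 * t + n for t, n in zip(temps, ice_noise)]
drownings = [t / 3 + n for t, n in zip(temps, drown_noise)]

r_ice_drown = pearson_r(ice_cream, drownings)
print(round(r_ice_drown, 2))
```

The correlation between the two lists is well above 0.8 even though the code never lets ice cream affect drownings. The strong r reflects their shared cause, not a causal link between them.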

Residuals and fit (S-ID.6c)

A residual is the difference between an actual data value and the value predicted by the line of best fit (observed minus predicted). If residuals are small and randomly scattered with no pattern, the line fits well; a clear pattern in the residuals suggests a line is not the right model. This is a qualitative check on the fit of a line of best fit.
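
The residual check can be sketched in a few lines of Python. This example assumes the line of best fit is the least-squares line (the usual choice, though the standard does not name a method), and the two data sets and the helper names `fit_line` and `residuals` are invented for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def residuals(xs, ys):
    """Return observed minus predicted for each point."""
    slope, intercept = fit_line(xs, ys)
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]

# Hypothetical roughly linear data: residuals are small with mixed signs.
linear_ys = [52, 60, 61, 70, 77]
res_linear = residuals(xs, linear_ys)

# Hypothetical curved data (y = x squared): residuals show a U-shaped pattern.
curved_ys = [1, 4, 9, 16, 25]
res_curved = residuals(xs, curved_ys)

print(res_linear)
print(res_curved)
```

For the roughly linear data the residuals are 0, 2, -3, 0, 1, with no obvious pattern. For the curved data they are 2, -1, -2, -1, 2: positive at the ends and negative in the middle, which is the kind of clear pattern that signals a line is the wrong model.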

How the NC Math 1 EOC examines this topic

  • Multiple choice. Interpret an r value's sign and strength, or identify a correlation-causation error.
  • Short reasoning. Explain why a correlation does not prove causation.
  • Technology-enhanced. Match r values to scatter plots.

This completes the two-variable thread that begins with scatter plots and parallels association in two-way tables.

Why the causation caution matters

It is tempting to leap from "these move together" to "this causes that," but that leap is the most common statistical mistake, and the EOC tests it deliberately. The correlation coefficient is honest about what it measures (the tightness and direction of a linear pattern) and silent about why the pattern exists. A lurking variable can manufacture a strong correlation between two effects of a common cause, as with ice cream and drowning both driven by summer heat. Holding this distinction protects you from a whole class of wrong conclusions and is exactly the reasoning S-ID.8 asks for: report what r shows (association), and refuse to claim what it cannot (causation) without an experiment.

Try this

Q1. What does r = 0.05 indicate about a linear relationship? [1 point]

  • Cue. Near 0: essentially no linear relationship.

Q2. Shoe size and reading ability correlate in children. Does bigger feet cause better reading? [1 point]

  • Cue. No; age is a lurking variable (older children have bigger feet and read better).

Exam-style practice questions

Practice questions written in the style of NCDPI exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

NC Math 1 EOC (style), 1 mark. A data set has correlation coefficient r = -0.95. What does this indicate? (A) weak positive (B) strong negative (C) no relationship (D) strong positive

The correct answer is (B), strong negative.

The correlation coefficient r ranges from -1 to 1. A value near -1 (like -0.95) means a strong negative linear relationship: as one variable increases, the other tends to decrease, and the points lie close to a downward line. The sign gives direction; the size (how close the value is to 1 once the sign is ignored) gives strength.

NC Math 1 EOC (style), 2 marks. Ice cream sales and drowning rates both rise in summer, with high correlation. Does ice cream cause drowning? Explain.

No. The two are correlated but neither causes the other; a lurking variable (hot weather) drives both.

High correlation only means the two variables move together, not that one causes the other. Here warm weather increases both ice cream sales and swimming (and thus drownings), so weather is a lurking variable. This is the classic "correlation does not imply causation" point that S-ID.8 tests directly.
