
What is a residual, and how does a residual plot reveal whether a linear model fits?

Topic 2.7 Residuals: calculate and interpret residuals, construct and read residual plots, and use them to assess whether a linear model is appropriate.

A focused answer to AP Statistics Topic 2.7, defining the residual as observed minus predicted, interpreting positive and negative residuals, and using residual plots to judge whether a linear model is appropriate, with worked calculations.

Generated by Claude Opus 4.8, 9 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed

Jump to a section
  1. What this topic is asking
  2. What a residual is
  3. Residuals sum to zero for least squares
  4. The residual plot
  5. Why the residual plot beats the correlation for checking fit
  6. Try this

What this topic is asking

The College Board (Topic 2.7) wants you to calculate and interpret residuals (observed minus predicted) and to read a residual plot to decide whether a linear model is appropriate. The residual plot is the standard diagnostic for linearity.

What a residual is

A residual is the difference between an observed value and the value the model predicts for it: residual $= y - \hat{y}$. The order of subtraction is fixed: observed minus predicted, never the reverse. The sign then carries meaning. A positive residual ($y > \hat{y}$) means the actual value is above the line, so the model underpredicted. A negative residual ($y < \hat{y}$) means the actual value is below the line, so the model overpredicted. A residual of $0$ means the point lies exactly on the line.
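The sign convention can be sketched in a few lines of Python. This is only an illustration: the function names `residual` and `interpret` are made up for this example, and the values (observed 15, predicted 13) are borrowed from the practice question later on this page.

```python
def residual(observed, predicted):
    """Residual = observed minus predicted (never the reverse)."""
    return observed - predicted


def interpret(res):
    """Describe what the sign of a residual says about the prediction."""
    if res > 0:
        return "underpredicted (point is above the line)"
    if res < 0:
        return "overpredicted (point is below the line)"
    return "exact (point is on the line)"


# Illustrative values: observed y = 15, predicted y-hat = 13
res = residual(15, 13)
print(res, interpret(res))  # 2 underpredicted (point is above the line)
```

Swapping the arguments would flip the sign and reverse the interpretation, which is why the order of subtraction matters.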

Residuals sum to zero for least squares

A property worth knowing is that for a least-squares line the residuals always sum to zero (and so average zero): the line balances the points so that over- and under-predictions cancel overall. This is why you cannot judge fit by adding residuals up; instead you look at their pattern, which is the job of the residual plot. It also connects to the next topic, where the least-squares line is defined precisely as the line that minimizes the sum of squared residuals.
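A quick numerical check of the sum-to-zero property, as a minimal sketch. The data are invented for illustration, and `least_squares` is a hypothetical helper that uses the standard slope and intercept formulas rather than any particular calculator or library routine.

```python
import math


def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up data, for illustration only
xs = [1, 2, 3, 4, 5]
ys = [3, 7, 8, 12, 14]

a, b = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

print(residuals)
# The sum is zero up to floating-point rounding
print(math.isclose(sum(residuals), 0, abs_tol=1e-9))  # True
```

The individual residuals are not zero; only their total is, so the sum says nothing about how well the line fits.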

The residual plot

A residual plot graphs the residuals on the vertical axis against the explanatory variable $x$ (or the predicted values $\hat{y}$) on the horizontal axis. It magnifies departures from linearity that are hard to see in the original scatterplot. By removing the linear trend, it leaves only what the line failed to capture, so any leftover structure jumps out. The single most important reading is: random scatter means the line fits; a curve means it does not. A U-shaped or arch-shaped residual plot is the classic signature of a relationship that is really curved, telling you to transform the data (Topic 2.9) rather than trust the straight line.
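To see the U shape appear, here is a rough sketch that fits a line to deliberately curved data and prints the residuals in order of $x$ as a crude text version of a residual plot. The data ($y = x^2$ for $x = 1$ to $10$) and the helper `least_squares` are assumptions chosen for illustration, not part of the syllabus.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Illustrative curved data: y = x^2, so a straight line is the wrong model
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

a, b = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Print each residual with its sign, ordered by x
for x, e in zip(xs, residuals):
    print(f"x={x:2d}  residual={e:6.1f}")

# Signs read left to right: positive, then negative, then positive again
signs = "".join("+" if e > 0 else "-" for e in residuals)
print(signs)  # ++------++
```

The residuals are positive at both ends and negative in the middle. That systematic run of signs is the U shape, and it is the opposite of random scatter.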

Why the residual plot beats the correlation for checking fit

Students often try to judge a model's appropriateness from $r$ or $r^2$, but those numbers can be large even when a line is the wrong model. A gently curved relationship can have a high correlation while still being non-linear, and a straight-line fit would then systematically over- and under-predict in a pattern, exactly what the residual plot exposes and the correlation hides.

So the correct workflow is: fit the line, then look at the residual plot, and only if it shows random scatter do you conclude the linear model is appropriate. This is why the College Board so often pairs a regression with a residual plot and asks "is a linear model appropriate?": the expected reasoning cites the residual plot's pattern (random scatter, yes; curve or fan, no), not the size of $r$.

Interpreting residual plots correctly, and resisting the temptation to declare a good fit on the strength of a high $r^2$ alone, is one of the most reliably tested skills in the unit. A strong answer also reads individual large residuals as points the model fits poorly, which links forward to identifying outliers and influential points in Topic 2.9.
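As a small illustration of how a large $r$ can coexist with a curved relationship, the sketch below computes the correlation for data that are exactly quadratic. The data set ($y = x^2$ for $x = 1$ to $10$) and the helper name `correlation` are assumptions made for this example.

```python
import math


def correlation(xs, ys):
    """Pearson correlation r between two equal-length lists of numbers."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Illustrative data: exactly quadratic, so certainly not linear
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

r = correlation(xs, ys)
# r comes out at roughly 0.97 and r^2 at roughly 0.95 for these data
print(round(r, 3), round(r ** 2, 3))
```

A correlation this high would tempt many students to call the line a good fit, yet the relationship is a parabola by construction. Only the pattern in the residuals would reveal that.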

Try this

Q1. A point has observed $y = 40$ and predicted $\hat{y} = 46$. Find the residual and state whether the model over- or under-predicted. [2 points]

  • Cue. Residual $= 40 - 46 = -6$ (negative), so the model overpredicted; the point lies $6$ below the line.

Q2. A residual plot shows a clear upward-opening curve. What does this say about the linear model? [1 point]

  • Cue. The linear model is not appropriate; the curved residual pattern shows a non-linear relationship the straight line fails to capture.

Exam-style practice questions

Practice questions written in the style of College Board exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AP 2019 (style), 1 mark, Section I (multiple choice). A regression line predicts $\hat{y} = 18$ for a point whose observed value is $y = 15$. What is the residual? (A) $33$ (B) $3$ (C) $-3$ (D) $1.2$

The correct answer is (C).

A residual is observed minus predicted: $y - \hat{y} = 15 - 18 = -3$. The negative sign means the model overpredicted this point (the line is above the actual value).

(A) adds instead of subtracting. (B) reverses the subtraction ($\hat{y} - y$). (D) divides predicted by observed ($18/15$) instead of subtracting. The residual is always observed minus predicted, keeping its sign.

AP 2022 (style), 4 marks, Section II (free response). A least-squares line $\hat{y} = 5 + 2x$ is fit to data. (a) For the point $(4, 15)$, calculate the residual and interpret its sign. (b) A residual plot of these data shows a clear U-shaped (curved) pattern. What does this tell you about the linear model? (c) Describe what a residual plot should look like if a linear model is appropriate.

A 4-point question on residuals and residual plots.

(a) (2 points) Predicted: $\hat{y} = 5 + 2(4) = 13$. Residual: $y - \hat{y} = 15 - 13 = 2$ (1 point). Interpretation (1 point): the residual is positive, so the model underpredicted this point; the actual value lies $2$ units above the line.
(b) (1 point) A U-shaped (curved) pattern in the residual plot indicates the linear model is not appropriate: the data have a non-linear relationship that the straight line systematically misses.
(c) (1 point) For an appropriate linear model, the residual plot should show random scatter about $0$ with no pattern (no curve, no fanning), indicating the line captures the trend and only random variation remains.

Markers reward a correct residual with interpretation, recognizing the curve as evidence against linearity, and describing random scatter as the sign of a good linear fit.
