What is a residual, and how does a residual plot reveal whether a linear model fits?
Topic 2.7 Residuals: calculate and interpret residuals, construct and read residual plots, and use them to assess whether a linear model is appropriate.
A focused answer to AP Statistics Topic 2.7, defining the residual as observed minus predicted, interpreting positive and negative residuals, and using residual plots to judge whether a linear model is appropriate, with worked calculations.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this topic is asking
The College Board (Topic 2.7) wants you to calculate and interpret residuals (observed minus predicted) and to read a residual plot to decide whether a linear model is appropriate. The residual plot is the standard diagnostic for linearity.
What a residual is
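A residual is the difference between an observed response value and the value the regression line predicts for it: residual = y - ŷ, where y is the observed value and ŷ is the predicted value. The order of subtraction is fixed: observed minus predicted, never the reverse. The sign then carries meaning. A positive residual (residual > 0) means the actual value is above the line, so the model underpredicted. A negative residual (residual < 0) means the actual value is below the line, so the model overpredicted. A residual of 0 means the point lies exactly on the line.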
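The sign rule can be sketched in a few lines of Python. This is only an illustration: the function names and the numbers are hypothetical and do not come from any exam question.

```python
def residual(observed: float, predicted: float) -> float:
    """Residual = observed minus predicted (the order is fixed)."""
    return observed - predicted


def interpret(res: float) -> str:
    """Translate the sign of a residual into a statement about the model."""
    if res > 0:
        return "point is above the line: the model underpredicted"
    if res < 0:
        return "point is below the line: the model overpredicted"
    return "point is on the line: the prediction was exact"


# Hypothetical values chosen only for illustration.
above = residual(observed=52.0, predicted=48.5)   # 52.0 - 48.5 = 3.5
below = residual(observed=40.0, predicted=43.0)   # 40.0 - 43.0 = -3.0

print(above, "->", interpret(above))
print(below, "->", interpret(below))
```

Swapping the arguments would flip every sign and reverse every interpretation, which is why the order of subtraction matters.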
Residuals sum to zero for least squares
A property worth knowing is that for a least-squares line the residuals always sum to zero (and so average zero): the line balances the points so that over- and under-predictions cancel overall. This is why you cannot judge fit by adding residuals up; instead you look at their pattern, which is the job of the residual plot. It also connects to the next topic, where the least-squares line is defined precisely as the line that minimizes the sum of squared residuals.
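The zero-sum property can be checked numerically. The sketch below fits a least-squares line (with an intercept) using the standard slope and intercept formulas, then adds up the residuals. The data and the helper name `least_squares_line` are made up for illustration.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data, roughly linear.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = least_squares_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Individual residuals are non-zero, but they cancel overall
# (the total is zero up to floating-point rounding).
print([round(e, 3) for e in residuals])
print(abs(sum(residuals)) < 1e-9)
```

Because the total is zero for any least-squares fit, a sum near zero says nothing about how well the line fits these particular data.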
The residual plot
A residual plot graphs each residual (vertical axis) against the explanatory variable or the predicted value (horizontal axis). It magnifies departures from linearity that are hard to see in the original scatterplot. By removing the linear trend, it leaves only what the line failed to capture, so any leftover structure jumps out. The single most important reading is: random scatter means the line fits; a curve means it does not. A U-shaped or arch-shaped residual plot is the classic signature of a relationship that is really curved, telling you to transform the data (Topic 2.9) rather than trust the straight line.
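To see a U-shaped pattern emerge, the sketch below fits a least-squares line to deliberately curved, hypothetical data (y = x squared) and lists the residuals in order of x. It prints the values rather than drawing the plot; the helper name is an assumption for illustration.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical curved data: y = x^2 for x = 1, ..., 7.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [x ** 2 for x in xs]

intercept, slope = least_squares_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Reading down the list is like reading the residual plot left to right:
# positive at both ends, negative in the middle, i.e. a U shape.
for x, e in zip(xs, residuals):
    print(f"x = {x}   residual = {e:+.1f}")
```

A line that fit well would instead give residuals whose signs and sizes show no ordering along x.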
Why the residual plot beats the correlation for checking fit
Students often try to judge a model's appropriateness from r or r², but those numbers can be large even when a line is the wrong model. A gently curved relationship can have a high correlation while still being non-linear, and a straight-line fit would then systematically over- and under-predict in a pattern, which is exactly what the residual plot exposes and the correlation hides. So the correct workflow is: fit the line, then look at the residual plot, and only if it shows random scatter do you conclude the linear model is appropriate.
This is why the College Board so often pairs a regression with a residual plot and asks "is a linear model appropriate?": the expected reasoning cites the residual plot's pattern (random scatter, yes; curve or fan, no), not the size of r. Interpreting residual plots correctly, and resisting the temptation to declare a good fit on the strength of a high r alone, is one of the most reliably tested skills in the unit. A strong answer also reads individual large residuals as points the model fits poorly, which links forward to identifying outliers and influential points in Topic 2.9.
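A short numerical sketch shows how a high correlation can coexist with a clearly patterned set of residuals. The data are hypothetical (roughly exponential growth) and the function name `fit_and_correlate` is an assumption for illustration.

```python
from math import sqrt


def fit_and_correlate(xs, ys):
    """Return (intercept, slope, r) for the least-squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    r = sxy / sqrt(sxx * syy)
    return y_bar - slope * x_bar, slope, r


# Hypothetical data that curve upward (not from any exam).
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.5, 2.2, 3.3, 5.0, 7.4, 11.0, 16.4]

intercept, slope, r = fit_and_correlate(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
signs = "".join("+" if e > 0 else "-" for e in residuals)

print(f"r = {r:.3f}, r^2 = {r * r:.3f}")   # r is above 0.9 for these data
print("residual signs in order of x:", signs)
```

The correlation looks strong, yet the residuals run positive, then negative, then positive again: the curved pattern that rules out a linear model.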
Try this
Q1. A point has observed and predicted . Find the residual and state whether the model over- or under-predicted. [2 points]
- Cue. Residual (negative), so the model overpredicted; the point lies below the line.
Q2. A residual plot shows a clear upward-opening curve. What does this say about the linear model? [1 point]
- Cue. The linear model is not appropriate; the curved residual pattern shows a non-linear relationship the straight line fails to capture.
Exam-style practice questions
Practice questions written in the style of College Board exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AP 2019 (style), 1 mark, Section I (multiple choice). A regression line predicts for a point whose observed value is . What is the residual? (A) (B) (C) (D)
The correct answer is (C).
A residual is observed minus predicted: . The negative sign means the model overpredicted this point (the line is above the actual value).
(A) adds instead of subtracting. (B) reverses the subtraction (). (D) is unrelated. The residual is always observed minus predicted, keeping its sign.
AP 2022 (style), 4 marks, Section II (free response). A least-squares line is fit to data. (a) For the point , calculate the residual and interpret its sign. (b) A residual plot of these data shows a clear U-shaped (curved) pattern. What does this tell you about the linear model? (c) Describe what a residual plot should look like if a linear model is appropriate.
A 4-point question on residuals and residual plots.
(a) (2 points) Predicted: . Residual: (1 point). Interpretation (1 point): the residual is positive, so the model underpredicted this point; the actual value lies units above the line.
(b) (1 point) A U-shaped (curved) pattern in the residual plot indicates the linear model is not appropriate: the data have a non-linear relationship that the straight line systematically misses.
(c) (1 point) For an appropriate linear model, the residual plot should show random scatter about 0 (the horizontal line where the residual is zero) with no pattern (no curve, no fanning), indicating the line captures the trend and only random variation remains.
Markers reward a correct residual with interpretation, recognizing the curve as evidence against linearity, and describing random scatter as the sign of a good linear fit.
Related dot points
- Topic 2.6 Linear Regression Models: write, interpret, and use a least-squares regression equation to predict a response, interpreting the slope and intercept in context, and recognizing the danger of extrapolation.
A focused answer to AP Statistics Topic 2.6, on the form of a regression equation, interpreting slope and intercept in context, making predictions, and the danger of extrapolation, with a worked prediction and interpretation.
- Topic 2.8 Least Squares Regression: determine the least-squares regression line from summary statistics, and interpret the coefficient of determination r-squared and the standard deviation of the residuals.
A focused answer to AP Statistics Topic 2.8, on why the least-squares line minimizes squared residuals, computing it from means, standard deviations, and r, and interpreting r-squared and s, with full worked calculations.
- Topic 2.9 Analyzing Departures from Linearity: identify outliers, high-leverage, and influential points in regression, and use transformations to model a non-linear relationship.
A focused answer to AP Statistics Topic 2.9, on regression outliers, high-leverage and influential points, and using transformations (logs and powers) to linearise a curved relationship, with a worked transformation example.
- Topic 2.4 Representing the Relationship Between Two Quantitative Variables: construct and describe scatterplots by direction, form, strength, and unusual features, in context.
A focused answer to AP Statistics Topic 2.4, on building scatterplots and describing them by direction, form, strength, and unusual features (the DUFS framework), in context, with a worked description.
- Topic 2.5 Correlation: calculate and interpret the correlation coefficient r, understand its properties (range, unit-free, resistance), and recognize what it can and cannot tell you.
A focused answer to AP Statistics Topic 2.5, defining the correlation coefficient r, its range and properties (unit-free, symmetric, non-resistant), what it measures and misses, and the correlation-causation caution, with a worked interpretation.
Sources & how we know this
- AP Statistics Course and Exam Description, College Board (2020)