What makes the least-squares line the best line, and what do its formulas and r-squared tell us?

Topic 2.8 Least Squares Regression: determine the least-squares regression line from summary statistics, and interpret the coefficient of determination r-squared and the standard deviation of the residuals.

A focused answer to AP Statistics Topic 2.8, on why the least-squares line minimizes squared residuals, computing it from means, standard deviations, and r, and interpreting r-squared and s, with full worked calculations.

Generated by Claude Opus 4.8
10 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed

What this topic is asking

The College Board (Topic 2.8) wants you to find the least-squares regression line from summary statistics (means, standard deviations, and $r$), to know why it is the best-fitting line, and to interpret the coefficient of determination $r^2$ and the standard deviation of the residuals.

Why "least squares"

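A residual is the vertical gap between a data point and the line: the observed $y$ minus the predicted $\hat{y}$. The least-squares line is the line that minimizes the sum of the squared residuals, $\sum (y_i - \hat{y}_i)^2$. Among all possible lines, exactly one minimizes that sum, and that is the line technology reports. The choice to square (rather than, say, take absolute values) is what makes the slope and intercept have clean formulas in terms of $r$ and the standard deviations, and it ties the line to the correlation you already know.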
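
The minimizing property can be checked numerically. The sketch below is an illustration only: the five data points are hypothetical (they are not from this guide), and the function names are made up for the example. It fits the least-squares line, then nudges the intercept and slope to show that each rival line has a larger sum of squared residuals.

```python
# Hypothetical illustration data (not from the guide).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def sum_squared_residuals(a, b):
    """Sum of (observed y - predicted y)^2 for the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line():
    """Return (a, b) for the least-squares line fitted to xs and ys."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # algebraically equal to r * s_y / s_x
    a = y_bar - b * x_bar  # forces the line through (x_bar, y_bar)
    return a, b


a, b = least_squares_line()
best = sum_squared_residuals(a, b)

# Nudge the intercept and the slope: each rival line does worse.
rivals = [(a + 0.5, b), (a - 0.5, b), (a, b + 0.1), (a, b - 0.1)]
rival_sums = [sum_squared_residuals(ra, rb) for ra, rb in rivals]

print(f"least-squares line: y_hat = {a:.2f} + {b:.2f}x, SSR = {best:.2f}")
for (ra, rb), ssr in zip(rivals, rival_sums):
    print(f"rival line: y_hat = {ra:.2f} + {rb:.2f}x, SSR = {ssr:.2f}")
```

Trying a handful of rivals is not a proof, but it makes the claim concrete: moving the line in any direction away from the least-squares fit increases the total squared error.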

Computing the line from summary statistics

The least-squares line $\hat{y} = a + bx$ has slope $b = r \cdot \frac{s_y}{s_x}$ and intercept $a = \bar{y} - b\bar{x}$. The slope formula is worth reading: it scales the correlation by the ratio of the spreads, converting the unit-free $r$ into a slope in the units of $y$ per unit of $x$. Because it contains $r$, the slope has the same sign as the correlation: positive correlation gives positive slope, and negative correlation gives negative slope. And once you have the slope, the intercept formula forces the line through $(\bar{x}, \bar{y})$, a fact that is itself sometimes tested directly.
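
As a minimal sketch of the two formulas, the helper below (a hypothetical name, written for this example) computes the slope and intercept from summary statistics. The inputs are the values from the free-response practice question later in this guide: $\bar{x} = 50$, $s_x = 10$, $\bar{y} = 200$, $s_y = 40$, $r = 0.75$.

```python
def line_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (a, b) for y_hat = a + b*x from summary statistics."""
    b = r * s_y / s_x      # slope: correlation scaled by the ratio of spreads
    a = y_bar - b * x_bar  # intercept: makes the line pass through the means
    return a, b


a, b = line_from_summary(x_bar=50, s_x=10, y_bar=200, s_y=40, r=0.75)
print(f"y_hat = {a} + {b}x")

# The prediction at x = x_bar should equal y_bar.
prediction_at_mean = a + b * 50
print(f"prediction at x_bar: {prediction_at_mean}")
```

The last two lines check the "passes through the means" fact directly: plugging $\bar{x}$ into the fitted line returns $\bar{y}$.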

Interpreting r-squared

The coefficient of determination $r^2$ is the single most important fit measure on the exam. It is the proportion (or percentage) of the variation in the response $y$ that is explained by the linear model with $x$. If $r^2 = 0.64$, then about $64\%$ of the variability in $y$ is accounted for by its linear relationship with $x$, and the remaining $36\%$ is due to other factors and random variation. A full-credit interpretation always contains four elements: the percentage, "of the variation in [$y$ in context]," "is explained by," and "the linear relationship with [$x$ in context]."

Two errors recur: interpreting $r^2$ as the proportion of points on the line (it is about variation, not points), and confusing $r^2$ with $r$ (the correlation). Because $r^2 = (r)^2$, you can move between them, although going from $r^2$ back to $r$ requires the sign of the slope, since squaring discards it. They answer different questions: $r$ measures the strength and direction of the linear association, while $r^2$ measures the share of variation explained.
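
The "proportion of variation explained" reading can be checked numerically. The sketch below uses hypothetical data (not from this guide) and the standard identity for simple linear regression, $r^2 = 1 - \frac{\text{SSR}}{\text{SST}}$, where SSR is the sum of squared residuals and SST is the total squared variation of $y$ about its mean. It computes $r^2$ that way and compares it with the square of the correlation.

```python
import math

# Hypothetical illustration data (not from the guide).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = sxy / math.sqrt(sxx * syy)  # correlation
b = sxy / sxx                   # least-squares slope
a = y_bar - b * x_bar           # least-squares intercept

ssr = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
sst = syy                                                  # total variation in y

r_squared = 1 - ssr / sst
print(f"r = {r:.4f}, r^2 from r = {r ** 2:.4f}, r^2 from variation = {r_squared:.4f}")
```

Both routes give the same number, which is why squaring the correlation answers the "how much variation is explained" question.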

The standard deviation of the residuals

The other fit measure is $s$, the standard deviation of the residuals, which estimates the typical size of a prediction error in the units of $y$. Where $r^2$ is a unitless proportion, $s$ is a concrete "on average our predictions are off by about $s$ [units]." A smaller $s$ means tighter predictions. Reading the two together gives a rounded picture: $r^2$ says what fraction of the variation the line captures, and $s$ says, in real units, how large the leftover errors typically are. On the exam, $s$ usually appears in computer output labelled near the regression equation, and you interpret it as the typical residual size, for example "predicted exam scores are typically off by about $4$ points." Being fluent at pulling $b$, $a$, $r$, $r^2$, and $s$ out of standard regression output, and interpreting each in context, is exactly the skill the next layer of exam questions (and the guide on reading computer output) builds on.
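
For a sense of where the $s$ in computer output comes from, here is a minimal sketch on hypothetical data (not from this guide). It assumes the usual simple-regression definition $s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}$, which divides by $n - 2$ because two quantities (slope and intercept) were estimated from the data.

```python
import math

# Hypothetical illustration data (not from the guide).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

# Residual = observed y minus predicted y.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Standard deviation of the residuals, in the units of y.
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
print(f"s = {s:.3f}")
```

The result is in the same units as $y$, so it reads directly as "predictions are typically off by about this much."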

Try this

Q1. A regression has $r = -0.5$. Find $r^2$ and state what it means. [2 points]

  • Cue. $r^2 = (-0.5)^2 = 0.25$; about $25\%$ of the variation in $y$ is explained by the linear relationship with $x$.

Q2. Given $\bar{x} = 10$, $\bar{y} = 50$, and slope $b = 4$, find the intercept. [1 point]

  • Cue. $a = \bar{y} - b\bar{x} = 50 - 4(10) = 50 - 40 = 10$.

Exam-style practice questions

Practice questions written in the style of College Board exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AP 2018 (style), 1 mark. Section I (multiple choice). A regression of $y$ on $x$ has $r = 0.8$. What proportion of the variation in $y$ is explained by the linear relationship with $x$? (A) $0.8$ (B) $0.64$ (C) $0.4$ (D) $0.9$

The correct answer is (B).

The coefficient of determination is $r^2 = 0.8^2 = 0.64$, so about $64\%$ of the variation in $y$ is explained by the linear relationship with $x$.

(A) is $r$ itself, the correlation, not the proportion of variation explained. (C) is $r/2$ and (D) is close to $\sqrt{r}$; neither is $r^2$. The proportion of variation explained is always $r^2$, not $r$.

AP 2021 (style), 4 marks. Section II (free response). For a data set, $\bar{x} = 50$, $s_x = 10$, $\bar{y} = 200$, $s_y = 40$, and $r = 0.75$. (a) Find the slope and intercept of the least-squares line. (b) Interpret $r^2$ in context, where $x$ is hours of training and $y$ is a performance score. (c) State what the least-squares line minimizes.

A 4-point computation-and-interpretation question.

(a) (2 points) Slope $b = r \cdot \frac{s_y}{s_x} = 0.75 \cdot \frac{40}{10} = 0.75 \cdot 4 = 3$ (1 point). Intercept $a = \bar{y} - b\bar{x} = 200 - 3(50) = 200 - 150 = 50$ (1 point). So $\hat{y} = 50 + 3x$.
(b) (1 point) $r^2 = 0.75^2 = 0.5625$, so about $56\%$ of the variation in performance score is explained by the linear relationship with hours of training.
(c) (1 point) The least-squares line minimizes the sum of the squared residuals (the sum of squared vertical distances from the points to the line).

Markers reward the correct slope and intercept from the summary-statistic formulas, an $r^2$ interpretation in context, and the definition of what least squares minimizes.
