

What makes the least-squares line the best line, and what do its formulas and r-squared tell us?

Topic 2.8 Least Squares Regression: determine the least-squares regression line from summary statistics, and interpret the coefficient of determination r-squared and the standard deviation of the residuals.

A focused answer to AP Statistics Topic 2.8, on why the least-squares line minimizes squared residuals, computing it from means, standard deviations, and r, and interpreting r-squared and s, with full worked calculations.

Generated by Claude Opus 4.8 · 10 min answer · Updated 2026-06-04

Reviewed by: AI editorial process; not yet individually human-reviewed



What this topic is asking

The College Board (Topic 2.8) wants you to find the least-squares regression line from summary statistics (means, standard deviations, and $r$), to know why it is the best-fitting line, and to interpret the coefficient of determination $r^2$ and the standard deviation of the residuals.

Why "least squares"

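A residual is the vertical distance between an observed point and the line: the observed $y$ minus the predicted $\hat{y}$. The least-squares line is named for what it minimizes, the sum of the squared residuals. Among all possible lines, exactly one minimizes that sum, and that is the line technology reports. The choice to square (rather than, say, take absolute values) is what gives the slope and intercept clean formulas in terms of $r$ and the standard deviations, and it ties the line to the correlation you already know.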
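The minimizing property can be checked numerically. The sketch below uses a small invented data set (not from the course materials) and computes the line from the raw data with the standard formula $b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$, which is algebraically the same as $r \cdot \frac{s_y}{s_x}$. It then nudges the intercept and slope and shows that every nudged line has a larger sum of squared residuals. The function names are illustrative only.

```python
# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def sum_squared_residuals(a, b, xs, ys):
    """Sum of squared vertical distances from the points to y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line, computed from raw data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


a, b = least_squares_line(xs, ys)
best = sum_squared_residuals(a, b, xs, ys)
print(f"least-squares line: a={a:.2f}, b={b:.2f}, sum of squares={best:.3f}")

# Nudge the intercept and slope; each alternative line does worse.
nudges = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.2), (0.0, -0.2), (0.3, -0.1)]
for da, db in nudges:
    other = sum_squared_residuals(a + da, b + db, xs, ys)
    print(f"a={a + da:.2f}, b={b + db:.2f}: sum of squares={other:.3f}")
```

Trying a handful of nearby lines is a demonstration, not a proof, but it makes the "exactly one line minimizes the sum" claim concrete.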

Computing the line from summary statistics

The least-squares line $\hat{y} = a + bx$ has slope $b = r \cdot \frac{s_y}{s_x}$ and intercept $a = \bar{y} - b\bar{x}$. The slope formula is worth reading closely: it scales the correlation by the ratio of the spreads, converting the unit-free $r$ into a slope in the units of $y$ per unit of $x$. Because $s_x$ and $s_y$ are both positive, the slope has the same sign as the correlation: a positive correlation gives a positive slope, and a negative correlation gives a negative slope. And once you have the slope, the intercept formula forces the line through $(\bar{x}, \bar{y})$, a fact that is itself sometimes tested directly.
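As a minimal sketch, the two summary-statistic formulas translate directly into a short function. The function name `line_from_summary` is hypothetical; the inputs are the summary statistics from this guide's worked example ($\bar{x} = 20$, $s_x = 5$, $\bar{y} = 100$, $s_y = 15$, $r = 0.6$).

```python
def line_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (a, b) for y-hat = a + b*x from means, SDs, and correlation."""
    b = r * s_y / s_x        # slope: correlation scaled by the ratio of spreads
    a = y_bar - b * x_bar    # intercept: forces the line through (x_bar, y_bar)
    return a, b


a, b = line_from_summary(x_bar=20, s_x=5, y_bar=100, s_y=15, r=0.6)
print(f"y-hat = {a:.1f} + {b:.1f}x")              # y-hat = 64.0 + 1.8x

# Plugging x_bar into the line returns y_bar, as the intercept formula guarantees.
print(f"prediction at x-bar: {a + b * 20:.1f}")   # prediction at x-bar: 100.0
```

Computing the slope first and the intercept second mirrors the order the formulas require, since $a$ depends on $b$.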

Interpreting r-squared

The coefficient of determination $r^2$ is the single most important fit measure on the exam. It is the proportion (or percentage) of the variation in the response $y$ that is explained by the linear model with $x$. If $r^2 = 0.64$, then about $64\%$ of the variability in $y$ is accounted for by its linear relationship with $x$, and the remaining $36\%$ is due to other factors and random variation. A full-credit interpretation always contains four elements: the percentage, "of the variation in [$y$ in context]," "is explained by," and "the linear relationship with [$x$ in context]."

Two errors recur: interpreting $r^2$ as the proportion of points on the line (it is about variation, not points), and confusing $r^2$ with $r$ (the correlation). Because $r^2 = (r)^2$, you can move between them, but they answer different questions: $r$ measures the strength and direction of the linear association, while $r^2$ measures the share of variation explained.
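The "share of variation explained" reading can be sketched in code. Using a small invented data set (an assumption, not from the course materials), the example below compares the total variation in $y$ about its mean with the variation left over after fitting the line, and confirms that one minus their ratio equals the square of the correlation. The variable names `ss_total` and `ss_resid` are illustrative.

```python
import math

# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = sxy / math.sqrt(sxx * syy)   # correlation
b = sxy / sxx                    # least-squares slope
a = y_bar - b * x_bar            # least-squares intercept

ss_total = syy                                                  # variation in y about its mean
ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # variation left unexplained

r_squared = 1 - ss_resid / ss_total
print(f"r = {r:.3f}, r squared from r = {r ** 2:.3f}")
print(f"1 - (residual variation / total variation) = {r_squared:.3f}")
```

For this data set both routes give the same proportion, which is why $r^2$ is read as a statement about variation rather than about how many points sit on the line.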

The standard deviation of the residuals

The other fit measure is $s$, the standard deviation of the residuals, which estimates the typical size of a prediction error in the units of $y$. Where $r^2$ is a unitless proportion, $s$ is a concrete "on average our predictions are off by about $s$ [units]." A smaller $s$ means tighter predictions. Reading the two together gives a rounded picture: $r^2$ says what fraction of the variation the line captures, and $s$ says, in real units, how large the leftover errors typically are. On the exam, $s$ usually appears in computer output labelled near the regression equation, and you interpret it as the typical residual size, for example "predicted exam scores are typically off by about $4$ points." Being fluent at pulling $b$, $a$, $r$, $r^2$, and $s$ out of standard regression output, and interpreting each in context, is exactly the skill the next layer of exam questions (and the guide on reading computer output) builds on.
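The text above does not give a formula for $s$. The sketch below assumes the usual convention in regression output, $s = \sqrt{\frac{\sum (y - \hat{y})^2}{n - 2}}$, and applies it to a small invented data set (not from the course materials).

```python
import math

# Hypothetical data, invented for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar

# Residual = observed y minus predicted y-hat.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Assumed convention: divide by n - 2 because two quantities
# (slope and intercept) were estimated from the data.
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
print(f"s = {s:.3f} (same units as y)")
```

Here $s$ is a little under one unit of $y$, so predictions from this line are typically off by about that much.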

Building the line and interpreting the fit

A data set has $\bar{x} = 20$, $s_x = 5$, $\bar{y} = 100$, $s_y = 15$, $r = 0.6$, where $x$ is hours of practice and $y$ is a skill score. (a) Find the least-squares line. (b) Interpret $r^2$.

Step 1. Slope

$b = r \cdot \frac{s_y}{s_x} = 0.6 \cdot \frac{15}{5} = 0.6 \cdot 3 = 1.8$. For each extra hour of practice, predicted skill score rises by $1.8$ points, on average.

Step 2. Intercept

$a = \bar{y} - b\bar{x} = 100 - 1.8(20) = 100 - 36 = 64$. So the line is $\hat{y} = 64 + 1.8x$, and it passes through $(\bar{x}, \bar{y}) = (20, 100)$ (check: $64 + 1.8(20) = 100$).

Step 3. Compute r-squared

$r^2 = 0.6^2 = 0.36$.

Step 4. Interpret r-squared in context

About $36\%$ of the variation in skill score is explained by the linear relationship with hours of practice. The other $64\%$ is due to other factors and random variation.

Step 5. Interpret

The least-squares line $\hat{y} = 64 + 1.8x$ predicts skill score from practice hours, rising $1.8$ points per hour, and its $r^2$ of $0.36$ shows the linear model explains only about a third of the variation, so practice matters but is far from the whole story.
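To see the fitted line in use, the sketch below evaluates $\hat{y} = 64 + 1.8x$ at a few values. The learner with 25 hours of practice and an observed score of 115 is hypothetical, added only to illustrate a residual, and the function name `predict` is illustrative.

```python
def predict(x, a=64.0, b=1.8):
    """Predicted skill score from the worked example's line y-hat = 64 + 1.8x."""
    return a + b * x


# The line passes through the point of means (20, 100).
print(round(predict(20), 2))                 # 100.0

# One extra hour of practice changes the prediction by the slope.
print(round(predict(21) - predict(20), 2))   # 1.8

# Hypothetical learner (not from the source): 25 hours, observed score 115.
observed = 115
residual = observed - predict(25)            # observed minus predicted
print(round(residual, 2))                    # 6.0
```

A positive residual means the learner scored above what the line predicts for that many hours of practice.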

Try this

Q1. A regression has $r = -0.5$. Find $r^2$ and state what it means. [2 points]

Cue. $r^2 = (-0.5)^2 = 0.25$; about $25\%$ of the variation in $y$ is explained by the linear relationship with $x$.

Q2. Given $\bar{x} = 10$, $\bar{y} = 50$, and slope $b = 4$, find the intercept. [1 point]

Cue. $a = \bar{y} - b\bar{x} = 50 - 4(10) = 50 - 40 = 10$.

Exam-style practice questions

Practice questions written in the style of College Board exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AP 2018 (style), 1 mark, Section I (multiple choice). A regression of $y$ on $x$ has $r = 0.8$. What proportion of the variation in $y$ is explained by the linear relationship with $x$?

(A) $0.8$
(B) $0.64$
(C) $0.4$
(D) $0.9$

Worked answer

The correct answer is (B).

The coefficient of determination is $r^2 = 0.8^2 = 0.64$, so about $64\%$ of the variation in $y$ is explained by the linear relationship with $x$.

(A) is $r$ itself, the correlation, not the proportion of variation explained. (C) is $r$ halved rather than squared, and (D) is roughly $\sqrt{0.8}$, the square root rather than the square. The proportion of variation explained is always $r^2$, not $r$.

AP 2021 (style), 4 marks, Section II (free response). For a data set, $\bar{x} = 50$, $s_x = 10$, $\bar{y} = 200$, $s_y = 40$, and $r = 0.75$. (a) Find the slope and intercept of the least-squares line. (b) Interpret $r^2$ in context, where $x$ is hours of training and $y$ is a performance score. (c) State what the least-squares line minimizes.

Worked answer

A 4-point computation-and-interpretation question.

(a) (2 points) Slope $b = r \cdot \frac{s_y}{s_x} = 0.75 \cdot \frac{40}{10} = 0.75 \cdot 4 = 3$ (1 point). Intercept $a = \bar{y} - b\bar{x} = 200 - 3(50) = 200 - 150 = 50$ (1 point). So $\hat{y} = 50 + 3x$.
(b) (1 point) $r^2 = 0.75^2 = 0.5625$, so about $56\%$ of the variation in performance score is explained by the linear relationship with hours of training.
(c) (1 point) The least-squares line minimizes the sum of the squared residuals (the sum of squared vertical distances from the points to the line).

Markers reward the correct slope and intercept from the summary-statistic formulas, an $r^2$ interpretation in context, and the definition of what least squares minimizes.


Sources & how we know this

AP Statistics Course and Exam Description — College Board (2020)