Why is the slope of a least-squares regression line a statistic with its own sampling distribution, and what does that allow us to infer?
Topic 9.1 Introducing Statistics: Do Those Points Align?: explain why a sample regression slope varies from sample to sample, motivating inference about the true population slope of a linear model.
A focused answer to AP Statistics Topic 9.1, on why a sample regression slope is a statistic that varies across samples, motivating confidence intervals and tests about the true population slope of a linear model.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this topic is asking
The College Board (Topic 9.1) opens Unit 9 with the idea behind slope inference: the slope $b$ of a least-squares line fitted to a sample is a statistic that varies from sample to sample, so it estimates, but rarely equals, the true population slope $\beta$. This motivates confidence intervals and tests about $\beta$.
The sample slope is a statistic
Unit 2 fits least-squares lines descriptively; Unit 9 treats the slope as a quantity with sampling variability. Just as a sample mean $\bar{x}$ estimates $\mu$ and varies across samples, the sample slope $b$ estimates $\beta$ and varies across samples. Take a new random sample, refit the line, and you get a slightly different slope. The collection of all possible sample slopes forms a sampling distribution centered (under the model's conditions) at the true slope $\beta$.
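A quick simulation makes this concrete. The sketch below assumes a hypothetical population model: $y = 2 + 0.5x$ plus Normal noise with standard deviation 1, with $x$ drawn uniformly from 0 to 10 and samples of $n = 30$. All of these numbers are illustrative choices, not from the source. The sketch draws many random samples, refits the least-squares slope each time, and summarizes the resulting slopes.

```python
import random
import statistics

def ls_slope(xs, ys):
    """Least-squares slope b = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def simulate_slopes(beta=0.5, alpha=2.0, sigma=1.0, n=30, reps=2000, seed=1):
    """Draw `reps` random samples from the HYPOTHETICAL model
    y = alpha + beta * x + error, error ~ Normal(0, sigma), and refit each one."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        xs = [rng.uniform(0, 10) for _ in range(n)]
        ys = [alpha + beta * x + rng.gauss(0, sigma) for x in xs]
        slopes.append(ls_slope(xs, ys))
    return slopes

slopes = simulate_slopes()
print(slopes[:3])                          # three different sample slopes
print(round(statistics.fmean(slopes), 3))  # average lands close to the true beta = 0.5
print(round(statistics.stdev(slopes), 3))  # spread of the sampling distribution of b
```

The first few slopes differ from one another, while their average sits near 0.5. Their standard deviation estimates how far $b$ typically wanders from $\beta$.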
Why a non-zero slope is not proof
This is the central caution of the unit, and the reason inference is needed. The question "do those points align?" really asks: "is the observed slope farther from 0 than sampling variability alone would typically produce if $\beta = 0$?" A small slope that chance easily explains is consistent with no relationship; a slope too large to be chance is evidence of a real linear association. Distinguishing the two requires the sampling distribution of $b$, not just its observed value.
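The same idea can be simulated directly. The sketch below is a simulation-based illustration, not the $t$-test of Topics 9.4 to 9.5. It builds a hypothetical population where $\beta = 0$, using assumed values ($x$ uniform on 0 to 10, Normal noise with standard deviation 1, $n = 30$). It then reports how often chance alone produces a slope at least as far from 0 as a given observed slope.

```python
import random
import statistics

def ls_slope(xs, ys):
    """Least-squares slope b = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

def chance_rate(observed_b, n=30, sigma=1.0, reps=2000, seed=2):
    """Fraction of samples from a HYPOTHETICAL beta = 0 population whose slope
    is at least as far from 0 as observed_b."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.uniform(0, 10) for _ in range(n)]
        ys = [rng.gauss(0, sigma) for _ in xs]  # no linear relationship: beta = 0
        if abs(ls_slope(xs, ys)) >= abs(observed_b):
            hits += 1
    return hits / reps

small = chance_rate(0.02)  # a small slope: chance produces one this big often
large = chance_rate(0.20)  # a large slope: chance almost never produces one this big
print(small, large)
```

A high rate means the observed slope is easily explained by sampling variability. A rate near 0 means a slope that large would be a rare fluke if $\beta = 0$.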
The parameter and the tools, previewed
The parameter of interest is the population slope $\beta$. Unit 9 builds the two familiar tools on it, mirroring Units 6 and 7.
- Confidence interval for $\beta$ (Topics 9.2 to 9.3): estimate the true slope with $b \pm t^* \cdot SE_b$, and judge claims (including whether $\beta = 0$ is plausible).
- Significance test for $\beta$ (Topics 9.4 to 9.5): test $H_0\colon \beta = 0$ (no linear relationship) with a $t$-statistic and P-value.
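For reference, the standard formulas behind both tools are sketched below; check them against the AP formula sheet. Here $s$ is the standard deviation of the residuals, $s_x$ is the standard deviation of the $x$-values, and $t^*$ is the critical value from a $t$-distribution with $n - 2$ degrees of freedom.

```latex
s = \sqrt{\frac{\sum (y_i - \hat{y}_i)^2}{n - 2}}, \qquad
SE_b = \frac{s}{s_x \sqrt{n - 1}}

\text{Interval: } b \pm t^* \cdot SE_b, \qquad
\text{Test statistic: } t = \frac{b - 0}{SE_b}, \quad df = n - 2
```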
Both use a $t$-distribution with $n - 2$ degrees of freedom (two are spent estimating the intercept and slope), and both rely on regression conditions about the residuals. Topic 9.1 plants the idea that $b$ is a variable estimate of a fixed $\beta$; the later topics supply the machinery.
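The arithmetic behind both tools can be sketched in a few lines of Python. The rainfall and height numbers are made up purely for illustration. The critical value $t^* = 2.306$ (95% confidence, df = 8) is hard-coded from a $t$-table because Python's standard library has no $t$-distribution. The code shows where $n - 2$ enters: in $s$ and in the df used to look up $t^*$.

```python
import math
import statistics

# HYPOTHETICAL toy data, made up for illustration: x = rainfall, y = plant height.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 2.9, 3.2, 4.8, 5.1, 5.9, 7.2, 7.4, 8.8, 9.1]

n = len(xs)
df = n - 2       # two degrees of freedom spent on the intercept and slope
T_STAR = 2.306   # t* for 95% confidence with df = 8, taken from a t-table

# Fit the least-squares line.
x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
a = y_bar - b * x_bar

# Residual standard deviation uses n - 2 in the denominator.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
s = math.sqrt(sum(r ** 2 for r in residuals) / df)
se_b = s / math.sqrt(sxx)  # equivalent to s / (s_x * sqrt(n - 1))

ci = (b - T_STAR * se_b, b + T_STAR * se_b)  # 95% interval for beta
t_stat = (b - 0) / se_b                      # test statistic for H0: beta = 0

print(f"b = {b:.3f}, SE_b = {se_b:.4f}, df = {df}")
print(f"95% CI for beta: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"t = {t_stat:.2f}")
```

Computing $SE_b$ as $s/\sqrt{\sum (x - \bar{x})^2}$ avoids a separate call for $s_x$. It equals the formula-sheet form because $s_x \sqrt{n - 1} = \sqrt{\sum (x - \bar{x})^2}$.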
The mindset for the unit
As in every inference unit, the key move is to see the observed slope $b$ as one draw from a distribution, not as the truth. The fitted line summarizes one sample; the population line is fixed but unknown. Inference about $\beta$ is the disciplined way to ask whether an observed trend is real or could be a fluke of sampling, which is the precise meaning of "do those points align?"
Try this
Q1. What parameter does the sample slope $b$ estimate, and why does $b$ vary? [2 points]
- Cue. $b$ estimates the population slope $\beta$; it varies because each random sample yields a different fitted line (sampling variability).
Q2. Why is a non-zero sample slope not proof of a relationship? [1 point]
- Cue. Even if $\beta = 0$, chance variation across samples can produce a non-zero $b$; inference is needed to rule that out.
Exam-style practice questions
Practice questions written in the style of College Board exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AP 2019 (style), 1 mark, Section I (multiple choice). Two analysts each fit a least-squares line to a different random sample from the same population and get different slopes. This is best explained by:
(A) one made an error
(B) the population slope changed
(C) sampling variability in the sample slope
(D) the data are not linear
The correct answer is (C).
A sample slope $b$ is a statistic estimating the true slope $\beta$; different samples give different slopes because of sampling variability. The slope has its own sampling distribution, centered at $\beta$.
(A) is incorrect: no error is needed, since correct analyses of different samples will still differ. (B) is incorrect: the population slope is a fixed parameter. (D) is incorrect: data that really are linear still give different sample slopes from different samples.
AP 2021 (style), 3 marks, Section II (free response). A biologist fits a least-squares line predicting plant height from rainfall, using one random sample, and obtains a positive sample slope.
(a) Explain why a positive sample slope does not by itself prove that rainfall and height are truly related in the population.
(b) Identify the parameter the biologist should make inferences about.
(c) State, in general terms, how the biologist could decide whether the relationship is real.
A 3-point conceptual question.
(a) (1 point) The sample slope varies from sample to sample; even if the true slope were $\beta = 0$ (no relationship), a single random sample could produce a non-zero slope by chance, so a positive $b$ alone is not proof of a real relationship.
(b) (1 point) The true population slope $\beta$ of the linear model relating height to rainfall.
(c) (1 point) Build a confidence interval for $\beta$ or run a significance test of $H_0\colon \beta = 0$; if $\beta = 0$ is implausible (0 lies outside the interval, or the P-value is small), there is evidence of a real linear relationship.
Markers reward recognizing sampling variability in $b$, naming $\beta$ as the parameter, and naming a test or interval as the decision tool.
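The decision rule in part (c) can be sketched as a small function. The function name `slope_decision` and the numbers in the calls are hypothetical. $t^* = 2.101$ corresponds to 95% confidence with df = 18 ($n = 20$). A two-sided test at $\alpha = 0.05$ and a 95% interval built from the same $b$ and $SE_b$ reach the same verdict, which is why either tool answers the question.

```python
def slope_decision(b, se_b, t_star):
    """Return (zero_in_interval, reject_h0) for a two-sided check of H0: beta = 0.

    t_star must match the confidence level and df = n - 2
    (e.g. 2.101 for 95% confidence with n = 20, so df = 18)."""
    lower, upper = b - t_star * se_b, b + t_star * se_b
    zero_in_interval = lower <= 0 <= upper  # is beta = 0 plausible?
    reject_h0 = abs(b / se_b) > t_star      # is |t| beyond the critical value?
    return zero_in_interval, reject_h0

# HYPOTHETICAL output (made-up numbers): b = 0.42, SE_b = 0.15, n = 20.
print(slope_decision(0.42, 0.15, 2.101))  # (False, True): evidence of a real slope
# HYPOTHETICAL weaker result: b = 0.10, SE_b = 0.15, n = 20.
print(slope_decision(0.10, 0.15, 2.101))  # (True, False): consistent with beta = 0
```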
Related dot points
- Topic 9.2 Confidence Intervals for the Slope of a Regression Model: check the regression conditions and construct a t-interval for the population slope using the sample slope, its standard error, and n minus 2 degrees of freedom.
A focused answer to AP Statistics Topic 9.2, on building a t-interval for the population slope (checking the regression conditions, reading the slope and its standard error from computer output, and using n minus 2 degrees of freedom), with a full worked interval.
- Topic 9.4 Setting Up a Test for the Slope of a Regression Model: state the null and alternative hypotheses about the population slope, identify the significance level, and verify the regression conditions for a t-test.
A focused answer to AP Statistics Topic 9.4, on writing the null and alternative hypotheses for a regression slope (testing beta equals 0), choosing the significance level, and checking the regression conditions for a t-test.
- Topic 2.8 Least Squares Regression: determine the least-squares regression line from summary statistics, and interpret the coefficient of determination r-squared and the standard deviation of the residuals.
A focused answer to AP Statistics Topic 2.8, on why the least-squares line minimizes squared residuals, computing it from means, standard deviations, and r, and interpreting r-squared and s, with full worked calculations.
- Topic 2.6 Linear Regression Models: write, interpret, and use a least-squares regression equation to predict a response, interpreting the slope and intercept in context, and recognizing the danger of extrapolation.
A focused answer to AP Statistics Topic 2.6, on the form of a regression equation, interpreting slope and intercept in context, making predictions, and the danger of extrapolation, with a worked prediction and interpretation.
- Topic 5.7 Sampling Distributions for Sample Means: describe the mean, standard deviation, and shape of the sampling distribution of a sample mean, using the central limit theorem and the standard deviation formula sigma over root n.
A focused answer to AP Statistics Topic 5.7, on the mean, standard deviation, and shape of the sampling distribution of a sample mean, the sigma-over-root-n formula, the conditions for normality, and finding probabilities, with full worked calculations.
Sources & how we know this
- AP Statistics Course and Exam Description, College Board (2020)