How do you fit a line to two-variable data, interpret its slope and intercept, and read the correlation and residuals?

Construct and interpret scatter plots; fit a linear (or exponential) model to bivariate data; interpret the slope and intercept in context; compute and interpret residuals; and distinguish the correlation coefficient from causation.

A NY Regents Algebra I answer on bivariate data: scatter plots, fitting a line of best fit, interpreting slope and intercept, computing residuals, reading the correlation coefficient, and the correlation-versus-causation distinction.

Generated by Claude Opus 4.8. 9 min answer. Updated 2026-06-02.

Reviewed by: AI editorial process; not yet individually human-reviewed

Jump to a section

What this topic is asking
Scatter plots: form, direction, strength
The line of best fit and its parameters
Residuals
Correlation versus causation
Try this

What this topic is asking

The Regents Algebra I exam (in the Interpreting Categorical and Quantitative Data domain, S-ID) wants you to build and read a scatter plot, fit a line of best fit to two-variable data, interpret its slope and intercept in context, compute and interpret residuals, and tell the correlation coefficient apart from causation. Two-variable statistics reliably contributes several questions, including a multi-step constructed-response item.

Scatter plots: form, direction, strength

A scatter plot shows each data pair $(x, y)$ as a point. You describe it in three ways: form (does it follow a line or a curve?), direction (as $x$ rises, does $y$ tend to rise, giving a positive association, or fall, giving a negative one?), and strength (how tightly do the points cluster around the trend?). A roughly straight, tight, upward-sloping cloud of points shows a strong positive linear association.

The line of best fit and its parameters

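The line of best fit (least-squares regression line) is the line that minimizes the sum of the squared vertical distances between the data points and the line. It is usually found with a graphing calculator. Written as $\hat{y} = mx + b$, its parameters carry meaning in context: the slope $m$ is the predicted change in $y$ for each one-unit increase in $x$, and the intercept $b$ is the predicted value of $y$ when $x = 0$.

The hat on $\hat{y}$ marks it as a prediction, not an observed value. Interpreting the slope and intercept in the situation's units is a frequent constructed-response task: for $\hat{y} = 3.2x + 14$ relating study hours to score, the slope means the predicted score rises by about 3.2 points for each additional hour of study, and the intercept, 14, is the predicted score with no study.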
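As a rough sketch of what the calculator does behind the scenes, the least-squares slope and intercept can be computed directly from the data. The data set below is invented for illustration (chosen so the fitted line comes out close to the $\hat{y} = 3.2x + 14$ example), and the function name `fit_line` is hypothetical, not something the exam or a calculator uses.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of (x - mean_x)(y - mean_y) divided by sum of (x - mean_x)^2.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: hours studied and test scores (invented for this sketch).
hours = [1, 2, 3, 4, 5]
scores = [18.2, 18.4, 24.6, 26.8, 30.0]

slope, intercept = fit_line(hours, scores)
print(f"predicted score = {slope:.1f} * hours + {intercept:.1f}")
```

Read in context, the fitted slope is the predicted gain in score per extra hour studied, and the fitted intercept is the predicted score at zero hours.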

Residuals

A residual measures how far an actual data point lies from the prediction.

Computing and interpreting a residual

A line of best fit for monthly sales (in thousands) versus advertising spend is $\hat{y} = 1.8x + 5$. In a month with spend $x = 10$, actual sales were $20$ thousand. Find and interpret the residual.

Step 1. Compute the predicted value

$\hat{y} = 1.8(10) + 5 = 18 + 5 = 23$ thousand.

Step 2. Apply the residual formula

Residual $= \text{actual} - \text{predicted} = 20 - 23 = -3$ thousand.

Step 3. Interpret the sign

The residual is negative, so the actual sales fell below the prediction: the model overestimated sales that month.

Step 4. State it in context

For that month, the model overpredicted sales by 3 thousand. The order actual minus predicted is essential; reversing it flips the sign and the interpretation.
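The same four steps can be sketched in a few lines of code. The function names `predict`, `residual`, and `describe` are hypothetical helpers made up for this illustration; the numbers are the ones from the worked example.

```python
def predict(x, slope=1.8, intercept=5.0):
    """Predicted sales (in thousands) from the example line y-hat = 1.8x + 5."""
    return slope * x + intercept


def residual(actual, predicted):
    """Residual = actual - predicted, always in that order."""
    return actual - predicted


def describe(res):
    """Say whether the model over- or underestimated, based on the sign."""
    if res < 0:
        return "model overestimated"
    if res > 0:
        return "model underestimated"
    return "model matched the actual value"


predicted = predict(10)        # step 1: predicted value at x = 10
res = residual(20, predicted)  # step 2: actual minus predicted
print(predicted, res, describe(res))  # steps 3 and 4: sign and meaning
```

Swapping the arguments to `residual` would flip the sign and the reported interpretation, which is the mistake the final step warns about.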

Correlation versus causation

The correlation coefficient $r$ ranges from $-1$ to $1$. Values near $1$ or $-1$ indicate a strong linear relationship (positive or negative), and values near $0$ indicate little linear relationship. A residual plot is a second diagnostic: a patternless scatter of residuals supports a linear model, while a curved pattern suggests a nonlinear one fits better.
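On the exam, $r$ comes from the calculator, but a short sketch of the standard Pearson formula shows where values near $1$, $-1$, and $0$ come from. All three data sets below are invented for illustration, and `correlation` is a hypothetical helper name.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data sets sharing the same x-values.
xs = [1, 2, 3, 4, 5]
tight_upward = [18.2, 18.4, 24.6, 26.8, 30.0]  # close to a rising line
exact_downward = [30, 28, 26, 24, 22]          # exactly on a falling line
no_trend = [5, 9, 4, 9, 5]                     # no overall rise or fall

r_up = correlation(xs, tight_upward)
r_down = correlation(xs, exact_downward)
r_none = correlation(xs, no_trend)
print(round(r_up, 2), round(r_down, 2), round(r_none, 2))
```

The tight upward data gives $r$ close to $1$, the data lying exactly on a falling line gives $r = -1$, and the data with no overall trend gives $r$ of about $0$.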

The most tested conceptual point is that correlation does not imply causation. Two variables can move together because one causes the other, because a third variable drives both, or by coincidence. Ice cream sales and drowning rates correlate (both rise in summer), but neither causes the other. On the Regents, a strong $r$ supports prediction within the data range but never proves that changing $x$ would change $y$.

Try this

Q1. For $\hat{y} = -0.5x + 30$, interpret the slope if $x$ is days and $y$ is battery percent. [2 credits]

Cue. The predicted battery level drops by about 0.5 percentage points per day.

Q2. Actual value 12, predicted 15. Find the residual and state over/underestimate. [2 credits]

Cue. $12 - 15 = -3$; negative, so the model overestimates.

Exam-style practice questions

Practice questions written in the style of NYSED exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

Regents (style), 2 marks. Part I (multiple choice). A line of best fit is $y = 3.2x + 14$, where $x$ is hours studied and $y$ is the test score. What does the slope 3.2 represent?
(1) the score with no studying
(2) the increase in score per additional hour studied
(3) the maximum possible score
(4) the number of hours studied

The correct answer is (2).

In $y = mx + b$, the slope $m = 3.2$ is the change in $y$ per one-unit change in $x$. Here that is the increase in predicted test score for each additional hour studied (about 3.2 points per hour). The intercept 14 is the predicted score with no studying (choice 1 describes the intercept, not the slope).

Regents (style), 4 marks. Part III (constructed response). A line of best fit for plant height (cm) versus weeks is $\hat{y} = 2.5x + 6$. (a) Predict the height at week 8. (b) The actual height at week 8 was 24 cm. Compute the residual and state whether the model overestimates or underestimates.

A 4-credit question with credits across the parts.

(a) Predicted height: $\hat{y} = 2.5(8) + 6 = 20 + 6 = 26$ cm.
(b) Residual $= \text{actual} - \text{predicted} = 24 - 26 = -2$ cm. A negative residual means the actual value is below the prediction, so the model overestimates at week 8. Computing predicted minus actual (the wrong order) or omitting the over/underestimate interpretation costs credits.

Sources & how we know this

Regents Examination in Algebra I — NYSED (2024)
New York State Next Generation Mathematics Learning Standards — NYSED (2017)