How do you fit a line to two-variable data, interpret its slope and intercept, and read the correlation and residuals?
Construct and interpret scatter plots; fit a linear (or exponential) model to bivariate data; interpret the slope and intercept in context; compute and interpret residuals; and distinguish the correlation coefficient from causation.
A NY Regents Algebra I answer on bivariate data: scatter plots, fitting a line of best fit, interpreting slope and intercept, computing residuals, reading the correlation coefficient, and the correlation-versus-causation distinction.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this topic is asking
The Regents Algebra I exam (Interpreting Categorical and Quantitative Data, the S-ID domain of the standards) wants you to build and read a scatter plot, fit a line of best fit to two-variable data, interpret its slope and intercept in context, compute and interpret residuals, and distinguish correlation from causation. Two-variable statistics typically accounts for several questions, often including a multi-step constructed-response item.
Scatter plots: form, direction, strength
A scatter plot shows each data pair $(x, y)$ as a point. You describe it three ways: form (do the points follow a line or a curve?), direction (as $x$ rises, does $y$ tend to rise, a positive association, or fall, a negative one?), and strength (how tightly do the points cluster around the trend?). A roughly straight, tight, upward cloud indicates a strong positive linear association.
The line of best fit and its parameters
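The line of best fit (the least-squares regression line) is the line that minimizes the sum of the squared vertical distances from the data points to the line; it is usually found with a graphing calculator and written $\hat{y} = mx + b$. In context its parameters carry meaning:
- Slope $m$: the predicted change in $y$ for each one-unit increase in $x$.
- Intercept $b$: the predicted value of $y$ when $x = 0$.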
The hat on $\hat{y}$ marks it as a prediction, not an observed value. Interpreting the slope and intercept in the situation's units is a frequent constructed-response task: for $\hat{y} = 3.2x + 14$, relating study hours $x$ to test score $y$, the slope means about 3.2 more points per additional hour studied, and the intercept 14 is the predicted score with no studying.
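For readers curious what the calculator's linear regression computes, here is a minimal Python sketch of the standard least-squares formulas: the slope is the sum of the products of the $x$- and $y$-deviations from their means, divided by the sum of the squared $x$-deviations, and the line passes through the point of means $(\bar{x}, \bar{y})$. The function name `least_squares_line` and the hours/scores data are hypothetical, invented for illustration; on the exam you would use the graphing calculator instead.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y-hat = m*x + b."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of products of deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    # Sum of squared x-deviations
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal, so the slope is undefined")
    m = sxy / sxx
    # The least-squares line passes through the point of means (x_bar, y_bar)
    b = y_bar - m * x_bar
    return m, b


# Hypothetical data: hours studied and test scores (invented for illustration)
hours = [0, 1, 2, 3, 4, 5]
scores = [15, 17, 21, 23, 27, 30]
m, b = least_squares_line(hours, scores)
print(f"y-hat = {m:.2f}x + {b:.2f}")  # prints: y-hat = 3.06x + 14.52
```

On data that already lie exactly on a line, the function returns that line's slope and intercept, which makes a quick sanity check.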
Residuals
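A residual measures how far an actual data point lies from the model's prediction: residual $= y - \hat{y}$ (actual minus predicted). A positive residual means the point lies above the line, so the model underestimates there; a negative residual means the point lies below the line, so the model overestimates.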
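The residual rule and its sign interpretation fit in a few lines of Python. This is a minimal sketch; the function names `residual` and `interpret` are hypothetical, and the example reuses the numbers from Q2 in the Try this section (actual 12, predicted 15).

```python
def residual(actual, predicted):
    """Residual = actual minus predicted, i.e. y - y_hat."""
    return actual - predicted


def interpret(res):
    """Describe what the sign of a residual says about the model."""
    if res > 0:
        return "underestimate"  # actual point lies above the line
    if res < 0:
        return "overestimate"   # actual point lies below the line
    return "exact"              # actual point lies on the line


res = residual(12, 15)
print(res, interpret(res))  # prints: -3 overestimate
```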
Correlation versus causation
The correlation coefficient $r$ ranges from $-1$ to $1$. Values near $1$ or $-1$ indicate a strong linear relationship (positive or negative, respectively), and values near $0$ indicate little linear relationship. A residual plot is a second diagnostic: a patternless scatter of residuals supports a linear model, while a curved pattern suggests a nonlinear one fits better.
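To see what $r$ measures, and why a residual plot is still worth checking, here is a minimal Python sketch using the standard Pearson formula. The function names and the small data sets are hypothetical. The third data set follows the curve $y = x^2$, yet $r \approx 0.98$; the residuals from its least-squares line run positive, then negative, then positive again, which is the curved pattern that signals a nonlinear model would fit better.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r; the result always lies in [-1, 1]."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def residuals_from_fit(xs, ys):
    """Residuals y - y_hat from the least-squares line through the data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    m = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    b = y_bar - m * x_bar
    return [y - (m * x + b) for x, y in zip(xs, ys)]


xs = [1, 2, 3, 4, 5]
print(round(correlation(xs, [2, 4, 6, 8, 10]), 3))  # prints: 1.0 (perfect positive)
print(round(correlation(xs, [10, 8, 6, 4, 2]), 3))  # prints: -1.0 (perfect negative)
curve = [1, 4, 9, 16, 25]                            # y = x^2, not a line
print(round(correlation(xs, curve), 3))              # prints: 0.981, strong but curved
print(residuals_from_fit(xs, curve))                 # prints: [2.0, -1.0, -2.0, -1.0, 2.0]
```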
The most tested conceptual point is that correlation does not imply causation. Two variables can move together because one causes the other, because a third variable drives both, or by coincidence. Ice cream sales and drowning rates correlate (both rise in summer), but neither causes the other. On the Regents, a strong $r$ supports prediction within the data range, but it never proves that changing $x$ would change $y$.
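A short sketch makes the lurking-variable point concrete. In the hypothetical Python example below, both series are computed from temperature alone, so by construction neither affects the other, yet their correlation comes out strong. The numbers are invented for illustration, not real sales or drowning data.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical weekly data, invented for illustration. Temperature is the
# lurking third variable: each series is computed from temperature alone,
# so by construction neither series influences the other.
temps = [60, 65, 70, 75, 80, 85, 90]
ice_cream_sales = [2 * t + 10 for t in temps]
drownings = [t // 10 for t in temps]

r = correlation(ice_cream_sales, drownings)
print(round(r, 2))  # prints: 0.97, a strong positive correlation
```

The calculation only sees the two lists of numbers; nothing in it can tell a causal link apart from a shared cause.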
Try this
Q1. For a line of best fit with slope $-0.5$, where $x$ is days and $y$ is battery percent, interpret the slope. [2 credits]
- Cue. The battery drops about 0.5 percent per day.
Q2. A data point's actual value is 12 and the model predicts 15. Find the residual and state whether the model overestimates or underestimates. [2 credits]
- Cue. Residual $= 12 - 15 = -3$; negative, so the model overestimates.
Exam-style practice questions
Practice questions written in the style of NYSED exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
Regents (style), 2 marks. Part I (multiple choice). A line of best fit is $\hat{y} = 3.2x + 14$, where $x$ is hours studied and $y$ is the test score. What does the slope 3.2 represent? (1) the score with no studying (2) the increase in score per additional hour studied (3) the maximum possible score (4) the number of hours studied
The correct answer is (2).
In $\hat{y} = 3.2x + 14$, the slope is the change in $\hat{y}$ per one-unit change in $x$. Here that is the increase in predicted test score for each additional hour studied (about 3.2 points per hour). The intercept 14 is the predicted score with no studying (choice 1 describes the intercept, not the slope).
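The slope and intercept interpretations can be checked numerically. In this minimal Python sketch of the model $\hat{y} = 3.2x + 14$ (slope 3.2 and intercept 14, as in the question above; the function name `predict` is hypothetical), evaluating at $x = 0$ gives the intercept, and the difference between predictions one hour apart is the slope, no matter where you start.

```python
def predict(hours):
    """Predicted test score from the model y-hat = 3.2x + 14."""
    return 3.2 * hours + 14


# Intercept: the predicted score at x = 0 (no studying)
print(predict(0))  # prints: 14.0

# Slope: each extra hour adds about 3.2 predicted points, wherever you start
for x in [0, 5, 10]:
    print(x, "->", x + 1, "hours:", round(predict(x + 1) - predict(x), 2))
```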
Regents (style), 4 marks. Part III (constructed response). A line of best fit for plant height (cm) versus weeks is . (a) Predict the height at week 8. (b) The actual height at week 8 was 24 cm. Compute the residual and state whether the model overestimates or underestimates.
A 4-credit question, with credit split across the parts.
(a) Predicted height: cm.
(b) Residual cm. A negative residual means the actual value is below the prediction, so the model overestimates at week 8. Computing predicted minus actual (the wrong order) or omitting the over/underestimate interpretation costs credits.
Related dot points
- Represent and interpret one-variable data with dot plots, histograms, and box plots; compute and interpret measures of center (mean, median) and spread (range, interquartile range, standard deviation informally); identify outliers; and compare two distributions.
A NY Regents Algebra I answer on one-variable data: dot plots, histograms, and box plots, the mean and median, range, interquartile range and standard deviation, the 1.5 times IQR outlier rule, and comparing distributions.
- Distinguish linear from exponential growth (constant difference versus constant ratio), construct linear and exponential functions from descriptions, tables, or two points, and interpret their parameters (initial value, rate of change, growth factor) in context.
A NY Regents Algebra I answer on linear and exponential models: recognizing constant difference versus constant ratio, building each model from a context or table, and interpreting the slope, initial value, and growth factor.
- Understand the definition of a function and function notation; evaluate functions; identify domain and range; and interpret the key features of a graph (intercepts, intervals of increase and decrease, relative maxima and minima, and average rate of change) in context.
A NY Regents Algebra I answer on functions: the definition and the vertical-line test, function notation and evaluation, domain and range, and reading key features of a graph such as intercepts, increasing intervals, and average rate of change.
- Create equations and inequalities in one variable and use them to solve problems; solve linear equations and inequalities including those with variables on both sides; rearrange literal equations (formulas) to isolate a chosen variable; and graph the solution set of an inequality on a number line.
A NY Regents Algebra I answer on creating and solving linear equations and inequalities: variables on both sides, literal equations, contextual modeling, the sign-flip rule for inequalities, and graphing solutions on a number line.
- Fit linear, exponential, and other regression models to data; interpret the parameters and the correlation coefficient in context; use a residual plot to judge whether a model is appropriate; and use a model to make predictions.
A NY Regents Algebra II answer on regression: fitting linear and exponential models, interpreting parameters and the correlation coefficient, reading a residual plot to judge model fit, and making predictions.
Sources & how we know this
- Regents Examination in Algebra I — NYSED (2024)
- New York State Next Generation Mathematics Learning Standards — NYSED (2017)