How do we identify outliers and influential points, and how do transformations rescue a non-linear relationship?
Topic 2.9 Analyzing Departures from Linearity: identify outliers, high-leverage, and influential points in regression, and use transformations to model a non-linear relationship.
A focused answer to AP Statistics Topic 2.9, on regression outliers, high-leverage and influential points, and using transformations (logs and powers) to linearise a curved relationship, with a worked transformation example.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this topic is asking
The College Board (Topic 2.9) wants you to identify outliers, high-leverage points, and influential points in a regression, to understand how each affects the line, and to use transformations (such as logs or powers) to model a relationship that is not linear.
Outliers, leverage, and influence
These three ideas are related but distinct, and the exam tests whether you can keep them apart. Outlier is about a large residual: the point is extreme in $y$ relative to the pattern. Leverage is about an extreme $x$-value: the point sits far out along the horizontal axis. Influence is about effect: does removing the point change the line? A high-leverage point that lies on the trend has little influence (small residual, little change to the slope), whereas a high-leverage point off the trend is typically highly influential, because its distant $x$-value lets it swing the line like a long lever arm. The cleanest test for influence is a thought experiment: imagine deleting the point; if the line moves a lot, the point was influential.
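The deletion thought experiment can be run directly. Below is a minimal Python sketch, assuming made-up data and a hand-written `least_squares` helper (an illustrative function, not a library call). It refits the line without one point and reports the slope with and without it, once for a high-leverage point on the trend and once for a point at the same extreme $x$ but off the trend.

```python
# Minimal sketch of the "delete the point and refit" test for influence.
# All data are hypothetical, chosen only to illustrate the idea.

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def slopes_with_and_without(xs, ys, index):
    """Return (slope using all points, slope with point `index` deleted)."""
    _, slope_all = least_squares(xs, ys)
    xs_rest = xs[:index] + xs[index + 1:]
    ys_rest = ys[:index] + ys[index + 1:]
    _, slope_rest = least_squares(xs_rest, ys_rest)
    return slope_all, slope_rest


# Bulk of the data: roughly y = 2x + 1 with a little scatter.
bulk_x = [1, 2, 3, 4, 5]
bulk_y = [3.1, 4.9, 7.2, 8.8, 11.0]

# High-leverage point ON the trend: x = 20, y close to 2*20 + 1.
on_all, on_rest = slopes_with_and_without(bulk_x + [20], bulk_y + [41.0], 5)

# Same extreme x, but OFF the trend: y far below the pattern.
off_all, off_rest = slopes_with_and_without(bulk_x + [20], bulk_y + [5.0], 5)

print(f"on-trend point:  slope {on_all:.3f} with it, {on_rest:.3f} without it")
print(f"off-trend point: slope {off_all:.3f} with it, {off_rest:.3f} without it")
```

With these numbers the on-trend point moves the slope by only about 0.03, while the off-trend point drags it from about 1.97 (without the point) to slightly below zero (with it): same leverage, very different influence.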
How influential points distort regression
Because the least-squares line minimizes squared vertical distances, an influential point can drag the slope and intercept toward itself and can inflate or deflate the correlation, giving a misleading summary of the bulk of the data. This is why Topic 2.5's warning that $r$ is not resistant matters here: a single influential point can make a weak relationship look strong, or hide a strong one. The practical approach the exam rewards is to identify such points (from the scatterplot and residual plot), analyze the data with and without them, and report how the conclusions change, rather than silently letting one point dominate. You do not simply delete such points; you flag them and assess their effect, which is honest data analysis.
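Reporting results with and without a flagged point can be as simple as computing the summary twice. This Python sketch uses hypothetical data in which five points show almost no linear association and one point sits far away; the `correlation` helper is written out by hand for illustration rather than taken from a library.

```python
import math

# Minimal sketch: report the correlation with and without a flagged point.
# The data are hypothetical.

def correlation(xs, ys):
    """Correlation r computed from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


bulk_x = [1, 2, 3, 4, 5]
bulk_y = [5, 3, 6, 4, 5]          # almost no linear association

flagged_x, flagged_y = 15, 20     # one distant, high-leverage point

r_without = correlation(bulk_x, bulk_y)
r_with = correlation(bulk_x + [flagged_x], bulk_y + [flagged_y])

print(f"r without the flagged point: {r_without:.2f}")
print(f"r with the flagged point:    {r_with:.2f}")
```

Here $r$ is about 0.14 for the five points alone but about 0.95 once the distant point is included, so one point turns a weak relationship into an apparently strong one. Reporting both values makes that dependence visible instead of hiding it.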
Transformations to achieve linearity
When the relationship itself is non-linear, the cure is not removing a point but transforming a variable. If the scatterplot curves and the residual plot of a linear fit shows a systematic pattern (Topic 2.7), applying a function to $x$ or $y$ can straighten the relationship so that a line fits the transformed data. Common transformations are the logarithm (taking $\log y$ linearises exponential growth, where $y = ab^x$; taking logs of both variables linearises a power law, $y = ax^b$) and powers or roots (such as $\sqrt{y}$).
The workflow is: transform, refit the line to the transformed data, check that the new residual plot shows random scatter (confirming the transformation worked), and then back-transform to make predictions in the original units. For a log-$y$ model $\widehat{\log y} = a + bx$, a prediction comes from computing $\widehat{\log y}$ and then raising 10 to that power: $\hat{y} = 10^{\widehat{\log y}}$ (use $e$ in place of 10 if natural logs were used). The back-transformation step is where marks are most often lost, because students stop at the transformed prediction; remembering that the log must be undone with $10^{(\cdot)}$ to recover $\hat{y}$ is essential. Transformation is the topic's payoff: it extends the entire regression toolkit (the least-squares line, $r^2$, $s$, residual analysis) to curved relationships, provided you transform first and interpret in the original units at the end.
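The four-step workflow can be sketched in Python. This is a minimal illustration, assuming hypothetical data that grow roughly like $3 \cdot 2^x$ and a base-10 log; `least_squares` and `predict` are illustrative helpers written for this example, not library functions.

```python
import math

# Minimal sketch of the transform -> refit -> check -> back-transform workflow.
# Hypothetical data that grow roughly like y = 3 * 2**x.

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


xs = [1, 2, 3, 4, 5, 6]
ys = [6.2, 11.7, 24.5, 47.0, 97.8, 190.1]

# Step 1: transform the response with a base-10 log.
log_ys = [math.log10(y) for y in ys]

# Step 2: refit the least-squares line to the points (x, log10 y).
a, b = least_squares(xs, log_ys)

# Step 3: residuals on the log scale; these should show no systematic pattern.
residuals = [ly - (a + b * x) for x, ly in zip(xs, log_ys)]


# Step 4: predict on the log scale, then undo the log with 10**(...).
def predict(x):
    log_y_hat = a + b * x
    return 10 ** log_y_hat


print(f"log10(y-hat) = {a:.3f} + {b:.3f}x")
print("log-scale residuals:", [round(r, 3) for r in residuals])
print(f"predicted y at x = 4.5: {predict(4.5):.1f}")
```

With this data the fitted slope on the log scale is close to $\log_{10} 2 \approx 0.30$, the log-scale residuals are all under 0.02 in absolute value with no pattern, and `predict(4.5)` returns roughly 68 in the original units (an $x$ inside the data range, so no extrapolation). If natural logs were used in step 1, the back-transform in step 4 would be `math.exp(log_y_hat)` instead.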
Try this
Q1. A regression point lies far to the right (extreme $x$) but right on the trend line. Classify it (outlier, high leverage, influential?). [2 points]
- Cue. High leverage (extreme $x$) but not an outlier (small residual) and not necessarily influential (it lies on the pattern, so removing it changes the line little).
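Q2. After fitting a log-$y$ model $\widehat{\log y} = a + bx$, you compute $\widehat{\log y}$ at some value of $x$. What is the predicted $y$? [1 point]
- Cue. Back-transform: $\hat{y} = 10^{\widehat{\log y}}$ (or $e^{\widehat{\ln y}}$ if natural logs were used).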
Exam-style practice questions
Practice questions written in the style of College Board exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AP 2019 (style), 1 mark. Section I (multiple choice). A point in a regression has an $x$-value far from the other $x$-values but lies near the overall pattern. This point is best described as:
(A) An outlier with a large residual
(B) A high-leverage point
(C) An influential point that changes the slope greatly
(D) Irrelevant to the regression
The correct answer is (B).
A point with an extreme $x$-value has high leverage (potential to influence the line), but because it lies near the pattern its residual is small and it does not necessarily change the slope much. So it is high-leverage but not necessarily influential.
(A) describes an outlier (large residual), which this point is not. (C) would require the point to actually change the slope a lot. (D) is wrong: high-leverage points always need to be checked. Extreme in $x$ means high leverage.
AP 2022 (style), 4 marks. Section II (free response). A scatterplot of $y$ against $x$ is clearly curved (concave up), and the residual plot of a linear fit shows a U-shape. A statistician takes $\log y$ and finds that $\log y$ against $x$ is now linear, with fitted line $\widehat{\log y} = a + bx$.
(a) Explain why the original linear model was inappropriate.
(b) Predict $y$ at a given value of $x$ using the transformed model.
(c) Explain one advantage of transforming rather than forcing a straight line on the curved data.
A 4-point question on transformations.
(a) (1 point) The original linear model was inappropriate because the scatterplot was curved and the residual plot showed a U-shaped pattern, both signs that a straight line systematically misses the non-linear relationship.
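(b) (2 points) Substitute the given $x$ into the fitted line to get $\widehat{\log y} = a + bx$ (1 point). Back-transform: $\hat{y} = 10^{\widehat{\log y}}$ for a base-10 log, or $\hat{y} = e^{\widehat{\ln y}}$ for a natural log (1 point).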
(c) (1 point) Advantage: after transforming, the relationship is linear, so the least-squares line, $r^2$, and $s$ are valid summaries and predictions are reliable; forcing a line on curved data gives biased predictions and a patterned residual plot.
Markers reward citing the curve and U-shaped residual plot, a correct prediction with back-transformation via $10^{\widehat{\log y}}$, and an advantage tied to the validity of the linear model after transforming.
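The diagnosis in part (a) can also be checked numerically. This Python sketch, on hypothetical concave-up data, fits a straight line to the raw $y$ values and then to $\log_{10} y$, and prints the sign of each residual; a run of positive, then negative, then positive residuals is the numerical counterpart of a U-shaped residual plot. The helper functions are illustrative, written for this example only.

```python
import math

# Minimal sketch: compare residual sign patterns for a straight-line fit to the
# raw data and to log10(y). The data are hypothetical and concave up.

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals(xs, ys):
    """Observed minus predicted for the least-squares line fitted to (xs, ys)."""
    a, b = least_squares(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def signs(values):
    """Compact '+'/'-' string showing the sign of each residual."""
    return "".join("+" if v > 0 else "-" for v in values)


xs = [1, 2, 3, 4, 5, 6]
ys = [6.2, 11.7, 24.5, 47.0, 97.8, 190.1]

raw_res = residuals(xs, ys)
log_res = residuals(xs, [math.log10(y) for y in ys])

print("raw fit residual signs:", signs(raw_res))
print("log fit residual signs:", signs(log_res))
```

For the raw fit the residuals are positive at both ends and negative through the middle (the U-shape), while the log fit's residuals are small and alternate in sign, which is the random scatter that confirms the transformation worked.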
Related dot points
- Topic 2.7 Residuals: calculate and interpret residuals, construct and read residual plots, and use them to assess whether a linear model is appropriate.
A focused answer to AP Statistics Topic 2.7, defining the residual as observed minus predicted, interpreting positive and negative residuals, and using residual plots to judge whether a linear model is appropriate, with worked calculations.
- Topic 2.8 Least Squares Regression: determine the least-squares regression line from summary statistics, and interpret the coefficient of determination r-squared and the standard deviation of the residuals.
A focused answer to AP Statistics Topic 2.8, on why the least-squares line minimizes squared residuals, computing it from means, standard deviations, and r, and interpreting r-squared and s, with full worked calculations.
- Topic 2.6 Linear Regression Models: write, interpret, and use a least-squares regression equation to predict a response, interpreting the slope and intercept in context, and recognizing the danger of extrapolation.
A focused answer to AP Statistics Topic 2.6, on the form of a regression equation, interpreting slope and intercept in context, making predictions, and the danger of extrapolation, with a worked prediction and interpretation.
- Topic 2.5 Correlation: calculate and interpret the correlation coefficient r, understand its properties (range, unit-free, resistance), and recognize what it can and cannot tell you.
A focused answer to AP Statistics Topic 2.5, defining the correlation coefficient r, its range and properties (unit-free, symmetric, non-resistant), what it measures and misses, and the correlation-causation caution, with a worked interpretation.
- Topic 2.4 Representing the Relationship Between Two Quantitative Variables: construct and describe scatterplots by direction, form, strength, and unusual features, in context.
A focused answer to AP Statistics Topic 2.4, on building scatterplots and describing them by direction, form, strength, and unusual features (the DUFS framework), in context, with a worked description.
Sources & how we know this
- AP Statistics Course and Exam Description — College Board (2020)