How do scatterplots display two quantitative variables, and how do we describe what we see?
Topic 2.4 Representing the Relationship Between Two Quantitative Variables: construct and describe scatterplots by direction, form, strength, and unusual features, in context.
A focused answer to AP Statistics Topic 2.4, on building scatterplots and describing them by direction, form, strength, and unusual features (the DUFS framework), in context, with a worked description.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this topic is asking
The College Board (Topic 2.4) wants you to display two quantitative variables with a scatterplot and to describe it by direction, form, strength, and unusual features, always in context. This verbal description is the two-variable counterpart of Unit 1's SOCS.
The scatterplot
The axis convention matters: explanatory on the x-axis (horizontal), response on the y-axis (vertical). It fixes the meaning of "positive" and "negative" direction and sets up the regression line later, whose slope and predictions read off the same axes. A plot that does not follow the convention is a frequent small error that confuses everything downstream.
Describing a scatterplot
Run through four features, the two-variable analogue of SOCS:
- Direction. Positive if y tends to increase as x increases (points rise left to right); negative if y tends to decrease as x increases (points fall). No clear trend means no association.
- Form. Is the pattern linear (a straight-line trend) or curved (for example, exponential or quadratic)?
- Strength. How tightly do the points cluster around the underlying pattern? Tight clustering is strong; a loose cloud is weak.
- Unusual features. Outliers (points far from the pattern), clusters, or distinct subgroups that suggest a lurking categorical variable.
Why describing comes before fitting
The order of operations in Unit 2 is deliberate: you describe the scatterplot first, then decide whether a line is appropriate, then fit and assess it. The reason is that the tools that follow, the correlation coefficient and the least-squares regression line, are summaries of a linear relationship. If the scatterplot is clearly curved, a correlation near zero can hide a strong non-linear relationship, and a straight-line fit will systematically miss the pattern, so you would need a transformation (Topic 2.9) instead. If there is a glaring outlier, it can distort both the correlation and the line, so you want to notice it before it quietly skews your numbers. This is why the College Board tests scatterplot description on its own: spotting non-linearity and outliers by eye is the gatekeeping judgement that decides whether the rest of the unit's machinery even applies. A student who fits a line to an obviously curved pattern without comment has missed the point of the whole sequence.
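The warning about curved patterns can be checked numerically. The sketch below uses a small made-up data set (not from the College Board) in which y is completely determined by x along a U-shaped curve, and computes the correlation with a hand-written helper, `pearson_r`, that is an illustrative name rather than anything the course prescribes.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a perfect U-shaped (quadratic) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r_curved = pearson_r(xs, ys)
print(round(r_curved, 3))  # essentially zero, despite a perfect curved pattern
```

The falling left half and the rising right half cancel, so the correlation comes out at zero even though every point sits exactly on the curve. Only a look at the scatterplot reveals the relationship.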
Direction, strength, and the link to correlation
Direction and strength foreshadow the correlation coefficient r of the next topic, which puts a single number on exactly these two features for a linear pattern: the sign of r encodes direction and its magnitude (closeness to 1) encodes strength. But r only makes sense once you have confirmed the form is roughly linear, which is the verbal judgement you make here. Form is the feature r cannot capture: a strong curved relationship and a weak linear one can share the same r, so the scatterplot's form must be read by eye. Keeping direction, form, and strength as three separate ideas, rather than collapsing them into "strong," is what lets you describe and later model a relationship correctly, and it is precisely the distinction the exam probes when it shows a curved scatterplot with a deceptively small or large correlation.
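As a rough illustration of how the sign and magnitude of the correlation track direction and strength, the sketch below compares three made-up, roughly linear data sets. The data values and the helper name `pearson_r` are assumptions chosen for the example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data sets sharing the same x-values.
xs = [1, 2, 3, 4, 5, 6]
tight_up = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]   # rises, hugging a line
tight_down = [-y for y in tight_up]            # same scatter, mirrored to fall
loose_up = [5, 1, 8, 3, 12, 7]                 # rises on average, lots of scatter

r_tight_up = pearson_r(xs, tight_up)      # positive, close to 1
r_tight_down = pearson_r(xs, tight_down)  # same magnitude, negative sign
r_loose_up = pearson_r(xs, loose_up)      # positive, noticeably smaller

for label, r in [("tight up", r_tight_up),
                 ("tight down", r_tight_down),
                 ("loose up", r_loose_up)]:
    direction = "positive" if r > 0 else "negative"
    print(f"{label}: r = {r:.2f} ({direction})")
```

Flipping the pattern changes only the sign, while loosening the cluster shrinks the magnitude. None of these numbers says anything about form, which still has to be judged from the plot.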
Try this
Q1. Points on a scatterplot form a tight upward-curving arc. State the direction and form. [2 points]
- Cue. Direction is positive (rising), but the form is curved (non-linear), so a straight-line summary would be inappropriate.
Q2. Why should you describe a scatterplot's form before computing a correlation? [1 point]
- Cue. Correlation only measures linear strength; for a curved relationship a small r can hide a strong association, so you must confirm linearity first.
Exam-style practice questions
Practice questions written in the style of College Board exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AP 2018 (style), 1 mark, Section I (multiple choice). A scatterplot shows points falling tightly along a line that slopes downward from upper left to lower right. How is this relationship best described?
(A) Strong positive linear
(B) Strong negative linear
(C) Weak positive linear
(D) No association
The correct answer is (B).
Downward slope means a negative association (as x increases, y decreases); points falling tightly along a line mean strong and linear. So the relationship is strong, negative, and linear.
(A) is wrong on direction (downward is negative). (C) is wrong on both direction and strength. (D) ignores the clear linear pattern. Direction comes from the slope, strength from how tightly the points cluster.
AP 2021 (style), 4 marks, Section II (free response). A scatterplot plots fuel used (liters) against distance driven (km) for trips. The points rise from lower left to upper right, lie fairly close to a straight line, and there is one point far above the pattern at a moderate distance.
(a) Describe the relationship in terms of direction, form, strength, and unusual features.
(b) Identify the explanatory and response variables.
(c) Explain why describing the scatterplot should come before fitting any line.
A 4-point scatterplot-description question.
(a) (2 points) Direction: positive (fuel increases with distance). Form: roughly linear. Strength: fairly strong (points lie close to a line). Unusual features: one point lies well above the pattern, a possible outlier. Award up to 2 points for covering direction, form, strength, and unusual features in context.
(b) (1 point) Explanatory variable: distance driven (horizontal axis); response variable: fuel used (vertical axis).
(c) (1 point) You describe the scatterplot first to check that the relationship is roughly linear (and to spot outliers) before fitting a line, because a least-squares line and correlation only summarize a linear pattern sensibly; fitting a line to a clearly curved or outlier-distorted pattern would mislead.
Markers reward a description covering direction, form, strength, and unusual features in context, correct variable roles, and the reason that linearity must be checked before fitting a line.
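To see why the unusual point in a question like this is worth noticing before any fitting, here is a minimal sketch with invented numbers. The distances, fuel values, and the position of the unusual trip are all hypothetical, since the question gives no data, and `pearson_r` is an illustrative helper.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical trips: distance is explanatory (x), fuel used is response (y).
distance_km = [20, 40, 60, 80, 100, 120, 140]
fuel_l = [1.6, 3.1, 4.9, 9.5, 8.1, 9.4, 11.3]
unusual = 3  # index of the invented 80 km trip sitting well above the pattern

r_all = pearson_r(distance_km, fuel_l)

# Recompute with the unusual trip set aside, for comparison only.
x_rest = [x for i, x in enumerate(distance_km) if i != unusual]
y_rest = [y for i, y in enumerate(fuel_l) if i != unusual]
r_without = pearson_r(x_rest, y_rest)

print(f"r with all trips:          {r_all:.3f}")
print(f"r without the unusual one: {r_without:.3f}")
```

With these made-up values, a single point is enough to pull the correlation visibly away from 1. Setting a point aside here is only a way to gauge its influence; whether it should be excluded from an analysis is a separate judgement made in context.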
Related dot points
- Topic 2.1 Introducing Statistics - Are Variables Related?: identify questions about the association between two variables, distinguish association from causation, and recognize what two-variable data can answer.
A focused answer to AP Statistics Topic 2.1, on framing questions about the association between two variables, the difference between explanatory and response variables, why association is not causation, and what two-variable data can answer, with worked examples.
- Topic 2.5 Correlation: calculate and interpret the correlation coefficient r, understand its properties (range, unit-free, resistance), and recognize what it can and cannot tell you.
A focused answer to AP Statistics Topic 2.5, defining the correlation coefficient r, its range and properties (unit-free, symmetric, non-resistant), what it measures and misses, and the correlation-causation caution, with a worked interpretation.
- Topic 2.6 Linear Regression Models: write, interpret, and use a least-squares regression equation to predict a response, interpreting the slope and intercept in context, and recognizing the danger of extrapolation.
A focused answer to AP Statistics Topic 2.6, on the form of a regression equation, interpreting slope and intercept in context, making predictions, and the danger of extrapolation, with a worked prediction and interpretation.
- Topic 2.7 Residuals: calculate and interpret residuals, construct and read residual plots, and use them to assess whether a linear model is appropriate.
A focused answer to AP Statistics Topic 2.7, defining the residual as observed minus predicted, interpreting positive and negative residuals, and using residual plots to judge whether a linear model is appropriate, with worked calculations.
- Topic 2.8 Least Squares Regression: determine the least-squares regression line from summary statistics, and interpret the coefficient of determination r-squared and the standard deviation of the residuals.
A focused answer to AP Statistics Topic 2.8, on why the least-squares line minimizes squared residuals, computing it from means, standard deviations, and r, and interpreting r-squared and s, with full worked calculations.
Sources & how we know this
- AP Statistics Course and Exam Description — College Board (2020)