What is the single most common reason AP Statistics free-response answers lose marks?

Failing to answer in context. The College Board scoring guidelines repeatedly withhold credit for statistically correct statements that do not name the variables and units of the question. 'The median is higher' loses where 'City X has a higher median commute time (25 minutes) than City Y (20 minutes)' earns full credit. Always tie every number and conclusion back to the variables, units, and population in the prompt.

How do you describe a distribution for full credit?

Cover shape, center, spread, and unusual features (often remembered as SOCS), each in context. Name the shape (symmetric or skewed, and the number of peaks), flag outliers, gaps, and clusters, give a typical value for the center, and quantify the spread. For skewed or outlier-laden data use the resistant median and IQR; for symmetric data use the mean and standard deviation. Markers award each component separately, so missing one (often spread or unusual features) costs a point.

How do you compare two distributions correctly?

Use explicitly comparative language. Two separate one-group descriptions are not a comparison, even if each is correct. Say 'Group A has a higher median than Group B' and 'Group A is more variable than Group B', comparing like with like (median to median, IQR to IQR) across shape, center, spread, and unusual features, all in context. The comparison words ('higher than', 'more variable than') are what earn the marks.

How do you read regression computer output on the AP exam?

Standard output has a Constant row and a predictor row; the Coef column gives the intercept (Constant row) and the slope (the coefficient on the predictor). It also reports s (the standard deviation of the residuals), R-sq (the coefficient of determination, r-squared), and sometimes R-sq adjusted. Build the equation as y-hat = intercept + slope times x, convert R-sq to a proportion and take its square root for the correlation (giving it the slope's sign), and interpret each piece in context.

What cautions must every AP Statistics answer respect?

Three recur constantly. Correlation is not causation: an association in observational data may be due to a lurking variable, so never conclude cause without a randomized experiment. The normal model needs approximately normal data: do not use the empirical rule or z-score proportions on skewed distributions. Extrapolation is unreliable: predicting outside the range of the data assumes a pattern you have not observed. Respecting these limits is part of full-credit reasoning.


AP Statistics: how to answer free-response questions and interpret computer output for full credit

A deep-dive AP Statistics guide to answering free-response questions for full credit. Covers describing and comparing distributions in context, reading regression computer output, interpreting slope, intercept, r, r-squared, and s, the correlation-causation and extrapolation cautions, and the in-context communication the College Board rewards.

Generated by Claude Opus 4.8 | 18 min read | AP-STATS-FRQ | Updated 2026-06-04

Reviewed by: AI editorial process; not yet individually human-reviewed

What AP Statistics free-response actually demands

The free-response section is half the AP Statistics exam, and it is where students with sound statistical knowledge most often leave marks on the table, not because the statistics are wrong but because the communication is. The College Board scores free-response questions on four skill categories, and across all of them the recurring requirement is the same: answer in context, show your reasoning, and respect the limits of the data. This guide ties together the Unit 1 and Unit 2 dot-point pages, each with its own practice: describing the distribution of a quantitative variable, comparing distributions of a quantitative variable, summary statistics for a quantitative variable, correlation, least squares regression, and residuals.

The golden rule: answer in context

Context is the most reliable place to gain (or lose) free-response marks. A statement that is statistically correct but generic ("the median is higher," "the slope is $0.6$," "there is a strong positive correlation") will often score below a statement that names the variables, units, and group. The scoring guidelines are written around context: a description of a distribution must say what is being measured and in what units; a slope interpretation must name both variables; a conclusion about association must reference the actual study.

Describing a distribution for full credit

Describing a single quantitative distribution is one of the most common free-response tasks. Cover shape, center, spread, and unusual features (SOCS), each in context, and match your measures to the shape.

Shape: symmetric or skewed (named by the tail, not the hump), and the number of peaks (unimodal, bimodal).
Outliers and unusual features: gaps, clusters, and isolated points; use the $1.5 \times \text{IQR}$ rule when asked to justify an outlier formally.
Center: a typical value, the median (resistant) for skewed data or the mean for symmetric data.
Spread: the IQR (resistant) for skewed data or the standard deviation for symmetric data, with the range as a last resort.

Markers award the components separately, so the fastest way to lose a point is to omit one, most often spread or unusual features. Run the SOCS checklist deliberately every time.
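The formal outlier check mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not an official method: the commute times are made-up data, the function names are hypothetical, and the quartiles use the median-of-halves convention commonly taught in AP courses (software packages may use slightly different quartile rules).

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves convention.

    Assumes at least two values. For an odd count the middle value is
    left out of both halves.
    """
    xs = sorted(values)
    half = len(xs) // 2
    lower = xs[:half]    # values below the median position
    upper = xs[-half:]   # values above the median position
    return median(lower), median(xs), median(upper)

def outlier_fences(values):
    """Return the (lower, upper) fences from the 1.5 x IQR rule."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical commute times in minutes (illustrative data only).
commute_minutes = [12, 15, 18, 20, 22, 25, 27, 30, 33, 75]

q1, med, q3 = quartiles(commute_minutes)
low_fence, high_fence = outlier_fences(commute_minutes)
outliers = [x for x in commute_minutes if x < low_fence or x > high_fence]

print(f"Q1 = {q1}, median = {med}, Q3 = {q3} minutes")
print(f"Fences: {low_fence} to {high_fence} minutes")
print(f"Outliers: {outliers}")
```

With this data the fences are 0 and 48 minutes, so the 75-minute commute is flagged. On paper, the justification would quote those fences and then state the conclusion in context.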

Comparing distributions: use comparative language

When a question asks you to compare distributions, two correct one-group descriptions placed side by side do not earn the comparison marks. You must use explicitly comparative language.

A reliable structure is four sentences, one each for shape, center, spread, and unusual features, every sentence naming both groups, the relevant measure, a comparison word, and the units.
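As a rough illustration of that sentence pattern, the sketch below builds the center and spread sentences from two groups of data. The commute times, the function names, and the wording template are all assumptions made for this example; shape and unusual features still have to be judged from a graph and written by hand.

```python
from statistics import median

def iqr(values):
    """IQR using median-of-halves quartiles (assumes at least two values)."""
    xs = sorted(values)
    half = len(xs) // 2
    return median(xs[-half:]) - median(xs[:half])

def comparison_sentence(measure, unit, name_a, value_a, name_b, value_b):
    """Name both groups, the measure, a comparison word, and the units."""
    if value_a == value_b:
        return f"{name_a} and {name_b} have the same {measure} ({value_a:g} {unit})."
    word = "higher" if value_a > value_b else "lower"
    return (f"{name_a} has a {word} {measure} ({value_a:g} {unit}) "
            f"than {name_b} ({value_b:g} {unit}).")

# Hypothetical commute times in minutes (illustrative data only).
city_x = [18, 21, 23, 25, 25, 27, 30, 34]
city_y = [12, 16, 19, 20, 20, 22, 24, 41]

center = comparison_sentence("median commute time", "minutes",
                             "City X", median(city_x), "City Y", median(city_y))
spread = comparison_sentence("IQR of commute times", "minutes",
                             "City X", iqr(city_x), "City Y", iqr(city_y))
print(center)
print(spread)
```

Each sentence compares like with like (median to median, IQR to IQR), which is the habit the pattern is meant to build.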

Reading regression computer output

AP Statistics free-response questions frequently give a regression table from software rather than the equation directly, and expect you to extract and interpret it. A typical layout reads:

Predictor     Coef     SE Coef      T        P
Constant      12.4     2.1          5.90     0.000
HoursStudied   8.0     0.9          8.89     0.000

S = 4.2     R-Sq = 81.0%     R-Sq(adj) = 80.2%

Here is how to read it for full credit:

Intercept (Constant row, Coef): $12.4$. Build the equation as $\hat{y} = 12.4 + 8.0x$, where $x$ is hours studied and $\hat{y}$ is the predicted exam score.
Slope (predictor row, Coef): $8.0$, the coefficient on HoursStudied. Interpret: "for each additional hour studied, the predicted exam score increases by $8.0$ points, on average."
$s$ (the S value): $4.2$, the standard deviation of the residuals. Interpret: "actual exam scores typically differ from the scores predicted by the regression line by about $4.2$ points."
$r^2$ (R-Sq): $81.0\%$. Interpret: "about $81\%$ of the variation in exam score is explained by the linear relationship with hours studied."
$r$ (correlation): $\sqrt{0.81} = 0.9$, taking the sign of the slope (positive here), so $r = +0.9$.
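The same reading can be checked numerically. The sketch below copies the values from the sample output and recomputes the pieces; the variable names are illustrative, and the prediction at 5 hours assumes that 5 hours lies inside the observed range, which the output itself does not state.

```python
import math

# Values copied from the sample regression output above.
intercept = 12.4   # Constant row, Coef
slope = 8.0        # HoursStudied row, Coef
se_slope = 0.9     # HoursStudied row, SE Coef
r_sq = 0.81        # R-Sq = 81.0%, written as a proportion

def predict(hours_studied):
    """Predicted exam score from y-hat = intercept + slope * x."""
    return intercept + slope * hours_studied

# r is the square root of r-squared, carrying the sign of the slope.
r = math.copysign(math.sqrt(r_sq), slope)

# The T column is each coefficient divided by its standard error.
t_slope = slope / se_slope

print(f"y-hat = {intercept} + {slope}x")
print(f"r = {r:.2f}")
print(f"t for the slope = {t_slope:.2f}")
# Assumption: 5 hours is within the range of the observed data.
print(f"Predicted score at 5 hours = {predict(5):.1f}")
```

The recomputed t statistic of about 8.89 matches the T column, a quick check that the slope and its standard error were read from the right row.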

The three cautions that protect your marks

Three limits of the data come up in almost every Unit 1 to Unit 2 free-response question, and respecting them is part of full-credit reasoning.

Correlation is not causation. An association in observational data may be driven by a lurking variable, by reverse causation, or by coincidence. Only a randomized experiment supports a causal claim. When you find an association, describe it and then decline to claim cause, naming a plausible lurking variable if asked.
The normal model needs approximately normal data. The empirical ($68$-$95$-$99.7$) rule and z-score proportions only hold for roughly symmetric, bell-shaped data. If a distribution is clearly skewed, do not reach for the normal model.
Extrapolation is unreliable. A regression line is trustworthy only within the range of the observed data. Predicting outside that range assumes a pattern you have not seen, and can produce nonsense (such as a negative price). Compute if asked, but flag it as an extrapolation.

Showing work and justifying conclusions

The calculator does the arithmetic, but the marks are in the reasoning. When you describe an outlier, show the $1.5 \times \text{IQR}$ fences. When you judge whether a linear model fits, cite the residual plot (random scatter means it fits; a curve means it does not), not just a high $r^2$. When you conclude an association from a two-way table, compare the conditional distributions, not raw counts. A bare correct answer with no supporting reasoning frequently scores below a well-justified answer, because the skill categories explicitly reward justification. Treat every "explain" or "justify" as a request to make your statistical reasoning visible, in context.
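To see why a high $r^2$ alone is not enough, here is a small sketch that fits a least-squares line to deliberately curved, made-up data ($y = x^2$) and lists the residuals. The data and the helper function are illustrative assumptions, not part of any exam question.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up curved data: y is exactly x squared.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]

intercept, slope = least_squares(xs, ys)

# Residual = actual y minus predicted y.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# r-squared = 1 - (sum of squared residuals) / (total sum of squares).
y_bar = sum(ys) / len(ys)
sse = sum(e ** 2 for e in residuals)
sst = sum((y - y_bar) ** 2 for y in ys)
r_sq = 1 - sse / sst

print(f"y-hat = {intercept} + {slope}x")
print(f"residuals = {residuals}")
print(f"r-squared = {r_sq:.3f}")
```

The residuals run positive, negative, negative, negative, positive: a U-shape, even though $r^2$ is about $0.96$. A full-credit answer would cite that pattern as evidence that a linear model is not appropriate.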

Interpreting a full regression output in context

A study fits a regression of weekly sales (\$1000s, $y$) on advertising spend (\$1000s, $x$) and reports: Constant Coef $= 20$, AdSpend Coef $= 3.5$, $S = 6$, R-Sq $= 64\%$. The data covered spends from \$2000 to \$30,000. Interpret the output and predict sales at a spend of \$10,000, then comment on a spend of \$80,000.

Step 1: Write the regression equation

$\hat{y} = 20 + 3.5x$, where $x$ is advertising spend in \$1000s and $\hat{y}$ is predicted weekly sales in \$1000s.

Step 2: Interpret the slope and intercept

Slope: for each additional \$1000 of advertising spend, predicted weekly sales increase by \$3500 ($3.5$ thousand), on average. Intercept: with \$0 spend the model predicts \$20,000 in sales; \$0 is just below the data range, so treat it as a borderline extrapolation.

Step 3: Interpret r-squared and s

$r^2 = 64\%$: about $64\%$ of the variation in weekly sales is explained by the linear relationship with advertising spend. $s = 6$: predictions are typically off by about \$6000. The correlation is $r = +\sqrt{0.64} = 0.8$ (positive, matching the slope).

Step 4: Predict within the data range

At a spend of \$10,000, $x = 10$: $\hat{y} = 20 + 3.5(10) = 20 + 35 = 55$, so predicted sales are about \$55,000.

Step 5: Comment on extrapolation

A spend of \$80,000 ($x = 80$) is far outside the data range (\$2000 to \$30,000). Predicting there is an unreliable extrapolation: the linear relationship may not continue, so the prediction should not be trusted even though the arithmetic ($\hat{y} = 20 + 3.5(80) = 300$) is easy.
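The prediction and extrapolation steps of this example can be sketched as a small function that computes the prediction and flags whether the input lies outside the observed spend range. The function and constant names are hypothetical; the numbers come from the worked example.

```python
# Values from the worked example; both variables are in $1000s.
INTERCEPT = 20.0            # Constant Coef
SLOPE = 3.5                 # AdSpend Coef
X_MIN, X_MAX = 2.0, 30.0    # observed advertising spend range

def predict_sales(spend_thousands):
    """Return (predicted weekly sales in $1000s, is_extrapolation)."""
    y_hat = INTERCEPT + SLOPE * spend_thousands
    is_extrapolation = not (X_MIN <= spend_thousands <= X_MAX)
    return y_hat, is_extrapolation

for spend in (10, 80):
    y_hat, flagged = predict_sales(spend)
    note = "extrapolation, do not trust" if flagged else "within data range"
    print(f"spend {spend}: predicted sales {y_hat} ({note})")
```

The arithmetic runs equally well for both spends. The flag is what separates a usable prediction (55, or \$55,000) from one that should be reported only with a warning (300, or \$300,000).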

AP Statistics free-response marks come from communication, not just correct statistics. Answer in context (name the variables, units, and group in every interpretation). Describe a distribution with shape, center, spread, and unusual features, matching resistant measures (median, IQR) to skewed data; compare distributions with explicitly comparative language. Read regression computer output by building $\hat{y} = \text{constant} + \text{slope} \cdot x$, then interpreting the slope (with units and "predicted"), $r^2$ (proportion of variation explained), $s$ (typical residual size), and $r = \pm\sqrt{r^2}$. Respect the three cautions: correlation is not causation, the normal model needs approximately normal data, and extrapolation is unreliable. Show reasoning (fences, residual plots, conditional distributions), because the skill categories reward justification.

Check your knowledge

A mix of description, comparison, regression-output, and caution questions. Write full-credit, in-context answers, then check against the quiz solutions.

Rewrite "the mean is $50$ and the standard deviation is $10$" as a contextual description for a distribution of resting heart rates (bpm).
A distribution of incomes is strongly right-skewed. Which center and spread should you report, and why?
Turn "Group A median $30$, Group B median $24$" into a proper comparison.
From output with Constant Coef $= 5$ and slope Coef $= 2$ (predicting $y$ from $x$), write the equation and interpret the slope.
R-Sq $= 49\%$ in a regression of $y$ on $x$. Interpret $r^2$ and find $r$ if the slope is negative.
A study finds an association between coffee drinking and heart disease in observational data. What can and cannot be concluded?
A regression valid for $x$ from $5$ to $20$ is used to predict at $x = 40$. What is the problem?
A residual plot of a linear fit shows a clear U-shape. Is the linear model appropriate? Justify.

For the official guidance

The College Board publishes released free-response questions and scoring guidelines at apcentral.collegeboard.org. Studying the official scoring guidelines shows exactly how context, comparative language, and justification are rewarded, so always practice against the board's own released exams.

Sources & how we know this

AP Statistics Course and Exam Description — College Board (2020)