What does a box plot show?

A box plot displays the five-number summary: the minimum, first quartile, median, third quartile, and maximum. The box runs from the first to the third quartile and holds the middle fifty percent of the data, with a line at the median. Each of the four sections holds about a quarter of the values, so a box plot shows center, spread, and skew at a glance.
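A minimal Python sketch of the five-number summary behind a box plot. It assumes the "split into halves" quartile rule: the median of the lower half is Q1 and the median of the upper half is Q3, and the middle value is left out of both halves when the count is odd. The function name `five_number_summary` is our own. Spreadsheet and NumPy quartile functions may interpolate instead, so their Q1 and Q3 can differ slightly.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule.

    Assumption: when the count is odd, the middle value is excluded from
    both halves (a common classroom convention). Needs at least two values.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median position
    upper = data[half + n % 2:]      # skip the middle value when n is odd
    return (data[0], median(lower), median(data), median(upper), data[-1])

print(five_number_summary([4, 6, 9, 10, 13, 15]))  # (4, 6, 9.5, 13, 15)
```

The box itself is drawn from the second to the fourth number in the tuple, with a line at the third.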

When should you use the median instead of the mean?

Use the median when the data is skewed or has outliers, because the median is resistant: it depends only on the middle position, so an extreme value barely moves it. The mean uses every value, so a single outlier pulls it toward the extreme. For roughly symmetric data with no outliers, the mean is fine; otherwise report the median, and pair it with the interquartile range.
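To see resistance in action, this short sketch uses Python's standard `statistics` module to compare the mean and median of a small illustrative data set before and after a single outlier is added. The numbers are chosen only for illustration.

```python
from statistics import mean, median

# Illustrative values only: the same small data set before and after adding an outlier.
without_outlier = [3, 5, 5, 7]
with_outlier = [3, 5, 5, 7, 20]

for label, data in [("without 20", without_outlier), ("with 20", with_outlier)]:
    # The mean uses every value; the median uses only the middle position(s).
    print(f"{label}: mean = {mean(data)}, median = {median(data)}")
```

Adding the single value 20 moves the mean from 5 to 8 but leaves the median at 5.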

What is the difference between joint, marginal, and conditional relative frequency?

All three divide a count by a total, but each uses a different total. A joint relative frequency is an inner cell over the grand total. A marginal relative frequency is a row or column total over the grand total, describing one variable alone. A conditional relative frequency is a cell over its own row or column total, describing one category within another. The denominator decides the type.
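A sketch of the three denominators in Python, assuming a hypothetical two-way table of housing type by pet ownership. The apartment row (24 of 40 own a pet) matches the practice question later in the guide; the house row is invented for illustration, and the helper names `joint`, `marginal_row`, and `conditional_given_row` are our own.

```python
# Hypothetical two-way table: housing type (rows) by pet ownership (columns).
# The apartment row echoes the practice question; the house row is invented.
table = {
    "apartment": {"pet": 24, "no pet": 16},
    "house": {"pet": 45, "no pet": 15},
}

grand_total = sum(sum(row.values()) for row in table.values())

def joint(row, col):
    """Inner cell over the grand total."""
    return table[row][col] / grand_total

def marginal_row(row):
    """Row total over the grand total (one variable alone)."""
    return sum(table[row].values()) / grand_total

def conditional_given_row(row, col):
    """Cell over its own row total (one category within another)."""
    return table[row][col] / sum(table[row].values())

print(joint("apartment", "pet"))                  # 24 / 100
print(marginal_row("apartment"))                  # 40 / 100
print(conditional_given_row("apartment", "pet"))  # 24 / 40
```

The same cell, 24, gives 0.24 as a joint frequency and 0.6 as a conditional one; only the denominator changed. A conditional frequency given a column would divide by that column's total instead.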

How do you interpret the slope of a line of best fit?

The slope is the rate of change: the amount the predicted output changes for each one-unit increase in the input, stated with units. For example, a slope of six points per study hour means each extra hour is associated with about six more points. The intercept is the baseline: the predicted output when the input is zero. Interpret it with care if an input of zero lies far outside the range of the data.
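A one-line derivation, using the slope-intercept form $\hat{y} = mx + b$, of why the slope is the change in the prediction for each one-unit step in the input:

$$\hat{y}(x + 1) - \hat{y}(x) = \big(m(x + 1) + b\big) - \big(mx + b\big) = m.$$

With $m = 6$ points per study hour, each extra hour raises the predicted score by $6$ points, and setting $x = 0$ gives $\hat{y}(0) = b$, the intercept.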

What does the correlation coefficient r tell you?

The correlation coefficient r runs from negative one to one and measures a linear relationship. Its sign gives the direction, positive for an upward trend and negative for a downward one. Its magnitude gives the strength: close to one is a strong linear fit, close to zero is weak or none, and plus or minus one is a perfect line. Strength is about the size of r, not its sign.
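A minimal sketch of how $r$ is computed, using the standard Pearson formula written out by hand so each step is visible. The function name `pearson_r` is our own; Python 3.10 and later also provide `statistics.correlation`, which computes the same quantity.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: sum of products of deviations divided by the
    square root of the product of the sums of squared deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
print(pearson_r(xs, [2 * x + 1 for x in xs]))   # 1.0: points on an upward line
print(pearson_r(xs, [10 - 3 * x for x in xs]))  # -1.0: points on a downward line
```

Points that fall exactly on a line give $r = \pm 1$ no matter how steep the line is, which is why $|r|$ measures how tightly the points follow a line, not the size of the slope.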

Why does correlation not prove causation?

Because two variables moving together can be explained in several ways: the first could cause the second, the second could cause the first, or a hidden third variable could cause both. Observational data cannot rule these out, since it does not control other factors. Only a controlled experiment can establish cause, so a strong correlation is evidence of a relationship but not proof of causation.

OhioMaths

Ohio Algebra I: a complete guide to statistics and probability

A deep-dive Ohio Algebra I guide to statistics and probability, a smaller but reliable reporting category. Covers representing data with dot plots, histograms, and box plots, comparing center and spread, two-way frequency tables, scatter plots and lines of best fit, and the correlation coefficient with the correlation-causation distinction.

Generated by Claude Opus 4.8 · 16 min read · S-ID.1, S-ID.2, S-ID.3, S-ID.5, S-ID.6, S-ID.7, S-ID.8, S-ID.9 · Updated 2026-06-13

Reviewed by: AI editorial process; not yet individually human-reviewed

Jump to a section

What this category demands
Representing data
Center and spread
Two-way frequency tables
Scatter plots, lines of best fit, and correlation
How this category is examined
Check your knowledge

What this category demands

This guide covers statistics and probability, a smaller but reliable Ohio Algebra I reporting category drawn entirely from S-ID (Interpreting Categorical and Quantitative Data). It rewards reading data, computing summaries, and reasoning carefully about relationships. Each topic has its own practice page: representing data distributions, comparing center and spread, two-way frequency tables, scatter plots and linear models, and correlation and causation.

Representing data

Show one-variable data with a dot plot (every value, best for small sets), a histogram (counts in intervals, shows shape), or a box plot (the five-number summary). Describe shape by the longer tail: a long tail to the right means skewed right, a long tail to the left means skewed left, and matching tails mean symmetric.
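A histogram is just counts in equal-width intervals. This Python sketch bins an invented data set and prints a sideways text histogram so the tail is visible. The bin start, width, and count, and the helper name `histogram_counts`, are illustrative choices, not part of the guide.

```python
START, WIDTH, BINS = 0, 3, 5  # illustrative bin settings, not from the guide

def histogram_counts(values, start, width, bins):
    """Count values in left-closed intervals [start + k*width, start + (k+1)*width).

    Assumes every value lies in [start, start + bins*width); a value on a
    boundary goes into the higher interval.
    """
    counts = [0] * bins
    for v in values:
        counts[int((v - start) // width)] += 1
    return counts

# Invented data with a long tail to the right.
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 7, 9, 14]
counts = histogram_counts(data, START, WIDTH, BINS)

# Print a sideways text histogram: one '#' per value in each interval.
for k, c in enumerate(counts):
    low = START + k * WIDTH
    print(f"[{low:2}, {low + WIDTH:2}) {'#' * c}")
```

The bars pile up in the low intervals and thin out toward higher values, a long tail to the right, so this invented set is skewed right.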

Center and spread

Center: the mean (sum divided by count) or the median (middle value). Spread: the range (max minus min), the IQR ($Q3 - Q1$, the middle $50\%$), or, informally, the standard deviation. The mean and range are sensitive to outliers; the median and IQR are resistant.
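A sketch of the same resistance idea for spread: replacing the largest value with an invented outlier changes the range dramatically but leaves the IQR alone. It assumes the median-of-halves quartile rule (middle value excluded when the count is odd); the function name `spread` is our own.

```python
from statistics import median

def spread(values):
    """Return (range, IQR), with Q1 and Q3 as medians of the lower and upper
    halves (middle value excluded when the count is odd)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q1 = median(data[:half])
    q3 = median(data[half + n % 2:])
    return data[-1] - data[0], q3 - q1

original = [4, 6, 9, 10, 13, 15]
with_outlier = [4, 6, 9, 10, 13, 150]  # invented: the maximum replaced by an outlier

print(spread(original))      # (11, 7)
print(spread(with_outlier))  # (146, 7)
```

Only the extremes feed the range, so one outlier moves it; the IQR depends on the middle half and does not move.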

Two-way frequency tables

A two-way table cross-classifies two categorical variables. Relative frequencies differ by denominator: joint (cell over grand total), marginal (row/column total over grand total), conditional (cell over its row/column total). Compare conditional relative frequencies to judge association.
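A sketch of judging association, assuming a hypothetical survey table of grade level by preferred study time (all counts invented). Each cell is divided by its own row total, and the resulting conditional percentages are compared across rows.

```python
# Hypothetical survey counts: does preferred study time depend on grade level?
# Rows are grade levels; columns are study times. All numbers are invented.
counts = {
    "grade 9": {"morning": 30, "evening": 70},
    "grade 10": {"morning": 45, "evening": 55},
}

# Conditional relative frequency: each cell over its own row total.
conditional = {
    grade: {choice: n / sum(row.values()) for choice, n in row.items()}
    for grade, row in counts.items()
}

for grade, shares in conditional.items():
    print(grade, {choice: f"{share:.0%}" for choice, share in shares.items()})
```

The morning share is 30% in one row and 45% in the other, which suggests an association in this invented table. If the conditional percentages were about equal across rows, there would be little evidence of one.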

Scatter plots, lines of best fit, and correlation

A scatter plot displays paired data; a line of best fit $\hat{y} = mx + b$ summarizes a linear trend, with slope as a rate and intercept as a baseline. The correlation coefficient $r$ (from $-1$ to $1$) gives direction (sign) and strength ($|r|$ near $1$ is strong, near $0$ weak). Correlation is not causation: a lurking variable may drive both.
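A minimal sketch of one common way to find a line of best fit, the least-squares line, computed from deviations about the means. The data (hours studied and test scores) are invented, and `least_squares_line` is our own helper name; a graphing calculator's linear regression should give the same line for the same data.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (m, b) for the least-squares line y-hat = m*x + b."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx
    b = y_bar - m * x_bar  # the line passes through (x_bar, y_bar)
    return m, b

# Invented data: hours studied (x) and test score (y).
hours = [1, 2, 3, 4, 5]
scores = [62, 70, 73, 80, 85]
m, b = least_squares_line(hours, scores)
print(f"y-hat = {m:.1f}x + {b:.1f}")
print(f"prediction at 6 hours: {m * 6 + b:.1f}")
```

Here the slope reads as about $5.6$ points per extra study hour, and the intercept, $57.2$, is the predicted score at zero hours.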

How this category is examined

Numeric response. Compute a summary statistic (mean, median, IQR), a relative frequency, or a prediction.
Multiple choice and multiple-select. Describe shape, choose a resistant measure, interpret $r$, or pick the best conclusion about a correlation.
Tables and graphs. Complete a two-way table, build a box plot, or read a scatter plot.

Check your knowledge

Work these as you would for credit on the Ohio test.

Find the mean and median of $3, 5, 5, 7, 20$. (2 points)
A distribution has a long tail to the left. Name its shape. (1 point)
Find the range and IQR of $4, 6, 9, 10, 13, 15$. (2 points)
Of $40$ apartment dwellers, $24$ own a pet. What is the conditional relative frequency of pet ownership among apartment dwellers? (1 point)
A line of best fit is $\hat{y} = -2x + 50$. Interpret the slope. (2 points)
Predict $y$ from $\hat{y} = 3x + 8$ when $x = 9$. (1 point)
Which is a stronger linear relationship, $r = 0.7$ or $r = -0.9$? (1 point)

Solutions to check-your-knowledge questions

Attempt each question on your own before reading its solution.

Step 1: Q1. Find the mean and median of $3, 5, 5, 7, 20$

To find the mean, add all values and divide by the count of five:

$$\text{mean} = \frac{3 + 5 + 5 + 7 + 20}{5} = \frac{40}{5} = 8.$$

The data are already in order, so the median is the middle (third) value: $5$. The mean is larger than the median here because the outlier $20$ pulls the mean upward, while the median stays anchored to the middle position.

Step 2: Q2. Name the shape of a distribution with a long left tail

The tail of a distribution points in the direction of the skew. A long tail stretching to the left means the data is skewed left (also called negatively skewed).

Final answer: skewed left.

Step 3: Q3. Find the range and IQR of $4, 6, 9, 10, 13, 15$

The range is the simplest measure of spread: the maximum minus the minimum.

$$\text{range} = 15 - 4 = 11.$$

For the IQR, first locate the median. With six values, the median is the average of the third and fourth values:

$$\text{median} = \frac{9 + 10}{2} = 9.5.$$

The median splits the data into two halves of three. The lower half is $4, 6, 9$, so $Q1 = 6$. The upper half is $10, 13, 15$, so $Q3 = 13$.

$$\text{IQR} = Q3 - Q1 = 13 - 6 = 7.$$

Step 4: Q4. Find the conditional relative frequency for pet ownership

A conditional relative frequency divides the count in question by the total for that group. Of the $40$ apartment dwellers, $24$ own a pet:

$$\frac{24}{40} = 0.6 \text{ (60\%)}.$$

Step 5: Q5. Interpret the slope of $\hat{y} = -2x + 50$

The slope is the rate of change: for each one-unit increase in $x$, the predicted value of $y$ changes by the slope. Here the slope is $-2$, so $y$ decreases by about $2$ units for each additional unit of $x$. The negative sign signals a negative (downward) association.

Final answer: for each one-unit increase in $x$, $y$ decreases by approximately $2$ units.

Step 6: Q6. Predict $y$ from $\hat{y} = 3x + 8$ when $x = 9$

Substitute $x = 9$ directly into the equation of the line of best fit:

$$\hat{y} = 3(9) + 8 = 27 + 8 = 35.$$

Step 7: Q7. Which is a stronger linear relationship, $r = 0.7$ or $r = -0.9$?

Strength of a linear relationship is measured by the absolute value of $r$, not its sign. Compare $|0.7| = 0.7$ and $|-0.9| = 0.9$. Since $0.9 > 0.7$, the value $r = -0.9$ indicates a stronger linear relationship, even though it is negative.

Final answer: $r = -0.9$ is the stronger linear relationship.
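For checking arithmetic after working by hand, a short Python script that recomputes the numeric answers to Q1, Q3, Q4, Q6, and Q7. It assumes the data are already sorted, as they are in the questions, and uses the halves method for quartiles from the Q3 solution. It is a checking aid, not a substitute for showing work on the test.

```python
from statistics import mean, median

# Q1: mean and median of 3, 5, 5, 7, 20
q1_data = [3, 5, 5, 7, 20]
q1 = (mean(q1_data), median(q1_data))

# Q3: range and IQR of 4, 6, 9, 10, 13, 15 (already sorted; even count, halves of three)
q3_data = [4, 6, 9, 10, 13, 15]
q3_range = max(q3_data) - min(q3_data)
q3_iqr = median(q3_data[3:]) - median(q3_data[:3])

# Q4: conditional relative frequency of pet owners among apartment dwellers
q4 = 24 / 40

# Q6: prediction from y-hat = 3x + 8 at x = 9
q6 = 3 * 9 + 8

# Q7: the stronger linear relationship has the larger absolute value of r
q7 = max([0.7, -0.9], key=abs)

print(q1, q3_range, q3_iqr, q4, q6, q7)  # (8, 5) 11 7 0.6 35 -0.9
```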

Sources & how we know this

Ohio's Learning Standards for Mathematics: Algebra 1 — Ohio Department of Education and Workforce (2024)
Algebra I course resources (blueprint, reference sheet, released items) — Ohio Department of Education and Workforce (2024)