What can we actually learn from data, and what questions does statistical thinking let us answer?
Topic 1.1 Introducing Statistics - What Can We Learn from Data?: identify questions to be answered, based on variation in one-variable data, and recognize what a data set can and cannot tell us.
A focused answer to AP Statistics Topic 1.1, on how variation in data raises statistical questions, what kinds of question data can answer, and the limits of what a single data set reveals, with worked examples.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this topic is asking
The College Board (Topic 1.1) wants you to see statistics as the study of variation: data exist because individuals differ, and a statistical question is one that anticipates that variability and is answered by describing a group rather than stating a single fact. You must be able to identify what a one-variable data set can answer, and what it cannot.
Variation is the reason statistics exists
If every student were exactly the same height, there would be no question to ask about height: the answer would just be that one shared value. Because heights differ, we get a distribution of values, and questions such as "what is a typical height?" and "how much do heights vary?" become meaningful. Throughout Unit 1 you describe that distribution using three lenses: shape, center, and spread.
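The contrast between identical and differing heights can be sketched in a few lines of Python. This is a minimal illustration, not part of the College Board material: the two lists of heights and the helper name `has_variation` are hypothetical, chosen only to show that a data set with no variation has nothing to summarize.

```python
import statistics

# Hypothetical heights in cm (made-up values for illustration only).
identical_heights = [165, 165, 165, 165, 165]
varying_heights = [158, 162, 165, 169, 171]

def has_variation(values):
    """Return True when at least two values differ, so there is a distribution to describe."""
    return len(set(values)) > 1

for label, data in [("identical", identical_heights), ("varying", varying_heights)]:
    # Population standard deviation of the listed values; zero means no spread at all.
    spread = statistics.pstdev(data)
    print(label, "| varies:", has_variation(data), "| standard deviation:", round(spread, 2))
```

For the identical list the standard deviation is zero, so "what is a typical height?" collapses into stating one number. Only the varying list gives a statistical question something to describe.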
What a data set can answer
A one-variable data set lets you answer descriptive questions about the group measured:
- Center. What is a typical value? (mean, median)
- Spread. How much do the values vary? (range, IQR, standard deviation)
- Shape. Is the distribution symmetric, skewed, or does it have clusters, gaps, or outliers?
- Comparison within the data. How does one subgroup compare with another, if the data are grouped?
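The center and spread summaries listed above can be computed with Python's standard `statistics` module. The sketch below assumes a small made-up list of heights (`heights_cm`) and a hypothetical helper named `describe`. Quartile conventions differ between textbooks, calculators, and software, so the IQR printed here may not match a hand calculation exactly.

```python
import statistics

# Hypothetical heights in cm for ten students (made-up values for illustration).
heights_cm = [152, 158, 160, 161, 163, 165, 165, 168, 172, 181]

def describe(values):
    """Return center and spread summaries for a one-variable data set."""
    # Quartile cut points, using the module's default quantile method.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

summary = describe(heights_cm)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Shape is not reduced to a single number here. In practice you would judge symmetry, skew, clusters, gaps, and outliers from a graph such as a dotplot or histogram.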
What a data set cannot answer
Recognizing the limits is half the topic. A single observational one-variable data set cannot:
- Establish causation. Observational data can reveal an association, but establishing cause requires a designed experiment (Unit 3 territory).
- Predict an exact future value. Data describe what happened; they do not guarantee any individual outcome.
- Generalize beyond the group unless the data came from a proper random sample of a defined population.
- Answer a non-statistical (deterministic) question about a single known individual, which is just a lookup, not statistics.
Statistical versus deterministic questions
The cleanest way to tell whether a question is statistical is to ask whether the answer would vary if you collected the data again, or whether it is a fixed fact. "How tall is the tallest student in this room right now?" is deterministic: there is one correct answer and no variability to anticipate. "How do the heights of students in this room vary, and what is typical?" is statistical: it anticipates that the values differ and is answered by summarizing the spread and center. The College Board threads this distinction through the whole course. Hypothesis tests later in the course are simply formal statistical questions about a population, asked when variation in a sample leaves room for doubt. So the habit you build here, of checking "does this question anticipate variability, and is it answered by a distribution?", is the same habit that lets you choose the right inference procedure much later. A question that asks only about one fixed individual, or that asks for an exact prediction, falls outside what data can deliver, and saying so plainly earns marks on the exam.
Why scope matters on the exam
Free-response questions in this course frequently ask you to interpret in context and to state limitations. Topic 1.1 is where you first practice that discipline: when handed a data set, you should be able to name a question it answers and, just as importantly, name a question it does not, with a reason. The commonest reason a data set falls short is that the question is causal while the data are merely observational, or that the question concerns a variable that was never measured. A second common reason is generalization: a convenience sample (for example, only the students in one class) cannot support a claim about all students in the school, because the sample was not random. Training yourself to spot these gaps now means that when a later free-response question says "explain why the researcher cannot conclude that X causes Y," you already have the language ready, and you will not be tempted to over-claim from the data in front of you.
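The generalization gap can be made concrete with a small simulation. Everything in this sketch is assumed for illustration: the commute times, the class names, and the seed are made up, and the school is deliberately built so that one class lives much closer than the others.

```python
import random
import statistics

# Hypothetical commute times in minutes, grouped by class (made-up values).
school = {
    "class_A": [5, 7, 8, 10, 12],     # students who live near the school
    "class_B": [15, 18, 20, 22, 25],
    "class_C": [30, 35, 40, 45, 50],  # students who travel from farther away
}
all_students = [t for times in school.values() for t in times]

# Convenience sample: only the students in one class.
convenience_sample = school["class_A"]

# Simple random sample of the same size, drawn from the whole school.
rng = random.Random(1)  # fixed seed so the sketch is repeatable
random_sample = rng.sample(all_students, k=len(convenience_sample))

print("whole school mean:", statistics.mean(all_students))
print("convenience sample mean:", statistics.mean(convenience_sample))
print("random sample mean:", statistics.mean(random_sample))
```

The convenience sample can only ever describe class_A, so its mean sits well below the whole-school mean by construction. A single random sample can still land far from the true value by chance, but every student had the same chance of selection, which is what licenses a claim about the whole school.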
Try this
Q1. Identify one statistical question that can be answered by a data set listing the daily rainfall (mm) at one weather station for each day of a year. [1 point]
- Cue. Any question about the distribution, for example "what is a typical daily rainfall, and how much does it vary across the year?"
Q2. Explain why "How much do these values vary?" is a statistical question but "What is the value for individual 7?" is not. [2 points]
- Cue. The first anticipates variation across the group and is answered by describing spread; the second is a deterministic lookup of a single fixed value with no variability to summarize.
Exam-style practice questions
Practice questions written in the style of College Board exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AP 2019 (style), 1 mark, Section I (multiple choice)
A researcher records the number of hours of sleep for each of 200 students. Which of the following is a statistical question that this data set is best suited to answer?
(A) Did student number 47 get enough sleep last night?
(B) How much does sleep vary among these 200 students, and what is a typical amount?
(C) Will a new student next year sleep exactly 7 hours?
(D) Does caffeine cause insomnia in the general population?
The correct answer is (B).
A statistical question anticipates variability and is answered with data about a group, not a single fact. (B) asks about the typical value and the spread of a one-variable data set, which is exactly what Unit 1 tools (center, spread, shape) describe.
(A) is a question about one individual, not about variation. (C) asks for an exact future value, which data cannot guarantee. (D) is a causal claim about a wider population that this single observational data set cannot establish; it would need an experiment. Recognizing the difference between a statistical question and a deterministic or causal one is the point of Topic 1.1.
AP 2022 (style), 3 marks, Section II (free response)
A school collects, for every student, their commute time to school in minutes.
(a) State one statistical question this data set can answer.
(b) State one question it cannot answer, and justify why.
(c) Explain why variation in the commute times is what makes the question statistical.
A 3-point question on the nature of statistical questions.
(a) (1 point) A valid statistical question anticipates variability, for example: "What is a typical commute time, and how much do commute times vary across students?" Any question about center, spread, or shape of the one-variable distribution earns the point.
(b) (1 point) A question the data cannot answer, with justification: for example "Does living far from school cause lower grades?" cannot be answered because the data set has only one variable (commute time) and no information on grades or on cause; it is observational. Award the point for a question that is causal, about a different variable, or about an individual future case, with a correct reason.
(c) (1 point) Variation makes the question statistical because if every student had the identical commute there would be nothing to summarize or investigate; it is the differences among the values that we describe with center, spread, and shape.
Markers reward a genuinely statistical question, a clearly out-of-scope question with a correct reason, and an explanation that ties "statistical" to variability.
Related dot points
- Topic 1.2 The Language of Variation - Variables: classify variables as categorical or quantitative, and quantitative variables as discrete or continuous, and explain why the type determines the appropriate graphs and statistics.
A focused answer to AP Statistics Topic 1.2, classifying variables as categorical or quantitative (and discrete or continuous), with the consequences for which displays and summaries are valid, plus worked classification examples.
- Topic 1.3 Representing a Categorical Variable with Tables: build and interpret frequency and relative frequency tables for a single categorical variable, and read proportions and percentages from them.
A focused answer to AP Statistics Topic 1.3, on building frequency and relative frequency tables for one categorical variable, converting between counts, proportions, and percentages, and interpreting them in context, with worked tables.
- Topic 1.6 Describing the Distribution of a Quantitative Variable: describe a quantitative distribution by its shape, center, spread, and unusual features (outliers, gaps, clusters) in context.
A focused answer to AP Statistics Topic 1.6, the SOCS framework for describing a quantitative distribution by shape, outliers, center, and spread, with the vocabulary of skew, modality, and clusters, and worked descriptions.
- Topic 1.9 Comparing Distributions of a Quantitative Variable: compare two or more distributions of a quantitative variable by shape, center, spread, and unusual features, in context, using comparative language.
A focused answer to AP Statistics Topic 1.9, on comparing two or more distributions by shape, center, spread, and unusual features using explicit comparative language, with a worked side-by-side comparison.
- Topic 2.1 Introducing Statistics - Are Variables Related?: identify questions about the association between two variables, distinguish association from causation, and recognize what two-variable data can answer.
A focused answer to AP Statistics Topic 2.1, on framing questions about the association between two variables, the difference between explanatory and response variables, why association is not causation, and what two-variable data can answer, with worked examples.
Sources & how we know this
- AP Statistics Course and Exam Description — College Board (2020)