What does a P-value measure, and how is it interpreted in the context of a test?
Topic 6.5 Interpreting P-Values: define the P-value as the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed, and interpret it in context.
A focused answer to AP Statistics Topic 6.5, on defining the P-value as the probability under the null of a result at least as extreme as observed, interpreting small and large P-values, and avoiding common misreadings, with a worked interpretation.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this topic is asking
The College Board (Topic 6.5) wants you to define and interpret a P-value: the probability, assuming the null hypothesis is true, of getting a test statistic at least as extreme as the one observed (in the direction of $H_a$). You must phrase this correctly in context and avoid the standard misreadings.
The precise definition
Three parts must appear in any interpretation: (1) assuming $H_0$ is true, (2) the probability of a result at least as extreme, (3) as the one observed. Drop any one and the interpretation is wrong. The P-value is a measure of surprise: how unusual is our data under the null? The smaller it is, the harder it is to explain the data by chance alone if $H_0$ holds.
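The three parts of the definition can be sketched as a direct calculation. The example below is a minimal illustration, not part of the Course and Exam Description: the data (32 successes in 50 trials), the null value 0.5, the upper-tailed alternative, and the function name `upper_tail_p_value` are all assumptions chosen for demonstration.

```python
from math import comb

def upper_tail_p_value(n, observed, p0):
    """P(X >= observed) for X ~ Binomial(n, p0), i.e. computed assuming H0: p = p0 is true."""
    # Add up the probability of every count at least as extreme as the observed one.
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(observed, n + 1))

# Hypothetical data: 32 successes in 50 trials, H0: p = 0.5 against Ha: p > 0.5.
p_value = upper_tail_p_value(n=50, observed=32, p0=0.5)
print(f"P-value = {p_value:.4f}")
```

Each piece of the code maps to one part of the interpretation: `p0` is the "assuming the null is true" condition, the sum over `k >= observed` is "at least as extreme", and `observed` is the result actually seen.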
Small versus large P-values
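The P-value is a sliding scale of evidence, not a verdict on its own. A small P-value means data like ours would be rare if $H_0$ were true, which counts as evidence against $H_0$; a large P-value means the data are consistent with $H_0$. Converting the P-value into a decision requires comparing it to the significance level $\alpha$ (Topic 6.6). A useful mental anchor: "if the null were true, how often would chance alone produce data this extreme?" If that fraction is tiny, the null looks like a poor explanation.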
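The "how often would chance alone produce data this extreme?" anchor can be acted out by simulation. The sketch below is illustrative only: the data (32 successes in 50 trials), the null value 0.5, the number of repetitions, the seed, and the function name `simulated_p_value` are assumptions made for the example.

```python
import random

def simulated_p_value(n, observed, p0, reps=10_000, seed=1):
    """Estimate P(X >= observed) under H0: p = p0 by simulating many samples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    extreme = 0
    for _ in range(reps):
        # Generate one sample of n trials in a world where H0 is true.
        successes = sum(rng.random() < p0 for _ in range(n))
        if successes >= observed:  # at least as extreme as what was observed
            extreme += 1
    return extreme / reps

# Hypothetical data: 32 successes in 50 trials, H0: p = 0.5 against Ha: p > 0.5.
estimate = simulated_p_value(n=50, observed=32, p0=0.5)
print(f"Estimated P-value = {estimate:.4f}")
```

The returned fraction is the share of simulated null-world samples that were at least as extreme as the observed one, which is the P-value idea in its most literal form.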
The interpretation that loses marks
The single most penalized error in the course is calling the P-value "the probability that $H_0$ is true" (or "the probability the result is due to chance"). The P-value is a probability about the data, conditional on $H_0$, not a probability about the hypothesis. $H_0$ is either true or false; it has no probability in this framework. Likewise, a P-value of does not mean "a chance the result is a fluke"; it means "if $H_0$ were true, results this extreme would occur of the time." Saying it correctly, every time, is worth real marks.
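The distinction can also be written symbolically. In the sketch below, $T$ stands for the test statistic and $t_{\text{obs}}$ for its observed value; both symbols are notation introduced here for illustration, and the inequality is written for an upper-tailed test.

```latex
\text{P-value} = P\left(T \ge t_{\text{obs}} \mid H_0 \text{ is true}\right)
\qquad \text{which is not} \qquad
P\left(H_0 \text{ is true} \mid \text{data}\right)
```

The condition sits on different sides of the bar in the two expressions, which is why one cannot be read as the other.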
What a P-value does not do
A P-value does not measure the size of an effect, only how surprising the data are under $H_0$. With a very large sample, a tiny, practically meaningless departure from $H_0$ can produce a small P-value; with a small sample, a large real effect can give a non-significant P-value. So statistical significance is not the same as practical importance, and a large P-value never proves $H_0$; it only fails to provide evidence against it. These limits are why Topic 6.7 (errors) and confidence intervals (which show effect size) accompany P-values rather than replace them.
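The sample-size point can be checked with a short calculation. The sketch below uses a one-sample z-test for a proportion with an upper-tailed alternative; both data sets, the null value 0.5, and the function name `upper_tail_z_test` are hypothetical choices for illustration.

```python
from math import erfc, sqrt

def upper_tail_z_test(p_hat, p0, n):
    """One-sample z-test for a proportion with Ha: p > p0. Returns (z, P-value)."""
    standard_error = sqrt(p0 * (1 - p0) / n)  # computed from the null value p0
    z = (p_hat - p0) / standard_error
    p_value = 0.5 * erfc(z / sqrt(2))  # area to the right of z under N(0, 1)
    return z, p_value

# Hypothetical case 1: a tiny departure from 0.5 with a huge sample.
z_big, p_big = upper_tail_z_test(p_hat=0.502, p0=0.5, n=1_000_000)

# Hypothetical case 2: a large departure from 0.5 with a small sample.
z_small, p_small = upper_tail_z_test(p_hat=0.65, p0=0.5, n=20)

print(f"n = 1,000,000, p_hat = 0.502: z = {z_big:.2f}, P-value = {p_big:.6f}")
print(f"n = 20,        p_hat = 0.65:  z = {z_small:.2f}, P-value = {p_small:.4f}")
```

In the first case the sample proportion differs from the null value by only 0.002 yet the P-value is very small; in the second the difference is 0.15 yet the P-value stays above a conventional 0.05 cutoff. The P-value reflects the effect and the sample size together, never the effect alone.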
Try this
Q1. A test gives P-value . Interpret it in one sentence (generic context). [1 point]
- Cue. If $H_0$ were true, there would be a chance of getting a result at least as extreme as the one observed, so the data are consistent with $H_0$.
Q2. Why is "the P-value is the probability $H_0$ is true" wrong? [1 point]
- Cue. The P-value is a probability about the data assuming $H_0$ is true; it says nothing about the probability of the hypothesis itself.
Exam-style practice questions
Practice questions written in the style of College Board exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AP 2019 (style), 1 mark, Section I (multiple choice). A test of $H_0$ against $H_a$ gives P-value . This means
(A) there is a chance $H_0$ is true
(B) if $H_0$ is true, there is a chance of a sample proportion at least as large as observed
(C) there is a chance the result is due to chance
(D)
The correct answer is (B).
The P-value is computed assuming $H_0$ is true; it is the probability, under $H_0$, of getting a test statistic (here a sample proportion) at least as extreme as the one observed in the direction of $H_a$.
(A) is the classic error: the P-value is not the probability $H_0$ is true. (C) is a vague misstatement of the same error. (D) confuses the P-value with the parameter $p$.
AP 2021 (style), 3 marks, Section II (free response). A researcher tests whether a coin is unfair, $H_0: p = 0.5$ versus $H_a: p \neq 0.5$, where $p$ is the probability of heads, and obtains a P-value of . (a) Interpret this P-value in context. (b) At significance level $\alpha$, what does the P-value imply about the evidence against $H_0$? (c) Explain why a large P-value does not prove the coin is fair.
A 3-point interpretation question.
(a) (1 point) Assuming the coin is fair ($p = 0.5$), there is a probability of obtaining a sample result at least as far from $0.5$ (in either direction) as the one observed.
(b) (1 point) Since the P-value is greater than $\alpha$, the result is not surprising under $H_0$; there is not convincing evidence against $H_0$, so we fail to reject it.
(c) (1 point) Failing to reject $H_0$ means the data are consistent with $p = 0.5$, but they are also consistent with values near $0.5$; absence of evidence against fairness is not proof of fairness.
Markers reward the conditional "assuming $H_0$ true," the at-least-as-extreme phrasing, the comparison to $\alpha$, and the point that a large P-value does not prove $H_0$.
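Part (c) can be illustrated numerically. The sketch below is an assumption-laden example, not the data from the question: it supposes 55 heads in 100 flips, uses a two-sided one-sample z-test, and tries several null values (the helper name `two_sided_p_value` and the 0.05 cutoff mentioned afterwards are also illustrative choices).

```python
from math import erfc, sqrt

def two_sided_p_value(heads, flips, p0):
    """Two-sided one-sample z-test P-value for H0: p = p0 against Ha: p != p0."""
    p_hat = heads / flips
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / flips)  # standard error uses the null value
    return erfc(abs(z) / sqrt(2))  # equals 2 * P(Z >= |z|) for standard normal Z

# Hypothetical data: 55 heads in 100 flips, tested against several null values.
p_values = {p0: two_sided_p_value(55, 100, p0) for p0 in (0.50, 0.52, 0.55, 0.60)}
for p0, p_value in p_values.items():
    print(f"H0: p = {p0:.2f} -> P-value = {p_value:.3f}")
```

With these made-up data, every null value tried gives a P-value above 0.05, so the same sample fails to reject a fair coin and several unfair ones alike. That is the sense in which a large P-value leaves $H_0$ plausible without proving it.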
Related dot points
- Topic 6.4 Setting Up a Test for a Population Proportion: state null and alternative hypotheses about a population proportion, identify the significance level, and verify the conditions for a one-sample z-test.
A focused answer to AP Statistics Topic 6.4, on writing the null and alternative hypotheses for a population proportion, choosing the significance level, and checking the random, large-counts (using the null value), and 10% conditions for a one-sample z-test.
- Topic 6.6 Concluding a Test for a Population Proportion: compute the standardized z test statistic and P-value for a one-sample proportion test, compare to the significance level, and state a conclusion in context.
A focused answer to AP Statistics Topic 6.6, on computing the standardized z statistic and P-value for a one-sample proportion test using the null value, comparing to alpha, and stating a conclusion in context, with a full worked test.
- Topic 6.7 Potential Errors When Performing Tests: distinguish Type I and Type II errors and their consequences, define the power of a test, and explain how significance level, sample size, and effect size affect error probabilities and power.
A focused answer to AP Statistics Topic 6.7, on Type I and Type II errors, their real-world consequences, the power of a test, and how alpha, sample size, and effect size change error rates and power, with worked reasoning in context.
- Topic 6.3 Justifying a Claim Based on a Confidence Interval for a Population Proportion: use a confidence interval for a proportion to evaluate whether a claimed value is plausible, and discuss the effect of confidence level and sample size on the interval.
A focused answer to AP Statistics Topic 6.3, on using a one-sample proportion confidence interval to judge whether a claimed value of p is plausible, and explaining how confidence level and sample size change the interval, with worked justifications.
- Topic 6.1 Introducing Statistics: Why Be Normal?: explain how the approximately normal sampling distribution of a sample proportion lets us quantify uncertainty and make inferences about an unknown population proportion.
A focused answer to AP Statistics Topic 6.1, on why the approximately normal sampling distribution of a sample proportion is the engine that lets us build confidence intervals and significance tests about an unknown population proportion.
Sources & how we know this
- AP Statistics Course and Exam Description — College Board (2020)