What are Type I and Type II errors, and how do significance level, sample size, and effect size affect them?

Topic 6.7 Potential Errors When Performing Tests: distinguish Type I and Type II errors and their consequences, define the power of a test, and explain how significance level, sample size, and effect size affect error probabilities and power.

A focused answer to AP Statistics Topic 6.7, on Type I and Type II errors, their real-world consequences, the power of a test, and how alpha, sample size, and effect size change error rates and power, with worked reasoning in context.

Generated by Claude Opus 4.8 · 10 min answer · Updated 2026-06-04

Reviewed by: AI editorial process; not yet individually human-reviewed

What this topic is asking

The College Board (Topic 6.7) wants you to distinguish Type I and Type II errors, describe their consequences in context, define the power of a test, and explain how the significance level $\alpha$, the sample size $n$, and the effect size affect error probabilities and power.

The four outcomes

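A significance test has four possible outcomes. If $H_0$ is true, rejecting it is a Type I error and failing to reject it is a correct decision; if $H_0$ is false, failing to reject it is a Type II error and rejecting it is a correct decision. A clear way to keep the two errors straight: a Type I error is a false alarm (you claim an effect when there is none); a Type II error is a missed detection (you miss a real effect). Always describe each error in the direction of the specific test and in its context, because exam credit depends on saying what each error means in this situation, not on reciting the textbook definition.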
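As a minimal sketch, the four outcomes can be laid out by crossing whether $H_0$ is true with whether the test rejects it. The function name `classify_outcome` is hypothetical, chosen only for illustration.

```python
def classify_outcome(h0_is_true: bool, rejected_h0: bool) -> str:
    """Label one of the four possible outcomes of a significance test."""
    if h0_is_true and rejected_h0:
        return "Type I error (false alarm)"
    if not h0_is_true and not rejected_h0:
        return "Type II error (missed detection)"
    if h0_is_true:
        # H0 is true and the test did not reject it.
        return "Correct decision (true H0 not rejected)"
    # H0 is false and the test rejected it.
    return "Correct decision (false H0 rejected)"


# Print the full two-by-two grid of outcomes.
for h0_is_true in (True, False):
    for rejected_h0 in (True, False):
        label = classify_outcome(h0_is_true, rejected_h0)
        print(f"H0 true={h0_is_true}, rejected={rejected_h0}: {label}")
```

In practice you never know which row you are in, because the truth of $H_0$ is unknown; the grid only names what could have happened.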

Consequences drive the trade-off

Which error is worse depends on context, and naming the real-world consequence of each is routinely examined. In a medical screen, a Type I error (false positive) might cause needless treatment, while a Type II error (false negative) might leave a disease untreated; the relative harms decide whether you set $\alpha$ low or high. There is no universally "safer" choice; you balance the costs. This is why $\alpha$ is chosen before the data, as a policy about acceptable false-alarm risk.

Power and what raises it

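Power is the probability that the test correctly rejects a false $H_0$, so power $= 1 - \beta$, where $\beta$ is the probability of a Type II error. Three factors raise power: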

Larger sample size $n$. The biggest controllable lever. More data shrink the standard error, so a true effect produces a more extreme statistic and is detected more often, all without changing $\alpha$. This is the standard answer to "how can power be increased without raising the Type I error rate?"
Larger true effect (effect size). The farther the true parameter is from the null value, the easier it is to detect, so power rises. This is not under the experimenter's control, but it explains why small effects need large samples.
Larger $\alpha$. Loosening the rejection threshold rejects $H_0$ more readily, raising power, but at the cost of a higher Type I error rate. So this lever trades one error for the other rather than improving the test for free.

Reduced variability (for example, a less variable population or a better design) also raises power by shrinking the standard error, the same mechanism as a larger $n$.
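The three levers can be checked numerically. The sketch below assumes an upper-tailed one-proportion z-test and the normal approximation to the sampling distribution of the sample proportion. The function name `power_one_sided` and every number passed to it (null value 0.30, true values 0.36 and 0.42, sample sizes 200 and 800) are hypothetical, picked only to illustrate the direction of each effect.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def power_one_sided(p0: float, p_true: float, n: int, alpha: float) -> float:
    """Approximate power of the test H0: p = p0 versus Ha: p > p0.

    Assumes the normal approximation for the sample proportion.
    """
    se_null = sqrt(p0 * (1 - p0) / n)          # standard error if H0 is true
    se_true = sqrt(p_true * (1 - p_true) / n)  # standard error at the true p
    # Smallest sample proportion that leads to rejecting H0.
    cutoff = p0 + STD_NORMAL.inv_cdf(1 - alpha) * se_null
    # Probability the sample proportion exceeds the cutoff when p = p_true.
    return 1 - STD_NORMAL.cdf((cutoff - p_true) / se_true)


baseline = power_one_sided(p0=0.30, p_true=0.36, n=200, alpha=0.05)
bigger_n = power_one_sided(p0=0.30, p_true=0.36, n=800, alpha=0.05)
bigger_effect = power_one_sided(p0=0.30, p_true=0.42, n=200, alpha=0.05)
bigger_alpha = power_one_sided(p0=0.30, p_true=0.36, n=200, alpha=0.10)

print(f"baseline:      {baseline:.3f}")
print(f"larger n:      {bigger_n:.3f}")
print(f"larger effect: {bigger_effect:.3f}")
print(f"larger alpha:  {bigger_alpha:.3f}")
```

Each of the three changes gives a power above the baseline, but only the larger $\alpha$ does so by also raising the Type I error rate.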

Errors and power in context

A factory tests whether a machine's defect rate exceeds the target, $H_0: p = 0.05$ versus $H_a: p > 0.05$, where $p$ is the true defect proportion, using $\alpha = 0.05$. (a) Describe both error types in context. (b) Give a consequence of each. (c) The manager wants higher power without a higher false-alarm rate. Justify one change.

Step 1. Type I error (part a)

A Type I error is rejecting $H_0$ when it is true: concluding the defect rate exceeds $0.05$ when it actually equals $0.05$. If $H_0$ is true, the probability of making this error is $\alpha = 0.05$.
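A short simulation can make the meaning of $\alpha$ concrete: generate many samples from a machine whose defect rate really is $0.05$ and count how often the z-test rejects anyway. This is a sketch under assumptions that are not part of the question: the sample size of 500 components, the 2000 repetitions, and the random seed are all hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist

P0 = 0.05      # null defect rate from the question
ALPHA = 0.05   # significance level from the question
N = 500        # hypothetical number of components per sample
REPS = 2000    # hypothetical number of simulated samples

CRITICAL_Z = NormalDist().inv_cdf(1 - ALPHA)  # upper-tail cutoff, about 1.645


def rejects_h0(defects: int, n: int) -> bool:
    """Upper-tailed one-proportion z-test of H0: p = P0."""
    p_hat = defects / n
    z = (p_hat - P0) / sqrt(P0 * (1 - P0) / n)
    return z > CRITICAL_Z


def simulated_type_i_rate(seed: int = 1) -> float:
    """Share of samples that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(REPS):
        # Each component is defective with probability P0, so H0 holds.
        defects = sum(rng.random() < P0 for _ in range(N))
        rejections += rejects_h0(defects, N)
    return rejections / REPS


print(f"Simulated Type I error rate: {simulated_type_i_rate():.3f}")
```

The simulated rate should land near $0.05$ rather than exactly on it, because the defect count is discrete and the z-test relies on a normal approximation.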

Step 2. Type II error (part a)

A Type II error is failing to reject $H_0$ when it is false: failing to find convincing evidence that the defect rate exceeds $0.05$ when in fact it does. Its probability is $\beta$, which depends on how far the true defect rate lies above $0.05$.

Step 3. Consequences (part b)

Type I consequence: the line is stopped or reworked unnecessarily, wasting time and money on a machine that is fine. Type II consequence: a genuinely faulty machine keeps running, so defective products reach customers.

Step 4. Raising power without raising $\alpha$ (part c)

Increase the number of components inspected. A larger $n$ shrinks the standard error $\sqrt{p_0(1-p_0)/n}$, so a truly elevated defect rate tends to produce a larger z-statistic and is detected more often. This raises power ($1 - \beta$) while $\alpha$ stays at $0.05$.
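To see the size of this effect, compare the standard error at two sample sizes. The values $n = 100$ and $n = 400$ are hypothetical, chosen only for illustration; the question does not give a sample size.

$$
\sqrt{\frac{0.05(0.95)}{100}} \approx 0.0218, \qquad \sqrt{\frac{0.05(0.95)}{400}} \approx 0.0109
$$

Quadrupling $n$ halves the standard error, so the same observed excess $\hat{p} - 0.05$ gives a z-statistic twice as large.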

Try this

Q1. Define a Type II error and name its consequence in a test of whether a treatment works. [2 points]

Cue. Failing to reject $H_0$ when the treatment truly works (a missed effect); consequence: an effective treatment is not adopted, so its benefits are lost.

Q2. Name two ways to increase power, and which one also raises the Type I error rate. [2 points]

Cue. Increase the sample size (does not change $\alpha$) and increase $\alpha$ (which does raise the Type I error rate). A larger true effect also raises power.

Exam-style practice questions

Practice questions written in the style of College Board exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AP 2019 (style), 1 mark, Section I (multiple choice). In a test, rejecting $H_0$ when $H_0$ is actually true is (A) a Type II error (B) a Type I error (C) the power (D) a correct decision

The correct answer is (B).

A Type I error is rejecting a true $H_0$ (a false positive). Its probability equals the significance level $\alpha$.

(A) is the reverse: a Type II error is failing to reject a false $H_0$. (C) power is the probability of correctly rejecting a false $H_0$. (D) describes a correct decision, not an error.

AP 2021 (style), 4 marks, Section II (free response). A drug regulator tests $H_0$: a new drug is no more effective than the standard, against $H_a$: it is more effective. (a) Describe a Type I and a Type II error in this context. (b) State a real-world consequence of each. (c) The regulator wants to reduce the chance of a Type II error without increasing $\alpha$. Justify in context one change that would do this.

A 4-point errors-in-context question.

(a) (2 points) Type I error: concluding the new drug is more effective when it actually is not. Type II error: failing to conclude the new drug is more effective when it actually is.
(b) (1 point) Type I consequence: a useless (or costly) drug is approved or promoted, exposing patients to risk or cost for no benefit. Type II consequence: a genuinely better drug is not adopted, so patients miss out on improved treatment.
(c) (1 point) Increase the sample size. A larger $n$ increases the power of the test (reduces the Type II error rate) without changing $\alpha$, because it shrinks the standard error and makes a true effect easier to detect.

Markers reward correct directional descriptions of each error, plausible consequences, and identifying a larger $n$ (or reduced variability through a better design) as the way to raise power without raising $\alpha$. A larger true effect also raises power, but it is not something the regulator can choose.

Sources & how we know this

AP Statistics Course and Exam Description — College Board (2020)