How can we estimate a probability by simulating a random process many times?
Topic 4.2 Estimating Probabilities Using Simulation: design and carry out a simulation using a chance device or random numbers to estimate a probability as a long-run relative frequency.
A focused answer to AP Statistics Topic 4.2, on designing and running simulations with random numbers to estimate probabilities, the four-step simulation method, and reading the estimate as a long-run relative frequency.
Reviewed by: AI editorial process; not yet individually human-reviewed
What this topic is asking
The College Board (Topic 4.2) wants you to design and carry out a simulation, using a chance device or random numbers, to estimate a probability as a long-run relative frequency, and to interpret what one trial and many trials produce.
What a simulation is
Simulation is the practical companion to Topic 4.1: where 4.1 says probability is a long-run relative frequency, 4.2 says you can therefore estimate any probability by generating that long run artificially. It is especially valuable for complex situations (collecting all prizes, waiting times, multi-step processes) where an exact formula is awkward, but the College Board also uses it to build intuition for the rules that come next.
The four-step method
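Every full-credit simulation answer touches all four steps: assign digits (or a chance device) to the outcomes, define one trial and its stopping rule, repeat the trial many times, and report the resulting relative frequency or average in context. The most common omission is a vague stopping rule, so be explicit about when a trial ends (for example, "stop when all four prizes have appeared" or "read exactly five digits, one per patient").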
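The steps can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the recovery probability of 0.3, the digit set `RECOVER_DIGITS`, and the event "at least two of five patients recover" are assumptions chosen for the example, and `one_trial` and `estimate_at_least` are hypothetical helper names.

```python
import random

RECOVER_DIGITS = {0, 1, 2}   # assumed assignment: 3 of 10 digits, so P(recover) = 0.3
PATIENTS_PER_TRIAL = 5       # stopping rule: read exactly five digits, one per patient


def one_trial(rng):
    """One trial: read five random digits and count how many mean 'recover'."""
    digits = [rng.randint(0, 9) for _ in range(PATIENTS_PER_TRIAL)]
    return sum(d in RECOVER_DIGITS for d in digits)


def estimate_at_least(k, n_trials, seed=0):
    """Repeat the trial and return the relative frequency of 'at least k recover'."""
    rng = random.Random(seed)
    hits = sum(one_trial(rng) >= k for _ in range(n_trials))
    return hits / n_trials


estimate = estimate_at_least(k=2, n_trials=10_000)
print(f"Estimated P(at least 2 of 5 recover) = {estimate:.3f}")
```

Under these assumptions the exact binomial probability is about 0.472, so the printed estimate should land close to that value.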
Assigning digits correctly
The heart of a good simulation is the digit assignment, because it encodes the probabilities. If an event's probability is expressed in tenths, assign that many of the ten digits 0 to 9 to it; for hundredths, read digits in pairs (00 to 99) and assign the right number of pairs. For a probability like 0.3, any three of the ten digits work (for example 0, 1, 2); which three you choose does not matter, only that exactly three are used. When a probability does not divide the digits evenly (such as 1/3), ignore some digits and read on, so that the digits you keep stay in the correct ratio: for 1/3, let 1 to 3 mean the event, 4 to 9 mean not, and skip 0. Modelling several independent events in one trial just means reading more digits, one block per event, with each block using the same assignment. Getting this mapping faithful to the stated probabilities is what makes the simulation a valid model rather than an arbitrary game.
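The "ignore and re-read" idea can be checked with a short sketch. It assumes a probability of 1/3 as the illustrative case, with digits 1 to 3 meaning the event, 4 to 9 meaning not, and 0 skipped; the names `next_outcome` and `event_fraction` are hypothetical.

```python
import random

EVENT_DIGITS = {1, 2, 3}                # three digits stand for the event
NON_EVENT_DIGITS = {4, 5, 6, 7, 8, 9}   # six digits stand for "not the event"
# Digit 0 is in neither set, so it is skipped and the next digit is read.


def next_outcome(rng):
    """Read digits until one is usable; return True if it means the event."""
    while True:
        digit = rng.randint(0, 9)
        if digit in EVENT_DIGITS:
            return True
        if digit in NON_EVENT_DIGITS:
            return False
        # Otherwise the digit is ignored and the loop reads another one.


def event_fraction(n_outcomes, seed=0):
    """Fraction of kept outcomes that were the event."""
    rng = random.Random(seed)
    return sum(next_outcome(rng) for _ in range(n_outcomes)) / n_outcomes


print(f"Fraction of kept outcomes that are the event: {event_fraction(30_000):.3f}")
```

Because the kept digits split 3 to 6, the fraction should settle near 1/3 even though a tenth of the digits read are discarded.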
Reading the estimate
After many trials, the estimate is simply a relative frequency: the number of trials in which the event happened, divided by the total number of trials. Because it is a long-run relative frequency, it is an estimate of the true probability, and by the law of large numbers it tends to get closer to that probability as you run more trials. That is why exam answers note that "more trials give a more reliable estimate." When the question asks for an expected value (such as the average number of boxes needed), you record a count each trial and average those counts instead of counting successes. Either way, interpret the result in context: for example, not "the estimate is 0.31" but "we estimate about a 31% chance that ...", or "on average about 8.3 boxes are needed." This habit of translating the numerical estimate back into the situation is what the College Board rewards, and it mirrors the interpret-in-context discipline that runs through every unit.
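A small sketch shows how the relative frequency behaves as the number of trials grows. The true probability `TRUE_P = 0.3` is an assumed value, known here only so the error can be displayed, and `relative_frequency` is a hypothetical helper.

```python
import random


def relative_frequency(p, n_trials, rng):
    """Fraction of n_trials in which an event of probability p occurred."""
    successes = sum(rng.random() < p for _ in range(n_trials))
    return successes / n_trials


rng = random.Random(42)
TRUE_P = 0.3  # assumed value for illustration
for n in (10, 100, 1_000, 10_000, 100_000):
    est = relative_frequency(TRUE_P, n, rng)
    print(f"{n:>7} trials: estimate {est:.4f}, error {abs(est - TRUE_P):.4f}")
```

The error is not guaranteed to shrink at every step, but over the long run it tends toward zero, which is the behavior the law of large numbers describes.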
Try this
Q1. To simulate an event with probability 0.6 using single random digits, how many digits should represent the event? Give a valid assignment. [2 points]
- Cue. Six of the ten digits; for example, let 0 to 5 mean the event and 6 to 9 mean not (any six distinct digits work).
Q2. Why does running more trials of a simulation give a better probability estimate? [1 point]
- Cue. The estimate is a long-run relative frequency, and by the law of large numbers it converges to the true probability as the number of trials increases.
Exam-style practice questions
Practice questions written in the style of College Board exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.
AP 2018 (style), 1 mark, Section I (multiple choice). To simulate whether each of 5 patients (each with a 30% chance of recovery) recovers, which assignment of single random digits works?
(A) Let 0 to 2 mean recover, 3 to 9 mean not
(B) Let 0 to 3 mean recover, 4 to 9 mean not
(C) Let 1 to 3 mean recover, 4 to 9 and 0 mean not
(D) Both (A) and (C)
The correct answer is (D).
A recovery probability of 0.3 means 3 of the 10 digits 0 to 9 should represent recovery. Option (A) uses digits 0, 1, 2 (three digits) and (C) uses digits 1, 2, 3 (three digits); both assign exactly 3/10 to recovery.
(B) assigns 4 digits (0 to 3), giving probability 0.4, which is wrong. Any choice of exactly 3 distinct digits for recovery works, so (A) and (C) are both valid, making (D) correct.
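The reasoning in this answer amounts to counting digits, which a short check makes explicit. The digit sets below are assumptions for this sketch (three digits for options A and C, four for option B, against a target recovery probability of 3/10), and `implied_probability` is a hypothetical helper.

```python
from fractions import Fraction

ALL_DIGITS = set(range(10))

# Digits that mean "recover" under each option (assumed for this sketch).
options = {
    "A": {0, 1, 2},
    "B": {0, 1, 2, 3},
    "C": {1, 2, 3},
}


def implied_probability(recover_digits):
    """Probability of 'recover' when each of the ten digits is equally likely."""
    assert recover_digits <= ALL_DIGITS
    return Fraction(len(recover_digits), len(ALL_DIGITS))


target = Fraction(3, 10)
for label, digits in options.items():
    p = implied_probability(digits)
    verdict = "valid" if p == target else "not valid"
    print(f"({label}) implies P(recover) = {float(p)}: {verdict}")
```

Only the size of each set matters, not which digits it contains, which is why two different three-digit choices are both acceptable.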
AP 2021 (style), 4 marks, Section II (free response). A cereal box contains one of 4 equally likely prizes. A child wants all 4.
(a) Describe how to use a random digit table to simulate buying boxes until all prizes are collected.
(b) Explain what one trial of the simulation produces.
(c) Explain how repeating the simulation many times estimates the average number of boxes needed, justifying in context.
A 4-point question on simulation design.
(a) (1 point) Let digits 1 to 4 represent the four prizes and ignore digits 0 and 5 to 9 (or assign two digits per prize and ignore the remaining two). Read digits one at a time, recording which prize each represents, until all four prizes have appeared.
(b) (1 point) One trial produces the number of boxes (valid digits read) needed to collect all four prizes that time.
(c) (2 points) Repeat the trial many times and record the boxes needed each time (1 point); the average of those counts estimates the expected number of boxes needed, and by the law of large numbers this estimate improves with more trials (1 point, in context).
Markers reward a valid digit assignment with a clear stopping rule, stating what one trial yields, and averaging over many trials with the long-run justification.
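The design in parts (a) to (c) can be run as a short program. This is a sketch under the stated assignment (digits 1 to 4 are the prizes, all other digits are ignored), using Python's pseudo-random digits in place of a printed random digit table; `boxes_until_complete` and `average_boxes` are hypothetical names.

```python
import random

NUM_PRIZES = 4
PRIZE_DIGITS = {1, 2, 3, 4}  # digits 0 and 5 to 9 are ignored


def boxes_until_complete(rng):
    """One trial: count valid digits read until all four prizes have appeared."""
    seen = set()
    boxes = 0
    while len(seen) < NUM_PRIZES:   # stopping rule: all four prizes seen
        digit = rng.randint(0, 9)
        if digit not in PRIZE_DIGITS:
            continue                # ignored digit: not a box, read the next one
        boxes += 1
        seen.add(digit)
    return boxes


def average_boxes(n_trials, seed=0):
    """Repeat the trial and average the box counts."""
    rng = random.Random(seed)
    counts = [boxes_until_complete(rng) for _ in range(n_trials)]
    return sum(counts) / n_trials


print(f"Estimated average boxes: {average_boxes(10_000):.2f}")
```

For reference, the exact expected value for four equally likely prizes is 4(1 + 1/2 + 1/3 + 1/4) = 25/3, about 8.33 boxes, so the simulated average should fall near that figure.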
Related dot points
- Topic 4.1 Introducing Statistics: Random and Non-Random Patterns? Recognize that random processes produce patterns, and that probability provides the framework for deciding whether an observed pattern is surprising or consistent with chance.
A focused answer to AP Statistics Topic 4.1, on why random processes still produce patterns, what randomness and short-run versus long-run behavior mean, and how probability frames whether an observed pattern is surprising.
- Topic 4.3 Introduction to Probability: apply the basic properties of probability (range, total of one, complement rule) and the law of large numbers to compute and interpret probabilities of events.
A focused answer to AP Statistics Topic 4.3, on the basic axioms of probability, the complement rule, sample spaces and equally likely outcomes, and the law of large numbers, with worked complement and basic probability calculations.
- Topic 4.4 Mutually Exclusive Events: identify mutually exclusive (disjoint) events and apply the addition rule, including the general addition rule that subtracts the overlap, to find the probability of a union.
A focused answer to AP Statistics Topic 4.4, defining mutually exclusive (disjoint) events, the addition rule for disjoint events, and the general addition rule that subtracts the intersection, with worked union calculations.
- Topic 4.10 Introduction to the Binomial Distribution: identify binomial settings (BINS conditions) and use the binomial probability formula to find the probability of a given number of successes in a fixed number of trials.
A focused answer to AP Statistics Topic 4.10, on the binomial setting (the BINS conditions), the binomial probability formula, and computing exact and cumulative binomial probabilities, with full worked calculations.
- Topic 4.12 The Geometric Distribution: identify a geometric setting (waiting for the first success), compute geometric probabilities, and find the mean of a geometric random variable.
A focused answer to AP Statistics Topic 4.12, on the geometric setting, the geometric probability formula, the mean of a geometric random variable, and how it differs from the binomial, with full worked calculations.
Sources & how we know this
- AP Statistics Course and Exam Description — College Board (2020)