
How can we estimate a probability by simulating a random process many times?

Topic 4.2 Estimating Probabilities Using Simulation: design and carry out a simulation using a chance device or random numbers to estimate a probability as a long-run relative frequency.

A focused answer to AP Statistics Topic 4.2, on designing and running simulations with random numbers to estimate probabilities, the four-step simulation method, and reading the estimate as a long-run relative frequency.

Generated by Claude Opus 4.89 min answer

Reviewed by: AI editorial process; not yet individually human-reviewed

What this topic is asking

The College Board (Topic 4.2) wants you to design and carry out a simulation, using a chance device or random numbers, to estimate a probability as a long-run relative frequency, and to interpret what one trial and many trials produce.

What a simulation is

Simulation is the practical companion to Topic 4.1: where 4.1 says probability is a long-run relative frequency, 4.2 says you can therefore estimate any probability by generating that long run artificially. It is especially valuable for complex situations (collecting all prizes, waiting times, multi-step processes) where an exact formula is awkward, but the College Board also uses it to build intuition for the rules that come next.

The four-step method

Every full-credit simulation answer touches all four steps. The most common omission is a vague stopping rule, so be explicit about when a trial ends (for example, "stop when all four prizes have appeared" or "read exactly five digits, one per patient").

Assigning digits correctly

The heart of a good simulation is the digit assignment, because it encodes the probabilities. If an event has probability p expressed in tenths, assign that many of the ten digits 0 to 9 to it; for hundredths, read digits in pairs (00 to 99) and assign the right number of the 100 pairs. For a probability like 0.3, any three of the ten digits work (commonly 0, 1, 2); the choice of which three does not matter, only that exactly three are used. When a probability does not divide the digits evenly (such as 1/3), you ignore some digit combinations and read on, so that the outcomes you keep stay in the correct ratio. Modelling several independent events in one trial just means reading more digits, one block per event, with each block using the same assignment. Getting this mapping faithful to the stated probabilities is what makes the simulation a valid model rather than an arbitrary game.
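The digit assignments described above can be sketched in Python. This is only an illustration under stated assumptions: Python's `random` module stands in for a random digit table, the helper names (`random_digit`, `event_occurs`) are hypothetical, and the 1/3 mapping shown (digits 1 to 3 mean the event, 4 to 9 mean not, 0 is ignored) is one valid choice among several.

```python
import random

def random_digit(rng):
    """Return one random digit 0-9 (assumed stand-in for a random digit table)."""
    return rng.randrange(10)

def event_occurs(rng, event_digits, ignored_digits=frozenset()):
    """Read digits until one is not ignored; report whether it is an event digit."""
    while True:
        d = random_digit(rng)
        if d in ignored_digits:
            continue  # skip this digit and read the next one
        return d in event_digits

rng = random.Random(1)  # seed fixed only so the run is repeatable
n_trials = 10_000

# Probability 0.3: three of the ten digits (here 0, 1, 2) represent the event.
p03 = sum(event_occurs(rng, {0, 1, 2}) for _ in range(n_trials)) / n_trials

# Probability 1/3: digits 1-3 mean the event, 4-9 mean not, 0 is ignored,
# so 3 of the 9 kept digits represent the event.
p13 = sum(event_occurs(rng, {1, 2, 3}, ignored_digits={0})
          for _ in range(n_trials)) / n_trials

print(f"estimate for 0.3: {p03:.3f}")
print(f"estimate for 1/3: {p13:.3f}")
```

Both estimates should land close to the stated probabilities, which is a quick check that the mapping is faithful.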

Reading the estimate

After many trials, the estimate is simply a relative frequency: the number of trials in which the event happened, divided by the number of trials. Because it is a long-run relative frequency, it is an estimate of the true probability and, by the law of large numbers, it improves as you run more trials, which is why exam answers note that "more trials give a more reliable estimate." When the question asks for an expected value (such as the average number of boxes needed), you record a count each trial and average those counts instead of counting successes. Either way, you should interpret the result in context: not "the estimate is 0.42" but "we estimate about a 42% chance that ..." or "on average about 8 boxes are needed." This habit of translating the numerical estimate back into the situation is what the College Board rewards, and it mirrors the interpret-in-context discipline that runs through every unit.
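To see the long-run behaviour, a rough sketch can compute the relative frequency for increasing numbers of trials. It assumes an event with true probability 0.3 modelled by the digits 0, 1, 2; the function name `estimate` is hypothetical and the seed is fixed only for repeatability.

```python
import random

def estimate(rng, n_trials, event_digits=frozenset({0, 1, 2})):
    """Relative frequency: trials where the event happened / number of trials."""
    hits = sum(rng.randrange(10) in event_digits for _ in range(n_trials))
    return hits / n_trials

rng = random.Random(42)
for n in (10, 100, 1_000, 10_000, 100_000):
    # Small n can stray far from 0.3; large n tends to settle near it.
    print(f"{n:>7} trials: estimate = {estimate(rng, n):.4f}")
```

The printed estimates typically wander for small trial counts and tighten around 0.3 as the count grows, which is the law of large numbers in action rather than a guarantee for any single run.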

Try this

Q1. To simulate an event with probability 0.6 using single random digits, how many digits should represent the event? Give a valid assignment. [2 points]

  • Cue. Six of the ten digits; for example let 0 to 5 mean the event and 6 to 9 mean not (any six distinct digits work).

Q2. Why does running more trials of a simulation give a better probability estimate? [1 point]

  • Cue. The estimate is a long-run relative frequency, and by the law of large numbers it converges to the true probability as the number of trials increases.

Exam-style practice questions

Practice questions written in the style of College Board exam questions on this dot point, with worked answer explainers. The year tag is the paper they imitate, not the source.

AP 2018 (style), 1 mark, Section I (multiple choice). To simulate whether each of 5 patients (each with a 0.3 chance of recovery) recovers, which assignment of single random digits works?
(A) Let 0 to 2 mean recover, 3 to 9 mean not
(B) Let 0 to 4 mean recover, 5 to 9 mean not
(C) Let 1 to 3 mean recover, 4 to 9 and 0 mean not
(D) Both (A) and (C)

The correct answer is (D).

A recovery probability of 0.3 means 3 of the 10 digits 0 to 9 should represent recovery. Option (A) uses digits 0, 1, 2 (three digits) and (C) uses digits 1, 2, 3 (three digits); both assign exactly 3/10 = 0.3 to recovery.

(B) assigns 5 digits (0 to 4), giving probability 0.5, which is wrong. Any choice of exactly 3 distinct digits for recovery works, so (A) and (C) are both valid, making (D) correct.
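The comparison of options can be checked mechanically. The sketch below assumes the three assignments from options (A), (B) and (C), and uses hypothetical helper names (`implied_probability`, `one_trial`); one trial reads exactly five digits, one per patient.

```python
import random

# Digits that mean "recover" under each option.
ASSIGNMENTS = {
    "A": {0, 1, 2},
    "B": {0, 1, 2, 3, 4},
    "C": {1, 2, 3},
}

def implied_probability(recover_digits):
    """Probability of recovery that a single-digit assignment encodes."""
    return len(recover_digits) / 10

def one_trial(rng, recover_digits, n_patients=5):
    """Read one digit per patient; return how many of them recover."""
    return sum(rng.randrange(10) in recover_digits for _ in range(n_patients))

rng = random.Random(2018)  # seed fixed only for repeatability
for label, digits in ASSIGNMENTS.items():
    verdict = "valid" if len(digits) == 3 else "invalid"
    print(f"({label}) implied probability {implied_probability(digits):.1f}: {verdict}; "
          f"one trial gives {one_trial(rng, digits)} recoveries out of 5")
```

Only the count of digits assigned to recovery decides validity, which is why (A) and (C) agree even though they use different digits.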

AP 2021 (style), 4 marks, Section II (free response). A cereal box contains one of 4 equally likely prizes. A child wants all 4. (a) Describe how to use a random digit table to simulate buying boxes until all 4 prizes are collected. (b) Explain what one trial of the simulation produces. (c) Explain how repeating the simulation many times estimates the average number of boxes needed, justifying in context.

A 4-point question on simulation design.

(a) (1 point) Let digits 1, 2, 3, 4 represent the four prizes and ignore digits 0, 5, 6, 7, 8, 9 (or assign two digits per prize). Read digits one at a time, recording which prize each represents, until all four prizes have appeared.
(b) (1 point) One trial produces the number of boxes (valid digits read) needed to collect all four prizes that time.
(c) (2 points) Repeat the trial many times and record the boxes needed each time (1 point); the average of those counts estimates the expected number of boxes needed, and by the law of large numbers this estimate improves with more trials (1 point, in context).

Markers reward a valid digit assignment with a clear stopping rule, stating what one trial yields, and averaging over many trials with the long-run justification.
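The worked answer above can be sketched as a short Python simulation. It assumes the first assignment from part (a) (digits 1 to 4 are prizes, all other digits are ignored and do not count as boxes), uses Python's `random` module in place of a random digit table, and the function name `boxes_until_complete` is hypothetical.

```python
import random

def boxes_until_complete(rng, n_prizes=4):
    """One trial: count boxes bought until every prize has appeared."""
    seen = set()
    boxes = 0
    while len(seen) < n_prizes:      # stopping rule: all prizes collected
        d = rng.randrange(10)        # read the next random digit
        if not 1 <= d <= n_prizes:
            continue                 # ignored digit: not a box, read again
        boxes += 1                   # a valid digit is one box bought
        seen.add(d)                  # record which prize it contained
    return boxes

rng = random.Random(2021)            # seed fixed only for repeatability
counts = [boxes_until_complete(rng) for _ in range(10_000)]
average = sum(counts) / len(counts)
print(f"estimated average number of boxes: {average:.2f}")
```

With many trials the average should settle near the theoretical expected value, 4(1 + 1/2 + 1/3 + 1/4) = 25/3, or about 8.3 boxes, so in context: on average a child needs to buy about 8 boxes to collect all four prizes.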
