Binomial Distribution Exercises in R: 17 Practice Problems
Seventeen runnable binomial distribution exercises in R, grouped by the four core functions (dbinom, pbinom, qbinom, rbinom) plus binom.test inference. Each problem ships with a hidden solution and a written explanation so you can self-check the moment you finish coding.
Section 1. Exact probabilities with dbinom (3 problems)
Exercise 1.1: Probability of exactly 5 heads in 10 fair flips
Task: A friend hands you a coin claimed to be fair and asks for the probability of getting exactly 5 heads in 10 flips. Use dbinom() with size = 10 and prob = 0.5 to compute $P(X = 5)$. Save the scalar to ex_1_1.
Expected result:
#> [1] 0.2460938
Difficulty: Beginner
The phrase "exactly 5" points to the mass at a single specific count, not the chance of a whole range of counts.
Reach for the probability-mass function: pass the count 5 as the first argument, then size = 10 and prob = 0.5.
Explanation: dbinom(x, size, prob) returns the probability mass at exactly x successes. The first argument is the count you want a probability for, not a vector of trial outcomes. For a fair coin the answer is $\binom{10}{5} \cdot 0.5^{10} = 252 / 1024 \approx 0.2461$. A common mistake is using pbinom() here, which would return the cumulative probability $P(X \le 5)$ instead of the exact-equality mass.
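A minimal sketch of one possible solution, with the closed-form count from the explanation as a cross-check. The name by_hand is only illustrative and is not part of the exercise.

```r
# Probability of exactly 5 heads in 10 fair flips
ex_1_1 <- dbinom(5, size = 10, prob = 0.5)

# Hand check: choose(10, 5) * 0.5^10 = 252 / 1024
by_hand <- choose(10, 5) * 0.5^10

ex_1_1
#> [1] 0.2460938
```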
Exercise 1.2: Quality control: exactly 2 defective widgets in a batch of 50
Task: A quality engineer at a parts supplier samples 50 widgets from a production line with a 3 percent defect rate. Compute the probability that exactly 2 widgets in the sample are defective using dbinom(). Save the result to ex_1_2.
Expected result:
#> [1] 0.2555182
Difficulty: Intermediate
A 3 percent defect rate is the per-trial success probability, and you want the chance of one exact defective count in the batch.
Use dbinom() with x = 2, size = 50, and prob = 0.03.
Explanation: Every trial is one widget, success means defective, and the trials are assumed independent with constant probability 0.03, which is the textbook binomial setup. At about 25.5 percent, a count of 2 is one of the likeliest outcomes near the mean $np = 1.5$, though the single most likely count is 1, with probability about 33.7 percent. If the defect rate were estimated from a small pilot rather than known, you would want a Beta-binomial or a credible interval on $p$ before treating the answer as exact.
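One way to write the solution, sketched alongside the masses at the neighbouring counts so you can see where a count of 2 sits in the distribution. The neighbours vector is an illustrative extra, not something the task asks for.

```r
# Probability of exactly 2 defectives among 50 widgets at a 3 percent rate
ex_1_2 <- dbinom(2, size = 50, prob = 0.03)

# Masses at counts 0 through 3 for comparison
neighbours <- dbinom(0:3, size = 50, prob = 0.03)
names(neighbours) <- 0:3

ex_1_2
round(neighbours, 4)
```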
Exercise 1.3: Plot the PMF of Binomial(20, 0.3) with a bar chart
Task: Build a bar chart showing $P(X = k)$ for $k = 0, 1, \ldots, 20$ under Binomial(size = 20, prob = 0.3). Use ggplot2 with geom_col(), label the axes, and save the ggplot object to ex_1_3.
Expected result:
# A ggplot bar chart with k on the x-axis (0..20) and dbinom(k, 20, 0.3) on the y-axis.
# Peak bar sits around k=6 (the mean np=6). Tails are visibly skinny beyond k=12.
# Axes: x = "Successes (k)", y = "P(X = k)".
Difficulty: Intermediate
Ask for the probability at every possible success count at once, so you have a full table of point probabilities to draw as bars.
Build a data frame with k = 0:20 and dbinom(k, size = 20, prob = 0.3), then plot it with ggplot() and geom_col().
Explanation: Passing a vector 0:20 to dbinom() vectorizes the call and returns 21 probabilities in one shot, which is the cleanest way to build a PMF table for plotting. The peak at $k = 6$ matches $np$, the mean of the distribution. For larger size the bars start to look bell-shaped, foreshadowing the normal approximation you will meet in Exercise 5.3.
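A possible solution sketch. The table name pmf_tbl and the requireNamespace() guard are assumptions added here so the snippet also runs on a machine without ggplot2 installed; the exercise itself simply expects ggplot2 to be available.

```r
# PMF table for Binomial(20, 0.3): one row per success count k
pmf_tbl <- data.frame(k = 0:20)
pmf_tbl$prob <- dbinom(pmf_tbl$k, size = 20, prob = 0.3)

# Build the bar chart only when ggplot2 is installed (guard is an assumption)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  ex_1_3 <- ggplot2::ggplot(pmf_tbl, ggplot2::aes(x = k, y = prob)) +
    ggplot2::geom_col() +
    ggplot2::labs(x = "Successes (k)", y = "P(X = k)")
}

# Sanity checks on the table itself
sum(pmf_tbl$prob)                    # the 21 masses should add up to 1
pmf_tbl$k[which.max(pmf_tbl$prob)]   # location of the tallest bar
```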
Section 2. Cumulative and tail probabilities with pbinom (4 problems)
Exercise 2.1: At most 2 heads in 10 fair flips
Task: Continuing with the fair-coin scenario, compute the probability of getting at most 2 heads in 10 flips. Use pbinom() to evaluate $P(X \le 2)$ with size = 10 and prob = 0.5. Save the scalar to ex_2_1.
Expected result:
#> [1] 0.0546875
Difficulty: Beginner
"At most 2" is a cumulative range running from 0 up through 2, not the mass at a single count.
Use pbinom() with q = 2, size = 10, prob = 0.5; the default lower tail already gives P(X <= q).
Explanation: pbinom(q, size, prob) returns the lower-tail probability $P(X \le q)$ by default, so passing q = 2 includes the masses at 0, 1, and 2. You could verify by hand with sum(dbinom(0:2, 10, 0.5)), which returns the same value. Many beginners reach for dbinom(2, ...) here and get the at-exactly-2 mass instead of the at-most-2 cumulative.
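A short sketch of the solution together with the hand verification mentioned above. The name check is illustrative only.

```r
# P(X <= 2) for 10 fair flips, using the default lower tail
ex_2_1 <- pbinom(2, size = 10, prob = 0.5)

# Same quantity as a sum of the masses at 0, 1 and 2: (1 + 10 + 45) / 1024
check <- sum(dbinom(0:2, size = 10, prob = 0.5))

ex_2_1
#> [1] 0.0546875
```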
Exercise 2.2: Survival probability: 8 or more successes out of 20
Task: A clinical trial enrolls 20 patients in a single arm, each with response probability 0.30. Compute the probability that 8 or more patients respond using pbinom() with lower.tail = FALSE. Save the upper-tail probability to ex_2_2.
Expected result:
#> [1] 0.2277282
Difficulty: Intermediate
"8 or more" is an upper-tail range, and because the distribution is discrete you must watch which boundary count gets included.
Use pbinom() with lower.tail = FALSE and pass q = 7, so P(X > 7) captures P(X >= 8).
Explanation: The trick is the offset: pbinom(q, ..., lower.tail = FALSE) returns $P(X > q)$, strictly greater than, so to capture $P(X \ge 8)$ you pass q = 7. Equivalently you can write 1 - pbinom(7, 20, 0.3), but the lower.tail = FALSE path is more accurate when the upper tail is tiny, because it avoids the cancellation in 1 - x when x is very close to 1.
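The offset can be checked with a small sketch that computes the same upper tail three ways. The names complement and by_sum are hypothetical helpers used only for the comparison.

```r
# P(X >= 8) = P(X > 7) for Binomial(20, 0.3)
ex_2_2 <- pbinom(7, size = 20, prob = 0.3, lower.tail = FALSE)

# Two equivalent routes to the same number
complement <- 1 - pbinom(7, size = 20, prob = 0.3)
by_sum     <- sum(dbinom(8:20, size = 20, prob = 0.3))

c(upper_tail = ex_2_2, complement = complement, by_sum = by_sum)
```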
Exercise 2.3: Email marketing: open rate between 40 and 60 on 100 sends
Task: A growth team blasts 100 emails with a historical open rate of 0.5 and wants the probability that the number opened lies between 40 and 60 inclusive. Use the difference of two pbinom() calls to compute $P(40 \le X \le 60)$. Save the interval probability to ex_2_3.
Expected result:
#> [1] 0.9647998
Difficulty: Intermediate
An inclusive between-range is the cumulative probability up to the top bound minus everything strictly below the bottom bound.
Subtract two pbinom() calls: pbinom(60, 100, 0.5) minus pbinom(39, 100, 0.5).
Explanation: Inclusive interval probabilities use a discrete-aware offset: $P(a \le X \le b) = P(X \le b) - P(X \le a - 1)$. Passing 40 instead of 39 as the lower cutoff is the most common mistake here, because it drops $P(X = 40)$ from the interval. The answer, roughly 96.5 percent, is close to but not exactly the 95.4 percent that a normal approximation gives for two standard deviations either side of the mean (here $\sigma = 5$) without a continuity correction, which is a reminder that the normal CDF is a smoothing of this step function.
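A sketch of the interval calculation, with a direct sum of masses as a check and the off-by-one variant shown for contrast. The names by_sum and off_by_one are illustrative.

```r
# P(40 <= X <= 60) = P(X <= 60) - P(X <= 39)
ex_2_3 <- pbinom(60, size = 100, prob = 0.5) - pbinom(39, size = 100, prob = 0.5)

# Direct check: add the 21 masses from 40 to 60
by_sum <- sum(dbinom(40:60, size = 100, prob = 0.5))

# Off-by-one version: subtracting P(X <= 40) silently excludes X = 40
off_by_one <- pbinom(60, size = 100, prob = 0.5) - pbinom(40, size = 100, prob = 0.5)

c(correct = ex_2_3, by_sum = by_sum, off_by_one = off_by_one)
```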
Exercise 2.4: Recompute pbinom by summing dbinom
Task: Verify your understanding of the binomial CDF by recomputing $P(X \le 5)$ for Binomial(12, 0.3) two ways: with pbinom(), and by summing dbinom() over $k = 0, 1, \ldots, 5$. Save both results in a named numeric vector with elements cdf and pmf_sum to ex_2_4.
Expected result:
#> cdf pmf_sum
#> 0.8821513 0.8821513
Difficulty: Intermediate
The cumulative probability is simply the running total of the individual point probabilities up to that count.
Combine pbinom(5, size = 12, prob = 0.3) with sum(dbinom(0:5, size = 12, prob = 0.3)) inside c(cdf = ..., pmf_sum = ...).
Explanation: The CDF is defined as the cumulative sum of the PMF, so the two values must agree to floating-point precision. This identity is the reason pbinom() and sum(dbinom(...)) are interchangeable for small size, but pbinom() is preferred for large size because R uses a stable algorithm rather than summing many tiny floats. Exercises like this one are also useful when sanity-checking custom probability code against the built-in family.
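One way to build the two-element vector, sketched with an explicit agreement check at the end.

```r
# CDF value and the equivalent sum of point masses for Binomial(12, 0.3)
ex_2_4 <- c(
  cdf     = pbinom(5, size = 12, prob = 0.3),
  pmf_sum = sum(dbinom(0:5, size = 12, prob = 0.3))
)

ex_2_4

# The two routes should agree up to floating-point tolerance
all.equal(ex_2_4[["cdf"]], ex_2_4[["pmf_sum"]])
```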
Section 3. Quantiles and prediction with qbinom (3 problems)
Exercise 3.1: Find the 95th percentile of Binomial(100, 0.5)
Task: Use qbinom() to find the smallest integer $k$ such that $P(X \le k) \ge 0.95$ when $X \sim \text{Binomial}(100, 0.5)$. Save this critical value to ex_3_1 and verify it returns a numeric value.
Expected result:
#> [1] 58
#> [1] "numeric"
Difficulty: Intermediate
You want the smallest count whose cumulative probability reaches a target level - the inverse direction of a cumulative lookup.
Use qbinom() with p = 0.95, size = 100, prob = 0.5.
Explanation: qbinom(p, ...) returns the smallest integer $k$ with cumulative probability at least p, so it is the discrete inverse of pbinom(). The answer 58 says that in 100 fair flips you would expect to see 58 or fewer heads at least 95 percent of the time. A common confusion is to expect the normal approximation $\mu + 1.645\sigma = 58.2$ to round to 58 exactly, which it does here but does not always.
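A sketch of the solution plus a check of the defining property of the quantile: the CDF reaches 0.95 at the returned count but not one step below it.

```r
# Smallest k with P(X <= k) >= 0.95 for Binomial(100, 0.5)
ex_3_1 <- qbinom(0.95, size = 100, prob = 0.5)

ex_3_1
#> [1] 58
class(ex_3_1)
#> [1] "numeric"

# Defining property of the discrete quantile
pbinom(ex_3_1,     size = 100, prob = 0.5) >= 0.95   # TRUE at k
pbinom(ex_3_1 - 1, size = 100, prob = 0.5) <  0.95   # TRUE at k - 1
```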
Exercise 3.2: Build a 95 percent prediction interval for daily conversions
Task: A site averages 2000 visitors per day with a 5 percent conversion rate. The analytics lead wants a 95 percent prediction interval for the number of daily conversions. Use qbinom() at probabilities 0.025 and 0.975 to compute the lower and upper bounds. Save them as a length-2 numeric vector with names lower and upper to ex_3_2.
Expected result:
#> lower upper
#> 81 120
Difficulty: Advanced
A central 95 percent interval is bounded by the two quantiles that leave 2.5 percent of probability in each tail.
Call qbinom() twice, at p = 0.025 and p = 0.975, with size = 2000 and prob = 0.05, naming the results lower and upper.
Explanation: This is a prediction interval, not a confidence interval. It says where future daily conversion counts should fall under the assumed model, holding prob = 0.05 fixed. A confidence interval would instead quantify uncertainty about prob itself. Reviewers sometimes flag prediction intervals that are too narrow because the analyst forgot to account for parameter uncertainty, especially when prob was estimated from a small sample.
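A possible solution sketch. The coverage line is an extra added for illustration: because the distribution is discrete, the interval built from these two quantiles covers slightly more than 95 percent of the probability.

```r
# Central 95 percent prediction interval for Binomial(2000, 0.05)
ex_3_2 <- c(
  lower = qbinom(0.025, size = 2000, prob = 0.05),
  upper = qbinom(0.975, size = 2000, prob = 0.05)
)

# Actual probability captured by [lower, upper] under the assumed model
coverage <- pbinom(ex_3_2[["upper"]], size = 2000, prob = 0.05) -
  pbinom(ex_3_2[["lower"]] - 1, size = 2000, prob = 0.05)

ex_3_2
coverage
```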
Exercise 3.3: Compare qbinom to the empirical 0.9 quantile of rbinom
Task: Draw 50000 samples from Binomial(40, 0.25) with rbinom(), then compute the 0.9 sample quantile and compare it with qbinom(0.9, 40, 0.25). Save both numbers in a named vector with elements theoretical and empirical to ex_3_3.
Expected result:
#> theoretical empirical
#> 14 14
Difficulty: Intermediate
Compare the count predicted by theory against the same percentile read off a large simulated sample.
Generate draws with rbinom(50000, size = 40, prob = 0.25), then pair qbinom(0.9, 40, 0.25) with quantile(samp, 0.9, type = 1).
Explanation: Empirical and theoretical quantiles should match for large n, and 50000 is well past the regime where Monte Carlo noise matters here. Passing type = 1 to quantile() is important because it uses the inverse of the empirical CDF, which matches the discrete-aware definition qbinom() uses. Other type values interpolate between adjacent order statistics and can drift off by 1 for skewed discrete distributions.
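A sketch of the comparison. The seed value is an arbitrary assumption made here for reproducibility, and names = FALSE is used so the quantile() result does not carry its own "90%" label into the named vector.

```r
set.seed(1)  # arbitrary seed, assumed only so the run is repeatable

# 50000 draws from Binomial(40, 0.25)
samp <- rbinom(50000, size = 40, prob = 0.25)

ex_3_3 <- c(
  theoretical = qbinom(0.9, size = 40, prob = 0.25),
  empirical   = quantile(samp, 0.9, type = 1, names = FALSE)
)

ex_3_3
```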
Section 4. Simulation and parameter estimation with rbinom (4 problems)
Exercise 4.1: Estimate the head probability from 10000 simulated flips
Task: Simulate 10000 Bernoulli trials of a fair coin using rbinom(n = 10000, size = 1, prob = 0.5), then compute the sample mean as your estimate of prob. Save the proportion to ex_4_1.
Expected result:
#> [1] 0.4977 # approximately 0.5; exact value depends on the seed
Difficulty: Beginner
Each simulated flip is a 0 or a 1, and averaging those zeros and ones estimates the underlying success rate.
Generate the trials with rbinom(n = 10000, size = 1, prob = 0.5) and take mean() of the result.
Explanation: Setting size = 1 makes rbinom() generate Bernoulli outcomes (0 or 1), and the sample mean of those is the maximum likelihood estimate of prob. With 10000 draws the standard error is $\sqrt{0.5 \cdot 0.5 / 10000} = 0.005$, so values near 0.495 to 0.505 are routine. Cranking n higher tightens that band, while shrinking n widens it and makes Monte Carlo estimation noticeably noisier.
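A minimal sketch of the simulation, with the standard error from the explanation computed next to it. The seed is an arbitrary assumption, so the estimate you get will differ from the expected result shown above.

```r
set.seed(42)  # arbitrary seed, assumed for reproducibility

# 10000 Bernoulli(0.5) outcomes, each 0 or 1
flips <- rbinom(n = 10000, size = 1, prob = 0.5)

# Sample proportion of heads
ex_4_1 <- mean(flips)

# Theoretical standard error of that proportion
se <- sqrt(0.5 * 0.5 / 10000)

c(estimate = ex_4_1, se = se)
```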
Exercise 4.2: Estimate p with standard error from a sample of 200 customers
Task: A survey draws 200 customers, of whom 56 click the promo banner. Treat this as a single Binomial(200, p) observation. Compute the point estimate $\hat p = 56/200$ and its standard error $\sqrt{\hat p (1 - \hat p) / 200}$. Save both to a named numeric vector with elements phat and se and assign it to ex_4_2.
Expected result:
#> phat se
#> 0.28000 0.03175
Difficulty: Intermediate
The point estimate is just the observed success fraction, and its uncertainty shrinks with the square root of the sample size.
Compute phat = 56 / 200 and se = sqrt(phat * (1 - phat) / 200), then combine them with c(phat = ..., se = ...).
Explanation: $\hat p$ is the sample success proportion and is the MLE of the binomial probability parameter. Its Wald standard error uses $\hat p$ in place of the unknown true $p$, which is why the formula plugs in 0.28 rather than a hypothesized 0.5. For small samples or proportions near 0 or 1, Wald standard errors overstate precision; in those regimes prefer the Wilson score interval returned by prop.test() or the exact Clopper-Pearson interval returned by binom.test().
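A sketch of the calculation, followed by a Wald interval built from the plug-in standard error and the Wilson score interval from prop.test() for comparison. The names wald and wilson are illustrative extras, not part of the task.

```r
n    <- 200
phat <- 56 / n

# Point estimate and Wald standard error
ex_4_2 <- c(phat = phat, se = sqrt(phat * (1 - phat) / n))

# Wald 95 percent interval: estimate plus or minus about 1.96 standard errors
wald <- ex_4_2[["phat"]] + c(-1, 1) * qnorm(0.975) * ex_4_2[["se"]]

# Wilson score interval from prop.test(), continuity correction switched off
wilson <- as.numeric(prop.test(56, n, correct = FALSE)$conf.int)

round(ex_4_2, 5)
rbind(wald = wald, wilson = wilson)
```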
Exercise 4.3: Compare empirical to theoretical mean and variance
Task: Draw 5000 samples from Binomial(20, 0.3) with rbinom(). Compute the sample mean and sample variance, then compute the theoretical values $np$ and $np(1-p)$. Save all four as a named numeric vector with elements emp_mean, emp_var, theo_mean, theo_var and assign to ex_4_3.
Expected result:
#> emp_mean emp_var theo_mean theo_var
#> 6.0098 4.1632 6.0000 4.2000
Difficulty: Intermediate
A simulated sample's average and spread should land near the distribution's theoretical center and dispersion.
Take mean() and var() of rbinom(5000, size = 20, prob = 0.3), and pair them with 20 * 0.3 and 20 * 0.3 * (1 - 0.3).
Explanation: The theoretical mean of $\text{Binomial}(n, p)$ is $np$ and the variance is $np(1 - p)$, both classic results derived from summing $n$ independent Bernoulli variables. Empirical estimates from 5000 draws should be within a few percent of these targets. If you saw the empirical variance systematically above the theoretical value, that would signal overdispersion and you might switch to a Beta-binomial or quasi-binomial fit.
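One way to assemble the four numbers, sketched with an arbitrary seed (an assumption made here), so the empirical values will not match the expected result digit for digit.

```r
set.seed(123)  # arbitrary seed, assumed for reproducibility

n <- 20
p <- 0.3
draws <- rbinom(5000, size = n, prob = p)

ex_4_3 <- c(
  emp_mean  = mean(draws),
  emp_var   = var(draws),
  theo_mean = n * p,            # np
  theo_var  = n * p * (1 - p)   # np(1 - p)
)

round(ex_4_3, 4)
```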
Exercise 4.4: Simulate free-throw streaks for an 85 percent shooter
Task: Simulate one season of 1000 free-throw attempts for a player with a true make rate of 0.85, then count the longest consecutive run of makes (a streak). Use rbinom() to generate the 1000 outcomes and rle() to find the longest run of 1s. Save the longest streak length to ex_4_4.
Expected result:
#> [1] 41 # exact streak depends on the seed; expect tens, not hundreds
Difficulty: Advanced
A streak is a stretch of consecutive identical outcomes, so you need a way to measure runs of equal values in the simulated sequence.
Generate outcomes with rbinom(1000, size = 1, prob = 0.85), pass them to rle(), then take max() of the lengths where values equal 1.
Explanation: rle() returns run-length encoding of a vector: the lengths and the values of consecutive equal stretches, which is exactly the streak structure you want here. Filtering runs$values == 1 keeps the make-streaks and drops the miss-streaks. The expected longest run for $p = 0.85$ and $n = 1000$ is roughly $\log_{1/p}(n(1 - p)) \approx 31$, but values in the high 30s and 40s are routine because the maximum of geometric-like streak lengths has a long right tail.
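A sketch of the streak calculation. The seed and the variable name shots are assumptions made for illustration; with a different seed the longest streak will change.

```r
set.seed(2024)  # arbitrary seed, assumed for reproducibility

# 1000 free throws: 1 = make, 0 = miss
shots <- rbinom(1000, size = 1, prob = 0.85)

# Run-length encoding: lengths and values of consecutive equal stretches
runs <- rle(shots)

# Longest run among the stretches of makes
ex_4_4 <- max(runs$lengths[runs$values == 1])

ex_4_4
```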
Section 5. Inference with binom.test (3 problems)
Exercise 5.1: Two-sided exact test for coin fairness
Task: Suppose 40 heads were observed in 100 flips. Run an exact two-sided test of $H_0: p = 0.5$ with binom.test() and extract the p-value. Save just the p-value as a scalar to ex_5_1.
Expected result:
#> [1] 0.05688793
Difficulty: Intermediate
An exact test of a hypothesized success rate measures how surprising the observed count is, and you only need that single summary number.
Run binom.test(40, 100, p = 0.5, alternative = "two.sided") and pull out the $p.value element.
Explanation: binom.test() returns a list (an htest object) and the p-value lives in $p.value. For two-sided tests R uses the rule of summing probabilities of all outcomes at least as unlikely as the observed one, which is exact and discrete-aware, not the lazy 2 * pbinom(...) shortcut. The 5.7 percent value is famously just over the 5 percent threshold, a good talking point about the arbitrariness of alpha cutoffs.
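A sketch of the solution, together with a hand version of the two-sided rule described above. The small relative tolerance on the cutoff is an assumption added here to guard against floating-point ties; the names masses, cutoff and by_hand are illustrative.

```r
# Exact two-sided test of H0: p = 0.5 with 40 heads in 100 flips
ex_5_1 <- binom.test(40, 100, p = 0.5, alternative = "two.sided")$p.value

# Hand version: add up the mass of every outcome that is
# no more likely than the observed count of 40
masses  <- dbinom(0:100, size = 100, prob = 0.5)
cutoff  <- dbinom(40, size = 100, prob = 0.5) * (1 + 1e-7)  # tie tolerance (assumed)
by_hand <- sum(masses[masses <= cutoff])

c(binom_test = ex_5_1, by_hand = by_hand)
```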
Exercise 5.2: A/B test: 125 clicks out of 1000 against a 10 percent baseline
Task: A product team ships a new landing page expecting at least a 10 percent click-through rate and observes 125 clicks in 1000 visits. Run a two-sided binom.test() of $H_0: p = 0.10$ and extract both the p-value and the 95 percent confidence interval. Save them as a named numeric vector with elements pvalue, ci_lo, ci_hi and assign to ex_5_2.
Expected result:
#> pvalue ci_lo ci_hi
#> 0.00977970 0.10510020 0.14710000
Difficulty: Advanced
The test yields both a measure of surprise against the null and a plausible range for the true success rate.
Run binom.test(125, 1000, p = 0.10), then read $p.value and the two entries of $conf.int into a named vector.
Explanation: The 95 percent interval binom.test() returns is Clopper-Pearson, the exact interval inverted from the binomial CDF. It is wider than the Wald or Wilson alternatives but guarantees coverage at least 95 percent. The interval here excludes the null value 0.10, consistent with the small p-value, but you should still report the effect size (12.5 percent observed versus 10 percent baseline) since statistical and practical significance are not the same.
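A possible solution sketch. The two qbeta() lines reproduce the Clopper-Pearson bounds through their standard beta-quantile form; they are included only as a cross-check and are not required by the task.

```r
# Exact two-sided test of H0: p = 0.10 with 125 clicks in 1000 visits
res <- binom.test(125, 1000, p = 0.10)

ex_5_2 <- c(
  pvalue = res$p.value,
  ci_lo  = res$conf.int[1],
  ci_hi  = res$conf.int[2]
)

# Clopper-Pearson bounds written as beta quantiles (cross-check)
cp_lo <- qbeta(0.025, 125, 1000 - 125 + 1)
cp_hi <- qbeta(0.975, 125 + 1, 1000 - 125)

ex_5_2
c(cp_lo = cp_lo, cp_hi = cp_hi)
```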
Exercise 5.3: Exact binom.test versus normal-approximation prop.test
Task: Run both binom.test(35, 100, p = 0.40) and prop.test(35, 100, p = 0.40, correct = FALSE) and compare their p-values to see how the normal approximation tracks the exact test. Save the two p-values to a named numeric vector with elements exact and normal and assign to ex_5_3.
Expected result:
#> exact normal
#> 0.3267612 0.3074177
Difficulty: Advanced
One test computes the answer exactly from the discrete distribution, the other leans on a normal approximation - compare their summary numbers.
Pull $p.value from both binom.test(35, 100, p = 0.40) and prop.test(35, 100, p = 0.40, correct = FALSE) into a named vector.
Explanation: prop.test() uses a chi-squared statistic that is a normal approximation to the binomial, and correct = FALSE turns off Yates continuity correction so the comparison is apples-to-apples. The two p-values differ by about 0.02 here, a modest gap because $n = 100$ and the success count is comfortably away from 0 and $n$. For very small $n$ or extreme proportions the gap widens and the exact test is the safer default.
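A sketch of the comparison. The z-score lines are an illustrative cross-check showing that the uncorrected prop.test() p-value is the two-sided normal tail for the usual one-sample z statistic; unname() is used defensively so the vector keeps only the names exact and normal.

```r
ex_5_3 <- c(
  exact  = unname(binom.test(35, 100, p = 0.40)$p.value),
  normal = unname(prop.test(35, 100, p = 0.40, correct = FALSE)$p.value)
)

# Normal approximation by hand: z = (x - n p0) / sqrt(n p0 (1 - p0))
z       <- (35 - 100 * 0.40) / sqrt(100 * 0.40 * 0.60)
by_hand <- 2 * pnorm(-abs(z))

ex_5_3
by_hand
```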
What to do next
- Review the parent lesson: Binomial and Poisson Distributions in R for the underlying theory and worked walk-throughs.
- Move on to count-data inference: Poisson Distribution Exercises in R.
- Reinforce the inference muscles with Hypothesis Testing Exercises in R.
- For more probability practice, try Normal Distribution Exercises in R.