The Birthday Problem: A Well-Known Puzzle
In this lesson, we will solve our first puzzle, a well-known problem called the birthday problem. The setup is as follows: there are n people in a room, and we want to know the probability that at least two of them share a birthday. To make this more manageable, we need to make the following assumptions.
First, we exclude leap years, meaning no one has a birthday on February 29th. Next, each birthday is equally likely to fall on any day of the year. Finally, all individuals in the room are independent of each other. For this puzzle, we will write a simulation-based solution to estimate the true theoretical probability.
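As a rough sketch of how these three assumptions translate into a simulation, the code below draws birthdays uniformly from 365 days, independently for each person, and estimates the match probability as the proportion of simulated rooms containing a shared birthday. The helper name has_match, the room size of 23, and the number of simulations are illustrative choices rather than part of the lesson, and replicate is used here as a compact alternative to an explicit loop.

```r
# Hypothetical helper: TRUE if at least two of n people share a birthday
has_match <- function(n) {
  # n birthdays, each equally likely to be any of 365 days (no February 29th),
  # drawn independently of one another
  birthdays <- sample(1:365, size = n, replace = TRUE)
  any(duplicated(birthdays))
}

set.seed(1)
n_sims <- 10000  # illustrative number of simulated rooms

# Proportion of simulated rooms of 23 people with at least one shared birthday
estimate <- mean(replicate(n_sims, has_match(23)))
estimate
```

With more simulated rooms, the estimate should settle closer to the theoretical value.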
Estimating the Probability of Rolling a 12 with Two Dice
Consider the following example: suppose we want to estimate the probability of rolling a 12 with two ordinary dice. We start by defining a variable called counter to keep track of the number of times that a 12 is rolled. It starts at 0, since no 12s have been rolled yet. Then, we simulate a single roll using the roll_dice function created previously.
In practice, our code will not need to print any result; we simply check whether roll is equal to 12 and, if so, increment the counter by one. In this example, a 12 was rolled, so the counter is now equal to 1. If the roll had not been a 12, the counter would still be 0.
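The single-roll step might look like the sketch below. The roll_dice definition here is an assumed stand-in for the function created previously (it returns the sum of two fair six-sided dice), and because the roll is random, the counter will usually still be 0 after one roll.

```r
# Assumed stand-in for the roll_dice function from the earlier lesson:
# returns the sum of two fair six-sided dice
roll_dice <- function() {
  sum(sample(1:6, size = 2, replace = TRUE))
}

counter <- 0         # no 12s have been rolled yet
roll <- roll_dice()  # simulate a single roll

# Increment the counter only if the roll is a 12
if (roll == 12) {
  counter <- counter + 1
}
counter
```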
To do this many times, we will use a for loop. The counter is set to 0 before the loop begins. Then, within the loop, we roll two dice and check the resulting value. If it is equal to 12, the counter is incremented by 1. Once the loop is complete, we divide the counter by the number of iterations to obtain our estimate of the true probability.
Notice that our estimate is very close, but not exactly equal, to the correct value of 1/36 (about 0.028). R has a built-in function called pbirthday that solves the birthday problem theoretically.
Using the pbirthday Function
After completing a simulated solution, we can use the pbirthday function to check our answer and to calculate the birthday problem probability over a range of sample sizes. The function requires a sample size; here we use a sample size of 10. The output is the probability of at least one match in a room of 10 people.
We can calculate the match probability over a variety of room sizes using the sapply function. Here, this is shown for room sizes from 1 to 10. Notice that the last probability in the output matches the value obtained for a sample size of 10.
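A minimal sketch of that call is shown below. pbirthday is part of R's stats package, which is loaded by default; the variable names p10 and p10_direct are illustrative. The second line recomputes the same quantity directly from the assumptions, as one minus the probability that all 10 birthdays differ.

```r
# Theoretical probability of at least one shared birthday among 10 people
p10 <- pbirthday(10)
p10  # approximately 0.117

# Same quantity computed directly: 1 - P(all 10 birthdays are distinct)
p10_direct <- 1 - prod((365:356) / 365)
all.equal(p10, p10_direct)
```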
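One way to write this is sketched below; the names room_sizes and match_probs are illustrative. sapply calls pbirthday once for each room size and collects the results into a numeric vector.

```r
# Room sizes from 1 to 10
room_sizes <- 1:10

# Apply pbirthday to each room size; the result is a vector of 10 probabilities
match_probs <- sapply(room_sizes, pbirthday)
match_probs

# The last element corresponds to a room of 10 people
match_probs[10] == pbirthday(10)
```

The first element is 0, since a single person cannot share a birthday with anyone else.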
Displaying the Relationship Between Room Size and Match Probability
We can display the relationship between room size and match probability in a scatter plot using the plot function. We indicate the two variables we want to plot separated by a tilde with the first variable on the y-axis and the second variable on the x-axis.
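The formula interface described above can be sketched as follows. The variable names, the axis labels, and the wider range of room sizes (1 to 50, chosen only so the trend is easier to see) are assumptions for illustration.

```r
# Illustrative range of room sizes and their theoretical match probabilities
room_sizes <- 1:50
match_probs <- sapply(room_sizes, pbirthday)

# y ~ x: match probability on the y-axis, room size on the x-axis
plot(match_probs ~ room_sizes,
     xlab = "Room size",
     ylab = "Probability of at least one match")
```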
Conclusion
In this lesson, we solved our first puzzle, the birthday problem. We used a simulation-based solution to estimate the true theoretical probability and compared it with the pbirthday function. By calculating the match probability over a range of room sizes, we got a better idea of the trend. The relationship between room size and match probability can be visualized in a scatter plot, which makes the trend easy to see.
R Code
Here is the R code for the simulation-based estimate of the probability of rolling a 12 with two dice:
```r
# Define roll_dice: roll two fair six-sided dice and return their sum
roll_dice <- function() {
  dice <- sample(1:6, size = 2, replace = TRUE)
  sum(dice)
}

# Initialize counter (number of 12s rolled so far)
counter <- 0

# Run simulation for many iterations
set.seed(123)
n_iter <- 10000
for (i in 1:n_iter) {
  roll <- roll_dice()
  # Increment counter only when the roll is equal to 12
  if (roll == 12) {
    counter <- counter + 1
  }
}

# Estimate the probability of rolling a 12 (the true value is 1/36)
probability <- counter / n_iter
print(paste("Probability of rolling a 12:", probability))
```
This code initializes the counter, defines the roll_dice function, runs the simulation for many iterations, and calculates the probability of rolling a 12. The result is printed to the console.