R Tutorial - The Birthday Problem

The Birthday Problem: A Well-Known Puzzle

In this lesson, we will solve our first puzzle, a well-known problem called the birthday problem. The setup is as follows: there are n people in a room, and we want to know the probability that at least two of them share a birthday. To make this more manageable, we need to make the following assumptions.

First, we exclude leap years, meaning no one has a birthday on February 29th. Next, each birthday is equally likely to fall on any day of the year. Finally, all individuals in the room are independent of each other. For this puzzle, we will write a simulation-based solution to estimate the true theoretical probability.
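For reference, the theoretical probability that the simulation targets can be written in closed form. This is the standard result under the three assumptions above (365 equally likely days, independent individuals); the symbol n is the room size from the setup, and the formula is offered as a sketch of the target value rather than as part of the lesson's own code.

```latex
P(\text{at least one match})
  = 1 - \prod_{k=0}^{n-1} \frac{365 - k}{365}
  = 1 - \frac{365!}{365^{n}\,(365 - n)!},
  \qquad 1 \le n \le 365 .
```

The product is the probability that all n birthdays are distinct: the first person can have any day, the second must avoid one day, the third must avoid two, and so on. For n = 10 this gives roughly 0.117.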

Estimating the Probability of Rolling a 12 with Two Dice

To illustrate the simulation approach, consider the following example: suppose we want to estimate the probability of rolling a 12 with two ordinary dice. We start by defining a variable called counter to keep track of the number of times that a 12 is rolled. It starts at 0, since no 12s have been rolled yet. Then, we simulate a single roll using the roll_dice function created previously. Suppose this roll happens to be a 12.

In practice, our code will not need to print any result; it simply checks whether roll is equal to 12 and, if so, increments the counter by one. After a 12 is rolled, the counter is equal to 1. If the roll had not been a 12, the counter would still be 0.
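The single-roll step can be sketched as follows. The lesson refers to a roll_dice function "created previously" without showing it, so the definition here is an assumed stand-in: two fair six-sided dice, sampled with replacement, with the sum returned.

```r
# Assumed stand-in for the roll_dice function "created previously":
# roll two fair six-sided dice and return their sum.
roll_dice <- function() {
  sum(sample(1:6, size = 2, replace = TRUE))
}

counter <- 0          # no 12s have been rolled yet
roll <- roll_dice()   # simulate a single roll of two dice

# Increment the counter only when the roll is a 12
if (roll == 12) {
  counter <- counter + 1
}
counter
```

After this runs, counter is 1 if the roll was a 12 and 0 otherwise.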

To do this many times, we will use a for loop. The counter is set to 0 before the loop begins. Then, within the loop, we roll two dice and check the resulting value. If it is equal to 12, the counter is incremented by 1. Once the loop is complete, we divide the counter by the number of iterations to obtain our estimate of the true probability.
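As an alternative to the explicit loop, the same estimate can be sketched with replicate, which repeats an expression and collects the results. This is a swapped-in idiom, not the lesson's own code; roll_dice is again an assumed definition, and the seed and iteration count are arbitrary choices.

```r
# Assumed definition: sum of two fair six-sided dice
roll_dice <- function() sum(sample(1:6, size = 2, replace = TRUE))

set.seed(1)        # arbitrary seed, only for reproducibility
n_iter <- 100000   # number of simulated rolls

# Simulate n_iter rolls, then take the proportion that equal 12
rolls <- replicate(n_iter, roll_dice())
estimate <- mean(rolls == 12)

estimate   # simulated estimate
1 / 36     # exact value: only (6, 6) out of 36 equally likely outcomes
```

Taking the mean of a logical vector counts the TRUE values and divides by the length, which is the same calculation as dividing the counter by the number of iterations.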

Notice that our estimate is very close, but not exactly equal, to the correct value of 1/36. R has a built-in function called pbirthday that can solve the birthday problem theoretically.

Using the pbirthday Function

After completing a simulated solution, we can use the pbirthday function to compare our answer and to calculate the birthday problem probability over a range of sample sizes, which gives a better idea of the trend. Using pbirthday only requires providing a sample size. With a sample size of 10, the output is the probability of at least one match in a room of 10 people.
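A minimal sketch of that comparison for a room of 10 people is shown below. pbirthday comes from R's stats package; the helper has_match, the seed, and the number of simulated rooms are hypothetical choices made for illustration.

```r
# Theoretical probability of at least one shared birthday among 10 people
p_exact <- pbirthday(10)

# Hypothetical helper: simulate one room of n people and report
# whether any birthday occurs more than once
has_match <- function(n) {
  birthdays <- sample(1:365, size = n, replace = TRUE)
  any(duplicated(birthdays))
}

set.seed(42)  # arbitrary seed, only for reproducibility
p_sim <- mean(replicate(20000, has_match(10)))

c(exact = p_exact, simulated = p_sim)
```

The two values should agree to about two decimal places; the remaining gap is simulation noise and shrinks as the number of simulated rooms grows.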

We can calculate the match probability over a variety of room sizes using the sapply function, which applies pbirthday to each room size in turn. Here, this is shown for room sizes from 1 to 10. Notice that the last probability in the output matches the value obtained for a sample size of 10.
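The sapply step might look like the following sketch; the variable names room_sizes and match_probs are illustrative and not taken from the lesson.

```r
# Room sizes to evaluate
room_sizes <- 1:10

# Apply pbirthday to each room size; the result is a numeric vector
match_probs <- sapply(room_sizes, pbirthday)

round(match_probs, 4)
```

The first entry is 0, since a single person cannot share a birthday with anyone, and the last entry equals pbirthday(10).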

Displaying the Relationship Between Room Size and Match Probability

We can display the relationship between room size and match probability in a scatter plot using the plot function. We indicate the two variables we want to plot as a formula, separated by a tilde, with the first variable on the y-axis and the second variable on the x-axis.
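A sketch of the plotting step using the formula interface is given below. The range 1 to 50 is an assumption chosen to make the trend visible (the lesson's sapply example stops at 10), and the variable names and axis labels are illustrative.

```r
# Room sizes and their theoretical match probabilities
room_sizes <- 1:50   # assumed range, wider than the 1 to 10 example
match_probs <- sapply(room_sizes, pbirthday)

# Formula interface: y ~ x, so match_probs is on the y-axis
plot(match_probs ~ room_sizes,
     xlab = "Room size",
     ylab = "Probability of at least one match")
```

Writing match_probs ~ room_sizes is equivalent to plot(room_sizes, match_probs); the formula form puts the response first.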

Conclusion

In this lesson, we solved our first puzzle, the birthday problem. We used a simulation-based solution to estimate the true theoretical probability and compared it with the output of the pbirthday function. By calculating the match probability over a range of room sizes, we got a better idea of the trend. The relationship between room size and match probability can be visualized in a scatter plot.

R Code

Here is the R code for the simulation-based estimate in the dice example:

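```r
# Define roll_dice: roll two fair six-sided dice and return their sum
roll_dice <- function() {
  # Sample with replacement so both dice can show the same face
  result <- sample(1:6, size = 2, replace = TRUE)
  sum(result)
}

# Fix the random seed so the simulation is reproducible
set.seed(123)

# Initialize counter
counter <- 0

# Run simulation for many iterations
n_iter <- 10000
for (i in 1:n_iter) {
  roll <- roll_dice()
  # Increment counter whenever the roll is a 12
  if (roll == 12) {
    counter <- counter + 1
  }
}

# Calculate the estimated probability of rolling a 12
probability <- counter / n_iter
print(paste("Probability of rolling a 12:", probability))
```

This code defines the roll_dice function, initializes the counter, runs the simulation for many iterations, and calculates the estimated probability of rolling a 12. The result is printed to the console. Note that the counter is updated inside the loop rather than inside roll_dice, because an assignment made with <- inside a function would only change a local copy.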
