The birthday problem as a waiting time occupancy problem

Ignoring leap year and assuming that each of the 365 days in the year is equally likely to be the birthday for a randomly chosen person, the birthday problem in probability is to find the minimum number of persons such that there is a better than 50% chance of having at least two people with the same birthday. This version of the birthday problem can be viewed as a waiting time problem, i.e. as a problem of randomly selecting people (one at a time) until there is a common birthday in the sample. This problem can also be viewed as an occupancy problem. Consider the 365 days in the year as cells and the selection of a person at random as placing a ball at random into one of the 365 cells. Then the random placement of balls (one at a time) continues until one of the cells has two balls.

This post is to explore the connection between the birthday problem as stated and the wating time occupancy problem. As the sake of completeness, we first state the two problems.

The Birthday Problem
Ignoring leap year and assuming that each of the 365 days in the year is equally likely to be the birthday for a randomly chosen person, how many people do we need to choose in order to have a better than 50% chance of having a common birthday among the selected individuals?

The Waiting Time Occupancy Problem
Suppose that balls are placed randomly (one at a time) into n cells. The random placing of balls continues until a ball is placed into a cell that already has one ball (i.e. until a “repeat” occurs). Let X_n be the number of balls required to obtain a repeat. Our goal is to discuss the probability function P(X_n=k), and the survival function P(X_n > k). Of course, we will discuss the connection with the birthday problem.

Remark
As we will see, the birthday problem can be restated as finding the median of the random variable in the waiting time occupancy problem with 365 cells. We will show that P(X_{365} \le 23)>\frac{1}{2} and P(X_{365} \le 22)<\frac{1}{2}.

A more basic version of the occupancy problem works with a fixed number of balls (say k balls) and is about randomly placing the k balls into n cells. In the following two posts in this blog, we discuss this fixed length occupancy problem and present ways to compute the probability that j cells are empty where j=0,1, \cdots, n-1.

The occupancy problem
A formula for the occupancy problem

In the waiting time occupancy problem discussed in this post, we do not fix the number of balls. Instead, we keep placing balls into cells until a certain prescribed condition takes place. So the focus is on the waiting time until the occurence of a certain event. With respect to the occupancy problem, examples of waiting times include the time until a repeat occurs (discussed here), the time until a specified cell is filled and the time until all cells are filled. Some of these waiting time problems will be discussed in subsequent posts.

Solving the Birthday Problem
In a random group of k people, the following is the probability that all k birthdays are different:

\displaystyle \begin{aligned}P_k&=\frac{365}{365} \thinspace \frac{364}{365} \thinspace \cdots \thinspace \frac{365-(k-1)}{365}\\&=\frac{364}{365} \thinspace \frac{363}{365} \thinspace \cdots \thinspace \frac{365-(k-1)}{365}\\&=\biggl(1-\frac{1}{365}\biggr) \thinspace \cdots \thinspace \biggl(1-\frac{k-1}{365}\biggr) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1) \end{aligned}

Note that for k=23, the probability of having different birthdays is p_{23}=0.492702766. Thus there is more than 50% chance of having at least a common birthday among 23 people (1-p_{23}=0.507297234). On the other hand, p_{22}=0.524304692 and 1-p_{22}<0.5. Thus among 22 people, the probability of having a common birthday is less than \frac{1}{2}.

The Discussion of the Waiting Time Occupancy Problem
In the random assignment of a series of balls (one at a time) into n cells, consider the event that the random pacement of balls terminates at the k^{th} step. This means that the first k-1 balls go into k-1 different cells and the k^{th} ball goes into one of the k-1 previously occupied cell. The probability of this event is:

\displaystyle \begin{aligned}P(X_n=k)&=\frac{n (n-1) \cdots (n-(k-2)) (k-1)}{n^k}\\&=\frac{(n-1) \cdots (n-(k-2)) (k-1)}{n^{k-1}}\\&=\biggl(1-\frac{1}{n}\biggr) \thinspace \cdots \thinspace \biggl(1-\frac{k-2}{n}\biggr) \frac{k-1}{n}\end{aligned}

Note that the number of ways of assigning k-1 balls into distinct cells among n cells is n(n-1) \cdots (n-(k-2)). Then there are k-1 ways of assigning the k^{th} ball to the one of the previously assigned cells. Thus the total number of assigning k balls in this fashion is:

\displaystyle n (n-1) \cdots (n-(k-2)) (k-1)

We now discuss P(X_n>k). This is the probability that it takes more than k balls to have two balls in the same cell. Consequently, the first k balls all go into different cells. When n=365, P(X_{365}>k) should agree with the probability stated in (1) above. Thus we have:

\displaystyle P(X_n>k)=\biggl(1-\frac{1}{n}\biggr) \ \cdots \ \biggl(1-\frac{k-1}{n}\biggr)

The above survival function can also be proven by induction. To find the median of the distribution X_n, we need to find m such that:

P(X_n \le m)=1-P(X_n>m)>\frac{1}{2} and
P(X_n \le m-1)=1-P(X_n>m-1)<\frac{1}{2}

For the birthday problem, n=365. It has been shown in the above solution of the birthday problem that the median is m=23.

Example
This is one more example to illustrate the computation. Suppose we roll a fair die repeatedly until a repeat of a previously shown face. Each roll of the die is essentially placing a ball into one of the six cells. Let X_6 be the number of rolls to obtain a repeat. The following shows the probability function P(X_6=k).

\displaystyle P(X_6=2)=\frac{1}{6}

\displaystyle P(X_6=3)=\frac{10}{6^2}

\displaystyle P(X_6=4)=\frac{60}{6^3}

\displaystyle P(X_6=5)=\frac{240}{6^4}

\displaystyle P(X_6=6)=\frac{600}{6^5}

\displaystyle P(X_6=7)=\frac{720}{6^6}

The median number of rolls to achieve a repeat is 4. Note the following probabilities:

\displaystyle P(X_6 \le 4)=1-P(X_6>4)=1-\frac{60}{6^3}=\frac{156}{6^3}=0.72

\displaystyle P(X_6 \le 3)=1-P(X_6>3)=1-\frac{20}{6^2}=\frac{16}{6^2}=0.44

The following shows that the mean number of rolls to achieve a repeat is 3.77469.

\displaystyle \begin{aligned}E(X_6)&=2 \biggl(\frac{1}{6}\biggr)+3 \biggl(\frac{10}{6^2}\biggr)+4 \biggl(\frac{60}{6^3}\biggr)+5 \biggl(\frac{240}{6^4}\biggr)+6 \biggl(\frac{600}{6^5}\biggr)+7 \biggl(\frac{720}{6^6}\biggr)\\&=\frac{176112}{6^6}\\&=3.774691358\end{aligned}

Reference

  1. Feller, W., An Introduction to Probability Theory and its Applications, Vol. I, 3rd ed., John Wiley & Sons, Inc., New York, 1968
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s