A formula for the occupancy problem

The occupancy problem in probability deals with the random assigment of k balls into n cells. One problem of interest is on the number of cells that are occupied or the number of cells that are empty. When randomly assigning k balls into n cells, what is the probability of finding any empty cell? What is the probability of finding exactly j empty cells? In the previous post (The occupancy problem), we discussed the algorithm for solving this problem by the counting method based on the multinomial coefficients. In this post, we derive a direct formula for the probability of having j empty cells when randomly placing k balls into n cells (see Theorem 2 below). The derivation is based on a theorem about the union of events (see Theorem 1 below). For more background information on the occupancy problem, see volume I of Feller in [1].

Theorem 1 concerns with the union of n events. A provious post in this blog (The matching problem) presented a sketch of a proof of this theorem and a solution of one version of the matching problem. After proving Theorem 2, we present a direct calculation for the example in the previous post.

Suppose that we have the events E_1,E_2, \cdots, E_n. Define S_m as the sum of all probabilities of m many E_i occuring simultaneously. Thus we have:

\displaystyle S_1=\sum \limits_{i=1}^{n} P(E_i)

\displaystyle S_2=\sum \limits_{i<j} P(E_i \cap E_j)

\displaystyle S_3=\sum \limits_{i<j<w} P(E_i \cap E_j \cap E_w)

\displaystyle S_m=\sum \limits_{i(1)< \cdots < i(m)} P(E_{i(1)} \cap E_{i(2)} \cap \cdots \cap E_{i(m)})

Thoerem 1
The following is the probability that at least one of the events E_1,E_2, \cdots, E_n has occurred:

\displaystyle P(E_1 \cup E_2 \cup \cdots \cup E_n)= \sum \limits_{m=1}^{n} (-1)^{m+1} S_m

Note that when m=2 or m=3, we have the following familiar formulas:

\displaystyle P(E_1 \cup E_2)=P(E_1)+P(E_2)-P(E_1 \cap E_2)

\displaystyle \begin{aligned}P(E_1 \cup E_2 \cap E_3)&=P(E_1)+P(E_2)+P(E_3)\\&\ \ \ -P(E_1 \cap E_2)-P(E_1 \cap E_3)-P(E_2 \cap E_3)\\&\ \ \ +P(E_1 \cap E_2 \cap E_3)\end{aligned}

Theorem 2
In randomly placing k balls into n cells, let X_{k,n} be the number of empty cells. Then the probability that all cells are occupied (zero cell is empty) and the probability that exactly j cells are empty are, respectively,

\displaystyle P(X_{k,n}=0)=\sum \limits_{i=0}^{n} (-1)^{i} \binom{n}{i} \biggl(1-\frac{i}{n}\biggr)^{k}

\displaystyle P(X_{k,n}=j)=\binom{n}{j} \sum \limits_{i=0}^{n-j} (-1)^{i} \binom{n-j}{i} \biggl(1-\frac{j+i}{n}\biggr)^{k}

where j=1,2, \cdots, n-1.

Proof. Let E_i be the event that the i^{th} cell is empty, where i=1,2, \cdots, n. We proceed to derive the sums S_m as in Theorem 1.

For the event E_i to occur (the i^{th} cell being empty), the k balls in question are placed in the remaining n-1 cells. This can be done in (n-1)^k ways. Thus \displaystyle P(E_i)=\frac{(n-1)^k}{n^k}=\biggl(1-\frac{1}{n}\biggr)^k.

For the event E_i \cap E_j to occur (the i^{th} and j^{th} cells being empty), the k balls are placed in the remaining n-2 cells. This can be done in (n-2)^k ways. Thus we have \displaystyle P(E_i \cap E_j)=\frac{(n-2)^k}{n^k}=\biggl(1-\frac{2}{n}\biggr)^k.

Likewise, \displaystyle P(E_i \cap E_j \cap E_w)=\frac{(n-3)^k}{n^k}=\biggl(1-\frac{3}{n}\biggr)^k, and so on. For m \le n, we have \displaystyle S_m=\binom{n}{m} \biggl(1-\frac{m}{n}\biggr)^k.

Thus, the following is the probability that at least one cell is empty:

\displaystyle \begin{aligned}P(E_1 \cup E_2 \cup \cdots \cup E_n)&=\sum \limits_{m=1}^{n} (-1)^{m+1} S_m\\&=\sum \limits_{m=1}^{n} (-1)^{m+1} \binom{n}{m} \biggl(1-\frac{m}{n}\biggr)^k\end{aligned}

The probability of having no empty cell is:

\displaystyle \begin{aligned}P(X_{k,n}=0)&=1-\sum \limits_{m=1}^{n} (-1)^{m+1} \binom{n}{m} \biggl(1-\frac{m}{n}\biggr)^k\\&=\sum \limits_{m=0}^{n} (-1)^{m} \binom{n}{m} \biggl(1-\frac{m}{n}\biggr)^k\end{aligned}

If there are exactly j empty cells, then the k balls are placed in the remaining n-j cells and these remaining n-j cells are all occupied. Then the following is the probability of having exactly j empty cells:

\displaystyle \begin{aligned}P(X_{k,n}=j)&=\binom{n}{j} \biggl(1-\frac{j}{n}\biggr)^k \times P(X_{k,n-j}=0)\\&=\binom{n}{j} \biggl(1-\frac{j}{n}\biggr)^k \times \sum \limits_{m=0}^{n-j} (-1)^{m} \binom{n-j}{m} \biggl(1-\frac{m}{n-j}\biggr)^k\\&=\binom{n}{j} \sum \limits_{m=0}^{n-j} (-1)^m \binom{n-j}{m} \biggl(1-\frac{j+m}{n}\biggr)^k\end{aligned}

Example
In rolling a fair die 7 times, what is the probability of having exactly j faces appearing? This is the same example in the previous post (The occupancy problem). In this post we apply the formulas in Theorem 2.

Note that the rolling a die can be considered as the placing of a ball into one of the 6 cells. Thus we have k=7 balls and n=6 cells. The probability of having j cells occupied is the probability of 6-j empty cells. As in Theorem 2, \displaystyle X_{7,6} is the number of empty cells when placing 7 balls into 6 cells at random. Thus the probability of having exactly j cells occupied is \displaystyle P(X_{7,6})=6-j.

\displaystyle \begin{aligned}P(X_{7,6}=0)&=P(\text{all 6 cells are occupied})\\&=\sum \limits_{m=0}^{6} (-1)^{m} \binom{6}{m} \biggl(1-\frac{m}{6}\biggr)^7\\&=\frac{15120}{6^7}\\&=\frac{15120}{279936}\\&=0.0540123457\end{aligned}

\displaystyle \begin{aligned}P(X_{7,6}=1)&=P(\text{exactly 5 cells are occupied})\\&=\binom{6}{1} \sum \limits_{m=0}^{5} (-1)^{m} \binom{5}{m} \biggl(1-\frac{m+1}{6}\biggr)^7\\&=\frac{100800}{6^7}\\&=\frac{100800}{279936}\\&=0.3600823045\end{aligned}

\displaystyle \begin{aligned}P(X_{7,6}=2)&=P(\text{exactly 4 cells are occupied})\\&=\binom{6}{2} \sum \limits_{m=0}^{4} (-1)^{m} \binom{4}{m} \biggl(1-\frac{m+2}{6}\biggr)^7\\&=\frac{126000}{6^7}\\&=\frac{126000}{279936}\\&=0.4501028807\end{aligned}

\displaystyle \begin{aligned}P(X_{7,6}=3)&=P(\text{exactly 3 cells are occupied})\\&=\binom{6}{3} \sum \limits_{m=0}^{3} (-1)^{m} \binom{3}{m} \biggl(1-\frac{m+3}{6}\biggr)^7\\&=\frac{36120}{6^7}\\&=\frac{36120}{279936}\\&=0.1290294925\end{aligned}

\displaystyle \begin{aligned}P(X_{7,6}=4)&=P(\text{exactly 2 cells are occupied})\\&=\binom{6}{4} \sum \limits_{m=0}^{2} (-1)^{m} \binom{2}{m} \biggl(1-\frac{m+4}{6}\biggr)^7\\&=\frac{1890}{6^7}\\&=\frac{1890}{279936}\\&=0.0067515432\end{aligned}

\displaystyle \begin{aligned}P(X_{7,6}=5)&=P(\text{exactly 1 cell is occupied})\\&=\binom{6}{5} \sum \limits_{m=0}^{1} (-1)^{m} \binom{1}{m} \biggl(1-\frac{m+5}{6}\biggr)^7\\&=\frac{6}{6^7}\\&=\frac{6}{279936}\\&=0.00002143347051\end{aligned}

Reference

  1. Feller, W., An Introduction to Probability Theory and its Applications, Vol. I, 3rd ed., John Wiley & Sons, Inc., New York, 1968
Advertisements

7 thoughts on “A formula for the occupancy problem

  1. A very neat and interesting analysis. In theorem 2, did you yourself derive the formula for the probability that exactly j cells are empty, or has the proof been taken from an earlier source? Could you tell me the source, please, if there is one? I noticed similar reasoning for the matching problem.
    Thank you.
    Howard Walker

  2. Thank you, Dan, for your page!

    With k < n, there are at least n-k empty cells.

    I find that for j less than n-k the formula can be negative. For the remaining values of j, the distribution sums up to zero.

  3. Pingback: How long does it take to collect all coupons? | A Blog on Probability and Statistics

  4. Pingback: The occupancy problem | Topics in Probability

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s