A formula for the occupancy problem

The occupancy problem in probability deals with the random assigment of $k$ balls into $n$ cells. One problem of interest is on the number of cells that are occupied or the number of cells that are empty. When randomly assigning $k$ balls into $n$ cells, what is the probability of finding any empty cell? What is the probability of finding exactly $j$ empty cells? In the previous post (The occupancy problem), we discussed the algorithm for solving this problem by the counting method based on the multinomial coefficients. In this post, we derive a direct formula for the probability of having $j$ empty cells when randomly placing $k$ balls into $n$ cells (see Theorem 2 below). The derivation is based on a theorem about the union of events (see Theorem 1 below). For more background information on the occupancy problem, see volume I of Feller in [1].

Theorem 1 concerns with the union of $n$ events. A provious post in this blog (The matching problem) presented a sketch of a proof of this theorem and a solution of one version of the matching problem. After proving Theorem 2, we present a direct calculation for the example in the previous post.

Suppose that we have the events $E_1,E_2, \cdots, E_n$. Define $S_m$ as the sum of all probabilities of $m$ many $E_i$ occuring simultaneously. Thus we have:

$\displaystyle S_1=\sum \limits_{i=1}^{n} P(E_i)$

$\displaystyle S_2=\sum \limits_{i

$\displaystyle S_3=\sum \limits_{i

$\displaystyle S_m=\sum \limits_{i(1)< \cdots < i(m)} P(E_{i(1)} \cap E_{i(2)} \cap \cdots \cap E_{i(m)})$

Thoerem 1
The following is the probability that at least one of the events $E_1,E_2, \cdots, E_n$ has occurred:

$\displaystyle P(E_1 \cup E_2 \cup \cdots \cup E_n)= \sum \limits_{m=1}^{n} (-1)^{m+1} S_m$

Note that when $m=2$ or $m=3$, we have the following familiar formulas:

$\displaystyle P(E_1 \cup E_2)=P(E_1)+P(E_2)-P(E_1 \cap E_2)$

\displaystyle \begin{aligned}P(E_1 \cup E_2 \cap E_3)&=P(E_1)+P(E_2)+P(E_3)\\&\ \ \ -P(E_1 \cap E_2)-P(E_1 \cap E_3)-P(E_2 \cap E_3)\\&\ \ \ +P(E_1 \cap E_2 \cap E_3)\end{aligned}

Theorem 2
In randomly placing $k$ balls into $n$ cells, let $X_{k,n}$ be the number of empty cells. Then the probability that all cells are occupied (zero cell is empty) and the probability that exactly $j$ cells are empty are, respectively,

$\displaystyle P(X_{k,n}=0)=\sum \limits_{i=0}^{n} (-1)^{i} \binom{n}{i} \biggl(1-\frac{i}{n}\biggr)^{k}$

$\displaystyle P(X_{k,n}=j)=\binom{n}{j} \sum \limits_{i=0}^{n-j} (-1)^{i} \binom{n-j}{i} \biggl(1-\frac{j+i}{n}\biggr)^{k}$

where $j=1,2, \cdots, n-1$.

Proof. Let $E_i$ be the event that the $i^{th}$ cell is empty, where $i=1,2, \cdots, n$. We proceed to derive the sums $S_m$ as in Theorem 1.

For the event $E_i$ to occur (the $i^{th}$ cell being empty), the $k$ balls in question are placed in the remaining $n-1$ cells. This can be done in $(n-1)^k$ ways. Thus $\displaystyle P(E_i)=\frac{(n-1)^k}{n^k}=\biggl(1-\frac{1}{n}\biggr)^k$.

For the event $E_i \cap E_j$ to occur (the $i^{th}$ and $j^{th}$ cells being empty), the $k$ balls are placed in the remaining $n-2$ cells. This can be done in $(n-2)^k$ ways. Thus we have $\displaystyle P(E_i \cap E_j)=\frac{(n-2)^k}{n^k}=\biggl(1-\frac{2}{n}\biggr)^k$.

Likewise, $\displaystyle P(E_i \cap E_j \cap E_w)=\frac{(n-3)^k}{n^k}=\biggl(1-\frac{3}{n}\biggr)^k$, and so on. For $m \le n$, we have $\displaystyle S_m=\binom{n}{m} \biggl(1-\frac{m}{n}\biggr)^k$.

Thus, the following is the probability that at least one cell is empty:

\displaystyle \begin{aligned}P(E_1 \cup E_2 \cup \cdots \cup E_n)&=\sum \limits_{m=1}^{n} (-1)^{m+1} S_m\\&=\sum \limits_{m=1}^{n} (-1)^{m+1} \binom{n}{m} \biggl(1-\frac{m}{n}\biggr)^k\end{aligned}

The probability of having no empty cell is:

\displaystyle \begin{aligned}P(X_{k,n}=0)&=1-\sum \limits_{m=1}^{n} (-1)^{m+1} \binom{n}{m} \biggl(1-\frac{m}{n}\biggr)^k\\&=\sum \limits_{m=0}^{n} (-1)^{m} \binom{n}{m} \biggl(1-\frac{m}{n}\biggr)^k\end{aligned}

If there are exactly $j$ empty cells, then the $k$ balls are placed in the remaining $n-j$ cells and these remaining $n-j$ cells are all occupied. Then the following is the probability of having exactly $j$ empty cells:

\displaystyle \begin{aligned}P(X_{k,n}=j)&=\binom{n}{j} \biggl(1-\frac{j}{n}\biggr)^k \times P(X_{k,n-j}=0)\\&=\binom{n}{j} \biggl(1-\frac{j}{n}\biggr)^k \times \sum \limits_{m=0}^{n-j} (-1)^{m} \binom{n-j}{m} \biggl(1-\frac{m}{n-j}\biggr)^k\\&=\binom{n}{j} \sum \limits_{m=0}^{n-j} (-1)^m \binom{n-j}{m} \biggl(1-\frac{j+m}{n}\biggr)^k\end{aligned}

Example
In rolling a fair die $7$ times, what is the probability of having exactly $j$ faces appearing? This is the same example in the previous post (The occupancy problem). In this post we apply the formulas in Theorem 2.

Note that the rolling a die can be considered as the placing of a ball into one of the $6$ cells. Thus we have $k=7$ balls and $n=6$ cells. The probability of having $j$ cells occupied is the probability of $6-j$ empty cells. As in Theorem 2, $\displaystyle X_{7,6}$ is the number of empty cells when placing $7$ balls into $6$ cells at random. Thus the probability of having exactly $j$ cells occupied is $\displaystyle P(X_{7,6})=6-j$.

\displaystyle \begin{aligned}P(X_{7,6}=0)&=P(\text{all 6 cells are occupied})\\&=\sum \limits_{m=0}^{6} (-1)^{m} \binom{6}{m} \biggl(1-\frac{m}{6}\biggr)^7\\&=\frac{15120}{6^7}\\&=\frac{15120}{279936}\\&=0.0540123457\end{aligned}

\displaystyle \begin{aligned}P(X_{7,6}=1)&=P(\text{exactly 5 cells are occupied})\\&=\binom{6}{1} \sum \limits_{m=0}^{5} (-1)^{m} \binom{5}{m} \biggl(1-\frac{m+1}{6}\biggr)^7\\&=\frac{100800}{6^7}\\&=\frac{100800}{279936}\\&=0.3600823045\end{aligned}

\displaystyle \begin{aligned}P(X_{7,6}=2)&=P(\text{exactly 4 cells are occupied})\\&=\binom{6}{2} \sum \limits_{m=0}^{4} (-1)^{m} \binom{4}{m} \biggl(1-\frac{m+2}{6}\biggr)^7\\&=\frac{126000}{6^7}\\&=\frac{126000}{279936}\\&=0.4501028807\end{aligned}

\displaystyle \begin{aligned}P(X_{7,6}=3)&=P(\text{exactly 3 cells are occupied})\\&=\binom{6}{3} \sum \limits_{m=0}^{3} (-1)^{m} \binom{3}{m} \biggl(1-\frac{m+3}{6}\biggr)^7\\&=\frac{36120}{6^7}\\&=\frac{36120}{279936}\\&=0.1290294925\end{aligned}

\displaystyle \begin{aligned}P(X_{7,6}=4)&=P(\text{exactly 2 cells are occupied})\\&=\binom{6}{4} \sum \limits_{m=0}^{2} (-1)^{m} \binom{2}{m} \biggl(1-\frac{m+4}{6}\biggr)^7\\&=\frac{1890}{6^7}\\&=\frac{1890}{279936}\\&=0.0067515432\end{aligned}

\displaystyle \begin{aligned}P(X_{7,6}=5)&=P(\text{exactly 1 cell is occupied})\\&=\binom{6}{5} \sum \limits_{m=0}^{1} (-1)^{m} \binom{1}{m} \biggl(1-\frac{m+5}{6}\biggr)^7\\&=\frac{6}{6^7}\\&=\frac{6}{279936}\\&=0.00002143347051\end{aligned}

Reference

1. Feller, W., An Introduction to Probability Theory and its Applications, Vol. I, 3rd ed., John Wiley & Sons, Inc., New York, 1968

7 thoughts on “A formula for the occupancy problem”

1. A very neat and interesting analysis. In theorem 2, did you yourself derive the formula for the probability that exactly j cells are empty, or has the proof been taken from an earlier source? Could you tell me the source, please, if there is one? I noticed similar reasoning for the matching problem.
Thank you.
Howard Walker

• Howard, thanks for your comment. The source is Feller’s text listed under reference. See formulas 2.3 and 2.4 in page 102 of chapter 4 of Feller.

2. Thank you, Dan, for your page!

With k < n, there are at least n-k empty cells.

I find that for j less than n-k the formula can be negative. For the remaining values of j, the distribution sums up to zero.