The occupancy problem

The occupancy problem in probability theory is about the problem of randomly assigning a set of balls into a group of cells. In randomly placing k balls into n cells, what is the probability that exactly j cells are occupied where j ranges from 1 through n? What is the expected number of occupied cells (or expected number of empty cells)? We demonstrate the algorithm for answering these questions. Specifically we illustrate with the example of randomly placing k=7 balls into n=6 cells. See chapter 2 in the volume I of Feller in [1] for more information on the occupancy problem.

Our example of placing 7 balls into n=6 cells can also be presented as rolling a fair die 7 times. Each roll of the die is essentially placing a ball into one of the 6 cells. Thus the rolls of the die can be considered balls and the faces of the die can be considered cells. Each roll of the die can also be viewed as selecting one face out of the 6 faces (think of an urn with 6 balls labeled one through six). Thus, we can view the occupancy problem of randomly assigning k balls into n cells as selecting k objects with replacement out of n distinct objects.

The problem stated at the beginning can now be rephrased using a die. In rolling a die 7 times, what is the probability of j faces appearing? What is the average number of faces appearing in 7 rolls of a die?

In rolling a fair die 7 times, how likely is it that all six of the faces appear? We will see that this occurs about 5% of the time. So in rolling a die 7 times (in randomly placing 7 balls into 6 cells), it is likely that some face will appear more than once (some cells receiving more than one ball). What is the probability of having all 7 rolls showing the same face? This is extremely unlikely, about one out of 46656. So we likely get multiple faces in the 7 rolls of a die but probably not all 6 faces. We will see that on average we get about four faces.

There are n^k distinct ways of randomly assigning k balls into n cells if we consider the balls as distinct. Thus rolling a die two times generates 36 outcomes if we distinguish the first roll from the second. For example, obtaining a one on the first roll and a two on the second roll is different from getting a two on the first and a one on the second. In rolling a die 7 times, there are 6^7=279936 outcomes when the order of the rolls is important. For example, 3,5,6,4,6,5,6 is an ordering of the 7 rolls of the die indicating the face appearing on each roll (the first roll is 3, the second roll is a 5, the seventh roll is a 6 and so on). In this particular ordering, four faces appear in these 7 rolls of the die (exactly 4 cells are occupied). How many of the 6^7=279936 orderings satisfy the condition that four faces appear (four cells are occupied)? Once we have the answer to this question, the probability that exactly 4 faces appear is the number of such orderings times \displaystyle \frac{1}{6^7}.

To answer the question of how many of the 279936 ordering of the 7 rolls have exactly 4 faces, we consider the notion of occupancy number sets and occupancy numbers. For the ordering of 3,5,6,4,6,5,6, consider the 6-tuple (0,0,1,1,2,3). This indicates how many rolls result in each face (how many balls are placed in each of the 6 cells). Note that the two zeros indicate that face 1 and face 2 do not appear. Any such 6-tuple is called an occupancy number set. The numbers that form an occupancy number set are called occupancy numbers. Note that in an occupancy number set, the ordering of the cells is important. However, the order of the balls within each cell is not important.

For the general case of placing k balls into n cells, any n \text{-}tuple of nonnegative integers (k_1,k_2, \cdots, k_n) such that k_1+k_2+ \cdots +k_n=k is said to be an occupancy number set. The numbers k_1,k_2, \cdots, k_n are said to be occupancy numbers. Note that k_i is the number of balls in the i^{th} cell.

Now we would like to calculate the number of orderings of the 7 rolls of the die that are aggregated by the occupancy number set (0,0,1,1,2,3). We need to find a way to divide the 7 rolls of the die into four subgroups, one group consisting of 1 roll of three, one group consisting of 1 roll of four, one group consisting 2 rolls of five and one group consisting of 3 rolls of six. We capture the idea in the following theorem.

Theorem
Suppose we have N distinct objects. Let C_1,C_2, \cdots, C_M be nonnegative integers such that C_1+C_2+ \cdots +C_M=N. Then the following is the total number of ways of dividing the N objects into M subgroups where the first group contains C_1 elements, the second group contains C_2 elements and so on.

\displaystyle \frac{N!}{C_1! \thinspace C_2! \cdots C_M!} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (0)

The above numbers in (0) are called multinomial coefficients since they are the coefficients in the following expansion:

\displaystyle (X_1+X_2+ \cdots + X_M)^{N}=\sum \limits_{C_1+ \cdots +C_M} \frac{N!}{C_1! \thinspace C_2! \cdots C_M!} \ \ X_1^{C_1} X_2^{C_2} \cdots X_M^{C_M}

This theorem shows us how to divide the 7 rolls of the die into 4 subgroups, one consisting of 1 roll of three, one consisting of 1 roll of four, one consisting 2 rolls of five and one consisting of 3 rolls of six. There are 420 possible orderings. Thus we have:

\displaystyle \frac{7!}{1! \medspace 1! \medspace 2! \medspace 3!}=420

The occupancy numbers help us sum out the order of the balls within the cells. However the order of the cells still remain in the occupancy numbers. So we need to sum out the order of the cells. Let’s look at the occupancy number set (0,0,1,1,2,3) again, which is a specific case of having two faces not appearing, one face appearing once, another face appearing once, another face appearing twice and another face appearing three times. How many of the 6^7=279936 orderings from 7 rolls of a die satisfy this condition? In other words, we wish to divide the 6 faces into 4 subgroups, one group consisting of two faces (that do not appear), one group consisting of two faces (each face appearing once), one group consisting of one face (appearing two times) and one group consisting of one face (appearing three times). This is another opportunity to apply the above theorem. There are 180 ways for this to happen.

\displaystyle \frac{6!}{2! \medspace 2! \medspace 1! \medspace 1!}=180

Thus there are 420 \times 180=75600 differents orderings of the 7 rolls of the die generated from the occupancy number set (0,0,1,1,2,3). Out of 6^7=279936 different outcomes from rolling a die 7 times, 75600 of these are from the event that two faces do not appear, two faces appearing once each, one face appearing two times and one face appearing three times.

How about the total number of outcomes for the event that exactly four faces appear (exactly two empty cells)? If we fix the first two cells empty (face 1 and face 2 do not appear), we only need to consider the following three occupancy number sets. The algorithm is to apply the above theorem to each of these occupancy number set twice, once on the 7 rolls (balls) and once on the 6 faces (cells). We then multiply the two counts and obtain the count for each occupancy number set. Then we sum the counts for the three occupancy number sets.

\displaystyle (0,0,1,1,1,4) \ \ \ (0,0,1,1,2,3) \ \ \ (0,0,1,2,2,2)

Solution
In rolling a fair die k=7 times, we let X be the number of faces that appear (the number of cells that are occupied). Then P(X=j) is the probability that exactly j faces appear in rolling a fair die 7 times. We now compute the probability P(X=j) where j=1,2,3,4,5,6. We first compute the count in each case.

Case 1. Exactly one face appears.
This is the case that all 7 rolls result in the same face (all 7 balls go into one cell). We only need to consider the occupancy number set (0,0,0,0,0,7). We apply the thoerem twice, the first time on the faces (cells) and the second time on the rolls (balls). This is indeed a rare case (6 out of 279936 possibilities).

\displaystyle \biggl(\frac{6!}{5! \thinspace 1!}\biggr) \medspace \biggl(\frac{7!}{7!}\biggr)=6

Case 2. Exactly two faces appear.
This is the case that two faces appear in 7 rolls of the die (7 balls go into two cells). We need to consider the following three occupancy number sets:

(0,0,0,0,1,6) \ \ \ (0,0,0,0,2,5) \ \ \ (0,0,0,0,3,4)

We apply the thoerem twice on each of these occupancy number set, the first time on the cells and the second time on the balls.

\displaystyle \biggl(\frac{6!}{4! \thinspace 1! \thinspace 1!}\biggr) \medspace \biggl(\frac{7!}{1! \thinspace 6!}\biggr)+\biggl(\frac{6!}{4! \thinspace 1! \thinspace 1!}\biggr) \medspace \biggl(\frac{7!}{2! \thinspace 5!}\biggr)+\biggl(\frac{6!}{4! \thinspace 1! \thinspace 1!}\biggr) \medspace \biggl(\frac{7!}{3! \thinspace 4!}\biggr)=1890

Case 3. Exactly three faces appear.
This is the case that the three faces appear in 7 rolls of the die (7 balls go into three cells). We need to consider the following four occupancy number sets:

(0,0,0,1,1,5) \ \ \ (0,0,0,1,2,4) \ \ \ (0,0,0,1,3,3) \ \ \ (0,0,0,2,2,3)

We apply the thoerem twice on each of these occupancy number set, the first time on the cells and the second time on the balls.

\displaystyle \biggl(\frac{6!}{3! \thinspace 2! \thinspace 1!}\biggr) \medspace \biggl(\frac{7!}{1! \thinspace 1! \thinspace 5!}\biggr)+\biggl(\frac{6!}{3! \thinspace 1! \thinspace 1! \thinspace 1!}\biggr) \medspace \biggl(\frac{7!}{1! \thinspace 2! \thinspace 4!}\biggr)
\displaystyle + \ \ \ \  \biggl(\frac{6!}{3! \thinspace 1! \thinspace 2!}\biggr) \medspace \biggl(\frac{7!}{1! \thinspace 3! \thinspace 3!}\biggr)+ \biggl(\frac{6!}{3! \thinspace 2! \thinspace 1!}\biggr) \medspace \biggl(\frac{7!}{2! \thinspace 2! \thinspace 3!}\biggr)=36120

Case 4. Exactly four faces appear.
This is the case that the four faces appear in the 7 rolls of the die (7 balls go into four cells). We need to consider the following three occupancy number sets:

(0,0,1,1,1,4) \ \ \ (0,0,1,1,2,3) \ \ \ (0,0,1,2,2,2)

We apply the thoerem twice on each of these occupancy number set, the first time on the cells and the second time on the balls. Note that the middle set is used as the illustration in the above discussion.

\displaystyle \biggl(\frac{6!}{2! \thinspace 3! \thinspace 1!}\biggr) \medspace \biggl(\frac{7!}{1! \thinspace 1! \thinspace 1! \thinspace 4!}\biggr)+\biggl(\frac{6!}{2! \thinspace 2! \thinspace 1! \thinspace 1!}\biggr) \medspace \biggl(\frac{7!}{1! \thinspace 1! \thinspace 2! \thinspace 3!}\biggr)
\displaystyle + \ \ \ \  \biggl(\frac{6!}{2! \thinspace 1! \thinspace 3!}\biggr) \medspace \biggl(\frac{7!}{1! \thinspace 2! \thinspace 2! \thinspace 2!}\biggr)=126000

Case 5. Exactly five faces appear.
This is the case that the five faces appear in the 7 rolls of the die (7 balls go into five cells), leaving two cells empty. We need to consider the following two occupancy number sets:

(0,1,1,1,1,3) \ \ \ (0,1,1,1,2,2)

We apply the thoerem twice on each of these occupancy number set, the first time on the cells and the second time on the balls.

\displaystyle \biggl(\frac{6!}{1! \thinspace 4! \thinspace 1!}\biggr) \medspace \biggl(\frac{7!}{1! \thinspace 1! \thinspace 1! \thinspace 1! \thinspace 3!}\biggr)+\biggl(\frac{6!}{1! \thinspace 3! \thinspace 2!}\biggr) \medspace \biggl(\frac{7!}{1! \thinspace 1! \thinspace 1! \thinspace 2! \thinspace 2!}\biggr)=100800

Case 6. Exactly six faces appear.
This is the case that all 6 faces appear in the 7 rolls of the die (all cells are occupied). We only need to consider the occupancy number set (1,1,1,1,1,2). We apply the thoerem twice, the first time on the cells and the second time on the balls.

\displaystyle \biggl(\frac{6!}{5! \thinspace 1!}\biggr) \medspace \biggl(\frac{7!}{1! \thinspace 1! \thinspace 1! \thinspace 1! \thinspace 1! 2!}\biggr)=15120

Note that the sum of all 6 counts is 6^7=279936. Now we have the following probabilities:

\displaystyle P(X=1)=\frac{6}{6^7}=0.00002143347

\displaystyle P(X=2)=\frac{1890}{6^7}=0.0067515

\displaystyle P(X=3)=\frac{36120}{6^7}=0.129029

\displaystyle P(X=4)=\frac{126000}{6^7}=0.450103

\displaystyle P(X=5)=\frac{100800}{6^7}=0.36008

\displaystyle P(X=6)=\frac{15120}{6^7}=0.0540

On 7 rolls of a fair die, on average we can expect about 4 faces showing up. The following shows the calculation of the expected number of occupied cells:

\displaystyle E[X]=1 \times P(X=1)+2 \times P(X=2) + 3 \times P(X=3)

\displaystyle + \ \ \ 4 \times P(X=4)+5 \times P(X=5)+6 \times P(X=6)=\frac{1210866}{6^7}=4.325501

More About Occupancy Numbers
In randomly placing k balls into n cells, what is the total number of all possible occupancy number sets? What is the total number of occupancy number sets such that all cells are occupied (assuming there are more balls than cells)? The following are the answers:

Number of occupancy number sets
\displaystyle \binom{k+n-1}{k}=\binom{k+n-1}{n-1} \ \ \ \ \ \ \ \ \ \ \ \ \ (1)

Number of occupancy number sets with all cells occupied
\displaystyle \binom{k-1}{n-1} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)

Consider our example of rolling a die 7 times (k=7 and n=6). Consider the occupancy number set (1,0,1,0,2,3). We can represent this occupancy number set with bars and stars as follows:

\displaystyle | \ \ * \ \ | \ \ | \ \ * \ \ | \ \ | \ \ * \ \ * \ \ | \ \ * \ \ * \ \ * \ \ |

Balls are represented by * and the space between two bars is a cell. There are n+1 bars creating n spaces representing n cells. Such a representation begins with a bar and ends with a bar. In between the two bars on the ends, there are n-1 bars and k stars. Thus the number of occupancy number sets is as indicated above.

For the case of all cells are occupied by at least one ball, first we place one ball into each cell. Then there remains k-n balls. We then place the remaining k-n balls into n cells. Using (1), we have:

\displaystyle \binom{k-n+n-1}{n-1}=\binom{k-1}{n-1}

In our example of rolling a die 7 times, there are a total of 6^7=279936 possible orderings of the 7 rolls. There are \displaystyle \binom{7+5}{5}=792 occupancy number sets, of which \displaystyle \binom{7-1}{6-1}=6 have all cells occupied.

Reference

  1. Feller, W., An Introduction to Probability Theory and its Applications, Vol. I, 3rd ed., John Wiley & Sons, Inc., New York, 1968