# Calculating order statistics using multinomial probabilities

Consider a random sample $X_1,X_2,\cdots,X_n$ drawn from a continuous distribution. Rank the sample items in increasing order, resulting in a ranked sample $Y_1 where $Y_1$ is the smallest sample item, $Y_2$ is the second smallest sample item and so on. The items in the ranked sample are called the order statistics. Recently the author of this blog was calculating a conditional probability such as $P(Y_2>4 \ | \ Y_2>3)$. One way to do this is to calculate the distribution function $P(Y_2 \le t)$. What about the probability $P(Y_5>4 \ | \ Y_2>3)$? Since this one involves two order statistics, the author of this blog initially thought that calculating $P(Y_5>4 \ | \ Y_2>3)$ would require knowing the joint probability distribution of the order statistics $Y_1,Y_2,\cdots ,Y_n$. It turns out that a joint distribution may not be needed. Instead, we can calculate a conditional probability such as $P(Y_5>4 \ | \ Y_2>3)$ using multinomial probabilities. In this post, we demonstrate how this is done using examples. Practice problems are found in here.

The calculation described here can be lengthy and tedious if the sample size is large. To make the calculation more manageable, the examples here have relatively small sample size. To keep the multinomial probabilities easier to calculate, the random samples are drawn from a uniform distribution. The calculation for larger sample sizes from other distributions is better done using a technology solution. In any case, the calculation described here is a great way to practice working with order statistics and multinomial probabilities.

________________________________________________________________________

The multinomial angle

In this post, the order statistics $Y_1 are resulted from ranking the random sample $X_1,X_2,\cdots,X_n$, which is drawn from a continuous distribution with distribution function $F(x)=P(X \le x)$. For the $j$th order statistic, the calculation often begins with its distribution function $P(Y_j \le t)$.

Here’s the thought process for calculating $P(Y_j \le t)$. In drawing the random sample $X_1,X_2,\cdots,X_n$, make a note of the items $\le t$ and the items $>t$. For the event $Y_j \le t$ to happen, there must be at least $j$ many sample items $X_i$ that are $\le t$. For the event $Y_j > t$ to happen, there can be only at most $j-1$ many sample items $X_i$ $\le t$. So to calculate $P(Y_j \le t)$, simply find out the probability of having $j$ or more sample items $\le t$. To calculate $P(Y_j > t)$, find the probability of having at most $j-1$ sample items $\le t$.

$\displaystyle P(Y_j \le t)=\sum \limits_{k=j}^n \binom{n}{k} \ \biggl[ F(t) \biggr]^k \ \biggl[1-F(x) \biggr]^{n-k} \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)$

$\displaystyle P(Y_j > t)=\sum \limits_{k=0}^{j-1} \binom{n}{k} \ \biggl[ F(t) \biggr]^k \ \biggl[1-F(x) \biggr]^{n-k} \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)$

Both (1) and (2) involve binomial probabilities and are discussed in this previous post. The probability of success is $F(t)=P(X \le t)$ since we are interested in how many sample items that are $\le t$. Both the calculations (1) and (2) are based on counting the number of sample items in the two intervals $\le t$ and $>t$. It turns out that when the probability that is desired involves more than one $Y_j$, we can also count the number of sample items that fall into some appropriate intervals and apply some appropriate multinomial probabilities. Let’s use an example to illustrate the idea.

Example 1
Draw a random sample $X_1,X_2,\cdots,X_{10}$ from the uniform distribution $U(0,4)$. The resulting order statistics are $Y_1. Find the following probabilities:

• $P(Y_4<2
• $P(Y_4<2

For both probabilities, the range of the distribution is broken up into 3 intervals, (0, 2), (2, 3) and (3, 4). Each sample item has probabilities $\frac{2}{4}$, $\frac{1}{4}$, $\frac{1}{4}$ of falling into these intervals, respectively. Multinomial probabilities are calculated on these 3 intervals. It is a matter of counting the numbers of sample items falling into each interval.

The first probability involves the event that there are 4 sample items in the interval (0, 2), 2 sample items in the interval (2, 3) and 4 sample items in the interval (3, 4). Thus the first probability is the following multinomial probability:

\displaystyle \begin{aligned} P(Y_4<2

For the second probability, $Y_5$ does not have to be greater than 2. Thus there could be 5 sample items less than 2. So we need to add one more case to the above probability (5 sample items to the first interval, 1 sample item to the second interval and 4 sample items to the third interval).

\displaystyle \begin{aligned} P(Y_4<2

Example 2
Draw a random sample $X_1,X_2,\cdots,X_6$ from the uniform distribution $U(0,4)$. The resulting order statistics are $Y_1. Find the probability $P(1.

In this problem the range of the distribution is broken up into 3 intervals (0, 1), (1, 3) and (3, 4). Each sample item has probabilities $\frac{1}{4}$, $\frac{2}{4}$, $\frac{1}{4}$ of falling into these intervals, respectively. Multinomial probabilities are calculated on these 3 intervals. It is a matter of counting the numbers of sample items falling into each interval. The counting is a little bit more involved here than in the previous example.

The example is to find the probability that both the second order statistic $Y_2$ and the fourth order statistic $Y_4$ fall into the interval $(1,3)$. To solve this, determine how many sample items that fall into the interval $(0,1)$, $(1,3)$ and $(3,4)$. The following points detail the counting.

• For the event $1 to happen, there can be at most 1 sample item in the interval $(0,1)$.
• For the event $Y_4<3$ to happen, there must be at least 4 sample items in the interval $(0,3)$. Thus if the interval $(0,1)$ has 1 sample item, the interval $(1,3)$ has at least 3 sample items. If the interval $(0,1)$ has no sample item, the interval $(1,3)$ has at least 4 sample items.

The following lists out all the cases that satisfy the above two bullet points. The notation $[a, b, c]$ means that $a$ sample items fall into $(0,1)$, $b$ sample items fall into the interval $(1,3)$ and $c$ sample items fall into the interval $(3,4)$. So $a+b+c=6$. Since the sample items are drawn from $U(0,4)$, the probabilities of a sample item falling into intervals $(0,1)$, $(1,3)$ and $(3,4)$ are $\frac{1}{4}$, $\frac{2}{4}$ and $\frac{1}{4}$, respectively.

[0, 4, 2]
[0, 5, 1]
[0, 6, 0]
[1, 3, 2]
[1, 4, 1]
[1, 5, 0]

\displaystyle \begin{aligned} \frac{6!}{a! \ b! \ c!} \ \biggl[\frac{1}{4} \biggr]^a \ \biggl[\frac{2}{4} \biggr]^b \ \biggl[\frac{1}{4} \biggr]^c&=\frac{6!}{0! \ 4! \ 2!} \ \biggl[\frac{1}{4} \biggr]^0 \ \biggl[\frac{2}{4} \biggr]^4 \ \biggl[\frac{1}{4} \biggr]^2=\frac{240}{4096} \\&\text{ } \\&=\frac{6!}{0! \ 5! \ 1!} \ \biggl[\frac{1}{4} \biggr]^0 \ \biggl[\frac{2}{4} \biggr]^5 \ \biggl[\frac{1}{4} \biggr]^1=\frac{192}{4096} \\&\text{ } \\&=\frac{6!}{0! \ 6! \ 0!} \ \biggl[\frac{1}{4} \biggr]^0 \ \biggl[\frac{2}{4} \biggr]^6 \ \biggl[\frac{1}{4} \biggr]^0=\frac{64}{4096} \\&\text{ } \\&=\frac{6!}{1! \ 3! \ 2!} \ \biggl[\frac{1}{4} \biggr]^1 \ \biggl[\frac{2}{4} \biggr]^3 \ \biggl[\frac{1}{4} \biggr]^2=\frac{480}{4096} \\&\text{ } \\&=\frac{6!}{1! \ 4! \ 1!} \ \biggl[\frac{1}{4} \biggr]^1 \ \biggl[\frac{2}{4} \biggr]^4 \ \biggl[\frac{1}{4} \biggr]^1=\frac{480}{4096} \\&\text{ } \\&=\frac{6!}{1! \ 5! \ 0!} \ \biggl[\frac{1}{4} \biggr]^1 \ \biggl[\frac{2}{4} \biggr]^5 \ \biggl[\frac{1}{4} \biggr]^0=\frac{192}{4096} \\&\text{ } \\&=\text{sum of probabilities }=\frac{1648}{4096}=0.4023\end{aligned}

So in randomly drawing 6 items from the uniform distribution $U(0,4)$, there is a 40% chance that the second order statistic and the fourth order statistic are between 1 and 3.

________________________________________________________________________

More examples

The method described by the above examples is this. When looking at the event described by the probability problem, the entire range of distribution is broken up into several intervals. Imagine the sample items $X_i$ are randomly being thrown into these interval (i.e. we are sampling from a uniform distribution). Then multinomial probabilities are calculated to account for all the different ways sample items can land into these intervals. The following examples further illustrate this idea.

Example 3
Draw a random sample $X_1,X_2,\cdots,X_7$ from the uniform distribution $U(0,5)$. The resulting order statistics are $Y_1. Find the following probabilities:

• $P(1
• $P(3

The range is broken up into the intervals (0, 1), (1, 3), (3, 4) and (4, 5). The sample items fall into these intervals with probabilities $\frac{1}{5}$, $\frac{2}{5}$, $\frac{1}{5}$ and $\frac{1}{5}$. The following details the counting for the event $1:

• There are no sample items in (0, 1) since $1.
• Based on $Y_1<3, there are at least one sample item and at most 3 sample items in (0, 3). Thus in the interval (1, 3), there are at least one sample item and at most 3 sample items since there are none in (0, 1).
• Based on $Y_4<4$, there are at least 4 sample items in the interval (0, 4). Thus the count in (3, 4) combines with the count in (1, 3) must be at least 4.
• The interval (4, 5) simply receives the left over sample items not in the previous intervals.

The notation $[a, b, c, d]$ lists out the counts in the 4 intervals. The following lists out all the cases described by the above 5 bullet points along with the corresponding multinomial probabilities, with two of the probabilities set up.

$\displaystyle [0, 1, 3, 3] \ \ \ \ \ \ \frac{280}{78125}=\frac{7!}{0! \ 1! \ 3! \ 3!} \ \biggl[\frac{1}{5} \biggr]^0 \ \biggl[\frac{2}{5} \biggr]^1 \ \biggl[\frac{1}{5} \biggr]^3 \ \biggl[\frac{1}{5} \biggr]^3$

$\displaystyle [0, 1, 4, 2] \ \ \ \ \ \ \frac{210}{78125}$

$\displaystyle [0, 1, 5, 1] \ \ \ \ \ \ \frac{84}{78125}$

$\displaystyle [0, 1, 6, 0] \ \ \ \ \ \ \frac{14}{78125}$

$\displaystyle [0, 2, 2, 3] \ \ \ \ \ \ \frac{840}{78125}$

$\displaystyle [0, 2, 3, 2] \ \ \ \ \ \ \frac{840}{78125}$

$\displaystyle [0, 2, 4, 1] \ \ \ \ \ \ \frac{420}{78125}$

$\displaystyle [0, 2, 5, 0] \ \ \ \ \ \ \frac{84}{78125}$

$\displaystyle [0, 3, 1, 3] \ \ \ \ \ \ \frac{1120}{78125}=\frac{7!}{0! \ 3! \ 1! \ 3!} \ \biggl[\frac{1}{5} \biggr]^0 \ \biggl[\frac{2}{5} \biggr]^3 \ \biggl[\frac{1}{5} \biggr]^1 \ \biggl[\frac{1}{5} \biggr]^3$

$\displaystyle [0, 3, 2, 2] \ \ \ \ \ \ \frac{1680}{78125}$

$\displaystyle [0, 3, 3, 1] \ \ \ \ \ \ \frac{1120}{78125}$

$\displaystyle [0, 3, 4, 0] \ \ \ \ \ \ \frac{280}{78125}$

Summing all the probabilities, $\displaystyle P(1. Out of the 78125 many different ways the 7 sample items can land into these 4 intervals, 6972 of them would satisfy the event $1.

++++++++++++++++++++++++++++++++++

We now calculate the second probability in Example 3.

$\displaystyle P(3

First calculate $P(1. The probability $P(Y_1 is the probability of having at least 1 sample item less than $t$, which is the complement of the probability of all sample items greater than $t$.

\displaystyle \begin{aligned} P(1

The event $1 can occur in 16256 ways. Out of these many ways, 6972 of these satisfy the event $1. Thus we have:

$\displaystyle P(3

Example 4
Draw a random sample $X_1,X_2,X_3,X_4,X_5$ from the uniform distribution $U(0,5)$. The resulting order statistics are $Y_1. Consider the conditional random variable $Y_4 \ | \ Y_2 >3$. For this conditional distribution, find the following:

• $P( Y_4 \le t \ | \ Y_2 >3)$
• $f_{Y_4}(t \ | \ Y_2 >3)$
• $E(Y_4 \ | \ Y_2 >3)$

where $3. Note that $f_{Y_4}(t | \ Y_2 >3)$ is the density function of $Y_4 \ | \ Y_2 >3$.

Note that

$\displaystyle P( Y_4 \le t \ | \ Y_2 >3)=\frac{P(33)}$

In finding $P(3, the range (0, 5) is broken up into 3 intervals (0, 3), (3, t) and (t, 5). The sample items fall into these intervals with probabilities $\frac{3}{5}$, $\frac{t-3}{5}$ and $\frac{5-t}{5}$.

Since $Y_2 >3$, there is at most 1 sample item in the interval (0, 3). Since $Y_4 \le t$, there are at least 4 sample items in the interval (0, t). So the count in the interval (3, t) and the count in (0, 3) should add up to 4 or more items. The following shows all the cases for the event $3 along with the corresponding multinomial probabilities.

$\displaystyle [0, 4, 1] \ \ \ \ \ \ \frac{5!}{0! \ 4! \ 1!} \ \biggl[\frac{3}{5} \biggr]^0 \ \biggl[\frac{t-3}{5} \biggr]^4 \ \biggl[\frac{5-t}{5} \biggr]^1$

$\displaystyle [0, 5, 0] \ \ \ \ \ \ \frac{5!}{0! \ 5! \ 0!} \ \biggl[\frac{3}{5} \biggr]^0 \ \biggl[\frac{t-3}{5} \biggr]^5 \ \biggl[\frac{5-t}{5} \biggr]^0$

$\displaystyle [1, 3, 1] \ \ \ \ \ \ \frac{5!}{1! \ 3! \ 1!} \ \biggl[\frac{3}{5} \biggr]^1 \ \biggl[\frac{t-3}{5} \biggr]^3 \ \biggl[\frac{5-t}{5} \biggr]^1$

$\displaystyle [1, 4, 0] \ \ \ \ \ \ \frac{5!}{1! \ 4! \ 0!} \ \biggl[\frac{3}{5} \biggr]^1 \ \biggl[\frac{t-3}{5} \biggr]^4 \ \biggl[\frac{5-t}{5} \biggr]^0$

After carrying the algebra and simplifying, we have the following:

$\displaystyle P(3

For the event $Y_2 >3$ to happen, there is at most 1 sample item less than 3. So we have:

$\displaystyle P(Y_2 >3)=\binom{5}{0} \ \biggl[\frac{3}{5} \biggr]^0 \ \biggl[\frac{2}{5} \biggr]^5 +\binom{5}{1} \ \biggl[\frac{3}{5} \biggr]^1 \ \biggl[\frac{2}{5} \biggr]^4=\frac{272}{3125}$

$\displaystyle P( Y_4 \le t \ | \ Y_2 >3)=\frac{-4t^5+25t^4+180t^3-1890t^2+5400t-5103}{272}$

Then the conditional density is obtained by differentiating $P( Y_4 \le t \ | \ Y_2 >3)$.

$\displaystyle f_{Y_4}(t \ | \ Y_2 >3)=\frac{-20t^4+100t^3+540t^2-3750t+5400}{272}$

The following gives the conditional mean $E(Y_4 \ | \ Y_2 >3)$.

\displaystyle \begin{aligned} E(Y_4 \ | \ Y_2 >3)&=\frac{1}{272} \ \int_3^5 t(-20t^4+100t^3+540t^2-3750t+5400) \ dt \\&=\frac{215}{51}=4.216 \end{aligned}

To contrast, the following gives the information on the unconditional distribution of $Y_4$.

$\displaystyle f_{Y_4}(t)=\frac{5!}{3! \ 1! \ 1!} \ \biggl[\frac{t}{5} \biggr]^3 \ \biggl[\frac{1}{5} \biggr] \ \biggl[ \frac{5-t}{5} \biggr]^1=\frac{20}{3125} \ (5t^3-t^4)$

$\displaystyle E(Y_4)=\frac{20}{3125} \ \int_0^5 t(5t^3-t^4) \ dt=\frac{10}{3}=3.33$

The unconditional mean of $Y_4$ is about 3.33. With the additional information that $Y_2 >3$, the average of $Y_4$ is now 4.2. So a higher value of $Y_2$ pulls up the mean of $Y_4$.

________________________________________________________________________

Practice problems

Practice problems to reinforce the calculation are found in the problem blog, a companion blog to this blog.

________________________________________________________________________
$\copyright \ \text{2015 by Dan Ma}$

# Splitting a Poisson Distribution

We consider a remarkable property of the Poisson distribution that has a connection to the multinomial distribution. We start with the following examples.

Example 1
Suppose that the arrivals of customers in a gift shop at an airport follow a Poisson distribution with a mean of $\alpha=5$ per 10 minutes. Furthermore, suppose that each arrival can be classified into one of three distinct types – type 1 (no purchase), type 2 (purchase under $20), and type 3 (purchase over$20). Records show that about 25% of the customers are of type 1. The percentages of type 2 and type 3 are 60% and 15%, respectively. What is the probability distribution of the number of customers per hour of each type?

Example 2
Roll a fair die $N$ times where $N$ is random and follows a Poisson distribution with parameter $\alpha$. For each $i=1,2,3,4,5,6$, let $N_i$ be the number of times the upside of the die is $i$. What is the probability distribution of each $N_i$? What is the joint distribution of $N_1,N_2,N_3,N_4,N_5,N_6$?

In Example 1, the stream of customers arrive according to a Poisson distribution. It can be shown that the stream of each type of customers also has a Poisson distribution. One way to view this example is that we can split the Poisson distribution into three Poisson distributions.

Example 2 also describes a splitting process, i.e. splitting a Poisson variable into 6 different Poisson variables. We can also view Example 2 as a multinomial distribution where the number of trials is not fixed but is random and follows a Poisson distribution. If the number of rolls of the die is fixed in Example 2 (say 10), then each $N_i$ would be a binomial distribution. Yet, with the number of trials being Poisson, each $N_i$ has a Poisson distribution with mean $\displaystyle \frac{\alpha}{6}$. In this post, we describe this Poisson splitting process in terms of a “random” multinomial distribution (the view point of Example 2).

________________________________________________________________________

Suppose we have a multinomial experiment with parameters $N$, $r$, $p_1, \cdots, p_r$, where

• $N$ is the number of multinomial trials,
• $r$ is the number of distinct possible outcomes in each trial (type 1 through type $r$),
• the $p_i$ are the probabilities of the $r$ possible outcomes in each trial.

Suppose that $N$ follows a Poisson distribution with parameter $\alpha$. For each $i=1, \cdots, r$, let $N_i$ be the number of occurrences of the $i^{th}$ type of outcomes in the $N$ trials. Then $N_1,N_2,\cdots,N_r$ are mutually independent Poisson random variables with parameters $\alpha p_1,\alpha p_2,\cdots,\alpha p_r$, respectively.

The variables $N_1,N_2,\cdots,N_r$ have a multinomial distribution and their joint probability function is:

$\displaystyle (1) \ \ \ \ P(N_1=n_1,N_2=n_2,\cdots,N_r=n_r)=\frac{N!}{n_1! n_2! \cdots n_r!} \ p_1^{n_1} p_2^{n_2} \cdots p_r^{n_r}$

where $n_i$ are nonnegative integers such that $N=n_1+n_2+\cdots+n_r$.

Since the total number of multinomial trials $N$ is not fixed and is random, $(1)$ is not the end of the story. The probability in $(1)$ is only a conditional probability. The following is the joint probability function of $N_1,N_2,\cdots,N_r$:

$\displaystyle (2) \ \ \ \ P(N_1=n_1,N_2=n_2,\cdots,N_r=n_r)$

\displaystyle \begin{aligned}&=P(N_1=n_1,N_2=n_2,\cdots,N_r=n_r \lvert N=\sum \limits_{k=0}^r n_k) \\&\ \ \ \ \ \times P(N=\sum \limits_{k=0}^r n_k) \\&\text{ } \\&=\frac{(\sum \limits_{k=0}^r n_k)!}{n_1! \ n_2! \ \cdots \ n_r!} \ p_1^{n_1} \ p_2^{n_2} \ \cdots \ p_r^{n_r} \ \times \frac{e^{-\alpha} \alpha^{\sum \limits_{k=0}^r n_k}}{(\sum \limits_{k=0}^r n_k!)} \\&\text{ } \\&=\frac{e^{-\alpha p_1} \ (\alpha p_1)^{n_1}}{n_1!} \ \frac{e^{-\alpha p_2} \ (\alpha p_2)^{n_2}}{n_2!} \ \cdots \ \frac{e^{-\alpha p_r} \ (\alpha p_r)^{n_r}}{n_r!} \end{aligned}

To obtain the marginal probability function of $N_j$, $j=1,2,\cdots,r$, we sum out the other variables $N_k=n_k$ ($k \ne j$) in $(2)$ and obtain the following:

$\displaystyle (3) \ \ \ \ P(N_j=n_j)=\frac{e^{-\alpha p_j} \ (\alpha p_j)^{n_j}}{n_j!}$

Thus we can conclude that $N_j$, $j=1,2,\cdots,r$, has a Poisson distribution with parameter $\alpha p_j$. Furrthermore, the joint probability function of $N_1,N_2,\cdots,N_r$ is the product of the marginal probability functions. Thus we can conclude that $N_1,N_2,\cdots,N_r$ are mutually independent.

________________________________________________________________________
Example 1
Let $N_1,N_2,N_3$ be the number of customers per hour of type 1, type 2, and type 3, respectively. Here, we attempt to split a Poisson distribution with mean 30 per hour (based on 5 per 10 minutes). Thus $N_1,N_2,N_3$ are mutually independent Poisson variables with means $30 \times 0.25=7.5$, $30 \times 0.60=18$, $30 \times 0.15=4.5$, respectively.

Example 2
As indicated earlier, each $N_i$, $i=1,2,3,4,5,6$, has a Poisson distribution with mean $\frac{\alpha}{6}$. According to $(2)$, the joint probability function of $N_1,N_2,N_3,N_4,N_5,N_6$ is simply the product of the six marginal Poisson probability functions.

# A double application of the multinomial coefficients

Suppose there are four committees that are labeled M, I, S and P. Eleven candidates are randomly assigned to these four committees. Assume that each candidate can only be assigned to one committee. Consider the following two examples:

1. How many ways can we randomly assign 11 candidates to these 4 committees such that Committee M consists of one member, Committee I consists of four members, Committee S consists of four members and Committee P consists of two members?
2. How many ways can we randomly assign 11 candidates to these 4 committees such that one committee consists of one member, one committee consists of four members, another committee consists of four members and another committee consists of two members?

The first question is a straight application of the combinatorial technique of multinomial coefficients, namely, the number of ways to assign 11 candidates into four subgroups, one group consisting of one candidate (Committee M), one group consisting of four candidates (Committee I), one group consisting of four candidates (Committee S) and one group consisting of two candidates (Committee P). Note that the numbers in bold are used in the denominator of the multinomial coefficient below. The answer to the first question is:

$\displaystyle \frac{11!}{1! \ 4! \ 4! \ 2!}=34650$

Note that the first question is a specific example of the second. The second question is also about assigning 11 candidates into four subgroups. But in the second question, the subgroups can rotate among the four committees, requiring a double use of the multinomial coefficients, once on the 11 candidates and a second time on the four committees. In other words, the first application of the multinomial coefficients is on the 11 objects to be distributed into four subgroups and the second instance is on the grouping the four subgroups. This technique of the double applications of the multinomial coefficients is a useful one in probability and combinatorics. For example, this technique can be applied in the occupancy problem (see chapter 2 section 5 in p. 38 in [1]). Another application is for calculating the probabilities of the hands in the game of poker dice (see Example 2 below).

Example 1
This is the second question indicated above. The answer is:

$\displaystyle \frac{11!}{1! \ 4! \ 4! \ 2!} \times \frac{4!}{1! \ 2! \ 1!}=34650 \times 12=415800$

The second multinomial coefficient is 12 and is the number of ways to group 4 committees into three subgroups, one consisting of one committee (receiving one candidate), one consisting of two committees (receiving four candidates each) and one consisting of one committee (receiving two candidates). Note that the numbers in bold are used in the second multinomial coefficient above.

Example 2
Roll five fair dice. Assuming one of the dice is called die 1, another one is called die 2 and so on, the string 1,2,5,2,4 indicates the outcome that die 1 is a one, die 2 is a two and die 3 is a five and so on. There are $6^5=7776$ different outcomes. How many of these outcomes have three scores appearing in such a way that one score appears three times, and each of the other two scores appears once?

The first multinomial coefficient is from a representative outcome, for example, the string of scores, 1,1,1,2,3. We find the number of ways to assign the five scores into three subgroups, one consisting the three scores of 1, one consisting the one score of 2 and one consisting the one score of 3. Note that the numbers in bold are in the denominator of the multinomial coefficient below.

The second multinomial coefficient is from the six scores (faces) of a die. Here, we find the number of ways to assign the six faces into three groups, one consisting of three faces that do not appear, one consisting two faces (each of which appears once) and one group consisting one face that appears three times. Note that the numbers in bold are in the denominator of the multinomial coefficient below.

Multiplying these two multinomial coefficients, we obtain:

$\displaystyle \frac{5!}{3! \ 1! \ 1!} \times \frac{6!}{3! \ 2! \ 1!}=20 \times 60=1200$

Rolling five dice and obtaining a score appearing three times and two scores appearing once each is called “three of a kind” in the game of poker dice. Thus in this game, the probability of obtaining a “three of a kind” is:

$\displaystyle \frac{1200}{7776}=0.15432$

Reference

1. Feller, W., An Introduction to Probability Theory and its Applications, Vol. I, 3rd ed., John Wiley & Sons, Inc., New York, 1968

# The multinomial coefficients

Consider the following four examples of counting:

1. Consider the finite populaton $\left\{m,i,s,p\right\}$. Draw 11 letters at random from this population with replacement. The total number of 11-character letter strings that can be formed is $4^{11}$. How many of these character strings contain one m, four i’s, four s’s and two p’s?
2. What is the total number of 11-character strings that can be formed using the letters in the word Mississippi?
3. Suppose each of the questions in an 11-question multiple choice test has 4 choices (M, I, S and P). A student chooses the answers by pure guessing. How many ways can this student take the test if the random selection of the answers produces 1 M, 4 I’s, 4 S’s and 2 P’s as answers?
4. How many ways can we randomly assign 11 candidates into 4 committees such that Committee M consists of 1 member, Committee I consists of 4 members, Committee S consists of 4 members and Committee P consists of 2 members? Assume that each candidate can only be assigned to one committee.

All four examples have the same answer, namely 34,650. All these examples are about the number of ways of assigning 11 objects into four subgroups, one containing 1 object, two containing 4 objects each and one containing 2 objects, where the objects in each subgroup are considered indistinguishable. These four examples illustrate the combinatorial approach called multinomial coefficients. The multinomial coefficient (the number of ways of assigning the 11 objects in the specified manner) in these examples is:

$\displaystyle \frac{11!}{1! \ 4! \ 4! \ 2!}=34650$

In this post, I make a few observations about the combinatorics surrounding the multinomial coefficients and the multinomial theorem. The multinomial distribution is then naturally defined.

First, the multinomial coefficients. Suppose $k$ and $n$ are positive integers such that $n \le k$. Suppose we have nonnegative integers $a_1, a_2, \cdots, a_n$ such that $a_1+a_2+\cdots+a_n=k$. The following is the number of ways to partition a set of $k$ distinct objects into $n$ subgroups where the first subgroup consists of $a_1$ objects, the second subgroup consists of $a_2$ objects and so on.

$\displaystyle (0) \ \ \ \ \binom{k}{a_1, a_2, \cdots, a_n}=\frac{k!}{a_1! a_2! \cdots a_n!}$

The result $(1)$ is the result of a combinatoric observation below. The result $(2)$ is the multinomial theorem.

$\displaystyle (1) \ \ \ \ \sum_{a_1+a_2+ \cdots + a_n=k} \frac{k!}{a_1! a_2! \cdots a_n!}=n^k$
$\displaystyle (2) \ \ \ \ (x_1+x_2+ \cdots +x_n)^k=\sum_{a_1+a_2+ \cdots + a_n=k} \frac{k!}{a_1! a_2! \cdots a_n!} \ x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}$

Discussion of Example
In drawing 11 letters at random with replacement from the set $\left\{m,i,s,p\right\}$, there are a total of $4^{11}=4194304$ many ordered samples. The following two strings are two examples:

$Mississippi \ \ \ \ \ \ \ \ \ \ \ Mississiipp$

Both of these ordered samples have one m, four i’s, four s’s and two p’s. In other words, both of these ordered samples have the same unordered sample $\left\{m,i,i,i,i,s,s,s,s,p,p\right\}$. A more compact way of notating this unordered sample is $(1,4,4,2)$, where the first coordinate is the number of m’s, the second coordinate is the number of i’s, the third coordinate is the number of s’s, and the fourth coordinate is the number of p’s. Note that the sum of the four is 11. The number of ordered samples that are tied to $(1,4,4,2)$ is the multinomial coefficient 34650 as indicated above.

Another way of denoting the unordered sample $(1,4,4,2)$ is using a string of stars and bars as follows:

$* \ | \ * \ * \ * \ * \ | \ * \ * \ * \ * \ | \ * \ *$

In the star and bar diagram, there are 11 stars (representing the 11 characters selected) and 3 bars (creating four spaces representing the letters m, i, s and p, respectively). The star and bar diagram has a combinatorial advantage. For example, how many unordered samples are there when 11 letters are selected at random with replacement from $\left\{m,i,s,p\right\}$?

In any unordered sample in our discussion, there are 3 bars and 11 stars. The stars and bars can be in any arbitary order. Thus there are $\displaystyle \binom{11+3}{11}=364$ many unordered samples. Furthermore, the equation $a_1+a_2+a_3+a_4=11$ has 364 many nonnegative integer solutions.

A related question is: how many unordered samples are there when 11 letters are selected at random with replacement from $\left\{m,i,s,p\right\}$ such that each letter is selected at least once? In the star and bar diagram, each of the four spaces has at least one star. Thus we need to arrange 7 more stars in these four spaces. Thus there are $\displaystyle \binom{7+3}{7}=120$ many ways of doing this. Furthermore, the equation $a_1+a_2+a_3+a_4=11$ has 120 many positive integer solutions.

Generalization
Suppose we sample $k$ times from the finite set $\left\{x_1,x_2,x_3, \cdots, x_n\right\}$ with replacement. There are $n^k$ many ordered samples. These ordered samples can be collapsed into

$\displaystyle (3) \ \ \ \ \binom{k+n-1}{k}=\binom{k+n-1}{n-1}$

many unordered samples (this can be seen using the star and bar diagram). Furthermore, the number of nonnegative integer solutions to the equation $a_1+a_2+ \cdots +a_n=k$ is obtained by $(3)$.

The number of ordered samples tied to the unordered sample $(a_1,a_2, \cdots, a_n)$ is:

$\displaystyle (4) \ \ \ \ \frac{k!}{a_1! a_2! \cdots a_n!} \ \ \ \ \text{(multinomial coeffcients)}$

The preceding discussion is encapsulated in equation $(1)$. The equation $(2)$ above is the statement of the multinomial theorem, which is a statement about polynomial expansion. The number of terms in the polynomial expansion is indicated by $(3)$, which is also the number of nonnegative integer solutions to the equation $a_1+a_2+ \cdots +a_n=k$. Each term $x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}$ in the polynomial expansion can be considered as an unordered sample in the finite sampling with replacement. Thus both results of $(1)$ and $(2)$ are compact ways of describing the finite sampling with replacement from the set $\left\{x_1,x_2,x_3, \cdots, x_n\right\}$.

The number of unordered samples where each object in $\left\{x_1,x_2,x_3, \cdots, x_n\right\}$ is selected at least once is:

$\displaystyle (5) \ \ \ \ \binom{k-n+n-1}{k-n}=\binom{k-n+n-1}{n-1}=\binom{k-1}{n-1}$

As a result, $(5)$ provides the number of positive integer solutions to the equation $a_1+a_2+ \cdots +a_n=k$.

The Multinomial Distribution
Suppose that $k$ independent trials are performed where each trial has $n$ outcomes denoted by $E_1,E_2,\cdots,E_n$. Suppose that in each trial, the probability of the outcome $E_i$ is $\displaystyle p_i$. Furthermore, we have $p_1+p_2+\cdots+p_n=1$.

The result of performing the $k$ independent trials can be denoted by an ordered string such as $E_5 E_1 E_7 \cdots$, i.e. an ordered sample when sampling $k$ times from $\left\{E_1,E_2,\cdots,E_n\right\}$ with replacement. Consider the ordered samples where the outcome $E_1$ occurs $a_1$ times, the outcome $E_2$ occurs $a_2$ times and so on such that $a_1+a_2+\cdots+a_n=k$. This is a scenario that is equivalent to the four examples indicated at the beginning of the post, which is about assigning $k$ trials into $n$ subgroups, the first of which consists of $a_1$ occurrences of the outcome $E_1$, the second subgroup consists of $a_2$ occurrences of the outcome $E_2$ and so on. The multinomial coefficient $(4)$ above is the number of all such ordered samples. Thus the following is the probability that in $k$ trials, $E_1$ occurs $a_1$ times, $E_2$ occurs $a_2$ times (and so on):

$\displaystyle \frac{k!}{a_1! \ a_2! \cdots a_n!} \ p_1^{a_1} \ p_2^{a_2} \cdots p_n^{a_n}$

More formally, for each $j=1,2, \cdots, n$, let $X_j$ be the number of times the event $E_j$ occurs in the $k$ many trials. Then the joint distribution of the random variables $X_1,X_2,\cdots,X_n$ is called the multinomial distribution with parameters $k,p_1,p_2,\cdots,p_n$.

The joint probability density function (joint pdf) is given by:

$\displaystyle P(X_1=a_1,X_2=a_2, \cdots, X_n=a_n)=\frac{k!}{a_1! \ a_2! \cdots a_n!} \ p_1^{a_1} \ p_2^{a_2} \cdots p_n^{a_n}$

The multinomial distribution is so named is because of the multinomial theorem. Note that the right-hand side of the above pdf is a term in the multinomial expansion of $(p_1+p_2+\cdots+p_n)^k$. Furthermore we have:

$\displaystyle 1=(p_1+p_2+ \cdots +p_n)^k=\sum_{a_1+a_2+ \cdots + a_n=k} \frac{k!}{a_1! a_2! \cdots a_n!} \ p_1^{a_1} p_2^{a_2} \cdots p_n^{a_n}$

When there are only two categories of balls, labeled 1 (success) or 2 (failure), $X_2=k-X_1$. As a result, $X=X_1$ has a univariate distribution, which is the binomial distribution.

Reference

1. Feller, W., An Introduction to Probability Theory and its Applications, Vol. I, 3rd ed., John Wiley & Sons, Inc., New York, 1968