Calculating order statistics using multinomial probabilities

Consider a random sample X_1,X_2,\cdots,X_n drawn from a continuous distribution. Rank the sample items in increasing order, resulting in a ranked sample Y_1<Y_2<\cdots <Y_n where Y_1 is the smallest sample item, Y_2 is the second smallest sample item and so on. The items in the ranked sample are called the order statistics. Recently the author of this blog was calculating a conditional probability such as P(Y_2>4 \ | \ Y_2>3). One way to do this is to calculate the distribution function P(Y_2 \le t). What about the probability P(Y_5>4 \ | \ Y_2>3)? Since this one involves two order statistics, the author of this blog initially thought that calculating P(Y_5>4 \ | \ Y_2>3) would require knowing the joint probability distribution of the order statistics Y_1,Y_2,\cdots ,Y_n. It turns out that a joint distribution may not be needed. Instead, we can calculate a conditional probability such as P(Y_5>4 \ | \ Y_2>3) using multinomial probabilities. In this post, we demonstrate how this is done using examples. Practice problems are found in here.

The calculation described here can be lengthy and tedious if the sample size is large. To make the calculation more manageable, the examples here have relatively small sample size. To keep the multinomial probabilities easier to calculate, the random samples are drawn from a uniform distribution. The calculation for larger sample sizes from other distributions is better done using a technology solution. In any case, the calculation described here is a great way to practice working with order statistics and multinomial probabilities.

________________________________________________________________________

The multinomial angle

In this post, the order statistics Y_1<Y_2<\cdots <Y_n are resulted from ranking the random sample X_1,X_2,\cdots,X_n, which is drawn from a continuous distribution with distribution function F(x)=P(X \le x). For the jth order statistic, the calculation often begins with its distribution function P(Y_j \le t).

Here’s the thought process for calculating P(Y_j \le t). In drawing the random sample X_1,X_2,\cdots,X_n, make a note of the items \le t and the items >t. For the event Y_j \le t to happen, there must be at least j many sample items X_i that are \le t. For the event Y_j > t to happen, there can be only at most j-1 many sample items X_i \le t. So to calculate P(Y_j \le t), simply find out the probability of having j or more sample items \le t. To calculate P(Y_j > t), find the probability of having at most j-1 sample items \le t.

    \displaystyle P(Y_j \le t)=\sum \limits_{k=j}^n \binom{n}{k} \ \biggl[ F(t) \biggr]^k \ \biggl[1-F(x) \biggr]^{n-k} \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)

    \displaystyle P(Y_j > t)=\sum \limits_{k=0}^{j-1} \binom{n}{k} \ \biggl[ F(t) \biggr]^k \ \biggl[1-F(x) \biggr]^{n-k} \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)

Both (1) and (2) involve binomial probabilities and are discussed in this previous post. The probability of success is F(t)=P(X \le t) since we are interested in how many sample items that are \le t. Both the calculations (1) and (2) are based on counting the number of sample items in the two intervals \le t and >t. It turns out that when the probability that is desired involves more than one Y_j, we can also count the number of sample items that fall into some appropriate intervals and apply some appropriate multinomial probabilities. Let’s use an example to illustrate the idea.

Example 1
Draw a random sample X_1,X_2,\cdots,X_{10} from the uniform distribution U(0,4). The resulting order statistics are Y_1<Y_2< \cdots <Y_{10}. Find the following probabilities:

  • P(Y_4<2<Y_5<Y_6<3<Y_7)
  • P(Y_4<2<Y_6<3<Y_7)

For both probabilities, the range of the distribution is broken up into 3 intervals, (0, 2), (2, 3) and (3, 4). Each sample item has probabilities \frac{2}{4}, \frac{1}{4}, \frac{1}{4} of falling into these intervals, respectively. Multinomial probabilities are calculated on these 3 intervals. It is a matter of counting the numbers of sample items falling into each interval.

The first probability involves the event that there are 4 sample items in the interval (0, 2), 2 sample items in the interval (2, 3) and 4 sample items in the interval (3, 4). Thus the first probability is the following multinomial probability:

    \displaystyle \begin{aligned} P(Y_4<2<Y_5<Y_6<3<Y_7)&=\frac{10!}{4! \ 2! \ 4!} \biggl[\frac{2}{4} \biggr]^4 \ \biggl[\frac{1}{4} \biggr]^2 \ \biggl[\frac{1}{4} \biggr]^4 \\&\text{ } \\&=\frac{50400}{1048567} \\&\text{ } \\&=0.0481  \end{aligned}

For the second probability, Y_5 does not have to be greater than 2. Thus there could be 5 sample items less than 2. So we need to add one more case to the above probability (5 sample items to the first interval, 1 sample item to the second interval and 4 sample items to the third interval).

    \displaystyle \begin{aligned} P(Y_4<2<Y_6<3<Y_7)&=\frac{10!}{4! \ 2! \ 4!} \biggl[\frac{2}{4} \biggr]^4 \ \biggl[\frac{1}{4} \biggr]^2 \ \biggl[\frac{1}{4} \biggr]^4 \\& \ \ \ \ + \frac{10!}{5! \ 1! \ 4!} \biggl[\frac{2}{4} \biggr]^5 \ \biggl[\frac{1}{4} \biggr]^1 \ \biggl[\frac{1}{4} \biggr]^4 \\&\text{ } \\&=\frac{50400+40320}{1048567} \\&\text{ } \\&=\frac{90720}{1048567} \\&\text{ } \\&=0.0865  \end{aligned}

Example 2
Draw a random sample X_1,X_2,\cdots,X_6 from the uniform distribution U(0,4). The resulting order statistics are Y_1<Y_2< \cdots <Y_6. Find the probability P(1<Y_2<Y_4<3).

In this problem the range of the distribution is broken up into 3 intervals (0, 1), (1, 3) and (3, 4). Each sample item has probabilities \frac{1}{4}, \frac{2}{4}, \frac{1}{4} of falling into these intervals, respectively. Multinomial probabilities are calculated on these 3 intervals. It is a matter of counting the numbers of sample items falling into each interval. The counting is a little bit more involved here than in the previous example.

The example is to find the probability that both the second order statistic Y_2 and the fourth order statistic Y_4 fall into the interval (1,3). To solve this, determine how many sample items that fall into the interval (0,1), (1,3) and (3,4). The following points detail the counting.

  • For the event 1<Y_2 to happen, there can be at most 1 sample item in the interval (0,1).
  • For the event Y_4<3 to happen, there must be at least 4 sample items in the interval (0,3). Thus if the interval (0,1) has 1 sample item, the interval (1,3) has at least 3 sample items. If the interval (0,1) has no sample item, the interval (1,3) has at least 4 sample items.

The following lists out all the cases that satisfy the above two bullet points. The notation [a, b, c] means that a sample items fall into (0,1), b sample items fall into the interval (1,3) and c sample items fall into the interval (3,4). So a+b+c=6. Since the sample items are drawn from U(0,4), the probabilities of a sample item falling into intervals (0,1), (1,3) and (3,4) are \frac{1}{4}, \frac{2}{4} and \frac{1}{4}, respectively.

    [0, 4, 2]
    [0, 5, 1]
    [0, 6, 0]
    [1, 3, 2]
    [1, 4, 1]
    [1, 5, 0]

    \displaystyle \begin{aligned} \frac{6!}{a! \ b! \ c!} \ \biggl[\frac{1}{4} \biggr]^a \ \biggl[\frac{2}{4} \biggr]^b \ \biggl[\frac{1}{4} \biggr]^c&=\frac{6!}{0! \ 4! \ 2!} \ \biggl[\frac{1}{4} \biggr]^0 \ \biggl[\frac{2}{4} \biggr]^4 \ \biggl[\frac{1}{4} \biggr]^2=\frac{240}{4096} \\&\text{ } \\&=\frac{6!}{0! \ 5! \ 1!} \ \biggl[\frac{1}{4} \biggr]^0 \ \biggl[\frac{2}{4} \biggr]^5 \ \biggl[\frac{1}{4} \biggr]^1=\frac{192}{4096} \\&\text{ } \\&=\frac{6!}{0! \ 6! \ 0!} \ \biggl[\frac{1}{4} \biggr]^0 \ \biggl[\frac{2}{4} \biggr]^6 \ \biggl[\frac{1}{4} \biggr]^0=\frac{64}{4096} \\&\text{ } \\&=\frac{6!}{1! \ 3! \ 2!} \ \biggl[\frac{1}{4} \biggr]^1 \ \biggl[\frac{2}{4} \biggr]^3 \ \biggl[\frac{1}{4} \biggr]^2=\frac{480}{4096} \\&\text{ } \\&=\frac{6!}{1! \ 4! \ 1!} \ \biggl[\frac{1}{4} \biggr]^1 \ \biggl[\frac{2}{4} \biggr]^4 \ \biggl[\frac{1}{4} \biggr]^1=\frac{480}{4096} \\&\text{ } \\&=\frac{6!}{1! \ 5! \ 0!} \ \biggl[\frac{1}{4} \biggr]^1 \ \biggl[\frac{2}{4} \biggr]^5 \ \biggl[\frac{1}{4} \biggr]^0=\frac{192}{4096} \\&\text{ } \\&=\text{sum of probabilities }=\frac{1648}{4096}=0.4023\end{aligned}

So in randomly drawing 6 items from the uniform distribution U(0,4), there is a 40% chance that the second order statistic and the fourth order statistic are between 1 and 3.

________________________________________________________________________

More examples

The method described by the above examples is this. When looking at the event described by the probability problem, the entire range of distribution is broken up into several intervals. Imagine the sample items X_i are randomly being thrown into these interval (i.e. we are sampling from a uniform distribution). Then multinomial probabilities are calculated to account for all the different ways sample items can land into these intervals. The following examples further illustrate this idea.

Example 3
Draw a random sample X_1,X_2,\cdots,X_7 from the uniform distribution U(0,5). The resulting order statistics are Y_1<Y_2< \cdots <Y_7. Find the following probabilities:

  • P(1<Y_1<3<Y_4<4)
  • P(3<Y_4<4 \ | \ 1<Y_1<3)

The range is broken up into the intervals (0, 1), (1, 3), (3, 4) and (4, 5). The sample items fall into these intervals with probabilities \frac{1}{5}, \frac{2}{5}, \frac{1}{5} and \frac{1}{5}. The following details the counting for the event 1<Y_1<3<Y_4<4:

  • There are no sample items in (0, 1) since 1<Y_1.
  • Based on Y_1<3<Y_4, there are at least one sample item and at most 3 sample items in (0, 3). Thus in the interval (1, 3), there are at least one sample item and at most 3 sample items since there are none in (0, 1).
  • Based on Y_4<4, there are at least 4 sample items in the interval (0, 4). Thus the count in (3, 4) combines with the count in (1, 3) must be at least 4.
  • The interval (4, 5) simply receives the left over sample items not in the previous intervals.

The notation [a, b, c, d] lists out the counts in the 4 intervals. The following lists out all the cases described by the above 5 bullet points along with the corresponding multinomial probabilities, with two of the probabilities set up.

    \displaystyle [0, 1, 3, 3] \ \ \ \ \ \ \frac{280}{78125}=\frac{7!}{0! \ 1! \ 3! \ 3!} \ \biggl[\frac{1}{5} \biggr]^0 \ \biggl[\frac{2}{5} \biggr]^1 \ \biggl[\frac{1}{5} \biggr]^3 \ \biggl[\frac{1}{5} \biggr]^3

    \displaystyle [0, 1, 4, 2] \ \ \ \ \ \ \frac{210}{78125}

    \displaystyle [0, 1, 5, 1] \ \ \ \ \ \ \frac{84}{78125}

    \displaystyle [0, 1, 6, 0] \ \ \ \ \ \ \frac{14}{78125}

    \displaystyle [0, 2, 2, 3] \ \ \ \ \ \ \frac{840}{78125}

    \displaystyle [0, 2, 3, 2] \ \ \ \ \ \ \frac{840}{78125}

    \displaystyle [0, 2, 4, 1] \ \ \ \ \ \ \frac{420}{78125}

    \displaystyle [0, 2, 5, 0] \ \ \ \ \ \ \frac{84}{78125}

    \displaystyle [0, 3, 1, 3] \ \ \ \ \ \ \frac{1120}{78125}=\frac{7!}{0! \ 3! \ 1! \ 3!} \ \biggl[\frac{1}{5} \biggr]^0 \ \biggl[\frac{2}{5} \biggr]^3 \ \biggl[\frac{1}{5} \biggr]^1 \ \biggl[\frac{1}{5} \biggr]^3

    \displaystyle [0, 3, 2, 2] \ \ \ \ \ \ \frac{1680}{78125}

    \displaystyle [0, 3, 3, 1] \ \ \ \ \ \ \frac{1120}{78125}

    \displaystyle [0, 3, 4, 0] \ \ \ \ \ \ \frac{280}{78125}

Summing all the probabilities, \displaystyle P(1<Y_1<3<Y_4<4)=\frac{6972}{78125}=0.08924. Out of the 78125 many different ways the 7 sample items can land into these 4 intervals, 6972 of them would satisfy the event 1<Y_1<3<Y_4<4.

++++++++++++++++++++++++++++++++++

We now calculate the second probability in Example 3.

    \displaystyle P(3<Y_4<4 \ | \ 1<Y_1<3)=\frac{P(1<Y_1<3<Y_4<4)}{P(1<Y_1<3)}

First calculate P(1<Y_1<3)=P(Y_1<3)-P(Y_1<1). The probability P(Y_1<t) is the probability of having at least 1 sample item less than t, which is the complement of the probability of all sample items greater than t.

    \displaystyle \begin{aligned} P(1<Y_1<3)&=P(Y_1<3)-P(Y_1<1) \\&=1-\biggl( \frac{2}{5} \biggr)^7 -\biggl[1-\biggl( \frac{4}{5} \biggr)^7 \biggr] \\&=\frac{77997-61741}{78125} \\&=\frac{16256}{78125} \end{aligned}

The event 1<Y_1<3 can occur in 16256 ways. Out of these many ways, 6972 of these satisfy the event 1<Y_1<3<Y_4<4. Thus we have:

    \displaystyle P(3<Y_4<4 \ | \ 1<Y_1<3)=\frac{6972}{16256}=0.4289

Example 4
Draw a random sample X_1,X_2,X_3,X_4,X_5 from the uniform distribution U(0,5). The resulting order statistics are Y_1<Y_2<Y_3<Y_4 <Y_5. Consider the conditional random variable Y_4 \ | \ Y_2 >3. For this conditional distribution, find the following:

  • P( Y_4 \le t \ | \ Y_2 >3)
  • f_{Y_4}(t \ | \ Y_2 >3)
  • E(Y_4 \ | \ Y_2 >3)

where 3<t<5. Note that f_{Y_4}(t | \ Y_2 >3) is the density function of Y_4 \ | \ Y_2 >3.

Note that

    \displaystyle P( Y_4 \le t \ | \ Y_2 >3)=\frac{P(3<Y_2<Y_4 \le t)}{P(Y_2 >3)}

In finding P(3<Y_2<Y_4 \le t), the range (0, 5) is broken up into 3 intervals (0, 3), (3, t) and (t, 5). The sample items fall into these intervals with probabilities \frac{3}{5}, \frac{t-3}{5} and \frac{5-t}{5}.

Since Y_2 >3, there is at most 1 sample item in the interval (0, 3). Since Y_4 \le t, there are at least 4 sample items in the interval (0, t). So the count in the interval (3, t) and the count in (0, 3) should add up to 4 or more items. The following shows all the cases for the event 3<Y_2<Y_4 \le t along with the corresponding multinomial probabilities.

    \displaystyle [0, 4, 1] \ \ \ \ \ \ \frac{5!}{0! \ 4! \ 1!} \ \biggl[\frac{3}{5} \biggr]^0 \ \biggl[\frac{t-3}{5} \biggr]^4 \ \biggl[\frac{5-t}{5} \biggr]^1

    \displaystyle [0, 5, 0] \ \ \ \ \ \ \frac{5!}{0! \ 5! \ 0!} \ \biggl[\frac{3}{5} \biggr]^0 \ \biggl[\frac{t-3}{5} \biggr]^5 \ \biggl[\frac{5-t}{5} \biggr]^0

    \displaystyle [1, 3, 1] \ \ \ \ \ \ \frac{5!}{1! \ 3! \ 1!} \ \biggl[\frac{3}{5} \biggr]^1 \ \biggl[\frac{t-3}{5} \biggr]^3 \ \biggl[\frac{5-t}{5} \biggr]^1

    \displaystyle [1, 4, 0] \ \ \ \ \ \ \frac{5!}{1! \ 4! \ 0!} \ \biggl[\frac{3}{5} \biggr]^1 \ \biggl[\frac{t-3}{5} \biggr]^4 \ \biggl[\frac{5-t}{5} \biggr]^0

After carrying the algebra and simplifying, we have the following:

    \displaystyle P(3<Y_2<Y_4 \le t)=\frac{-4t^5+25t^4+180t^3-1890t^2+5400t-5103}{3125}

For the event Y_2 >3 to happen, there is at most 1 sample item less than 3. So we have:

    \displaystyle P(Y_2 >3)=\binom{5}{0} \ \biggl[\frac{3}{5} \biggr]^0 \ \biggl[\frac{2}{5} \biggr]^5 +\binom{5}{1} \ \biggl[\frac{3}{5} \biggr]^1 \ \biggl[\frac{2}{5} \biggr]^4=\frac{272}{3125}

    \displaystyle P( Y_4 \le t \ | \ Y_2 >3)=\frac{-4t^5+25t^4+180t^3-1890t^2+5400t-5103}{272}

Then the conditional density is obtained by differentiating P( Y_4 \le t \ | \ Y_2 >3).

    \displaystyle f_{Y_4}(t \ | \ Y_2 >3)=\frac{-20t^4+100t^3+540t^2-3750t+5400}{272}

The following gives the conditional mean E(Y_4 \ | \ Y_2 >3).

    \displaystyle \begin{aligned} E(Y_4 \ | \ Y_2 >3)&=\frac{1}{272} \ \int_3^5 t(-20t^4+100t^3+540t^2-3750t+5400) \ dt \\&=\frac{215}{51}=4.216 \end{aligned}

To contrast, the following gives the information on the unconditional distribution of Y_4.

    \displaystyle f_{Y_4}(t)=\frac{5!}{3! \ 1! \ 1!} \ \biggl[\frac{t}{5} \biggr]^3 \ \biggl[\frac{1}{5} \biggr] \ \biggl[ \frac{5-t}{5} \biggr]^1=\frac{20}{3125} \ (5t^3-t^4)

    \displaystyle E(Y_4)=\frac{20}{3125} \ \int_0^5 t(5t^3-t^4) \ dt=\frac{10}{3}=3.33

The unconditional mean of Y_4 is about 3.33. With the additional information that Y_2 >3, the average of Y_4 is now 4.2. So a higher value of Y_2 pulls up the mean of Y_4.

________________________________________________________________________

Practice problems

Practice problems to reinforce the calculation are found in the problem blog, a companion blog to this blog.

________________________________________________________________________
\copyright \ \text{2015 by Dan Ma}

A double application of the multinomial coefficients

Suppose there are four committees that are labeled M, I, S and P. Eleven candidates are randomly assigned to these four committees. Assume that each candidate can only be assigned to one committee. Consider the following two examples:

  1. How many ways can we randomly assign 11 candidates to these 4 committees such that Committee M consists of one member, Committee I consists of four members, Committee S consists of four members and Committee P consists of two members?
  2. How many ways can we randomly assign 11 candidates to these 4 committees such that one committee consists of one member, one committee consists of four members, another committee consists of four members and another committee consists of two members?

The first question is a straight application of the combinatorial technique of multinomial coefficients, namely, the number of ways to assign 11 candidates into four subgroups, one group consisting of one candidate (Committee M), one group consisting of four candidates (Committee I), one group consisting of four candidates (Committee S) and one group consisting of two candidates (Committee P). Note that the numbers in bold are used in the denominator of the multinomial coefficient below. The answer to the first question is:

    \displaystyle \frac{11!}{1! \ 4! \ 4! \ 2!}=34650

Note that the first question is a specific example of the second. The second question is also about assigning 11 candidates into four subgroups. But in the second question, the subgroups can rotate among the four committees, requiring a double use of the multinomial coefficients, once on the 11 candidates and a second time on the four committees. In other words, the first application of the multinomial coefficients is on the 11 objects to be distributed into four subgroups and the second instance is on the grouping the four subgroups. This technique of the double applications of the multinomial coefficients is a useful one in probability and combinatorics. For example, this technique can be applied in the occupancy problem (see chapter 2 section 5 in p. 38 in [1]). Another application is for calculating the probabilities of the hands in the game of poker dice (see Example 2 below).

Example 1
This is the second question indicated above. The answer is:

    \displaystyle \frac{11!}{1! \ 4! \ 4! \ 2!} \times \frac{4!}{1! \ 2! \ 1!}=34650 \times 12=415800

The second multinomial coefficient is 12 and is the number of ways to group 4 committees into three subgroups, one consisting of one committee (receiving one candidate), one consisting of two committees (receiving four candidates each) and one consisting of one committee (receiving two candidates). Note that the numbers in bold are used in the second multinomial coefficient above.

Example 2
Roll five fair dice. Assuming one of the dice is called die 1, another one is called die 2 and so on, the string 1,2,5,2,4 indicates the outcome that die 1 is a one, die 2 is a two and die 3 is a five and so on. There are 6^5=7776 different outcomes. How many of these outcomes have three scores appearing in such a way that one score appears three times, and each of the other two scores appears once?

The first multinomial coefficient is from a representative outcome, for example, the string of scores, 1,1,1,2,3. We find the number of ways to assign the five scores into three subgroups, one consisting the three scores of 1, one consisting the one score of 2 and one consisting the one score of 3. Note that the numbers in bold are in the denominator of the multinomial coefficient below.

The second multinomial coefficient is from the six scores (faces) of a die. Here, we find the number of ways to assign the six faces into three groups, one consisting of three faces that do not appear, one consisting two faces (each of which appears once) and one group consisting one face that appears three times. Note that the numbers in bold are in the denominator of the multinomial coefficient below.

Multiplying these two multinomial coefficients, we obtain:

    \displaystyle \frac{5!}{3! \ 1! \ 1!} \times \frac{6!}{3! \ 2! \ 1!}=20 \times 60=1200

Rolling five dice and obtaining a score appearing three times and two scores appearing once each is called “three of a kind” in the game of poker dice. Thus in this game, the probability of obtaining a “three of a kind” is:

    \displaystyle \frac{1200}{7776}=0.15432

Reference

  1. Feller, W., An Introduction to Probability Theory and its Applications, Vol. I, 3rd ed., John Wiley & Sons, Inc., New York, 1968

The multinomial coefficients

Consider the following four examples of counting:

  1. Consider the finite populaton \left\{m,i,s,p\right\}. Draw 11 letters at random from this population with replacement. The total number of 11-character letter strings that can be formed is 4^{11}. How many of these character strings contain one m, four i’s, four s’s and two p’s?
  2. What is the total number of 11-character strings that can be formed using the letters in the word Mississippi?
  3. Suppose each of the questions in an 11-question multiple choice test has 4 choices (M, I, S and P). A student chooses the answers by pure guessing. How many ways can this student take the test if the random selection of the answers produces 1 M, 4 I’s, 4 S’s and 2 P’s as answers?
  4. How many ways can we randomly assign 11 candidates into 4 committees such that Committee M consists of 1 member, Committee I consists of 4 members, Committee S consists of 4 members and Committee P consists of 2 members? Assume that each candidate can only be assigned to one committee.

All four examples have the same answer, namely 34,650. All these examples are about the number of ways of assigning 11 objects into four subgroups, one containing 1 object, two containing 4 objects each and one containing 2 objects, where the objects in each subgroup are considered indistinguishable. These four examples illustrate the combinatorial approach called multinomial coefficients. The multinomial coefficient (the number of ways of assigning the 11 objects in the specified manner) in these examples is:

    \displaystyle \frac{11!}{1! \ 4! \ 4! \ 2!}=34650

In this post, I make a few observations about the combinatorics surrounding the multinomial coefficients and the multinomial theorem. The multinomial distribution is then naturally defined.

First, the multinomial coefficients. Suppose k and n are positive integers such that n \le k. Suppose we have nonnegative integers a_1, a_2, \cdots, a_n such that a_1+a_2+\cdots+a_n=k. The following is the number of ways to partition a set of k distinct objects into n subgroups where the first subgroup consists of a_1 objects, the second subgroup consists of a_2 objects and so on.

    \displaystyle (0) \ \ \ \ \binom{k}{a_1, a_2, \cdots, a_n}=\frac{k!}{a_1! a_2! \cdots a_n!}

The result (1) is the result of a combinatoric observation below. The result (2) is the multinomial theorem.

    \displaystyle (1) \ \ \ \ \sum_{a_1+a_2+ \cdots + a_n=k} \frac{k!}{a_1! a_2! \cdots a_n!}=n^k
    \displaystyle (2) \ \ \ \ (x_1+x_2+ \cdots +x_n)^k=\sum_{a_1+a_2+ \cdots + a_n=k} \frac{k!}{a_1! a_2! \cdots a_n!} \ x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}

Discussion of Example
In drawing 11 letters at random with replacement from the set \left\{m,i,s,p\right\}, there are a total of 4^{11}=4194304 many ordered samples. The following two strings are two examples:

    Mississippi \ \ \ \ \ \ \ \ \ \ \ Mississiipp

Both of these ordered samples have one m, four i’s, four s’s and two p’s. In other words, both of these ordered samples have the same unordered sample \left\{m,i,i,i,i,s,s,s,s,p,p\right\}. A more compact way of notating this unordered sample is (1,4,4,2), where the first coordinate is the number of m’s, the second coordinate is the number of i’s, the third coordinate is the number of s’s, and the fourth coordinate is the number of p’s. Note that the sum of the four is 11. The number of ordered samples that are tied to (1,4,4,2) is the multinomial coefficient 34650 as indicated above.

Another way of denoting the unordered sample (1,4,4,2) is using a string of stars and bars as follows:

    * \ | \ * \ * \ * \ * \ | \ * \ * \ * \ * \ | \ * \ *

In the star and bar diagram, there are 11 stars (representing the 11 characters selected) and 3 bars (creating four spaces representing the letters m, i, s and p, respectively). The star and bar diagram has a combinatorial advantage. For example, how many unordered samples are there when 11 letters are selected at random with replacement from \left\{m,i,s,p\right\}?

In any unordered sample in our discussion, there are 3 bars and 11 stars. The stars and bars can be in any arbitary order. Thus there are \displaystyle \binom{11+3}{11}=364 many unordered samples. Furthermore, the equation a_1+a_2+a_3+a_4=11 has 364 many nonnegative integer solutions.

A related question is: how many unordered samples are there when 11 letters are selected at random with replacement from \left\{m,i,s,p\right\} such that each letter is selected at least once? In the star and bar diagram, each of the four spaces has at least one star. Thus we need to arrange 7 more stars in these four spaces. Thus there are \displaystyle \binom{7+3}{7}=120 many ways of doing this. Furthermore, the equation a_1+a_2+a_3+a_4=11 has 120 many positive integer solutions.

Generalization
Suppose we sample k times from the finite set \left\{x_1,x_2,x_3, \cdots, x_n\right\} with replacement. There are n^k many ordered samples. These ordered samples can be collapsed into

    \displaystyle (3) \ \ \ \ \binom{k+n-1}{k}=\binom{k+n-1}{n-1}

many unordered samples (this can be seen using the star and bar diagram). Furthermore, the number of nonnegative integer solutions to the equation a_1+a_2+ \cdots +a_n=k is obtained by (3).

The number of ordered samples tied to the unordered sample (a_1,a_2, \cdots, a_n) is:

    \displaystyle (4) \ \ \ \ \frac{k!}{a_1! a_2! \cdots a_n!} \ \ \ \ \text{(multinomial coeffcients)}

The preceding discussion is encapsulated in equation (1). The equation (2) above is the statement of the multinomial theorem, which is a statement about polynomial expansion. The number of terms in the polynomial expansion is indicated by (3), which is also the number of nonnegative integer solutions to the equation a_1+a_2+ \cdots +a_n=k. Each term x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n} in the polynomial expansion can be considered as an unordered sample in the finite sampling with replacement. Thus both results of (1) and (2) are compact ways of describing the finite sampling with replacement from the set \left\{x_1,x_2,x_3, \cdots, x_n\right\}.

The number of unordered samples where each object in \left\{x_1,x_2,x_3, \cdots, x_n\right\} is selected at least once is:

    \displaystyle (5) \ \ \ \ \binom{k-n+n-1}{k-n}=\binom{k-n+n-1}{n-1}=\binom{k-1}{n-1}

As a result, (5) provides the number of positive integer solutions to the equation a_1+a_2+ \cdots +a_n=k.

The Multinomial Distribution
Suppose that k independent trials are performed where each trial has n outcomes denoted by E_1,E_2,\cdots,E_n. Suppose that in each trial, the probability of the outcome E_i is \displaystyle p_i. Furthermore, we have p_1+p_2+\cdots+p_n=1.

The result of performing the k independent trials can be denoted by an ordered string such as E_5 E_1 E_7 \cdots, i.e. an ordered sample when sampling k times from \left\{E_1,E_2,\cdots,E_n\right\} with replacement. Consider the ordered samples where the outcome E_1 occurs a_1 times, the outcome E_2 occurs a_2 times and so on such that a_1+a_2+\cdots+a_n=k. This is a scenario that is equivalent to the four examples indicated at the beginning of the post, which is about assigning k trials into n subgroups, the first of which consists of a_1 occurrences of the outcome E_1, the second subgroup consists of a_2 occurrences of the outcome E_2 and so on. The multinomial coefficient (4) above is the number of all such ordered samples. Thus the following is the probability that in k trials, E_1 occurs a_1 times, E_2 occurs a_2 times (and so on):

    \displaystyle \frac{k!}{a_1! \ a_2! \cdots a_n!} \ p_1^{a_1} \ p_2^{a_2} \cdots p_n^{a_n}

More formally, for each j=1,2, \cdots, n, let X_j be the number of times the event E_j occurs in the k many trials. Then the joint distribution of the random variables X_1,X_2,\cdots,X_n is called the multinomial distribution with parameters k,p_1,p_2,\cdots,p_n.

The joint probability density function (joint pdf) is given by:

    \displaystyle P(X_1=a_1,X_2=a_2, \cdots, X_n=a_n)=\frac{k!}{a_1! \ a_2! \cdots a_n!} \ p_1^{a_1} \ p_2^{a_2} \cdots p_n^{a_n}

The multinomial distribution is so named is because of the multinomial theorem. Note that the right-hand side of the above pdf is a term in the multinomial expansion of (p_1+p_2+\cdots+p_n)^k. Furthermore we have:

    \displaystyle 1=(p_1+p_2+ \cdots +p_n)^k=\sum_{a_1+a_2+ \cdots + a_n=k} \frac{k!}{a_1! a_2! \cdots a_n!} \ p_1^{a_1} p_2^{a_2} \cdots p_n^{a_n}

When there are only two categories of balls, labeled 1 (success) or 2 (failure), X_2=k-X_1. As a result, X=X_1 has a univariate distribution, which is the binomial distribution.

Reference

  1. Feller, W., An Introduction to Probability Theory and its Applications, Vol. I, 3rd ed., John Wiley & Sons, Inc., New York, 1968

The game of poker dice and the multinomial theorem

This post presents an application of the multinomial theorem and the multinomial coefficients to the game of poker dice. See the previous post The multinomial theorem for a discussion of the multinomial theorem and multinomial coefficients.

The game of poker dice is different from the standard poker game. Instead of drawing five cards from a deck of 52 cards, five fair dice are rolled. The resulting five scores from the dice form a poker dice hand. The possible hands are ranked as follows:

  • Five of a kind: one score occurs five times (e.g. 2,2,2,2,2).
  • Four of a kind: two distinct scores appear, one score occurs four times and the other score occurs one time (e.g. 2,2,2,2,5).
  • Full house: two distinct scores appear, one score occurs three times and the other score occurs two times (e.g. 3,3,3,1,1).
  • Three of a kind: three distinct scores appear, one score occurs three times, the other two scores occur one time each (e.g. 2,2,2,3,6).
  • Two pair: three distinct scores appear, two of the scores occur two times each and the other score occurs once (e.g. 4,4,1,1,5).
  • One pair: four distinct scores appear, one score occurs two times and the other three scores occur one time each (e.g. 5,5,1,2,4).
  • High card: five distinct scores occur (e.g. 2,3,5,4,1).

 
In rolling 5 dice, there are 6^5=7776 ordered outcomes. For example, assuming that the five dice are rolled one at a time, 2,3,2,6,2 indicates outcome that the first die results in a two and the second die results in a three and so on (this is a three of a kind). To find the probability of a three of a kind, we simply divide the number of ways such hands can occur by 7776. We use the multinomial coefficients to obtain the number of outcomes for each type of poker dice hands.

Rolling k dice (or rolling a die k times) can also be regarded as the occupancy problem of assigning k balls into 6 cells. As will be shown below, the problem of computing the probabilities of poker dice hands is seen through the lens of the occupancy problem of randonly placing 5 balls into 6 cells. For example, five of a kind is equivalent to all five balls being placed in different cells. Three of a kind is equivalent to three of the balls being in one cell, and the other two balls in two other different cells. For discussion on the occupancy problems in this blog, see the following posts:

In placing 5 balls into 6 cells, we use 6-tuples to indicate how many balls are in each of the 6 cells. For example, (0,3,1,0,0,1) denotes a three of a kind hand of 2,2,2,3,6 (the score of two appearing three times, the score of three appearing one time and the score of six appearing one time). Note that the 6 coordinates represent the six scores of a die (six cells) and the sum of the coordinates is 5. The 6-tuple of (3,1,1,0,0,0) is also a three of a kind hand, representing the outcome that the score of one appearing three times, the score of two appearing one time and the score of three appearing one time. We use the multinomial coefficients to determine how many of the 7776 ordered outcomes correspond to a 6-tuple such as (3,1,1,0,0,0). With respect to the occupancy problem, such 6-tuples are called occupancy number sets.

The Multinomial Theorem
For any positive integer n and any positive integer k, we have the following polynomial expansion:

    \displaystyle \biggl(x_1+x_2+ \cdots +x_n\biggr)^k=\sum \limits_{a_1+ \cdots +a_n=k} \frac{k!}{a_1! \ a_2! \cdots \ a_n!} \ \ x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}

Remark
In addition to being a formula for polynomial expansion, there are two important interpretations of the multinomial theorem and multinomial coefficients. One is that of determining the number of ordered strings that can be formed using a set of alphabets. For example, with one m, four i's, four s's and two p's, there are \displaystyle \frac{11!}{1! \ 4! \ 4! \ 2!}=\text{34,650} possible 11-letter strings that can be formed, of which mississippi is one specific example.

Another interpretation is that of partitioning a set of distinct objects into several groupings where objects in each grouping are indistinguishable. For example, in a group of 11 candidates, how many ways can we form four committees such that the Committee 1 has only one member, Committee 2 has four members, Committee 3 has four members and Committee 4 has two members (assuming that each person can only serve in one committee)? The answer, as in above example, is \text{35,650}.

Example 1
In a random poker dice hand, what is the probability of obtaining a 4 two times, a 3 one time, a 5 one time and a 6 one time? Note that this is a specific example of the poker dice hand of one pair.

We consider the 6-tuple of (0,0,1,2,1,1). We are trying to partition 5 scores into four subgroups, one group having two identical scores of 4, one group with a score of 3, one group with a score of 5 and one group with a score of 6. Thus consider the following multinomial coefficient:

    \displaystyle \frac{5!}{1! \ 2! \ 1! \ 1!}=60

So out of \text{7,776} possible hands, 60 of them satisfiy the condition that a 4 appearing two times, a 3 appearing one time, a 5 appearing one time and a 6 appearing one time. The probability is:

    \displaystyle \frac{60}{7776}=0.0077160494

Example 2
What is the probability that one score appears two times, three other scores appear one time each in a random poker dice hand?

Here, we need to count all the possible poker dice hands of one pair. Both (0,0,1,2,1,1) and (2,1,1,1,0,0) are examples of one pair. In essense, we need to count all the occupancy number sets such that among the 6 coordinates (cells), one of the cells is a 2 and three of the cells are 1. To this end, we apply the multinomial theorem twice, one time on the five rolls of dice and one time one the 6 cells.

Consider the occupancy number set (2,1,1,1,0,0). Note that the multinomial coefficient is 60 as in Example 1 (the first application of the multinomial thoerem). Now look at the 6 coordinates of the occupancy number set (2,1,1,1,0,0). We wish to partition these 6 coordinates into three groupings, one with one 2, one with three 1's and one with two 0's. The following is the multinomial coefficient (the second application of the multinomial theorem):

    \displaystyle \frac{6!}{1! \ 3! \ 2!}=60

Thus the number of possible poker dice hands of one pair is: 60 \times 60=\text{3,600} and for a random poker dice hand, the probability that it is a one pair is:

    \displaystyle \frac{3600}{7776}=0.462962963

Remark
Example 2 provides the algorithm for computing the remaining poker dice hand probabilities. The key is to apply the multinomial coefficients twice, one time on a representative occupancy number set, the second time on the six cells (the six faces of a die in this case). Then the number of poker dice hands in question is the product of the two multinomial coefficients.

Example 3
What is the probability that a random poker dice hand is three of a kind?

Consider the occupancy number set of (3,1,1,0,0,0). The associated multinomial coefficient for the five rolls of dice is:

    \displaystyle \frac{5!}{3! \ 1! \ 1!}=20

Now partition the six cells into three groupings (one 3, two 1's, three 0's):

    \displaystyle \frac{6!}{1! \ 2! \ 3!}=60

Thus the probability that a random poker hand is three of a kind is:

    \displaystyle \frac{20 \times 60}{7776}=0.1543209877

Summary
The following are the probabilities of poker dice hands.

    \displaystyle P(\text{five of a kind})=\frac{6}{7776}

    \displaystyle P(\text{four of a kind})=\frac{150}{7776}

    \displaystyle P(\text{full house})=\frac{300}{7776}

    \displaystyle P(\text{three of a kind})=\frac{1200}{7776}

    \displaystyle P(\text{two pair})=\frac{1800}{7776}

    \displaystyle P(\text{one pair})=\frac{3600}{7776}

    \displaystyle P(\text{high card})=\frac{720}{7776}

________________________________________________________________________
\copyright  \ \text{2010 - 2015 by Dan Ma} (Revised March 28, 2015)

The multinomial theorem

The multinomial theorem is a statement about expanding a polynomial when it is raised to an arbitrary power. Rather than just stating the theorem and showing examples, we motivate the theorem by a concrete example of finite random sampling. This example demonstrates that the notion of finite sampling provides another interpretation to the multinomial thoerem. As a result, the multinomial theorem and the multinomial coefficients are useful in combinatorial techniques in finite random sampling models. The multinomial coefficients are also useful in partitioning a set of objects into several subgroups where each subgroup is made up of indistinguishable objects. See the post The game of poker dice and the multinomial theorem for an example of applications of these ideas.

________________________________________________________________________

An example

Suppose we have a finite set S with n elements where n is a positive integer. Suppose we select k elements from the population S with replacement. We consider ordered samples of size k and unordered samples of size k. To make this notion more concrete, consider the population consisting of 4 letters \lbrace{m,i,s,p}\rbrace. Suppose we sample 11 times with replacement. The following are two specific ordered samples of size 11:

    mississippi \ \ \ \ \ \ \ \ \ \ \ \ mississiipp \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)

Thus ordered samples can be represented as strings of characters drawn from the population presented in the order in which the objects are drawn. In this example, there are 4^{11}=4194304 many ordered samples. Each character drawn has 4 choices and the drawing is done 11 times. In general, if the population has size n, then there are n^k many ordered samples of size k.

We now look at unordered samples obtained from sampling 11 times from the population \lbrace{m,i,s,p}\rbrace. In unordered samples, the order in which the letters are drawn is no longer important (or recorded). We only care about the number of instances each letter is selected. Thus each of the ordered samples shown above yields the same unordered sample: 1 \ m, 4 \ i's, 4 \ s's and 2 \ p's. We represent the unordered sample in two ways:

    (1,4,4,2)

    * \ | \ * \ * \ * \ * \ | \ * \ * \ * \ * \ | \ * \ *

The first representation is a 4-tuple where the coordinates are the number of m's, the number of i's, the number of s's and the number of p's in the 11 selections. Thus, the sum of the coordinates must be 11 (the total number of selections). The second representation of the unordered sample is made up of stars and bars. The three bars create 4 spaces and the stars in each space represent the number of characters drawn. We would like to point out that the “unordered” in the unordered samples refers to the fact that the order in which the objects are drawn is immaterial. However, both the 4-tuple notation and the stars and bars diagrams are ordered according to the 4 letters in the population (showing how many instances each object appears in the samples).

The star and bar diagram has a combinatorial advantage. For example, how many unordered samples are there when you draw 11 letters with replacement from the population \lbrace{m,i,s,p}\rbrace? In any unordered sample, there are 3 bars and 11 stars. The order of the stars and bars can be in any arbitrary order. Thus there are \displaystyle \binom{11+3}{11}=364 many unordered samples. In general, if the population size is n, then there are \displaystyle \binom{k+n-1}{k}=\binom{k+n-1}{n-1} many unordered samples, either represented as k-tuples or stars and bars diagrams with n-1 bars and k stars.

One more note about the number 364 calculated above. This is also the total number of non-negative integer solutions to the equation m+i+s+p=11. Thinking of an unordered sample as a 4-tuple, the sum of the 4 coordinates must be 11. This observation is important to understanding the multinomial theorem.

There is one more count associated with unordered samples. How many unordered samples are there when you draw 11 letters with replacement from the population \lbrace{m,i,s,p}\rbrace such that each letter is selected at least once? The answer is 120. Suppose that each letter is already selected once. Then we need to sample 7 more times out of these 4 letters. According to the above paragraph, the total count is \displaystyle \binom{7+3}{3}=120. To generalize, if the population size is n, there are \displaystyle \binom{k-n+n-1}{n-1}=\binom{k-1}{n-1} many unordered samples in which all objects in the population are represented in each sample.

We now tie unordered samples back to ordered samples. How many ordered samples are equivalent to the unordered sample (1,4,4,2)? Both ordered samples in (1) are equivalent to (1,4,4,2) (i.e. each letter is drawn the same number of times). In other words, how many 11-character strings can be formed using 1 \ m, 4 \ i's, 4 \ s's and 2 \ p's? The answer is:

    \displaystyle \begin{aligned}\binom{11}{1} \binom{10}{4} \binom{6}{4} \binom{2}{2}&=\displaystyle \frac{11!}{1! \ 4! \ 4! \ 2!}\\&=34650\end{aligned}

The reasoning for the above calculation is that out of 11 positions in the strings, we choose 1 position for the m, choose 4 positions for the i in the remaining 10 positions and so on.

________________________________________________________________________

Summary of the example

In sampling 11 times with replacement from the population \lbrace{m,i,s,p}\rbrace, we summarize our observations about the example.

  • The number of ordered samples is 4^{11}=4194304.
  • The number of unordered samples is \displaystyle \binom{11+3}{11}=364. Furthermore, the number of non-negative integer solutions to the equation m+i+s+p=11 is 364.
  • The number of unordered samples where each letter is selected is \displaystyle \binom{7+3}{3}=120. Furthermore, the number of positive integer solutions to the equation m+i+s+p=11 is 120.
  • For any unordered sample (a,b,c,d) where a+b+c+d=11, the total number of ordered samples equivalent to this unordered sample is \displaystyle \frac{11!}{a! \ b! \ c! \ d!}. As we shall see, these are called multinomial coefficients.

Note the interplay between the ordered samples and unordered samples. We start out with a large number of ordered samples (4^{11} many). We then collapse these ordered samples to just 364 unordered samples. We see that each unordered sample corresponds to certain number of ordered samples according to the multinomial coefficients. Thus we have the following sum where a,b,c,d are non-negative integers:

    \displaystyle \sum \limits_{a+b+c+d=11} \frac{11!}{a! \ b! \ c! \ d!}=4^{11}

________________________________________________________________________

The Multinomial Theorem

With the preceding discussion, we now state the multinomial theorem.

The Multinomial Theorem
For any positive integer n and any positive integer k, we have the following polynomial expansion:

    \displaystyle \biggl(x_1+x_2+ \cdots +x_n\biggr)^k=\sum \limits_{a_1+ \cdots +a_n=k} \frac{k!}{a_1! \ a_2! \cdots \ a_n!} \ x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}

Remark
The same observations we make about the example apply. For example, the number of terms in the polynomial expansion is \displaystyle \binom{k+n-1}{k}, which is the number of non-negative integer solutions to the equation a_1+ \cdots +a_n=k. Each term x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n} in the polynomial expansion can be considered as an unordered sample in the finite sampling with replacement. Then the coefficient of each term (called multinomial coefficient) is the number of associated ordered samples. As a result, the multinomial coefficients sum to n^k.

We conclude with two interpretations of the multinomial coefficient.

    \displaystyle \frac{k!}{a_1! \ a_2! \cdots \ a_n!} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)

If we have n many distinct symbols (say x_1,x_2, \cdots, x_n) and we have a_1 many x_1, a_2 many x_2 and so on, then the multinomial coefficient in (2) is the number of k-length character strings that can be formed using the available symbols.

Another interpretation is that of partitioning a set of k objects into n subgroups where the objects in each subgroup are indistinguishable.

Both interpretations are one and the same, just looking at the same result in a different angle. For example, all three of the following yield the same answer: \text{34,650}. We have 11 letters (one m, four i's, four s's and two p's), how many character strings can be formed with these letters?

On the other hand, we have 11 identical candies randomly distributed to Marcus, Issac, Samantha and Paul. How many ordered samples will result if Marcus receives one candy, Issac receives four candies, Samantha receives four candies and Paul receives two candies? Here, we are trying to partition a set of 11 objects into four subgroups where one group has one element, two of the groups have four elements each and another group has two elements.

If we have 11 candidates for forming four distinct committees where one committee has one member, two of the committees have four members each and another one has two members. In how many ways can this be done?

Reference

  1. Feller, W., An Introduction to Probability Theory and its Applications, Vol. I, 3rd ed., John Wiley & Sons, Inc., New York, 1968

________________________________________________________________________
Revised March 28, 2015
\copyright \ \text{2010 - 2015 by Dan Ma}