The birthday problem

Consider this random experiment. You ask people (one at a time) of their birthdays (month and day only). The process continues until there is a repeat in the series of birthdays, in other words, until two people you’ve surveyed share the same birthday. How many people you have to ask before finding a repeat? What is the probability that you will have to ask k people before finding a repeat? What is the median number of people you have to ask? In this post, we discuss this random variable and how this random variable relates to the birthday problem.

In the problem at hand, we ignore leap year and assume that each of the 365 days in the year is equally likely to be the birthday for a randomly chosen person. The birthday problem is typically the following question. How many people do we need to choose in order to have a 50% or better chance of having a common birthday among the selected individuals?

The random experiment stated at the beginning can be restated as follows. Suppose that balls are randomly thrown (one at a time) into n cells (e.g. boxes). The random process continues until a ball is thrown into a cell that already has one ball (i.e. until a repeat occurs). Let X_n be the number of balls that are required to obtain a repeat. Some of the problems we discuss include the mean (the average number of balls that are to be thrown to get a repeat) and the probability function. We will also show how this random variable is linked to the birthday problem when n=365.

___________________________________________________________________________

The Birthday Problem

First, we start with the birthday problem. The key is to derive the probability that in a group of k randomly selected people, there are at least two who share the same birthday. It is easier to do the complement – the probability of having different birthdays among the group of k people. We call this probability p_k.

    \displaystyle \begin{aligned} p_k&=\frac{365}{365} \ \frac{364}{365} \ \frac{363}{365} \cdots \frac{365-(k-1)}{365} \\&=\frac{364}{365} \ \frac{363}{365} \cdots \frac{365-(k-1)}{365} \\&=\biggl[1-\frac{1}{365} \biggr] \ \biggl[1-\frac{2}{365} \biggr] \cdots \biggl[1-\frac{k-1}{365} \biggr] \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1) \end{aligned}

where k=2,3,4,\cdots,365. The reasoning for the first line is that there are 365 choices to be picked in the first selection. Each subsequent random selection has to avoid the previous birthday, thus 364 choices for the second person and only 365-(k-1) choices for the kth person.

To answer the birthday problem, just plug in values of k to compute p_k and 1-p_k until reaching the smallest k such that p_k<0.5 and 1-p_k>0.5. The calculation should be done using software (Excel for example). The smallest k is 23 since

    p_{23}= 0.492702766

    1-p_{23}= 0.507297234

In a random group of 23 people, there is a less than 50% chance of having distinct birthdays, and thus a more than 50% chance of having at least one identical birthday. This may be a surprising result. Without the benefit of formula (1), some people may think that it will take a larger sample to obtain a repeat.

The benefit of (1) extends beyond the birthday problem. Let’s consider the case for n, i.e. randomly pick numbers from the set \left\{1,2,3,\cdots,n-1,n \right\} with replacement until a number is chosen twice (until a repeat occurs). Similarly, let p_{n,k} be the probability that in k draws all chosen numbers are distinct. The probability p_{n,k} is obtained by replacing 365 with n.

    \displaystyle \begin{aligned} p_{n,k}&=\frac{n}{n} \ \frac{n-1}{n} \ \frac{n-2}{n} \cdots \frac{n-(k-1)}{n} \\&=\frac{n-1}{n} \ \frac{n-2}{n} \cdots \frac{n-(k-1)}{n} \\&=\biggl[1-\frac{1}{n} \biggr] \ \biggl[1-\frac{2}{n} \biggr] \cdots \biggl[1-\frac{k-1}{n} \biggr] \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2) \end{aligned}

Formula (2) will be useful in the next section.

___________________________________________________________________________

The Random Variable

We now look into the random variable discussed at the beginning, either the one for picking people at random until a repeated birthday or throwing balls into cells until one cell has two balls. To illustrate the idea, let’s look at an example.

Example 1
Roll a fair die until obtaining a repeated face value. Let X_6 be the number of rolls to obtain the repeated value. Find the probability function P[X_6=k] where k=2,3,4,5,6,7.

Note that P[X_6=k] is the probability that it will take k rolls to get a repeated die value.

    \displaystyle P(X_6=2)=\frac{6}{6} \times \frac{1}{6}=\frac{1}{6}

    \displaystyle P(X_6=3)=\frac{6}{6} \times \frac{5}{6} \times \frac{2}{6}=\frac{10}{6^2}

    \displaystyle P(X_6=4)=\frac{6}{6} \times \frac{5}{6} \times \frac{4}{6} \times \frac{3}{6}=\frac{60}{6^3}

    \displaystyle P(X_6=5)=\frac{6}{6} \times \frac{5}{6} \times \frac{4}{6} \times \frac{3}{6} \times \frac{4}{6}=\frac{240}{6^4}

    \displaystyle P(X_6=6)=\frac{6}{6} \times \frac{5}{6} \times \frac{4}{6} \times \frac{3}{6} \times \frac{2}{6} \times \frac{5}{6}=\frac{600}{6^5}

    \displaystyle P(X_6=7)=\frac{6}{6} \times \frac{5}{6} \times \frac{4}{6} \times \frac{3}{6} \times \frac{2}{6} \times \frac{1}{6} \times \frac{6}{6}=\frac{720}{6^6}

To get a repeat in 2 rolls, there are 6 choices for the first roll and the second has only one choice – the value of the first roll. To get a repeat in 3 rolls, there are 6 choices for the first roll, 5 choices for the second roll and the third roll must be out of the 2 previous two distinct values. The idea is that the first k-1 rolls are distinct and the last roll must be one of the previous values. \square

The reasoning process leads nicely to the general case. In the general case, let’s consider the occupancy interpretation. In throwing balls into n cells, let X_n be defined as above, i.e. the number of balls that are required to obtain a repeat. The following gives the probability P[X_n=k].

    \displaystyle \begin{aligned} P[X_n=k]&=\frac{n}{n} \times \frac{n-1}{n} \times \cdots \times \frac{n-(k-2)}{n} \times \frac{k-1}{n} \\&=\frac{n-1}{n} \times \cdots \times \frac{n-(k-2)}{n} \times \frac{k-1}{n} \\&=\frac{(n-1) \times (n-2) \times \cdots \times (n-(k-2)) \times (k-1)}{n^{k-1}} \\&=\biggl[1-\frac{1}{n} \biggr] \times \biggl[1-\frac{2}{n} \biggr] \times \cdots \times \biggl[1-\frac{k-2}{n}\biggr] \times \frac{k-1}{n}  \ \ \ \ \ \ \ (3) \end{aligned}

where k=2,3,\cdots,n+1.

The reasoning is similar to Example 1. To get a repeat in throwing k balls, the first k-1 balls must be go into different cells while the last ball would go into one of the k-1 occupied cells. For the first k-1 balls to go into different cells, there are n (n-1) \cdots (n-(k-2)) ways. There are k-1 cells for the last ball to land. Thus the product of these two quantities is in the numerator of (3).

Once the probability function (3) is obtained, the mean E[X_n] can be derived accordingly. For the case of n=365, E[X_{365}]=24.62 (calculated by programming the probability function in Excel). On average it will be required to sample about 25 people to obtain a repeated birthday.

Another interesting quantity is P[X_n>k]. This is the probability that it will take throwing more than k balls to get a repeat. Mathematically this can be obtained by first calculating P[X_n \le k] by summing the individual probabilities via (3). This is a workable approach using software. There is another way that is more informative. For the event X_n>k to happen, the first k throws must be in different cells (no repeat). The event X_n>k is identical to the event that there is no repeat in the first k throws of balls. This is how the random variable X_n is linked to the birthday problem since the probability P[X_n>k] should agree with the probability p_{n,k} in (2).

    \displaystyle P[X_n>k]=\biggl[1-\frac{1}{n} \biggr] \ \biggl[1-\frac{2}{n} \biggr] \cdots \biggl[1-\frac{k-1}{n} \biggr] \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4)

___________________________________________________________________________

Back to The Birthday Problem

Consider the case for X_{365}. What is the median of X_{365}? That would be the median number of people surveyed to obtain a pair of identical birthday. The median of X_{365} would be the least k such that P[X_{365} \le k] is at least 0.5. Note that P[X_{365}>k] is identical to p_k in (1). The above calculation shows that p_{23}=0.4927 and 1-p_{23}=0.5073. Thus the median of X_{365} is 23. Thus when performing the random sampling of surveying birthday, about half of the times you can expect to survey 23 or fewer than 23 people.

The birthday problem is equivalently about finding the median of the random variable X_{365}. A little more broadly, the birthday problem is connected to the percentiles of the variable X_{n}. In contrast, the mean of X_{365} is E[X_{365}]=24.62. The following lists several percentiles for the random variable X_{365}.

    \begin{array}{ccccccc}    k &  \text{ } & P[X_{365}>k]  & \text{ } & P[X_{365} \le k] & \text{ } & \text{Percentile} \\    \text{ } &   \text{ } & \text{ } & \text{ } & \text{ } &  \\   14 &   \text{ } & 0.77690 &  & 0.22310 &  \\   15 &   \text{ } & 0.74710 &  & 0.25290 & \text{ } & \text{25th Percentile} \\   16 &   \text{ } & 0.71640 &  & 0.28360 &  \\  \text{ } &   \text{ } & \text{ } & \text{ } & \text{ } &  \\   22 &   \text{ } & 0.52430 &  & 0.47570 &  \\   23 &   \text{ } & 0.49270 &  & 0.50730 & \text{ } & \text{50th Percentile}  \\   24 &   \text{ } & 0.46166 &  & 0.53834 &  \\  \text{ } &   \text{ } & \text{ } & \text{ } & \text{ } &  \\   31 &   \text{ } & 0.26955 &  & 0.73045 &  \\   32 &   \text{ } & 0.24665 &  & 0.75335 & \text{ } & \text{75th Percentile} \\         33 &   \text{ } & 0.22503 &  & 0.77497 &  \\  \text{ } &   \text{ } & \text{ } & \text{ } & \text{ } &  \\   40 &   \text{ } & 0.10877 &  & 0.89123 &  \\   41 &   \text{ } & 0.09685 &  & 0.90315 & \text{ } & \text{90th Percentile} \\   42 &   \text{ } & 0.08597 &  & 0.91403 &  \\  \text{ } &   \text{ } & \text{ } & \text{ } & \text{ } &  \\   46 &   \text{ } & 0.05175 &  & 0.94825 &  \\   47 &   \text{ } & 0.04523 &  & 0.95477 & \text{ } & \text{95th Percentile} \\    48 &  \text{ } & 0.03940 &  & 0.96060 &  \\  \text{ } &   \text{ } & \text{ } & \text{ } & \text{ } &  \\    56 &  \text{ } & 0.01167 &  & 0.98833 &  \\  57 &  \text{ } & 0.00988 &  & 0.99012 & \text{ } & \text{99th Percentile} \\  58 &  \text{ } & 0.00834 &  & 0.99166 &  \\      \end{array}

It is clear that in a group of 366 people, it is certain that there will be at least one repeated birthday (again ignoring leap year). This is due to the pigeon hole principle. As the percentiles in the above table shows, you do not need to survey anywhere close to 366 to get a repeat. The median is 23 as discussed. The 75th percentile of X_{365} is 32.

The preceding calculation shows that you do not need a large group to have a repeated birthday. About 50% of the times, you will survey 23 or fewer people, about 75% of the time, 32 or fewer people, About 99% of the time, you will survey 57 or fewer people, much fewer than 365 or 366. So with around 50 in a random group, there is a near certainty of finding a shared birthday. In a random group of 100 people, there should be an almost absolute certainty that there is a shared birthday.

For a further demonstration, we simulated the random variable X_{365} 10,000 times. The range of the simulated values is 2 to 78. Thus the odds for 100 people to survey is smaller than 1 in 10,000. To get a simulated value of 100, we will have to simulate more than 10,000 values of X_{365}. The median of the 10,000 simulated results is 23. The following table summarizes the results.

    \begin{array}{rrrrrr}    \text{Interval} &  \text{ } & \text{Count}  & \text{ } & \text{Proportion} & \\    \text{ } &   \text{ } & \text{ } & \text{ } & \text{ } &  \\   \text{2 to 9} &   \text{ } &953 &  & 0.09530 &  \\   \text{10 to 19} &   \text{ } & 2870 &  & 0.28700 &  \\   \text{20 to 29} &   \text{ } & 3055 &  & 0.30550 &  \\     \text{30 to 39} &   \text{ } & 1922 &  & 0.19220 &  \\   \text{40 to 49} &   \text{ } & 886 &  & 0.08860 &   \\   \text{50 to 59} &   \text{ } & 253 &  & 0.02530 &  \\     \text{60 to 69} &   \text{ } & 48 &  & 0.00480 &  \\   \text{70 to 78} &   \text{ } & 13 &  & 0.00130 &  \\              \end{array}

Not shown in the table is that 33 of the simulated results are the value of 2. Thus it is possible to ask two people and they both have the same birthday. But the odds of that happening is 33 out of 10,000 according to this particular set of simulations (probability 0.0033). The theoretical probability of 2 is 1/365 = 0.002739726. There are 2 instances of 78 in the 10,000 simulated values. Thus the odds are 2 out of 10,000 with probability 0.0002. The theoretical probability is 0.000037 using (3).

___________________________________________________________________________
\copyright \ 2017 \text{ by Dan Ma}

Every character is known but password is hard to crack

In this post, we discusses an example in which you are given a password (every character of it) and yet it is still very hard (or even impossible) to crack. Anyone who understands this example has a solid understanding of the binomial distribution. Here’s the example:

    Your friend John tells you that the password to his online bank account has 26 characters. The first character is the first letter in the English alphabet, the second character is the second letter in the English alphabet, the third character is the third letter in the English alphabet and so on.

    Now that your friend John has given you the key to his account, does that mean you can log onto his account to find out how much money he has, or to make financial transaction on his behalf or to enrich yourself?

    If this example sounds too good to be true, what is the catch?

Even though every character in the 26-character password is known, it is indeed a very strong password. How could this be? You may want to stop here and think about.

Indeed, if every character in John’s password is lower case or if every character is upper case, then his bank account is toast. But John’s password can be made very strong and very hard to crack if the password is case sensitive. The password given by John is not just one password, but is a large collection of passwords. In fact, there are over 67 millions possible passwords (67,108,864 to be exact). The following are two of the most obvious ones.

    a b c d e f g h i j k l m n o p q r s t u v w x y z (all lower case)

    A B C D E F G H I J K L M N O P Q R S T U V W X Y Z (all upper case)

The following is another possible password. If this is the one John uses, it will be difficult to crack.

    a b C d e f G h i j k l M N o p Q R s T U v w X Y z (10 upper case letters)

Here’s another possibility.

    A B c D E f G H I J k l M N o P q R S t u v w X y z (14 upper case letters)

Each character in the password has two possibilities – lower case or upper case. Across all 26 characters, there are 2^{26} possibilities. This number is 67,108,864. So 2 raised to 26 is a little over 67 millions. So the password given by John is not just one password, but is a generic one with over 67 million possibilities. There is a one in 67 million chance in correctly guessing the correct password if John chooses the upper case letters randomly. This is much better odds than winning the Powerball lottery, one in 292,201,338, which one in 292 million. But it is still an undeniably strong password.

So John tells you the password, but has in fact not given up much secret. This is the case if he makes the case sensitivity truly random. Of course, once he sets his password, unless he has a great memory, he will need to write down the positions that are upper case. Otherwise, he will not be able to log back on. But that is a totally separate story.

___________________________________________________________________________

The binomial angle

A good way to represent the passwords is to view each one as a 26-character string of U and L (U stands for upper case and L stands for lower case). Then the above two passwords (the one with 10 upper case letters and the one with 14 upper case letters) are represented as follows:

    L L U L L L U L L L L L U U L L U U L U U L L U U L (10 U’s)

    U U L U U L U U U U L L U U L U L U U L L L L U L L (14 U’s)

As discussed, there are 67,108,864 many such strings. We can also think of each such string as the record of a random experiment consisting of tossing a coin 26 times. We record a U when we get a head and record an L when the coin toss results in a tail. In fact, this is the kind of records that John would need to keep when he sets the password. Such a string would tell him which characters in the password are in upper case and which characters are in lower case. On the other hand, hacking the password would be essentially trying to pinpoint one such string out of 67,108,864 many possibilities.

We know 67,108,864 is a large number. Let’s further demonstrate the challenge. In each string, the number of U’s can range from 0 to 26. How many of the strings have 0 U’s? Precisely one (all letters are L). This would be what most people would try first (all the letters are lower case). How many of the strings have exactly 1 U? Thinking about it carefully, we should arrive at the conclusion that there are 26 possibilities (the single U can be in any of the 26 positions). So if the would be hacker knows there there is only one upper case letter, then it would be easy to break the password. How many of the strings have exactly 2 U’s? If you try to write out all the possible cases, it may take some effort. There are 325 possibilities. So just having two upper case letters in the password seems to make it something that is approaching a strong password. But the problem is that the two U’s may be guessed easily. John may put the upper case letters on his initials (if his name is John Smith, he may make J and S upper case), or in other obvious letters.

How many of the strings have exactly 3 U’s? This will be really hard to write out by hand. There are 2,600 many possibilities. Why stop at having just 3 upper case letters? It is clear that the more upper case letters used in the password, the stronger it is and the harder it is to crack.

How do we know the precise number of possibilities for a given k, the number of U’s? The idea is that of choosing k number of letters out of 26 letters.

The number of ways of choosing k objects from a total of n objects is denoted by the notation \binom{n}{k}. Sometimes the notations C(n,k), _nC_k and C_{n,k} are used. Regardless of the notation, the calculation is

    \displaystyle \binom{n}{k}=\frac{n!}{k! (n-k)!}

The notation n! is the product of all the positive integers up to and including n (this is called n factorial). Thus 1!=1, 2!=2, 3!=6, 4!=24. To make the formula work correctly, we make 0!=1.

The following gives the first several calculations in the 26-character password example.

    \displaystyle \binom{26}{2}=\frac{26!}{2! \ 24!}=\frac{26 \cdot 25}{2}=325

    \displaystyle \binom{26}{3}=\frac{26!}{3! \ 23!}=\frac{26 \cdot 25 \cdot 24}{6}=2600

    \displaystyle \binom{26}{4}=\frac{26!}{4! \ 22!}=\frac{26 \cdot 25 \cdot 24 \cdot 23}{24}=14950

If the desire is to see the patterns, the remaining calculations can be done by using software (or at least a hand held calculator). The following table shows the results.

    \displaystyle \begin{array}{rrr} k &\text{ } & \displaystyle \binom{26}{k}  \\  \text{ } & \text{ } & \text{ }  \\  0 &\text{ } & 1  \\     1 &\text{ } & 26   \\  2 &\text{ } & 325   \\  3 &\text{ } & 2,600   \\  4 &\text{ } & 14,950   \\  5 &\text{ } & 65,780   \\  6 &\text{ } & 230,230   \\  7 &\text{ } & 657,800   \\  8 &\text{ } & 1,562,275   \\  9 &\text{ } & 3,124,550   \\  10 &\text{ } & 5,311,735   \\  11 &\text{ } & 7,726,160   \\  12 &\text{ } & 9,657,700   \\  13 &\text{ } & 10,400,600   \\  14 &\text{ } & 9,657,700   \\  15 &\text{ } & 7,726,160   \\  16 &\text{ } & 5,311,735   \\  17 &\text{ } & 3,124,550   \\  18 &\text{ } & 1,562,275   \\  19 &\text{ } & 657,800   \\  20 &\text{ } & 230,230   \\  21 &\text{ } & 65,780   \\  22 &\text{ } & 14,950   \\  23 &\text{ } & 2,600   \\  24 &\text{ } & 325   \\  25 &\text{ } & 26   \\  26 &\text{ } & 1   \\  \end{array}

The pattern is symmetrical. Having too few U’s or too many U’s produces weak passwords that may be easy to guess. Having 6 or 7 U’s seems to give strong passwords. Having half of the letters upper case (13 U’s) is the optimal, with the most possibilities (over 10 millions). Even if you are given partial information such as “half of the letters are in upper case”, you are still left with over 10 million possibilities to work with!

Dividing each of the above counts by 67,108,864 gives the relative weight (probability) of each case of having exactly k U’s.

    \displaystyle \begin{array}{rrr} k &\text{ } & P[X=k]  \\  \text{ } & \text{ } & \text{ }  \\  0 &\text{ } & 0.00000001  \\     1 &\text{ } & 0.00000039   \\  2 &\text{ } & 0.00000484   \\  3 &\text{ } & 0.00003874   \\  4 &\text{ } & 0.00022277   \\  5 &\text{ } & 0.00098020   \\  6 &\text{ } & 0.00343069   \\  7 &\text{ } & 0.00980198   \\  8 &\text{ } & 0.02327971   \\  9 &\text{ } & 0.04655942   \\  10 &\text{ } & 0.07915102   \\  11 &\text{ } & 0.11512876   \\  12 &\text{ } & 0.14391094   \\  13 &\text{ } & 0.15498102   \\  14 &\text{ } & 0.14391094   \\  15 &\text{ } & 0.11512876   \\  16 &\text{ } & 0.07915102   \\  17 &\text{ } & 0.04655942   \\  18 &\text{ } & 0.02327971   \\  19 &\text{ } & 0.00980198   \\  20 &\text{ } & 0.00343069   \\  21 &\text{ } & 0.00098020   \\  22 &\text{ } & 0.00022277   \\  23 &\text{ } & 0.00003874   \\  24 &\text{ } & 0.00000484   \\  25 &\text{ } & 0.00000039   \\  26 &\text{ } & 0.00000001   \\  \end{array}

The cases of X=k where k=11,12,13,14,15 add up to 67.3% of the 67,108,864 possibilities.

___________________________________________________________________________

The binomial distribution

Another way to look at it is that in setting the password, John is performing a sequence of 26 independent Bernoulli trials. Here, each trial has two outcomes, a or A, b or B, c or C and so on. For example, the lower case or upper case can be determined by a coin toss. Let X be the number of upper case letters in the 26-character password. Then the random variable X has a binomial distribution with n=26 (26 Bernoulli trials) and the probability of success p=0.5 in each trial, which is the probability that a character is upper case, assuming that he determines the upper/lower case by a coin toss. The following is the probability function:

    \displaystyle P(X=x)=\binom{26}{x} \biggl[\frac{1}{2}\biggr]^x \biggl[\frac{1}{2}\biggr]^{26-x}=\binom{26}{x} \biggl[\frac{1}{2}\biggr]^{26}

where x=0,1,2,\cdots,25,26. The quantity \displaystyle P(X=x) is the probability that the number of upper case letters is x. Here, \binom{26}{x} is the number of ways to choose x letters out of 26 letters and is computed by the formula indicated earlier.

Since the upper/lower case is determined randomly, another way to state the probability function of the random variable X is:

    \displaystyle P(X=x)=\displaystyle \frac{\binom{26}{x}}{2^{26}}=\frac{\binom{26}{x}}{67108864} \ \ \ \ \ \ \ \ \ x=0,1,2,3,\cdots,24,25,26

The expected value of this random variable X is 13. This is the average number of upper case letters if the case is determined randomly. This obviously produces the most optimally strong password. If John determines the case not at random, the security may not be as strong or the would be hacker may be able to guess.

Stepping away from the 26-character password example, here’s the probability function of a binomial distribution in general.

    \displaystyle P(X=x)=\binom{n}{x} \ p^x \ (1-p)^{n-x} \ \ \ \ \ \ \ \ x=0,1,2,3,\cdots,n-1,n

This model describes the random experiment of running n independent trials, where each trial has two outcomes (the technical term is Bernoulli trial). In each trial, the probability of one outcome (called success) is p and the probability of the other outcome (called failure) is 1-p. The random variable X counts the number of successes whenever such an experiment is performed. The probability P(X=x) gives the likelihood of achieving x successes.

As an example, if John has a bias toward lower case letter, then the probability of success (upper case) may be p=0.4 (assuming that the random variable X still counts the number of upper case letters). Then the average number of upper case letters in a randomly determined password is 26 x 0.4 = 10.4.

___________________________________________________________________________

Interesting binomial distribution problems

The problem of points and the dice problem are two famous probability problems that are in the history book as a result of a gambler seeking help from Pascal and Fermat.

___________________________________________________________________________
\copyright \ 2016 \text{ by Dan Ma}

Calculating order statistics using multinomial probabilities

Consider a random sample X_1,X_2,\cdots,X_n drawn from a continuous distribution. Rank the sample items in increasing order, resulting in a ranked sample Y_1<Y_2<\cdots <Y_n where Y_1 is the smallest sample item, Y_2 is the second smallest sample item and so on. The items in the ranked sample are called the order statistics. Recently the author of this blog was calculating a conditional probability such as P(Y_2>4 \ | \ Y_2>3). One way to do this is to calculate the distribution function P(Y_2 \le t). What about the probability P(Y_5>4 \ | \ Y_2>3)? Since this one involves two order statistics, the author of this blog initially thought that calculating P(Y_5>4 \ | \ Y_2>3) would require knowing the joint probability distribution of the order statistics Y_1,Y_2,\cdots ,Y_n. It turns out that a joint distribution may not be needed. Instead, we can calculate a conditional probability such as P(Y_5>4 \ | \ Y_2>3) using multinomial probabilities. In this post, we demonstrate how this is done using examples. Practice problems are found in here.

The calculation described here can be lengthy and tedious if the sample size is large. To make the calculation more manageable, the examples here have relatively small sample size. To keep the multinomial probabilities easier to calculate, the random samples are drawn from a uniform distribution. The calculation for larger sample sizes from other distributions is better done using a technology solution. In any case, the calculation described here is a great way to practice working with order statistics and multinomial probabilities.

________________________________________________________________________

The multinomial angle

In this post, the order statistics Y_1<Y_2<\cdots <Y_n are resulted from ranking the random sample X_1,X_2,\cdots,X_n, which is drawn from a continuous distribution with distribution function F(x)=P(X \le x). For the jth order statistic, the calculation often begins with its distribution function P(Y_j \le t).

Here’s the thought process for calculating P(Y_j \le t). In drawing the random sample X_1,X_2,\cdots,X_n, make a note of the items \le t and the items >t. For the event Y_j \le t to happen, there must be at least j many sample items X_i that are \le t. For the event Y_j > t to happen, there can be only at most j-1 many sample items X_i \le t. So to calculate P(Y_j \le t), simply find out the probability of having j or more sample items \le t. To calculate P(Y_j > t), find the probability of having at most j-1 sample items \le t.

    \displaystyle P(Y_j \le t)=\sum \limits_{k=j}^n \binom{n}{k} \ \biggl[ F(t) \biggr]^k \ \biggl[1-F(x) \biggr]^{n-k} \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)

    \displaystyle P(Y_j > t)=\sum \limits_{k=0}^{j-1} \binom{n}{k} \ \biggl[ F(t) \biggr]^k \ \biggl[1-F(x) \biggr]^{n-k} \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)

Both (1) and (2) involve binomial probabilities and are discussed in this previous post. The probability of success is F(t)=P(X \le t) since we are interested in how many sample items that are \le t. Both the calculations (1) and (2) are based on counting the number of sample items in the two intervals \le t and >t. It turns out that when the probability that is desired involves more than one Y_j, we can also count the number of sample items that fall into some appropriate intervals and apply some appropriate multinomial probabilities. Let’s use an example to illustrate the idea.

Example 1
Draw a random sample X_1,X_2,\cdots,X_{10} from the uniform distribution U(0,4). The resulting order statistics are Y_1<Y_2< \cdots <Y_{10}. Find the following probabilities:

  • P(Y_4<2<Y_5<Y_6<3<Y_7)
  • P(Y_4<2<Y_6<3<Y_7)

For both probabilities, the range of the distribution is broken up into 3 intervals, (0, 2), (2, 3) and (3, 4). Each sample item has probabilities \frac{2}{4}, \frac{1}{4}, \frac{1}{4} of falling into these intervals, respectively. Multinomial probabilities are calculated on these 3 intervals. It is a matter of counting the numbers of sample items falling into each interval.

The first probability involves the event that there are 4 sample items in the interval (0, 2), 2 sample items in the interval (2, 3) and 4 sample items in the interval (3, 4). Thus the first probability is the following multinomial probability:

    \displaystyle \begin{aligned} P(Y_4<2<Y_5<Y_6<3<Y_7)&=\frac{10!}{4! \ 2! \ 4!} \biggl[\frac{2}{4} \biggr]^4 \ \biggl[\frac{1}{4} \biggr]^2 \ \biggl[\frac{1}{4} \biggr]^4 \\&\text{ } \\&=\frac{50400}{1048567} \\&\text{ } \\&=0.0481  \end{aligned}

For the second probability, Y_5 does not have to be greater than 2. Thus there could be 5 sample items less than 2. So we need to add one more case to the above probability (5 sample items to the first interval, 1 sample item to the second interval and 4 sample items to the third interval).

    \displaystyle \begin{aligned} P(Y_4<2<Y_6<3<Y_7)&=\frac{10!}{4! \ 2! \ 4!} \biggl[\frac{2}{4} \biggr]^4 \ \biggl[\frac{1}{4} \biggr]^2 \ \biggl[\frac{1}{4} \biggr]^4 \\& \ \ \ \ + \frac{10!}{5! \ 1! \ 4!} \biggl[\frac{2}{4} \biggr]^5 \ \biggl[\frac{1}{4} \biggr]^1 \ \biggl[\frac{1}{4} \biggr]^4 \\&\text{ } \\&=\frac{50400+40320}{1048567} \\&\text{ } \\&=\frac{90720}{1048567} \\&\text{ } \\&=0.0865  \end{aligned}

Example 2
Draw a random sample X_1,X_2,\cdots,X_6 from the uniform distribution U(0,4). The resulting order statistics are Y_1<Y_2< \cdots <Y_6. Find the probability P(1<Y_2<Y_4<3).

In this problem the range of the distribution is broken up into 3 intervals (0, 1), (1, 3) and (3, 4). Each sample item has probabilities \frac{1}{4}, \frac{2}{4}, \frac{1}{4} of falling into these intervals, respectively. Multinomial probabilities are calculated on these 3 intervals. It is a matter of counting the numbers of sample items falling into each interval. The counting is a little bit more involved here than in the previous example.

The example is to find the probability that both the second order statistic Y_2 and the fourth order statistic Y_4 fall into the interval (1,3). To solve this, determine how many sample items that fall into the interval (0,1), (1,3) and (3,4). The following points detail the counting.

  • For the event 1<Y_2 to happen, there can be at most 1 sample item in the interval (0,1).
  • For the event Y_4<3 to happen, there must be at least 4 sample items in the interval (0,3). Thus if the interval (0,1) has 1 sample item, the interval (1,3) has at least 3 sample items. If the interval (0,1) has no sample item, the interval (1,3) has at least 4 sample items.

The following lists out all the cases that satisfy the above two bullet points. The notation [a, b, c] means that a sample items fall into (0,1), b sample items fall into the interval (1,3) and c sample items fall into the interval (3,4). So a+b+c=6. Since the sample items are drawn from U(0,4), the probabilities of a sample item falling into intervals (0,1), (1,3) and (3,4) are \frac{1}{4}, \frac{2}{4} and \frac{1}{4}, respectively.

    [0, 4, 2]
    [0, 5, 1]
    [0, 6, 0]
    [1, 3, 2]
    [1, 4, 1]
    [1, 5, 0]

    \displaystyle \begin{aligned} \frac{6!}{a! \ b! \ c!} \ \biggl[\frac{1}{4} \biggr]^a \ \biggl[\frac{2}{4} \biggr]^b \ \biggl[\frac{1}{4} \biggr]^c&=\frac{6!}{0! \ 4! \ 2!} \ \biggl[\frac{1}{4} \biggr]^0 \ \biggl[\frac{2}{4} \biggr]^4 \ \biggl[\frac{1}{4} \biggr]^2=\frac{240}{4096} \\&\text{ } \\&=\frac{6!}{0! \ 5! \ 1!} \ \biggl[\frac{1}{4} \biggr]^0 \ \biggl[\frac{2}{4} \biggr]^5 \ \biggl[\frac{1}{4} \biggr]^1=\frac{192}{4096} \\&\text{ } \\&=\frac{6!}{0! \ 6! \ 0!} \ \biggl[\frac{1}{4} \biggr]^0 \ \biggl[\frac{2}{4} \biggr]^6 \ \biggl[\frac{1}{4} \biggr]^0=\frac{64}{4096} \\&\text{ } \\&=\frac{6!}{1! \ 3! \ 2!} \ \biggl[\frac{1}{4} \biggr]^1 \ \biggl[\frac{2}{4} \biggr]^3 \ \biggl[\frac{1}{4} \biggr]^2=\frac{480}{4096} \\&\text{ } \\&=\frac{6!}{1! \ 4! \ 1!} \ \biggl[\frac{1}{4} \biggr]^1 \ \biggl[\frac{2}{4} \biggr]^4 \ \biggl[\frac{1}{4} \biggr]^1=\frac{480}{4096} \\&\text{ } \\&=\frac{6!}{1! \ 5! \ 0!} \ \biggl[\frac{1}{4} \biggr]^1 \ \biggl[\frac{2}{4} \biggr]^5 \ \biggl[\frac{1}{4} \biggr]^0=\frac{192}{4096} \\&\text{ } \\&=\text{sum of probabilities }=\frac{1648}{4096}=0.4023\end{aligned}

So in randomly drawing 6 items from the uniform distribution U(0,4), there is a 40% chance that the second order statistic and the fourth order statistic are between 1 and 3.

________________________________________________________________________

More examples

The method described by the above examples is this. When looking at the event described by the probability problem, the entire range of distribution is broken up into several intervals. Imagine the sample items X_i are randomly being thrown into these interval (i.e. we are sampling from a uniform distribution). Then multinomial probabilities are calculated to account for all the different ways sample items can land into these intervals. The following examples further illustrate this idea.

Example 3
Draw a random sample X_1,X_2,\cdots,X_7 from the uniform distribution U(0,5). The resulting order statistics are Y_1<Y_2< \cdots <Y_7. Find the following probabilities:

  • P(1<Y_1<3<Y_4<4)
  • P(3<Y_4<4 \ | \ 1<Y_1<3)

The range is broken up into the intervals (0, 1), (1, 3), (3, 4) and (4, 5). The sample items fall into these intervals with probabilities \frac{1}{5}, \frac{2}{5}, \frac{1}{5} and \frac{1}{5}. The following details the counting for the event 1<Y_1<3<Y_4<4:

  • There are no sample items in (0, 1) since 1<Y_1.
  • Based on Y_1<3<Y_4, there are at least one sample item and at most 3 sample items in (0, 3). Thus in the interval (1, 3), there are at least one sample item and at most 3 sample items since there are none in (0, 1).
  • Based on Y_4<4, there are at least 4 sample items in the interval (0, 4). Thus the count in (3, 4) combines with the count in (1, 3) must be at least 4.
  • The interval (4, 5) simply receives the left over sample items not in the previous intervals.

The notation [a, b, c, d] lists out the counts in the 4 intervals. The following lists out all the cases described by the above 5 bullet points along with the corresponding multinomial probabilities, with two of the probabilities set up.

    \displaystyle [0, 1, 3, 3] \ \ \ \ \ \ \frac{280}{78125}=\frac{7!}{0! \ 1! \ 3! \ 3!} \ \biggl[\frac{1}{5} \biggr]^0 \ \biggl[\frac{2}{5} \biggr]^1 \ \biggl[\frac{1}{5} \biggr]^3 \ \biggl[\frac{1}{5} \biggr]^3

    \displaystyle [0, 1, 4, 2] \ \ \ \ \ \ \frac{210}{78125}

    \displaystyle [0, 1, 5, 1] \ \ \ \ \ \ \frac{84}{78125}

    \displaystyle [0, 1, 6, 0] \ \ \ \ \ \ \frac{14}{78125}

    \displaystyle [0, 2, 2, 3] \ \ \ \ \ \ \frac{840}{78125}

    \displaystyle [0, 2, 3, 2] \ \ \ \ \ \ \frac{840}{78125}

    \displaystyle [0, 2, 4, 1] \ \ \ \ \ \ \frac{420}{78125}

    \displaystyle [0, 2, 5, 0] \ \ \ \ \ \ \frac{84}{78125}

    \displaystyle [0, 3, 1, 3] \ \ \ \ \ \ \frac{1120}{78125}=\frac{7!}{0! \ 3! \ 1! \ 3!} \ \biggl[\frac{1}{5} \biggr]^0 \ \biggl[\frac{2}{5} \biggr]^3 \ \biggl[\frac{1}{5} \biggr]^1 \ \biggl[\frac{1}{5} \biggr]^3

    \displaystyle [0, 3, 2, 2] \ \ \ \ \ \ \frac{1680}{78125}

    \displaystyle [0, 3, 3, 1] \ \ \ \ \ \ \frac{1120}{78125}

    \displaystyle [0, 3, 4, 0] \ \ \ \ \ \ \frac{280}{78125}

Summing all the probabilities, \displaystyle P(1<Y_1<3<Y_4<4)=\frac{6972}{78125}=0.08924. Out of the 78125 many different ways the 7 sample items can land into these 4 intervals, 6972 of them would satisfy the event 1<Y_1<3<Y_4<4.

++++++++++++++++++++++++++++++++++

We now calculate the second probability in Example 3.

    \displaystyle P(3<Y_4<4 \ | \ 1<Y_1<3)=\frac{P(1<Y_1<3<Y_4<4)}{P(1<Y_1<3)}

First calculate P(1<Y_1<3)=P(Y_1<3)-P(Y_1<1). The probability P(Y_1<t) is the probability of having at least 1 sample item less than t, which is the complement of the probability of all sample items greater than t.

    \displaystyle \begin{aligned} P(1<Y_1<3)&=P(Y_1<3)-P(Y_1<1) \\&=1-\biggl( \frac{2}{5} \biggr)^7 -\biggl[1-\biggl( \frac{4}{5} \biggr)^7 \biggr] \\&=\frac{77997-61741}{78125} \\&=\frac{16256}{78125} \end{aligned}

The event 1<Y_1<3 can occur in 16256 ways. Out of these many ways, 6972 of these satisfy the event 1<Y_1<3<Y_4<4. Thus we have:

    \displaystyle P(3<Y_4<4 \ | \ 1<Y_1<3)=\frac{6972}{16256}=0.4289

Example 4
Draw a random sample X_1,X_2,X_3,X_4,X_5 from the uniform distribution U(0,5). The resulting order statistics are Y_1<Y_2<Y_3<Y_4 <Y_5. Consider the conditional random variable Y_4 \ | \ Y_2 >3. For this conditional distribution, find the following:

  • P( Y_4 \le t \ | \ Y_2 >3)
  • f_{Y_4}(t \ | \ Y_2 >3)
  • E(Y_4 \ | \ Y_2 >3)

where 3<t<5. Note that f_{Y_4}(t | \ Y_2 >3) is the density function of Y_4 \ | \ Y_2 >3.

Note that

    \displaystyle P( Y_4 \le t \ | \ Y_2 >3)=\frac{P(3<Y_2<Y_4 \le t)}{P(Y_2 >3)}

In finding P(3<Y_2<Y_4 \le t), the range (0, 5) is broken up into 3 intervals (0, 3), (3, t) and (t, 5). The sample items fall into these intervals with probabilities \frac{3}{5}, \frac{t-3}{5} and \frac{5-t}{5}.

Since Y_2 >3, there is at most 1 sample item in the interval (0, 3). Since Y_4 \le t, there are at least 4 sample items in the interval (0, t). So the count in the interval (3, t) and the count in (0, 3) should add up to 4 or more items. The following shows all the cases for the event 3<Y_2<Y_4 \le t along with the corresponding multinomial probabilities.

    \displaystyle [0, 4, 1] \ \ \ \ \ \ \frac{5!}{0! \ 4! \ 1!} \ \biggl[\frac{3}{5} \biggr]^0 \ \biggl[\frac{t-3}{5} \biggr]^4 \ \biggl[\frac{5-t}{5} \biggr]^1

    \displaystyle [0, 5, 0] \ \ \ \ \ \ \frac{5!}{0! \ 5! \ 0!} \ \biggl[\frac{3}{5} \biggr]^0 \ \biggl[\frac{t-3}{5} \biggr]^5 \ \biggl[\frac{5-t}{5} \biggr]^0

    \displaystyle [1, 3, 1] \ \ \ \ \ \ \frac{5!}{1! \ 3! \ 1!} \ \biggl[\frac{3}{5} \biggr]^1 \ \biggl[\frac{t-3}{5} \biggr]^3 \ \biggl[\frac{5-t}{5} \biggr]^1

    \displaystyle [1, 4, 0] \ \ \ \ \ \ \frac{5!}{1! \ 4! \ 0!} \ \biggl[\frac{3}{5} \biggr]^1 \ \biggl[\frac{t-3}{5} \biggr]^4 \ \biggl[\frac{5-t}{5} \biggr]^0

After carrying the algebra and simplifying, we have the following:

    \displaystyle P(3<Y_2<Y_4 \le t)=\frac{-4t^5+25t^4+180t^3-1890t^2+5400t-5103}{3125}

For the event Y_2 >3 to happen, there is at most 1 sample item less than 3. So we have:

    \displaystyle P(Y_2 >3)=\binom{5}{0} \ \biggl[\frac{3}{5} \biggr]^0 \ \biggl[\frac{2}{5} \biggr]^5 +\binom{5}{1} \ \biggl[\frac{3}{5} \biggr]^1 \ \biggl[\frac{2}{5} \biggr]^4=\frac{272}{3125}

    \displaystyle P( Y_4 \le t \ | \ Y_2 >3)=\frac{-4t^5+25t^4+180t^3-1890t^2+5400t-5103}{272}

Then the conditional density is obtained by differentiating P( Y_4 \le t \ | \ Y_2 >3).

    \displaystyle f_{Y_4}(t \ | \ Y_2 >3)=\frac{-20t^4+100t^3+540t^2-3750t+5400}{272}

The following gives the conditional mean E(Y_4 \ | \ Y_2 >3).

    \displaystyle \begin{aligned} E(Y_4 \ | \ Y_2 >3)&=\frac{1}{272} \ \int_3^5 t(-20t^4+100t^3+540t^2-3750t+5400) \ dt \\&=\frac{215}{51}=4.216 \end{aligned}

To contrast, the following gives the information on the unconditional distribution of Y_4.

    \displaystyle f_{Y_4}(t)=\frac{5!}{3! \ 1! \ 1!} \ \biggl[\frac{t}{5} \biggr]^3 \ \biggl[\frac{1}{5} \biggr] \ \biggl[ \frac{5-t}{5} \biggr]^1=\frac{20}{3125} \ (5t^3-t^4)

    \displaystyle E(Y_4)=\frac{20}{3125} \ \int_0^5 t(5t^3-t^4) \ dt=\frac{10}{3}=3.33

The unconditional mean of Y_4 is about 3.33. With the additional information that Y_2 >3, the average of Y_4 is now 4.2. So a higher value of Y_2 pulls up the mean of Y_4.

________________________________________________________________________

Practice problems

Practice problems to reinforce the calculation are found in the problem blog, a companion blog to this blog.

________________________________________________________________________
\copyright \ \text{2015 by Dan Ma}

Tis the Season for Gift Exchange

Suppose that there are 10 people in a holiday party with each person bearing a gift. The gifts are put in a pile and each person randomly selects one gift from the pile. In order not to bias the random selection of gifts, suppose that the partygoers select gifts by picking slips of papers with numbers identifying the gifts. What is the probability that there is at least one partygoer who ends up selecting his or her own gift? In this example, selecting one’s gift is called a match. What is the probability that there is at least one match if there are more people in the party, say 50 or 100 people? What is the behavior of this probability if the size of the party increases without bound?

The above example is a description of a classic problem in probability called the matching problem. There are many colorful descriptions of the problem. One such description is that there are n married couples in a ballroom dancing class. The instructor pairs the ladies with the gentlemen at random. A match occurs if a married couple is paired as dancing partners.

Whatever the description, the matching problem involves two lists of items that are paired according to a particular order (the original order). When the items in the first list are paired with the items in the second list according to another random ordering, a match occurs if an item in the first list and an item in the second list are paired both in the original order and in the new order. The matching problem discussed here is: what is the probability that there is at least one match? Seen in this light, both examples described above are equivalent mathematically.

We now continue with the discussion of the random gift selection example. Suppose that there are n people in the party. Let E_i be the event that the i^{th} person selects his or her own gift. The event E=E_1 \cup E_2 \cup \cdots \cup E_n is the event that at least one person selects his or her own gift (i.e. there is at least one match). The probability P(E)=P(E_1 \cup E_2 \cup \cdots \cup E_n) is the solution of the matching problem as described in the beginning. The following is the probability P(E)=P(E_1 \cup E_2 \cup \cdots \cup E_n).

\displaystyle (1) \ \ \ \ \ \ P[E_1 \cup E_2 \cup \cdots \cup E_n]=1-\frac{1}{2!}+\frac{1}{3!}-\cdots+(-1)^{n+1} \frac{1}{n!}

The answer in (1) is obtained by using an idea called the inclusion-exclusion principle. We will get to that in just a minute. First let’s look at the results of (1) for a few iterations.

\displaystyle (2) \ \ \ \ \ \  \begin{bmatrix} \text{n}&\text{ }& P[E_1 \cup E_2 \cup \cdots \cup E_n]  \\\text{ }&\text{ }&\text{ } \\ 3&\text{ }&0.666667  \\ 4&\text{ }&0.625000  \\ 5&\text{ }&0.633333  \\ 6&\text{ }&0.631944  \\ 7&\text{ }&0.632143  \\ 8&\text{ }&0.632118 \\ 9&\text{ }&0.632121 \\ 10&\text{ }&0.632121 \\ 11&\text{ }&0.632121 \\ 12&\text{ }&0.632121 \end{bmatrix}

In a party with random gift exchange, it appears that regardless of the size of the party, there is a very good chance that someone will end up picking his or her own gift! A match will happen about 63% of the time.

It turns out that the answers in (1) are related to the Taylor’s series expansion of e^{-1}. We show that the probability in (1) will always converge to 1-e^{-1}=0.632121. Note that the Taylor’s series expansion of e^{-1} is:

\displaystyle (3) \ \ \ \ \ \ e^{-1}=\frac{1}{0!}-\frac{1}{1!}+\frac{1}{2!}-\frac{1}{3!}+\cdots+(-1)^n \frac{1}{n!}+\cdots

Consequently, the Taylor’s series expansion of 1-e^{-1} is:

\displaystyle \begin{aligned} (4) \ \ \ \ \ \   1-e^{-1}&=1-\biggl(1-1+\frac{1}{2!}-\frac{1}{3!}+\cdots+(-1)^n \frac{1}{n!}+\cdots \biggr) \\&=1-\frac{1}{2!}+\frac{1}{3!}-\cdots+(-1)^{n+1} \frac{1}{n!}+\cdots  \end{aligned}

Note that the sum of the first n terms of the series in (4) is the probability in (1). As the number of partygoers n increases, the probability P[E_1 \cup E_2 \cup \cdots \cup E_n] will be closer and closer to 1-e^{-1}=0.6321. We have:

\displaystyle (5) \ \ \ \ \ \ \lim \limits_{n \rightarrow \infty} P[E_1 \cup E_2 \cup \cdots \cup E_n]=1 - e^{-1}

\displaystyle (6) \ \ \ \ \ \ \lim \limits_{n \rightarrow \infty} P[(E_1 \cup E_2 \cup \cdots \cup E_n)^c]=e^{-1}

The equation (5) says that it does not matter how many people are in the random gift exchange, the answer to the matching problem is always 1-e^{-1} in the limit. The equation (6) says that the probability of having no matches approaches e^{-1}. There is a better than even chance (0.63 to be more precise) that there is at least one match. So in a random gift exchange as described at the beginning, it is much easier to see a match than not to see one.

The Inclusion-Exclusion Principle

We now want to give some indication why (1) provides the answers. The inclusion-exclusion principle is a formula for finding the probability of the union of events. For the union of two events and the union of three events, we have:

\displaystyle (7) \ \ \ \ \ \ P[E_1 \cup E_2]=P[E_1]+P[E_1]-P[E_1 \cap E_2]

\displaystyle \begin{aligned} (8) \ \ \ \ \ \   P[E_1 \cup E_2 \cup E_3]&=P[E_1]+P[E_1]+P[E_3] \\& \ \ \ \ -P[E_1 \cap E_2]-P[E_1 \cap E_3]-P[E_2 \cap E_3] \\& \ \ \ \ +P[E_1 \cap E_2 \cap E_3]  \end{aligned}

To find the probability of the union of n events, we first add up the probabilities of the individual events E_i. Because the resulting sum overshoots, we need to subtract the probabilities of the intersections E_i \cap E_j. Because the subtractions remove too much, we need to add back the probabilities of the intersections of three individual events E_i \cap E_j \cap E_k. The process of inclusion and exclusion continues until we reach the step of adding/removing of the intersection E_1 \cap \cdots \cap E_n. The following is the statement of the inclusion-exclusion principle.

\displaystyle (9) \ \ \ \ \ \ P[E_1 \cup E_2 \cdots \cup E_n]=S_1-S_2+S_3- \ \ \cdots \ \ +(-1)^{n+1}S_n

In (9), S_1 is the sum of all probabilities P[E_i], S_2 is the sum of all possible probabilities of the intersection of two events E_i and E_j and S_3 is the sum of all possible probabilities of the intersection of three events and so on.

We now apply the inclusion-exclusion principle to derive equation (1). The event E_i is the event that the i^{th} person gets his or her own gift while the other n-1 people are free to select gifts. The probability of this event is \displaystyle P[E_i]=\frac{(n-1)!}{n!}. There are \displaystyle \binom{n}{1}=\frac{n!}{1! (n-1)!} many ways of fixing 1 gift. So \displaystyle S_1=\binom{n}{1} \times \frac{(n-1)!}{n!}=1.

Now consider the intersection of two events. The event E_i \cap E_j is the event that the i^{th} person and the j^{th} person get their own gifts while the other (n-2) people are free to select gifts. The probability of this event is \displaystyle P[E_i \cap E_j]=\frac{(n-2)!}{n!}. There are \displaystyle \binom{n}{2}=\frac{n!}{2! (n-2)!} many ways of fixing 2 gifts. So \displaystyle S_2=\binom{n}{2} \times \frac{(n-2)!}{n!}=\frac{1}{2!}.

By the same reasoning, we derive that \displaystyle S_3=\frac{1}{3!} and \displaystyle S_4=\frac{1}{4!} and so on. Then plugging \displaystyle S_i=\frac{1}{i!} into (9), we obtain the answers in (1).

The matching problem had been discussed previously in the following posts.

The matching problem

Mote about the matching problem

More about the matching problem

We started the discussion on the matching problem in a previous post (The matching problem). We would like to add a little bit more information. The matching problem has many colorful descriptions. Here are some examples:

  • In a dance class with n married couples, suppose that the dance instructor assigns partners randomly. We have a match if a married couple is paired as dancing partners.
  • There are n letters with n matching envelops. Suppose that the letters are randomly stuffed into the envelops. We have a match if a letter is placed into the correct envelop.

In this post we use the generic example of a deck of shuffled cards. Suppose a deck of n cards is numbered 1 through n. After the deck is thoroughly shuffled, if the card numbered i is the i^{th} card in the shuffled deck, we say that it is a match. It is clear that the above two examples and the shuffled deck example are equivalent mathematically.

How many matches will there be in a shuffled deck? What is the probability that there will be at least one match? What is the probability that there will be exactly k matches where k=0,1, \cdots, n? On average, how many matches will there be? The number of matches is a random variable. We comment on this random variable resulting from the matching problem. We also derive the probability density function and its moments. We also comment the relation this random variable with the Poisson distribution.

Consider the set S=\lbrace{1,2, \cdots, n}\rbrace. The matching problem corresponds to the random selection of all the points in S without replacement. The random selection of points of the entire set S results in ordered samples (Y_1,Y_2, \cdots, Y_n) where each Y_i \in S and Y_i \neq Y_j for i \neq j. These ordered samples are precisely the permutations of the set S and there are n! many such permutations.

A match occurs when some element i \in S is selected in the i^{th} pick. In the card example, the i^{th} card is a match when the card that is numbered i is in the i^{th} position in the shuffled deck (Y_i=i). The matching problem is a counting problem. We demonstrate how to count the n! permutations that result in no matches, at least one match and exactly k matches using the inclusion-exclusion principle.

For each i \in S, we define the following indicator variable I_i:

\displaystyle I_i=\left\{\begin{matrix}1&\thinspace Y_i=i \ \ \ \text{(} i^{th} \text{card is a match)}\\{0}&\thinspace Y_i \neq i \ \ \ \text{(} i^{th} \text{card is not a match)}\end{matrix}\right.

Furthermore, let X_n=I_1+I_2+ \cdots +I_n. The random variable X_n is the number of matches when a deck of n cards (numbered 1 through n) are shuffled. We derive the probability function of X_n as well as its first and second moments.

The Inclusion-Exclusion Principle
For any events A_1,A_2, \cdots, A_n, the following is the cardinality of the event A_1 \cup A_2 \cup \cdots \cup A_n:

\displaystyle \lvert A_1 \cup A_2 \cup \cdots \cup A_n \lvert=\sum \limits_{m=0}^n (-1)^{m+1} S_m where the counts S_m are defined by the following:

\displaystyle S_1=\sum \limits_{i=1}^n \lvert A_i \lvert

\displaystyle S_2=\sum \limits_{i<j} \lvert A_i \cap A_j\lvert

\displaystyle S_m=\sum \limits_{i(1)< \cdots <i(m)} \lvert A_{i(1)} \cap \cdots \cap A_{i(m)}\lvert

Remark
In the inclusion-exclusion formula, we sum the cardinalities of the sets, subtract the cardinalities of intersections of pairs, add the cardinalities of the intersection of triples and so on. When n=2, we have the familiar formula:

\displaystyle \lvert A_1 \cup A_2 \lvert=\lvert A_1 \lvert +\lvert A_2 \lvert-\lvert A_1 \cap A_2\lvert

The Probability Density Function
For each i in S=\lbrace{1,2, \cdots, n}\rbrace, let A_i be the event that the i^{th} card is a match. The event A_1 \cup \cdots \cup A_n is the event that there is at least one match in the deck of n shuffled cards. We use the inclusion-exclusion principle to derive the count for this event. The union and its complement “no matches in the cards” will become the basis for deriving the density function of X_n.

For the event A_i, the i^{th} card is fixed and the other n-1 cards are free to permute. Thus \vert A_i \vert=(n-1)!. As a result, \displaystyle S_1=\binom{n}{1} (n-1)!=n!.

For the event A_i \cap A_j, the i^{th} card and the j^{th} card are fixed and the other n-2 cards are free to permute. Thus we have \vert A_i \cap A_j \vert=(n-2)! and \displaystyle S_2=\binom{n}{2} (n-2)!=\frac{n!}{2!}

In the general case, we have \displaystyle S_i=\binom{n}{i} (n-i)!=\frac{n!}{i!}

Applying the inclusion-exclusion principle, we have:

\displaystyle \lvert A_1 \cup A_2 \cup \cdots \cup A_n \lvert=\sum \limits_{i=1}^n (-1)^{i+1} S_i

\displaystyle \begin{aligned}\lvert (A_1 \cup A_2 \cup \cdots \cup A_n)^c \lvert&=n!-\sum \limits_{i=1}^n (-1)^{i+1} S_i\\&=n!-\sum \limits_{i=1}^n (-1)^{i+1} \frac{n!}{i!}\\&=n!+\sum \limits_{i=1}^n (-1)^{i} \frac{n!}{i!}\\&=\sum \limits_{i=0}^n (-1)^{i} \frac{n!}{i!} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)\end{aligned}

Out of the n! permutations, \displaystyle \lvert (A_1 \cup A_2 \cup \cdots \cup A_n)^c \lvert of them have no matches. Thus the probability of having no matches in a shuffled deck of n cards is:

\displaystyle \begin{aligned}P(\text{no matches in n cards})&=\displaystyle \biggl(\sum \limits_{i=0}^n (-1)^{i} \frac{n!}{i!}\biggr) \frac{1}{n!}\\&=\sum \limits_{i=0}^n \frac{(-1)^{i}}{i!}\\&=P(X_n=0)\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)\end{aligned}

To derive P(X_n=1), we consider the event that 1 of the cards is a match and the other n-1 cards are not matches. Using (1) to obtain the count for the event that n-1 cards are not matches, we have:

\displaystyle \begin{aligned}P(X_n=1)&=\displaystyle \binom{n}{1}\displaystyle \biggl(\sum \limits_{i=0}^{n-1} (-1)^{i} \frac{(n-1)!}{i!}\biggr) \frac{1}{n!}\\&=\sum \limits_{i=0}^{n-1} \frac{(-1)^{i}}{i!}\end{aligned}

For the general case, we have:

\displaystyle \begin{aligned}P(X_n=k)&=\displaystyle \binom{n}{k}\displaystyle \biggl(\sum \limits_{i=0}^{n-k} (-1)^{i} \frac{(n-k)!}{i!}\biggr) \frac{1}{n!}\\&=\frac{1}{k!}\sum \limits_{i=0}^{n-k} \frac{(-1)^{i}}{i!}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)\end{aligned}

We point out that P(X_n=n-1)=0. It is impossible to have exactly n-1 matches while the remaining card is not a match. This can also be verified by looking at the probability density function. On the other hand, \displaystyle P(X_n=n)=\frac{1}{n!}. There is only one permutation out of n! many where all the cards are matches.

Poisson Distribution
We compare the probability density functions P(X_{10}=k), P(X_{20}=k) P(X_{150}=k) and the Poisson desnity function with parameter \lambda=1. The following matrix shows the results (rounded to eight decimal places).

\displaystyle \begin{pmatrix} \text{k}&P(X_{10}=k)&P(X_{20}=k)&P(X_{150}=k)&\displaystyle \frac{e^{-1}}{k!} \\{\text{ }}&\text{ }&\text{ } \\{0}&0.36787946&0.36787944&0.36787944&0.36787944 \\{1}&0.36787919&0.36787944&0.36787944&0.36787944 \\{2}&0.18394097&0.18393972&0.18393972&0.18393972 \\{3}&0.06130952&0.06131324&0.06131324&0.06131324 \\{4}&0.01533565&0.01532831&0.01532831&0.01532831 \\{5}&0.00305556&0.00306566&0.00306566&0.00306566 \\{6}&0.00052083&0.00051094&0.00051094&0.00051094 \\{7}&0.00006614&0.00007299&0.00007299&0.00007299 \\{8}&0.00001240&0.00000912&0.00000912&0.00000912 \\{9}&0&0.00000101&0.00000101&0.00000101 \\{10}&0.00000028&0.00000010&0.00000010&0.00000010 \\{\text{ }}&\text{ }&\text{ } \\{\text{Total}}&1.000000&0.99999997&0.99999997&0.99999997\end{pmatrix}

There is remarkable agreement among the density functions arising from the match problems with the Poisson distribution. The density function P(X_n=k) converges to the Poisson distribution with parameter \lambda=1 and the convergence occurs rapidly.

The probability of having exactly k matches in (3) above converges to \displaystyle \frac{e^{-1}}{k!}, which is the Poisson distribution with parameter \lambda=1.

The probability of no matches among the n cards in (2) above is the Taylor expansion of e^{-1}. Thus the probability of no matches in a shuffled deck of n cards when n is large is about 0.3678794412 and the probability of at least one match is approximately 0.6321205588. As the above example shows, the convergence occurs fairly rapidly.

The Moments
Since P(X_n=k) converges to the Poisson distribution with parameter \lambda=1, we might guess that the mean and variance of X_n approach \lambda=1. We now show that E(X_n)=1 and Var(X_n)=1 regardless of n.

Recall that X_n=I_1+I_2+ \cdots +I_n where for each j=1,2, \cdots, n, I_j is an indicator variable:

\displaystyle I_j=\left\{\begin{matrix}1&\thinspace Y_j=j \ \ \ \text{(} j^{th} \text{card is a match)}\\{0}&\thinspace Y_j \neq j \ \ \ \text{(} j^{th} \text{card is not a match)}\end{matrix}\right.

We claim that for all j=1,2, \cdots, n, \displaystyle P(I_j=1)=\frac{1}{n} and for all i<j, \displaystyle P(I_i=1,I_j=1)=\frac{1}{n(n-1)}. To see this, the event (I_j=1) is the event A_j defined in the derivation of the probability density function of X_n. We have \lvert A_j \vert=(n-1)!. Thus the probability that the j^{th} card is a match is:

\displaystyle P(I_j=1)=\frac{(n-1)!}{n!}=\frac{1}{n}

The event (I_i=1, I_j=1) is the event A_i \cap A_j defined in the derivation of the probability density function of X_n. We have \lvert A_i \cap A_j \vert=(n-2)!. Thus the probability that both the i^{th} card and the j^{th} card are matches is:

\displaystyle P(I_i=1,I_j=1)=\frac{(n-2)!}{n!}=\frac{1}{n(n-1)}

It follows that for all j=1,2, \cdots, n, I_j has the Bernoulli distribution with probability of success \displaystyle P(I_j=1)=\frac{1}{n}.

Knowing that I_j is Bernoulli, we know its mean and variance. For j=1,2, \cdots, n, we have:

\displaystyle E(I_j)=\frac{1}{n}

\displaystyle Var(I_j)=\frac{1}{n} \biggl(1-\frac{1}{n}\biggr)=\frac{n-1}{n^2}

It is now easy to see that E(X_n)=1. To find Var(X_n), we need to find the covariance Cov(I_i,I_j). Note that the indicator variables I_j are not independent. In evaluating Cov(I_i,I_j), we need to consider four possibilities: (I_i=0,I_j=0), (I_i=0,I_j=1), (I_i=1,I_j=0), (I_i=1,I_j=1). The only relevant case is the last one. Thus we have:

\displaystyle \begin{aligned}Cov(I_i,I_j)&=E(I_i I_j)-E(I_i) E(I_j)\\&=\biggl(\sum \limits_{x,y=0,1} x \ y P(I_i=x,I_j=y)\biggr)-E(I_i) E(I_j)\\&=P(I_i=1,I_j=1)-E(I_i) E(I_j)\\&=\frac{1}{n(n-1)}-\frac{1}{n^2}\\&=\frac{1}{n^2 (n-1)} \end{aligned}

The following derives the variance:

\displaystyle \begin{aligned}Var(X_n)&=\sum \limits_{i=1}^n Var(I_i)+2 \sum \limits_{i \ne j} Cov(I_i,I_j)\\&=n \frac{n-1}{n^2}+2 \binom{n}{2} \frac{1}{n^2 (n-1)}\\&=1 \end{aligned}

Reference

  1. Feller, W., An Introduction to Probability Theory and its Applications, Vol. I, 3rd ed., John Wiley & Sons, Inc., New York, 1968

The game of poker dice and the multinomial theorem

This post presents an application of the multinomial theorem and the multinomial coefficients to the game of poker dice. See the previous post The multinomial theorem for a discussion of the multinomial theorem and multinomial coefficients.

The game of poker dice is different from the standard poker game. Instead of drawing five cards from a deck of 52 cards, five fair dice are rolled. The resulting five scores from the dice form a poker dice hand. The possible hands are ranked as follows:

  • Five of a kind: one score occurs five times (e.g. 2,2,2,2,2).
  • Four of a kind: two distinct scores appear, one score occurs four times and the other score occurs one time (e.g. 2,2,2,2,5).
  • Full house: two distinct scores appear, one score occurs three times and the other score occurs two times (e.g. 3,3,3,1,1).
  • Three of a kind: three distinct scores appear, one score occurs three times, the other two scores occur one time each (e.g. 2,2,2,3,6).
  • Two pair: three distinct scores appear, two of the scores occur two times each and the other score occurs once (e.g. 4,4,1,1,5).
  • One pair: four distinct scores appear, one score occurs two times and the other three scores occur one time each (e.g. 5,5,1,2,4).
  • High card: five distinct scores occur (e.g. 2,3,5,4,1).

 
In rolling 5 dice, there are 6^5=7776 ordered outcomes. For example, assuming that the five dice are rolled one at a time, 2,3,2,6,2 indicates outcome that the first die results in a two and the second die results in a three and so on (this is a three of a kind). To find the probability of a three of a kind, we simply divide the number of ways such hands can occur by 7776. We use the multinomial coefficients to obtain the number of outcomes for each type of poker dice hands.

Rolling k dice (or rolling a die k times) can also be regarded as the occupancy problem of assigning k balls into 6 cells. As will be shown below, the problem of computing the probabilities of poker dice hands is seen through the lens of the occupancy problem of randonly placing 5 balls into 6 cells. For example, five of a kind is equivalent to all five balls being placed in different cells. Three of a kind is equivalent to three of the balls being in one cell, and the other two balls in two other different cells. For discussion on the occupancy problems in this blog, see the following posts:

In placing 5 balls into 6 cells, we use 6-tuples to indicate how many balls are in each of the 6 cells. For example, (0,3,1,0,0,1) denotes a three of a kind hand of 2,2,2,3,6 (the score of two appearing three times, the score of three appearing one time and the score of six appearing one time). Note that the 6 coordinates represent the six scores of a die (six cells) and the sum of the coordinates is 5. The 6-tuple of (3,1,1,0,0,0) is also a three of a kind hand, representing the outcome that the score of one appearing three times, the score of two appearing one time and the score of three appearing one time. We use the multinomial coefficients to determine how many of the 7776 ordered outcomes correspond to a 6-tuple such as (3,1,1,0,0,0). With respect to the occupancy problem, such 6-tuples are called occupancy number sets.

The Multinomial Theorem
For any positive integer n and any positive integer k, we have the following polynomial expansion:

    \displaystyle \biggl(x_1+x_2+ \cdots +x_n\biggr)^k=\sum \limits_{a_1+ \cdots +a_n=k} \frac{k!}{a_1! \ a_2! \cdots \ a_n!} \ \ x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}

Remark
In addition to being a formula for polynomial expansion, there are two important interpretations of the multinomial theorem and multinomial coefficients. One is that of determining the number of ordered strings that can be formed using a set of alphabets. For example, with one m, four i's, four s's and two p's, there are \displaystyle \frac{11!}{1! \ 4! \ 4! \ 2!}=\text{34,650} possible 11-letter strings that can be formed, of which mississippi is one specific example.

Another interpretation is that of partitioning a set of distinct objects into several groupings where objects in each grouping are indistinguishable. For example, in a group of 11 candidates, how many ways can we form four committees such that the Committee 1 has only one member, Committee 2 has four members, Committee 3 has four members and Committee 4 has two members (assuming that each person can only serve in one committee)? The answer, as in above example, is \text{35,650}.

Example 1
In a random poker dice hand, what is the probability of obtaining a 4 two times, a 3 one time, a 5 one time and a 6 one time? Note that this is a specific example of the poker dice hand of one pair.

We consider the 6-tuple of (0,0,1,2,1,1). We are trying to partition 5 scores into four subgroups, one group having two identical scores of 4, one group with a score of 3, one group with a score of 5 and one group with a score of 6. Thus consider the following multinomial coefficient:

    \displaystyle \frac{5!}{1! \ 2! \ 1! \ 1!}=60

So out of \text{7,776} possible hands, 60 of them satisfiy the condition that a 4 appearing two times, a 3 appearing one time, a 5 appearing one time and a 6 appearing one time. The probability is:

    \displaystyle \frac{60}{7776}=0.0077160494

Example 2
What is the probability that one score appears two times, three other scores appear one time each in a random poker dice hand?

Here, we need to count all the possible poker dice hands of one pair. Both (0,0,1,2,1,1) and (2,1,1,1,0,0) are examples of one pair. In essense, we need to count all the occupancy number sets such that among the 6 coordinates (cells), one of the cells is a 2 and three of the cells are 1. To this end, we apply the multinomial theorem twice, one time on the five rolls of dice and one time one the 6 cells.

Consider the occupancy number set (2,1,1,1,0,0). Note that the multinomial coefficient is 60 as in Example 1 (the first application of the multinomial thoerem). Now look at the 6 coordinates of the occupancy number set (2,1,1,1,0,0). We wish to partition these 6 coordinates into three groupings, one with one 2, one with three 1's and one with two 0's. The following is the multinomial coefficient (the second application of the multinomial theorem):

    \displaystyle \frac{6!}{1! \ 3! \ 2!}=60

Thus the number of possible poker dice hands of one pair is: 60 \times 60=\text{3,600} and for a random poker dice hand, the probability that it is a one pair is:

    \displaystyle \frac{3600}{7776}=0.462962963

Remark
Example 2 provides the algorithm for computing the remaining poker dice hand probabilities. The key is to apply the multinomial coefficients twice, one time on a representative occupancy number set, the second time on the six cells (the six faces of a die in this case). Then the number of poker dice hands in question is the product of the two multinomial coefficients.

Example 3
What is the probability that a random poker dice hand is three of a kind?

Consider the occupancy number set of (3,1,1,0,0,0). The associated multinomial coefficient for the five rolls of dice is:

    \displaystyle \frac{5!}{3! \ 1! \ 1!}=20

Now partition the six cells into three groupings (one 3, two 1's, three 0's):

    \displaystyle \frac{6!}{1! \ 2! \ 3!}=60

Thus the probability that a random poker hand is three of a kind is:

    \displaystyle \frac{20 \times 60}{7776}=0.1543209877

Summary
The following are the probabilities of poker dice hands.

    \displaystyle P(\text{five of a kind})=\frac{6}{7776}

    \displaystyle P(\text{four of a kind})=\frac{150}{7776}

    \displaystyle P(\text{full house})=\frac{300}{7776}

    \displaystyle P(\text{three of a kind})=\frac{1200}{7776}

    \displaystyle P(\text{two pair})=\frac{1800}{7776}

    \displaystyle P(\text{one pair})=\frac{3600}{7776}

    \displaystyle P(\text{high card})=\frac{720}{7776}

________________________________________________________________________
\copyright  \ \text{2010 - 2015 by Dan Ma} (Revised March 28, 2015)

The multinomial theorem

The multinomial theorem is a statement about expanding a polynomial when it is raised to an arbitrary power. Rather than just stating the theorem and showing examples, we motivate the theorem by a concrete example of finite random sampling. This example demonstrates that the notion of finite sampling provides another interpretation to the multinomial thoerem. As a result, the multinomial theorem and the multinomial coefficients are useful in combinatorial techniques in finite random sampling models. The multinomial coefficients are also useful in partitioning a set of objects into several subgroups where each subgroup is made up of indistinguishable objects. See the post The game of poker dice and the multinomial theorem for an example of applications of these ideas.

________________________________________________________________________

An example

Suppose we have a finite set S with n elements where n is a positive integer. Suppose we select k elements from the population S with replacement. We consider ordered samples of size k and unordered samples of size k. To make this notion more concrete, consider the population consisting of 4 letters \lbrace{m,i,s,p}\rbrace. Suppose we sample 11 times with replacement. The following are two specific ordered samples of size 11:

    mississippi \ \ \ \ \ \ \ \ \ \ \ \ mississiipp \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)

Thus ordered samples can be represented as strings of characters drawn from the population presented in the order in which the objects are drawn. In this example, there are 4^{11}=4194304 many ordered samples. Each character drawn has 4 choices and the drawing is done 11 times. In general, if the population has size n, then there are n^k many ordered samples of size k.

We now look at unordered samples obtained from sampling 11 times from the population \lbrace{m,i,s,p}\rbrace. In unordered samples, the order in which the letters are drawn is no longer important (or recorded). We only care about the number of instances each letter is selected. Thus each of the ordered samples shown above yields the same unordered sample: 1 \ m, 4 \ i's, 4 \ s's and 2 \ p's. We represent the unordered sample in two ways:

    (1,4,4,2)

    * \ | \ * \ * \ * \ * \ | \ * \ * \ * \ * \ | \ * \ *

The first representation is a 4-tuple where the coordinates are the number of m's, the number of i's, the number of s's and the number of p's in the 11 selections. Thus, the sum of the coordinates must be 11 (the total number of selections). The second representation of the unordered sample is made up of stars and bars. The three bars create 4 spaces and the stars in each space represent the number of characters drawn. We would like to point out that the “unordered” in the unordered samples refers to the fact that the order in which the objects are drawn is immaterial. However, both the 4-tuple notation and the stars and bars diagrams are ordered according to the 4 letters in the population (showing how many instances each object appears in the samples).

The star and bar diagram has a combinatorial advantage. For example, how many unordered samples are there when you draw 11 letters with replacement from the population \lbrace{m,i,s,p}\rbrace? In any unordered sample, there are 3 bars and 11 stars. The order of the stars and bars can be in any arbitrary order. Thus there are \displaystyle \binom{11+3}{11}=364 many unordered samples. In general, if the population size is n, then there are \displaystyle \binom{k+n-1}{k}=\binom{k+n-1}{n-1} many unordered samples, either represented as k-tuples or stars and bars diagrams with n-1 bars and k stars.

One more note about the number 364 calculated above. This is also the total number of non-negative integer solutions to the equation m+i+s+p=11. Thinking of an unordered sample as a 4-tuple, the sum of the 4 coordinates must be 11. This observation is important to understanding the multinomial theorem.

There is one more count associated with unordered samples. How many unordered samples are there when you draw 11 letters with replacement from the population \lbrace{m,i,s,p}\rbrace such that each letter is selected at least once? The answer is 120. Suppose that each letter is already selected once. Then we need to sample 7 more times out of these 4 letters. According to the above paragraph, the total count is \displaystyle \binom{7+3}{3}=120. To generalize, if the population size is n, there are \displaystyle \binom{k-n+n-1}{n-1}=\binom{k-1}{n-1} many unordered samples in which all objects in the population are represented in each sample.

We now tie unordered samples back to ordered samples. How many ordered samples are equivalent to the unordered sample (1,4,4,2)? Both ordered samples in (1) are equivalent to (1,4,4,2) (i.e. each letter is drawn the same number of times). In other words, how many 11-character strings can be formed using 1 \ m, 4 \ i's, 4 \ s's and 2 \ p's? The answer is:

    \displaystyle \begin{aligned}\binom{11}{1} \binom{10}{4} \binom{6}{4} \binom{2}{2}&=\displaystyle \frac{11!}{1! \ 4! \ 4! \ 2!}\\&=34650\end{aligned}

The reasoning for the above calculation is that out of 11 positions in the strings, we choose 1 position for the m, choose 4 positions for the i in the remaining 10 positions and so on.

________________________________________________________________________

Summary of the example

In sampling 11 times with replacement from the population \lbrace{m,i,s,p}\rbrace, we summarize our observations about the example.

  • The number of ordered samples is 4^{11}=4194304.
  • The number of unordered samples is \displaystyle \binom{11+3}{11}=364. Furthermore, the number of non-negative integer solutions to the equation m+i+s+p=11 is 364.
  • The number of unordered samples where each letter is selected is \displaystyle \binom{7+3}{3}=120. Furthermore, the number of positive integer solutions to the equation m+i+s+p=11 is 120.
  • For any unordered sample (a,b,c,d) where a+b+c+d=11, the total number of ordered samples equivalent to this unordered sample is \displaystyle \frac{11!}{a! \ b! \ c! \ d!}. As we shall see, these are called multinomial coefficients.

Note the interplay between the ordered samples and unordered samples. We start out with a large number of ordered samples (4^{11} many). We then collapse these ordered samples to just 364 unordered samples. We see that each unordered sample corresponds to certain number of ordered samples according to the multinomial coefficients. Thus we have the following sum where a,b,c,d are non-negative integers:

    \displaystyle \sum \limits_{a+b+c+d=11} \frac{11!}{a! \ b! \ c! \ d!}=4^{11}

________________________________________________________________________

The Multinomial Theorem

With the preceding discussion, we now state the multinomial theorem.

The Multinomial Theorem
For any positive integer n and any positive integer k, we have the following polynomial expansion:

    \displaystyle \biggl(x_1+x_2+ \cdots +x_n\biggr)^k=\sum \limits_{a_1+ \cdots +a_n=k} \frac{k!}{a_1! \ a_2! \cdots \ a_n!} \ x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}

Remark
The same observations we make about the example apply. For example, the number of terms in the polynomial expansion is \displaystyle \binom{k+n-1}{k}, which is the number of non-negative integer solutions to the equation a_1+ \cdots +a_n=k. Each term x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n} in the polynomial expansion can be considered as an unordered sample in the finite sampling with replacement. Then the coefficient of each term (called multinomial coefficient) is the number of associated ordered samples. As a result, the multinomial coefficients sum to n^k.

We conclude with two interpretations of the multinomial coefficient.

    \displaystyle \frac{k!}{a_1! \ a_2! \cdots \ a_n!} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)

If we have n many distinct symbols (say x_1,x_2, \cdots, x_n) and we have a_1 many x_1, a_2 many x_2 and so on, then the multinomial coefficient in (2) is the number of k-length character strings that can be formed using the available symbols.

Another interpretation is that of partitioning a set of k objects into n subgroups where the objects in each subgroup are indistinguishable.

Both interpretations are one and the same, just looking at the same result in a different angle. For example, all three of the following yield the same answer: \text{34,650}. We have 11 letters (one m, four i's, four s's and two p's), how many character strings can be formed with these letters?

On the other hand, we have 11 identical candies randomly distributed to Marcus, Issac, Samantha and Paul. How many ordered samples will result if Marcus receives one candy, Issac receives four candies, Samantha receives four candies and Paul receives two candies? Here, we are trying to partition a set of 11 objects into four subgroups where one group has one element, two of the groups have four elements each and another group has two elements.

If we have 11 candidates for forming four distinct committees where one committee has one member, two of the committees have four members each and another one has two members. In how many ways can this be done?

Reference

  1. Feller, W., An Introduction to Probability Theory and its Applications, Vol. I, 3rd ed., John Wiley & Sons, Inc., New York, 1968

________________________________________________________________________
Revised March 28, 2015
\copyright \ \text{2010 - 2015 by Dan Ma}