We motivate the discussion with the following example. The notation $X \sim B(n,p)$ denotes the statement that $X$ has a binomial distribution with parameters $n$ and $p$. In other words, $X$ is the number of successes in a sequence of $n$ independent Bernoulli trials, where $p$ is the probability of success in each trial.
Example 1. Suppose that a student took two multiple choice quizzes in a course for probability and statistics. Each quiz has 5 questions. Each question has 4 choices and only one of the choices is correct. Suppose that the student answered all the questions by pure guessing. Furthermore, the two quizzes are independent (i.e. results of one quiz will not affect the results of the other quiz). Let $X$ be the number of correct answers in the first quiz and $Y$ be the number of correct answers in the second quiz. Suppose the student was told by the instructor that she had a total of 4 correct answers in these two quizzes. What is the probability that she had 3 correct answers in the first quiz?
On the face of it, the example is all about the binomial distribution. Both $X$ and $Y$ have binomial distributions (both $B(5, 0.25)$). The sum $X+Y$ also has a binomial distribution ($B(10, 0.25)$). The question being asked is a conditional probability, i.e., $P(X=3 \mid X+Y=4)$. Surprisingly, this conditional probability can be computed using the hypergeometric distribution. One can always work this problem from first principles using binomial distributions. As discussed below, for a problem such as Example 1, it is always possible to replace the binomial distributions with a thought process involving the hypergeometric distribution.
Here’s how to think about the problem. This student took the two quizzes and was given the news by the instructor that she had 4 correct answers in total. She now wonders what the probability of having 3 correct answers in the first quiz is. The thought process is this. She is to pick 4 questions from 10 questions (5 of them are from Quiz 1 and 5 of them are from Quiz 2). So she is picking 4 objects from a group of two distinct types of objects. This is akin to reaching into a jar that has 5 red balls and 5 blue balls and picking 4 balls without replacement. What is the probability of picking 3 red balls and 1 blue ball? The probability just described is from a hypergeometric distribution. The following shows the calculation.

$$P(X=3 \mid X+Y=4) = \frac{\binom{5}{3} \binom{5}{1}}{\binom{10}{4}} = \frac{10 \times 5}{210} = \frac{50}{210} = \frac{5}{21} \approx 0.238$$
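The jar calculation can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the variable names `red`, `blue` and `draws` are chosen here for illustration and do not come from the original discussion.

```python
from fractions import Fraction
from math import comb

# Jar model: 5 red balls (Quiz 1 questions) and 5 blue balls (Quiz 2 questions).
red, blue, draws = 5, 5, 4

# Probability of drawing exactly 3 red balls and 1 blue ball without replacement.
prob = Fraction(comb(red, 3) * comb(blue, 1), comb(red + blue, draws))

print(prob)         # 5/21
print(float(prob))  # roughly 0.238
```

Using `Fraction` keeps the answer exact, so it can be compared directly with the hand calculation.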
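We will show below why this works. Before we do that, let’s describe the above thought process in general terms. Whenever you have two independent binomial random variables $X$ and $Y$ with the same probability of success (the number of trials does not have to be the same), the conditional distribution of $X$ given the value of $X+Y$ is a hypergeometric distribution. Interestingly, the probability of success has no bearing on this observation. For Example 1, we have the following calculation.

$$P(X=x \mid X+Y=4) = \frac{\binom{5}{x} \binom{5}{4-x}}{\binom{10}{4}}, \quad x = 0, 1, 2, 3, 4$$

This gives the probabilities $\frac{5}{210}$, $\frac{50}{210}$, $\frac{100}{210}$, $\frac{50}{210}$ and $\frac{5}{210}$ for $x = 0, 1, 2, 3, 4$, respectively.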
Interestingly, the conditional mean is $E(X \mid X+Y=4) = 2$, while the unconditional mean is $E(X) = 5 \times 0.25 = 1.25$. The fact that the conditional mean is higher is not surprising. The student was lucky enough to have obtained 4 correct answers by guessing. Given this, she had a greater chance of doing better on the first quiz.
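The comparison of the two means can be reproduced as follows. This is a sketch under the setup of Example 1 (5 questions per quiz, success probability 1/4, 4 correct answers in total); the names `cond_pmf`, `cond_mean` and `uncond_mean` are illustrative.

```python
from fractions import Fraction
from math import comb

n_quiz, total_correct = 5, 4  # questions per quiz, known total of correct answers

# Conditional pmf P(X = x | X + Y = 4), computed from the hypergeometric formula.
cond_pmf = {
    x: Fraction(comb(n_quiz, x) * comb(n_quiz, total_correct - x),
                comb(2 * n_quiz, total_correct))
    for x in range(total_correct + 1)
}

cond_mean = sum(x * q for x, q in cond_pmf.items())
uncond_mean = n_quiz * Fraction(1, 4)  # mean of B(5, 0.25)

print(cond_mean, uncond_mean)  # 2 5/4
```

The conditional mean of 2 reflects the symmetry of the setup: with 4 correct answers spread over two quizzes of equal length, half of them are expected to fall in the first quiz.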
Why This Works
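Suppose $X \sim B(5,p)$ and $Y \sim B(5,p)$ and they are independent. The joint distribution of $X$ and $Y$ has 36 points in the sample space, namely the pairs $(x, y)$ with $x$ and $y$ each ranging over $0, 1, \dots, 5$. See the following diagram.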
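The probability attached to each point is

$$P(X=x, Y=y) = \binom{5}{x} p^x (1-p)^{5-x} \times \binom{5}{y} p^y (1-p)^{5-y}$$

where $x = 0, 1, \dots, 5$ and $y = 0, 1, \dots, 5$.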
The conditional probability $P(X=3 \mid X+Y=4)$ involves the 5 points with $x+y=4$, as indicated in the following diagram.
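The conditional probability $P(X=3 \mid X+Y=4)$ is simply the probability of one of the 5 sample points as a fraction of the sum total of the probabilities of the 5 sample points encircled in the above diagram. The following is the sum total of the probabilities of the 5 points indicated in Figure 2.

$$P(X+Y=4) = P(X=0, Y=4) + P(X=1, Y=3) + P(X=2, Y=2) + P(X=3, Y=1) + P(X=4, Y=0)$$

We can plug the joint probability of each point into this sum and work out the calculation. But the sum is actually equivalent to the following because $X+Y \sim B(10,p)$.

$$P(X+Y=4) = \binom{10}{4} p^4 (1-p)^6$$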
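The equivalence between the sum over the 5 points and the binomial probability for $X+Y$ can be spot-checked numerically. The sketch below uses a small helper, `binom_pmf`, which is a name introduced here for illustration, and tries a few arbitrary values of the success probability.

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, k, p):
    """P(K = k) for K ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

for p in (Fraction(1, 4), Fraction(1, 2), Fraction(9, 10)):
    # Sum the joint probabilities of the 5 points (x, 4 - x), x = 0..4.
    diagonal_sum = sum(binom_pmf(5, x, p) * binom_pmf(5, 4 - x, p)
                       for x in range(5))
    closed_form = binom_pmf(10, 4, p)  # P(X + Y = 4) with X + Y ~ B(10, p)
    print(p, diagonal_sum == closed_form)
```

Exact fractions are used so that the equality test is not affected by floating point rounding.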
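As stated earlier, the conditional probability $P(X=3 \mid X+Y=4)$ is simply the probability of one of the 5 sample points as a fraction of the sum total of the probabilities of the 5 sample points encircled in Figure 2. Thus we have:

$$P(X=3 \mid X+Y=4) = \frac{P(X=3, Y=1)}{P(X+Y=4)} = \frac{\binom{5}{3} p^3 (1-p)^2 \times \binom{5}{1} p (1-p)^4}{\binom{10}{4} p^4 (1-p)^6}$$

Since the terms involving $p$ and $1-p$ cancel out, we have:

$$P(X=3 \mid X+Y=4) = \frac{\binom{5}{3} \binom{5}{1}}{\binom{10}{4}} = \frac{50}{210} = \frac{5}{21}$$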
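To see the cancellation concretely, the conditional probability can be computed from first principles for several values of the success probability. This is a minimal sketch; `binom_pmf` and `conditional_first_principles` are illustrative names, and the probabilities tried are arbitrary.

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, k, p):
    """P(K = k) for K ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def conditional_first_principles(p):
    """P(X = 3 | X + Y = 4) for independent X, Y ~ B(5, p)."""
    numerator = binom_pmf(5, 3, p) * binom_pmf(5, 1, p)  # P(X = 3, Y = 1)
    denominator = binom_pmf(10, 4, p)                    # P(X + Y = 4)
    return numerator / denominator

results = {p: conditional_first_principles(p)
           for p in (Fraction(1, 4), Fraction(1, 3), Fraction(4, 5))}
print(results)  # every value is Fraction(5, 21)
```

The answer is the same for each choice of success probability, which is what the cancellation predicts.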
Suppose $X \sim B(n,p)$ and $Y \sim B(m,p)$ and they are independent. Then $X+Y$ also has a binomial distribution, i.e., $X+Y \sim B(n+m,p)$. Suppose that both binomial experiments $X$ and $Y$ have been performed and it is known that there are $k$ successes in total. Then $X \mid X+Y=k$ has a hypergeometric distribution.

$$P(X=x \mid X+Y=k) = \frac{\binom{n}{x} \binom{m}{k-x}}{\binom{n+m}{k}}$$

Here $x$ ranges over the values for which $0 \le x \le n$ and $0 \le k-x \le m$.

As discussed earlier, think of the $n$ trials in $X$ as red balls and think of the $m$ trials in $Y$ as blue balls in a jar. Think of the $k$ successes as the number of balls you are about to draw from the jar. So you reach into the jar and select $k$ balls without replacement. The formula above gives the probability that you select $x$ red balls and $k-x$ blue balls.
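The general statement can be sketched in code as well. In the block below, `n` and `m` stand for the numbers of trials in the two binomial experiments and `k` for the known total number of successes; these symbols, the function names, and the example with 3 and 7 trials are all assumptions made for illustration. The loop compares the hypergeometric shortcut with a direct binomial computation.

```python
from fractions import Fraction
from math import comb

def hypergeom_conditional(n, m, k, x):
    """P(X = x | X + Y = k) for independent X ~ B(n, p), Y ~ B(m, p)."""
    if x < 0 or x > n or k - x < 0 or k - x > m:
        return Fraction(0)  # impossible split of the k successes
    return Fraction(comb(n, x) * comb(m, k - x), comb(n + m, k))

def binom_pmf(n, k, p):
    """P(K = k) for K ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Hypothetical example: 3 trials for X, 7 trials for Y, 4 successes in total.
n, m, k = 3, 7, 4
p = Fraction(2, 5)  # arbitrary; it cancels in the ratio below
for x in range(max(0, k - m), min(n, k) + 1):
    direct = binom_pmf(n, x, p) * binom_pmf(m, k - x, p) / binom_pmf(n + m, k, p)
    shortcut = hypergeom_conditional(n, m, k, x)
    print(x, shortcut, direct == shortcut)
```

Calling `hypergeom_conditional(5, 5, 4, 3)` recovers the answer to Example 1, and the function never needs the success probability as an input.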
The probability of success in the two binomial distributions has no bearing on the result since it cancels out in the derivation. One can always work a problem like Example 1 from first principles. Once the thought process using the hypergeometric distribution is understood, it is a great way to solve this kind of problem: you can bypass the binomial distributions and go straight to the hypergeometric distribution.
Practice problems are found in the following blog post.