We motivate the discussion with the following example. The notation $W \sim \text{binom}(n,p)$ denotes the statement that $W$ has a binomial distribution with parameters $n$ and $p$. In other words, $W$ is the number of successes in a sequence of $n$ independent Bernoulli trials where $p$ is the probability of success in each trial.
Example 1
Suppose that a student took two multiple choice quizzes in a course for probability and statistics. Each quiz has 5 questions. Each question has 4 choices and only one of the choices is correct. Suppose that the student answered all the questions by pure guessing. Furthermore, the two quizzes are independent (i.e. results of one quiz will not affect the results of the other quiz). Let $X$ be the number of correct answers in the first quiz and $Y$ be the number of correct answers in the second quiz. Suppose the student was told by the instructor that she had a total of 4 correct answers in these two quizzes. What is the probability that she had 3 correct answers in the first quiz?
On the face of it, the example is all about the binomial distribution. Both $X$ and $Y$ have binomial distributions (both $\text{binom}(5, 0.25)$). The sum $X+Y$ also has a binomial distribution ($\text{binom}(10, 0.25)$). The question being asked is a conditional probability, i.e., $P(X=3 \mid X+Y=4)$. Surprisingly, this conditional probability can be computed using the hypergeometric distribution. One can always work this problem from first principles using binomial distributions. As discussed below, for a problem such as Example 1, it is always possible to replace the binomial distributions with a thought process involving the hypergeometric distribution.
Here’s how to think about the problem. This student took the two quizzes and was given the news by the instructor that she had 4 correct answers in total. She now wonders what the probability of having 3 correct answers in the first quiz is. The thought process is this. The 4 correct answers are 4 questions picked from 10 questions (5 of them are from Quiz 1 and 5 of them are from Quiz 2). So she is picking 4 objects from a group of two distinct types of objects. This is akin to reaching into a jar that has 5 red balls and 5 blue balls and picking 4 balls without replacement. What is the probability of picking 3 red balls and 1 blue ball? The probability just described is from a hypergeometric distribution. The following shows the calculation.

$$P(X=3 \mid X+Y=4) = \frac{\binom{5}{3} \binom{5}{1}}{\binom{10}{4}} = \frac{10 \cdot 5}{210} = \frac{5}{21} \approx 0.2381$$
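As a quick numerical check of the jar argument, the following minimal sketch computes the same conditional probability two ways: from first principles with binomial probabilities, and with the hypergeometric shortcut. The helper name `binom_pmf` is introduced here for illustration only, and the values `n = 5` and `p = 0.25` are taken from Example 1.

```python
from math import comb

# Taken from Example 1: 5 questions per quiz, a pure guess is right with p = 1/4.
n, p = 5, 0.25

def binom_pmf(k, n, p):
    """P(W = k) for W ~ binom(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# First principles: P(X=3, Y=1) / P(X+Y=4), using independence of X and Y.
joint = binom_pmf(3, n, p) * binom_pmf(1, n, p)
total = sum(binom_pmf(x, n, p) * binom_pmf(4 - x, n, p) for x in range(5))
first_principles = joint / total

# Hypergeometric shortcut: 3 of the 5 red balls and 1 of the 5 blue balls in 4 draws.
hypergeom = comb(5, 3) * comb(5, 1) / comb(10, 4)

print(first_principles, hypergeom)
```

Both numbers should agree up to floating-point rounding, at roughly 0.2381.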
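We will show below why this works. Before we do that, let’s describe the above thought process in general terms. Whenever you have two independent binomial distributions $X$ and $Y$ with the same probability of success $p$ (the number of trials does not have to be the same), the conditional distribution $X \mid X+Y=a$ is a hypergeometric distribution. Interestingly, the probability of success $p$ has no bearing on this observation. For Example 1, we have the following calculation.

$$P(X=x \mid X+Y=4) = \frac{\binom{5}{x} \binom{5}{4-x}}{\binom{10}{4}}, \quad x = 0, 1, 2, 3, 4$$

$$P(X=0 \mid X+Y=4) = \frac{5}{210}, \quad P(X=1 \mid X+Y=4) = \frac{50}{210}, \quad P(X=2 \mid X+Y=4) = \frac{100}{210},$$

$$P(X=3 \mid X+Y=4) = \frac{50}{210}, \quad P(X=4 \mid X+Y=4) = \frac{5}{210}$$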
Interestingly, the conditional mean $E(X \mid X+Y=4) = 4 \cdot \frac{5}{10} = 2$, while the unconditional mean $E(X) = 5 \cdot 0.25 = 1.25$. The fact that the conditional mean is higher is not surprising. The student was lucky enough to have obtained 4 correct answers by guessing. Given this, she had a greater chance of doing better on the first quiz.
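The conditional distribution and the two means for Example 1 can be tabulated exactly. This is a small sketch using exact fractions; the variable names (`cond_pmf`, `cond_mean`, `uncond_mean`) are chosen here for illustration.

```python
from fractions import Fraction
from math import comb

# P(X = k | X + Y = 4) for k = 0..4, with 5 questions on each quiz.
cond_pmf = {
    k: Fraction(comb(5, k) * comb(5, 4 - k), comb(10, 4))
    for k in range(5)
}

# Conditional mean: sum of k * P(X = k | X + Y = 4).
cond_mean = sum(k * prob for k, prob in cond_pmf.items())

# Unconditional mean: E[X] = n * p for X ~ binom(5, 1/4).
uncond_mean = 5 * Fraction(1, 4)

for k, prob in cond_pmf.items():
    print(k, prob)
print("conditional mean:", cond_mean)      # 2
print("unconditional mean:", uncond_mean)  # 5/4
```

Working in `Fraction` keeps the probabilities exact, so the check that they sum to 1 does not depend on rounding.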
__________________________________________________
Why This Works
Suppose $X \sim \text{binom}(5,p)$ and $Y \sim \text{binom}(5,p)$ and they are independent. The joint distribution of $X$ and $Y$ has 36 points in the sample space. See the following diagram.
Figure 1

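The probability attached to each point is

$$P(X=x, Y=y) = P(X=x) \, P(Y=y) = \binom{5}{x} p^x (1-p)^{5-x} \, \binom{5}{y} p^y (1-p)^{5-y}$$

where $x = 0, 1, \dots, 5$ and $y = 0, 1, \dots, 5$.
The conditional probability $P(X=x \mid X+Y=4)$ involves 5 points as indicated in the following diagram.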
Figure 2

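The conditional probability is simply the probability of one of the 5 sample points as a fraction of the sum total of the 5 sample points encircled in the above diagram. The following is the sum total of the probabilities of the 5 points indicated in Figure 2.

$$P(X+Y=4) = \sum_{x=0}^{4} \binom{5}{x} p^x (1-p)^{5-x} \, \binom{5}{4-x} p^{4-x} (1-p)^{1+x}$$

We can plug $p = 0.25$ into this sum and work out the calculation. But the sum is actually equivalent to the following because $X+Y \sim \text{binom}(10,p)$.

$$P(X+Y=4) = \binom{10}{4} p^4 (1-p)^6$$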
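As stated earlier, the conditional probability is simply the probability of one of the 5 sample points as a fraction of the sum total of the 5 sample points encircled in Figure 2. Thus we have:

$$P(X=x \mid X+Y=4) = \frac{\binom{5}{x} p^x (1-p)^{5-x} \, \binom{5}{4-x} p^{4-x} (1-p)^{1+x}}{\binom{10}{4} p^4 (1-p)^6}$$

The numerator contains $p^4 (1-p)^6$ in total. With the terms involving $p$ and $1-p$ canceling out, we have:

$$P(X=x \mid X+Y=4) = \frac{\binom{5}{x} \binom{5}{4-x}}{\binom{10}{4}}, \quad x = 0, 1, 2, 3, 4$$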
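The cancellation of $p$ can also be checked numerically. The sketch below computes the conditional probability from the joint binomial probabilities for several values of $p$, using exact fractions, and compares each result with the hypergeometric value. The function names `binom_pmf` and `conditional` and the particular values of $p$ tried are illustrative choices, not part of the original derivation.

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """P(W = k) for W ~ binom(n, p); exact when p is a Fraction."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def conditional(k, p, n=5, total=4):
    """P(X = k | X + Y = total) computed from the joint binomial probabilities."""
    numerator = binom_pmf(k, n, p) * binom_pmf(total - k, n, p)
    denominator = binom_pmf(total, 2 * n, p)  # X + Y ~ binom(2n, p)
    return numerator / denominator

hypergeom = Fraction(comb(5, 3) * comb(5, 1), comb(10, 4))

# Arbitrary success probabilities; the result should not depend on them.
for p in (Fraction(1, 4), Fraction(1, 2), Fraction(9, 10)):
    assert conditional(3, p) == hypergeom
    print(p, conditional(3, p))
```

Every row prints the same value, 5/21, regardless of the success probability used.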
__________________________________________________
Summary
Suppose $X \sim \text{binom}(N,p)$ and $Y \sim \text{binom}(M,p)$ and they are independent. Then $X+Y$ is also a binomial distribution, i.e., $X+Y \sim \text{binom}(N+M,p)$. Suppose that both binomial experiments $X$ and $Y$ have been performed and it is known that there are $a$ successes in total. Then $X \mid X+Y=a$ has a hypergeometric distribution.

$$P(X=x \mid X+Y=a) = \frac{\binom{N}{x} \binom{M}{a-x}}{\binom{N+M}{a}}$$

where $x = \max(0, a-M), \dots, \min(N, a)$.
As discussed earlier, think of the $N$ trials in $X$ as red balls and think of the $M$ trials in $Y$ as blue balls in a jar. Think of the $a$ successes as the number of balls you are about to draw from the jar. So you reach into the jar and select $a$ balls without replacement. The calculation in the formula above gives the probability that you select $x$ red balls and $a-x$ blue balls.
The probability of success in the two binomial distributions has no bearing on the result since it cancels out in the derivation. One can always work a problem like Example 1 from first principles. Once the thought process using the hypergeometric distribution is understood, it is a great way to solve this kind of problem: you can bypass the binomial distributions and go straight to the hypergeometric distribution.
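The general result can be sketched as a small function and sanity-checked by simulation. Everything named below is an assumption made for illustration: the function names `conditional_pmf` and `simulate`, the argument names `N`, `M`, `a` and `x` (numbers of trials, total successes, and successes in the first experiment), and the parameter values, which were picked only to show a case where the two experiments have different numbers of trials.

```python
import random
from math import comb

def conditional_pmf(x, N, M, a):
    """P(X = x | X + Y = a) for independent X ~ binom(N, p), Y ~ binom(M, p)."""
    if x < max(0, a - M) or x > min(N, a):
        return 0.0
    return comb(N, x) * comb(M, a - x) / comb(N + M, a)

def simulate(x, N, M, a, p, trials=100_000, seed=0):
    """Monte Carlo estimate of the same conditional probability."""
    rng = random.Random(seed)
    hits = kept = 0
    for _ in range(trials):
        X = sum(rng.random() < p for _ in range(N))
        Y = sum(rng.random() < p for _ in range(M))
        if X + Y == a:          # keep only runs with a successes in total
            kept += 1
            hits += (X == x)
    return hits / kept

# Hypothetical parameters with unequal numbers of trials.
N, M, a, p = 6, 4, 5, 0.3
print(conditional_pmf(3, N, M, a))  # exact hypergeometric value
print(simulate(3, N, M, a, p))      # simulated estimate
```

The simulated estimate should land close to the exact value, and changing `p` in the simulation should not move it beyond sampling noise.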
__________________________________________________
Additional Practice
Practice problems are found in the following blog post.