We present an example of the hypergeometric distribution seen through an independent sum of two binomial distributions. Suppose a student takes two independent multiple choice quizzes (i.e. performance on one quiz has no bearing on the other quiz). Quiz 1 has 5 problems, each with 4 choices. Quiz 2 also has 5 problems with 4 choices each. Suppose the student answers every question on both quizzes by pure guessing. If the student has a total of four correct answers for the two quizzes combined, what is the probability that he passes quiz 1 (at least 60% correct, i.e. at least 3 of the 5 problems)?
Suppose that $X$ is the number of correct answers in quiz 1 and $Y$ is the number of correct answers in quiz 2. Then both $X$ and $Y$ have a binomial distribution with $n=5$ and $p=0.25$. The independent sum $X+Y$ is the total number of correct answers and has a binomial distribution with $n=10$ and $p=0.25$. The problem we need to solve is $P(X \ge 3 \mid X+Y=4)$.
We propose that the conditional distribution of $X$ given $X+Y=4$ is a hypergeometric distribution. To see this intuitively, picture a bowl with five green balls (each standing for a possible correct answer in quiz 1) and five yellow balls (each standing for a possible correct answer in quiz 2). Taking these two quizzes and getting a total of four correct answers is like drawing 4 balls out of this bowl without replacement. Then what is the probability that at least three of the four balls are green? This probability is obtained from the hypergeometric distribution (for example, one of the outcomes counted is drawing 4 balls and getting 3 green balls and 1 yellow ball). Though not a proof, this is a good intuitive description of the approach we can take. We first do the calculation and present the proof at the end.
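The bowl picture can be checked by brute force. The following Python sketch enumerates every way of drawing 4 balls out of the 10 and tallies the draws by the number of green balls. The ball labels and variable names are illustrative choices, not part of the discussion above.

```python
from collections import Counter
from itertools import combinations

# Five green balls (quiz 1) and five yellow balls (quiz 2); labels are illustrative.
balls = ["G"] * 5 + ["Y"] * 5

# Enumerate every unordered draw of 4 balls by position,
# so that balls of the same color are still treated as distinct objects.
draws = list(combinations(range(len(balls)), 4))

# Tally the draws by how many green balls each one contains.
green_counts = Counter(sum(balls[i] == "G" for i in draw) for draw in draws)

total_draws = len(draws)
at_least_three_green = green_counts[3] + green_counts[4]
print(total_draws, dict(sorted(green_counts.items())), at_least_three_green)
```

Dividing `at_least_three_green` by `total_draws` gives the probability of at least three green balls, under the assumption that every draw of 4 balls is equally likely.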
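We now evaluate the probability function $P(X=k \mid X+Y=4)$ for $k=0,1,2,3,4$. For example, $P(X=1 \mid X+Y=4)$ is the probability of drawing 4 balls out of the bowl and getting 1 green ball and 3 yellow balls. In general,

$$P(X=k \mid X+Y=4)=\frac{\binom{5}{k}\binom{5}{4-k}}{\binom{10}{4}}$$

which gives

$$P(X=0 \mid X+Y=4)=\frac{5}{210}, \quad P(X=1 \mid X+Y=4)=\frac{50}{210}, \quad P(X=2 \mid X+Y=4)=\frac{100}{210},$$

$$P(X=3 \mid X+Y=4)=\frac{50}{210}, \quad P(X=4 \mid X+Y=4)=\frac{5}{210}.$$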
Thus, $P(X \ge 3 \mid X+Y=4)=\frac{50}{210}+\frac{5}{210}=\frac{55}{210} \approx 0.2619$. Note that the unconditional probability is $P(X \ge 3) \approx 0.1035$, using the binomial distribution with $n=5$ and $p=0.25$. It is not surprising that the conditional probability is much greater: we are given that the student was lucky enough to have four correct guesses in total.
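The two probabilities can be compared directly from the binomial distributions, without using the ball-drawing shortcut. The sketch below uses exact fractions so that no rounding is involved; the helper `binom_pmf` and the variable names are illustrative.

```python
from fractions import Fraction
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


p = Fraction(1, 4)  # probability of guessing one 4-choice problem correctly

# Joint probability of k correct in quiz 1 and 4 - k correct in quiz 2.
joint = {k: binom_pmf(k, 5, p) * binom_pmf(4 - k, 5, p) for k in range(5)}

# Probability of 4 correct answers in total (binomial with n = 10).
total = binom_pmf(4, 10, p)

# Conditional distribution of the quiz 1 score given 4 correct in total.
conditional = {k: joint[k] / total for k in range(5)}

conditional_pass = conditional[3] + conditional[4]
unconditional_pass = sum(binom_pmf(k, 5, p) for k in range(3, 6))

print(conditional_pass, float(conditional_pass))
print(unconditional_pass, float(unconditional_pass))
```

Working with `Fraction` rather than floats keeps the conditional probabilities as exact ratios, which makes it easy to see that they match the counts of ball draws.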
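We now discuss the general fact. Suppose $X \sim \text{binom}(N,p)$ and $Y \sim \text{binom}(M,p)$ are independent, and let $Z=X+Y$ be the independent sum, so that $Z \sim \text{binom}(N+M,p)$. We show that the conditional distribution of $X$ given $Z=n$ is a hypergeometric distribution. By independence,

$$P(X=k \mid Z=n)=\frac{P(X=k)\,P(Y=n-k)}{P(Z=n)}=\frac{\binom{N}{k}p^{k}(1-p)^{N-k}\,\binom{M}{n-k}p^{n-k}(1-p)^{M-n+k}}{\binom{N+M}{n}p^{n}(1-p)^{N+M-n}}$$

After canceling out the terms for $p$ and $1-p$, the following is the probability function for the hypergeometric distribution:

$$P(X=k \mid Z=n)=\frac{\binom{N}{k}\binom{M}{n-k}}{\binom{N+M}{n}}, \qquad \max(0,\,n-M) \le k \le \min(N,\,n)$$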
The above probability distribution describes the situation where there are $N+M$ similar objects, with $N$ objects belonging to one class (say green balls) and $M$ objects belonging to another class (say yellow balls). We choose $n$ balls out of the $N+M$ balls without replacement. The above probability is the probability of getting $k$ green balls and $n-k$ yellow balls. There are $\binom{N}{k}$ ways of choosing $k$ green balls out of $N$ green balls. Likewise, there are $\binom{M}{n-k}$ ways of choosing $n-k$ yellow balls out of $M$ yellow balls. The total number of ways the joint operation can take place is $\binom{N}{k}\binom{M}{n-k}$. Of course, we assume that each of the $\binom{N+M}{n}$ ways of selecting $n$ balls out of $N+M$ balls is equally likely.
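As a numerical check of the general fact, the sketch below compares the conditional probability computed from the binomial distributions with the ball-counting formula, for several values of the success probability. The function and parameter names (`N` green balls, `M` yellow balls, `n` balls drawn, `k` green balls drawn) are illustrative choices, and the sizes used are simply those of the quiz example.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials with success probability p."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def hypergeom_pmf(k, N, M, n):
    """Probability of k green balls when n balls are drawn without
    replacement from N green balls and M yellow balls."""
    if k < 0 or k > N or n - k < 0 or n - k > M:
        return 0.0
    return comb(N, k) * comb(M, n - k) / comb(N + M, n)


def conditional_binom_pmf(k, N, M, n, p):
    """Conditional probability of k successes in the first group given n
    successes overall, for two independent binomials sharing the same p."""
    return binom_pmf(k, N, p) * binom_pmf(n - k, M, p) / binom_pmf(n, N + M, p)


# Largest gap between the two formulas over several values of p.
N, M, n = 5, 5, 4
max_gap = max(
    abs(conditional_binom_pmf(k, N, M, n, p) - hypergeom_pmf(k, N, M, n))
    for p in (0.1, 0.25, 0.5, 0.9)
    for k in range(n + 1)
)
print(max_gap)
```

The gap should be zero up to floating-point rounding for every common value of the success probability, which is the cancellation seen in the derivation.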
In the conditional probability in our example, the probability of success (0.25) in the individual Bernoulli trials that make up the two binomial distributions is not used. This is because the terms for $p$ and $1-p$ cancel out. If the two multiple choice quizzes have different probabilities of success, the cancellation no longer happens and the resulting conditional distribution is no longer hypergeometric. In that case, the conditional probability must be obtained from first principles.
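To illustrate the last point, here is a minimal Python sketch of the first-principles calculation when the two success probabilities differ. The scenario is hypothetical and not part of the example above: quiz 2 is assumed to have true/false problems, so its success probability under guessing is 0.5 instead of 0.25. The function names are also illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials with success probability p."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def conditional_pmf(k, n1, p1, n2, p2, total):
    """Conditional probability of k correct in quiz 1 given `total` correct
    overall, computed directly from the definition of conditional probability."""
    numerator = binom_pmf(k, n1, p1) * binom_pmf(total - k, n2, p2)
    # The sum is no longer binomial when p1 != p2, so sum over all splits.
    denominator = sum(
        binom_pmf(j, n1, p1) * binom_pmf(total - j, n2, p2)
        for j in range(total + 1)
    )
    return numerator / denominator


# Hypothetical setup: quiz 1 has 4-choice problems, quiz 2 has true/false problems.
unequal_pass = sum(conditional_pmf(k, 5, 0.25, 5, 0.5, 4) for k in (3, 4))

# Same calculation with equal success probabilities, for comparison.
equal_pass = sum(conditional_pmf(k, 5, 0.25, 5, 0.25, 4) for k in (3, 4))

print(unequal_pass, equal_pass)
```

With equal success probabilities the function reproduces the hypergeometric answer, while in the assumed unequal case the result depends on both probabilities and comes out much smaller, since correct answers are then more likely to have come from quiz 2.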