We started the discussion on the matching problem in a previous post (The matching problem). We would like to add a little bit more information. The matching problem has many colorful descriptions. Here are some examples:
- In a dance class with married couples, suppose that the dance instructor assigns partners randomly. We have a match if a married couple is paired as dancing partners.
- There are letters with matching envelops. Suppose that the letters are randomly stuffed into the envelops. We have a match if a letter is placed into the correct envelop.
In this post we use the generic example of a deck of shuffled cards. Suppose a deck of cards is numbered through . After the deck is thoroughly shuffled, if the card numbered is the card in the shuffled deck, we say that it is a match. It is clear that the above two examples and the shuffled deck example are equivalent mathematically.
How many matches will there be in a shuffled deck? What is the probability that there will be at least one match? What is the probability that there will be exactly matches where ? On average, how many matches will there be? The number of matches is a random variable. We comment on this random variable resulting from the matching problem. We also derive the probability density function and its moments. We also comment the relation this random variable with the Poisson distribution.
Consider the set . The matching problem corresponds to the random selection of all the points in without replacement. The random selection of points of the entire set results in ordered samples where each and for . These ordered samples are precisely the permutations of the set and there are many such permutations.
A match occurs when some element is selected in the pick. In the card example, the card is a match when the card that is numbered is in the position in the shuffled deck (). The matching problem is a counting problem. We demonstrate how to count the permutations that result in no matches, at least one match and exactly matches using the inclusion-exclusion principle.
For each , we define the following indicator variable :
Furthermore, let . The random variable is the number of matches when a deck of cards (numbered through ) are shuffled. We derive the probability function of as well as its first and second moments.
The Inclusion-Exclusion Principle
For any events , the following is the cardinality of the event :
where the counts are defined by the following:
In the inclusion-exclusion formula, we sum the cardinalities of the sets, subtract the cardinalities of intersections of pairs, add the cardinalities of the intersection of triples and so on. When , we have the familiar formula:
The Probability Density Function
For each in , let be the event that the card is a match. The event is the event that there is at least one match in the deck of shuffled cards. We use the inclusion-exclusion principle to derive the count for this event. The union and its complement “no matches in the cards” will become the basis for deriving the density function of .
For the event , the card is fixed and the other cards are free to permute. Thus . As a result, .
For the event , the card and the card are fixed and the other cards are free to permute. Thus we have and
In the general case, we have
Applying the inclusion-exclusion principle, we have:
Out of the permutations, of them have no matches. Thus the probability of having no matches in a shuffled deck of cards is:
To derive , we consider the event that of the cards is a match and the other cards are not matches. Using to obtain the count for the event that cards are not matches, we have:
For the general case, we have:
We point out that . It is impossible to have exactly matches while the remaining card is not a match. This can also be verified by looking at the probability density function. On the other hand, . There is only one permutation out of many where all the cards are matches.
We compare the probability density functions , and the Poisson desnity function with parameter . The following matrix shows the results (rounded to eight decimal places).
There is remarkable agreement among the density functions arising from the match problems with the Poisson distribution. The density function converges to the Poisson distribution with parameter and the convergence occurs rapidly.
The probability of having exactly matches in above converges to , which is the Poisson distribution with parameter .
The probability of no matches among the cards in above is the Taylor expansion of . Thus the probability of no matches in a shuffled deck of cards when is large is about and the probability of at least one match is approximately . As the above example shows, the convergence occurs fairly rapidly.
Since converges to the Poisson distribution with parameter , we might guess that the mean and variance of approach . We now show that and regardless of .
Recall that where for each , is an indicator variable:
We claim that for all , and for all , . To see this, the event is the event defined in the derivation of the probability density function of . We have . Thus the probability that the card is a match is:
The event is the event defined in the derivation of the probability density function of . We have . Thus the probability that both the card and the card are matches is:
It follows that for all , has the Bernoulli distribution with probability of success .
Knowing that is Bernoulli, we know its mean and variance. For , we have:
It is now easy to see that . To find , we need to find the covariance . Note that the indicator variables are not independent. In evaluating , we need to consider four possibilities: , , , . The only relevant case is the last one. Thus we have:
The following derives the variance:
- Feller, W., An Introduction to Probability Theory and its Applications, Vol. I, 3rd ed., John Wiley & Sons, Inc., New York, 1968