More about the matching problem

We started the discussion of the matching problem in a previous post (The matching problem). Here we add a little more information. The matching problem has many colorful descriptions. Here are some examples:

  • In a dance class with n married couples, suppose that the dance instructor assigns partners randomly. We have a match if a married couple is paired as dancing partners.
  • There are n letters with n matching envelopes. Suppose that the letters are randomly stuffed into the envelopes. We have a match if a letter is placed into the correct envelope.

In this post we use the generic example of a deck of shuffled cards. Suppose a deck of n cards is numbered 1 through n. After the deck is thoroughly shuffled, if the card numbered i is the i^{th} card in the shuffled deck, we say that it is a match. It is clear that the above two examples and the shuffled deck example are equivalent mathematically.
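As an aside, the shuffled-deck description is easy to simulate. The following sketch (in Python; the function name count_matches and the 100,000-trial count are our own choices for illustration) shuffles a deck of n cards repeatedly and records the number of matches.

```python
import random
from collections import Counter

def count_matches(n):
    """Shuffle a deck numbered 1..n and count the cards that land in their own position."""
    deck = list(range(1, n + 1))
    random.shuffle(deck)
    return sum(1 for position, card in enumerate(deck, start=1) if card == position)

# Estimate the distribution of the number of matches for a 52-card deck.
trials = 100_000
frequency = Counter(count_matches(52) for _ in range(trials))
for k in sorted(frequency):
    print(k, frequency[k] / trials)
```

For a 52-card deck the empirical frequencies should come out close to 0.368, 0.368, 0.184 and so on, foreshadowing the Poisson connection discussed later in the post.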

How many matches will there be in a shuffled deck? What is the probability that there will be at least one match? What is the probability that there will be exactly k matches where k=0,1, \cdots, n? On average, how many matches will there be? The number of matches is a random variable. In this post we derive its probability density function and its first two moments, and we comment on the relation between this random variable and the Poisson distribution.

Consider the set S=\lbrace{1,2, \cdots, n}\rbrace. The matching problem corresponds to the random selection of all the points in S without replacement. The random selection of points of the entire set S results in ordered samples (Y_1,Y_2, \cdots, Y_n) where each Y_i \in S and Y_i \neq Y_j for i \neq j. These ordered samples are precisely the permutations of the set S and there are n! many such permutations.

A match occurs when some element i \in S is selected in the i^{th} pick. In the card example, the i^{th} card is a match when the card that is numbered i is in the i^{th} position in the shuffled deck (Y_i=i). The matching problem is a counting problem. We demonstrate how to count the n! permutations that result in no matches, at least one match and exactly k matches using the inclusion-exclusion principle.
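Before turning to the inclusion-exclusion principle, note that for a small n the counting can be done by brute force. The following sketch (our own illustration, with n=5) enumerates all n! permutations and tallies the number of matches in each; the counts it produces can be compared with the formulas derived below.

```python
from itertools import permutations
from collections import Counter

n = 5  # small enough to enumerate all n! = 120 permutations

tally = Counter(
    sum(1 for i, y in enumerate(p, start=1) if y == i)  # number of matches in this permutation
    for p in permutations(range(1, n + 1))
)

for k in range(n + 1):
    print(k, tally[k])      # permutations with exactly k matches
print(sum(tally.values()))  # 120 = n!
```

For n=5 the tally is 44, 45, 20, 10, 0 and 1 permutations with exactly 0 through 5 matches respectively, which agrees with the density function derived below.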

For each i \in S, we define the following indicator variable I_i:

\displaystyle I_i=\left\{\begin{matrix}1&\thinspace Y_i=i \ \ \ \text{(} i^{th} \text{ card is a match)}\\{0}&\thinspace Y_i \neq i \ \ \ \text{(} i^{th} \text{ card is not a match)}\end{matrix}\right.

Furthermore, let X_n=I_1+I_2+ \cdots +I_n. The random variable X_n is the number of matches when a deck of n cards (numbered 1 through n) is shuffled. We derive the probability function of X_n as well as its first and second moments.

The Inclusion-Exclusion Principle
For any events A_1,A_2, \cdots, A_n, the following is the cardinality of the event A_1 \cup A_2 \cup \cdots \cup A_n:

\displaystyle \lvert A_1 \cup A_2 \cup \cdots \cup A_n \lvert=\sum \limits_{m=1}^n (-1)^{m+1} S_m where the counts S_m are defined by the following:

\displaystyle S_1=\sum \limits_{i=1}^n \lvert A_i \lvert

\displaystyle S_2=\sum \limits_{i<j} \lvert A_i \cap A_j\lvert

\displaystyle S_m=\sum \limits_{i(1)< \cdots <i(m)} \lvert A_{i(1)} \cap \cdots \cap A_{i(m)}\lvert

Remark
In the inclusion-exclusion formula, we sum the cardinalities of the sets, subtract the cardinalities of intersections of pairs, add the cardinalities of the intersection of triples and so on. When n=2, we have the familiar formula:

\displaystyle \lvert A_1 \cup A_2 \lvert=\lvert A_1 \lvert +\lvert A_2 \lvert-\lvert A_1 \cap A_2\lvert
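As a quick numerical sanity check (with three small sets of our own choosing, not tied to the matching problem), the following sketch compares the size of a union computed directly with the alternating sum of the S_m terms.

```python
from itertools import combinations

# Three small concrete sets (our own example).
A = [{1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6, 7}]
n = len(A)

# Direct size of the union.
union_size = len(set().union(*A))

# Alternating sum of the S_m terms from the inclusion-exclusion formula.
incl_excl = sum(
    (-1) ** (m + 1)
    * sum(len(set.intersection(*combo)) for combo in combinations(A, m))
    for m in range(1, n + 1)
)

print(union_size, incl_excl)  # both print 7
```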

The Probability Density Function
For each i in S=\lbrace{1,2, \cdots, n}\rbrace, let A_i be the event that the i^{th} card is a match. The event A_1 \cup \cdots \cup A_n is the event that there is at least one match in the deck of n shuffled cards. We use the inclusion-exclusion principle to derive the count for this event. The union and its complement “no matches in the cards” will become the basis for deriving the density function of X_n.

For the event A_i, the i^{th} card is fixed and the other n-1 cards are free to permute. Thus \vert A_i \vert=(n-1)!. As a result, \displaystyle S_1=\binom{n}{1} (n-1)!=n!.

For the event A_i \cap A_j, the i^{th} card and the j^{th} card are fixed and the other n-2 cards are free to permute. Thus we have \vert A_i \cap A_j \vert=(n-2)! and \displaystyle S_2=\binom{n}{2} (n-2)!=\frac{n!}{2!}

In the general case, we have \displaystyle S_i=\binom{n}{i} (n-i)!=\frac{n!}{i!}

Applying the inclusion-exclusion principle, we have:

\displaystyle \lvert A_1 \cup A_2 \cup \cdots \cup A_n \lvert=\sum \limits_{i=1}^n (-1)^{i+1} S_i

\displaystyle \begin{aligned}\lvert (A_1 \cup A_2 \cup \cdots \cup A_n)^c \lvert&=n!-\sum \limits_{i=1}^n (-1)^{i+1} S_i\\&=n!-\sum \limits_{i=1}^n (-1)^{i+1} \frac{n!}{i!}\\&=n!+\sum \limits_{i=1}^n (-1)^{i} \frac{n!}{i!}\\&=\sum \limits_{i=0}^n (-1)^{i} \frac{n!}{i!} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)\end{aligned}

Out of the n! permutations, \displaystyle \lvert (A_1 \cup A_2 \cup \cdots \cup A_n)^c \lvert of them have no matches. Thus the probability of having no matches in a shuffled deck of n cards is:

\displaystyle \begin{aligned}P(\text{no matches in n cards})&=\displaystyle \biggl(\sum \limits_{i=0}^n (-1)^{i} \frac{n!}{i!}\biggr) \frac{1}{n!}\\&=\sum \limits_{i=0}^n \frac{(-1)^{i}}{i!}\\&=P(X_n=0)\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)\end{aligned}
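Formulas (1) and (2) can be verified by brute force for small n. In the sketch below (our own; the helper names no_match_count and no_match_brute are ours), the first column of output is the count from (1), the second is a direct count over all permutations, and the third is the probability from (2).

```python
from itertools import permutations
from math import factorial

def no_match_count(n):
    # Formula (1): number of permutations of 1..n with no matches.
    return sum((-1) ** i * factorial(n) // factorial(i) for i in range(n + 1))

def no_match_brute(n):
    # Direct count over all n! permutations.
    return sum(
        all(y != i for i, y in enumerate(p, start=1))
        for p in permutations(range(1, n + 1))
    )

for n in range(1, 8):
    print(n, no_match_count(n), no_match_brute(n), no_match_count(n) / factorial(n))
```

Both counts agree: 0, 1, 2, 9, 44, 265 and 1854 for n=1, \cdots, 7, and the probabilities in the last column quickly settle near 0.3679.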

To derive P(X_n=1), we consider the event that 1 of the cards is a match and the other n-1 cards are not matches. Using (1) to obtain the count for the event that n-1 cards are not matches, we have:

\displaystyle \begin{aligned}P(X_n=1)&=\displaystyle \binom{n}{1}\displaystyle \biggl(\sum \limits_{i=0}^{n-1} (-1)^{i} \frac{(n-1)!}{i!}\biggr) \frac{1}{n!}\\&=\sum \limits_{i=0}^{n-1} \frac{(-1)^{i}}{i!}\end{aligned}

For the general case, we have:

\displaystyle \begin{aligned}P(X_n=k)&=\displaystyle \binom{n}{k}\displaystyle \biggl(\sum \limits_{i=0}^{n-k} (-1)^{i} \frac{(n-k)!}{i!}\biggr) \frac{1}{n!}\\&=\frac{1}{k!}\sum \limits_{i=0}^{n-k} \frac{(-1)^{i}}{i!}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)\end{aligned}

We point out that P(X_n=n-1)=0. It is impossible to have exactly n-1 matches while the remaining card is not a match. This can also be verified from the density function: setting k=n-1 in (3) gives \displaystyle \frac{1}{(n-1)!} \sum \limits_{i=0}^{1} \frac{(-1)^{i}}{i!}=0. On the other hand, \displaystyle P(X_n=n)=\frac{1}{n!}. There is only one permutation out of n! many where all the cards are matches.
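The density function (3) and the two observations just made can be checked numerically. The sketch below (our own; the helper name pdf_matches is ours) evaluates (3) for n=10 and confirms that the probabilities sum to 1, that P(X_n=n-1)=0, and that P(X_n=n)=1/n!.

```python
from math import factorial

def pdf_matches(n, k):
    # Formula (3): P(X_n = k).
    return (1 / factorial(k)) * sum((-1) ** i / factorial(i) for i in range(n - k + 1))

n = 10
probs = [pdf_matches(n, k) for k in range(n + 1)]

print(sum(probs))                    # 1 (up to floating point)
print(probs[n - 1])                  # 0.0: exactly n-1 matches is impossible
print(probs[n], 1 / factorial(n))    # both equal 1/n!
```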

Poisson Distribution
We compare the probability density functions P(X_{10}=k), P(X_{20}=k), P(X_{150}=k) and the Poisson density function with parameter \lambda=1. The following matrix shows the results (rounded to eight decimal places).

\displaystyle \begin{pmatrix} \text{k}&P(X_{10}=k)&P(X_{20}=k)&P(X_{150}=k)&\displaystyle \frac{e^{-1}}{k!} \\{\text{ }}&\text{ }&\text{ } \\{0}&0.36787946&0.36787944&0.36787944&0.36787944 \\{1}&0.36787919&0.36787944&0.36787944&0.36787944 \\{2}&0.18394097&0.18393972&0.18393972&0.18393972 \\{3}&0.06130952&0.06131324&0.06131324&0.06131324 \\{4}&0.01533565&0.01532831&0.01532831&0.01532831 \\{5}&0.00305556&0.00306566&0.00306566&0.00306566 \\{6}&0.00052083&0.00051094&0.00051094&0.00051094 \\{7}&0.00006614&0.00007299&0.00007299&0.00007299 \\{8}&0.00001240&0.00000912&0.00000912&0.00000912 \\{9}&0&0.00000101&0.00000101&0.00000101 \\{10}&0.00000028&0.00000010&0.00000010&0.00000010 \\{\text{ }}&\text{ }&\text{ } \\{\text{Total}}&1.000000&0.99999997&0.99999997&0.99999997\end{pmatrix}
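The entries in the matrix can be reproduced with a few lines of code (a sketch of our own, reusing formula (3) and rounding to eight decimal places):

```python
from math import exp, factorial

def pdf_matches(n, k):
    # Formula (3): P(X_n = k).
    return (1 / factorial(k)) * sum((-1) ** i / factorial(i) for i in range(n - k + 1))

# One row per k: P(X_10 = k), P(X_20 = k), P(X_150 = k), and the Poisson(1) probability.
for k in range(11):
    row = [pdf_matches(n, k) for n in (10, 20, 150)] + [exp(-1) / factorial(k)]
    print(k, [round(x, 8) for x in row])
```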

There is remarkable agreement between the density functions arising from the matching problem and the Poisson density function. The density function P(X_n=k) converges to the Poisson distribution with parameter \lambda=1, and the convergence occurs rapidly.

The probability of having exactly k matches in (3) above converges to \displaystyle \frac{e^{-1}}{k!}, which is the probability that a Poisson random variable with parameter \lambda=1 takes the value k.

The probability of no matches among the n cards in (2) above is a partial sum of the Taylor series expansion of e^{-1}. Thus when n is large, the probability of no matches in a shuffled deck of n cards is about 0.3678794412 and the probability of at least one match is approximately 0.6321205588. As the above comparison shows, the convergence occurs fairly rapidly.

The Moments
Since P(X_n=k) converges to the Poisson distribution with parameter \lambda=1, we might guess that the mean and variance of X_n approach \lambda=1. We now show that E(X_n)=1 for all n and that Var(X_n)=1 for every n \ge 2 (when n=1 there is always exactly one match, so the variance is 0).

Recall that X_n=I_1+I_2+ \cdots +I_n where for each j=1,2, \cdots, n, I_j is an indicator variable:

\displaystyle I_j=\left\{\begin{matrix}1&\thinspace Y_j=j \ \ \ \text{(} j^{th} \text{ card is a match)}\\{0}&\thinspace Y_j \neq j \ \ \ \text{(} j^{th} \text{ card is not a match)}\end{matrix}\right.

We claim that for all j=1,2, \cdots, n, \displaystyle P(I_j=1)=\frac{1}{n} and for all i<j, \displaystyle P(I_i=1,I_j=1)=\frac{1}{n(n-1)}. To see this, the event (I_j=1) is the event A_j defined in the derivation of the probability density function of X_n. We have \lvert A_j \vert=(n-1)!. Thus the probability that the j^{th} card is a match is:

\displaystyle P(I_j=1)=\frac{(n-1)!}{n!}=\frac{1}{n}

The event (I_i=1, I_j=1) is the event A_i \cap A_j defined in the derivation of the probability density function of X_n. We have \lvert A_i \cap A_j \vert=(n-2)!. Thus the probability that both the i^{th} card and the j^{th} card are matches is:

\displaystyle P(I_i=1,I_j=1)=\frac{(n-2)!}{n!}=\frac{1}{n(n-1)}

It follows that for all j=1,2, \cdots, n, I_j has the Bernoulli distribution with probability of success \displaystyle P(I_j=1)=\frac{1}{n}.
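Both probabilities are easy to confirm by brute force for a small deck. The sketch below (our own, with n=5) checks P(I_1=1) and P(I_1=1,I_2=1) against 1/n and 1/(n(n-1)).

```python
from itertools import permutations

n = 5
perms = list(permutations(range(1, n + 1)))
total = len(perms)  # n! = 120

# P(I_1 = 1): the first card is a match.
p_single = sum(p[0] == 1 for p in perms) / total
# P(I_1 = 1, I_2 = 1): the first two cards are both matches.
p_pair = sum(p[0] == 1 and p[1] == 2 for p in perms) / total

print(p_single, 1 / n)            # both 0.2
print(p_pair, 1 / (n * (n - 1)))  # both 0.05
```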

Knowing that I_j is Bernoulli, we know its mean and variance. For j=1,2, \cdots, n, we have:

\displaystyle E(I_j)=\frac{1}{n}

\displaystyle Var(I_j)=\frac{1}{n} \biggl(1-\frac{1}{n}\biggr)=\frac{n-1}{n^2}

It is now easy to see that E(X_n)=1. To find Var(X_n), we need the covariance Cov(I_i,I_j); note that the indicator variables I_j are not independent. In evaluating E(I_i I_j), there are four possibilities: (I_i=0,I_j=0), (I_i=0,I_j=1), (I_i=1,I_j=0), (I_i=1,I_j=1). Since the product I_i I_j is zero except in the last case, only (I_i=1,I_j=1) contributes. Thus we have:

\displaystyle \begin{aligned}Cov(I_i,I_j)&=E(I_i I_j)-E(I_i) E(I_j)\\&=\biggl(\sum \limits_{x,y=0,1} x \ y P(I_i=x,I_j=y)\biggr)-E(I_i) E(I_j)\\&=P(I_i=1,I_j=1)-E(I_i) E(I_j)\\&=\frac{1}{n(n-1)}-\frac{1}{n^2}\\&=\frac{1}{n^2 (n-1)} \end{aligned}

The following derives the variance:

\displaystyle \begin{aligned}Var(X_n)&=\sum \limits_{i=1}^n Var(I_i)+2 \sum \limits_{i < j} Cov(I_i,I_j)\\&=n \ \frac{n-1}{n^2}+2 \binom{n}{2} \frac{1}{n^2 (n-1)}\\&=\frac{n-1}{n}+\frac{1}{n}\\&=1 \end{aligned}
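As a final check (our own illustration), the mean and variance of X_n can be computed exactly from the permutation distribution for small n; both come out to 1 for every n \ge 2, in agreement with the derivation above.

```python
from itertools import permutations

def exact_moments(n):
    """Exact mean and variance of the number of matches, over all n! permutations."""
    matches = [
        sum(1 for i, y in enumerate(p, start=1) if y == i)
        for p in permutations(range(1, n + 1))
    ]
    mean = sum(matches) / len(matches)
    variance = sum((m - mean) ** 2 for m in matches) / len(matches)
    return mean, variance

for n in range(2, 8):
    print(n, exact_moments(n))  # mean 1.0 and variance 1.0 for every n >= 2
```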
