# More about the matching problem

We started the discussion on the matching problem in a previous post (The matching problem). We would like to add a little bit more information. The matching problem has many colorful descriptions. Here are some examples:

• In a dance class with $n$ married couples, suppose that the dance instructor assigns partners randomly. We have a match if a married couple is paired as dancing partners.
• There are $n$ letters with $n$ matching envelops. Suppose that the letters are randomly stuffed into the envelops. We have a match if a letter is placed into the correct envelop.

In this post we use the generic example of a deck of shuffled cards. Suppose a deck of $n$ cards is numbered $1$ through $n$. After the deck is thoroughly shuffled, if the card numbered $i$ is the $i^{th}$ card in the shuffled deck, we say that it is a match. It is clear that the above two examples and the shuffled deck example are equivalent mathematically.

How many matches will there be in a shuffled deck? What is the probability that there will be at least one match? What is the probability that there will be exactly $k$ matches where $k=0,1, \cdots, n$? On average, how many matches will there be? The number of matches is a random variable. We comment on this random variable resulting from the matching problem. We also derive the probability density function and its moments. We also comment the relation this random variable with the Poisson distribution.

Consider the set $S=\lbrace{1,2, \cdots, n}\rbrace$. The matching problem corresponds to the random selection of all the points in $S$ without replacement. The random selection of points of the entire set $S$ results in ordered samples $(Y_1,Y_2, \cdots, Y_n)$ where each $Y_i \in S$ and $Y_i \neq Y_j$ for $i \neq j$. These ordered samples are precisely the permutations of the set $S$ and there are $n!$ many such permutations.

A match occurs when some element $i \in S$ is selected in the $i^{th}$ pick. In the card example, the $i^{th}$ card is a match when the card that is numbered $i$ is in the $i^{th}$ position in the shuffled deck ($Y_i=i$). The matching problem is a counting problem. We demonstrate how to count the $n!$ permutations that result in no matches, at least one match and exactly $k$ matches using the inclusion-exclusion principle.

For each $i \in S$, we define the following indicator variable $I_i$:

$\displaystyle I_i=\left\{\begin{matrix}1&\thinspace Y_i=i \ \ \ \text{(} i^{th} \text{card is a match)}\\{0}&\thinspace Y_i \neq i \ \ \ \text{(} i^{th} \text{card is not a match)}\end{matrix}\right.$

Furthermore, let $X_n=I_1+I_2+ \cdots +I_n$. The random variable $X_n$ is the number of matches when a deck of $n$ cards (numbered $1$ through $n$) are shuffled. We derive the probability function of $X_n$ as well as its first and second moments.

The Inclusion-Exclusion Principle
For any events $A_1,A_2, \cdots, A_n$, the following is the cardinality of the event $A_1 \cup A_2 \cup \cdots \cup A_n$:

$\displaystyle \lvert A_1 \cup A_2 \cup \cdots \cup A_n \lvert=\sum \limits_{m=0}^n (-1)^{m+1} S_m$ where the counts $S_m$ are defined by the following:

$\displaystyle S_1=\sum \limits_{i=1}^n \lvert A_i \lvert$

$\displaystyle S_2=\sum \limits_{i

$\displaystyle S_m=\sum \limits_{i(1)< \cdots

Remark
In the inclusion-exclusion formula, we sum the cardinalities of the sets, subtract the cardinalities of intersections of pairs, add the cardinalities of the intersection of triples and so on. When $n=2$, we have the familiar formula:

$\displaystyle \lvert A_1 \cup A_2 \lvert=\lvert A_1 \lvert +\lvert A_2 \lvert-\lvert A_1 \cap A_2\lvert$

The Probability Density Function
For each $i$ in $S=\lbrace{1,2, \cdots, n}\rbrace$, let $A_i$ be the event that the $i^{th}$ card is a match. The event $A_1 \cup \cdots \cup A_n$ is the event that there is at least one match in the deck of $n$ shuffled cards. We use the inclusion-exclusion principle to derive the count for this event. The union and its complement “no matches in the cards” will become the basis for deriving the density function of $X_n$.

For the event $A_i$, the $i^{th}$ card is fixed and the other $n-1$ cards are free to permute. Thus $\vert A_i \vert=(n-1)!$. As a result, $\displaystyle S_1=\binom{n}{1} (n-1)!=n!$.

For the event $A_i \cap A_j$, the $i^{th}$ card and the $j^{th}$ card are fixed and the other $n-2$ cards are free to permute. Thus we have $\vert A_i \cap A_j \vert=(n-2)!$ and $\displaystyle S_2=\binom{n}{2} (n-2)!=\frac{n!}{2!}$

In the general case, we have $\displaystyle S_i=\binom{n}{i} (n-i)!=\frac{n!}{i!}$

Applying the inclusion-exclusion principle, we have:

$\displaystyle \lvert A_1 \cup A_2 \cup \cdots \cup A_n \lvert=\sum \limits_{i=1}^n (-1)^{i+1} S_i$

\displaystyle \begin{aligned}\lvert (A_1 \cup A_2 \cup \cdots \cup A_n)^c \lvert&=n!-\sum \limits_{i=1}^n (-1)^{i+1} S_i\\&=n!-\sum \limits_{i=1}^n (-1)^{i+1} \frac{n!}{i!}\\&=n!+\sum \limits_{i=1}^n (-1)^{i} \frac{n!}{i!}\\&=\sum \limits_{i=0}^n (-1)^{i} \frac{n!}{i!} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)\end{aligned}

Out of the $n!$ permutations, $\displaystyle \lvert (A_1 \cup A_2 \cup \cdots \cup A_n)^c \lvert$ of them have no matches. Thus the probability of having no matches in a shuffled deck of $n$ cards is:

\displaystyle \begin{aligned}P(\text{no matches in n cards})&=\displaystyle \biggl(\sum \limits_{i=0}^n (-1)^{i} \frac{n!}{i!}\biggr) \frac{1}{n!}\\&=\sum \limits_{i=0}^n \frac{(-1)^{i}}{i!}\\&=P(X_n=0)\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)\end{aligned}

To derive $P(X_n=1)$, we consider the event that $1$ of the cards is a match and the other $n-1$ cards are not matches. Using $(1)$ to obtain the count for the event that $n-1$ cards are not matches, we have:

\displaystyle \begin{aligned}P(X_n=1)&=\displaystyle \binom{n}{1}\displaystyle \biggl(\sum \limits_{i=0}^{n-1} (-1)^{i} \frac{(n-1)!}{i!}\biggr) \frac{1}{n!}\\&=\sum \limits_{i=0}^{n-1} \frac{(-1)^{i}}{i!}\end{aligned}

For the general case, we have:

\displaystyle \begin{aligned}P(X_n=k)&=\displaystyle \binom{n}{k}\displaystyle \biggl(\sum \limits_{i=0}^{n-k} (-1)^{i} \frac{(n-k)!}{i!}\biggr) \frac{1}{n!}\\&=\frac{1}{k!}\sum \limits_{i=0}^{n-k} \frac{(-1)^{i}}{i!}\ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)\end{aligned}

We point out that $P(X_n=n-1)=0$. It is impossible to have exactly $n-1$ matches while the remaining card is not a match. This can also be verified by looking at the probability density function. On the other hand, $\displaystyle P(X_n=n)=\frac{1}{n!}$. There is only one permutation out of $n!$ many where all the cards are matches.

Poisson Distribution
We compare the probability density functions $P(X_{10}=k)$, $P(X_{20}=k)$ $P(X_{150}=k)$ and the Poisson desnity function with parameter $\lambda=1$. The following matrix shows the results (rounded to eight decimal places).

$\displaystyle \begin{pmatrix} \text{k}&P(X_{10}=k)&P(X_{20}=k)&P(X_{150}=k)&\displaystyle \frac{e^{-1}}{k!} \\{\text{ }}&\text{ }&\text{ } \\{0}&0.36787946&0.36787944&0.36787944&0.36787944 \\{1}&0.36787919&0.36787944&0.36787944&0.36787944 \\{2}&0.18394097&0.18393972&0.18393972&0.18393972 \\{3}&0.06130952&0.06131324&0.06131324&0.06131324 \\{4}&0.01533565&0.01532831&0.01532831&0.01532831 \\{5}&0.00305556&0.00306566&0.00306566&0.00306566 \\{6}&0.00052083&0.00051094&0.00051094&0.00051094 \\{7}&0.00006614&0.00007299&0.00007299&0.00007299 \\{8}&0.00001240&0.00000912&0.00000912&0.00000912 \\{9}&0&0.00000101&0.00000101&0.00000101 \\{10}&0.00000028&0.00000010&0.00000010&0.00000010 \\{\text{ }}&\text{ }&\text{ } \\{\text{Total}}&1.000000&0.99999997&0.99999997&0.99999997\end{pmatrix}$

There is remarkable agreement among the density functions arising from the match problems with the Poisson distribution. The density function $P(X_n=k)$ converges to the Poisson distribution with parameter $\lambda=1$ and the convergence occurs rapidly.

The probability of having exactly $k$ matches in $(3)$ above converges to $\displaystyle \frac{e^{-1}}{k!}$, which is the Poisson distribution with parameter $\lambda=1$.

The probability of no matches among the $n$ cards in $(2)$ above is the Taylor expansion of $e^{-1}$. Thus the probability of no matches in a shuffled deck of $n$ cards when $n$ is large is about $0.3678794412$ and the probability of at least one match is approximately $0.6321205588$. As the above example shows, the convergence occurs fairly rapidly.

The Moments
Since $P(X_n=k)$ converges to the Poisson distribution with parameter $\lambda=1$, we might guess that the mean and variance of $X_n$ approach $\lambda=1$. We now show that $E(X_n)=1$ and $Var(X_n)=1$ regardless of $n$.

Recall that $X_n=I_1+I_2+ \cdots +I_n$ where for each $j=1,2, \cdots, n$, $I_j$ is an indicator variable:

$\displaystyle I_j=\left\{\begin{matrix}1&\thinspace Y_j=j \ \ \ \text{(} j^{th} \text{card is a match)}\\{0}&\thinspace Y_j \neq j \ \ \ \text{(} j^{th} \text{card is not a match)}\end{matrix}\right.$

We claim that for all $j=1,2, \cdots, n$, $\displaystyle P(I_j=1)=\frac{1}{n}$ and for all $i, $\displaystyle P(I_i=1,I_j=1)=\frac{1}{n(n-1)}$. To see this, the event $(I_j=1)$ is the event $A_j$ defined in the derivation of the probability density function of $X_n$. We have $\lvert A_j \vert=(n-1)!$. Thus the probability that the $j^{th}$ card is a match is:

$\displaystyle P(I_j=1)=\frac{(n-1)!}{n!}=\frac{1}{n}$

The event $(I_i=1, I_j=1)$ is the event $A_i \cap A_j$ defined in the derivation of the probability density function of $X_n$. We have $\lvert A_i \cap A_j \vert=(n-2)!$. Thus the probability that both the $i^{th}$ card and the $j^{th}$ card are matches is:

$\displaystyle P(I_i=1,I_j=1)=\frac{(n-2)!}{n!}=\frac{1}{n(n-1)}$

It follows that for all $j=1,2, \cdots, n$, $I_j$ has the Bernoulli distribution with probability of success $\displaystyle P(I_j=1)=\frac{1}{n}$.

Knowing that $I_j$ is Bernoulli, we know its mean and variance. For $j=1,2, \cdots, n$, we have:

$\displaystyle E(I_j)=\frac{1}{n}$

$\displaystyle Var(I_j)=\frac{1}{n} \biggl(1-\frac{1}{n}\biggr)=\frac{n-1}{n^2}$

It is now easy to see that $E(X_n)=1$. To find $Var(X_n)$, we need to find the covariance $Cov(I_i,I_j)$. Note that the indicator variables $I_j$ are not independent. In evaluating $Cov(I_i,I_j)$, we need to consider four possibilities: $(I_i=0,I_j=0)$, $(I_i=0,I_j=1)$, $(I_i=1,I_j=0)$, $(I_i=1,I_j=1)$. The only relevant case is the last one. Thus we have:

\displaystyle \begin{aligned}Cov(I_i,I_j)&=E(I_i I_j)-E(I_i) E(I_j)\\&=\biggl(\sum \limits_{x,y=0,1} x \ y P(I_i=x,I_j=y)\biggr)-E(I_i) E(I_j)\\&=P(I_i=1,I_j=1)-E(I_i) E(I_j)\\&=\frac{1}{n(n-1)}-\frac{1}{n^2}\\&=\frac{1}{n^2 (n-1)} \end{aligned}

The following derives the variance:

\displaystyle \begin{aligned}Var(X_n)&=\sum \limits_{i=1}^n Var(I_i)+2 \sum \limits_{i \ne j} Cov(I_i,I_j)\\&=n \frac{n-1}{n^2}+2 \binom{n}{2} \frac{1}{n^2 (n-1)}\\&=1 \end{aligned}

Reference

1. Feller, W., An Introduction to Probability Theory and its Applications, Vol. I, 3rd ed., John Wiley & Sons, Inc., New York, 1968