The matching problem

One of the classic problems in probability theory is the “matching problem.” This problem has many variations and dates back to the early 18th century. There are many ways to describe the problem. One such description is the example of matching letters with envelopes. Suppose there are n letters and n matching envelopes (assume that each letter has exactly one matching envelope). Further suppose that the secretary stuffs the letters into the envelopes at random. What is the probability that every letter is matched correctly, that no letter is matched correctly, or that exactly k letters are stuffed into the correct envelopes? In this post we solve a simpler problem: what is the probability that at least one letter is stuffed into the correct envelope? Though the answer is in terms of the total number of letters n, it turns out that the answer is practically independent of n and is roughly \frac{2}{3} (more precisely, about 0.63).

We first state the formula for the union of n events. After applying this formula to obtain the answer to the matching problem, we give a sketch of a proof of the formula.

The union of n events
For any n events E_1,E_2, \cdots,E_n that are defined on the same sample space, we have the following formula:

P[E_1 \cup E_2 \cup \cdots \cup E_n]=\sum \limits_{m=1}^{n} (-1)^{m+1} \thinspace S_m where

S_1=\sum \limits_{r=1}^{n}P[E_r],

S_2=\sum \limits_{j<k}P[E_j \cap E_k],

S_m=\sum P[E_{i(1)} \cap E_{i(2)} \cap \cdots \cap E_{i(m)}]

Note that in the general term S_m, the sum is taken over all increasing sequences of indices i(\cdot), i.e. 1 \le i(1) < i(2) < \cdots < i(m) \le n. For n=2,3, we have the following familiar formulas:

\displaystyle P[E_1 \cup E_2]=P[E_1]+P[E_2]-P[E_1 \cap E_2]

\displaystyle P[E_1 \cup E_2 \cup E_3]=
\displaystyle P[E_1]+P[E_2]+P[E_3]-P[E_1 \cap E_2]-P[E_1 \cap E_3]-P[E_2 \cap E_3]+P[E_1 \cap E_2 \cap E_3]
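As a quick sanity check of the n=3 formula, the following Python sketch compares both sides of the identity on a small finite sample space with uniform probability. The sample space and the three events are made up for illustration and are not from the post.

```python
from itertools import combinations

# A uniform sample space of 10 points and three arbitrary events.
omega = set(range(10))
E = [{0, 1, 2, 3}, {2, 3, 4, 5}, {3, 5, 6, 7}]

def prob(s):
    # Probability of an event under the uniform measure on omega.
    return len(s) / len(omega)

# Left hand side: P[E_1 ∪ E_2 ∪ E_3].
lhs = prob(E[0] | E[1] | E[2])

# Right hand side: sum over m of (-1)^(m+1) S_m, where S_m sums the
# probabilities of all m-fold intersections.
rhs = 0.0
for m in range(1, 4):
    s_m = sum(prob(set.intersection(*combo)) for combo in combinations(E, m))
    rhs += (-1) ** (m + 1) * s_m

assert abs(lhs - rhs) < 1e-12
```

The same loop works unchanged for any number of events, which is exactly the general formula above.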

The matching problem
Suppose that the n letters are numbered 1,2, \cdots, n. Let E_i be the event that the i^{th} letter is stuffed into the correct envelope. Then P[E_1 \cup E_2 \cup \cdots \cup E_n] is the probability that at least one letter is matched with the correct envelope.

The probability of the intersection of m events is:

\displaystyle P[E_{i(1)} \cap E_{i(2)} \cap \cdots \cap E_{i(m)}]=\frac{(n-m)!}{n!}.

For m=1, we have P[E_i]=\frac{(n-1)!}{n!}=\frac{1}{n}. Note that the i^{th} position is fixed and we permute the other n-1 positions. For m=2, we have P[E_i \cap E_j]=\frac{(n-2)!}{n!}. Here, the i^{th} and j^{th} positions are fixed and we permute the other n-2 positions.

We now show that S_m=\frac{1}{m!}. The sum S_m has \binom{n}{m} terms, one for each way of choosing m letters out of n to be matched. Thus we have:

\displaystyle S_m=\binom{n}{m} \frac{(n-m)!}{n!}=\frac{1}{m!}
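The identity S_m=\frac{1}{m!} can be confirmed by brute force for a small n. The Python sketch below (not part of the original post) enumerates all permutations for n=5 and computes each S_m directly from its definition:

```python
from itertools import combinations, permutations
from math import factorial

n = 5
perms = list(permutations(range(n)))  # all n! ways to stuff the envelopes

def s_m(m):
    # Sum of P[E_{i(1)} ∩ ... ∩ E_{i(m)}] over all increasing index sets.
    total = 0
    for idx in combinations(range(n), m):
        # Count permutations that fix every position in idx,
        # i.e. letters idx all land in their correct envelopes.
        total += sum(1 for p in perms if all(p[i] == i for i in idx))
    return total / len(perms)

for m in range(1, n + 1):
    assert abs(s_m(m) - 1 / factorial(m)) < 1e-12
```

Each m-fold intersection contains (n-m)! permutations, and there are \binom{n}{m} such intersections, which is the counting argument in the text.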

Using the formula for the union of n events, we have:

\displaystyle P[\bigcup_{m=1}^{n} E_m]=1-\frac{1}{2!}+\frac{1}{3!}- \cdots + (-1)^{n+1}\frac{1}{n!}

\displaystyle 1-P[\bigcup_{m=1}^{n} E_m]=1-1+\frac{1}{2!}-\frac{1}{3!}+ \cdots + (-1)^{n}\frac{1}{n!}

Note that the right hand side of the above is the first n+1 terms of the series expansion of e^{-1} (the left hand side is the probability that no letter is matched correctly). Thus we have:

\displaystyle \lim_{n \rightarrow \infty}P[\bigcup_{m=1}^{n} E_m]=\lim_{n \rightarrow \infty}\biggl(1-\frac{1}{2!}+\frac{1}{3!}- \cdots + (-1)^{n+1}\frac{1}{n!}\biggr)

=1-e^{-1} \approx 0.632121

The series converges quite rapidly. Let P_n=P[\bigcup_{m=1}^{n} E_m]. The following table shows P_n for the first several values of n.

\displaystyle \begin{pmatrix} n&P_n \\ 2&0.50000 \\ 3&0.66667 \\ 4&0.62500 \\ 5&0.63333 \\ 6&0.63194 \\ 7&0.63214 \\ 8&0.63212 \end{pmatrix}
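The values above can be reproduced directly from the alternating series. A short Python sketch (the helper name p_n is ours, not from the post):

```python
from math import exp, factorial

def p_n(n):
    # Partial sum 1 - 1/2! + 1/3! - ... + (-1)^(n+1)/n!
    return sum((-1) ** (m + 1) / factorial(m) for m in range(1, n + 1))

for n in range(2, 9):
    print(n, f"{p_n(n):.5f}")

print(f"limit: {1 - exp(-1):.6f}")  # 1 - 1/e, the limiting probability
```

Already at n=8 the partial sum agrees with 1-e^{-1} to five decimal places, since the error is bounded by the first omitted term 1/9!.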

Sketch of proof
We now discuss the union formula:

P[E_1 \cup E_2 \cup \cdots \cup E_n]=\sum \limits_{m=1}^{n} (-1)^{m+1} \thinspace S_m

The key idea is that any sample point in the union E_1 \cup \cdots \cup E_n is counted exactly one time in the right hand side of the formula. Suppose that a sample point is in exactly t of the events, say E_{i(1)}, \cdots, E_{i(t)}. Then the following shows the number of times the sample point is counted in each expression:

\displaystyle \sum \limits_{m=1}^{n}P[E_m] \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ t times

\displaystyle \sum \limits_{a<b}P[E_a \cap E_b] \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \binom{t}{2} times

\displaystyle \sum \limits_{a<b<c}P[E_a \cap E_b \cap E_c] \ \ \ \ \ \ \ \ \ \binom{t}{3} times and so on

Thus the sample point in question will be counted exactly H times in the right hand side of the formula.

\displaystyle H=\binom{t}{1}-\binom{t}{2}+\binom{t}{3}- \cdots + (-1)^{t+1}\binom{t}{t}

The following is the derivation that H=1.

\displaystyle 0=(1-1)^t=\sum \limits_{a=0}^{t} \binom{t}{a}(-1)^{a}(1)^{t-a}=\binom{t}{0}-\binom{t}{1}+\binom{t}{2}- \cdots +(-1)^t \binom{t}{t}

\displaystyle 1=\binom{t}{0}=\binom{t}{1}-\binom{t}{2}+ \cdots +(-1)^{t+1} \binom{t}{t}=H
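The identity H=1 can also be checked numerically for small t. A quick Python sketch (not part of the original post):

```python
from math import comb

# H = C(t,1) - C(t,2) + ... + (-1)^(t+1) C(t,t) should equal 1 for all t >= 1.
for t in range(1, 12):
    H = sum((-1) ** (a + 1) * comb(t, a) for a in range(1, t + 1))
    assert H == 1
```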
