# The capture-recapture method

The capture-recapture method estimates the size of a wildlife population and is based on the hypergeometric distribution. Recall that the hypergeometric distribution is a three-parameter family of discrete distributions; one of the parameters, denoted by $N$ in this post, is the size of the population. We show that the estimate of $N$ obtained from the capture-recapture method is the value of $N$ that makes the observed data “more likely” than any other possible value of $N$. Thus, the capture-recapture method produces the maximum likelihood estimate of the population size parameter $N$ of the hypergeometric distribution.

Let’s start with an example. In order to estimate the size of the population of bluegills (a species of freshwater fish) in a small lake in Missouri, a total of $w=250$ bluegills were captured, tagged and then released. After allowing sufficient time for the tagged fish to disperse, a sample of $n=150$ bluegills was caught. It was found that $y=16$ bluegills in the sample were tagged. Estimate the size of the bluegill population in this lake.

Let $N$ be the size of the bluegill population in this lake. The population proportion of the tagged bluegills is $\frac{w}{N}$. The sample proportion of the tagged bluegills is $\frac{y}{n}$. In the capture-recapture method, the population proportion and the sample proportion are set equal to each other, and we then solve for $N$.

$\displaystyle \frac{w}{N}=\frac{y}{n} \Rightarrow N=\frac{w n}{y}=\frac{250(150)}{16}=2343.75$

Since the population size is a whole number, we truncate this value and use $2343$ as the estimate.
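The arithmetic above can be checked with a short Python sketch. The function name `capture_recapture_estimate` is a hypothetical helper introduced here for illustration; it simply evaluates $\frac{w n}{y}$ exactly and leaves the truncation to the caller.

```python
import math
from fractions import Fraction

def capture_recapture_estimate(w: int, n: int, y: int) -> Fraction:
    """Solve w/N = y/n for N and return the exact ratio w*n/y.

    w: number of tagged animals released
    n: size of the second sample
    y: number of tagged animals found in the second sample
    """
    if y <= 0:
        raise ValueError("the sample must contain at least one tagged animal")
    return Fraction(w * n, y)

# Bluegill example from the text: w = 250, n = 150, y = 16.
estimate = capture_recapture_estimate(250, 150, 16)
print(float(estimate))        # 2343.75
print(math.floor(estimate))   # 2343
```

Keeping the value as an exact fraction avoids any floating-point doubt about which side of an integer the ratio falls on before it is truncated.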

Now, the connection to the hypergeometric distribution. After $w=250$ bluegills are captured, tagged and released, the population is separated into two distinct classes, tagged and non-tagged. When a sample of $n=150$ bluegills is selected without replacement, we let $Y$ be the number of bluegills in the sample that are tagged. The distribution of $Y$ is the hypergeometric distribution. The following is the probability function of $Y$.

$\displaystyle P[Y=y]=\frac{\binom{w}{y} \thinspace \binom{N-w}{n-y}}{\binom{N}{n}}$
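As a sanity check on this probability function, here is a minimal Python sketch that evaluates it with exact integer arithmetic. The helper name `hypergeom_pmf` and the value `N_assumed = 2343` are assumptions made for illustration only; the true $N$ is of course unknown.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(y: int, N: int, w: int, n: int) -> Fraction:
    """P[Y = y] when n items are drawn without replacement from N items, w of them tagged."""
    # Outside the support the probability is zero.
    if y < 0 or y > w or n - y < 0 or n - y > N - w:
        return Fraction(0)
    return Fraction(comb(w, y) * comb(N - w, n - y), comb(N, n))

N_assumed = 2343  # illustrative population size, not a known quantity
pmf = {y: hypergeom_pmf(y, N_assumed, 250, 150) for y in range(0, 151)}

total = sum(pmf.values())       # exact sum over the whole support
mode = max(pmf, key=pmf.get)    # most probable number of tagged fish
print(total, mode)
```

With this assumed $N$, the probabilities sum to exactly 1 and the most probable count of tagged fish is 16, matching the observed value.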

In the hypergeometric distribution described here, the parameters $w$ and $n$ are known ($w=250$ and $n=150$). We now show that the estimate of $N=2343$ is the estimate that makes the observed value of $y=16$ “most likely” (i.e. the estimate of $N=2343$ is a maximum likelihood estimate of $N$). To show this, we consider the ratio of the hypergeometric probabilities for two successive values of $N$.

$\displaystyle \frac{P(N)}{P(N-1)}=\frac{(N-w)(N-n)}{N(N-w-n+y)}$

where $\displaystyle P(N)=\frac{\binom{w}{y} \thinspace \binom{N-w}{n-y}}{\binom{N}{n}}$ and $\displaystyle P(N-1)=\frac{\binom{w}{y} \thinspace \binom{N-1-w}{n-y}}{\binom{N-1}{n}}$
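For readers who want the intermediate step, the ratio can be worked out as follows. The factor $\binom{w}{y}$ cancels, and each remaining quotient simplifies with the identity $\binom{m}{k} / \binom{m-1}{k}=\frac{m}{m-k}$, applied once with $m=N-w$, $k=n-y$ and once with $m=N$, $k=n$.

$\displaystyle \frac{P(N)}{P(N-1)}=\frac{\binom{N-w}{n-y}}{\binom{N-1-w}{n-y}} \cdot \frac{\binom{N-1}{n}}{\binom{N}{n}}=\frac{N-w}{N-w-n+y} \cdot \frac{N-n}{N}$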

Note that $1<\frac{P(N)}{P(N-1)}$, or equivalently $P(N-1)<P(N)$, if and only if the following holds:

$\displaystyle N(N-w-n+y)<(N-w)(N-n)$

$\displaystyle N<\frac{w n}{y}$
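The step between the two inequalities is a direct expansion. Assuming $N-1 \ge w+n-y$, so that $P(N-1)$ is positive and the denominator $N(N-w-n+y)$ is positive, and using $y>0$, we have:

$\displaystyle N^2-Nw-Nn+Ny<N^2-Nn-Nw+wn \ \Longleftrightarrow \ Ny<wn \ \Longleftrightarrow \ N<\frac{w n}{y}$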

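Note that $\frac{w n}{y}$ is the estimate from the capture-recapture method. It is also a strict upper bound for the values of $N$ at which the probability $P(N)$ is greater than $P(N-1)$. Thus $P(N)$ increases as long as $N<\frac{w n}{y}$ and no longer increases once $N \ge \frac{w n}{y}$. Since $N$ must be an integer, the maximum likelihood estimate of $N$ is the largest integer not exceeding $\frac{w n}{y}$, i.e. $\hat{N}=\lfloor \frac{w n}{y} \rfloor$, which is $2343$ in the bluegill example.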

As an illustration, we compute the probabilities $\displaystyle P(N)=\frac{\binom{250}{16} \thinspace \binom{N-250}{150-16}}{\binom{N}{150}}$ for several values of $N$ above and below $N=2343$. The values below show that the likelihood is maximized at $N=2343$.

$\displaystyle \begin{array}{c|c} N & P(N) \\ \hline 2340 & 0.1084918 \\ 2341 & 0.1084929 \\ 2342 & 0.1084935 \\ 2343 & 0.1084938 \\ 2344 & 0.1084937 \\ 2345 & 0.1084933 \\ 2346 & 0.1084924 \end{array}$
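The following Python sketch shows one way such values could be recomputed and the maximizer located by brute force. The function name `likelihood` and the search upper limit of 5000 are assumptions chosen for illustration; the lower limit $w+n-y=384$ is the smallest population size consistent with the observed data.

```python
from fractions import Fraction
from math import comb

W, N_SAMPLE, Y = 250, 150, 16  # tagged fish, sample size, tagged fish in sample

def likelihood(N: int) -> Fraction:
    """Hypergeometric probability of observing Y tagged fish when the population size is N."""
    return Fraction(comb(W, Y) * comb(N - W, N_SAMPLE - Y), comb(N, N_SAMPLE))

# Search every feasible N up to an arbitrary (assumed) upper limit.
candidates = range(W + N_SAMPLE - Y, 5001)
best_N = max(candidates, key=likelihood)
print("maximum likelihood estimate:", best_N)

# Print the likelihoods around the maximizer for comparison with the values above.
for N in range(2340, 2347):
    print(N, f"{float(likelihood(N)):.7f}")
```

Exact fractions are used for the comparison because neighboring likelihoods differ only in the seventh decimal place; floats are used only for display.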