The Negative Binomial Distribution

A counting distribution is a discrete distribution that places its probabilities on the nonnegative integers. Such distributions are important in insurance applications since they can be used to model the number of events, such as losses to the insured or claims to the insurer. Though it plays a prominent role in statistical theory, the Poisson distribution is not appropriate in all situations, since it requires that the mean and the variance be equal. The negative binomial distribution is thus an excellent alternative to the Poisson distribution, especially in cases where the observed variance is greater than the observed mean.

The negative binomial distribution arises naturally from a probability experiment of performing a series of independent Bernoulli trials until the occurrence of the rth success where r is a positive integer. From this starting point, we discuss three ways to define the distribution. We then discuss several basic properties of the negative binomial distribution. Emphasis is placed on the close connection between the Poisson distribution and the negative binomial distribution.

________________________________________________________________________

Definitions
We define three versions of the negative binomial distribution. The first two versions arise from the viewpoint of performing a series of independent Bernoulli trials until the rth success, where r is a positive integer. A Bernoulli trial is a probability experiment with exactly two possible outcomes, success and failure.

Let X_1 be the number of Bernoulli trials required for the rth success to occur, where r is a positive integer, and let p be the probability of success in each trial. The following is the probability function of X_1:

\displaystyle (1) \ \ \ \ \ P(X_1=x)= \binom{x-1}{r-1} p^r (1-p)^{x-r} \ \ \ \ \ \ \ x=r,r+1,r+2,\cdots

The idea for (1) is that for X_1=x to happen, there must be r-1 successes in the first x-1 trials and one additional success occurring in the last trial (the xth trial).
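As a quick numerical sanity check of (1), the probability function can be coded directly and summed over a long range of x; the total should be (nearly) 1. The following is a minimal Python sketch; the function name pmf_trials and the parameter values r=3, p=0.4 are our own illustrative choices.

```python
from math import comb

def pmf_trials(x, r, p):
    """P(X_1 = x) from (1): the probability that the r-th success occurs on trial x."""
    if x < r:
        return 0.0
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

# With r = 3 and p = 0.4, the probabilities over x = r, r+1, ... should sum to 1.
total = sum(pmf_trials(x, 3, 0.4) for x in range(3, 500))
```

Note also that pmf_trials(r, r, p) reduces to p^r, the probability of r successes in a row.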

A more common version of the negative binomial distribution is the number of Bernoulli trials in excess of r in order to produce the rth success. In other words, we consider the number of failures before the occurrence of the rth success. Let X_2 be this random variable. The following is the probability function of X_2:

\displaystyle (2) \ \ \ \ \ P(X_2=x)=\binom{x+r-1}{x} p^r (1-p)^x \ \ \ \ \ \ \ x=0,1,2,\cdots

The idea for (2) is that there are x+r trials and in the first x+r-1 trials, there are x failures (or equivalently r-1 successes).
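Since X_2 counts failures while X_1 counts trials, the two random variables are related by X_1 = X_2 + r, so (1) and (2) should agree after shifting by r. A small Python check (the function names are our own):

```python
from math import comb

def pmf_trials(x, r, p):
    """P(X_1 = x) from (1)."""
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r) if x >= r else 0.0

def pmf_failures(x, r, p):
    """P(X_2 = x) from (2): x failures before the r-th success."""
    return comb(x + r - 1, x) * p**r * (1 - p)**x

# X_1 = X_2 + r, so the two probability functions agree after a shift of r.
r, p = 3, 0.4
shift_ok = all(abs(pmf_failures(x, r, p) - pmf_trials(x + r, r, p)) < 1e-12 for x in range(100))
```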

In both (1) and (2), the binomial coefficient is defined by

\displaystyle (3) \ \ \ \ \ \binom{y}{k}=\frac{y!}{k! \ (y-k)!}=\frac{y(y-1) \cdots (y-(k-1))}{k!}

where y is a positive integer and k is a nonnegative integer. However, the right-hand side of (3) can be calculated even if y is not a positive integer. Thus the binomial coefficient \displaystyle \binom{y}{k} can be extended to all real numbers y, as long as k remains a nonnegative integer.

\displaystyle (4) \ \ \ \ \ \binom{y}{k}=\frac{y(y-1) \cdots (y-(k-1))}{k!}

For convenience, we let \displaystyle \binom{y}{0}=1. When the real number y>k-1, the binomial coefficient in (4) can be expressed as:

\displaystyle (5) \ \ \ \ \ \binom{y}{k}=\frac{\Gamma(y+1)}{\Gamma(k+1) \Gamma(y-k+1)}

where \Gamma(\cdot) is the gamma function.
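The product form (4) and the gamma-function form (5) can be compared numerically for a non-integer y. A minimal Python sketch (the helper name binom is our own):

```python
from math import gamma, prod

def binom(y, k):
    """Generalized binomial coefficient (4): y any real number, k a nonnegative integer."""
    return prod(y - j for j in range(k)) / prod(range(1, k + 1))

# For y > k - 1, the product form (4) agrees with the gamma form (5).
y, k = 2.5, 3
via_product = binom(y, k)
via_gamma = gamma(y + 1) / (gamma(k + 1) * gamma(y - k + 1))
```

Here binom(2.5, 3) works out to (2.5)(1.5)(0.5)/3! = 0.3125, matching the gamma form.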

With this more relaxed notion of binomial coefficient, the probability function in (2) above can be defined for any positive real number r. Thus the general version of the negative binomial distribution has two parameters r and p, where r>0 and 0<p<1. The following is its probability function.

\displaystyle (6) \ \ \ \ \ P(X=x)=\binom{x+r-1}{x} p^r (1-p)^x \ \ \ \ \ \ \ x=0,1,2,\cdots

Whenever r in (6) is a real number that is not a positive integer, the interpretation of counting the number of failures until the occurrence of the rth success no longer applies. Instead we can simply regard it as a counting distribution.

The following alternative parametrization of the negative binomial distribution is also useful.

\displaystyle (6a) \ \ \ \ \ P(X=x)=\binom{x+r-1}{x} \biggl(\frac{\alpha}{\alpha+1}\biggr)^r \biggl(\frac{1}{\alpha+1}\biggr)^x \ \ \ \ \ \ \ x=0,1,2,\cdots

The parameters in this alternative parametrization are r and \alpha>0. Clearly, the ratio \frac{\alpha}{\alpha+1} takes the place of p in (6). Unless stated otherwise, we use the parametrization of (6).
________________________________________________________________________

What is negative about the negative binomial distribution?
What is negative about this distribution? What is binomial about this distribution? The name is suggested by the fact that the binomial coefficient in (6) can be rearranged as follows:

\displaystyle \begin{aligned}(7) \ \ \ \ \ \binom{x+r-1}{x}&=\frac{(x+r-1)(x+r-2) \cdots r}{x!} \\&=(-1)^x \frac{(-r-(x-1))(-r-(x-2)) \cdots (-r)}{x!} \\&=(-1)^x \frac{(-r)(-r-1) \cdots (-r-(x-1))}{x!} \\&=(-1)^x \binom{-r}{x} \end{aligned}

The calculation in (7) can be used to verify that (6) is indeed a probability function, that is, all the probabilities sum to 1.

\displaystyle \begin{aligned}(8) \ \ \ \ \ 1&=p^r p^{-r}\\&=p^r (1-q)^{-r} \\&=p^r \sum \limits_{x=0}^\infty \binom{-r}{x} (-q)^x \ \ \ \ \ \ \ \ (8.1) \\&=p^r \sum \limits_{x=0}^\infty (-1)^x \binom{-r}{x} q^x \\&=\sum \limits_{x=0}^\infty \binom{x+r-1}{x} p^r q^x \end{aligned}

In (8), we take q=1-p. Step (8.1) above uses the following formula, known as Newton's binomial formula (valid for \lvert t \rvert<1).

\displaystyle (9) \ \ \ \ \ (1+t)^w=\sum \limits_{k=0}^\infty \binom{w}{k} t^k

For a detailed discussion of (8) with all the details worked out, see the post called Deriving some facts of the negative binomial distribution.
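For a non-integer r, the fact that (6) sums to 1 can also be verified numerically. The sketch below computes the binomial coefficient through log-gamma values (via (5)) to avoid overflow; r=2.5 and p=0.3 are arbitrary illustrative choices.

```python
from math import lgamma, log, exp

r, p = 2.5, 0.3  # a non-integer r

def pmf(x):
    """P(X = x) from (6), with the binomial coefficient computed via log-gamma."""
    log_coef = lgamma(x + r) - lgamma(r) - lgamma(x + 1)
    return exp(log_coef + r * log(p) + x * log(1 - p))

# The probabilities should sum to (nearly) 1.
total = sum(pmf(x) for x in range(400))
```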

________________________________________________________________________

The Generating Function
By definition, the following is the generating function of the negative binomial distribution:

\displaystyle (10) \ \ \ \ \ g(z)=\sum \limits_{x=0}^\infty \binom{r+x-1}{x} p^r q^x z^x

where q=1-p. Using a similar calculation as in (8), the generating function can be simplified as:

\displaystyle (11) \ \ \ \ \ g(z)=p^r (1-q z)^{-r}=\frac{p^r}{(1-q z)^r}=\frac{p^r}{(1-(1-p) z)^r}; \ \ \ \ \ z<\frac{1}{1-p}

As a result, the moment generating function of the negative binomial distribution is:

\displaystyle (12) \ \ \ \ \ M(t)=\frac{p^r}{(1-(1-p) e^t)^r}; \ \ \ \ \ \ \ t<-\ln(1-p)

For a detailed discussion of (12) with all the details worked out, see the post called Deriving some facts of the negative binomial distribution.
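The closed form (12) can be checked against a direct computation of E(e^{tX}) from the probability function (6). A Python sketch, with the illustrative values r=2.5, p=0.3, and t=0.2 (which satisfies t<-\ln(1-p)):

```python
from math import lgamma, log, exp

r, p, t = 2.5, 0.3, 0.2  # t must satisfy t < -log(1 - p), about 0.357 here

def pmf(x):
    """P(X = x) from (6), computed via log-gamma for numerical stability."""
    log_coef = lgamma(x + r) - lgamma(r) - lgamma(x + 1)
    return exp(log_coef + r * log(p) + x * log(1 - p))

# E(e^{tX}) by direct summation versus the closed form (12).
mgf_by_sum = sum(exp(t * x) * pmf(x) for x in range(600))
mgf_closed = p**r / (1 - (1 - p) * exp(t))**r
```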

________________________________________________________________________

Independent Sum

One useful property of the negative binomial distribution is that an independent sum of negative binomial random variables, all with the same parameter p, also has a negative binomial distribution. Let Y=Y_1+Y_2+\cdots+Y_n be an independent sum such that each Y_i has a negative binomial distribution with parameters r_i and p. Then Y has a negative binomial distribution with parameters r=r_1+\cdots+r_n and p.

Note that the generating function of an independent sum is the product of the individual generating functions. The following shows that the product of the individual generating functions is of the same form as (11), thus proving the above assertion.

\displaystyle (13) \ \ \ \ \ h(z)=\frac{p^{\sum \limits_{i=1}^n r_i}}{(1-(1-p) z)^{\sum \limits_{i=1}^n r_i}}
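The independent-sum property can also be verified without generating functions, by convolving two negative binomial probability functions numerically and comparing with the probability function with parameter r_1+r_2. A Python sketch with illustrative integer parameters:

```python
from math import comb

def pmf(x, r, p):
    """Negative binomial probability function (2)/(6) for a positive integer r."""
    return comb(x + r - 1, x) * p**r * (1 - p)**x

r1, r2, p = 2, 3, 0.4
# P(Y_1 + Y_2 = x) by convolution should equal the pmf with r = r1 + r2.
conv = [sum(pmf(j, r1, p) * pmf(x - j, r2, p) for j in range(x + 1)) for x in range(30)]
direct = [pmf(x, r1 + r2, p) for x in range(30)]
conv_ok = all(abs(a - b) < 1e-12 for a, b in zip(conv, direct))
```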
________________________________________________________________________

Mean and Variance
The mean and variance can be obtained from the generating function. From E(X)=g'(1) and E(X^2)=g'(1)+g^{(2)}(1), we have:

\displaystyle (14) \ \ \ \ \ E(X)=\frac{r(1-p)}{p} \ \ \ \ \ \ \ \ \ \ \ \ \ Var(X)=\frac{r(1-p)}{p^2}

Note that Var(X)=\frac{1}{p} E(X)>E(X). Thus when the sample data suggest that the variance is greater than the mean, the negative binomial distribution is an excellent alternative to the Poisson distribution. For example, suppose that the sample mean and the sample variance are 3.6 and 7.1, respectively. In exploring the possibility of fitting the data using the negative binomial distribution, we would be interested in the negative binomial distribution with this mean and variance. Plugging these values into (14) produces the negative binomial distribution with approximately r=3.7 and p=0.507.
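The method-of-moments calculation in this example can be written out explicitly: solving (14) gives p = E(X)/Var(X) and then r = E(X)\,p/(1-p). A short Python sketch using the sample values above:

```python
# Method-of-moments fit from (14): p = mean/variance, then r = mean * p / (1 - p).
sample_mean, sample_var = 3.6, 7.1
p = sample_mean / sample_var          # approximately 0.507
r = sample_mean * p / (1 - p)         # approximately 3.70

# The fitted distribution reproduces the sample mean and variance exactly.
fitted_mean = r * (1 - p) / p
fitted_var = r * (1 - p) / p**2
```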
________________________________________________________________________

The Poisson-Gamma Mixture
One important property of the negative binomial distribution is that it is a mixture of a family of Poisson distributions with Gamma mixing weights. That is, the negative binomial distribution can be viewed as a Poisson distribution where the Poisson parameter is itself a random variable, distributed according to a Gamma distribution. For this reason the negative binomial distribution is known as a Poisson-Gamma mixture, and it can be viewed as a generalization of the Poisson distribution.

In an insurance application, the negative binomial distribution can be used as a model for claim frequency when the risks are not homogeneous. Let N have a Poisson distribution with parameter \theta, interpreted as the number of claims in a fixed period of time from an insured in a large pool of insureds. There is uncertainty in the parameter \theta, reflecting the risk characteristic of the insured. Some insureds are poor risks (with large \theta) and some are good risks (with small \theta). Thus the parameter \theta should be regarded as a random variable \Theta. The following is the conditional distribution of the random variable N (conditional on \Theta=\theta):

\displaystyle (15) \ \ \ \ \ P(N=n \lvert \Theta=\theta)=\frac{e^{-\theta} \ \theta^n}{n!} \ \ \ \ \ \ \ \ \ \ n=0,1,2,\cdots

Suppose that \Theta has a Gamma distribution with rate parameter \alpha and shape parameter \beta. The following is the probability density function of \Theta.

\displaystyle (16) \ \ \ \ \ g(\theta)=\frac{\alpha^\beta}{\Gamma(\beta)} \theta^{\beta-1} e^{-\alpha \theta} \ \ \ \ \ \ \ \ \ \ \theta>0

Then the joint density of N and \Theta is:

\displaystyle (17) \ \ \ \ \ P(N=n \lvert \Theta=\theta) \ g(\theta)=\frac{e^{-\theta} \ \theta^n}{n!} \ \frac{\alpha^\beta}{\Gamma(\beta)} \theta^{\beta-1} e^{-\alpha \theta}

The unconditional distribution of N is obtained by integrating out \theta in (17).

\displaystyle \begin{aligned}(18) \ \ \ \ \ P(N=n)&=\int_0^\infty P(N=n \lvert \Theta=\theta) \ g(\theta) \ d \theta \\&=\int_0^\infty \frac{e^{-\theta} \ \theta^n}{n!} \ \frac{\alpha^\beta}{\Gamma(\beta)} \ \theta^{\beta-1} \ e^{-\alpha \theta} \ d \theta \\&=\int_0^\infty \frac{\alpha^\beta}{n! \ \Gamma(\beta)} \ \theta^{n+\beta-1} \ e^{-(\alpha+1) \theta} d \theta \\&=\frac{\alpha^\beta}{n! \ \Gamma(\beta)} \ \frac{\Gamma(n+\beta)}{(\alpha+1)^{n+\beta}} \int_0^\infty \frac{(\alpha+1)^{n+\beta}}{\Gamma(n+\beta)} \theta^{n+\beta-1} \ e^{-(\alpha+1) \theta} d \theta \\&=\frac{\alpha^\beta}{n! \ \Gamma(\beta)} \ \frac{\Gamma(n+\beta)}{(\alpha+1)^{n+\beta}} \\&=\frac{\Gamma(n+\beta)}{\Gamma(n+1) \ \Gamma(\beta)} \ \biggl( \frac{\alpha}{\alpha+1}\biggr)^\beta \ \biggl(\frac{1}{\alpha+1}\biggr)^n \\&=\binom{n+\beta-1}{n} \ \biggl( \frac{\alpha}{\alpha+1}\biggr)^\beta \ \biggl(\frac{1}{\alpha+1}\biggr)^n \ \ \ \ \ \ \ \ \ n=0,1,2,\cdots \end{aligned}

Note that the integral in the fourth step in (18) is 1.0 since the integrand is the pdf of a Gamma distribution. The above probability function is that of a negative binomial distribution. It is of the same form as (6a). Equivalently, it is also of the form (6) with parameter r=\beta and p=\frac{\alpha}{\alpha+1}.
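The mixture computation in (18) can be confirmed numerically: integrating the Poisson probability against the Gamma density (here by a simple midpoint rule) should reproduce the closed-form negative binomial probability. A Python sketch with the illustrative parameters \alpha=1.5 and \beta=2.5:

```python
from math import gamma, exp, factorial

alpha, beta = 1.5, 2.5  # Gamma rate and shape parameters (illustrative)

def mixture_pmf(n, steps=40000, upper=40.0):
    """P(N = n): midpoint-rule integration of the Poisson pmf against the Gamma density."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        poisson = exp(-t) * t**n / factorial(n)
        gamma_pdf = alpha**beta / gamma(beta) * t**(beta - 1) * exp(-alpha * t)
        total += poisson * gamma_pdf * h
    return total

def negbin_pmf(n):
    """The closed form (6a) with r = beta and p = alpha/(alpha + 1)."""
    return (gamma(n + beta) / (gamma(n + 1) * gamma(beta))
            * (alpha / (alpha + 1))**beta * (1 / (alpha + 1))**n)

mixture_ok = all(abs(mixture_pmf(n) - negbin_pmf(n)) < 1e-4 for n in range(6))
```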

The variance of the negative binomial distribution is greater than the mean. In a Poisson distribution, the mean equals the variance. Thus the unconditional distribution of N is more dispersed than its conditional distributions. This is a characteristic of mixture distributions. The uncertainty in the parameter variable \Theta has the effect of increasing the unconditional variance of the mixture distribution of N. The variance of a mixture distribution has two components, the weighted average of the conditional variances and the variance of the conditional means. The second component represents the additional variance introduced by the uncertainty in the parameter \Theta (see The variance of a mixture).

________________________________________________________________________

The Poisson Distribution as Limit of Negative Binomial
There is another connection to the Poisson distribution, that is, the Poisson distribution is a limiting case of the negative binomial distribution. We show that the generating function of the Poisson distribution can be obtained by taking the limit of the negative binomial generating function as r \rightarrow \infty. Interestingly, the Poisson distribution is also the limit of the binomial distribution.

In this section, we use the negative binomial parametrization of (6a). Substituting \frac{\alpha}{\alpha+1} for p, the following are the mean, the variance, and the generating function for the probability function in (6a):

\displaystyle \begin{aligned}(19) \ \ \ \ \ \ &E(X)=\frac{r}{\alpha} \\&\text{ }\\&Var(X)=\frac{\alpha+1}{\alpha} \ \frac{r}{\alpha}=\frac{r(\alpha+1)}{\alpha^2} \\&\text{ } \\&g(z)=\frac{1}{[1-\frac{1}{\alpha}(z-1)]^r} \ \ \ \ \ \ \ z<\alpha+1 \end{aligned}

Let r go to infinity and \displaystyle \frac{1}{\alpha} go to zero in such a way that their product remains constant. Thus \displaystyle \mu=\frac{r}{\alpha} is constant (this is the mean of the negative binomial distribution). We show the following:

\displaystyle (20) \ \ \ \ \ \lim \limits_{r \rightarrow \infty} [1-\frac{\mu}{r}(z-1)]^{-r}=e^{\mu (z-1)}

The right-hand side of (20) is the generating function of the Poisson distribution with mean \mu. The generating function in the left-hand side is that of a negative binomial distribution with mean \displaystyle \mu=\frac{r}{\alpha}. The following is the derivation of (20).

\displaystyle \begin{aligned}(21) \ \ \ \ \ \lim \limits_{r \rightarrow \infty} [1-\frac{\mu}{r}(z-1)]^{-r}&=\lim \limits_{r \rightarrow \infty} e^{\displaystyle \biggl(\ln[1-\frac{\mu}{r}(z-1)]^{-r}\biggr)} \\&=\lim \limits_{r \rightarrow \infty} e^{\displaystyle \biggl(-r \ \ln[1-\frac{\mu}{r}(z-1)]\biggr)} \\&=e^{\displaystyle \biggl(\lim \limits_{r \rightarrow \infty} -r \ \ln[1-\frac{\mu}{r}(z-1)]\biggr)} \end{aligned}

We now focus on the limit in the exponent.

\displaystyle \begin{aligned}(22) \ \ \ \ \ \lim \limits_{r \rightarrow \infty} -r \ \ln \biggl[1-\frac{\mu}{r}(z-1)\biggr]&=\lim \limits_{r \rightarrow \infty} \frac{\ln \bigl(1-\frac{\mu}{r} (z-1)\bigr)^{-1}}{r^{-1}} \\&=\lim \limits_{r \rightarrow \infty} \frac{\bigl(1-\frac{\mu}{r} (z-1)\bigr)^{-1} \ \mu (z-1) \ r^{-2}}{r^{-2}} \\&=\mu (z-1) \end{aligned}

The middle step in (22) uses L'Hopital's rule (differentiating the numerator and the denominator with respect to r). The result in (20) is obtained by combining (21) and (22).
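The limit (20) can also be observed numerically: with the mean \mu held fixed and p=\frac{r}{r+\mu}, the negative binomial probability function approaches the Poisson probability function as r grows. A Python sketch (log-gamma is used so that large r does not overflow; \mu=2 is an illustrative choice):

```python
from math import lgamma, log, exp, factorial

mu = 2.0  # the common mean (illustrative)

def negbin_pmf(n, r):
    """Negative binomial pmf (6) with mean mu, i.e. p = r/(r + mu)."""
    p = r / (r + mu)
    log_pmf = lgamma(n + r) - lgamma(r) - lgamma(n + 1) + r * log(p) + n * log(1 - p)
    return exp(log_pmf)

def poisson_pmf(n):
    return exp(-mu) * mu**n / factorial(n)

# The worst-case pmf discrepancy shrinks as r increases.
err_small_r = max(abs(negbin_pmf(n, 5) - poisson_pmf(n)) for n in range(15))
err_large_r = max(abs(negbin_pmf(n, 100000) - poisson_pmf(n)) for n in range(15))
```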

________________________________________________________________________

Reference

  1. Klugman S. A., Panjer H. H., Willmot G. E., Loss Models: From Data to Decisions, Second Edition, Wiley-Interscience, a John Wiley & Sons, Inc., New York, 2004

________________________________________________________________________
\copyright \ \text{2011-2015 by Dan Ma}


A note about the Student’s t distribution

The Student’s t distribution is an important distribution in statistics. It is the basis for the Student’s t statistic and arises in the problem of estimating the mean of a normal population. The Student’s t distribution (t distribution for short) is usually defined as the following ratio

\displaystyle T=\frac{Z}{\sqrt{\frac{U}{n}}}=\frac{Z \sqrt{n}}{\sqrt{U}}

where Z \sim N(0,1) and U has a chi-square distribution with n degrees of freedom. For this derivation, see [1].

In this post we discuss another way of deriving the t distribution. The alternative view is through the notion of mixture (compounding in some texts). Suppose that X has a normal distribution with mean 0 and variance \Theta^{-1}. That is, X \sim N(0,\Theta^{-1}). There is uncertainty in the variance \Theta^{-1}. Further suppose that \Theta follows a gamma distribution with parameters \alpha and \beta where \alpha=\beta and \alpha is a positive integer. Then the unconditional distribution of X has a Student’s t distribution with n=2 \alpha degrees of freedom. In the language of mixture in probability, we say that the Student’s t distribution is a mixture of normal distributions with gamma mixing weights.

The joint density of X and \Theta is f_{X \lvert \Theta}(x \lvert \theta) \thinspace h_{\Theta}(\theta) where

\displaystyle f_{X \lvert \Theta}(x \lvert \theta)=\frac{\sqrt{\theta}}{\sqrt{2 \pi}} \thinspace e^{-\frac{\theta x^2}{2}} and

\displaystyle h_{\Theta}(\theta)=\frac{\alpha^{\alpha}}{\Gamma(\alpha)} \thinspace \theta^{\alpha-1} \thinspace e^{-\alpha \theta}

The marginal (unconditional) density function of X is obtained by integrating out the parameter \theta. The resulting density is that of the Student’s t distribution. The following is the derivation.

\displaystyle f_{X}(x)=\int_{0}^{\infty} f_{X \lvert \Theta}(x \lvert \theta) \thinspace h_{\Theta}(\theta) \thinspace d \theta

\displaystyle =\int_{0}^{\infty} \frac{\sqrt{\theta}}{\sqrt{2 \pi}} \thinspace e^{-\frac{\theta x^2}{2}} \thinspace \frac{\alpha^{\alpha}}{\Gamma(\alpha)} \thinspace \theta^{\alpha-1} \thinspace e^{-\alpha \theta} \thinspace d \theta

\displaystyle =\frac{\alpha^{\alpha}}{\sqrt{2 \pi} \Gamma(\alpha)} \int_{0}^{\infty} \theta^{\alpha+\frac{1}{2}-1} \thinspace e^{-(\alpha+\frac{x^2}{2}) \theta} \thinspace d \theta

\displaystyle =\frac{\alpha^{\alpha}}{\sqrt{2 \pi} \Gamma(\alpha)} \frac{\Gamma(\alpha+\frac{1}{2})}{(\alpha+\frac{x^2}{2})^{\alpha+\frac{1}{2}}}\int_{0}^{\infty} \frac{(\alpha+\frac{x^2}{2})^{\alpha+\frac{1}{2}}}{\Gamma(\alpha+\frac{1}{2})} \theta^{\alpha+\frac{1}{2}-1} \thinspace e^{-(\alpha+\frac{x^2}{2}) \theta} \thinspace d \theta

\displaystyle =\frac{\alpha^{\alpha}}{\sqrt{2 \pi} \Gamma(\alpha)} \frac{\Gamma(\alpha+\frac{1}{2})}{(\alpha+\frac{x^2}{2})^{\alpha+\frac{1}{2}}}

Now, let n=2 \alpha. Then the above density function becomes:

\displaystyle f_X(x)=\frac{\Gamma(\frac{n+1}{2})}{\sqrt{\pi n} \thinspace \Gamma(\frac{n}{2})} \biggl(\frac{n}{n+x^2}\biggr)^{\frac{n+1}{2}}

The above density function is that of a Student’s t distribution with n degrees of freedom. It is interesting to note that because of the uncertainty in the parameter \Theta, the Student’s t distribution has a longer tail than the conditional normal distribution used in the beginning of the derivation.
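The mixture derivation above can be confirmed numerically: integrating the conditional normal density against the gamma density (by a midpoint rule) should reproduce the t density with n=2\alpha degrees of freedom. A Python sketch with the illustrative choice \alpha=3 (so n=6):

```python
from math import gamma, exp, sqrt, pi

alpha = 3.0       # gamma shape = rate = alpha (illustrative)
n = 2 * alpha     # degrees of freedom

def mixture_density(x, steps=60000, upper=60.0):
    """Density of X: integrate Normal(0, 1/theta) against the Gamma(alpha, alpha) density."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        normal = sqrt(t) / sqrt(2 * pi) * exp(-t * x**2 / 2)
        gamma_pdf = alpha**alpha / gamma(alpha) * t**(alpha - 1) * exp(-alpha * t)
        total += normal * gamma_pdf * h
    return total

def t_density(x):
    """Student's t density with n degrees of freedom."""
    return gamma((n + 1) / 2) / (sqrt(pi * n) * gamma(n / 2)) * (n / (n + x**2))**((n + 1) / 2)

t_ok = all(abs(mixture_density(x) - t_density(x)) < 1e-4 for x in (0.0, 0.5, 1.0, 2.0))
```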

Reference

  1. Feller W., An Introduction to Probability Theory and Its Applications, Vol II, Second Edition, John Wiley & Sons (1971)