The previous post called The Negative Binomial Distribution gives a fairly comprehensive discussion of the negative binomial distribution. In this post, we fill in some of the details that are glossed over in that post. We cover the following points:
 The several versions of the negative binomial distribution.
 A proof that the negative binomial probabilities sum to one, i.e., that the negative binomial probability function is a valid one.
 The moment generating function of the negative binomial distribution.
 The first and second moments and the variance of the negative binomial distribution.
 An observation about the sum of independent negative binomial random variables.
________________________________________________________________________
Three versions
The negative binomial distribution has two parameters $r$ and $p$, where $r$ is a positive real number and $0<p<1$. The first two versions arise from the case that $r$ is a positive integer, which can be interpreted as the random experiment of a sequence of independent Bernoulli trials until the $r$th success (the trials have the same probability of success $p$). In this interpretation, there are two ways of recording the random experiment:

$X=$ the number of Bernoulli trials required to get the $r$th success.
$Y=$ the number of Bernoulli trials that end in failure before getting the $r$th success.
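The other parameter $p$ is the probability of success in each Bernoulli trial. The notation $\binom{m}{n}$ denotes the binomial coefficient, which, for nonnegative integers $m$ and $n$ with $m \ge n$, is defined as:
$$\binom{m}{n}=\frac{m!}{n!\,(m-n)!}=\frac{m(m-1)\cdots(m-(n-1))}{n!} \qquad (0)$$
With this in mind, the following are the probability functions of the random variables $X$ and $Y$.
$$P(X=x)=\binom{x-1}{r-1}p^r(1-p)^{x-r}, \qquad x=r,r+1,r+2,\ldots \qquad (1)$$
$$P(Y=y)=\binom{y+r-1}{y}p^r(1-p)^y, \qquad y=0,1,2,\ldots \qquad (2)$$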
The thought process for (1) is that for the event $X=x$ to happen, there must be exactly $r-1$ successes in the first $x-1$ trials and one additional success in the last trial (the $x$th trial). The thought process for (2) is that for the event $Y=y$ to happen, there are $y+r$ trials ($y$ failures and $r$ successes). In the first $y+r-1$ trials, there must be exactly $y$ failures (or equivalently $r-1$ successes), and the last trial is a success. Note that $X=Y+r$. Thus knowing the mean of $Y$ gives the mean of $X$, a fact we will use below.
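The following minimal Python sketch offers a quick numerical sanity check of (1), (2) and the relation $X=Y+r$. The function names `pmf_trials` and `pmf_failures` are hypothetical helpers introduced only for illustration. The sketch assumes $r$ is a positive integer so that `math.comb` applies.

```python
from math import comb


def pmf_trials(x: int, r: int, p: float) -> float:
    """Version (1): P(X = x), where X counts the trials needed for the r-th success."""
    if x < r:
        return 0.0
    # r - 1 successes among the first x - 1 trials, then a success on trial x
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)


def pmf_failures(y: int, r: int, p: float) -> float:
    """Version (2): P(Y = y), where Y counts the failures before the r-th success."""
    if y < 0:
        return 0.0
    # y failures among the first y + r - 1 trials, then a success on trial y + r
    return comb(y + r - 1, y) * p**r * (1 - p) ** y


r, p = 3, 0.4
for y in range(5):
    # Since X = Y + r, P(X = y + r) and P(Y = y) should agree
    print(y, pmf_failures(y, r, p), pmf_trials(y + r, r, p))
```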
Instead of memorizing the probability functions (1) and (2), it is better to understand and remember the thought processes involved. Because of the natural interpretation of performing Bernoulli trials until the $r$th success, it is a good idea to introduce the negative binomial distribution via the distributions described by (1) and (2), i.e., the case where the parameter $r$ is a positive integer. When $r=1$, the random experiment is a sequence of independent Bernoulli trials until the first success (this is called the geometric distribution).
Of course, (1) and (2) can also be used simply as counting distributions without any connection to a sequence of Bernoulli trials (e.g., in an insurance context, to model the number of losses or claims arising from a group of insurance policies).
The binomial coefficient in (0) is defined when both numbers are nonnegative integers and the top one is greater than or equal to the bottom one. However, the rightmost term in (0) can be calculated even when the top number is not a nonnegative integer. Thus when $m$ is any real number, the rightmost term in (0) can be calculated provided that the bottom number $n$ is a positive integer. For convenience we define $\binom{m}{0}=1$. With this in mind, the binomial coefficient $\binom{m}{n}$ is defined for any real number $m$ and any nonnegative integer $n$.
The third version of the negative binomial distribution arises from the relaxation of the binomial coefficient just discussed. With this in mind, the probability function in (2) can be defined for any positive real number $r$:
$$P(Y=y)=\binom{y+r-1}{y}p^r(1-p)^y, \qquad y=0,1,2,\ldots \qquad (3)$$
where $\binom{r-1}{0}=1$.
Of course when $r$ is a positive integer, versions (2) and (3) are identical. When $r$ is a positive real number but not an integer, the distribution cannot be interpreted as the number of failures until the occurrence of the $r$th success. Instead, it is used as a counting distribution.
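The relaxed binomial coefficient and version (3) can be sketched in Python as shown below. This is an illustrative sketch, not code from the original post, and `gen_binom` and `nb_pmf` are hypothetical names. `gen_binom` evaluates the rightmost form of (0) for any real top number. The last computation checks that (3) reduces to (2) when $r$ is an integer.

```python
import math


def gen_binom(m: float, n: int) -> float:
    """Binomial coefficient via the rightmost form of (0): m(m-1)...(m-(n-1)) / n!.

    Valid for any real m and any nonnegative integer n; the empty product gives C(m, 0) = 1.
    """
    if n < 0:
        raise ValueError("n must be a nonnegative integer")
    result = 1.0
    for k in range(n):
        result *= (m - k) / (k + 1)
    return result


def nb_pmf(y: int, r: float, p: float) -> float:
    """Version (3): P(Y = y) for any positive real r and 0 < p < 1."""
    if y < 0:
        return 0.0
    return gen_binom(y + r - 1, y) * p**r * (1 - p) ** y


p = 0.3
# Non-integer r: a pure counting distribution with no "r-th success" reading
for y in range(5):
    print(y, nb_pmf(y, 2.5, p))

# Integer r: version (3) should match version (2) computed with math.comb
r_int = 3
max_gap = max(
    abs(nb_pmf(y, r_int, p) - math.comb(y + r_int - 1, y) * p**r_int * (1 - p) ** y)
    for y in range(20)
)
print("largest gap between (2) and (3):", max_gap)
```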
________________________________________________________________________
The probabilities sum to one
Do the probabilities in (1), (2) or (3) sum to one? For the interpretations of (1) and (2), is it possible to repeatedly perform Bernoulli trials and never get the $r$th success? For $r=1$, is it possible to never even get a success? In tossing a fair coin repeatedly, soon enough you will get a head, and even if $r$ is a large number, you will eventually get $r$ heads. Here we wish to prove this fact mathematically.
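To show that (1), (2) and (3) are indeed probability functions, we use a fact concerning Maclaurin’s series expansion of the function $(1-x)^{-r}$, a fact that is covered in a calculus course. In the following two results, $r$ is a fixed positive real number and $y$ is any nonnegative integer:
$$\binom{y+r-1}{y}=(-1)^y\binom{-r}{y} \qquad (4)$$
$$\sum_{y=0}^\infty(-1)^y\binom{-r}{y}x^y=(1-x)^{-r}, \qquad -1<x<1 \qquad (5)$$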
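Result (4) rearranges the binomial coefficient in probability function (3) into another binomial coefficient with a negative top number. This is why there is the word “negative” in negative binomial distribution. Result (5) is the Maclaurin’s series expansion for the function $(1-x)^{-r}$. We first derive these two facts and then use them to show that the negative binomial probabilities in (3) sum to one. The following derives (4).
$$\binom{y+r-1}{y}=\frac{(y+r-1)(y+r-2)\cdots(r+1)r}{y!}=(-1)^y\frac{(-r)(-r-1)\cdots(-r-(y-1))}{y!}=(-1)^y\binom{-r}{y}$$
The second equality factors $-1$ out of each of the $y$ factors in the numerator, and the last equality is the definition (0) with top number $-r$.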
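Identity (4) is easy to spot-check numerically, including for non-integer $r$. The sketch below is a hypothetical illustration that uses a generalized binomial coefficient helper `gen_binom` (the rightmost form of (0)). Floating-point agreement on a few values is evidence, not a proof.

```python
import math


def gen_binom(m: float, n: int) -> float:
    """m(m-1)...(m-(n-1)) / n!, for real m and nonnegative integer n."""
    result = 1.0
    for k in range(n):
        result *= (m - k) / (k + 1)
    return result


def identity_4_sides(r: float, y: int) -> tuple[float, float]:
    """Return the left side C(y+r-1, y) and right side (-1)^y C(-r, y) of (4)."""
    lhs = gen_binom(y + r - 1, y)
    rhs = (-1) ** y * gen_binom(-r, y)
    return lhs, rhs


for r in (1, 2.5, 7.25):
    for y in range(4):
        lhs, rhs = identity_4_sides(r, y)
        print(f"r={r}, y={y}: lhs={lhs:.6f}, rhs={rhs:.6f}, match={math.isclose(lhs, rhs)}")
```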
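To derive (5), let $f(x)=(1-x)^{-r}$. Based on a theorem that can be found in most calculus texts, the function $f(x)$ has the following Maclaurin’s series expansion (Maclaurin’s series is simply Taylor’s series with center $0$).
$$(1-x)^{-r}=f(0)+f'(0)x+\frac{f''(0)}{2!}x^2+\cdots+\frac{f^{(y)}(0)}{y!}x^y+\cdots$$
where $-1<x<1$. Now, filling in the derivatives $f^{(y)}(0)$, we have the following derivation. Since $f^{(y)}(x)=r(r+1)\cdots(r+y-1)(1-x)^{-r-y}$, we have $f^{(y)}(0)=r(r+1)\cdots(r+y-1)$, and therefore
$$(1-x)^{-r}=1+\sum_{y=1}^\infty\frac{r(r+1)\cdots(r+y-1)}{y!}x^y=\sum_{y=0}^\infty\binom{y+r-1}{y}x^y=\sum_{y=0}^\infty(-1)^y\binom{-r}{y}x^y$$
The second equality follows from the definition (0), since $\binom{y+r-1}{y}=\frac{(y+r-1)(y+r-2)\cdots r}{y!}$, and the last equality is (4).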
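Here is a minimal numerical check of (5). It assumes the same kind of hypothetical `gen_binom` helper, truncates the series, and compares the partial sum with $(1-x)^{-r}$ at a few points inside $-1<x<1$. The default of 400 terms is an arbitrary assumption that is ample for the points shown. Closer to $x=\pm 1$, more terms would be needed.

```python
import math


def gen_binom(m: float, n: int) -> float:
    """m(m-1)...(m-(n-1)) / n!, for real m and nonnegative integer n."""
    result = 1.0
    for k in range(n):
        result *= (m - k) / (k + 1)
    return result


def series_5(x: float, r: float, terms: int = 400) -> float:
    """Partial sum of (5): sum over y < terms of (-1)^y C(-r, y) x^y."""
    return sum((-1) ** y * gen_binom(-r, y) * x**y for y in range(terms))


r = 2.5
for x in (-0.5, 0.2, 0.9):
    approx = series_5(x, r)
    exact = (1 - x) ** (-r)
    print(f"x={x}: series={approx:.10f}, (1-x)^(-r)={exact:.10f}, close={math.isclose(approx, exact)}")
```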
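We can now show that the negative binomial probabilities in (3) sum to one. Let $q=1-p$.
$$\sum_{y=0}^\infty\binom{y+r-1}{y}p^r q^y=p^r\sum_{y=0}^\infty(-1)^y\binom{-r}{y}q^y=p^r(1-q)^{-r}=p^r p^{-r}=1$$
The first equality uses (4), and the second uses (5) with $x=q$, which is valid since $0<q<1$.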
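The same conclusion can be checked numerically. The sketch below (hypothetical function name `nb_partial_sum`) accumulates the probabilities in (3) using the ratio $P(Y=y+1)/P(Y=y)=\frac{y+r}{y+1}(1-p)$. This ratio follows from (0) and avoids recomputing binomial coefficients. The partial sums should approach 1 for any positive real $r$.

```python
import math


def nb_partial_sum(r: float, p: float, n_terms: int) -> float:
    """Sum of the version-(3) probabilities P(Y = 0), ..., P(Y = n_terms - 1).

    Uses P(Y = y + 1) = P(Y = y) * (y + r) / (y + 1) * (1 - p), which follows
    from C(y + r, y + 1) = C(y + r - 1, y) * (y + r) / (y + 1).
    """
    prob = p**r  # P(Y = 0), because C(r - 1, 0) = 1
    total = 0.0
    for y in range(n_terms):
        total += prob
        prob *= (y + r) / (y + 1) * (1 - p)
    return total


for r, p in ((2.5, 0.3), (0.5, 0.05), (10, 0.6)):
    partial = [nb_partial_sum(r, p, n) for n in (10, 100, 2000)]
    print(f"r={r}, p={p}: partial sums {partial}, close to 1: {math.isclose(partial[-1], 1.0)}")
```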
________________________________________________________________________
The moment generating function
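We now derive the moment generating function of the negative binomial distribution according to (3). The moment generating function is $M_Y(t)=E(e^{tY})$ over all real numbers $t$ for which $M_Y(t)$ is defined. The following derivation does the job.
$$M_Y(t)=\sum_{y=0}^\infty e^{ty}\binom{y+r-1}{y}p^r(1-p)^y=p^r\sum_{y=0}^\infty\binom{y+r-1}{y}\left[(1-p)e^t\right]^y=p^r\left[1-(1-p)e^t\right]^{-r}=\left(\frac{p}{1-(1-p)e^t}\right)^r$$
The step from the second to the third expression uses (4) and (5) with $x=(1-p)e^t$, which requires $(1-p)e^t<1$.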
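The following sketch (hypothetical names `nb_mgf` and `mgf_by_summation`) compares the closed form with a truncated version of the defining series. It builds the series term by term with the same ratio idea as before. It assumes $t$ lies in the region $(1-p)e^t<1$, where the series converges.

```python
import math


def nb_mgf(t: float, r: float, p: float) -> float:
    """Closed form M_Y(t) = (p / (1 - (1 - p) e^t))^r for version (3)."""
    base = (1 - p) * math.exp(t)
    if base >= 1:
        raise ValueError("M_Y(t) is undefined unless (1 - p) * exp(t) < 1")
    return (p / (1 - base)) ** r


def mgf_by_summation(t: float, r: float, p: float, n_terms: int = 5000) -> float:
    """Truncated sum of e^{ty} P(Y = y) = p^r C(y + r - 1, y) [(1 - p) e^t]^y."""
    base = (1 - p) * math.exp(t)
    term = p**r  # the y = 0 term
    total = 0.0
    for y in range(n_terms):
        total += term
        term *= (y + r) / (y + 1) * base
    return total


r, p = 2.5, 0.3
for t in (-1.0, 0.0, 0.1, 0.3):
    print(t, nb_mgf(t, r, p), mgf_by_summation(t, r, p))
```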
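The above moment generating function works for the negative binomial distribution with respect to (3) and thus to (2). For the distribution in (1), note that $X=Y+r$. Thus $E(e^{tX})=E(e^{t(Y+r)})=e^{tr}E(e^{tY})$. The moment generating function of (1) is simply the above moment generating function multiplied by the factor $e^{tr}$. To summarize, the moment generating functions for the three versions are:
$$M_X(t)=\left(\frac{pe^t}{1-(1-p)e^t}\right)^r \qquad \text{for version (1)}$$
$$M_Y(t)=\left(\frac{p}{1-(1-p)e^t}\right)^r \qquad \text{for versions (2) and (3)}$$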
The domain of the moment generating function is the set of all $t$ for which $M_X(t)$ or $M_Y(t)$ is defined and positive. Based on the form these functions take, we need $1-(1-p)e^t>0$. This leads to the domain $-\infty<t<-\ln(1-p)$.
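The relation $M_X(t)=e^{rt}M_Y(t)$ and the domain $t<-\ln(1-p)$ can also be checked numerically. In this hypothetical sketch, `mgf_trials_by_summation` sums $e^{tx}P(X=x)$ directly from (1), so it assumes $r$ is a positive integer. The other function names are likewise illustrative.

```python
import math


def mgf_failures(t: float, r: float, p: float) -> float:
    """M_Y(t) for versions (2) and (3)."""
    return (p / (1 - (1 - p) * math.exp(t))) ** r


def mgf_trials(t: float, r: float, p: float) -> float:
    """M_X(t) for version (1): since X = Y + r, M_X(t) = e^{rt} M_Y(t)."""
    return math.exp(r * t) * mgf_failures(t, r, p)


def mgf_upper_limit(p: float) -> float:
    """The series defining both MGFs converge exactly when (1 - p) e^t < 1, i.e. t < -ln(1 - p)."""
    return -math.log(1 - p)


def mgf_trials_by_summation(t: float, r: int, p: float, n_terms: int = 1500) -> float:
    """Truncated E(e^{tX}) using pmf (1); e^{tx}(1-p)^{x-r} is rewritten as e^{rt}[(1-p)e^t]^{x-r}."""
    base = (1 - p) * math.exp(t)
    return sum(
        math.comb(x - 1, r - 1) * p**r * math.exp(r * t) * base ** (x - r)
        for x in range(r, r + n_terms)
    )


r, p = 3, 0.3
print("MGFs defined for t <", mgf_upper_limit(p))  # -ln(0.7), roughly 0.357
for t in (-0.5, 0.0, 0.2):
    print(t, mgf_trials(t, r, p), mgf_trials_by_summation(t, r, p))
```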
________________________________________________________________________
The mean and the variance
With the moment generating function derived in the above section, we can now focus on finding the moments of the negative binomial distribution. To find the moments, simply take the derivatives of the moment generating function and evaluate at $t=0$. For the distribution represented by the probability function in (3), we calculate the following:
$$M_Y'(t)=rp^r(1-p)e^t\left[1-(1-p)e^t\right]^{-r-1}$$
$$M_Y''(t)=rp^r(1-p)e^t\left[1-(1-p)e^t\right]^{-r-1}+r(r+1)p^r(1-p)^2e^{2t}\left[1-(1-p)e^t\right]^{-r-2}$$
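After taking the first and second derivatives and evaluating them at $t=0$, we obtain the first and second moments:
$$E(Y)=M_Y'(0)=\frac{r(1-p)}{p}$$
$$E(Y^2)=M_Y''(0)=\frac{r(1-p)}{p}+\frac{r(r+1)(1-p)^2}{p^2}=\frac{r(1-p)\left[1+r(1-p)\right]}{p^2}$$
For version (1), since $X=Y+r$, the mean is $E(X)=E(Y)+r=\frac{r}{p}$.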
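As a cross-check of these two moments, one can differentiate the moment generating function numerically. The sketch below uses central finite differences at $t=0$. The step size `h=1e-4` is an assumption chosen for double precision, so the agreement is only approximate.

```python
import math


def nb_mgf(t: float, r: float, p: float) -> float:
    """M_Y(t) = (p / (1 - (1 - p) e^t))^r for versions (2) and (3)."""
    return (p / (1 - (1 - p) * math.exp(t))) ** r


def moments_by_finite_differences(r: float, p: float, h: float = 1e-4) -> tuple[float, float]:
    """Approximate E(Y) = M'(0) and E(Y^2) = M''(0) with central differences."""
    m_plus, m_zero, m_minus = nb_mgf(h, r, p), nb_mgf(0.0, r, p), nb_mgf(-h, r, p)
    first = (m_plus - m_minus) / (2 * h)
    second = (m_plus - 2 * m_zero + m_minus) / h**2
    return first, second


r, p = 2.5, 0.3
q = 1 - p
first, second = moments_by_finite_differences(r, p)
print("E(Y):  ", first, "vs formula", r * q / p)
print("E(Y^2):", second, "vs formula", r * q * (1 + r * q) / p**2)
print("E(X):  ", first + r, "vs formula", r / p)  # X = Y + r
```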
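The following derives the variance.
$$Var(Y)=E(Y^2)-\left[E(Y)\right]^2=\frac{r(1-p)\left[1+r(1-p)\right]}{p^2}-\frac{r^2(1-p)^2}{p^2}=\frac{r(1-p)}{p^2}$$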
The above formula is the variance for all three versions (1), (2) and (3); since $X=Y+r$ differs from $Y$ by a constant, $Var(X)=Var(Y)$. Note that for versions (2) and (3), $Var(Y)=\frac{r(1-p)}{p^2}>\frac{r(1-p)}{p}=E(Y)$ because $0<p<1$. In contrast, the variance of the Poisson distribution is identical to its mean. Thus in situations where the variance of the observed data is greater than the sample mean, the negative binomial distribution is likely to be a better fit than the Poisson distribution.
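To make the comparison with the Poisson distribution concrete, the sketch below (hypothetical helper `nb_mean_var`) computes the mean and variance of version (3) directly from truncated sums of the probabilities, rather than from the moment generating function. The ratio of variance to mean comes out as $1/p$, which exceeds 1 for every $0<p<1$. A Poisson distribution would give exactly 1.

```python
import math


def nb_mean_var(r: float, p: float, n_terms: int = 5000) -> tuple[float, float]:
    """Mean and variance of version (3) from truncated sums over y = 0, ..., n_terms - 1."""
    prob = p**r  # P(Y = 0)
    s0 = s1 = s2 = 0.0
    for y in range(n_terms):
        s0 += prob
        s1 += y * prob
        s2 += y * y * prob
        prob *= (y + r) / (y + 1) * (1 - p)  # becomes P(Y = y + 1)
    mean = s1 / s0
    return mean, s2 / s0 - mean**2


for r, p in ((2.5, 0.3), (1, 0.5), (4, 0.9)):
    mean, var = nb_mean_var(r, p)
    print(f"r={r}, p={p}: mean={mean:.4f}, variance={var:.4f}, "
          f"variance/mean={var / mean:.4f}, equals 1/p: {math.isclose(var / mean, 1 / p)}")
```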
________________________________________________________________________
The independent sum
There is an easy consequence of the moment generating function derived above: the sum of several independent negative binomial random variables with a common parameter $p$ is also negative binomial. For example, suppose $Y_1,Y_2,\ldots,Y_n$ are independent negative binomial random variables (version (3)), where each $Y_i$ has parameters $r_i$ and $p$ (the second parameter is the same for all $i$). The moment generating function of an independent sum is the product of the individual moment generating functions. Thus the following is the moment generating function of $Y=Y_1+Y_2+\cdots+Y_n$.
$$M_Y(t)=\prod_{i=1}^n\left(\frac{p}{1-(1-p)e^t}\right)^{r_i}=\left(\frac{p}{1-(1-p)e^t}\right)^{r}$$
where $r=r_1+r_2+\cdots+r_n$. The moment generating function uniquely identifies the distribution. The above is that of a negative binomial distribution with parameters $r$ and $p$ according to (3).
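The additivity can also be seen without moment generating functions by convolving probability functions. The following sketch (hypothetical names `nb_pmf_list` and `convolve`) convolves version-(3) probability functions that share a common $p$ and have non-integer $r_i$. It then compares the result with a single negative binomial probability function whose parameter is $r_1+r_2+r_3$. The convolution at $y$ uses only values at $0,\ldots,y$, so truncating every list at the same length introduces no error in the entries that are kept.

```python
import math
from functools import reduce


def nb_pmf_list(r: float, p: float, n: int) -> list[float]:
    """P(Y = 0), ..., P(Y = n - 1) for version (3), via the ratio (y + r)/(y + 1) * (1 - p)."""
    probs, prob = [], p**r
    for y in range(n):
        probs.append(prob)
        prob *= (y + r) / (y + 1) * (1 - p)
    return probs


def convolve(a: list[float], b: list[float]) -> list[float]:
    """Probability function of the sum of two independent counts, truncated to len(a)."""
    return [sum(a[k] * b[y - k] for k in range(y + 1)) for y in range(len(a))]


p, n = 0.4, 30
r_values = [0.5, 1.25, 2.0]
sum_pmf = reduce(convolve, (nb_pmf_list(r, p, n) for r in r_values))
direct = nb_pmf_list(sum(r_values), p, n)
print("largest difference:", max(abs(a - b) for a, b in zip(sum_pmf, direct)))
print("agree:", all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(sum_pmf, direct)))
```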
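A special case is that the sum of $n$ independent geometric random variables, each with parameter $p$ (counting the failures before the first success, i.e., version (2) with $r=1$), has a negative binomial distribution with $r=n$. The following is the moment generating function of the sum $Y$ of $n$ independent geometric random variables.
$$M_Y(t)=\left(\frac{p}{1-(1-p)e^t}\right)^{n}$$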
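For the geometric special case, a small Monte Carlo sketch makes the statement tangible. It simulates each geometric count as the number of failures before the first success, adds $n$ of them, and compares the empirical frequencies with the negative binomial probabilities for $r=n$. The seed, sample size and parameter values below are arbitrary assumptions. The simulated frequencies match the exact probabilities only up to sampling error.

```python
import math
import random
from collections import Counter


def geometric_failures(rng: random.Random, p: float) -> int:
    """Failures before the first success in Bernoulli(p) trials (version (2) with r = 1)."""
    failures = 0
    while rng.random() >= p:  # a draw below p counts as a success
        failures += 1
    return failures


def simulate_geometric_sum(n: int, p: float, samples: int, seed: int = 2024) -> dict[int, float]:
    """Empirical frequencies of the sum of n independent geometric counts."""
    rng = random.Random(seed)
    counts = Counter(
        sum(geometric_failures(rng, p) for _ in range(n)) for _ in range(samples)
    )
    return {y: c / samples for y, c in counts.items()}


def nb_pmf_int(y: int, r: int, p: float) -> float:
    """Version (2) probability with integer r."""
    return math.comb(y + r - 1, y) * p**r * (1 - p) ** y


n, p = 3, 0.4
freqs = simulate_geometric_sum(n, p, samples=100_000)
for y in range(6):
    print(y, round(freqs.get(y, 0.0), 4), round(nb_pmf_int(y, n, p), 4))
```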
________________________________________________________________________