A natural look at the negative binomial survival function

The negative binomial distribution is a discrete distribution with two parameters r and p, where r>0 and 0<p<1. It has positive probabilities at the non-negative integers 0,1,2,\cdots, so it can potentially serve as a model for the random count of a phenomenon of interest. In some cases, the negative binomial distribution has a natural interpretation; in fact, this natural interpretation is how the distribution ought to be introduced. When the parameter r is a positive integer, the negative binomial distribution can be naturally interpreted as the discrete waiting time until the rth success. This interpretation does not apply to the general case, where r is any positive real number and not necessarily an integer. However, once the natural case of integer r is understood, it is easy to make the leap to the general case. In this post, we focus on the “natural” version of the negative binomial distribution, where r is a positive integer. In this case, there is a natural way to look at the cumulative distribution function (cdf) and the survival function. The discussion in this post complements earlier posts on the negative binomial distribution.

________________________________________________________________________

The probability functions

A Bernoulli trial is a random experiment with two distinct outcomes, called success (S) and failure (F) for convenience. Consider performing a sequence of independent Bernoulli trials such that each trial has the same probability of success p. Fix a positive integer r. In some cases, we would like to count the number of trials required to obtain the rth success; let’s call this count X_r. In other cases, we may instead want to count the number of failures before getting the rth success; let’s call this count Y_r. As discussed in earlier posts on the negative binomial distribution, the probability functions of X_r and Y_r are:

    \displaystyle P(X_r=x)= \binom{x-1}{r-1} p^r (1-p)^{x-r} \ \ \ \ \ \ \ x=r,r+1,r+2,\cdots \ \ \ \ \ \ \ (1)

    \displaystyle P(Y_r=y)=\binom{y+r-1}{y} p^r (1-p)^y \ \ \ \ \ \ \ y=0,1,2,\cdots \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)

When r=1, the resulting distribution is called the geometric distribution.

    \displaystyle P(X_1=x)= p (1-p)^{x-1} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x=1,2,3,\cdots \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1a)

    \displaystyle P(Y_1=y)= p (1-p)^y \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ y=0,1,2,\cdots \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2a)

Because the parameter r is a positive integer, all the above probability functions have a combinatorial explanation. For the probability function in (1), there are x trials in total. The last Bernoulli trial is the rth success. Then the preceding x-1 trials contain exactly r-1 successes, hence the binomial coefficient \binom{x-1}{r-1} in (1). For the probability function in (2), there are y+r trials (r successes and y failures). Again, the last Bernoulli trial is the rth success. So among the preceding y+r-1 trials, there are exactly r-1 successes and exactly y failures, hence the binomial coefficient \binom{y+r-1}{y} in (2). Let’s look at some examples.

Example 1
Four different prizes are randomly put into boxes of cereal. One of the prizes is a free ticket to the local zoo. Suppose that a family of four (called Family A) decides to buy this cereal until obtaining four free tickets to the zoo. What is the probability that the family will have to buy 10 boxes of cereal to obtain the four free tickets to the zoo? What is the probability that the family will have to buy 16 boxes of cereal to obtain the four free tickets to the zoo?

In this example, a success is a box of cereal containing a free ticket to the zoo. Any one of the other three prizes is considered undesirable, i.e., a failure. Since any one of the four prizes is equally likely, the negative binomial distribution in this example has r=4 and p=0.25. The count of boxes of cereal to be purchased is the random variable X_4 as described in (1) above. The following gives the answers.

    \displaystyle \begin{aligned} P(X_4=10)&=\binom{10-1}{3} \ (0.25)^4 \ (0.75)^{10-4} \\&=\binom{9}{3} \ (0.25)^4 \ (0.75)^{6} \\&=84 \ (0.25)^4 \ (0.75)^{6} \\&=0.0583992  \end{aligned}

    \displaystyle \begin{aligned} P(X_4=16)&=\binom{16-1}{3} \ (0.25)^4 \ (0.75)^{16-4} \\&=\binom{15}{3} \ (0.25)^4 \ (0.75)^{12} \\&=455 \ (0.25)^4 \ (0.75)^{12} \\&=0.056299766  \end{aligned}
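
The computation in (1) is easy to script. Here is a minimal Python sketch (the function name is my own) that reproduces the two answers above:

```python
from math import comb

def negbin_trials_pmf(x, r, p):
    """P(X_r = x) from (1): the r-th success occurs on trial x."""
    # The last trial is the r-th success; the preceding x-1 trials
    # contain exactly r-1 successes in some order.
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

print(round(negbin_trials_pmf(10, 4, 0.25), 7))  # 0.0583992
print(round(negbin_trials_pmf(16, 4, 0.25), 9))  # 0.056299766
```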

Example 2 (Example 1 continued)
Suppose Family A agrees to give any one of the undesirable prizes away to another family (called Family B). What is the probability that Family A will give 5 undesirable prizes to Family B before obtaining the four desirable tickets to the zoo? What is the probability that Family A will give 12 undesirable prizes to Family B before obtaining the four tickets to the zoo?

The negative binomial distribution in this example has r=4 and p=0.25. The interest here is to count the number of failures (undesirable prizes) before getting 4 successes. Thus the random variable of interest is Y_4 as described in (2) above. The following gives the answers.

    \displaystyle \begin{aligned} P(Y_4=5)&=\binom{5+4-1}{5} \ (0.25)^4 \ (0.75)^{5} \\&=\binom{8}{5} \ (0.25)^4 \ (0.75)^{5} \\&=56 \ (0.25)^4 \ (0.75)^{5} \\&=0.0519104  \end{aligned}

    \displaystyle \begin{aligned} P(Y_4=12)&=\binom{12+4-1}{12} \ (0.25)^4 \ (0.75)^{12} \\&=\binom{15}{12} \ (0.25)^4 \ (0.75)^{12} \\&=455 \ (0.25)^4 \ (0.75)^{12} \\&=0.056299766  \end{aligned}

Here are the means and the variance for both examples.

    \displaystyle E(X_4)=4 \ \frac{1}{0.25}=16

    \displaystyle E(Y_4)=4 \ \frac{0.75}{0.25}=12

    \displaystyle Var(X_4)=Var(Y_4)=4 \ \frac{0.75}{0.25^2}=48

Thus Family A is expected to buy 16 boxes of cereal to get the 4 tickets to the zoo and is expected to give 12 prizes to the other family. However, the variance is fairly large. As a result, the actual number of boxes purchased can vary from the mean by a large amount.
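
The means and the variance above follow from the general formulas E(X_r)=r/p, E(Y_r)=r(1-p)/p and Var(X_r)=Var(Y_r)=r(1-p)/p^2. A quick Monte Carlo simulation (a sketch; the seed and the sample size are arbitrary choices) illustrates both the expected value and the large spread:

```python
import random

def trials_until_rth_success(r, p, rng):
    """Run Bernoulli trials until the r-th success; return the trial count."""
    successes = trials = 0
    while successes < r:
        trials += 1
        if rng.random() < p:
            successes += 1
    return trials

rng = random.Random(2015)
samples = [trials_until_rth_success(4, 0.25, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2))  # close to E(X_4) = 16 and Var(X_4) = 48
```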

_______________________________________________________________________

The survival functions and the cumulative distribution functions

For any random variable T, the cumulative distribution function is P(T \le t), where t ranges over the real numbers (or a relevant subset of the real numbers). The survival function is P(T > t). The term survival comes from the interpretation that, if T models the lifetime of a biological organism or some system, P(T > t) is the probability of surviving beyond time t. Even when T is not a lifetime random variable, we still use the term survival function for P(T > t).

Example 1 asks for the probability that a given number of trials is needed to get the rth success, and Example 2 asks for the probability of a given number of failures before the rth success. Sometimes it is more informative to know how many trials are required to achieve one’s goal. For example, it may be useful to know the mean number of trials, or the probability of achieving the goal in x trials or fewer. In some cases, it takes time and resources to perform the Bernoulli trials, and it is helpful to know ahead of time the likelihood of achieving one’s goal given the available resources. In the above examples, it would help Family A to have a clearer sense of how many boxes of cereal they may need to buy. Therefore, it is worthwhile to look at the cdf and the survival function of the negative binomial distribution.

From first principles, here are the survival functions for the distributions described in (1) and (2).

    \displaystyle \begin{aligned} P(X_r>x)&=\sum \limits_{k=x+1}^\infty P(X_r=k) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x=r,r+1,r+2,\cdots \\&=\sum \limits_{k=x+1}^\infty \binom{k-1}{r-1} p^r (1-p)^{k-r} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3) \end{aligned}

    \displaystyle \begin{aligned} P(Y_r>y)&=\sum \limits_{j=y+1}^\infty P(Y_r=j) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ y= 0,1,2,3,\cdots \\&=\sum \limits_{j=y+1}^\infty \binom{j+r-1}{j} p^r (1-p)^{j} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4) \end{aligned}

The cumulative distribution functions can be defined from first principles or obtained by taking the complement as follows:

    \displaystyle \begin{aligned} P(X_r \le x)&=\sum \limits_{k=r}^x P(X_r=k) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x=r,r+1,r+2,\cdots \\&=1-P(X_r>x)  \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (5)\end{aligned}

    \displaystyle \begin{aligned} P(Y_r \le y)&=\sum \limits_{j=0}^y P(Y_r=j) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ y=0,1,2,3,\cdots \\&=1-P(Y_r>y)  \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (6)\end{aligned}

Example 3 (the above examples continued)
What is the probability that Family A will buy more than 16 boxes in order to obtain the 4 free tickets to the zoo? What is the probability that Family A will give away at most 10 undesirable prizes before obtaining the 4 free tickets to the zoo?

Working from the definitions, here are the answers:

    \displaystyle P(X_4>16)=1-P(X_4 \le 16)=0.40498711

    \displaystyle P(Y_4 \le 10)=0.47866004

The mean number of boxes of cereal purchased is 16, as indicated earlier. Since the variance is large, there is still a significant chance (about 40%) that Family A will have to buy more than 16 boxes of cereal before achieving their goal. On the other hand, there is a good chance (about 48%) that Family A will give away at most 10 undesirable prizes.

Note that the calculation for P(X_4>16) is based on P(X_4 \le 16), which requires the calculation of 13 probabilities; the calculation for P(Y_4 \le 10) requires 11 probabilities. Such calculations can of course be done by software. But there is a natural way of looking at the calculations in (3), (4), (5) and (6), and this alternative approach gives much better insight into the negative binomial distribution.

_______________________________________________________________________

A more natural way of interpreting the survival function

We now discuss a better way to look at the survival functions in (3) and (4). Consider the event X_r>x and the event Y_r>y. We will see that the negative binomial survival function can be related to the cdf of a binomial distribution.

For the event X_r>x to occur, the rth success occurs after performing x trials. So it will take x+1 trials or more to get the rth success. This means that in the first x trials, there are at most r-1 successes. The following highlights the equivalent statements.

    \displaystyle \begin{aligned} X_r>x&\equiv \text{the } r \text{th success occurs after performing } x \text{ trials} \\&\equiv \text{it takes at least } x+1 \text{ trials to get the } r \text{th success}  \\&\equiv \text{in the first } x \text{ trials, there are at most } r-1 \text{ successes}  \end{aligned}

The last statement describes a binomial distribution, specifically the binomial distribution with x trials and probability of success p. Let’s denote the count of successes of this binomial distribution by B_{x,p}. Thus we can relate the survival function in (3) to the cdf of B_{x,p} as follows:

    \displaystyle \begin{aligned} P(X_r>x)&=P(B_{x,p} \le r-1) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x=r,r+1,r+2,\cdots \\&=\sum \limits_{k=0}^{r-1} \ P(B_{x,p}=k)  \\&=\sum \limits_{k=0}^{r-1} \ \binom{x}{k} \ p^k (1-p)^{x-k} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (7)   \end{aligned}

The advantage of (7) is the insight it gives: it relates the negative binomial distribution to the binomial distribution. In terms of computation, (7) requires only r binomial probabilities. Thus Example 3 requires the computation of only 4 probabilities (versus 13 previously).
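
As a concrete check, both routes can be computed side by side in Python (a small sketch; the function names are my own):

```python
from math import comb

def surv_via_negbin(x, r, p):
    """P(X_r > x) by summing the probability function (1): x - r + 1 terms."""
    return 1 - sum(comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)
                   for k in range(r, x + 1))

def surv_via_binom(x, r, p):
    """P(X_r > x) = P(B_{x,p} <= r-1) from (7): only r terms."""
    return sum(comb(x, k) * p**k * (1 - p)**(x - k) for k in range(r))

# Both routes give P(X_4 > 16) for Example 3:
print(round(surv_via_negbin(16, 4, 0.25), 8))  # 0.40498711
print(round(surv_via_binom(16, 4, 0.25), 8))   # 0.40498711
```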

Note that X_r=Y_r+r. Thus the event Y_r>y is the same as the event X_r>y+r, so we can simply piggyback on the work done in (7). For the sake of clarity, here is a translation for the event Y_r>y.

    \displaystyle \begin{aligned} Y_r>y&\equiv X_r>y+r \\&\equiv \text{the } r \text{th success occurs after performing } y+r \text{ trials} \\&\equiv \text{it takes at least } y+r+1 \text{ trials to get the } r \text{th success}  \\&\equiv \text{in the first } y+r \text{ trials, there are at most } r-1 \text{ successes}  \end{aligned}

As before, let B_{n,p} denote the number of successes in performing n Bernoulli trials with p as the probability of success. Based on the above translation, the following gives the survival function for the negative binomial random variable Y_r.

    \displaystyle \begin{aligned} P(Y_r>y)&=P(X_r>y+r)   \\&=P(B_{y+r,p} \le r-1) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ y=0,1,2,\cdots \\&=\sum \limits_{j=0}^{r-1} \ P(B_{y+r,p}=j)  \\&=\sum \limits_{j=0}^{r-1} \ \binom{y+r}{j} \ p^j (1-p)^{y+r-j} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (8)   \end{aligned}

For the negative binomial random variables X_r and Y_r, the survival functions can be computed using (7) and (8), respectively. Rote memorization of the formulas (7) and (8) is not recommended. Instead, focus on the thought process that translates the events X_r>x and Y_r>y into a binomial distribution.

Example 4 (the above examples continued)
We now rework Example 3 using the ideas presented in (7) and (8).

    \displaystyle \begin{aligned} P(X_4>16)&=\sum \limits_{k=0}^{3} \ \binom{16}{k} \ 0.25^k \ 0.75^{16-k}=0.40498711    \end{aligned}

    \displaystyle \begin{aligned} P(Y_4 \le 10)&=1-P(Y_4>10)   \\&=1-\sum \limits_{j=0}^{3} \ \binom{14}{j} \ 0.25^j \ 0.75^{14-j} \\&=1-0.52133996  \\&=0.47866004   \end{aligned}
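
Both computations in Example 4 can be verified with a few lines of Python (a sketch using only the standard library; the function name is my own):

```python
from math import comb

def binom_cdf(n, p, m):
    """P(B_{n,p} <= m): cdf of the binomial distribution with n trials."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(m + 1))

# (7): P(X_4 > 16) = P(B_{16, 0.25} <= 3)
print(round(binom_cdf(16, 0.25, 3), 8))      # 0.40498711
# (8): P(Y_4 <= 10) = 1 - P(Y_4 > 10) = 1 - P(B_{14, 0.25} <= 3)
print(round(1 - binom_cdf(14, 0.25, 3), 8))  # 0.47866004
```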

Example 5 (the above examples continued)
What is the median number of boxes of cereal purchased by Family A in order to obtain 4 boxes containing a free ticket to the zoo? What is the median number of boxes with undesirable prizes purchased by Family A?

We have the following probabilities.

    \displaystyle P(X_4>14)=\sum \limits_{k=0}^{3} \ \binom{14}{k} \ 0.25^k \ 0.75^{14-k}=0.52133996

    \displaystyle P(X_4>15)=\sum \limits_{k=0}^{3} \ \binom{15}{k} \ 0.25^k \ 0.75^{15-k}=0.461286876

    \displaystyle P(X_4 \le 14)=1-0.52133996=0.47866004

    \displaystyle P(X_4 \le 15)=1-0.461286876=0.538713124

This means that the median number of boxes to be purchased is 15. One way to look at it is that x= 15 is the first number such that P(X_4 \le x) is greater than 0.5. Then the median number of boxes with undesirable prizes is 11 (15 less 4).
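
This search for the median can be automated using the identity (7); the following Python sketch (the function names are my own) reproduces the answer:

```python
from math import comb

def cdf_trials(x, r, p):
    """P(X_r <= x) = 1 - P(B_{x,p} <= r-1), using (7)."""
    return 1 - sum(comb(x, k) * p**k * (1 - p)**(x - k) for k in range(r))

def median_trials(r, p):
    """Smallest x with P(X_r <= x) greater than 0.5."""
    x = r
    while cdf_trials(x, r, p) <= 0.5:
        x += 1
    return x

print(median_trials(4, 0.25))  # 15
```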

_______________________________________________________________________

Comparing with the Gamma distribution

The thought process discussed in (7) and (8) certainly gives a more efficient way to calculate the cumulative distribution function and the survival function of the negative binomial distribution. Even though the negative binomial cdf can be calculated easily by software, the ideas in (7) and (8) provide a formulation that gives more insight into the negative binomial distribution.

The thought process in (7) and (8) is analogous to the relationship between the Gamma distribution and the Poisson distribution. Consider the Gamma distribution whose shape parameter n is a positive integer and whose rate parameter \beta can be any positive real number. The following is the density function of the Gamma distribution under consideration:

    \displaystyle f(x)=\frac{\beta^n}{(n-1)!} \ x^{n-1} \ e^{-\beta \ x} \ \ \ \ \ x>0 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (9)

The Gamma distribution described by the density function in (9) can be interpreted as a waiting time: the waiting time until the nth change in a Poisson process. If the nth change takes place after time t, there can be at most n-1 arrivals in the time interval [0,t]. Thus the survival function of this Gamma distribution is the same as the cdf of a Poisson distribution. The survival function in (7) is analogous to the following relation.

    \displaystyle P(X>t)=\int_t^\infty \frac{\beta^n}{(n-1)!} \ x^{n-1} \ e^{-\beta x} \ dx=\sum \limits_{j=0}^{n-1} \frac{e^{-\beta t} \ (\beta t)^j}{j!} \ \ \ \ \ \ \ \ \ \ \ (10)

The idea behind (7) and (8) is the waiting time until the rth success, where each success or failure comes from a Bernoulli process; the resulting distribution is a discrete waiting-time distribution. The idea behind (10) is the waiting time until the nth change, where the changes come from a Poisson counting process; the resulting distribution is a continuous waiting-time distribution. It is not necessary to memorize these formulas. It is easy to reproduce (7), (8) and (10) from the underlying thought processes.
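
As a sanity check of (10), the left side can be integrated numerically and compared with the right side. The following Python sketch uses a simple trapezoid rule; the parameter choices (n=4, \beta=0.25, t=16) are arbitrary and just for illustration.

```python
from math import exp, factorial

def poisson_side(n, beta, t):
    """Right side of (10): P(at most n-1 Poisson arrivals in [0, t])."""
    return sum(exp(-beta * t) * (beta * t) ** j / factorial(j) for j in range(n))

def gamma_survival_numeric(n, beta, t, upper=150.0, steps=200_000):
    """Left side of (10): trapezoid integration of the Gamma density (9)."""
    def f(x):
        return beta ** n / factorial(n - 1) * x ** (n - 1) * exp(-beta * x)
    h = (upper - t) / steps
    total = 0.5 * (f(t) + f(upper)) + sum(f(t + i * h) for i in range(1, steps))
    return total * h

lhs = gamma_survival_numeric(4, 0.25, 16)
rhs = poisson_side(4, 0.25, 16)
print(round(lhs, 5), round(rhs, 5))  # the two sides agree
```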

________________________________________________________________________
\copyright \ \text{2015 by Dan Ma}
