# The Negative Binomial Distribution

A counting distribution is a discrete distribution with probabilities only on the nonnegative integers. Such distributions are important in insurance applications since they can be used to model the number of events such as losses to the insured or claims to the insurer. Though it plays a prominent role in statistical theory, the Poisson distribution is not appropriate in all situations, since it requires that the mean and the variance be equal. The negative binomial distribution is thus an excellent alternative to the Poisson distribution, especially when the observed variance is greater than the observed mean.

The negative binomial distribution arises naturally from a probability experiment of performing a series of independent Bernoulli trials until the occurrence of the rth success where r is a positive integer. From this starting point, we discuss three ways to define the distribution. We then discuss several basic properties of the negative binomial distribution. Emphasis is placed on the close connection between the Poisson distribution and the negative binomial distribution.

________________________________________________________________________

Definitions
We define three versions of the negative binomial distribution. The first two versions arise from the viewpoint of performing a series of independent Bernoulli trials until the rth success where r is a positive integer. A Bernoulli trial is a probability experiment with exactly two possible outcomes (success or failure).

Let $X_1$ be the number of Bernoulli trials required for the rth success to occur where r is a positive integer. Let $p$ be the probability of success in each trial. The following is the probability function of $X_1$:

$\displaystyle (1) \ \ \ \ \ P(X_1=x)= \binom{x-1}{r-1} p^r (1-p)^{x-r} \ \ \ \ \ \ \ x=r,r+1,r+2,\cdots$

The idea for $(1)$ is that for $X_1=x$ to happen, there must be $r-1$ successes in the first $x-1$ trials and one additional success occurring in the last trial (the $x$th trial).

A more common version of the negative binomial distribution is the number of Bernoulli trials in excess of r in order to produce the rth success. In other words, we consider the number of failures before the occurrence of the rth success. Let $X_2$ be this random variable. The following is the probability function of $X_2$:

$\displaystyle (2) \ \ \ \ \ P(X_2=x)=\binom{x+r-1}{x} p^r (1-p)^x \ \ \ \ \ \ \ x=0,1,2,\cdots$

The idea for $(2)$ is that there are $x+r$ trials and in the first $x+r-1$ trials, there are $x$ failures (or equivalently $r-1$ successes).
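
The relationship between $(1)$ and $(2)$ can be checked numerically. The following is a minimal sketch, with hypothetical function names and illustrative values of r and $p$ that are not from the text. It relies on the fact that $X_1=X_2+r$.

```python
from math import comb

def pmf_trials(x, r, p):
    """P(X_1 = x) from (1): number of trials needed for the r-th success."""
    if x < r:
        return 0.0
    return comb(x - 1, r - 1) * p**r * (1 - p)**(x - r)

def pmf_failures(x, r, p):
    """P(X_2 = x) from (2): number of failures before the r-th success."""
    return comb(x + r - 1, x) * p**r * (1 - p)**x

r, p = 3, 0.4  # illustrative values, not from the text
for x in range(5):
    # X_1 = X_2 + r, so the two probabilities agree after shifting by r
    print(x, pmf_failures(x, r, p), pmf_trials(x + r, r, p))
```

The two columns printed should agree, since $x$ failures before the rth success is the same event as needing $x+r$ trials.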

In both $(1)$ and $(2)$, the binomial coefficient is defined by

$\displaystyle (3) \ \ \ \ \ \binom{y}{k}=\frac{y!}{k! \ (y-k)!}=\frac{y(y-1) \cdots (y-(k-1))}{k!}$

where $y$ is a positive integer and $k$ is a nonnegative integer. However, the right-hand side of $(3)$ can be calculated even if $y$ is not a positive integer. Thus the binomial coefficient $\displaystyle \binom{y}{k}$ can be extended to work for all real numbers $y$. However, $k$ must still be a nonnegative integer.

$\displaystyle (4) \ \ \ \ \ \binom{y}{k}=\frac{y(y-1) \cdots (y-(k-1))}{k!}$

For convenience, we let $\displaystyle \binom{y}{0}=1$. When the real number $y>k-1$, the binomial coefficient in $(4)$ can be expressed as:

$\displaystyle (5) \ \ \ \ \ \binom{y}{k}=\frac{\Gamma(y+1)}{\Gamma(k+1) \Gamma(y-k+1)}$

where $\Gamma(\cdot)$ is the gamma function.
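
As a rough illustration, the product form $(4)$ and the gamma form $(5)$ can be compared in a few lines of Python. The function names below are hypothetical, and the test values are chosen only for illustration.

```python
from math import gamma, factorial

def binom_product(y, k):
    """Generalized binomial coefficient from (4); y may be any real number."""
    result = 1.0
    for i in range(k):
        result *= (y - i)
    return result / factorial(k)

def binom_gamma(y, k):
    """Gamma-function form from (5); valid when y > k - 1."""
    return gamma(y + 1) / (gamma(k + 1) * gamma(y - k + 1))

print(binom_product(5, 2))     # ordinary case: 5 * 4 / 2
print(binom_product(4.5, 3))   # real upper index
print(binom_gamma(4.5, 3))     # same quantity via the gamma function
print(binom_product(-2, 3))    # negative upper index, where only (4) applies
```

The product form is the more general of the two, which is why it is the one used later for a negative upper index.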

With the more relaxed notion of binomial coefficient, the probability function in $(2)$ above can be defined for real numbers r. Thus the general version of the negative binomial distribution has two parameters r and $p$, both real numbers, such that $r>0$ and $0<p<1$. The following is its probability function.

$\displaystyle (6) \ \ \ \ \ P(X=x)=\binom{x+r-1}{x} p^r (1-p)^x \ \ \ \ \ \ \ x=0,1,2,\cdots$

Whenever r in $(6)$ is a real number that is not a positive integer, the interpretation of counting the number of failures until the occurrence of the rth success is no longer important. Instead we can think of it simply as a count distribution.

The following alternative parametrization of the negative binomial distribution is also useful.

$\displaystyle (6a) \ \ \ \ \ P(X=x)=\binom{x+r-1}{x} \biggl(\frac{\alpha}{\alpha+1}\biggr)^r \biggl(\frac{1}{\alpha+1}\biggr)^x \ \ \ \ \ \ \ x=0,1,2,\cdots$

The parameters in this alternative parametrization are r and $\alpha>0$. Clearly, the ratio $\frac{\alpha}{\alpha+1}$ takes the place of $p$ in $(6)$. Unless stated otherwise, we use the parametrization of $(6)$.
________________________________________________________________________

What is negative about the negative binomial distribution?
What is negative about this distribution? What is binomial about this distribution? The name is suggested by the fact that the binomial coefficient in $(6)$ can be rearranged as follows:

\displaystyle \begin{aligned}(7) \ \ \ \ \ \binom{x+r-1}{x}&=\frac{(x+r-1)(x+r-2) \cdots r}{x!} \\&=(-1)^x \frac{(-r-(x-1))(-r-(x-2)) \cdots (-r)}{x!} \\&=(-1)^x \frac{(-r)(-r-1) \cdots (-r-(x-1))}{x!} \\&=(-1)^x \binom{-r}{x} \end{aligned}

The calculation in $(7)$ can be used to verify that $(6)$ is indeed a probability function, that is, all the probabilities sum to 1.

\displaystyle \begin{aligned}(8) \ \ \ \ \ 1&=p^r p^{-r}\\&=p^r (1-q)^{-r} \\&=p^r \sum \limits_{x=0}^\infty \binom{-r}{x} (-q)^x \ \ \ \ \ \ \ \ (8.1) \\&=p^r \sum \limits_{x=0}^\infty (-1)^x \binom{-r}{x} q^x \\&=\sum \limits_{x=0}^\infty \binom{x+r-1}{x} p^r q^x \end{aligned}

In $(8)$, we take $q=1-p$. The step $(8.1)$ above uses the following formula, known as Newton's binomial formula, which holds for any real number $w$ when $\lvert t \rvert<1$.

$\displaystyle (9) \ \ \ \ \ (1+t)^w=\sum \limits_{k=0}^\infty \binom{w}{k} t^k$

For a detailed discussion of (8) with all the details worked out, see the post called Deriving some facts of the negative binomial distribution.
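
Both the identity $(7)$ and the normalization $(8)$ can be spot-checked numerically. This is only a sketch under assumed values of r and $p$ (not from the text), using a hypothetical helper for the generalized binomial coefficient in $(4)$.

```python
from math import factorial

def gen_binom(y, k):
    """Generalized binomial coefficient y(y-1)...(y-k+1)/k! from (4)."""
    result = 1.0
    for i in range(k):
        result *= (y - i)
    return result / factorial(k)

def nb_pmf(x, r, p):
    """Probability function (6), valid for real r > 0."""
    return gen_binom(x + r - 1, x) * p**r * (1 - p)**x

r, p = 2.5, 0.3  # illustrative values, not from the text

# Identity (7): C(x+r-1, x) = (-1)^x C(-r, x)
for x in range(6):
    print(x, gen_binom(x + r - 1, x), (-1)**x * gen_binom(-r, x))

# Normalization (8): partial sums of (6) approach 1
print(sum(nb_pmf(x, r, p) for x in range(120)))
```

The sum is truncated at 120 terms, which is an arbitrary cutoff that leaves a negligible tail for these parameter values.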

________________________________________________________________________

The Generating Function
By definition, the following is the generating function of the negative binomial distribution, using the probability function in $(6)$:

$\displaystyle (10) \ \ \ \ \ g(z)=\sum \limits_{x=0}^\infty \binom{r+x-1}{x} p^r q^x z^x$

where $q=1-p$. Using a similar calculation as in $(8)$, the generating function can be simplified as:

$\displaystyle (11) \ \ \ \ \ g(z)=p^r (1-q z)^{-r}=\frac{p^r}{(1-q z)^r}=\frac{p^r}{(1-(1-p) z)^r}; \ \ \ \ \ z<\frac{1}{1-p}$

As a result, the moment generating function of the negative binomial distribution is:

$\displaystyle (12) \ \ \ \ \ M(t)=\frac{p^r}{(1-(1-p) e^t)^r}; \ \ \ \ \ \ \ t<-\ln(1-p)$

For a detailed discussion of (12) with all the details worked out, see the post called Deriving some facts of the negative binomial distribution.
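
To see that the closed form $(11)$ agrees with the series $(10)$, one can compare a truncated series with the closed form at a few points. The sketch below uses hypothetical function names, illustrative parameter values, and an arbitrary truncation at 500 terms.

```python
from math import exp, lgamma, log

def nb_pmf(x, r, p):
    """Probability function (6), computed on the log scale via lgamma."""
    log_coef = lgamma(x + r) - lgamma(x + 1) - lgamma(r)
    return exp(log_coef + r * log(p) + x * log(1 - p))

def pgf_series(z, r, p, terms=500):
    """Truncated power series (10)."""
    return sum(nb_pmf(x, r, p) * z**x for x in range(terms))

def pgf_closed(z, r, p):
    """Closed form (11), valid for z < 1/(1-p)."""
    return (p / (1 - (1 - p) * z))**r

r, p = 2.5, 0.3  # illustrative values, not from the text
for z in (0.0, 0.5, 1.0, 1.2):
    print(z, pgf_series(z, r, p), pgf_closed(z, r, p))
```

The moment generating function in $(12)$ is the same closed form evaluated at $z=e^t$.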

________________________________________________________________________

Independent Sum

One useful property of the negative binomial distribution is that the independent sum of negative binomial random variables, all with the same parameter $p$, also has a negative binomial distribution. Let $Y=Y_1+Y_2+\cdots+Y_n$ be an independent sum such that each $Y_i$ has a negative binomial distribution with parameters $r_i$ and $p$. Then the sum $Y=Y_1+Y_2+\cdots+Y_n$ has a negative binomial distribution with parameters $r=r_1+\cdots+r_n$ and $p$.

Note that the generating function of an independent sum is the product of the individual generating functions. The following shows that the product of the individual generating functions is of the same form as $(11)$, thus proving the above assertion.

$\displaystyle (13) \ \ \ \ \ h(z)=\frac{p^{\sum \limits_{i=1}^n r_i}}{(1-(1-p) z)^{\sum \limits_{i=1}^n r_i}}$
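
The same conclusion can be checked directly on the probabilities, without generating functions, by convolving two negative binomial probability functions. The following is a sketch with hypothetical names and illustrative parameters.

```python
from math import exp, lgamma, log

def nb_pmf(x, r, p):
    """Probability function (6), computed on the log scale via lgamma."""
    log_coef = lgamma(x + r) - lgamma(x + 1) - lgamma(r)
    return exp(log_coef + r * log(p) + x * log(1 - p))

def convolve_pmf(f, g, n):
    """P(Y1 + Y2 = n) for independent Y1 ~ f and Y2 ~ g on 0, 1, 2, ..."""
    return sum(f(k) * g(n - k) for k in range(n + 1))

r1, r2, p = 1.5, 2.0, 0.4  # illustrative values, not from the text
for n in range(6):
    direct = nb_pmf(n, r1 + r2, p)
    summed = convolve_pmf(lambda k: nb_pmf(k, r1, p),
                          lambda k: nb_pmf(k, r2, p), n)
    print(n, direct, summed)
```
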
________________________________________________________________________

Mean and Variance
The mean and variance can be obtained from the generating function. From $E(X)=g'(1)$ and $E(X^2)=g'(1)+g^{(2)}(1)$, we have:

$\displaystyle (14) \ \ \ \ \ E(X)=\frac{r(1-p)}{p} \ \ \ \ \ \ \ \ \ \ \ \ \ Var(X)=\frac{r(1-p)}{p^2}$

Note that $Var(X)=\frac{1}{p} E(X)>E(X)$. Thus when the sample data suggest that the variance is greater than the mean, the negative binomial distribution is an excellent alternative to the Poisson distribution. For example, suppose that the sample mean and the sample variance are 3.6 and 7.1. In exploring the possibility of fitting the data using the negative binomial distribution, we would be interested in the negative binomial distribution with this mean and variance. Then plugging these into $(14)$ produces the negative binomial distribution with $r=3.7$ and $p=0.507$.
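
The moment matching in this example amounts to solving $(14)$ for r and $p$. A minimal sketch, with a hypothetical function name:

```python
def nb_from_moments(mean, variance):
    """Solve (14) for r and p given a mean and a variance greater than the mean."""
    if variance <= mean:
        raise ValueError("negative binomial requires variance > mean")
    p = mean / variance        # from Var(X) = E(X) / p
    r = mean * p / (1 - p)     # from E(X) = r (1 - p) / p
    return r, p

r, p = nb_from_moments(3.6, 7.1)
print(round(r, 1), round(p, 3))   # 3.7 0.507
```
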
________________________________________________________________________

The Poisson-Gamma Mixture
An important fact about the negative binomial distribution is that it is a mixture of a family of Poisson distributions with Gamma mixing weights. In other words, it can be viewed as a Poisson distribution whose parameter is itself a random variable following a Gamma distribution. In this sense the negative binomial distribution generalizes the Poisson distribution, and it is known as a Poisson-Gamma mixture.

In an insurance application, the negative binomial distribution can be used as a model for claim frequency when the risks are not homogeneous. Suppose that $N$ has a Poisson distribution with parameter $\theta$, which can be interpreted as the number of claims in a fixed period of time from an insured in a large pool of insureds. There is uncertainty in the parameter $\theta$, reflecting the risk characteristic of the insured. Some insureds are poor risks (with large $\theta$) and some are good risks (with small $\theta$). Thus the parameter $\theta$ should be regarded as a random variable $\Theta$. The following is the conditional distribution of the random variable $N$ (conditional on $\Theta=\theta$):

$\displaystyle (15) \ \ \ \ \ P(N=n \lvert \Theta=\theta)=\frac{e^{-\theta} \ \theta^n}{n!} \ \ \ \ \ \ \ \ \ \ n=0,1,2,\cdots$

Suppose that $\Theta$ has a Gamma distribution with rate parameter $\alpha$ and shape parameter $\beta$. The following is the probability density function of $\Theta$.

$\displaystyle (16) \ \ \ \ \ g(\theta)=\frac{\alpha^\beta}{\Gamma(\beta)} \theta^{\beta-1} e^{-\alpha \theta} \ \ \ \ \ \ \ \ \ \ \theta>0$

Then the joint density of $N$ and $\Theta$ is:

$\displaystyle (17) \ \ \ \ \ P(N=n \lvert \Theta=\theta) \ g(\theta)=\frac{e^{-\theta} \ \theta^n}{n!} \ \frac{\alpha^\beta}{\Gamma(\beta)} \theta^{\beta-1} e^{-\alpha \theta}$

The unconditional distribution of $N$ is obtained by integrating out $\theta$ in $(17)$.

\displaystyle \begin{aligned}(18) \ \ \ \ \ P(N=n)&=\int_0^\infty P(N=n \lvert \Theta=\theta) \ g(\theta) \ d \theta \\&=\int_0^\infty \frac{e^{-\theta} \ \theta^n}{n!} \ \frac{\alpha^\beta}{\Gamma(\beta)} \ \theta^{\beta-1} \ e^{-\alpha \theta} \ d \theta \\&=\int_0^\infty \frac{\alpha^\beta}{n! \ \Gamma(\beta)} \ \theta^{n+\beta-1} \ e^{-(\alpha+1) \theta} d \theta \\&=\frac{\alpha^\beta}{n! \ \Gamma(\beta)} \ \frac{\Gamma(n+\beta)}{(\alpha+1)^{n+\beta}} \int_0^\infty \frac{(\alpha+1)^{n+\beta}}{\Gamma(n+\beta)} \theta^{n+\beta-1} \ e^{-(\alpha+1) \theta} d \theta \\&=\frac{\alpha^\beta}{n! \ \Gamma(\beta)} \ \frac{\Gamma(n+\beta)}{(\alpha+1)^{n+\beta}} \\&=\frac{\Gamma(n+\beta)}{\Gamma(n+1) \ \Gamma(\beta)} \ \biggl( \frac{\alpha}{\alpha+1}\biggr)^\beta \ \biggl(\frac{1}{\alpha+1}\biggr)^n \\&=\binom{n+\beta-1}{n} \ \biggl( \frac{\alpha}{\alpha+1}\biggr)^\beta \ \biggl(\frac{1}{\alpha+1}\biggr)^n \ \ \ \ \ \ \ \ \ n=0,1,2,\cdots \end{aligned}

Note that the integral in the fourth step in $(18)$ is 1.0 since the integrand is the pdf of a Gamma distribution. The above probability function is that of a negative binomial distribution. It is of the same form as $(6a)$. Equivalently, it is also of the form $(6)$ with parameter $r=\beta$ and $p=\frac{\alpha}{\alpha+1}$.
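
The result of $(18)$ can be checked by integrating $(17)$ numerically and comparing with $(6)$. The sketch below uses a plain midpoint rule; the function names, the values of $\alpha$ and $\beta$, the upper integration limit and the number of steps are all assumptions made for illustration.

```python
from math import exp, lgamma, log

def nb_pmf(n, r, p):
    """Probability function (6), computed on the log scale via lgamma."""
    log_coef = lgamma(n + r) - lgamma(n + 1) - lgamma(r)
    return exp(log_coef + r * log(p) + n * log(1 - p))

def mixture_pmf(n, alpha, beta, upper=60.0, steps=60000):
    """Midpoint-rule approximation of the first integral in (18)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h
        # log of the Poisson probability (15) times the Gamma density (16)
        log_f = (-theta + n * log(theta) - lgamma(n + 1)
                 + beta * log(alpha) - lgamma(beta)
                 + (beta - 1) * log(theta) - alpha * theta)
        total += exp(log_f)
    return total * h

alpha, beta = 1.5, 2.5  # illustrative values, not from the text
for n in range(5):
    print(n, mixture_pmf(n, alpha, beta),
          nb_pmf(n, beta, alpha / (alpha + 1)))
```

The two columns should agree to several decimal places, with $r=\beta$ and $p=\frac{\alpha}{\alpha+1}$ as stated above.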

The variance of the negative binomial distribution is greater than the mean. In a Poisson distribution, the mean equals the variance. Thus the unconditional distribution of $N$ is more dispersed than its conditional distributions. This is a characteristic of mixture distributions. The uncertainty in the parameter variable $\Theta$ has the effect of increasing the unconditional variance of the mixture distribution of $N$. The variance of a mixture distribution has two components, the weighted average of the conditional variances and the variance of the conditional means. The second component represents the additional variance introduced by the uncertainty in the parameter $\Theta$ (see The variance of a mixture).

________________________________________________________________________

The Poisson Distribution as Limit of Negative Binomial
There is another connection to the Poisson distribution, that is, the Poisson distribution is a limiting case of the negative binomial distribution. We show that the generating function of the Poisson distribution can be obtained by taking the limit of the negative binomial generating function as $r \rightarrow \infty$. Interestingly, the Poisson distribution is also the limit of the binomial distribution.

In this section, we use the negative binomial parametrization of $(6a)$. By substituting $\frac{\alpha}{\alpha+1}$ for $p$, we obtain the following mean, variance, and generating function for the probability function in $(6a)$:

\displaystyle \begin{aligned}(19) \ \ \ \ \ \ &E(X)=\frac{r}{\alpha} \\&\text{ }\\&Var(X)=\frac{\alpha+1}{\alpha} \ \frac{r}{\alpha}=\frac{r(\alpha+1)}{\alpha^2} \\&\text{ } \\&g(z)=\frac{1}{[1-\frac{1}{\alpha}(z-1)]^r} \ \ \ \ \ \ \ z<\alpha+1 \end{aligned}

Let r go to infinity and $\displaystyle \frac{1}{\alpha}$ go to zero while keeping their product constant. Thus $\displaystyle \mu=\frac{r}{\alpha}$ is constant (this is the mean of the negative binomial distribution). We show the following:

$\displaystyle (20) \ \ \ \ \ \lim \limits_{r \rightarrow \infty} [1-\frac{\mu}{r}(z-1)]^{-r}=e^{\mu (z-1)}$

The right-hand side of $(20)$ is the generating function of the Poisson distribution with mean $\mu$. The generating function in the left-hand side is that of a negative binomial distribution with mean $\displaystyle \mu=\frac{r}{\alpha}$. The following is the derivation of $(20)$.

\displaystyle \begin{aligned}(21) \ \ \ \ \ \lim \limits_{r \rightarrow \infty} [1-\frac{\mu}{r}(z-1)]^{-r}&=\lim \limits_{r \rightarrow \infty} e^{\displaystyle \biggl(\ln[1-\frac{\mu}{r}(z-1)]^{-r}\biggr)} \\&=\lim \limits_{r \rightarrow \infty} e^{\displaystyle \biggl(-r \ \ln[1-\frac{\mu}{r}(z-1)]\biggr)} \\&=e^{\displaystyle \biggl(\lim \limits_{r \rightarrow \infty} -r \ \ln[1-\frac{\mu}{r}(z-1)]\biggr)} \end{aligned}

We now focus on the limit in the exponent.

\displaystyle \begin{aligned}(22) \ \ \ \ \ \lim \limits_{r \rightarrow \infty} -r \ \ln[1-\frac{\mu}{r}(z-1)]&=\lim \limits_{r \rightarrow \infty} \frac{\ln(1-\frac{\mu}{r} (z-1))^{-1}}{r^{-1}} \\&=\lim \limits_{r \rightarrow \infty} \frac{(1-\frac{\mu}{r} (z-1))^{-1} \ \mu (z-1) r^{-2}}{r^{-2}} \\&=\mu (z-1) \end{aligned}

The middle step in $(22)$ uses L'Hôpital's rule. The result in $(20)$ is obtained by combining $(21)$ and $(22)$.
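
The limit can also be observed on the probabilities themselves. In the sketch below (hypothetical names, with an illustrative mean $\mu=3$ that is not from the text), the mean is held fixed while r grows, and the largest gap between the negative binomial and Poisson probabilities over the first 30 values is printed.

```python
from math import exp, lgamma, log

def nb_pmf(x, r, p):
    """Probability function (6), computed on the log scale via lgamma."""
    log_coef = lgamma(x + r) - lgamma(x + 1) - lgamma(r)
    return exp(log_coef + r * log(p) + x * log(1 - p))

def poisson_pmf(x, mu):
    """Poisson probability with mean mu."""
    return exp(-mu + x * log(mu) - lgamma(x + 1))

def max_gap(r, mu, upto=30):
    """Largest absolute difference between the two pmfs on 0, ..., upto-1."""
    alpha = r / mu              # keeps the mean r / alpha fixed at mu
    p = alpha / (alpha + 1)     # parametrization (6a)
    return max(abs(nb_pmf(x, r, p) - poisson_pmf(x, mu)) for x in range(upto))

mu = 3.0  # illustrative mean, not from the text
for r in (1, 10, 100, 10000):
    print(r, max_gap(r, mu))
```

The printed gaps should shrink as r increases, in line with $(20)$.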

________________________________________________________________________

Reference

1. Klugman S. A., Panjer H. H., Willmot G. E., Loss Models: From Data to Decisions, Second Edition, Wiley-Interscience, John Wiley & Sons, New York, 2004

________________________________________________________________________
$\copyright \ \text{2011-2015 by Dan Ma}$

# Splitting a Poisson Distribution

We consider a remarkable property of the Poisson distribution that has a connection to the multinomial distribution. We start with the following examples.

Example 1
Suppose that the arrivals of customers in a gift shop at an airport follow a Poisson distribution with a mean of $\alpha=5$ per 10 minutes. Furthermore, suppose that each arrival can be classified into one of three distinct types: type 1 (no purchase), type 2 (purchase under \$20), and type 3 (purchase over \$20). Records show that about 25% of the customers are of type 1. The percentages of type 2 and type 3 are 60% and 15%, respectively. What is the probability distribution of the number of customers per hour of each type?

Example 2
Roll a fair die $N$ times where $N$ is random and follows a Poisson distribution with parameter $\alpha$. For each $i=1,2,3,4,5,6$, let $N_i$ be the number of times the die shows $i$. What is the probability distribution of each $N_i$? What is the joint distribution of $N_1,N_2,N_3,N_4,N_5,N_6$?

In Example 1, the stream of customers arrives according to a Poisson distribution. It can be shown that the stream of each type of customer also has a Poisson distribution. One way to view this example is that we can split the Poisson distribution into three Poisson distributions.

Example 2 also describes a splitting process, i.e. splitting a Poisson variable into 6 different Poisson variables. We can also view Example 2 as a multinomial distribution where the number of trials is not fixed but is random and follows a Poisson distribution. If the number of rolls of the die is fixed in Example 2 (say 10), then each $N_i$ would have a binomial distribution. Yet, with the number of trials being Poisson, each $N_i$ has a Poisson distribution with mean $\displaystyle \frac{\alpha}{6}$. In this post, we describe this Poisson splitting process in terms of a “random” multinomial distribution (the viewpoint of Example 2).

________________________________________________________________________

Suppose we have a multinomial experiment with parameters $N$, $r$, $p_1, \cdots, p_r$, where

• $N$ is the number of multinomial trials,
• $r$ is the number of distinct possible outcomes in each trial (type 1 through type $r$),
• the $p_i$ are the probabilities of the $r$ possible outcomes in each trial.

Suppose that $N$ follows a Poisson distribution with parameter $\alpha$. For each $i=1, \cdots, r$, let $N_i$ be the number of occurrences of the $i^{th}$ type of outcomes in the $N$ trials. Then $N_1,N_2,\cdots,N_r$ are mutually independent Poisson random variables with parameters $\alpha p_1,\alpha p_2,\cdots,\alpha p_r$, respectively.

The variables $N_1,N_2,\cdots,N_r$ have a multinomial distribution and their joint probability function is:

$\displaystyle (1) \ \ \ \ P(N_1=n_1,N_2=n_2,\cdots,N_r=n_r)=\frac{N!}{n_1! n_2! \cdots n_r!} \ p_1^{n_1} p_2^{n_2} \cdots p_r^{n_r}$

where $n_i$ are nonnegative integers such that $N=n_1+n_2+\cdots+n_r$.

Since the total number of multinomial trials $N$ is not fixed and is random, $(1)$ is not the end of the story. The probability in $(1)$ is only a conditional probability. The following is the joint probability function of $N_1,N_2,\cdots,N_r$:

$\displaystyle (2) \ \ \ \ P(N_1=n_1,N_2=n_2,\cdots,N_r=n_r)$

\displaystyle \begin{aligned}&=P(N_1=n_1,N_2=n_2,\cdots,N_r=n_r \lvert N=\sum \limits_{k=1}^r n_k) \\&\ \ \ \ \ \times P(N=\sum \limits_{k=1}^r n_k) \\&\text{ } \\&=\frac{(\sum \limits_{k=1}^r n_k)!}{n_1! \ n_2! \ \cdots \ n_r!} \ p_1^{n_1} \ p_2^{n_2} \ \cdots \ p_r^{n_r} \ \times \frac{e^{-\alpha} \alpha^{\sum \limits_{k=1}^r n_k}}{(\sum \limits_{k=1}^r n_k)!} \\&\text{ } \\&=\frac{e^{-\alpha p_1} \ (\alpha p_1)^{n_1}}{n_1!} \ \frac{e^{-\alpha p_2} \ (\alpha p_2)^{n_2}}{n_2!} \ \cdots \ \frac{e^{-\alpha p_r} \ (\alpha p_r)^{n_r}}{n_r!} \end{aligned}

The last step cancels the factorial of the total count and uses $p_1+\cdots+p_r=1$, so that $e^{-\alpha}=e^{-\alpha p_1} \cdots e^{-\alpha p_r}$ and $\alpha^{n_1+\cdots+n_r}=\alpha^{n_1} \cdots \alpha^{n_r}$.

To obtain the marginal probability function of $N_j$, $j=1,2,\cdots,r$, we sum out the other variables $N_k=n_k$ ($k \ne j$) in $(2)$ and obtain the following:

$\displaystyle (3) \ \ \ \ P(N_j=n_j)=\frac{e^{-\alpha p_j} \ (\alpha p_j)^{n_j}}{n_j!}$

Thus we can conclude that $N_j$, $j=1,2,\cdots,r$, has a Poisson distribution with parameter $\alpha p_j$. Furthermore, the joint probability function of $N_1,N_2,\cdots,N_r$ is the product of the marginal probability functions. Thus we can conclude that $N_1,N_2,\cdots,N_r$ are mutually independent.

________________________________________________________________________
Example 1
Let $N_1,N_2,N_3$ be the number of customers per hour of type 1, type 2, and type 3, respectively. Here we split a Poisson distribution with mean 30 per hour (based on 5 per 10 minutes). Thus $N_1,N_2,N_3$ are mutually independent Poisson variables with means $30 \times 0.25=7.5$, $30 \times 0.60=18$, $30 \times 0.15=4.5$, respectively.

Example 2
As indicated earlier, each $N_i$, $i=1,2,3,4,5,6$, has a Poisson distribution with mean $\frac{\alpha}{6}$. According to $(2)$, the joint probability function of $N_1,N_2,N_3,N_4,N_5,N_6$ is simply the product of the six marginal Poisson probability functions.
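
The factorization in $(2)$ can be verified numerically for Example 1. The following sketch uses hypothetical function names and one arbitrarily chosen outcome. It computes the joint probability as a multinomial probability times a Poisson probability, then compares it with the product of the three marginal Poisson probabilities.

```python
from math import exp, factorial

def poisson_pmf(n, mean):
    """Poisson probability with the given mean."""
    return exp(-mean) * mean**n / factorial(n)

def joint_pmf(counts, probs, alpha):
    """Joint probability (2): multinomial given N = sum(counts), times P(N)."""
    total = sum(counts)
    multinomial = factorial(total)
    for n_i in counts:
        multinomial //= factorial(n_i)   # exact integer division at each step
    cond = float(multinomial)
    for n_i, p_i in zip(counts, probs):
        cond *= p_i**n_i
    return cond * poisson_pmf(total, alpha)

def product_of_marginals(counts, probs, alpha):
    """Product of the marginal Poisson probabilities in (3)."""
    result = 1.0
    for n_i, p_i in zip(counts, probs):
        result *= poisson_pmf(n_i, alpha * p_i)
    return result

# Example 1: 30 customers per hour on average, split 25% / 60% / 15%
alpha, probs = 30.0, (0.25, 0.60, 0.15)
counts = (7, 18, 5)  # one arbitrary outcome chosen for illustration
print(joint_pmf(counts, probs, alpha))
print(product_of_marginals(counts, probs, alpha))
```
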

# The Poisson Distribution

Let $\alpha$ be a positive constant. Consider the following probability distribution:

$\displaystyle (1) \ \ \ \ \ P(X=j)=\frac{e^{-\alpha} \alpha^j}{j!} \ \ \ \ \ j=0,1,2,\cdots$

The above distribution is said to be a Poisson distribution with parameter $\alpha$. The Poisson distribution is usually used to model the random number of events occurring in a fixed time interval. As will be shown below, $E(X)=\alpha$. Thus the parameter $\alpha$ is the rate of occurrence of the random events; it indicates on average how many events occur per unit of time. Examples of random events that may be modeled by the Poisson distribution include the number of alpha particles emitted by a radioactive substance counted in a prescribed area during a fixed period of time, the number of auto accidents in a fixed period of time or the number of losses arising from a group of insureds during a policy period.

Each of the above examples can be thought of as a process that generates a number of arrivals or changes in a fixed period of time. If such a counting process leads to a Poisson distribution, then the process is said to be a Poisson process.

We now discuss some basic properties of the Poisson distribution. Using the Taylor series expansion of $e^{\alpha}$, the following shows that $(1)$ is indeed a probability distribution.

$\displaystyle . \ \ \ \ \ \ \ \sum \limits_{j=0}^\infty \frac{e^{-\alpha} \alpha^j}{j!}=e^{-\alpha} \sum \limits_{j=0}^\infty \frac{\alpha^j}{j!}=e^{-\alpha} e^{\alpha}=1$

The generating function of the Poisson distribution is $g(z)=e^{\alpha (z-1)}$ (see The generating function). The mean and variance can be calculated using the generating function.

\displaystyle \begin{aligned}(2) \ \ \ \ \ &E(X)=g'(1)=\alpha \\&\text{ } \\&E[X(X-1)]=g^{(2)}(1)=\alpha^2 \\&\text{ } \\&Var(X)=E[X(X-1)]+E(X)-E(X)^2=\alpha^2+\alpha-\alpha^2=\alpha \end{aligned}

The Poisson distribution can also be interpreted as an approximation to the binomial distribution. It is well known that the Poisson distribution is the limiting case of binomial distributions (see [1] or this post).

$\displaystyle (3) \ \ \ \ \ \lim \limits_{n \rightarrow \infty} \binom{n}{j} \biggl(\frac{\alpha}{n}\biggr)^j \biggl(1-\frac{\alpha}{n}\biggr)^{n-j}=\frac{e^{-\alpha} \alpha^j}{j!}$

One application of $(3)$ is that we can use Poisson probabilities to approximate binomial probabilities. The approximation is reasonably good when the number of trials $n$ in a binomial distribution is large and the probability of success $p$ is small. The binomial mean is $n p$ and the variance is $n p (1-p)$. When $p$ is small, $1-p$ is close to 1 and the binomial variance is approximately equal to the mean, i.e. $n p (1-p) \approx np$. Whenever the variance of a discrete distribution is approximately equal to its mean, the Poisson approximation is quite good. As a rule of thumb, we can use Poisson to approximate binomial if $n \ge 100$ and $p \le 0.01$.

As an example, we use the Poisson distribution to estimate the probability that at most 1 person out of 1000 will have a birthday on New Year's Day. Let $n=1000$ and $p=365^{-1}$. So we use the Poisson distribution with $\alpha=1000 \cdot 365^{-1}$. The following is an estimate using the Poisson distribution.

$\displaystyle . \ \ \ \ \ \ \ P(X \le 1)=e^{-\alpha}+\alpha e^{-\alpha}=(1+\alpha) e^{-\alpha}=0.2415$
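
For comparison, the exact binomial probability can be computed alongside the Poisson estimate. A short sketch, with hypothetical function names:

```python
from math import comb, exp

def binom_cdf(k, n, p):
    """Exact binomial P(X <= k)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def poisson_cdf(k, alpha):
    """Poisson P(X <= k), building each term from the previous one."""
    total, term = 0.0, exp(-alpha)
    for j in range(k + 1):
        total += term
        term *= alpha / (j + 1)
    return total

n, p = 1000, 1 / 365
alpha = n * p

print(round(poisson_cdf(1, alpha), 4))   # 0.2415
print(binom_cdf(1, n, p))                # exact binomial value for comparison
```
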

Another useful property is that the independent sum of Poisson distributions also has a Poisson distribution. Specifically, if each $X_i$ has a Poisson distribution with parameter $\alpha_i$, then the independent sum $X=X_1+\cdots+X_n$ has a Poisson distribution with parameter $\alpha=\alpha_1+\cdots+\alpha_n$. One way to see this is that the product of Poisson generating functions has the same general form as $g(z)=e^{\alpha (z-1)}$ (see The generating function). One interpretation of this property is that when merging several independent arrival processes, each of which follows a Poisson distribution, the result is still a Poisson distribution.

For example, suppose that at an airline ticket counter, the arrival of first class customers follows a Poisson process with a mean arrival rate of 8 per 15 minutes and, independently, the arrival of customers flying coach follows a Poisson distribution with a mean rate of 12 per 15 minutes. Then the arrival of customers of either type has a Poisson distribution with a mean rate of 20 per 15 minutes, or 80 per hour.
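
The merging property in this example can be checked by convolving the two Poisson probability functions directly. This is a sketch with hypothetical function names; the values of $n$ tested are arbitrary.

```python
from math import exp, factorial

def poisson_pmf(n, mean):
    """Poisson probability with the given mean."""
    return exp(-mean) * mean**n / factorial(n)

def merged_pmf(n, mean1, mean2):
    """P(X1 + X2 = n) by direct convolution of two independent Poisson pmfs."""
    return sum(poisson_pmf(k, mean1) * poisson_pmf(n - k, mean2)
               for k in range(n + 1))

# First class: 8 per 15 minutes; coach: 12 per 15 minutes
for n in (10, 20, 30):
    print(n, merged_pmf(n, 8, 12), poisson_pmf(n, 20))
```
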

A Poisson distribution with a large mean can be thought of as an independent sum of Poisson distributions. For example, a Poisson distribution with a mean of 50 is the independent sum of 50 Poisson distributions each with mean 1. Because of the central limit theorem, when the mean is large, we can approximate the Poisson using the normal distribution.

In addition to merging several Poisson distributions into one combined Poisson distribution, we can also split a Poisson into several Poisson distributions. For example, suppose that a stream of customers arrives according to a Poisson distribution with parameter $\alpha$ and each customer can be classified into one of two types (e.g. no purchase vs. purchase) with probabilities $p_1$ and $p_2$, respectively. Then the number of “no purchase” customers and the number of “purchase” customers are independent Poisson random variables with parameters $\alpha p_1$ and $\alpha p_2$, respectively. For more details on the splitting of Poisson, see Splitting a Poisson Distribution.

Reference

1. Feller W. An Introduction to Probability Theory and Its Applications, Third Edition, John Wiley & Sons, New York, 1968

# The generating function

Consider the function $g(z)=\displaystyle e^{\alpha (z-1)}$ where $\alpha$ is a positive constant. The following shows the derivatives of this function.

\displaystyle \begin{aligned}. \ \ \ \ \ \ &g(z)=e^{\alpha (z-1)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ g(0)=e^{-\alpha} \\&\text{ } \\&g'(z)=e^{\alpha (z-1)} \ \alpha \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ g'(0)=e^{-\alpha} \ \alpha \\&\text{ } \\&g^{(2)}(z)=e^{\alpha (z-1)} \ \alpha^2 \ \ \ \ \ \ \ \ \ \ \ \ \ \ g^{(2)}(0)=2! \ \frac{e^{-\alpha} \ \alpha^2}{2!} \\&\text{ } \\&g^{(3)}(z)=e^{\alpha (z-1)} \ \alpha^3 \ \ \ \ \ \ \ \ \ \ \ \ \ \ g^{(3)}(0)=3! \ \frac{e^{-\alpha} \ \alpha^3}{3!} \\&\text{ } \\&\ \ \ \ \ \ \ \ \cdots \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \cdots \\&\text{ } \\&g^{(n)}(z)=e^{\alpha (z-1)} \ \alpha^n \ \ \ \ \ \ \ \ \ \ \ \ \ \ g^{(n)}(0)=n! \ \frac{e^{-\alpha} \ \alpha^n}{n!} \end{aligned}

Note that the derivative of $g(z)$ at each order, evaluated at $z=0$, is a multiple of a Poisson probability. Thus the Poisson distribution is encoded by the function $g(z)=\displaystyle e^{\alpha (z-1)}$. For this reason, such a function is called a generating function (or probability generating function). This post discusses some basic facts about the generating function (gf) and its cousin, the moment generating function (mgf). One important characteristic is that these functions generate probabilities and moments. Another important characteristic is that there is a one-to-one correspondence between a probability distribution and its generating function and moment generating function, i.e. two random variables with different cumulative distribution functions cannot have the same gf or mgf. In some situations, this fact is useful in working with independent sums of random variables.

________________________________________________
The Generating Function
Suppose that $X$ is a random variable that takes only nonnegative integer values with the probability function given by

$(1) \ \ \ \ \ \ P(X=j)=a_j, \ \ \ \ j=0,1,2,\cdots$

The idea of the generating function is that we use a power series to capture the entire probability distribution. The following defines the generating function that is associated with the above sequence $a_0, a_1, a_2, \cdots$.

$(2) \ \ \ \ \ \ g(z)=a_0+a_1 \ z+a_2 \ z^2+ \cdots=\sum \limits_{j=0}^\infty a_j \ z^j$

Since the elements of the sequence $a_j$ are probabilities, we can also call $g(z)$ the generating function of the probability distribution defined by the sequence in $(1)$. The generating function $g(z)$ is defined wherever the power series converges. It is clear that at the minimum, the power series in $(2)$ converges for $\lvert z \rvert \le 1$.

We discuss the following three properties of generating functions:

1. The generating function completely determines the distribution.
2. The moments of the distribution can be derived from the derivatives of the generating function.
3. The generating function of a sum of independent random variables is the product of the individual generating functions.

The Poisson generating function at the beginning of the post is an example demonstrating property 1 (see Example 0 below for the derivation of the generating function). In some cases, the probability distribution of an independent sum can be deduced from the product of the individual generating functions. Some examples are given below.

________________________________________________
Generating Probabilities
We now discuss property 1 indicated above. To see that $g(z)$ generates the probabilities, let’s look at the derivatives of $g(z)$:

\displaystyle \begin{aligned}(3) \ \ \ \ \ \ &g'(z)=a_1+2 a_2 \ z+3 a_3 \ z^2+\cdots=\sum \limits_{j=1}^\infty j a_j \ z^{j-1} \\&\text{ } \\&g^{(2)}(z)=2 a_2+6 a_3 \ z+ 12 a_4 \ z^2+\cdots=\sum \limits_{j=2}^\infty j (j-1) a_j \ z^{j-2} \\&\text{ } \\&g^{(3)}(z)=6 a_3+ 24 a_4 \ z+60 a_5 \ z^2+\cdots=\sum \limits_{j=3}^\infty j (j-1)(j-2) a_j \ z^{j-3} \\&\text{ } \\&\ \ \ \ \ \ \ \ \cdots \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \cdots \\&\text{ } \\&g^{(n)}(z)=\sum \limits_{j=n}^\infty j(j-1) \cdots (j-n+1) a_j \ z^{j-n}=\sum \limits_{j=n}^\infty \binom{j}{n} n! \ a_j \ z^{j-n} \end{aligned}

By letting $z=0$ above, all the terms vanish except for the constant term. We have:

$(4) \ \ \ \ \ \ g^{(n)}(0)=n! \ a_n=n! \ P(X=n)$

Thus the generating function is a compact way of encoding the probability distribution. The probability distribution determines the generating function as seen in $(2)$. On the other hand, $(3)$ and $(4)$ demonstrate that the generating function also determines the probability distribution.
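
For a distribution on finitely many values the generating function is a polynomial, so $(3)$ and $(4)$ can be carried out on a list of coefficients. The sketch below uses hypothetical function names and a made-up distribution on $0,1,2,3$ chosen only for illustration.

```python
from math import factorial

def derivative(coeffs):
    """Coefficients of g'(z), where g(z) = sum(coeffs[j] * z**j)."""
    return [j * a for j, a in enumerate(coeffs)][1:]

def nth_derivative_at(coeffs, n, z):
    """Value of the n-th derivative of g at z."""
    for _ in range(n):
        coeffs = derivative(coeffs)
    return sum(a * z**j for j, a in enumerate(coeffs))

# A hypothetical distribution on {0, 1, 2, 3}, chosen only for illustration
probs = [0.1, 0.2, 0.3, 0.4]

# Formula (4): P(X = n) = g^(n)(0) / n!
recovered = [nth_derivative_at(probs, n, 0.0) / factorial(n)
             for n in range(len(probs))]
print(recovered)
```

Up to floating-point rounding, the recovered list should match the original probabilities.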

________________________________________________
Generating Moments
The generating function also determines the moments (property 2 indicated above). For example, we have:

\displaystyle \begin{aligned}(5) \ \ \ \ \ \ &g'(1)=0 \ a_0+a_1+2 a_2+3 a_3+\cdots=\sum \limits_{j=0}^\infty j a_j=E(X) \\&\text{ } \\&g^{(2)}(1)=0 a_0 + 0 a_1+2 a_2+6 a_3+ 12 a_4+\cdots=\sum \limits_{j=0}^\infty j (j-1) a_j=E[X(X-1)] \\&\text{ } \\&E(X)=g'(1) \\&\text{ } \\&E(X^2)=g'(1)+g^{(2)}(1) \end{aligned}

Note that $g^{(n)}(1)=E[X(X-1) \cdots (X-(n-1))]$. Thus the higher moment $E(X^n)$ can be expressed in terms of $g^{(n)}(1)$ and $g^{(k)}(1)$ where $k<n$.
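
The formulas in $(5)$ can be illustrated in the same way for a polynomial generating function. The following sketch uses hypothetical function names and a made-up distribution on $0,1,2,3$.

```python
def derivative(coeffs):
    """Coefficients of g'(z), where g(z) = sum(coeffs[j] * z**j)."""
    return [j * a for j, a in enumerate(coeffs)][1:]

def evaluate(coeffs, z):
    """Value of the polynomial with the given coefficients at z."""
    return sum(a * z**j for j, a in enumerate(coeffs))

# A hypothetical distribution on {0, 1, 2, 3}, chosen only for illustration
probs = [0.1, 0.2, 0.3, 0.4]

g1 = evaluate(derivative(probs), 1.0)               # g'(1) = E(X)
g2 = evaluate(derivative(derivative(probs)), 1.0)   # g''(1) = E[X(X-1)]

mean = g1
second_moment = g1 + g2            # E(X^2) = g'(1) + g''(1)
variance = second_moment - mean**2
print(mean, second_moment, variance)
```
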
________________________________________________
More General Definitions
Note that the definition in $(2)$ can also be interpreted as the mathematical expectation of $z^X$, i.e., $g(z)=E(z^X)$. This provides a way to define the generating function for random variables that may take on values outside of the nonnegative integers. The following is a more general definition of the generating function of the random variable $X$, which is defined for all $z$ where the expectation exists.

$(6) \ \ \ \ \ \ g(z)=E(z^X)$

________________________________________________
The Generating Function of Independent Sum
Let $X_1,X_2,\cdots,X_n$ be independent random variables with generating functions $g_1,g_2,\cdots,g_n$, respectively. Then the generating function of $X_1+X_2+\cdots+X_n$ is given by the product $g_1 \cdot g_2 \cdots g_n$.

Let $g(z)$ be the generating function of the independent sum $X_1+X_2+\cdots+X_n$. The following derives $g(z)$. Note that the general form of generating function $(6)$ is used.

\displaystyle \begin{aligned}(7) \ \ \ \ \ \ g(z)&=E(z^{X_1+\cdots+X_n}) \\&\text{ } \\&=E(z^{X_1} \cdots z^{X_n}) \\&\text{ } \\&=E(z^{X_1}) \cdots E(z^{X_n}) \\&\text{ } \\&=g_1(z) \cdots g_n(z) \end{aligned}

The probability distribution of a random variable is uniquely determined by its generating function. In particular, the generating function $g(z)$ derived in $(7)$ identifies the distribution of the independent sum $X_1+X_2+\cdots+X_n$. So if the product $g_1(z) \cdots g_n(z)$ is the generating function of a known distribution, we can deduce that the sum must have that distribution. See the examples below.
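
The product rule in $(7)$ can be checked numerically on a small case. The sketch below is an illustration only: the two distributions are made-up assumptions and the helper `gf` is a hypothetical name. It builds the distribution of $X+Y$ from the joint distribution of two independent variables and compares $E(z^{X+Y})$ with $g_1(z) \cdot g_2(z)$ at one point.

```python
from fractions import Fraction
from itertools import product

def gf(pmf, z):
    """g(z) = E(z^X) for a pmf given as {value: probability}."""
    return sum(p * z**x for x, p in pmf.items())

# Two hypothetical independent random variables (values are assumptions).
pmf_x = {0: Fraction(1, 2), 1: Fraction(1, 2)}
pmf_y = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}

# Distribution of X + Y: under independence, P(X=x, Y=y) = P(X=x) P(Y=y).
pmf_sum = {}
for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
    pmf_sum[x + y] = pmf_sum.get(x + y, Fraction(0)) + px * py

z = Fraction(3, 4)
print(gf(pmf_sum, z) == gf(pmf_x, z) * gf(pmf_y, z))
```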

________________________________________________
Example 0
In this example, we derive the generating function of the Poisson distribution with parameter $\alpha$. Based on the definition $(2)$, we have:

\displaystyle \begin{aligned}. \ \ \ \ \ \ g(z)&=\sum \limits_{j=0}^\infty \frac{e^{-\alpha} \alpha^j}{j!} \ z^j \\&\text{ } \\&=\sum \limits_{j=0}^\infty \frac{e^{-\alpha} (\alpha z)^j}{j!} \\&\text{ } \\&=\frac{e^{-\alpha}}{e^{- \alpha z}} \sum \limits_{j=0}^\infty \frac{e^{-\alpha z} (\alpha z)^j}{j!} \\&\text{ } \\&=e^{\alpha (z-1)} \end{aligned}

Example 1
Suppose that $X_1,X_2,\cdots,X_n$ are independent random variables where each $X_i$ has a Bernoulli distribution with probability of success $p$. Let $q=1-p$. The following is the generating function for each $X_i$.

$. \ \ \ \ \ \ g(z)=q+p z$

Then the generating function of the sum $X=X_1+\cdots+X_n$ is $g(z)^n=(q+p z)^n$. The following is the binomial expansion:

\displaystyle \begin{aligned}(8) \ \ \ \ \ \ g(z)^n&=(q+p z)^n \\&\text{ } \\&=\sum \limits_{j=0}^n \binom{n}{j} q^{n-j} \ p^j \ z^j \end{aligned}

By definition $(2)$, the generating function of $X=X_1+\cdots+X_n$ is:

$(9) \ \ \ \ \ \ g(z)^n=\sum \limits_{j=0}^\infty P(X=j) \ z^j$

Comparing $(8)$ and $(9)$, we have

$\displaystyle (10) \ \ \ \ \ \ P(X=j)=\left\{\begin{matrix}\displaystyle \binom{n}{j} p^j \ q^{n-j}&\ 0 \le j \le n\\{0}&\ j>n \end{matrix}\right.$

The probability distribution indicated by $(8)$ and $(10)$ is that of a binomial distribution. Since the probability distribution of a random variable is uniquely determined by its generating function, the independent sum of Bernoulli distributions must have a binomial distribution.
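
The comparison of $(8)$ and $(10)$ can also be carried out mechanically. The following sketch multiplies by $(q+pz)$ a total of $n$ times and compares the resulting coefficients with the binomial probabilities. The values of $p$ and $n$ are arbitrary assumptions chosen for illustration.

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 3)   # assumed probability of success
q = 1 - p
n = 5                # assumed number of Bernoulli trials

# Start with the constant polynomial 1 and multiply by (q + p z) n times.
coeffs = [Fraction(1)]
for _ in range(n):
    padded = coeffs + [Fraction(0)]    # q * c(z): coefficient j is c_j
    shifted = [Fraction(0)] + coeffs   # p z * c(z): coefficient j is c_{j-1}
    coeffs = [q * a + p * b for a, b in zip(padded, shifted)]

# Binomial probabilities as in (10).
binomial = [comb(n, j) * p**j * q**(n - j) for j in range(n + 1)]
print(coeffs == binomial)
```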

Example 2
Suppose that $X_1,X_2,\cdots,X_n$ are independent and have Poisson distributions with parameters $\alpha_1,\alpha_2,\cdots,\alpha_n$, respectively. Then the independent sum $X=X_1+\cdots+X_n$ has a Poisson distribution with parameter $\alpha=\alpha_1+\cdots+\alpha_n$.

Let $g(z)$ be the generating function of $X=X_1+\cdots+X_n$. For each $i$, the generating function of $X_i$ is $g_i(z)=e^{\alpha_i (z-1)}$. The key to the proof is that the product of the $g_i$ has the same general form as the individual $g_i$.

\displaystyle \begin{aligned}(11) \ \ \ \ \ \ g(z)&=g_1(z) \cdots g_n(z) \\&\text{ } \\&=e^{\alpha_1 (z-1)} \cdots e^{\alpha_n (z-1)} \\&\text{ } \\&=e^{(\alpha_1+\cdots+\alpha_n)(z-1)} \end{aligned}

The generating function in $(11)$ is that of a Poisson distribution with mean $\alpha=\alpha_1+\cdots+\alpha_n$. Since the generating function uniquely determines the distribution, we can deduce that the sum $X=X_1+\cdots+X_n$ has a Poisson distribution with parameter $\alpha=\alpha_1+\cdots+\alpha_n$.
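
A numerical check of Example 2 for two summands is sketched below. The parameter values are assumptions, and `poisson_pmf` is a hypothetical helper. The probability function of the sum is obtained by convolution, which is the coefficient-wise form of multiplying the two generating functions. For $k$ below the truncation point `K` the convolution uses only the terms $0,1,\cdots,k$, so no truncation error enters.

```python
from math import exp, factorial, isclose

def poisson_pmf(alpha, k):
    """P(X = k) for a Poisson distribution with parameter alpha."""
    return exp(-alpha) * alpha**k / factorial(k)

alpha_1, alpha_2 = 1.5, 2.5   # assumed parameters
K = 20                        # compare the probabilities for k = 0, ..., K-1

p1 = [poisson_pmf(alpha_1, k) for k in range(K)]
p2 = [poisson_pmf(alpha_2, k) for k in range(K)]

# Coefficient of z^k in g_1(z) g_2(z)
conv = [sum(p1[i] * p2[k - i] for i in range(k + 1)) for k in range(K)]
target = [poisson_pmf(alpha_1 + alpha_2, k) for k in range(K)]

print(all(isclose(c, t, rel_tol=1e-9) for c, t in zip(conv, target)))
```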

Example 3
In rolling a fair die, let $X$ be the number shown on the up face. The associated generating function is:

$\displaystyle. \ \ \ \ \ \ g(z)=\frac{1}{6}(z+z^2+z^3+z^4+z^5+z^6)=\frac{z(1-z^6)}{6(1-z)}$

The generating function can be further reduced as:

\displaystyle \begin{aligned}. \ \ \ \ \ \ g(z)&=\frac{z(1-z^6)}{6(1-z)} \\&\text{ } \\&=\frac{z(1-z^3)(1+z^3)}{6(1-z)} \\&\text{ } \\&=\frac{z(1-z)(1+z+z^2)(1+z^3)}{6(1-z)} \\&\text{ } \\&=\frac{z(1+z+z^2)(1+z^3)}{6} \end{aligned}

Suppose that we roll the fair die 4 times. Let $W$ be the sum of the 4 rolls. Then the generating function of $W$ is

$\displaystyle. \ \ \ \ \ \ g(z)^4=\frac{z^4 (1+z^3)^4 (1+z+z^2)^4}{6^4}$

The random variable $W$ ranges from 4 to 24. Thus the probability function ranges from $P(W=4)$ to $P(W=24)$. To find these probabilities, we simply need to decode the generating function $g(z)^4$. For example, to find $P(W=12)$, we need to find the coefficient of the term $z^{12}$ in the polynomial $g(z)^4$. To help this decoding, we can expand two of the polynomials in $g(z)^4$.

\displaystyle \begin{aligned}. \ \ \ \ \ \ g(z)^4&=\frac{z^4 (1+z^3)^4 (1+z+z^2)^4}{6^4} \\&\text{ } \\&=\frac{z^4 \times A \times B}{6^4} \\&\text{ } \\&A=(1+z^3)^4=1+4z^3+6z^6+4z^9+z^{12} \\&\text{ } \\&B=(1+z+z^2)^4=1+4z+10z^2+16z^3+19z^4+16z^5+10z^6+4z^7+z^8 \end{aligned}

Based on the above polynomials, there are three ways of forming $z^{12}$. They are: $(z^4 \times 1 \times z^8)$, $(z^4 \times 4z^3 \times 16z^5)$, $(z^4 \times 6z^6 \times 10z^2)$. Thus we have:

$\displaystyle. \ \ \ \ \ \ P(W=12)=\frac{1}{6^4}(1+4 \times 16+6 \times 10)=\frac{125}{6^4}$

To find the other probabilities, we can follow the same decoding process.
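
The same decoding can be done by machine. The sketch below is one possible way to do it and not the only one: it stores the generating function of a single roll as a coefficient list, multiplies it by itself to get $g(z)^4$, and reads off the coefficient of $z^{12}$. The helper name `poly_mul` is hypothetical.

```python
from fractions import Fraction

def poly_mul(g, h):
    """Coefficient list of the product of two polynomials."""
    out = [Fraction(0)] * (len(g) + len(h) - 1)
    for i, a in enumerate(g):
        for j, b in enumerate(h):
            out[i + j] += a * b
    return out

# g(z) for one roll: coefficient 1/6 on z^1, ..., z^6 and 0 on z^0.
die = [Fraction(0)] + [Fraction(1, 6)] * 6

# g(z)^4: the generating function of the sum W of 4 rolls.
w = [Fraction(1)]
for _ in range(4):
    w = poly_mul(w, die)

print(w[12])   # 125/1296, matching the hand calculation above
```

The full probability function of $W$ is the list `w`, with `w[k]` holding $P(W=k)$ for $k=0,1,\cdots,24$.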

________________________________________________
Remark
The probability distribution of a random variable is uniquely determined by its generating function. This fundamental property is useful in determining the distribution of an independent sum. The generating function of the independent sum is simply the product of the individual generating functions. If the product is of a certain distributional form (as in Example 1 and Example 2), then we can deduce that the sum must be of the same distribution.

We can also decode the product of generating functions to obtain the probability function of the independent sum (as in Example 3). The method in Example 3 is quite tedious. But one advantage is that it is a “machine process”, a fairly foolproof process that can be performed mechanically.

The machine process is this: Encode the individual probability distribution in a generating function $g(z)$. Then raise it to the $n$th power. After performing some algebraic manipulation on $g(z)^n$, decode the probabilities from the coefficients of $g(z)^n$.

As long as we can perform the algebraic manipulation carefully and correctly, this process will be sure to provide the probability distribution of an independent sum.

________________________________________________
The Moment Generating Function
The moment generating function of a random variable $X$ is $M_X(t)=E(e^{tX})$ on all real numbers $t$ for which the expected value exists. The moments can be computed more directly using an mgf. From the theory of mathematical analysis, it can be shown that if $M_X(t)$ exists on some interval $-a<t<a$, then the derivatives of $M_X(t)$ of all orders exist at $t=0$. Furthermore, it can be shown that $E(X^n)=M_X^{(n)}(0)$.

Suppose that $g(z)$ is the generating function of a random variable. The following relates the generating function and the moment generating function.

\displaystyle \begin{aligned}. \ \ \ \ \ \ &M_X(t)=g(e^t) \\&\text{ } \\&g(z)=M_X(\ln z) \end{aligned}
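
For instance, applying the first relation to the Poisson generating function $g(z)=e^{\alpha (z-1)}$ from Example 0 gives the following. This is a worked sketch in which the derivatives are computed by the chain rule.

$\displaystyle \begin{aligned}&M_X(t)=g(e^t)=e^{\alpha (e^t-1)} \\&\text{ } \\&M_X'(t)=\alpha e^t \ e^{\alpha (e^t-1)} \ \ \ \Rightarrow \ \ \ E(X)=M_X'(0)=\alpha \\&\text{ } \\&M_X^{(2)}(t)=(\alpha e^t+\alpha^2 e^{2t}) \ e^{\alpha (e^t-1)} \ \ \ \Rightarrow \ \ \ E(X^2)=M_X^{(2)}(0)=\alpha+\alpha^2 \end{aligned}$

It follows that $Var(X)=E(X^2)-E(X)^2=\alpha$, so the mean and the variance of the Poisson distribution are equal.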

________________________________________________

Reference

1. Feller W. An Introduction to Probability Theory and Its Applications, Third Edition, John Wiley & Sons, New York, 1968

# The hazard rate function

In this post, we introduce the hazard rate function using the notion of a non-homogeneous Poisson process.

In a Poisson process, changes occur at a constant rate $\lambda$ per unit time. Suppose that we interpret the changes in a Poisson process from a mortality point of view, i.e. a change in the Poisson process means a termination of a system, be it biological or manufactured, and this Poisson process counts the number of terminations as they occur. Then the rate of change $\lambda$ is interpreted as a hazard rate (or failure rate or force of mortality). With a constant force of mortality, the time until the next change is exponentially distributed. In this post, we discuss the hazard rate function in a more general setting. The process that counts the number of terminations will no longer have a constant hazard rate, and instead will have a hazard rate function $\lambda(t)$, a function of time $t$. Such a counting process is called a non-homogeneous Poisson process. We discuss the survival probability models (the time to the first termination) associated with a non-homogeneous Poisson process. We then discuss several important examples of survival probability models, including the Weibull distribution, the Gompertz distribution and the model based on Makeham’s law. See [1] for more information about the hazard rate function.

$\text{ }$

The Poisson Process
We start with the three postulates of a Poisson process. Consider an experiment in which the occurrences of a certain type of event are counted during a given time interval. We call an occurrence of the type of event in question a change. We assume the following three conditions:

1. The numbers of changes occurring in nonoverlapping intervals are independent.
2. The probability of two or more changes taking place in a sufficiently small interval is essentially zero.
3. The probability of exactly one change in the short interval $(t,t+\delta)$ is approximately $\lambda \delta$ where $\delta$ is sufficiently small and $\lambda$ is a positive constant.

$\text{ }$

When we interpret the Poisson process from a mortality point of view, the constant $\lambda$ is a hazard rate (or force of mortality), which can be interpreted as the rate of failure at the next instant given that the life has survived to time $t$. With a constant force of mortality, the survival model (the time until the next termination) has an exponential distribution with mean $\frac{1}{\lambda}$. We wish to relax the constant force of mortality assumption by making $\lambda$ a function of $t$ instead. The remainder of this post is based on the non-homogeneous Poisson process defined below.

$\text{ }$

The Non-Homogeneous Poisson Process
We modify condition 3 above by making $\lambda(t)$ a function of $t$. We have the following modified counting process.

1. The numbers of changes occurring in nonoverlapping intervals are independent.
2. The probability of two or more changes taking place in a sufficiently small interval is essentially zero.
3. The probability of exactly one change in the short interval $(t,t+\delta)$ is approximately $\lambda(t) \delta$ where $\delta$ is sufficiently small and $\lambda(t)$ is a nonnegative function of $t$.

$\text{ }$

We focus on the survival model aspect of such counting processes. Such a process can be interpreted as a model for the number of changes that occur in a time interval, where a change means “termination” or “failure” of a system under consideration. The rate of change function $\lambda(t)$ indicated in condition 3 is called the hazard rate function. It is also called the failure rate function in reliability engineering and the force of mortality in life contingency theory.

Based on condition 3 in the non-homogeneous Poisson process, the hazard rate function $\lambda(t)$ can be interpreted as the rate of failure at the next instant given that the life has survived to time $t$.

Two random variables that naturally arise from a non-homogeneous Poisson process are described here. One is the discrete variable $N_t$, defined as the number of changes in the time interval $(0,t)$. The other is the continuous random variable $T$, defined as the time until the occurrence of the first change. The probability distribution of $T$ is called a survival model. The following is the link between $N_t$ and $T$.

$\text{ }$

\displaystyle \begin{aligned}(1) \ \ \ \ \ \ \ \ \ &P[T > t]=P[N_t=0] \end{aligned}

$\text{ }$

Note that $P[T > t]$ is the probability that the next change occurs after time $t$. This means that there is no change within the interval $(0,t)$. We have the following theorems.

$\text{ }$

Theorem 1.
Let $\displaystyle \Lambda(t)=\int_{0}^{t} \lambda(y) dy$. Then $e^{-\Lambda(t)}$ is the probability that there is no change in the interval $(0,t)$. That is, $\displaystyle P[N_t=0]=e^{-\Lambda(t)}$.

Proof. We are interested in finding the probability of zero changes in the interval $(0,y+\delta)$. By condition 1, the numbers of changes in the nonoverlapping intervals $(0,y)$ and $(y,y+\delta)$ are independent. Thus we have:

$\text{ }$

$\displaystyle (2) \ \ \ \ \ \ \ \ P[N_{y+\delta}=0] \approx P[N_y=0] \times [1-\lambda(y) \delta]$

$\text{ }$

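Note that by condition 3, the probability of exactly one change in the small interval $(y,y+\delta)$ is approximately $\lambda(y) \delta$, and by condition 2, the probability of two or more changes in that interval is negligible. Thus $[1-\lambda(y) \delta]$ is approximately the probability of no change in the interval $(y,y+\delta)$. Continuing with equation $(2)$ and letting $\delta \rightarrow 0$, we have the following derivation: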

$\text{ }$

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\frac{P[N_{y+\delta}=0] - P[N_y=0]}{\delta} \approx -\lambda(y) P[N_y=0] \\&\text{ } \\&\frac{d}{dy} P[N_y=0]=-\lambda(y) P[N_y=0] \\&\text{ } \\&\frac{\frac{d}{dy} P[N_y=0]}{P[N_y=0]}=-\lambda(y) \\&\text{ } \\&\int_0^{t} \frac{\frac{d}{dy} P[N_y=0]}{P[N_y=0]} dy=-\int_0^{t} \lambda(y)dy \end{aligned}

$\text{ }$

Evaluating the integral on the left hand side with the boundary condition of $P[N_0=0]=1$ produces the following results:

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\ln P[N_t=0]=-\int_0^{t} \lambda(y)dy \\&\text{ } \\&P[N_t=0]=e^{\displaystyle -\int_0^{t} \lambda(y)dy} \end{aligned}
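
The limiting argument in the proof can be illustrated numerically. The sketch below applies the approximation in $(2)$ repeatedly over many short intervals and compares the resulting product with $e^{-\Lambda(t)}$. The hazard rate function $\lambda(y)=0.5y$ is an assumption chosen only because its integral is easy to write down, and the function names are hypothetical.

```python
from math import exp, isclose

def no_change_probability(hazard, t, steps=100_000):
    """Approximate P[N_t = 0] by multiplying (1 - hazard(y) * delta) over short intervals."""
    delta = t / steps
    prob = 1.0
    for k in range(steps):
        prob *= 1.0 - hazard(k * delta) * delta
    return prob

def hazard(y):
    """Assumed hazard rate function; its integral over (0, t) is 0.25 * t**2."""
    return 0.5 * y

t = 2.0
approx = no_change_probability(hazard, t)
exact = exp(-0.25 * t**2)   # e^{-Lambda(t)} from Theorem 1

print(isclose(approx, exact, rel_tol=1e-3))
```

As the number of steps grows, the product moves closer to $e^{-\Lambda(t)}$, which is the content of Theorem 1.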

$\text{ }$

Theorem 2
As discussed above, let $T$ be the length of the interval that is required to observe the first change. Then the following are the distribution function, survival function and pdf of $T$:

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &F_T(t)=\displaystyle 1-e^{\displaystyle -\int_0^t \lambda(y) dy} \\&\text{ } \\&S_T(t)=\displaystyle e^{\displaystyle -\int_0^t \lambda(y) dy} \\&\text{ } \\&f_T(t)=\displaystyle \lambda(t) \ e^{\displaystyle -\int_0^t \lambda(y) dy} \end{aligned}

Proof. In Theorem 1, we derive the probability $P[N_y=0]$ for the discrete variable $N_y$ derived from the non-homogeneous Poisson process. We now consider the continuous random variable $T$, the time until the first change, which is related to $N_t$ by $(1)$. Thus $S_T(t)=P[T > t]=P[N_t=0]=e^{-\int_0^t \lambda(y) dy}$. The distribution function and density function can be derived accordingly.

$\text{ }$

Theorem 3
The hazard rate function $\lambda(t)$ is equivalent to each of the following:

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\lambda(t)=\frac{f_T(t)}{1-F_T(t)} \\&\text{ } \\&\lambda(t)=\frac{-S_T^{'}(t)}{S_T(t)} \end{aligned}
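
Both ratios follow from Theorem 2. A short derivation, using only $S_T(t)=1-F_T(t)$ and $f_T(t)=F_T^{'}(t)=-S_T^{'}(t)$, is sketched here:

$\displaystyle \begin{aligned}&\frac{f_T(t)}{1-F_T(t)}=\frac{\lambda(t) \ e^{-\int_0^t \lambda(y) dy}}{e^{-\int_0^t \lambda(y) dy}}=\lambda(t) \\&\text{ } \\&\frac{-S_T^{'}(t)}{S_T(t)}=\frac{f_T(t)}{S_T(t)}=\lambda(t) \end{aligned}$
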

$\text{ }$

Remark
Theorem 1 and Theorem 2 show that in a non-homogeneous Poisson process as described above, the hazard rate function $\lambda(t)$ completely specifies the probability distribution of the survival model $T$ (the time until the first change). Once the rate of change function $\lambda(t)$ is known in the non-homogeneous Poisson process, we can use it to generate the survival function $S_T(t)$. All of the examples of survival models given below are derived by assuming the functional form of the hazard rate function. The result in Theorem 2 holds even outside the context of a non-homogeneous Poisson process, that is, given the hazard rate function $\lambda(t)$, we can derive the three distributional items $S_T(t)$, $F_T(t)$, $f_T(t)$.

The ratio in Theorem 3 indicates that the probability distribution determines the hazard rate function. In fact, the ratio in Theorem 3 is the usual definition of the hazard rate function. That is, the hazard rate function can be defined as the ratio of the density and the survival function (one minus the cdf). With this definition, we can also recover the survival function. Whenever $\displaystyle \lambda(x)=\frac{f_X(x)}{1-F_X(x)}$, we can derive:

$\text{ }$

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &S_X(x)=\displaystyle e^{-\int_0^x \lambda(y) dy} \end{aligned}

$\text{ }$

As indicated above, the hazard rate function can be interpreted as the rate of failure at the next instant given that the life or system being studied has survived up to time $t$.

It is interesting to note that the function $\Lambda(t)=\int_0^t \lambda(y) dy$ defined in Theorem 1 is called the cumulative hazard rate function. Thus the cumulative hazard rate function is an alternative way of representing the hazard rate function (see the discussion on Weibull distribution below).

——————————————————————————————————————
Examples of Survival Models

–Exponential Distribution–
In many applications, especially those for biological organisms and mechanical systems that wear out over time, the hazard rate $\lambda(t)$ is an increasing function of $t$. In other words, the older the life in question (the larger the $t$), the higher the chance of failure at the next instant. For humans, the probability of an 85-year-old dying in the next year is clearly higher than for a 20-year-old. In a Poisson process, the rate of change $\lambda(t)=\lambda$ indicated in condition 3 is a constant. As a result, the time $T$ until the first change derived in Theorem 2 has an exponential distribution with parameter $\lambda$. In terms of mortality study or reliability study of machines that wear out over time, this is not a realistic model. However, if the mortality or failure is caused by random external events, this could be an appropriate model.
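
To make the constant hazard rate case concrete, substituting $\lambda(t)=\lambda$ into Theorem 1 and Theorem 2 gives the following. This is a worked sketch. The last line uses the fact that the mean of a nonnegative random variable is the integral of its survival function.

$\displaystyle \begin{aligned}&\Lambda(t)=\int_0^t \lambda \ dy=\lambda t \\&\text{ } \\&S_T(t)=e^{-\lambda t} \ \ \ \ \ \ F_T(t)=1-e^{-\lambda t} \ \ \ \ \ \ f_T(t)=\lambda e^{-\lambda t} \\&\text{ } \\&E(T)=\int_0^\infty S_T(t) \ dt=\frac{1}{\lambda} \end{aligned}$
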

–Weibull Distribution–
This distribution is an excellent model choice for describing the life of manufactured objects. It is defined by the following cumulative hazard rate function:

$\text{ }$

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\Lambda(t)=\biggl(\frac{t}{\beta}\biggr)^{\alpha} \end{aligned} where $\alpha > 0$ and $\beta>0$

$\text{ }$

As a result, the hazard rate function, the density function and the survival function for the lifetime distribution are:

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\lambda(t)=\frac{\alpha}{\beta} \biggl(\frac{t}{\beta}\biggr)^{\alpha-1} \\&\text{ } \\&f_T(t)=\frac{\alpha}{\beta} \biggl(\frac{t}{\beta}\biggr)^{\alpha-1} \displaystyle e^{\displaystyle -\biggl[\frac{t}{\beta}\biggr]^{\alpha}} \\&\text{ } \\&S_T(t)=\displaystyle e^{\displaystyle -\biggl[\frac{t}{\beta}\biggr]^{\alpha}} \end{aligned}

$\text{ }$

The parameter $\alpha$ is the shape parameter and $\beta$ is the scale parameter. When $\alpha=1$, the hazard rate becomes a constant and the Weibull distribution becomes an exponential distribution.

When the parameter $\alpha<1$, the failure rate decreases over time. One interpretation is that most of the defective items fail early on in the life cycle. Once they are removed from the population, the failure rate decreases over time.

When the parameter $\alpha>1$, the failure rate increases with time. This is a good candidate for a model to describe the lifetime of machines or systems that wear out over time.
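
The three regimes of the shape parameter can be seen by tabulating the hazard rate function. The following sketch uses assumed values for $\alpha$, $\beta$ and the time points, and the function names are hypothetical. It also checks the second ratio in Theorem 3 by approximating $-S_T^{'}(t)/S_T(t)$ with a central difference.

```python
from math import exp, isclose

def weibull_hazard(t, alpha, beta):
    """Hazard rate: (alpha / beta) * (t / beta)^(alpha - 1)."""
    return (alpha / beta) * (t / beta) ** (alpha - 1)

def weibull_survival(t, alpha, beta):
    """Survival function: exp(-(t / beta)^alpha)."""
    return exp(-((t / beta) ** alpha))

def hazard_from_survival(survival, t, h=1e-6):
    """Approximate -S'(t) / S(t) with a central difference."""
    slope = (survival(t + h) - survival(t - h)) / (2 * h)
    return -slope / survival(t)

beta = 2.0                      # assumed scale parameter
times = [0.5, 1.0, 2.0, 4.0]    # assumed time points
for alpha in (0.5, 1.0, 3.0):   # decreasing, constant, increasing hazard
    rates = [weibull_hazard(t, alpha, beta) for t in times]
    print(alpha, [round(r, 4) for r in rates])

check = hazard_from_survival(lambda t: weibull_survival(t, 3.0, beta), 1.0)
print(isclose(check, weibull_hazard(1.0, 3.0, beta), rel_tol=1e-6))
```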

–The Gompertz Distribution–
The Gompertz law states that the force of mortality or failure rate increases exponentially over time. It describes human mortality quite accurately. The following is the hazard rate function:

$\text{ }$

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\lambda(t)=\alpha e^{\beta t} \end{aligned} where $\alpha>0$ and $\beta>0$.

$\text{ }$

The following are the cumulative hazard rate function as well as the survival function, distribution function and the pdf of the lifetime distribution $T$.

$\text{ }$

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\Lambda(t)=\int_0^t \alpha e^{\beta y} dy=\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta} \\&\text{ } \\&S_T(t)=\displaystyle e^{\displaystyle -\biggl(\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}\biggr)} \\&\text{ } \\&F_T(t)=\displaystyle 1-e^{\displaystyle -\biggl(\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}\biggr)} \\&\text{ } \\&f_T(t)=\displaystyle \alpha \ e^{\beta t} \ e^{\displaystyle -\biggl(\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}\biggr)} \end{aligned}
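
The closed form of $S_T(t)$ can be checked against a direct numerical integration of the hazard rate function. The sketch below is an illustration only: the parameter values are assumptions and are not fitted to any mortality data, and the function names are hypothetical. The integral $\Lambda(t)$ is approximated with the trapezoidal rule.

```python
from math import exp, isclose

def gompertz_hazard(t, alpha, beta):
    """Hazard rate: alpha * exp(beta * t)."""
    return alpha * exp(beta * t)

def gompertz_survival(t, alpha, beta):
    """Closed-form survival function exp(-(alpha / beta) * (exp(beta * t) - 1))."""
    return exp(-(alpha / beta) * (exp(beta * t) - 1.0))

def cumulative_hazard_numeric(hazard, t, steps=10_000):
    """Trapezoidal approximation of the integral of hazard over (0, t)."""
    h = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    total += sum(hazard(k * h) for k in range(1, steps))
    return total * h

alpha, beta, t = 0.0001, 0.09, 60.0   # assumed parameter values
numeric = exp(-cumulative_hazard_numeric(lambda y: gompertz_hazard(y, alpha, beta), t))
closed = gompertz_survival(t, alpha, beta)

print(isclose(numeric, closed, rel_tol=1e-6))
```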

$\text{ }$

–Makeham’s Law–
Makeham’s law states that the force of mortality is the Gompertz failure rate plus an age-independent component that accounts for external causes of mortality. The following is the hazard rate function:

$\text{ }$

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\lambda(t)=\alpha e^{\beta t}+\mu \end{aligned} where $\alpha>0$, $\beta>0$ and $\mu>0$.

$\text{ }$

The following are the cumulative hazard rate function as well as the survival function, distribution function and the pdf of the lifetime distribution $T$.

$\text{ }$

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\Lambda(t)=\int_0^t (\alpha e^{\beta y}+\mu) dy=\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}+\mu t \\&\text{ } \\&S_T(t)=\displaystyle e^{\displaystyle -\biggl(\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}+\mu t\biggr)} \\&\text{ } \\&F_T(t)=\displaystyle 1-e^{\displaystyle -\biggl(\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}+\mu t\biggr)} \\&\text{ } \\&f_T(t)=\biggl( \alpha e^{\beta t}+\mu \biggr) \ e^{\displaystyle -\biggl(\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}+\mu t\biggr)} \end{aligned}
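
Since $f_T(t)=\lambda(t) \ S_T(t)$ by Theorem 2, the density under Makeham’s law is the hazard rate $\alpha e^{\beta t}+\mu$ times the survival function. The sketch below checks this numerically by comparing $\lambda(t) \ S_T(t)$ with a central-difference approximation of $-S_T^{'}(t)$. The parameter values are assumptions chosen for illustration, and the function names are hypothetical.

```python
from math import exp, isclose

def makeham_hazard(t, alpha, beta, mu):
    """Hazard rate: Gompertz term plus the age-independent component mu."""
    return alpha * exp(beta * t) + mu

def makeham_survival(t, alpha, beta, mu):
    """Survival function exp(-Lambda(t)) with Lambda(t) as derived above."""
    return exp(-((alpha / beta) * (exp(beta * t) - 1.0) + mu * t))

def makeham_pdf(t, alpha, beta, mu):
    """Density as hazard rate times survival function (Theorem 2)."""
    return makeham_hazard(t, alpha, beta, mu) * makeham_survival(t, alpha, beta, mu)

alpha, beta, mu, t = 0.0001, 0.09, 0.001, 50.0   # assumed parameter values
h = 1e-4
slope = (makeham_survival(t + h, alpha, beta, mu)
         - makeham_survival(t - h, alpha, beta, mu)) / (2 * h)

print(isclose(-slope, makeham_pdf(t, alpha, beta, mu), rel_tol=1e-6))
```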

$\text{ }$

Reference

1. Klugman S. A., Panjer H. H., Willmot G. E. Loss Models: From Data to Decisions, Second Edition, Wiley-Interscience, a John Wiley & Sons, Inc., New York, 2004