Defining the Poisson distribution

The Poisson distribution is a family of discrete distributions with positive probabilities on the non-negative integers 0,1,2,\cdots. Each distribution in this family is indexed by a positive number \lambda>0. One way to define this distribution is to give its probability function in terms of the parameter \lambda and then derive various distributional quantities such as the mean and variance. Along with other mathematical facts, it can be shown that both the mean and the variance are \lambda. In this post, we take a different tack. We look at two viewpoints that give rise to the Poisson distribution. Taking this approach makes it easier to appreciate some of the possible applications of the Poisson distribution. The first viewpoint is that the Poisson distribution is the limiting case of the binomial distribution. The second viewpoint is through the Poisson process, a stochastic process that, under some conditions, counts the number of events and the time points at which these events occur in a given time (or physical) interval.

________________________________________________________________________

Poisson as a limiting case of binomial

A binomial distribution where the number of trials n is large and the probability of success p is small such that np is moderate in size can be approximated using the Poisson distribution with mean \lambda=np. This fact follows from Theorem 1, which indicates that the Poisson distribution is the limiting case of the binomial distribution.

Theorem 1
Let \lambda be a fixed positive constant. Then for each integer x=0,1,2,\cdots, the following is true:

    \displaystyle \lim_{n \rightarrow \infty} \binom{n}{x} \ p^x \  (1-p)^{n-x}=\lim_{n \rightarrow \infty} \frac{n!}{x! \ (n-x)!} \ p^x \  (1-p)^{n-x}=\frac{e^{-\lambda} \ \lambda^x}{x!}

where \displaystyle p=\frac{\lambda}{n}.

Proof of Theorem 1
We start with a binomial distribution with n trials and with \displaystyle p=\frac{\lambda}{n} being the probability of success, where n>\lambda. Let X_n be the count of the number of successes in these n Bernoulli trials. The following is the probability that X_n=k.

    \displaystyle \begin{aligned} P(X_n=k)&=\binom{n}{k} \biggl(\frac{\lambda}{n}\biggr)^k \biggr(1-\frac{\lambda}{n}\biggr)^{n-k} \\&=\frac{n!}{k! (n-k)!} \biggl(\frac{\lambda}{n}\biggr)^k \biggr(1-\frac{\lambda}{n}\biggr)^{n-k} \\&=\frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k} \biggl(\frac{\lambda^k}{k!}\biggr) \biggr(1-\frac{\lambda}{n}\biggr)^{n} \biggr(1-\frac{\lambda}{n}\biggr)^{-k} \\&=\biggl(\frac{\lambda^k}{k!}\biggr) \ \biggl[ \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k} \ \biggr(1-\frac{\lambda}{n}\biggr)^{n} \ \biggr(1-\frac{\lambda}{n}\biggr)^{-k} \biggr] \end{aligned}

In the last step, the terms that contain n are inside the square brackets. Let’s see what they are when n approaches infinity.

    \displaystyle \lim \limits_{n \rightarrow \infty} \ \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k}=1

    \displaystyle \lim \limits_{n \rightarrow \infty} \biggr(1-\frac{\lambda}{n}\biggr)^{n}=e^{-\lambda}

    \displaystyle \lim \limits_{n \rightarrow \infty} \biggr(1-\frac{\lambda}{n}\biggr)^{-k}=1

The reason that the first result is true is that the numerator is a polynomial where the leading term is n^k. Upon dividing by n^k and taking the limit, we get 1. The second result is true since the following limit is one of the definitions of the exponential function e^x.

    \displaystyle \lim \limits_{n \rightarrow \infty} \biggr(1+\frac{x}{n}\biggr)^{n}=e^{x}

The third result is true since the exponent -k is a constant. Thus the following is the limit of the probability P(X_n=k) as n \rightarrow \infty.

    \displaystyle \begin{aligned} \lim \limits_{n \rightarrow \infty} P(X_n=k)&= \biggl(\frac{\lambda^k}{k!}\biggr) \ \lim \limits_{n \rightarrow \infty} \biggl[ \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k} \ \biggr(1-\frac{\lambda}{n}\biggr)^{n} \ \biggr(1-\frac{\lambda}{n}\biggr)^{-k} \biggr] \\&=\biggl(\frac{\lambda^k}{k!}\biggr) \cdot 1 \cdot e^{-\lambda} \cdot 1 \\&=\frac{e^{-\lambda} \lambda^k}{k!} \end{aligned}

The above derivation completes the proof. \blacksquare

In a given binomial distribution, whenever the number of trials n is large and the probability p of success in each trial is small (i.e. each Bernoulli trial rarely results in a success), Theorem 1 tells us that we can use the Poisson distribution with parameter \lambda=np to approximate the binomial distribution.

Example 1
The probability of being dealt a full house in a hand of poker is approximately 0.001441. Out of 5000 hands of poker that are dealt at a certain casino, what is the probability that there will be at most 4 full houses?

Let X be the number of full houses in these 5000 poker hands. The exact distribution for X is the binomial distribution with n= 5000 and p= 0.001441. This example deals with a large number of trials where a success in each trial is a rare event, so the Poisson approximation is applicable. Let \lambda= 5000(0.001441) = 7.205. Then P(X \le 4) can be approximated using the Poisson random variable Y with parameter \lambda. The following is the probability function of Y:

    \displaystyle P(Y=y)=e^{-7.205} \ \frac{7.205^y}{y!}

The following is the approximation of P(X \le 4):

    \displaystyle \begin{aligned} P(X \le 4)&\approx P(Y \le 4) \\&=P(Y=0)+P(Y=1)+P(Y=2)+P(Y=3)+P(Y=4) \\&= e^{-7.205} \biggl[ 1+7.205+\frac{7.205^2}{2!}+\frac{7.205^3}{3!}+\frac{7.205^4}{4!}\biggr] \\&=0.155098087  \end{aligned}

The following is a side-by-side comparison between the binomial distribution and its Poisson approximation. For all practical purposes, they are indistinguishable from one another.

    \displaystyle \begin{bmatrix} \text{Count of}&\text{ }&\text{ }&\text{Binomial } &\text{ }&\text{ }&\text{Poisson } \\\text{Full Houses}&\text{ }&\text{ }&P(X \le x) &\text{ }&\text{ }&P(Y \le x) \\\text{ }&\text{ }&\text{ }&n=5000 &\text{ }&\text{ }&\lambda=7.205 \\\text{ }&\text{ }&\text{ }&p=0.001441 &\text{ }&\text{ }&\text{ } \\\text{ }&\text{ }&\text{ } &\text{ }&\text{ } \\ 0&\text{ }&\text{ }&0.000739012&\text{ }&\text{ }&0.000742862 \\ 1&\text{ }&\text{ }&0.006071278&\text{ }&\text{ }&0.006095184 \\ 2&\text{ }&\text{ }&0.025304641&\text{ }&\text{ }&0.025376925 \\ 3&\text{ }&\text{ }&0.071544923&\text{ }&\text{ }&0.071685238 \\ 4&\text{ }&\text{ }&0.154905379&\text{ }&\text{ }&0.155098087  \\ 5&\text{ }&\text{ }&0.275104906&\text{ }&\text{ }&0.275296003 \\ 6&\text{ }&\text{ }&0.419508250&\text{ }&\text{ }&0.419633667 \\ 7&\text{ }&\text{ }&0.568176421 &\text{ }&\text{ }&0.568198363   \\ 8&\text{ }&\text{ }&0.702076190 &\text{ }&\text{ }&0.701999442 \\ 9&\text{ }&\text{ }&0.809253326&\text{ }&\text{ }&0.809114639 \\ 10&\text{ }&\text{ }&0.886446690&\text{ }&\text{ }&0.886291139 \\ 11&\text{ }&\text{ }&0.936980038&\text{ }&\text{ }&0.936841746 \\ 12&\text{ }&\text{ }&0.967298041&\text{ }&\text{ }&0.967193173 \\ 13&\text{ }&\text{ }&0.984085073&\text{ }&\text{ }&0.984014868 \\ 14&\text{ }&\text{ }&0.992714372&\text{ }&\text{ }&0.992672033 \\ 15&\text{ }&\text{ }&0.996853671&\text{ }&\text{ }&0.996830358 \end{bmatrix}

The above table is calculated using the functions BINOM.DIST and POISSON.DIST in Excel. The following shows how it is done. The parameter TRUE indicates that the result is a cumulative distribution. When it is set to FALSE, the formula gives the probability function.

    P(X \le x)=\text{BINOM.DIST(x, 5000, 0.001441, TRUE)}

    P(Y \le x)=\text{POISSON.DIST(x, 7.205, TRUE)}
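
For those who prefer to work outside of Excel, the same comparison can be reproduced with a short Python sketch using the scipy.stats library (assuming scipy is installed; the variable names here are ours).

    # Reproduce the binomial vs. Poisson comparison table in Example 1.
    from scipy.stats import binom, poisson

    n, p = 5000, 0.001441          # exact binomial model
    lam = n * p                    # Poisson parameter 7.205

    for x in range(16):
        b = binom.cdf(x, n, p)     # P(X <= x), exact binomial
        q = poisson.cdf(x, lam)    # P(Y <= x), Poisson approximation
        print(f"{x:2d}  {b:.9f}  {q:.9f}")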

________________________________________________________________________

The Poisson distribution

The limit in Theorem 1 is a probability function and the resulting distribution is called the Poisson distribution. We now give the formal definition. A random variable X that takes on one of the values 0,1,2,\cdots is said to be a Poisson random variable with parameter \lambda>0 if

    \displaystyle P(X=x)=\frac{e^{-\lambda} \ \lambda^x}{x!} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x=0,1,2,\cdots

It can be shown that the above function is indeed a probability function, i.e., the probabilities sum to 1. Any random variable that has a probability function of the above form is said to follow (or to have) a Poisson distribution. Furthermore, it can be shown that E(X)=var(X)=\lambda, i.e., the Poisson parameter is both the mean and variance. Thus the Poisson distribution may be a good fit if the observed data indicate that the sample mean and the sample variance are nearly identical.

The following is the moment generating function of the Poisson distribution with parameter \lambda.

    \displaystyle M(t)=E(e^{tX})=e^{\lambda \ (e^t-1)}

One consequence of the Poisson moment generating function is that an independent sum of Poisson random variables is again a Poisson random variable, since the moment generating function of the sum (the product of the individual moment generating functions) has the same form.
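
The additivity property can also be checked quickly by simulation. The following is a minimal sketch (assuming numpy is available); the sample size and rates are arbitrary choices for illustration.

    # The sum of independent Poisson(2) and Poisson(5) samples behaves like
    # a Poisson variable with mean 7: its sample mean and sample variance
    # are both close to 7.
    import numpy as np

    rng = np.random.default_rng(0)
    total = rng.poisson(2.0, 100000) + rng.poisson(5.0, 100000)
    print(total.mean(), total.var())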

________________________________________________________________________

The Poisson process

Another way, and the more important way, to look at the Poisson distribution is through the viewpoint of the Poisson process. Consider an experiment in which events that are of interest occur at random in a time interval. The goal here is to record the time of the occurrence of each random event and, for the purpose at hand, to count the number of random events occurring in a fixed time interval. Starting at time 0, note the time of the occurrence of the first event. Then note the time at which the second random event occurs, and so on. Out of these measurements, we can derive the length of time between the occurrences of any two consecutive random events. Such measurements belong to a continuous random variable. In this post, we focus on the discrete random variable of the count of the random events in a fixed time interval.

A good example of a Poisson process is the well known experiment in radioactivity conducted by Rutherford and Geiger in 1910. In this experiment, \alpha-particles were emitted from a polonium source and the number of \alpha-particles was counted during intervals of 7.5 seconds (2608 such time intervals were observed). A Poisson process is a random process in which several criteria are satisfied. We will show that in a Poisson process, the number of these random occurrences in a fixed time interval follows a Poisson distribution. First, we discuss the criteria to which a Poisson process must conform.

One of the criteria is that in a very short time interval, the chance of having more than one random event is essentially zero. So either one random event will occur or none will occur in a very short time interval. Considering the occurrence of a random event as a success, there is either a success or a failure in a very short time interval. So a very short time interval in a Poisson process can be regarded as a Bernoulli trial.

The second criterion is that the experiment remains constant over time. Specifically, this means that the probability of a random event occurring in a given subinterval is proportional to the length of that subinterval and does not depend on where the subinterval lies within the original interval. For example, in the 1910 radioactivity study, \alpha-particles were emitted at the rate of \lambda= 3.87 per 7.5 seconds. So the probability of one \alpha-particle being emitted from the radioactive source in a one-second interval is 3.87/7.5 = 0.516. Then the probability of observing one \alpha-particle in a half-second interval is 0.516/2 = 0.258. For a quarter-second interval, the probability is 0.258/2 = 0.129. So if we observe half as long, it is half as likely that we observe the occurrence of a random event. On the other hand, it does not matter where the quarter-second subinterval is located, whether at the beginning or toward the end of the original interval of 7.5 seconds.

The third criterion is that non-overlapping subintervals are mutually independent in the sense that what happens in one subinterval (i.e. the occurrence or non-occurrence of a random event) will have no influence on the occurrence of a random event in another subinterval. To summarize, the following are the three criteria of a Poisson process:

    Suppose that on average \lambda random events occur in a time interval of length 1.

    1. The probability of having more than one random event occurring in a very short time interval is essentially zero.
    2. For a very short subinterval of length \frac{1}{n} where n is a sufficiently large integer, the probability of a random event occurring in this subinterval is \frac{\lambda}{n}.
    3. The numbers of random events occurring in non-overlapping time intervals are independent.

Consider a Poisson process in which the average rate is \lambda random events per unit time interval. Let Y be the number of random events occurring in the unit time interval. In the 1910 radioactivity study, the unit time interval is 7.5 seconds and Y is the count of the number of \alpha-particles emitted in 7.5 seconds. It follows that Y has a Poisson distribution with parameter \lambda. To see this, subdivide the unit interval into n non-overlapping subintervals of equal length where n is a sufficiently large integer. Let X_{n,j} be the number of random events in the jth subinterval (1 \le j \le n). Based on the three assumptions, X_{n,1},X_{n,2},\cdots,X_{n,n} are independent Bernoulli random variables, where the probability of success for each X_{n,j} is \frac{\lambda}{n}. Then X_n=X_{n,1}+X_{n,2}+\cdots+X_{n,n} has a binomial distribution with parameters n and p=\frac{\lambda}{n}. Theorem 1 tells us that the limiting case of the binomial distributions for X_n is the Poisson distribution with parameter \lambda. This Poisson distribution agrees with the distribution of Y. The Poisson distribution is also discussed in more detail in the post called Poisson as a Limiting Case of Binomial Distribution.
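
The subdivision argument can be illustrated numerically. The following sketch (assuming scipy is installed) computes the binomial probability of observing 4 events for finer and finer subdivisions of the unit interval and compares it with the Poisson probability; the choice of 4 events is arbitrary.

    # Binomial probabilities with n subintervals, each with success
    # probability lambda/n, approach the Poisson probability as n grows.
    from scipy.stats import binom, poisson

    lam = 3.87
    for n in (10, 100, 10000):
        print(n, binom.pmf(4, n, lam / n))
    print("Poisson:", poisson.pmf(4, lam))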

We now examine the 1910 radioactivity study a little more closely.

Example 2
The basic idea of the 1910 radioactivity study conducted by Rutherford and Geiger is that a polonium source was placed a short distance from an observation point. The number of \alpha-particles emitted from the source was counted in 7.5-second intervals, 2608 times in all. The following are the tabulated results.

    \displaystyle \begin{bmatrix}   \text{Number of alpha particles}&\text{ }&\text{Observed}   \\ \text{recorded per 7.5 seconds }&\text{ }&\text{counts}    \\ \text{ }&\text{ }&\text{ }   \\ 0&\text{ }&57    \\ 1&\text{ }&203    \\ 2&\text{ }&383    \\ 3&\text{ }&525    \\ 4&\text{ }&532    \\ 5&\text{ }&408   \\ 6&\text{ }&273   \\ 7&\text{ }&139   \\ 8&\text{ }&45   \\ 9&\text{ }&27   \\ 10&\text{ }&10  \\ 11+&\text{ }&6  \\ \text{ }&\text{ }&\text{ }  \\ \text{Total }&\text{ }&2608    \end{bmatrix}

What is the average number of particles observed per 7.5 seconds? The total number of \alpha-particles in these 2608 periods is

    0 \times 57+1 \times 203+2 \times 383+ 3 \times 525 + \cdots=10097.

The mean count per period is \lambda=\frac{10097}{2608} \approx 3.87. Consider the Poisson distribution with parameter 3.87. The following is its probability function.

    \displaystyle P(X=x)=\frac{e^{-3.87} \ 3.87^x}{x!} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x=0,1,2,\cdots

Out of 2608 periods, the expected number of periods with x particles in emission is 2608 \ P(X=x). The following is a side-by-side comparison between the observed counts and the expected counts.

    \displaystyle \begin{bmatrix}   \text{Number of alpha particles}&\text{ }&\text{Observed}&\text{ }&\text{Expected}    \\ \text{recorded per 7.5 seconds }&\text{ }&\text{counts}&\text{ }&\text{counts}    \\ \text{ }&\text{ }&\text{ }&\text{ }&2608 \times P(X=x)  \\ \text{ }&\text{ }&\text{ }&\text{ }&\text{ }   \\ 0&\text{ }&57&\text{ }&54.40    \\ 1&\text{ }&203&\text{ }&210.52  \\ 2&\text{ }&383&\text{ }&407.36    \\ 3&\text{ }&525&\text{ }&525.50    \\ 4&\text{ }&532&\text{ }&508.42    \\ 5&\text{ }&408&\text{ }&393.52   \\ 6&\text{ }&273&\text{ }&253.82   \\ 7&\text{ }&139&\text{ }&140.32   \\ 8&\text{ }&45&\text{ }&67.88   \\ 9&\text{ }&27&\text{ }&29.19   \\ 10&\text{ }&10&\text{ }&11.30  \\ 11+&\text{ }&6&\text{ }&5.78  \\ \text{ }&\text{ }&\text{ }&\text{ }&\text{ }  \\ \text{Total }&\text{ }&2608&\text{ }&2608    \end{bmatrix}

The expected counts are quite close to the observed counts, showing that the Poisson distribution is a very good fit to the observed data from the 1910 study.
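
The expected-count column can be reproduced (up to rounding) with a short Python sketch using scipy.stats, assuming scipy is available.

    # Expected number of 7.5-second periods with x particles, out of 2608.
    from scipy.stats import poisson

    lam, total = 3.87, 2608
    for x in range(11):
        print(x, round(total * poisson.pmf(x, lam), 2))
    # the "11+" row uses the tail probability P(X >= 11)
    print("11+", round(total * poisson.sf(10, lam), 2))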

________________________________________________________________________

More comments about the Poisson process

We have described the Poisson process as the distribution of random events in a time interval. The same idea can be used to describe random events occurring along a spatial interval, i.e. intervals in terms of distance or volume or other spatial measurements (see Examples 5 and 6 below).

Another point to make is that sometimes it may be necessary to consider an interval other than the unit length. Instead of counting the random events occurring in an interval of length 1, we may want to count the random events in an interval of length t. As before, let \lambda be the rate of occurrences in a unit interval. Then the rate of occurrences of the random events over the interval of length t is \lambda t. The same derivation shows that the number of occurrences of the random events of interest in an interval of length t has a Poisson distribution with parameter \lambda t. The following is its probability function.

    \displaystyle P(X_t=x)=\frac{e^{-\lambda t} \ (\lambda t)^x}{x!} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x=0,1,2,\cdots

where X_t is the count of the random events in an interval of length t.

For example, in the 1910 radioactivity study, the unit length is 7.5 seconds. The number of \alpha-particles observed in a half unit interval (3.75 seconds) follows a Poisson distribution with parameter 0.5 \lambda= 0.5(3.87) = 1.935 with the following probability function:

    \displaystyle P(X_{0.5}=x)=\frac{e^{-1.935} \ (1.935)^x}{x!}  \ \ \ \ \ \ \ \ \ \ \ \ \ x=0,1,2,\cdots
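
The rate-scaling rule can be packaged as a small helper. The following sketch assumes scipy is available; the function name is ours and is only for illustration.

    # P(X_t = x) when events occur at rate lam per unit interval and the
    # observation window has length t (measured in units of the base interval).
    from scipy.stats import poisson

    def count_pmf(x, lam, t):
        return poisson.pmf(x, lam * t)

    # half of the 7.5-second unit interval in the 1910 study
    print(count_pmf(2, 3.87, 0.5))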

________________________________________________________________________

More examples

Example 3
A radioactive source is monitored for 5 hours. During this period, 9638 \alpha-particles are counted. What is the probability that during the next minute, between 30 and 34 particles (both inclusive) will be counted?

The average number of \alpha-particles counted per minute is \lambda=\frac{9638}{300} \approx 32.12. Let X be the number of \alpha-particles counted per minute. Then X has a Poisson distribution with parameter \lambda=32.12. The following calculates P(30 \le X \le 34).

    \displaystyle \begin{aligned} P(30 \le X \le 34)&=e^{-32.12} \biggl[ \frac{32.12^{30}}{30!}+\frac{32.12^{31}}{31!}+\frac{32.12^{32}}{32!}+\frac{32.12^{33}}{33!}+\frac{32.12^{34}}{34!}   \biggr] \\&=0.341118569  \end{aligned}

Alternatively, the POISSON.DIST function in Excel can be used as follows:

    \displaystyle \begin{aligned} P(30 \le X \le 34)&=P(X \le 34)-P(X \le 29) \\&=\text{POISSON.DIST(34,32.12,TRUE)} \\& \ \ \ \ \ \ -\text{POISSON.DIST(29,32.12,TRUE)} \\&=0.671501917-0.330383348 \\&=0.341118569  \end{aligned}
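
The same calculation can be done with Python's scipy.stats (a sketch, assuming scipy is installed and using the rounded rate 32.12 as above).

    from scipy.stats import poisson

    lam = 32.12
    print(poisson.cdf(34, lam) - poisson.cdf(29, lam))   # about 0.3411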

Example 4
The side effect of dry mouth is known to be experienced, on the average, by 5 out of 10,000 individuals taking a certain medication. About 20,000 patients are expected to take this medication next year. What is the probability that between 12 and 16 (both inclusive) patients will experience the side effect of dry mouth? What is the exact probability model that can also be used to work this problem?

The exact model is a binomial distribution. The number of trials is n= 20000 and the probability of success in each trial is p= 0.0005 (experiencing the side effect). Here, we use the Poisson distribution to estimate the binomial. The average number of patients experiencing the side effect is \lambda=20000(0.0005)=10. Let X be the number of patients experiencing the side effect. The following calculates the Poisson probability for P(12 \le X \le 16) in two different ways.

    \displaystyle \begin{aligned} P(12 \le X \le 16)&=e^{-10} \biggl[ \frac{10^{12}}{12!}+\frac{10^{13}}{13!}+\frac{10^{14}}{14!}+\frac{10^{15}}{15!}+\frac{10^{16}}{16!}   \biggr] \\&=0.276182244  \end{aligned}
    \displaystyle \begin{aligned} P(12 \le X \le 16)&=P(X \le 16)-P(X \le 11) \\&=\text{POISSON.DIST(16,10,TRUE)} \\& \ \ \ \ \ \ -\text{POISSON.DIST(11,10,TRUE)} \\&=0.97295839-0.696776146 \\&=0.276182244  \end{aligned}
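
As a check, the following Python sketch (scipy assumed) carries out both calculations.

    from scipy.stats import poisson

    lam = 10
    direct = sum(poisson.pmf(x, lam) for x in range(12, 17))
    via_cdf = poisson.cdf(16, lam) - poisson.cdf(11, lam)
    print(direct, via_cdf)   # both about 0.2762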

Example 5
In a 10-mile stretch of a highway, car troubles (e.g. tire punctures, dead batteries, and mechanical breakdowns) occur at a rate of 1.5 per hour. A tow truck driver can respond to such car troubles and offer roadside assistance, which can include towing and minor repair. Assume that the number of such incidents per hour follows a Poisson distribution. At the beginning of the hour, three tow trucks (and their drivers) are available to respond to any car troubles in this stretch of highway. What is the probability that in the next hour all three tow truck drivers will be busy helping motorists with car troubles in this stretch of highway?

Let X be the number of car troubles that occur in this 10-mile stretch of highway in the one-hour period in question. If in this one hour there are 3 or more car troubles (X \ge 3), then all three tow truck drivers will be busy.

    \displaystyle \begin{aligned} P(X \ge 3)&=1-P(X \le 2) \\&=1-e^{-1.5} \biggl[ 1+1.5+\frac{1.5^{2}}{2!}   \biggr] \\&=1-0.808846831\\&=0.191153169  \end{aligned}

Example 6
Continuing Example 5. Considering that there is only a 19% chance that all 3 tow truck drivers will be busy, there is a good chance that the resources are underutilized. What if one of the drivers is assigned to another stretch of highway?

With only two tow trucks available for this 10-mile stretch of highway, the following is the probability that both tow truck drivers will be busy:

    \displaystyle \begin{aligned} P(X \ge 2)&=1-P(X \le 1) \\&=1-e^{-1.5} \biggl[ 1+1.5   \biggr] \\&=1-0.5578254\\&=0.4421746  \end{aligned}

Assigning one driver to another area seems to be a better way to make good use of the available resources. With only two tow truck drivers available, there is a much reduced chance (56%) that at least one of the drivers will be idle, and a much increased chance (44%) that all available drivers will be busy.
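
Both probabilities can be computed in one short Python sketch (scipy assumed), using the survival function P(X > k).

    from scipy.stats import poisson

    lam = 1.5
    print(poisson.sf(2, lam))   # P(X >= 3), three drivers busy, about 0.1912
    print(poisson.sf(1, lam))   # P(X >= 2), two drivers busy, about 0.4422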

________________________________________________________________________

Remarks

The Poisson distribution is one of the most important of all probability models and has been shown to be an excellent model for a wide array of phenomena such as

  • the number of \alpha-particles emitted from radioactive source in a given amount of time,
  • the number of vehicles passing a particular location on a busy highway,
  • the number of traffic accidents in a stretch of highway in a given period of time,
  • the number of phone calls arriving at a particular point in a telephone network in a fixed time period,
  • the number of insurance losses/claims in a given period of time,
  • the number of customers arriving at a ticket window,
  • the number of earthquakes occurring in a fixed period of time,
  • the number of mutations on a strand of DNA,
  • the number of hurricanes in a year that originate in the Atlantic Ocean.

Why is the Poisson distribution so widely applicable in these and many other seemingly different and diverse phenomena? What is the commonality that ties all these different and diverse phenomena together? The commonality is that all these phenomena are basically a series of independent Bernoulli trials. If a phenomenon is a binomial model where n is large and p is small, then it has a strong connection to the Poisson model mathematically through Theorem 1 above (i.e. it has a Poisson approximation). On the other hand, if the random phenomenon follows the criteria of a Poisson process, then the phenomenon is also approximately a binomial model, which means that in the limiting case it is Poisson.

In both viewpoints discussed in this post, the Poisson distribution can be regarded as a binomial distribution taken at a very granular level. This connection with the binomial distribution points to a vast array of problems that can be solved using the Poisson distribution.

________________________________________________________________________

Exercises

Practice problems for the Poisson concepts discussed above can be found in the companion blog (go there via the following link). Working on these exercises is strongly encouraged (you don’t know it until you can do it).

________________________________________________________________________
\copyright \ \text{2015 by Dan Ma}

Poisson as a Limiting Case of Binomial Distribution

In many binomial problems, the number of Bernoulli trials n is large, relatively speaking, and the probability of success p is small such that n p is of moderate magnitude. For example, consider problems that deal with rare events where the probability of occurrence is small (as a concrete example, counting the number of people with July 1 as their birthday out of a random sample of 1000 people). It is often convenient to approximate such binomial problems using the Poisson distribution. The justification for using the Poisson approximation is that the Poisson distribution is a limiting case of the binomial distribution. Now that cheap computing power is widely available, it is quite easy to use a computer or other computing device to obtain exact binomial probabilities for experiments of up to 1000 trials or more. Though the Poisson approximation may no longer be necessary for such problems, knowing how to get from binomial to Poisson is important for understanding the Poisson distribution itself.

Consider a counting process that describes the occurrences of a certain type of events of interest in a unit time interval subject to three simplifying assumptions (discussed below). We are interested in counting the number of occurrences of the event of interest in a unit time interval. As a concrete example, consider the number of cars arriving at an observation point on a certain highway in a period of time, say one hour. We wish to model the probability distribution of how many cars will arrive at the observation point on this particular highway in one hour. Let X be the random variable described by this probability distribution. We wish to know the probability that there are k cars arriving in one hour. We start by using a binomial distribution as an approximation to the probability P(X=k). We will see that upon letting n \rightarrow \infty, the probability P(X=k) is a Poisson probability.

Suppose that we know E(X)=\alpha, perhaps an average obtained after observing cars at the observation point for many hours. The simplifying assumptions alluded to earlier are the following:

  1. The numbers of cars arriving in nonoverlapping time intervals are independent.
  2. The probability of one car arriving in a very short time interval of length h is \alpha h.
  3. The probability of having more than one car arriving in a very short time interval is essentially zero.

Assumption 1 means that a large number of cars arriving in one period does not imply that fewer cars will arrive in the next period and vice versa. In other words, the number of cars that arrive in any one given moment does not affect the number of cars that will arrive subsequently. Knowing how many cars arrive in one minute will not help predict the number of cars arriving in the next minute.

Assumption 2 means that the rate of cars arriving is dependent only on the length of the time interval and not on when the time interval occurs (e.g. not on whether it is at the beginning of the hour or toward the end of the hour).

Assumption 3 allows us to think of a very short period of time as a Bernoulli trial. Thinking of the arrival of a car as a success, each short time interval will result in either one success or one failure.

Assumption 2 tells us that non-overlapping short time intervals of the same length have identical probability of success. Overall, all three assumptions imply that non-overlapping short intervals of the same length are independent Bernoulli trials with identical probability of success, giving us the basis for applying the binomial distribution.

To start, we can break up the hour into 60 minutes (into 60 Bernoulli trials). We then consider the binomial distribution with n=60 and p=\frac{\alpha}{60}. So the following is an approximation to our desired probability distribution.

\displaystyle (1) \ \ \ \ \ P(X=k) \approx \binom{60}{k} \biggl(\frac{\alpha}{60}\biggr)^k \biggr(1-\frac{\alpha}{60}\biggr)^{60-k} \ \ \ \ \ k=0,1,2,\cdots, 60

Conceivably, there can be more than 1 car arriving in a minute and observing cars in a one-minute interval may not be a Bernoulli trial. For a one-minute interval to qualify as a Bernoulli trial, there is either no car arriving or 1 car arriving in that one minute. So we can break up an hour into 3,600 seconds (into 3,600 Bernoulli trials). We now consider the binomial distribution with n=3600 and p=\frac{\alpha}{3600}.

\displaystyle (2) \ \ \ \ \ P(X=k) \approx \binom{3600}{k} \biggl(\frac{\alpha}{3600}\biggr)^k \biggr(1-\frac{\alpha}{3600}\biggr)^{3600-k} \ \ \ \ \ k=0,1,2,\cdots, 3600

It is also conceivable that more than 1 car can arrive in one second and observing cars in a one-second interval may still not qualify as a Bernoulli trial. So we need to get more granular. We can divide up the hour into n equal subintervals, each of length \frac{1}{n}. Assumption 3 ensures that each subinterval is a Bernoulli trial (either it is a success or a failure; one car arriving or no car arriving). Assumption 2 ensures that the non-overlapping subintervals all have the same probability of success. Assumption 1 tells us that all the n subintervals are independent. So breaking up the hour into n moments and counting the number of moments that are successes will result in a binomial distribution with parameters n and p=\frac{\alpha}{n}. So we are ready to proceed with the following approximation to our probability distribution P(X=k).

\displaystyle (3) \ \ \ \ \ P(X=k) \approx \binom{n}{k} \biggl(\frac{\alpha}{n}\biggr)^k \biggr(1-\frac{\alpha}{n}\biggr)^{n-k} \ \ \ \ \ k=0,1,2,\cdots, n

As we get more granular, n \rightarrow \infty. We show that the limit of the binomial probability in (3) is the Poisson distribution with parameter \alpha. We show the following.

\displaystyle (4) \ \ \ \ \ P(X=k) = \lim \limits_{n \rightarrow \infty} \binom{n}{k} \biggl(\frac{\alpha}{n}\biggr)^k \biggr(1-\frac{\alpha}{n}\biggr)^{n-k}=\frac{e^{-\alpha} \alpha^k}{k!} \ \ \ \ \ \ k=0,1,2,\cdots

In the derivation of (4), we need the following two mathematical tools. The statement (5) is one of the definitions of the mathematical constant e. In the statement (6), the integer n in the numerator is greater than the integer k in the denominator. It says that such a ratio of two factorials is the product of the k consecutive integers from n down to n-(k-1). There are exactly k terms in the product.

\displaystyle (5) \ \ \ \ \ \lim \limits_{n \rightarrow \infty} \biggl(1+\frac{x}{n}\biggr)^n=e^x

\displaystyle (6) \ \ \ \ \ \frac{n!}{(n-k)!}=n(n-1)(n-2) \cdots (n-k+1) \ \ \ \ \ \ \ \  k<n

The following is the derivation of (4).

\displaystyle \begin{aligned}(7) \ \ \ \ \  P(X=k)&=\lim \limits_{n \rightarrow \infty} \binom{n}{k} \biggl(\frac{\alpha}{n}\biggr)^k \biggr(1-\frac{\alpha}{n}\biggr)^{n-k} \\&=\lim \limits_{n \rightarrow \infty} \ \frac{n!}{k! (n-k)!} \biggl(\frac{\alpha}{n}\biggr)^k \biggr(1-\frac{\alpha}{n}\biggr)^{n-k} \\&=\lim \limits_{n \rightarrow \infty} \ \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k} \biggl(\frac{\alpha^k}{k!}\biggr) \biggr(1-\frac{\alpha}{n}\biggr)^{n} \biggr(1-\frac{\alpha}{n}\biggr)^{-k} \\&=\biggl(\frac{\alpha^k}{k!}\biggr) \lim \limits_{n \rightarrow \infty} \ \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k} \\&\times \ \ \ \lim \limits_{n \rightarrow \infty} \biggr(1-\frac{\alpha}{n}\biggr)^{n} \ \lim \limits_{n \rightarrow \infty} \biggr(1-\frac{\alpha}{n}\biggr)^{-k} \\&=\frac{e^{-\alpha} \alpha^k}{k!} \end{aligned}

In (7), we have \displaystyle \lim \limits_{n \rightarrow \infty} \ \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k}=1. The reason being that the numerator is a polynomial where the leading term is n^k. Upon dividing by n^k and taking the limit, we get 1. Based on (5), we have \displaystyle \lim \limits_{n \rightarrow \infty} \biggr(1-\frac{\alpha}{n}\biggr)^{n}=e^{-\alpha}. For the last limit in the derivation we have \displaystyle \lim \limits_{n \rightarrow \infty} \biggr(1-\frac{\alpha}{n}\biggr)^{-k}=1.

We conclude with some comments. As the above derivation shows, the Poisson distribution is at heart a binomial distribution. When we divide the unit time interval into more and more subintervals (as the subintervals get more and more granular), the resulting binomial distribution behaves more and more like the Poisson distribution.
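
The following simulation sketch (numpy and scipy assumed) illustrates this convergence; the arrival rate of 4 cars per hour is a hypothetical value chosen only for illustration.

    # Divide the hour into n = 3600 "seconds", flip an independent coin with
    # success probability alpha/n in each one, and compare the distribution of
    # the total count of successes with the Poisson probabilities.
    import numpy as np
    from scipy.stats import poisson

    rng = np.random.default_rng(0)
    alpha, n, reps = 4.0, 3600, 50000          # hypothetical rate of 4 per hour
    counts = rng.binomial(n, alpha / n, reps)  # counts of successes per hour
    for k in range(8):
        print(k, (counts == k).mean(), poisson.pmf(k, alpha))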

The three assumptions used in the derivation are called the Poisson postulates, which are the underlying assumptions that govern a Poisson process. Such a random process describes the occurrences of some type of events that are of interest (e.g. the arrivals of cars in our example) in a fixed period of time. The positive constant \alpha indicated in Assumption 2 is the parameter of the Poisson process, which can be interpreted as the rate of occurrences of the event of interest (or rate of changes, or rate of arrivals) in a unit time interval, meaning that the positive constant \alpha is the mean number of occurrences in the unit time interval. The derivation in (7) shows that whenever a certain type of events occurs according to a Poisson process with parameter \alpha, the counting variable of the number of occurrences in the unit time interval is distributed according to the Poisson distribution as indicated in (4).

If we observe the occurrences of events over intervals of length other than unit length, say, in an interval of length t, the counting process is governed by the same three postulates, with the modification to Assumption 2 that the rate of changes of the process is now \alpha t. The mean number of occurrences in the time interval of length t is now \alpha t. Assumption 2 now states that for any very short subinterval that makes up a fraction h of the interval of length t, the probability of having one occurrence of the event in this short subinterval is (\alpha t) h. Applying the same derivation, it can be shown that the number of occurrences X_t in a time interval of length t has the Poisson distribution with the following probability mass function.

\displaystyle (8) \ \ \ \ \ P(X_t=k)=\frac{e^{-\alpha t} \ (\alpha t)^k}{k!} \ \ \ \ \ \ \ \  k=0,1,2,\cdots

Relating Binomial and Negative Binomial

The negative binomial distribution has a natural interpretation as a waiting time until the arrival of the rth success (when the parameter r is a positive integer). The waiting time refers to the number of independent Bernoulli trials needed to reach the rth success. This interpretation of the negative binomial distribution gives us a good way of relating it to the binomial distribution. For example, if more than k failures are needed before the rth success arrives (i.e. the rth success occurs after the (k+r)th trial), then there can be at most r-1 successes in the first k+r trials. This tells us that the survival function of the negative binomial distribution is the cumulative distribution function (cdf) of a binomial distribution. In this post, we give the details behind this observation. A previous post on the negative binomial distribution is found here.

A random experiment resulting in two distinct outcomes (success or failure) is called a Bernoulli trial (e.g. head or tail in a coin toss, whether or not the birthday of a customer is the first of January, whether an insurance claim is above or below a given threshold, etc.). Suppose a series of independent Bernoulli trials is performed until reaching the rth success, where the probability of success in each trial is p. Let X_r be the number of failures before the occurrence of the rth success. The following is the probability mass function of X_r.

\displaystyle (1) \ \ \ \ P(X_r=k)=\binom{k+r-1}{k} p^r (1-p)^k \ \ \ \ \ \ k=0,1,2,3,\cdots

By definition, the survival function and cdf of X_r are:

\displaystyle (2) \ \ \ \ P(X_r > k)=\sum \limits_{j=k+1}^\infty \binom{j+r-1}{j} p^r (1-p)^j \ \ \ \ \ \ k=0,1,2,3,\cdots

\displaystyle (3) \ \ \ \ P(X_r \le k)=\sum \limits_{j=0}^k \binom{j+r-1}{j} p^r (1-p)^j \ \ \ \ \ \ k=0,1,2,3,\cdots

For each positive integer k, let Y_{r+k} be the number of successes in performing a sequence of r+k independent Bernoulli trials where p is the probability of success. In other words, Y_{r+k} has a binomial distribution with parameters r+k and p.

If the random experiment requires more than k failures to reach the rth success, there are at most r-1 successes in the first k+r trials. Thus the survival function of X_r is the same as the cdf of a binomial distribution. Equivalently, the cdf of X_r is the same as the survival function of a binomial distribution. We have the following:

\displaystyle \begin{aligned}(4) \ \ \ \ P(X_r > k)&=P(Y_{k+r} \le r-1) \\&=\sum \limits_{j=0}^{r-1} \binom{k+r}{j} p^j (1-p)^{k+r-j} \ \ \ \ \ \ k=0,1,2,3,\cdots \end{aligned}

\displaystyle \begin{aligned}(5) \ \ \ \ P(X_r \le k)&=P(Y_{k+r} > r-1) \ \ \ \ \ \ k=0,1,2,3,\cdots \end{aligned}
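
Relation (4) can be verified numerically with scipy.stats (a sketch under the assumption that scipy is available; the parameter values are hypothetical). Note that scipy's nbinom counts the number of failures before the rth success, matching (1).

    from scipy.stats import nbinom, binom

    r, p, k = 5, 0.3, 7
    lhs = nbinom.sf(k, r, p)           # P(X_r > k)
    rhs = binom.cdf(r - 1, k + r, p)   # P(Y_{k+r} <= r-1)
    print(lhs, rhs)                    # the two values agree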

Remark
The relation (4) is analogous to the relationship between the Gamma distribution and the Poisson distribution. Recall that a Gamma distribution with shape parameter n and rate parameter \alpha, where n is a positive integer, can be interpreted as the waiting time until the nth change in a Poisson process with rate \alpha. Thus, if the nth change takes place after time t, there can be at most n-1 arrivals in the time interval [0,t]. Thus the survival function of this Gamma distribution is the same as the cdf of a Poisson distribution. The relation (4) is analogous to the following relation.

\displaystyle (6) \ \ \ \ \int_t^\infty \frac{\alpha^n}{(n-1)!} \ x^{n-1} \ e^{-\alpha x} \ dx=\sum \limits_{j=0}^{n-1} \frac{e^{-\alpha t} \ (\alpha t)^j}{j!}
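
This Gamma-Poisson relation can be checked the same way (scipy assumed, hypothetical parameter values):

    from scipy.stats import gamma, poisson

    alpha, n, t = 2.0, 4, 1.5
    lhs = gamma.sf(t, n, scale=1 / alpha)    # Gamma(shape n, rate alpha) survival at t
    rhs = poisson.cdf(n - 1, alpha * t)      # P(at most n-1 arrivals by time t)
    print(lhs, rhs)                          # the two values agree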


Splitting a Poisson Distribution

We consider a remarkable property of the Poisson distribution that has a connection to the multinomial distribution. We start with the following examples.

Example 1
Suppose that the arrivals of customers in a gift shop at an airport follow a Poisson distribution with a mean of \alpha=5 per 10 minutes. Furthermore, suppose that each arrival can be classified into one of three distinct types – type 1 (no purchase), type 2 (purchase under $20), and type 3 (purchase over $20). Records show that about 25% of the customers are of type 1. The percentages of type 2 and type 3 are 60% and 15%, respectively. What is the probability distribution of the number of customers per hour of each type?

Example 2
Roll a fair die N times where N is random and follows a Poisson distribution with parameter \alpha. For each i=1,2,3,4,5,6, let N_i be the number of times the upside of the die is i. What is the probability distribution of each N_i? What is the joint distribution of N_1,N_2,N_3,N_4,N_5,N_6?

In Example 1, the stream of customers arrives according to a Poisson distribution. It can be shown that the stream of each type of customers also has a Poisson distribution. One way to view this example is that we can split the Poisson distribution into three Poisson distributions.

Example 2 also describes a splitting process, i.e. splitting a Poisson variable into 6 different Poisson variables. We can also view Example 2 as a multinomial distribution where the number of trials is not fixed but is random and follows a Poisson distribution. If the number of rolls of the die were fixed in Example 2 (say 10), then each N_i would have a binomial distribution. Yet, with the number of trials being Poisson, each N_i has a Poisson distribution with mean \displaystyle \frac{\alpha}{6}. In this post, we describe this Poisson splitting process in terms of a “random” multinomial distribution (the viewpoint of Example 2).

________________________________________________________________________

Suppose we have a multinomial experiment with parameters N, r, p_1, \cdots, p_r, where

  • N is the number of multinomial trials,
  • r is the number of distinct possible outcomes in each trial (type 1 through type r),
  • the p_i are the probabilities of the r possible outcomes in each trial.

Suppose that N follows a Poisson distribution with parameter \alpha. For each i=1, \cdots, r, let N_i be the number of occurrences of the i^{th} type of outcomes in the N trials. Then N_1,N_2,\cdots,N_r are mutually independent Poisson random variables with parameters \alpha p_1,\alpha p_2,\cdots,\alpha p_r, respectively.

Conditional on N, the variables N_1,N_2,\cdots,N_r have a multinomial distribution and their joint probability function is:

\displaystyle (1) \ \ \ \ P(N_1=n_1,N_2=n_2,\cdots,N_r=n_r)=\frac{N!}{n_1! n_2! \cdots n_r!} \ p_1^{n_1} p_2^{n_2} \cdots p_r^{n_r}

where n_i are nonnegative integers such that N=n_1+n_2+\cdots+n_r.

Since the total number of multinomial trials N is not fixed and is random, (1) is not the end of the story. The probability in (1) is only a conditional probability. The following is the joint probability function of N_1,N_2,\cdots,N_r:

\displaystyle (2) \ \ \ \ P(N_1=n_1,N_2=n_2,\cdots,N_r=n_r)

          \displaystyle \begin{aligned}&=P(N_1=n_1,N_2=n_2,\cdots,N_r=n_r \lvert N=\sum \limits_{k=1}^r n_k) \\&\ \ \ \ \ \times P(N=\sum \limits_{k=1}^r n_k) \\&\text{ } \\&=\frac{(\sum \limits_{k=1}^r n_k)!}{n_1! \ n_2! \ \cdots \ n_r!} \ p_1^{n_1} \ p_2^{n_2} \ \cdots \ p_r^{n_r} \ \times \frac{e^{-\alpha} \alpha^{\sum \limits_{k=1}^r n_k}}{\biggl(\sum \limits_{k=1}^r n_k\biggr)!} \\&\text{ } \\&=\frac{e^{-\alpha p_1} \ (\alpha p_1)^{n_1}}{n_1!} \ \frac{e^{-\alpha p_2} \ (\alpha p_2)^{n_2}}{n_2!} \ \cdots \ \frac{e^{-\alpha p_r} \ (\alpha p_r)^{n_r}}{n_r!} \end{aligned}

To obtain the marginal probability function of N_j, j=1,2,\cdots,r, we sum out the other variables N_k=n_k (k \ne j) in (2) and obtain the following:

\displaystyle (3) \ \ \ \ P(N_j=n_j)=\frac{e^{-\alpha p_j} \ (\alpha p_j)^{n_j}}{n_j!}

Thus we can conclude that N_j, j=1,2,\cdots,r, has a Poisson distribution with parameter \alpha p_j. Furthermore, the joint probability function of N_1,N_2,\cdots,N_r is the product of the marginal probability functions. Thus we can conclude that N_1,N_2,\cdots,N_r are mutually independent.

________________________________________________________________________
Example 1
Let N_1,N_2,N_3 be the number of customers per hour of type 1, type 2, and type 3, respectively. Here, we split a Poisson distribution with mean 30 per hour (based on 5 per 10 minutes). Thus N_1,N_2,N_3 are mutually independent Poisson variables with means 30 \times 0.25=7.5, 30 \times 0.60=18, and 30 \times 0.15=4.5, respectively.
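
A quick simulation sketch (numpy assumed) of Example 1 supports this conclusion; the sample size is an arbitrary choice.

    import numpy as np

    rng = np.random.default_rng(0)
    hours = 50000
    totals = rng.poisson(30, hours)   # total customers in each simulated hour
    types = np.array([rng.multinomial(n, [0.25, 0.60, 0.15]) for n in totals])

    print(types.mean(axis=0))   # approximately [7.5, 18, 4.5]
    print(types.var(axis=0))    # approximately equal to the means, as expected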

Example 2
As indicated earlier, each N_i, i=1,2,3,4,5,6, has a Poisson distribution with mean \frac{\alpha}{6}. According to (2), the joint probability function of N_1,N_2,N_3,N_4,N_5,N_6 is simply the product of the six marginal Poisson probability functions.

The Poisson Distribution

Let \alpha be a positive constant. Consider the following probability distribution:

\displaystyle (1) \ \ \ \ \ P(X=j)=\frac{e^{-\alpha} \alpha^j}{j!} \ \ \ \ \ j=0,1,2,\cdots

The above distribution is said to be a Poisson distribution with parameter \alpha. The Poisson distribution is usually used to model the random number of events occurring in a fixed time interval. As will be shown below, E(X)=\alpha. Thus the parameter \alpha is the rate of occurrence of the random events; it indicates on average how many events occur per unit of time. Examples of random events that may be modeled by the Poisson distribution include the number of alpha particles emitted by a radioactive substance counted in a prescribed area during a fixed period of time, the number of auto accidents in a fixed period of time or the number of losses arising from a group of insureds during a policy period.

Each of the above examples can be thought of as a process that generates a number of arrivals or changes in a fixed period of time. If such a counting process leads to a Poisson distribution, then the process is said to be a Poisson process.

We now discuss some basic properties of the Poisson distribution. Using the Taylor series expansion of e^{\alpha}, the following shows that (1) is indeed a probability distribution.

\displaystyle . \ \ \ \ \ \ \ \sum \limits_{j=0}^\infty \frac{e^{-\alpha} \alpha^j}{j!}=e^{-\alpha} \sum \limits_{j=0}^\infty \frac{\alpha^j}{j!}=e^{-\alpha} e^{\alpha}=1

The generating function of the Poisson distribution is g(z)=e^{\alpha (z-1)} (see The generating function). The mean and variance can be calculated using the generating function.

\displaystyle \begin{aligned}(2) \ \ \ \ \ &E(X)=g'(1)=\alpha \\&\text{ } \\&E[X(X-1)]=g^{(2)}(1)=\alpha^2 \\&\text{ } \\&Var(X)=E[X(X-1)]+E(X)-E(X)^2=\alpha^2+\alpha-\alpha^2=\alpha \end{aligned}

The Poisson distribution can also be interpreted as an approximation to the binomial distribution. It is well known that the Poisson distribution is the limiting case of binomial distributions (see [1] or this post).

\displaystyle (3) \ \ \ \ \ \lim \limits_{n \rightarrow \infty} \binom{n}{j} \biggl(\frac{\alpha}{n}\biggr)^j \biggl(1-\frac{\alpha}{n}\biggr)^{n-j}=\frac{e^{-\alpha} \alpha^j}{j!}

One application of (3) is that we can use Poisson probabilities to approximate binomial probabilities. The approximation is reasonably good when the number of trials n in a binomial distribution is large and the probability of success p is small. The binomial mean is n p and the variance is n p (1-p). When p is small, 1-p is close to 1 and the binomial variance is approximately n p \approx n p (1-p). Whenever the mean of a discrete distribution is approximately equal to its variance, the Poisson approximation is quite good. As a rule of thumb, we can use the Poisson distribution to approximate the binomial distribution if n \ge 100 and p \le 0.01.

As an example, we use the Poisson distribution to estimate the probability that at most 1 person out of 1000 will have a birthday on New Year's Day. Let n=1000 and p=1/365. So we use the Poisson distribution with \alpha=1000/365. The following is an estimate using the Poisson distribution.

\displaystyle . \ \ \ \ \ \ \ P(X \le 1)=e^{-\alpha}+\alpha e^{-\alpha}=(1+\alpha) e^{-\alpha}=0.2415
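
The following Python sketch (scipy assumed) computes the Poisson estimate above along with the exact binomial value for comparison.

    from scipy.stats import binom, poisson

    n, p = 1000, 1 / 365
    alpha = n * p
    print(poisson.cdf(1, alpha))   # Poisson estimate, about 0.2415
    print(binom.cdf(1, n, p))      # exact binomial value, about 0.241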

Another useful property is that the independent sum of Poisson distributions also has a Poisson distribution. Specifically, if each X_i has a Poisson distribution with parameter \alpha_i, then the independent sum X=X_1+\cdots+X_n has a Poisson distribution with parameter \alpha=\alpha_1+\cdots+\alpha_n. One way to see this is that the product of Poisson generating functions has the same general form as g(z)=e^{\alpha (z-1)} (see The generating function). One interpretation of this property is that when merging several arrival processes, each of which follow a Poisson distribution, the result is still a Poisson distribution.

For example, suppose that at an airline ticket counter, the arrival of first class customers follows a Poisson process with a mean arrival rate of 8 per 15 minutes and the arrival of customers flying coach follows a Poisson distribution with a mean rate of 12 per 15 minutes. Then the arrival of customers of either type has a Poisson distribution with a mean rate of 20 per 15 minutes or 80 per hour.

A Poisson distribution with a large mean can be thought of as an independent sum of Poisson distributions. For example, a Poisson distribution with a mean of 50 is the independent sum of 50 Poisson distributions each with mean 1. Because of the central limit theorem, when the mean is large, we can approximate the Poisson using the normal distribution.

In addition to merging several Poisson distributions into one combined Poisson distribution, we can also split a Poisson into several Poisson distributions. For example, suppose that a stream of customers arrives according to a Poisson distribution with parameter \alpha and each customer can be classified into one of two types (e.g. no purchase vs. purchase) with probabilities p_1 and p_2, respectively. Then the number of “no purchase” customers and the number of “purchase” customers are independent Poisson random variables with parameters \alpha p_1 and \alpha p_2, respectively. For more details on the splitting of Poisson, see Splitting a Poisson Distribution.

Reference

  1. Feller W. An Introduction to Probability Theory and Its Applications, Third Edition, John Wiley & Sons, New York, 1968

The hazard rate function

In this post, we introduce the hazard rate function using the notion of a non-homogeneous Poisson process.

In a Poisson process, changes occur at a constant rate \lambda per unit time. Suppose that we interpret the changes in a Poisson process from a mortality point of view, i.e. a change in the Poisson process means a termination of a system, be it biological or manufactured, and this Poisson process counts the number of terminations as they occur. Then the rate of change \lambda is interpreted as a hazard rate (or failure rate or force of mortality). With a constant force of mortality, the time until the next change is exponentially distributed. In this post, we discuss the hazard rate function in a more general setting. The process that counts the number of terminations will no longer have a constant hazard rate, and instead will have a hazard rate function \lambda(t), a function of time t. Such a counting process is called a non-homogeneous Poisson process. We discuss the survival probability models (the time to the first termination) associated with a non-homogeneous Poisson process. We then discuss several important examples of survival probability models, including the Weibull distribution, the Gompertz distribution and the model based on Makeham’s law. See [1] for more information about the hazard rate function.

\text{ }

The Poisson Process
We start with the three postulates of a Poisson process. Consider an experiment in which the occurrences of a certain type of events are counted during a given time interval. We call the occurrence of the type of events in question a change. We assume the following three conditions:

  1. The numbers of changes occurring in nonoverlapping intervals are independent.
  2. The probability of two or more changes taking place in a sufficiently small interval is essentially zero.
  3. The probability of exactly one change in the short interval (t,t+\delta) is approximately \lambda \delta where \delta is sufficiently small and \lambda is a positive constant.

\text{ }

When we interpret the Poisson process from a mortality point of view, the constant \lambda is a hazard rate (or force of mortality), which can be interpreted as the rate of failure at the next instant given that the life has survived to time t. With a constant force of mortality, the survival model (the time until the next termination) has an exponential distribution with mean \frac{1}{\lambda}. We wish to relax the constant force of mortality assumption by making \lambda a function of t instead. The remainder of this post is based on the non-homogeneous Poisson process defined below.

\text{ }

The Non-Homogeneous Poisson Process
We modify condition 3 above by making \lambda(t) a function of t. We have the following modified counting process.

  1. The numbers of changes occurring in nonoverlapping intervals are independent.
  2. The probability of two or more changes taking place in a sufficiently small interval is essentially zero.
  3. The probability of exactly one change in the short interval (t,t+\delta) is approximately \lambda(t) \delta where \delta is sufficiently small and \lambda(t) is a nonnegative function of t.

\text{ }

We focus on the survival model aspect of such counting processes. Such a process can be interpreted as a model for the number of changes occurring in a time interval where a change means “termination” or “failure” of a system under consideration. The rate of change function \lambda(t) indicated in condition 3 is called the hazard rate function. It is also called the failure rate function in reliability engineering and the force of mortality in life contingency theory.

Based on condition 3 in the non-homogeneous Poisson process, the hazard rate function \lambda(t) can be interpreted as the rate of failure at the next instant given that the life has survived to time t.

Two random variables that naturally arise from a non-homogeneous Poisson process are described here. One is the discrete variable N_t, defined as the number of changes in the time interval (0,t). The other is the continuous random variable T, defined as the time until the occurrence of the first change. The probability distribution of T is called a survival model. The following is the link between N_t and T.

\text{ }

\displaystyle \begin{aligned}(1) \ \ \ \ \ \ \ \ \ &P[T > t]=P[N_t=0] \end{aligned}

\text{ }

Note that P[T > t] is the probability that the next change occurs after time t. This means that there is no change within the interval (0,t). We have the following theorems.

\text{ }

Theorem 1.
Let \displaystyle \Lambda(t)=\int_{0}^{t} \lambda(y) dy. Then e^{-\Lambda(t)} is the probability that there is no change in the interval (0,t). That is, \displaystyle P[N_t=0]=e^{-\Lambda(t)}.

Proof. We are interested in finding the probability of zero changes in the interval (0,y+\delta). By condition 1, the numbers of changes in the nonoverlapping intervals (0,y) and (y,y+\delta) are independent. Thus we have:

\text{ }

\displaystyle (2) \ \ \ \ \ \ \ \ P[N_{y+\delta}=0] \approx P[N_y=0] \times [1-\lambda(y) \delta]

\text{ }

Note that by condition 3, the probability of exactly one change in the small interval (y,y+\delta) is \lambda(y) \delta. Thus [1-\lambda(y) \delta] is the probability of no change in the interval (y,y+\delta). Continuing with equation (2), we have the following derivation:

\text{ }

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\frac{P[N_{y+\delta}=0] - P[N_y=0]}{\delta} \approx -\lambda(y) P[N_y=0] \\&\text{ } \\&\frac{d}{dy} P[N_y=0]=-\lambda(y) P[N_y=0] \\&\text{ } \\&\frac{\frac{d}{dy} P[N_y=0]}{P[N_y=0]}=-\lambda(y) \\&\text{ } \\&\int_0^{t} \frac{\frac{d}{dy} P[N_y=0]}{P[N_y=0]} dy=-\int_0^{t} \lambda(y)dy \end{aligned}

\text{ }

Evaluating the integral on the left hand side with the boundary condition of P[N_0=0]=1 produces the following results:

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &ln P[N_t=0]=-\int_0^{t} \lambda(y)dy \\&\text{ } \\&P[N_t=0]=e^{\displaystyle -\int_0^{t} \lambda(y)dy} \end{aligned}

\text{ }

Theorem 2
As discussed above, let T be the length of the interval that is required to observe the first change. Then the following are the distribution function, survival function and pdf of T:

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &F_T(t)=\displaystyle 1-e^{\displaystyle -\int_0^t \lambda(y) dy} \\&\text{ } \\&S_T(t)=\displaystyle e^{\displaystyle -\int_0^t \lambda(y) dy} \\&\text{ } \\&f_T(t)=\displaystyle \lambda(t) \ e^{\displaystyle -\int_0^t \lambda(y) dy} \end{aligned}

Proof. In Theorem 1, we derive the probability P[N_y=0] for the discrete variable N_y derived from the non-homogeneous Poisson process. We now consider the continuous random variable T, the time until the first change, which is related to N_t by (1). Thus S_T(t)=P[T > t]=P[N_t=0]=e^{-\int_0^t \lambda(y) dy}. The distribution function and density function can be derived accordingly.

\text{ }

Theorem 3
The hazard rate function \lambda(t) is equivalent to each of the following:

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\lambda(t)=\frac{f_T(t)}{1-F_T(t)} \\&\text{ } \\&\lambda(t)=\frac{-S_T^{'}(t)}{S_T(t)} \end{aligned}

\text{ }

Remark
Theorem 1 and Theorem 2 show that in a non-homogeneous Poisson process as described above, the hazard rate function \lambda(t) completely specifies the probability distribution of the survival model T (the time until the first change). Once the rate of change function \lambda(t) is known in the non-homogeneous Poisson process, we can use it to generate the survival function S_T(t). All of the examples of survival models given below are derived by assuming the functional form of the hazard rate function. The result in Theorem 2 holds even outside the context of a non-homogeneous Poisson process; that is, given the hazard rate function \lambda(t), we can derive the three distributional items S_T(t), F_T(t), f_T(t).

The ratio in Theorem 3 indicates that the probability distribution determines the hazard rate function. In fact, the ratio in Theorem 3 is the usual definition of the hazard rate function. That is, the hazard rate function can be defined as the ratio of the density and the survival function (one minus the cdf). With this definition, we can also recover the survival function. Whenever \displaystyle \lambda(x)=\frac{f_X(x)}{1-F_X(x)}, we can derive:

\text{ }

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &S_X(x)=\displaystyle e^{-\int_0^x \lambda(y) dy} \end{aligned}

\text{ }

As indicated above, the hazard rate function can be interpreted as the failure rate at time t given that the life in question has survived to time t. It is the rate of failure at the next instant given that the life or system being studied has survived up to time t.

It is interesting to note that the function \Lambda(t)=\int_0^t \lambda(y) dy defined in Theorem 1 is called the cumulative hazard rate function. Thus the cumulative hazard rate function is an alternative way of representing the hazard rate function (see the discussion on Weibull distribution below).
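
The relationship S_T(t)=e^{-\Lambda(t)} lends itself to a simple numeric sketch: integrate any given hazard rate function and exponentiate. The following assumes numpy and scipy are available; the helper function is ours, and the constant hazard used at the end is just a sanity check.

    import numpy as np
    from scipy.integrate import quad

    def survival(hazard, t):
        # S_T(t) = exp(-integral of the hazard rate from 0 to t)
        cumulative_hazard, _ = quad(hazard, 0, t)
        return np.exp(-cumulative_hazard)

    # constant hazard 0.5 gives the exponential survival function exp(-0.5 t)
    print(survival(lambda y: 0.5, 2.0))   # about exp(-1) = 0.3679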

——————————————————————————————————————
Examples of Survival Models

–Exponential Distribution–
In many applications, especially those for biological organisms and mechanical systems that wear out over time, the hazard rate \lambda(t) is an increasing function of t. In other words, the older the life in question (the larger the t), the higher the chance of failure at the next instant. For humans, the probability of an 85-year-old dying in the next year is clearly higher than for a 20-year-old. In a Poisson process, the rate of change \lambda(t)=\lambda indicated in condition 3 is a constant. As a result, the time T until the first change derived in Theorem 2 has an exponential distribution with parameter \lambda. In terms of mortality studies or reliability studies of machines that wear out over time, this is not a realistic model. However, if the mortality or failure is caused by random external events, this could be an appropriate model.

–Weibull Distribution–
This distribution is an excellent model choice for describing the life of manufactured objects. It is defined by the following cumulative hazard rate function:

\text{ }

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\Lambda(t)=\biggl(\frac{t}{\beta}\biggr)^{\alpha} \end{aligned} where \alpha > 0 and \beta>0

\text{ }

As a result, the hazard rate function, the density function and the survival function for the lifetime distribution are:

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\lambda(t)=\frac{\alpha}{\beta} \biggl(\frac{t}{\beta}\biggr)^{\alpha-1} \\&\text{ } \\&f_T(t)=\frac{\alpha}{\beta} \biggl(\frac{t}{\beta}\biggr)^{\alpha-1} \displaystyle e^{\displaystyle -\biggl[\frac{t}{\beta}\biggr]^{\alpha}} \\&\text{ } \\&S_T(t)=\displaystyle e^{\displaystyle -\biggl[\frac{t}{\beta}\biggr]^{\alpha}} \end{aligned}

\text{ }

The parameter \alpha is the shape parameter and \beta is the scale parameter. When \alpha=1, the hazard rate becomes a constant and the Weibull distribution becomes an exponential distribution.

When the parameter \alpha<1, the failure rate decreases over time. One interpretation is that most of the defective items fail early on in the life cycle. Once they are removed from the population, the failure rate decreases over time.

When the parameter \alpha>1, the failure rate increases with time. This is a good candidate for a model to describe the lifetime of machines or systems that wear out over time.
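
The Weibull formulas above can be cross-checked against scipy's weibull_min distribution, in which \alpha plays the role of the shape parameter and \beta the scale parameter (a sketch with hypothetical parameter values, scipy assumed).

    import numpy as np
    from scipy.stats import weibull_min

    alpha, beta, t = 2.0, 3.0, 1.5
    survival_formula = np.exp(-(t / beta) ** alpha)                  # S_T(t) from above
    print(survival_formula, weibull_min.sf(t, alpha, scale=beta))    # the two agree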

–The Gompertz Distribution–
The Gompertz law states that the force of mortality or failure rate increases exponentially over time. It describes human mortality quite accurately. The following is the hazard rate function:

\text{ }

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\lambda(t)=\alpha e^{\beta t} \end{aligned} where \alpha>0 and \beta>0.

\text{ }

The following are the cumulative hazard rate function as well as the survival function, distribution function and the pdf of the lifetime distribution T.

\text{ }

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\Lambda(t)=\int_0^t \alpha e^{\beta y} dy=\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta} \\&\text{ } \\&S_T(t)=\displaystyle e^{\displaystyle -\biggl(\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}\biggr)} \\&\text{ } \\&F_T(t)=\displaystyle 1-e^{\displaystyle -\biggl(\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}\biggr)} \\&\text{ } \\&f_T(t)=\displaystyle \alpha \ e^{\beta t} \ e^{\displaystyle -\biggl(\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}\biggr)} \end{aligned}

\text{ }

–Makeham’s Law–
Makeham’s law states that the force of mortality is the Gompertz failure rate plus an age-independent component that accounts for external causes of mortality. The following is the hazard rate function:

\text{ }

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\lambda(t)=\alpha e^{\beta t}+\mu \end{aligned} where \alpha>0, \beta>0 and \mu>0.

\text{ }

The following are the cumulative hazard rate function as well as the survival function, distribution function and the pdf of the lifetime distribution T.

\text{ }

\displaystyle \begin{aligned}. \ \ \ \ \ \ \ \ \ &\Lambda(t)=\int_0^t (\alpha e^{\beta y}+\mu) dy=\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}+\mu t \\&\text{ } \\&S_T(t)=\displaystyle e^{\displaystyle -\biggl(\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}+\mu t\biggr)} \\&\text{ } \\&F_T(t)=\displaystyle 1-e^{\displaystyle -\biggl(\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}+\mu t\biggr)} \\&\text{ } \\&f_T(t)=\biggl( \alpha e^{\beta t}+\mu \biggr) \ e^{\displaystyle -\biggl(\frac{\alpha}{\beta} e^{\beta t}-\frac{\alpha}{\beta}+\mu t\biggr)} \end{aligned}

\text{ }

Reference

  1. Klugman S. A., Panjer H. H., Willmot G. E., Loss Models: From Data to Decisions, Second Edition, Wiley-Interscience, a John Wiley & Sons, Inc., New York, 2004