Defining the Poisson distribution

The Poisson distribution is a family of discrete distributions with positive probabilities on the non-negative integers 0,1,2,\cdots. Each distribution in this family is indexed by a positive number \lambda>0. One way to define this distribution is to give its probability function for a given parameter \lambda and then derive various distributional quantities such as the mean and the variance. Along with other mathematical facts, it can be shown that both the mean and the variance are \lambda. In this post, we take a different tack. We look at two viewpoints that give rise to the Poisson distribution. Taking this approach makes it easier to appreciate some of the possible applications of the Poisson distribution. The first viewpoint is that the Poisson distribution is the limiting case of the binomial distribution. The second viewpoint is through the Poisson process, a stochastic process that, under some conditions, counts the number of events and records the time points at which these events occur in a given time (or physical) interval.

________________________________________________________________________

Poisson as a limiting case of binomial

A binomial distribution where the number of trials n is large and the probability of success p is small such that np is moderate in size can be approximated using the Poisson distribution with mean \lambda=np. This fact follows from Theorem 1, which indicates that the Poisson distribution is the limiting case of the binomial distribution.

Theorem 1
Let \lambda be a fixed positive constant. Then for each integer x=0,1,2,\cdots, the following is true:

    \displaystyle \lim_{n \rightarrow \infty} \binom{n}{x} \ p^x \  (1-p)^{n-x}=\lim_{n \rightarrow \infty} \frac{n!}{x! \ (n-x)!} \ p^x \  (1-p)^{n-x}=\frac{e^{-\lambda} \ \lambda^x}{x!}

where \displaystyle p=\frac{\lambda}{n}.

Proof of Theorem 1
We start with a binomial distribution with n trials and with \displaystyle p=\frac{\lambda}{n} being the probability of success, where n>\lambda. Let X_n be the count of the number of successes in these n Bernoulli trials. The following is the probability that X_n=k.

    \displaystyle \begin{aligned} P(X_n=k)&=\binom{n}{k} \biggl(\frac{\lambda}{n}\biggr)^k \biggl(1-\frac{\lambda}{n}\biggr)^{n-k} \\&=\frac{n!}{k! (n-k)!} \biggl(\frac{\lambda}{n}\biggr)^k \biggl(1-\frac{\lambda}{n}\biggr)^{n-k} \\&=\frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k} \biggl(\frac{\lambda^k}{k!}\biggr) \biggl(1-\frac{\lambda}{n}\biggr)^{n} \biggl(1-\frac{\lambda}{n}\biggr)^{-k} \\&=\biggl(\frac{\lambda^k}{k!}\biggr) \ \biggl[ \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k} \ \biggl(1-\frac{\lambda}{n}\biggr)^{n} \ \biggl(1-\frac{\lambda}{n}\biggr)^{-k} \biggr] \end{aligned}

In the last step, the terms that contain n are inside the square brackets. Let’s see what they are when n approaches infinity.

    \displaystyle \lim \limits_{n \rightarrow \infty} \ \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k}=1

    \displaystyle \lim \limits_{n \rightarrow \infty} \biggl(1-\frac{\lambda}{n}\biggr)^{n}=e^{-\lambda}

    \displaystyle \lim \limits_{n \rightarrow \infty} \biggl(1-\frac{\lambda}{n}\biggr)^{-k}=1

The first result is true because the numerator is a polynomial in n of degree k whose leading term is n^k. Upon dividing by n^k and taking the limit, we get 1. The second result is true since the following limit, applied with x=-\lambda, is one of the definitions of the exponential function e^x.

    \displaystyle \lim \limits_{n \rightarrow \infty} \biggl(1+\frac{x}{n}\biggr)^{n}=e^{x}

The third result is true since 1-\frac{\lambda}{n} approaches 1 and the exponent -k is a constant. Thus the following is the limit of the probability P(X_n=k) as n \rightarrow \infty.

    \displaystyle \begin{aligned} \lim \limits_{n \rightarrow \infty} P(X_n=k)&= \biggl(\frac{\lambda^k}{k!}\biggr) \ \lim \limits_{n \rightarrow \infty} \biggl[ \frac{n(n-1)(n-2) \cdots (n-k+1)}{n^k} \ \biggl(1-\frac{\lambda}{n}\biggr)^{n} \ \biggl(1-\frac{\lambda}{n}\biggr)^{-k} \biggr] \\&=\biggl(\frac{\lambda^k}{k!}\biggr) \cdot 1 \cdot e^{-\lambda} \cdot 1 \\&=\frac{e^{-\lambda} \lambda^k}{k!} \end{aligned}

The above derivation completes the proof. \blacksquare

In a given binomial distribution, whenever the number of trials n is large and the probability p of success in each trial is small (i.e. each Bernoulli trial rarely results in a success), Theorem 1 tells us that we can use the Poisson distribution with parameter \lambda=np to approximate the binomial distribution.
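The convergence in Theorem 1 can also be seen numerically. The following is a minimal Python sketch (standard library only) that compares the binomial probability with p=\lambda/n against the Poisson limit as n grows. The function names binomial_pmf and poisson_pmf and the values \lambda=2 and k=3 are chosen here purely for illustration and are not part of the discussion above.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson distribution with parameter lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam, k = 2.0, 3  # illustrative values (assumptions, not from the post)
limit = poisson_pmf(k, lam)

gaps = []  # absolute difference between binomial and Poisson for each n
for n in (10, 100, 1000, 10000):
    b = binomial_pmf(k, n, lam / n)
    gaps.append(abs(b - limit))
    print(f"n = {n:>5}: binomial = {b:.6f}, Poisson = {limit:.6f}, gap = {gaps[-1]:.2e}")
```

The gap shrinks as n increases, which is what the theorem predicts for a fixed \lambda and a fixed k.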

Example 1
The probability of being dealt a full house in a hand of poker is approximately 0.001441. Out of 5000 hands of poker that are dealt at a certain casino, what is the probability that there will be at most 4 full houses?

Let X be the number of full houses in these 5000 poker hands. The exact distribution for X is the binomial distribution with n= 5000 and p= 0.001441. This example deals with a large number of trials where a success in each trial is a rare event. So the Poisson approximation is applicable. Let \lambda= 5000(0.001441) = 7.205. Then P(X \le 4) can be approximated using the Poisson random variable Y with parameter \lambda. The following is the probability function of Y:

    \displaystyle P(Y=y)=e^{-7.205} \ \frac{7.205^y}{y!}

The following is the approximation of P(X \le 4):

    \displaystyle \begin{aligned} P(X \le 4)&\approx P(Y \le 4) \\&=P(Y=0)+P(Y=1)+P(Y=2)+P(Y=3)+P(Y=4) \\&= e^{-7.205} \biggl[ 1+7.205+\frac{7.205^2}{2!}+\frac{7.205^3}{3!}+\frac{7.205^4}{4!}\biggr] \\&=0.155098087  \end{aligned}

The following is a side by side comparison between the binomial distribution and its Poisson approximation. The two cumulative probabilities differ by less than 0.0002 at every count shown, so for all practical purposes they are indistinguishable from one another.

    \displaystyle \begin{bmatrix} \text{Count of}&\text{ }&\text{ }&\text{Binomial } &\text{ }&\text{ }&\text{Poisson } \\\text{Full Houses}&\text{ }&\text{ }&P(X \le x) &\text{ }&\text{ }&P(Y \le x) \\\text{ }&\text{ }&\text{ }&n=5000 &\text{ }&\text{ }&\lambda=7.205 \\\text{ }&\text{ }&\text{ }&p=0.001441 &\text{ }&\text{ }&\text{ } \\\text{ }&\text{ }&\text{ } &\text{ }&\text{ } \\ 0&\text{ }&\text{ }&0.000739012&\text{ }&\text{ }&0.000742862 \\ 1&\text{ }&\text{ }&0.006071278&\text{ }&\text{ }&0.006095184 \\ 2&\text{ }&\text{ }&0.025304641&\text{ }&\text{ }&0.025376925 \\ 3&\text{ }&\text{ }&0.071544923&\text{ }&\text{ }&0.071685238 \\ 4&\text{ }&\text{ }&0.154905379&\text{ }&\text{ }&0.155098087  \\ 5&\text{ }&\text{ }&0.275104906&\text{ }&\text{ }&0.275296003 \\ 6&\text{ }&\text{ }&0.419508250&\text{ }&\text{ }&0.419633667 \\ 7&\text{ }&\text{ }&0.568176421 &\text{ }&\text{ }&0.568198363   \\ 8&\text{ }&\text{ }&0.702076190 &\text{ }&\text{ }&0.701999442 \\ 9&\text{ }&\text{ }&0.809253326&\text{ }&\text{ }&0.809114639 \\ 10&\text{ }&\text{ }&0.886446690&\text{ }&\text{ }&0.886291139 \\ 11&\text{ }&\text{ }&0.936980038&\text{ }&\text{ }&0.936841746 \\ 12&\text{ }&\text{ }&0.967298041&\text{ }&\text{ }&0.967193173 \\ 13&\text{ }&\text{ }&0.984085073&\text{ }&\text{ }&0.984014868 \\ 14&\text{ }&\text{ }&0.992714372&\text{ }&\text{ }&0.992672033 \\ 15&\text{ }&\text{ }&0.996853671&\text{ }&\text{ }&0.996830358 \end{bmatrix}

The above table is calculated using the functions BINOM.DIST and POISSON.DIST in Excel. The following shows how it is done. The parameter TRUE indicates that the result is a cumulative distribution. When it is set to FALSE, the formula gives the probability function.

    P(X \le x)=\text{BINOM.DIST(x, 5000, 0.001441, TRUE)}

    P(Y \le x)=\text{POISSON.DIST(x, 7.205, TRUE)}
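For readers without Excel, the same table can be reproduced with a short Python sketch that uses only the standard library. The helper names binom_cdf and poisson_cdf are hypothetical names introduced here; they are meant to mirror BINOM.DIST and POISSON.DIST with the cumulative flag set to TRUE.

```python
import math

def binom_cdf(x, n, p):
    """P(X <= x) for a binomial(n, p) random variable."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

def poisson_cdf(x, lam):
    """P(Y <= x) for a Poisson(lam) random variable."""
    return sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(x + 1))

n, p = 5000, 0.001441
lam = n * p  # 7.205, the Poisson parameter used in Example 1

# One row per count of full houses: binomial CDF, then Poisson CDF.
for x in range(16):
    print(f"{x:>2}  {binom_cdf(x, n, p):.9f}  {poisson_cdf(x, lam):.9f}")
```

Summing the probability function term by term is adequate for small counts like these; for much larger x a library routine would be a more robust choice.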

________________________________________________________________________

The Poisson distribution

The limit in Theorem 1 is a probability function and the resulting distribution is called the Poisson distribution. We now give the formal definition. A random variable X that takes on one of the numbers 0,1,2,\cdots is said to be a Poisson random variable with parameter \lambda>0 if

    \displaystyle P(X=x)=\frac{e^{-\lambda} \ \lambda^x}{x!} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x=0,1,2,\cdots

It can be shown that the above function is indeed a probability function, i.e., the probabilities sum to 1. Any random variable that has a probability function of the above form is said to follow (or to have) a Poisson distribution. Furthermore, it can be shown that E(X)=var(X)=\lambda, i.e., the Poisson parameter is both the mean and variance. Thus the Poisson distribution may be a good fit if the observed data indicate that the sample mean and the sample variance are nearly identical.
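These facts are easy to check numerically. The following Python sketch is an illustration under stated assumptions, not a proof: it truncates the infinite sums at 100 terms (the tail beyond that is negligible for the value of \lambda used), and the function name poisson_pmf_table and the choice \lambda=3.87 are made here for illustration.

```python
import math

def poisson_pmf_table(lam, max_k):
    """Return [P(X=0), ..., P(X=max_k)] via the recurrence P(X=k+1) = P(X=k) * lam / (k+1)."""
    probs = [math.exp(-lam)]
    for k in range(max_k):
        probs.append(probs[-1] * lam / (k + 1))
    return probs

lam = 3.87                            # illustrative value
probs = poisson_pmf_table(lam, 100)   # truncated support 0..100

total = sum(probs)                                             # should be close to 1
mean = sum(k * pk for k, pk in enumerate(probs))               # should be close to lam
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))  # also close to lam
print(total, mean, variance)
```

The recurrence avoids computing large powers and factorials separately, which keeps the calculation stable for larger values of \lambda.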

The following is the moment generating function of the Poisson distribution with parameter \lambda.

    \displaystyle M(t)=E(e^{tX})=e^{\lambda \ (e^t-1)}

One consequence of the Poisson moment generating function is that the sum of independent Poisson random variables again has a Poisson distribution, with parameter equal to the sum of the individual parameters.
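To see why this holds for two variables, suppose that X_1 and X_2 are independent Poisson random variables with parameters \lambda_1 and \lambda_2 (symbols introduced here for illustration). Since the moment generating function of an independent sum is the product of the individual moment generating functions, we have:

    \displaystyle M_{X_1+X_2}(t)=E(e^{tX_1}) \ E(e^{tX_2})=e^{\lambda_1 \ (e^t-1)} \ e^{\lambda_2 \ (e^t-1)}=e^{(\lambda_1+\lambda_2) \ (e^t-1)}

The right-hand side is the moment generating function of the Poisson distribution with parameter \lambda_1+\lambda_2. Because a moment generating function that is finite in a neighborhood of zero determines the distribution, X_1+X_2 is Poisson with parameter \lambda_1+\lambda_2. The case of more than two variables follows by repeating the argument.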

________________________________________________________________________

The Poisson process

Another way, and the more important one, to look at the Poisson distribution is from the viewpoint of the Poisson process. Consider an experiment in which events of interest occur at random in a time interval. The goal is to record the time of occurrence of each random event and, for the purpose at hand, to count the number of random events occurring in a fixed time interval. Starting at time 0, note the time at which the first event occurs, then the time at which the second random event occurs, and so on. From these measurements, we can derive the length of time between the occurrences of any two consecutive random events. Such waiting times are values of a continuous random variable. In this post, we focus on the discrete random variable that counts the random events in a fixed time interval.

A good example of a Poisson process is the well known experiment in radioactivity conducted by Rutherford and Geiger in 1910. In this experiment, \alpha-particles were emitted from a polonium source and the number of \alpha-particles was counted during intervals of 7.5 seconds (2608 such time intervals were observed). A Poisson process is a random process that satisfies several criteria. We will show that in a Poisson process, the number of these random occurrences in the fixed time interval follows a Poisson distribution. First, we discuss the criteria to which a Poisson process must conform.

One of the criteria is that in a very short time interval, the chance of having more than one random event is essentially zero. So either one random event will occur or none will occur in a very short time interval. Considering the occurrence of a random event as a success, there is either a success or a failure in a very short time interval. So a very short time interval in a Poisson process can be regarded as a Bernoulli trial.

The second criterion is that the experiment remains constant over time. Specifically, this means that the probability of a random event occurring in a given short subinterval is proportional to the length of that subinterval and does not depend on where the subinterval lies in the original interval. For example, in the 1910 radioactivity study, \alpha-particles were emitted at the rate of \lambda= 3.87 per 7.5 seconds. So the rate of emission in a one-second interval is 3.87/7.5 = 0.516. Then the rate in a half-second interval is 0.516/2 = 0.258. For a quarter-second interval, the rate is 0.258/2 = 0.129. Once the subinterval is short enough that more than one emission is essentially impossible, this rate is approximately the probability of observing an \alpha-particle in the subinterval. So if we observe half as long, it will be half as likely to observe the occurrence of a random event. On the other hand, it does not matter where the quarter-second subinterval is, whether at the beginning or toward the end of the original interval of 7.5 seconds.

The third criterion is that non-overlapping subintervals are mutually independent in the sense that what happens in one subinterval (i.e. the occurrence or non-occurrence of a random event) will have no influence on the occurrence of a random event in another subinterval. To summarize, the following are the three criteria of a Poisson process:

    Suppose that on average \lambda random events occur in a time interval of length 1.

    1. The probability of having more than one random event occurring in a very short time interval is essentially zero.
    2. For a very short subinterval of length \frac{1}{n} where n is a sufficiently large integer, the probability of a random event occurring in this subinterval is \frac{\lambda}{n}.
    3. The numbers of random events occurring in non-overlapping time intervals are independent.

Consider a Poisson process in which the average rate is \lambda random events per unit time interval. Let Y be the number of random events occurring in the unit time interval. In the 1910 radioactivity study, the unit time interval is 7.5 seconds and Y is the number of \alpha-particles emitted in 7.5 seconds. It follows that Y has a Poisson distribution with parameter \lambda. To see this, subdivide the unit interval into n non-overlapping subintervals of equal length where n is a sufficiently large integer. Let X_{n,j} be the number of random events in the jth subinterval (1 \le j \le n). Based on the three criteria, X_{n,1},X_{n,2},\cdots,X_{n,n} are independent Bernoulli random variables, where the probability of success for each X_{n,j} is \frac{\lambda}{n}. Then X_n=X_{n,1}+X_{n,2}+\cdots+X_{n,n} has a binomial distribution with parameters n and p=\frac{\lambda}{n}. Theorem 1 tells us that the limiting case of the binomial distributions for X_n is the Poisson distribution with parameter \lambda. This Poisson distribution should agree with the distribution for Y. The Poisson distribution as a limit is also discussed in considerable detail in the previous post called Poisson as a Limiting Case of Binomial Distribution.
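The subdivision argument can be imitated with a small simulation. The following Python sketch is only an illustration under stated assumptions: the seed, the number of subintervals (n = 500) and the number of simulated unit intervals (2000) are arbitrary choices made here, and simulate_count is a hypothetical helper name. Each subinterval is treated as an independent Bernoulli trial with success probability \lambda/n, as in criteria 1 to 3.

```python
import random

def simulate_count(lam, n, rng):
    """Count the successes in n subintervals, each an independent
    Bernoulli trial with success probability lam / n."""
    p = lam / n
    return sum(1 for _ in range(n) if rng.random() < p)

rng = random.Random(2015)           # fixed seed so that the run is repeatable
lam, n, trials = 3.87, 500, 2000    # illustrative values
counts = [simulate_count(lam, n, rng) for _ in range(trials)]

sample_mean = sum(counts) / trials
sample_var = sum((c - sample_mean) ** 2 for c in counts) / (trials - 1)
print(f"sample mean = {sample_mean:.3f}, sample variance = {sample_var:.3f}")
```

If the counts behave like Poisson counts, the sample mean and the sample variance should both land near \lambda, up to sampling noise.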

We now examine the 1910 radioactivity study a little more closely.

Example 2
The basic idea of the 1910 radioactivity study conducted by Rutherford and Geiger is that a polonium source was placed a short distance from an observation point. The number of \alpha-particles emitted from the source was counted in each of 2608 intervals of 7.5 seconds. The following are the tabulated results.

    \displaystyle \begin{bmatrix}   \text{Number of alpha particles}&\text{ }&\text{Observed}   \\ \text{recorded per 7.5 seconds }&\text{ }&\text{counts}    \\ \text{ }&\text{ }&\text{ }   \\ 0&\text{ }&57    \\ 1&\text{ }&203    \\ 2&\text{ }&383    \\ 3&\text{ }&525    \\ 4&\text{ }&532    \\ 5&\text{ }&408   \\ 6&\text{ }&273   \\ 7&\text{ }&139   \\ 8&\text{ }&45   \\ 9&\text{ }&27   \\ 10&\text{ }&10  \\ 11+&\text{ }&6  \\ \text{ }&\text{ }&\text{ }  \\ \text{Total }&\text{ }&2608    \end{bmatrix}

What is the average number of particles observed per 7.5 seconds? The total number of \alpha-particles in these 2608 periods is

    0 \times 57+1 \times 203+2 \times 383+ 3 \times 525 + \cdots=10097.

The mean count per period is \lambda=\frac{10097}{2608} \approx 3.87. Consider the Poisson distribution with parameter 3.87. The following is its probability function.

    \displaystyle P(X=x)=\frac{e^{-3.87} \ 3.87^x}{x!} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x=0,1,2,\cdots

Out of 2608 periods, the expected number of periods with x particles in emission is 2608P(X=x). The following is a side by side comparison of the observed counts and the expected counts.

    \displaystyle \begin{bmatrix}   \text{Number of alpha particles}&\text{ }&\text{Observed}&\text{ }&\text{Expected}    \\ \text{recorded per 7.5 seconds }&\text{ }&\text{counts}&\text{ }&\text{counts}    \\ \text{ }&\text{ }&\text{ }&\text{ }&2608 \times P(X=x)  \\ \text{ }&\text{ }&\text{ }&\text{ }&\text{ }   \\ 0&\text{ }&57&\text{ }&54.40    \\ 1&\text{ }&203&\text{ }&210.52  \\ 2&\text{ }&383&\text{ }&407.36    \\ 3&\text{ }&525&\text{ }&525.50    \\ 4&\text{ }&532&\text{ }&508.42    \\ 5&\text{ }&408&\text{ }&393.52   \\ 6&\text{ }&273&\text{ }&253.82   \\ 7&\text{ }&139&\text{ }&140.32   \\ 8&\text{ }&45&\text{ }&67.88   \\ 9&\text{ }&27&\text{ }&29.19   \\ 10&\text{ }&10&\text{ }&11.30  \\ 11+&\text{ }&6&\text{ }&5.78  \\ \text{ }&\text{ }&\text{ }&\text{ }&\text{ }  \\ \text{Total }&\text{ }&2608&\text{ }&2608    \end{bmatrix}

The expected counts are close to the observed counts for most values of x (the largest gap is at 8 particles, with 45 observed versus 67.88 expected), suggesting that the Poisson distribution is a good fit to the observed data from the 1910 study.
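The expected counts in the table can be recomputed with a few lines of Python. This is a sketch using only the standard library. The observed counts and \lambda=3.87 are taken from the text above, the variable names are illustrative, and the expected count for the "11+" row is obtained as the remainder after the rows 0 through 10.

```python
import math

# Observed counts for 0, 1, ..., 10 particles; the last entry is the "11+" row.
observed = [57, 203, 383, 525, 532, 408, 273, 139, 45, 27, 10, 6]
periods = sum(observed)   # 2608 intervals of 7.5 seconds
lam = 3.87                # mean count per interval, from the text

# Expected counts for x = 0..10, then the remainder for the "11+" row.
expected = [periods * math.exp(-lam) * lam**x / math.factorial(x) for x in range(11)]
expected.append(periods - sum(expected))

for x, (obs, e_count) in enumerate(zip(observed, expected)):
    label = "11+" if x == 11 else str(x)
    print(f"{label:>3}  {obs:>4}  {e_count:8.2f}")
```

Note that the total of 10097 particles cannot be recovered from this table alone, since the "11+" row pools several counts. That is why \lambda is entered directly rather than computed from the observed list.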

________________________________________________________________________

More comments about the Poisson process

We have described the Poisson process as the distribution of random events in a time interval. The same idea can be used to describe random events occurring along a spatial interval, i.e. intervals in terms of distance or volume or other spatial measurements (see Examples 5 and 6 below).

Another point to make is that sometimes it may be necessary to consider an interval of length other than 1. Instead of counting the random events occurring in an interval of length 1, we may want to count the random events in an interval of length t. As before, let \lambda be the rate of occurrences in a unit interval. Then the rate of occurrences of the random events over the interval of length t is \lambda t. The same argument shows that the number of occurrences of the random events of interest in the interval of length t has a Poisson distribution with parameter \lambda t. The following is its probability function.

    \displaystyle P(X_t=x)=\frac{e^{-\lambda t} \ (\lambda t)^x}{x!} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x=0,1,2,\cdots

where X_t is the count of the random events in an interval of length t.

For example, in the 1910 radioactivity study, the unit length is 7.5 seconds. The number of \alpha-particles observed in a half unit interval (3.75 seconds) will follow a Poisson distribution with parameter 0.5 \lambda= 0.5(3.87) = 1.935 with the following probability function:

    \displaystyle P(X_{0.5}=x)=\frac{e^{-1.935} \ (1.935)^x}{x!}  \ \ \ \ \ \ \ \ \ \ \ \ \ x=0,1,2,\cdots
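The probability function for an interval of length t translates directly into code. The following Python sketch assumes a hypothetical helper named poisson_process_pmf, with the rate expressed per unit interval (7.5 seconds in the 1910 study) and t expressed in multiples of that unit.

```python
import math

def poisson_process_pmf(x, rate, t):
    """P(X_t = x): probability of x events in an interval of length t,
    given `rate` events on average per unit interval."""
    mu = rate * t  # Poisson parameter for the interval of length t
    return math.exp(-mu) * mu**x / math.factorial(x)

rate = 3.87  # alpha-particles per unit interval (7.5 seconds)

# Probabilities for a half unit interval (3.75 seconds), parameter 1.935.
for x in range(5):
    print(x, round(poisson_process_pmf(x, rate, 0.5), 6))
```

Keeping the rate and the interval length as separate arguments makes it easy to reuse the same rate for intervals of different lengths.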

________________________________________________________________________

More examples

Example 3
A radioactive source is metered for 5 hours. During this period, 9638 \alpha-particles are counted. What is the probability that during the next minute, between 30 and 34 particles (both inclusive) will be counted?

The average number of \alpha-particles counted per minute is \lambda=\frac{9638}{300} \approx 32.12 (truncated to two decimal places). Let X be the number of \alpha-particles counted per minute. Then X has a Poisson distribution with parameter \lambda=32.12. The following calculates P(30 \le X \le 34).

    \displaystyle \begin{aligned} P(30 \le X \le 34)&=e^{-32.12} \biggl[ \frac{32.12^{30}}{30!}+\frac{32.12^{31}}{31!}+\frac{32.12^{32}}{32!}+\frac{32.12^{33}}{33!}+\frac{32.12^{34}}{34!}   \biggr] \\&=0.341118569  \end{aligned}

Alternatively, the POISSON.DIST function in Excel can be used as follows:

    \displaystyle \begin{aligned} P(30 \le X \le 34)&=P(X \le 34)-P(X \le 29) \\&=\text{POISSON.DIST(34,32.12,TRUE)} \\& \ \ \ \ \ \ -\text{POISSON.DIST(29,32.12,TRUE)} \\&=0.671501917-0.330383348 \\&=0.341118569  \end{aligned}
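The same difference of cumulative probabilities can be computed in Python with the standard library. In this sketch, poisson_cdf is a hypothetical helper that plays the role of POISSON.DIST with the cumulative flag set to TRUE.

```python
import math

def poisson_cdf(x, lam):
    """P(X <= x) for a Poisson(lam) random variable."""
    return sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(x + 1))

lam = 32.12  # average count per minute, as used in Example 3

# P(30 <= X <= 34) = P(X <= 34) - P(X <= 29)
prob = poisson_cdf(34, lam) - poisson_cdf(29, lam)
print(round(prob, 9))
```

Note that the lower cumulative probability is taken at 29, not 30, so that the count of 30 is included in the interval.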

Example 4
The side effect of dry mouth is known to be experienced, on the average, by 5 out of 10,000 individuals taking a certain medication. About 20,000 patients are expected to take this medication next year. What is the probability that between 12 and 16 (both inclusive) patients will experience the side effect of dry mouth? What is the exact probability model that can also be used to work this problem?

The exact model is a binomial distribution. The number of trials is n= 20000 and the probability of success (experiencing the side effect) in each trial is p= 0.0005. Here, we use the Poisson distribution to approximate the binomial. The average number of patients experiencing the side effect is \lambda=20000(0.0005)=10. Let X be the number of patients experiencing the side effect. The following calculates the Poisson probability for P(12 \le X \le 16) in two different ways.

    \displaystyle \begin{aligned} P(12 \le X \le 16)&=e^{-10} \biggl[ \frac{10^{12}}{12!}+\frac{10^{13}}{13!}+\frac{10^{14}}{14!}+\frac{10^{15}}{15!}+\frac{10^{16}}{16!}   \biggr] \\&=0.276182244  \end{aligned}
    \displaystyle \begin{aligned} P(12 \le X \le 16)&=P(X \le 16)-P(X \le 11) \\&=\text{POISSON.DIST(16,10,TRUE)} \\& \ \ \ \ \ \ -\text{POISSON.DIST(11,10,TRUE)} \\&=0.97295839-0.696776146 \\&=0.276182244  \end{aligned}
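Since the exact model in Example 4 is binomial, it is natural to ask how far the Poisson answer is from the exact one. The following Python sketch (standard library only, with illustrative variable names) computes both probabilities directly from the two probability functions.

```python
import math

n, p = 20000, 0.0005
lam = n * p  # 10, the Poisson parameter used in Example 4

# Sum the probability functions over the counts 12, 13, 14, 15, 16.
poisson_prob = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(12, 17))
binomial_prob = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(12, 17))

print(f"Poisson:  {poisson_prob:.9f}")
print(f"binomial: {binomial_prob:.9f}")
```

With n this large and p this small, the two answers should be very close, which is consistent with Theorem 1.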

Example 5
In a 10-mile stretch of a highway, car troubles (e.g. tire punctures, dead batteries, and mechanical breakdowns) occur at a rate of 1.5 per hour. A tow truck driver can respond to such car troubles and offer roadside assistance, which can include towing and minor repair. Assume that the number of such incidents per hour follows a Poisson distribution. At the beginning of the hour, three tow trucks (and their drivers) are available to respond to any car troubles in this stretch of highway. What is the probability that in the next hour all three tow truck drivers will be busy helping motorists with car troubles in this stretch of highway?

Let X be the number of car troubles that occur in this 10-mile stretch of highway in the one-hour period in question. If in this one hour there are 3 or more car troubles (X \ge 3), then all three tow truck drivers will be busy.

    \displaystyle \begin{aligned} P(X \ge 3)&=1-P(X \le 2) \\&=1-e^{-1.5} \biggl[ 1+1.5+\frac{1.5^{2}}{2!}   \biggr] \\&=1-0.808846831\\&=0.191153169  \end{aligned}

Example 6
This example continues Example 5. Considering that there is only a 19% chance that all 3 tow truck drivers will be busy, there is a good chance that the resources are underutilized. What if one of the drivers is assigned to another stretch of highway?

With only two tow trucks available for this 10-mile stretch of highway, the following is the probability that both tow truck drivers will be busy:

    \displaystyle \begin{aligned} P(X \ge 2)&=1-P(X \le 1) \\&=1-e^{-1.5} \biggl[ 1+1.5   \biggr] \\&=1-0.5578254\\&=0.4421746  \end{aligned}

Assigning one driver to another area seems to be a better use of the available resources. With only two tow truck drivers available, the chance that at least one of the drivers will be idle is much reduced (56%, down from 81%), and the chance that all available drivers will be busy is much increased (44%, up from 19%).
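The calculations in Examples 5 and 6 follow the same pattern, so they can be wrapped in one small Python function. This is a sketch under the same simplification used above, namely that all drivers are busy exactly when the number of car troubles in the hour is at least the number of trucks. The function name prob_all_busy is a hypothetical name introduced here.

```python
import math

def prob_all_busy(trucks, rate):
    """P(X >= trucks), where the hourly number of car troubles X
    is Poisson with mean `rate`."""
    p_fewer = sum(math.exp(-rate) * rate**k / math.factorial(k) for k in range(trucks))
    return 1 - p_fewer

rate = 1.5  # car troubles per hour in the 10-mile stretch

for trucks in (1, 2, 3, 4):
    print(trucks, round(prob_all_busy(trucks, rate), 6))
```

Running the function over several fleet sizes shows how quickly the chance of full utilization drops as trucks are added.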

________________________________________________________________________

Remarks

The Poisson distribution is one of the most important of all probability models and has been shown to be an excellent model for a wide array of phenomena such as

  • the number of \alpha-particles emitted from a radioactive source in a given amount of time,
  • the number of vehicles passing a particular location on a busy highway,
  • the number of traffic accidents in a stretch of highway in a given period of time,
  • the number of phone calls arriving at a particular point in a telephone network in a fixed time period,
  • the number of insurance losses/claims in a given period of time,
  • the number of customers arriving at a ticket window,
  • the number of earthquakes occurring in a fixed period of time,
  • the number of mutations on a strand of DNA,
  • the number of hurricanes in a year that originate in the Atlantic Ocean.

Why is the Poisson distribution so widely applicable in these and many other seemingly different phenomena? What is the commonality that ties them together? The commonality is that all these phenomena can be viewed, at least approximately, as a series of independent Bernoulli trials. If a phenomenon follows a binomial model where n is large and p is small, then it has a strong mathematical connection to the Poisson model through Theorem 1 above (i.e. it has a Poisson approximation). On the other hand, if the random phenomenon satisfies the criteria of a Poisson process, then the phenomenon is also approximately a binomial model, which means that in the limiting case it is Poisson.

In both viewpoints discussed in this post, the Poisson distribution can be regarded as a binomial distribution taken at a very granular level. This connection with the binomial distribution points to a vast array of problems that can be solved using the Poisson distribution.

________________________________________________________________________

Exercises

Practice problems for the Poisson concepts discussed above can be found in the companion blog (go there via the following link). Working on these exercises is strongly encouraged (you don’t know it until you can do it).

________________________________________________________________________
\copyright \ \text{2015 by Dan Ma}
