The Gamma Function

A student in a probability course may have evaluated an integral such as the following:

    \displaystyle \int_0^\infty \displaystyle t^{x-1} \ e^{-t} \ dt

Plug in a value for x and evaluate the integral. For example, when evaluated at x=1, the integral has value 1. Evaluated at x=2, the result is also 1. Evaluated at x=3, the result is 2. At x=4, the result is 6=3!. In fact, if a student remembers a fact about this function (called the gamma function), it is that it gives the factorial when evaluated at the positive integers – the value of the integral is (x-1)! when evaluated at the positive integer x.

    \displaystyle (1) \ \ \ \ \ \Gamma(x)=\int_0^\infty \displaystyle t^{x-1} \ e^{-t} \ dt
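
The values quoted above can be checked numerically. The following is a minimal sketch, not a production routine: it approximates the integral in (1) with Simpson's rule on a truncated interval. The function name gamma_integral, the cutoff of 60 and the number of subintervals are illustrative choices rather than anything from the posts, and the sketch assumes x is at least 1 so that the integrand stays bounded at t=0.

```python
import math

def gamma_integral(x, upper=60.0, n=20000):
    """Approximate the integral of t^(x-1) e^(-t) over [0, upper] by Simpson's rule.

    Assumes x >= 1 (integrand bounded at t = 0) and n even.
    The tail beyond `upper` is negligible for the small x used here.
    """
    h = upper / n
    f = lambda t: t ** (x - 1) * math.exp(-t)
    total = f(0.0) + f(upper)
    for i in range(1, n):
        # Simpson weights: 4 on odd interior nodes, 2 on even interior nodes
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3

# Compare the numerical integral with (x-1)! at the first few positive integers
for x in (1, 2, 3, 4, 5):
    print(x, round(gamma_integral(x), 6), math.factorial(x - 1))
```

The two printed columns should agree to the displayed precision, which is the factorial property described above.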

The gamma function is denoted by the capital Greek letter \Gamma. It crops up almost everywhere in mathematics. It has applications in many branches of mathematics including probability and statistics. The integral described above may seem inconsequential, no more than an exercise in an undergraduate course on probability and statistics. We give ample evidence that the gamma function is indeed very consequential, even just in the area of statistics. Our brief tour is through the gamma distribution, a probability distribution that naturally arises from the gamma function. Most of the material cited here is written in various affiliated blogs. Anyone who wants to check out the details can refer to those blogs. Links will be given at the appropriate places.

The Gamma Function

The starting point of the gamma function is that \Gamma(x) is defined for x>0 according to the integral described above. A natural question: how do we know that the integral converges, i.e. the integral always gives a valid number as result? How do we know that the integral does not give infinity as result? The integral does converge for all x>0. A proof can be found here. The following is a graph of the gamma function (using Excel).

Gamma Function Graph

As indicated above, at each positive integer x the function gives the value of the factorial shifted down by one, i.e. \Gamma(x)=(x-1)!. Thus the graph of the gamma function goes up without bound as x \rightarrow \infty.

It is easy to evaluate \Gamma(1). To evaluate the function at the higher integers, the integral would require integration by parts. In fact, using integration by parts, the following recursive relation is established.

    (2a) \ \ \ \ \ \Gamma(x+1)=x \Gamma(x)

    \displaystyle (2b) \ \ \ \ \ \Gamma(x)=\frac{\Gamma(x+1)}{x}
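
For reference, the integration by parts step can be sketched as follows, taking u=t^x and dv=e^{-t} \ dt (this choice of u and dv is ours, spelled out for illustration):

    \displaystyle \Gamma(x+1)=\int_0^\infty t^{x} \ e^{-t} \ dt=\biggl[-t^{x} \ e^{-t}\biggr]_0^\infty+x \int_0^\infty t^{x-1} \ e^{-t} \ dt=0+x \ \Gamma(x)

The boundary term vanishes for x>0 because t^x \ e^{-t} is zero at t=0 and tends to zero as t \rightarrow \infty.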

The recursive relation works for all real numbers x>0, not just the integers. For example, knowing that \Gamma(\frac{1}{2})=\sqrt{\pi}, we have \Gamma(\frac{3}{2})=\frac{1}{2} \sqrt{\pi}. Furthermore, the relation (2b) gives a way to extend the gamma function to the negative numbers. For example, \Gamma(-\frac{1}{2}) would be evaluated by \frac{\Gamma(\frac{1}{2})}{-\frac{1}{2}}=-2 \Gamma(\frac{1}{2}). Based on this idea, for any real number in the interval (-1,0), \Gamma(x) would be defined using the relation (2b) and would be a negative value.

The idea can be extended further. For example, for any real number in the interval (-2,-1), \Gamma(x) would be defined using the relation (2b) and would be a positive value (since the previous interval gives negative values). Continuing in this same manner, \Gamma(x) is defined for all negative real numbers except for the negative integers. The following is a graph of the gamma function over all of the real number line.

Gamma Function
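
The extension to negative numbers through relation (2b) can be sketched in a few lines of code. The function name gamma_extended is hypothetical; for positive arguments the sketch simply relies on math.gamma from the Python standard library (which in fact already handles negative non-integers on its own, so the recursion here is purely to illustrate the idea).

```python
import math

def gamma_extended(x):
    """Evaluate Gamma(x) for real x other than 0, -1, -2, ... using relation (2b)."""
    if x <= 0 and x == int(x):
        raise ValueError("Gamma is undefined at zero and the negative integers")
    if x > 0:
        return math.gamma(x)  # positive arguments: use the standard library
    # Relation (2b): Gamma(x) = Gamma(x + 1) / x, applied until the argument is positive
    return gamma_extended(x + 1) / x

print(gamma_extended(-0.5), -2 * math.sqrt(math.pi))  # the two values should agree
print(gamma_extended(-0.5) < 0)   # negative on (-1, 0)
print(gamma_extended(-1.5) > 0)   # positive on (-2, -1)
```

Each step of the recursion divides by a negative number, which is why the sign alternates from one unit interval to the next.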

The gamma function can also be extended to the complex numbers. Thus the gamma function is defined on all real numbers and on all complex numbers, except for zero and the negative integers.

Gamma Distribution

We are now back to looking at the gamma function just on the positive real numbers x>0. Instead of using x as the argument of the function, let’s use the Greek letter \alpha.

    \displaystyle (1) \ \ \ \ \ \Gamma(\alpha)=\int_0^\infty \displaystyle x^{\alpha-1} \ e^{-x} \ dx

Let’s look at the graph of the integrand of the gamma function defined in (1). In particular, look at \Gamma(5), which is 4! = 24. The integrand would be the expression x^4 \ e^{-x}. Let’s graph this expression over all x>0.

Integrand of Gamma Function: area under curve is 24

The above graph is the graph of y=x^4 \ e^{-x} for x>0. One thing of interest is that the area under the graph (and above the x-axis) is 24 since the gamma function evaluated at 5 is 4!. What if we divide the integrand by 24? Let’s graph the expression y=\frac{1}{24} \ x^4 \ e^{-x}.

Integrand of Gamma Function: area under curve is 1

Note that the graph of y=\frac{1}{24} \ x^4 \ e^{-x} has the same shape as the previous one without the multiplier \frac{1}{24}. The second curve is just a compression of the first one. But this time the area under the curve is 1. This means that y=\frac{1}{24} \ x^4 \ e^{-x} is a probability density function for a random variable that takes on positive values. There is nothing special about \Gamma(5). The same compression can be done for any \Gamma(\alpha). The following is always a probability density function.

    \displaystyle (3) \ \ \ \ \ f_X(x)=\frac{1}{\Gamma(\alpha)} \ x^{\alpha-1} \ e^{-x}

where x>0 and \alpha>0. The number \alpha is a parameter of the distribution. Since this is derived from the gamma function, it is called the gamma distribution. The distribution described in (3) is not the full picture. It has only one parameter \alpha, the shape parameter. We can add another parameter \theta to work as a scale parameter.

    \displaystyle (4) \ \ \ \ \ f_X(x)=\frac{1}{\Gamma(\alpha)} \ \frac{1}{\theta^\alpha} \ x^{\alpha-1} \ e^{-x/\theta}

where x>0, \alpha>0 and \theta>0. Thus the gamma distribution has two parameters \alpha (the shape parameter) and \theta (the scale parameter).
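
As a quick check that (4) really is a density, the following sketch evaluates it and integrates it numerically. The names gamma_pdf and area_under_pdf, the integration cutoff and the step count are assumptions made for illustration; the integration assumes the shape parameter is at least 1 so that the density is finite at 0.

```python
import math

def gamma_pdf(x, alpha, theta=1.0):
    """Density (4): shape alpha > 0, scale theta > 0; theta = 1 gives density (3)."""
    return x ** (alpha - 1) * math.exp(-x / theta) / (math.gamma(alpha) * theta ** alpha)

def area_under_pdf(alpha, theta=1.0, n=40000):
    """Simpson's rule on [0, 60 * theta]; assumes alpha >= 1 and n even."""
    upper = 60.0 * theta
    h = upper / n
    total = gamma_pdf(0.0, alpha, theta) + gamma_pdf(upper, alpha, theta)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * gamma_pdf(i * h, alpha, theta)
    return total * h / 3

# The curve y = x^4 e^(-x) / 24 discussed above is gamma_pdf with alpha = 5, theta = 1
print(gamma_pdf(4.0, 5))
print(area_under_pdf(5))         # should be very close to 1
print(area_under_pdf(2.5, 2.0))  # a non-integer shape with scale 2, also close to 1
```

Changing theta stretches or shrinks the curve along the x-axis without changing the total area, which is what a scale parameter is expected to do.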

The mathematical definition of the gamma distribution is quite simple. Once the gamma function is understood, the gamma distribution is clear mathematically speaking. The mathematical properties of the gamma function are discussed here in a companion blog. The gamma distribution is defined in this blog post in the same companion blog.

Beyond the Mathematical Definition

Though the definition may be simple, the impact of the gamma distribution is far reaching and enormous. We give a few indications. The gamma distribution is useful in actuarial modeling, e.g. modeling insurance losses. Due to its mathematical properties, there is considerable flexibility in the modeling process. For example, since it has two parameters (a scale parameter and a shape parameter), the gamma distribution is capable of representing a variety of distribution shapes and dispersion patterns.

The exponential distribution is a special case of the gamma distribution and it arises naturally as the waiting time between two events in a Poisson process (see here and here).

The chi-squared distribution is also a sub family of the gamma family of distributions. Mathematically speaking, a chi-squared distribution is a gamma distribution with shape parameter k/2 and scale parameter 2 with k being a positive integer (called the degrees of freedom). Though the definition is simple mathematically, the chi-squared family plays an outsize role in statistics.
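
The two special cases can be checked against density (4) directly. In the sketch below, exponential_pdf and chi_squared_pdf are written from the usual textbook forms of those densities (they are not taken from the posts), and the evaluation points are arbitrary.

```python
import math

def gamma_pdf(x, alpha, theta):
    """Density (4) with shape alpha and scale theta."""
    return x ** (alpha - 1) * math.exp(-x / theta) / (math.gamma(alpha) * theta ** alpha)

def exponential_pdf(x, theta):
    """Exponential density with mean theta (textbook form)."""
    return math.exp(-x / theta) / theta

def chi_squared_pdf(x, k):
    """Chi-squared density with k degrees of freedom (textbook form)."""
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

for x in (0.5, 1.0, 3.0):
    # Shape 1 reduces the gamma density to the exponential density
    print(gamma_pdf(x, 1, 2.0), exponential_pdf(x, 2.0))
    # Shape k/2 and scale 2 give the chi-squared density (here k = 3)
    print(gamma_pdf(x, 3 / 2, 2), chi_squared_pdf(x, 3))
```

Each printed pair should agree up to floating point rounding.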

This blog post discusses the chi-squared distribution from a mathematical standpoint. The chi-squared distribution also plays important roles in inferential statistics for the population mean and population variance of normal populations (discussed here).

The chi-squared distribution also figures prominently in the inference on categorical data. The chi-squared test, based on the chi-squared distribution, is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories. The chi-squared test is based on the chi-squared statistic, which has three different interpretations – goodness-of-fit test, test of homogeneity and test of independence. Further discussion of the chi-squared test is found here.

Another set of distributions derived from the gamma family is obtained by raising a gamma distribution to a power. Raising a gamma distribution to a positive power results in a transformed gamma distribution. Raising a gamma distribution to -1 results in an inverse gamma distribution. Raising a gamma distribution to a negative power other than -1 results in an inverse transformed gamma distribution. These derived distributions greatly expand the tool kit for actuarial modeling. These distributions are discussed here.
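
To get a feel for what raising a gamma variable to a power does, here is a small simulation sketch. It uses the standard fact that for X following a gamma distribution with shape \alpha and scale \theta, the mean of X^k is \theta^k \ \Gamma(\alpha+k) / \Gamma(\alpha) whenever k>-\alpha; this formula, the function name gamma_power_moment, the seed and the parameter values are our own choices for illustration, not material from the posts.

```python
import math
import random

def gamma_power_moment(alpha, theta, k):
    """E[X^k] for X ~ gamma(shape alpha, scale theta); valid when k > -alpha."""
    return theta ** k * math.gamma(alpha + k) / math.gamma(alpha)

random.seed(2017)  # arbitrary seed so the run is repeatable
alpha, theta, n = 5.0, 2.0, 200_000
# random.gammavariate takes the shape first and the scale second
draws = [random.gammavariate(alpha, theta) for _ in range(n)]

for k in (2, -1):  # k = 2: a positive power; k = -1: the inverse gamma case
    simulated = sum(x ** k for x in draws) / n
    print(k, round(simulated, 4), round(gamma_power_moment(alpha, theta, k), 4))
```

The simulated averages should land close to the values from the formula, with the usual sampling noise.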

The applications discussed here and in the companion blogs just scratch the surface of the subject of the gamma function and the gamma distribution. One thing is clear: the one little integral in (1) above has a far and wide reach in mathematics, statistics, engineering and other fields.


\copyright 2017 – Dan Ma


The sign test, more examples

This is a continuation of the previous post The sign test. Examples 1 and 2 are presented in the previous post. In this post we present three more examples. Example 3 is a matched pairs problem and is an example demonstrating that the sign test may not be as powerful as the t-test when the population is close to normal. Example 4 is a one-sample location problem. Example 5 is an example of an application of the sign test when the outcomes of the study or experiment are not numerical. For more information about distribution-free inferences, see [Hollander & Wolfe].

Example 3
Courses in introductory statistics are increasingly popular at community colleges across the United States. These are statistics courses that teach basic concepts of descriptive statistics, probability notions and basic inferential statistical procedures such as one- and two-sample t procedures. A certain teacher of statistics at a local community college believes that taking such a course improves students’ quantitative skills. At the beginning of one semester, this professor administered a quantitative diagnostic test to a group of 15 students taking an introductory statistics course. At the end of the semester, the professor administered a second quantitative diagnostic test. The maximum possible score on each test is 50. Though the second test was at a similar level of difficulty to the first test, the questions in the second test were different and the contexts of the problems were different. Thus simply taking the first test should not improve the score on the second test. The following table shows the scores before and after taking the statistics course:

\displaystyle \begin{pmatrix} \text{Student}&\text{Pre-Statistics}&\text{Post-Statistics}&\text{Diff} \\{1}&17&21&4 \\{2}&26&26&0 \\{3}&16&19&3 \\{4}&28&26&-2 \\{5}&23&30&7 \\{6}&35&40&5 \\{7}&41&43&2 \\{8}&18&15&-3 \\{9}&30&29&-1 \\{10}&29&31&2 \\{11}&45&46&1 \\{12}&8&7&-1 \\{13}&38&43&5 \\{14}&31&31&0 \\{15}&36&37&1 \end{pmatrix}

Is there evidence that taking an introductory statistics course at a community college improves students’ quantitative skills? Do the analysis using the sign test.

For a given student, let X be the post-statistics score on the diagnostic test and let Y be the pre-statistics score on the diagnostic test. Let p=P[X>Y]. This is the probability that the student has an improvement on the quantitative test after taking a one-semester introductory statistics course. The test hypotheses are as follows:

\displaystyle H_0:p=\frac{1}{2} \ \ \ \ H_1:p>\frac{1}{2}

Another interpretation of the above alternative hypothesis is that the median of the post-statistics quantitative scores has moved upward. Let W be the number of students whose score improved from the pre test to the post test. Since there are two students with a zero difference, these two ties are dropped and, under H_0, W \sim \text{binomial}(13,0.5). The observed value of W is w=9. The following is the P-value:

\displaystyle \text{P-value}=P[W \ge 9]=\sum \limits_{k=9}^{13} \binom{13}{k} \biggl(\frac{1}{2}\biggr)^{13}=0.1334

If we want to set the probability of a type I error at 0.10, we would not reject the null hypothesis H_0. Thus based on the sign test, the data do not provide sufficient evidence that taking an introductory statistics course improves a student’s quantitative skills.
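
The sign test calculation for this example can be reproduced from the table. The following is a minimal sketch using only the Python standard library; the variable names are our own.

```python
from math import comb

# Scores from the table in Example 3 (students 1 through 15)
pre  = [17, 26, 16, 28, 23, 35, 41, 18, 30, 29, 45, 8, 38, 31, 36]
post = [21, 26, 19, 26, 30, 40, 43, 15, 29, 31, 46, 7, 43, 31, 37]

diffs = [b - a for a, b in zip(pre, post)]
nonzero = [d for d in diffs if d != 0]  # zero differences carry no sign and are dropped
n = len(nonzero)                         # number of signed differences: 13
w = sum(d > 0 for d in nonzero)          # number of plus signs: 9

# Upper tail of binomial(n, 1/2): P[W >= w]
p_value = sum(comb(n, k) for k in range(w, n + 1)) / 2 ** n
print(n, w, round(p_value, 4))           # 13 9 0.1334
```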

The data set for the differences in scores appears symmetric and has no strong skewness and no obvious outliers. So it should be safe to use the t-test. With \mu_d being the mean of X-Y, the hypotheses for the t-test are:

\displaystyle H_0:\mu_d=0 \ \ \ \ H_1:\mu_d>0

We obtain: t-score=2.08 and the P-value=0.028. Thus with the t-test, we would reject the null hypothesis and have the opposite conclusion. Because the sign test does not use all the available information in the data, it is not as powerful as the t-test.
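
The t-test figures can be reproduced without a statistics package. The sketch below computes the t-score from the differences and then obtains the one-sided P-value by numerically integrating the density of the t distribution with 14 degrees of freedom. The density formula is the standard textbook one (note that its normalizing constant is itself built from the gamma function); the helper names, the integration cutoff and the step count are assumptions made for illustration.

```python
import math
from statistics import mean, stdev

# Differences (post minus pre) from the table in Example 3
diffs = [4, 0, 3, -2, 7, 5, 2, -3, -1, 2, 1, -1, 5, 0, 1]
n = len(diffs)
df = n - 1
t_score = mean(diffs) / (stdev(diffs) / math.sqrt(n))

def t_density(t, df):
    """Density of the t distribution with df degrees of freedom (textbook form)."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def upper_tail(t, df, upper=200.0, steps=40000):
    """Simpson's rule for P[T >= t]; the tail beyond `upper` is negligible here."""
    h = (upper - t) / steps
    total = t_density(t, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(t + i * h, df)
    return total * h / 3

p_value = upper_tail(t_score, df)
print(round(t_score, 2), round(p_value, 3))
```

Unlike the sign test, this calculation uses the sizes of the differences and not just their signs, which is where the extra power comes from.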

Example 4
Acid rain is an environmental challenge in many places around the world. It refers to rain or any other form of precipitation that is unusually acidic, i.e. rainwater having elevated levels of hydrogen ions (low pH). The measure of pH is a measure of the acidity or basicity of a solution and has a scale ranging from 0 to 14. Distilled water, with carbon dioxide removed, has a neutral pH level of 7. Liquids with a pH less than 7 are acidic. However, even unpolluted rainwater is slightly acidic with pH varying between 5.2 and 6.0 due to the fact that carbon dioxide and water in the air react together to form carbonic acid. Thus, rainwater is only considered acidic if the pH level is less than 5.2.

In a remote region in Washington state, an environmental biologist measured the pH levels of rainwater and obtained the following data for 16 rainwater samples on 16 different dates:

\displaystyle \begin{pmatrix} 4.73&4.79&4.87&4.88 \\{5.04}&5.06&5.07&5.09 \\{5.11}&5.16&5.18&5.21 \\{5.23}&5.24&5.25&5.25 \end{pmatrix}

Is there reason to believe that the rainwater from this region is considered acidic (less than 5.2)? Use the sign test to perform the analysis.

Let X be the pH level of a sample of rainwater in this region of Washington state. Let p=P[5.2>X]=P[5.2-X>0]. Thus p is the probability of a plus sign when comparing each data measurement with 5.2. The hypotheses to be tested are:

\displaystyle H_0:p=\frac{1}{2} \ \ \ \ H_1:p>\frac{1}{2}

The null hypothesis H_0 is equivalent to the statement that the median pH level is 5.2. If the median pH level is less than 5.2, then a data measurement will be more likely to have a plus sign. Thus the above alternative hypothesis is the statement that the median pH level is less than 5.2.

Let W be the number of plus signs (i.e. 5.2-X>0). Then W \sim \text{binomial}(16,0.5). There are 11 data measurements with plus signs (w=11). Thus the P-value is:

\displaystyle \text{P-value}=P[W \ge 11]=\sum \limits_{k=11}^{16} \binom{16}{k} \biggl(\frac{1}{2}\biggr)^{16}=0.1051

At the level of significance \alpha=0.05, the null hypothesis is not rejected. We still believe that the rainwater in this region is not acidic.
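
The count of plus signs and the P-value for this example can be verified with a short sketch. The variable names are our own, and the filter for measurements equal to 5.2 is included only for generality since none occur in this data set.

```python
from math import comb

# pH measurements from Example 4
ph = [4.73, 4.79, 4.87, 4.88, 5.04, 5.06, 5.07, 5.09,
      5.11, 5.16, 5.18, 5.21, 5.23, 5.24, 5.25, 5.25]
threshold = 5.2

usable = [x for x in ph if x != threshold]  # a measurement equal to 5.2 would carry no sign
n = len(usable)                              # 16
w = sum(x < threshold for x in usable)       # plus signs, i.e. 5.2 - X > 0: 11

# Upper tail of binomial(n, 1/2): P[W >= w]
p_value = sum(comb(n, k) for k in range(w, n + 1)) / 2 ** n
print(n, w, round(p_value, 4))               # 16 11 0.1051
```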

Example 5
There are two statistics instructors who are both sought after by students in a local college. Let’s call them instructor A and instructor B. The math department conducted a survey to find out who is more popular with the students. In surveying 15 students, the department found that 11 of the students prefer instructor B over instructor A. Use the sign test to test the hypothesis of no difference in popularity against the alternative hypothesis that instructor B is more popular.

More than \frac{2}{3} of the students in the sample prefer instructor B over A. This seems like convincing evidence that B is indeed more popular. Let’s perform a calculation to check this. Let W be the number of students in the sample who prefer B over A. The null hypothesis is that A and B are equally popular. The alternative hypothesis is that B is more popular. If the null hypothesis is true, then W \sim \text{binomial}(15,0.5). Then the P-value is:

\displaystyle \text{P-value}=P[W \ge 11]=\sum \limits_{k=11}^{15} \binom{15}{k} \biggl(\frac{1}{2}\biggr)^{15}=0.05923

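This P-value falls between the common significance levels 0.05 and 0.10: the null hypothesis would be rejected at the 0.10 level but not at the 0.05 level. Thus the sample gives moderate, though not overwhelming, evidence that instructor B is more popular among the students.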
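
Since this example starts from counts and not from raw measurements, the P-value can be computed exactly as a fraction. The sketch below does that and then compares the result with two commonly used significance levels; the choice of 0.05 and 0.10 is ours, as the example itself does not fix a level.

```python
from fractions import Fraction
from math import comb

n, w = 15, 11  # 15 students surveyed, 11 prefer instructor B

# Exact upper tail of binomial(15, 1/2): P[W >= 11]
p_value = Fraction(sum(comb(n, k) for k in range(w, n + 1)), 2 ** n)
print(p_value, round(float(p_value), 5))  # 1941/32768 0.05923

for level in (0.05, 0.10):
    decision = "reject H_0" if p_value <= level else "do not reject H_0"
    print(level, decision)
```

The decision therefore depends on which level of significance is chosen before looking at the data.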

Myles Hollander and Douglas A. Wolfe, Nonparametric Statistical Methods, Second Edition, Wiley (1999)